Stochastic Hybrid Systems Modeling &
Middleware-enabled DDDAS for
Next-generation US Air Force Systems
FA9550-13-1-0227
Acknowledgments: Dr. Frederica Darema
Aniruddha Gokhale, Associate Professor, Dept of EECS &
Institute for Software Integrated Systems
Vanderbilt University, Nashville, TN, USA
Email: [email protected]
PI Meeting, Dec 1-3, 2014
IBM T. J. Watson Center, Yorktown Heights, NY
Role of Systems Software in DDDAS
• DDDAS provides symbiotic feedback
control between the application
instrumentation and its simulation to
steer a system on its trajectory
• Systems software enables dynamic
resource provisioning for model
learning, model execution and
dynamic instrumentation among
other things
[Figure: Ship-wide QoS Doctrine & Readiness Display. Labels: global middleware (requested QoS, measured QoS); local middleware; TBMD application; AAW application; control algorithms; control variables (network latency & bandwidth, workload & replicas, CPU & memory, connections & priority bands).]
Applications Modeling & Systems Software (AMASS)
AMASS targets DDDAS Applications Modeling & Systems Software
areas
Leverages prior work on
instrumentation, statistical
algorithms, & adaptive
software infrastructure for
Naval combat systems
• Prof. Aniruddha Gokhale (PI), Prof. Xenofon
Koutsoukos and Prof. Douglas Schmidt (Co-PIs)
• Students
• Faruk Caglar
• Shashank Shekhar
• Shweta Khare
• Michael Walker
• Violetta Vylegzhanina
• Anirban Bhattacharjee
• Hamzah Abdelaziz
• Kyoungho An
• Other collaborators
• Dr. Sumant Tambe (RTI)
• Dr. Abhishek Dubey, Dr. William Otte, Dr. Nilabja Roy (All VU)
Our Team
• Prior projects with AFRL
• We have worked with Steven Drager and William
McKeever on Software Producibility projects
• Aniruddha Gokhale did the summer faculty program
in 2009
• Some publications jointly authored with AFRL
program managers
• Doug Schmidt has served on scientific
advisory board
Team Experience working with Air Force
AMASS Focus Areas
• Applications modeling using stochastic hybrid modeling for model building
& anytime algorithms for incremental model refinement
• Systems software comprising dynamic resource management, deployment
& configuration, & online model updates to support distributed & real-time
model execution & control
[Figure: DDDAS Modeling & Infrastructure Architecture. Labels: operationalized DoD system; traditional control loop (sense, measurement, steer); DDDAS Online Model Learning & Model Generation Environment; model repository; runtime query & retrieval; collection of running DoD system models; online updates; (re)instrument; DDDAS systems software; dynamic resource management; model update & deployment and configuration; model fidelity; decision support system; heterogeneous multi-layered resource platforms for model executions; data center.]
Team Interaction
• Weekly meeting
• Redmine-based project management
• Meeting notes on project wiki page
• Git version control for software and publications
• Project started in Sept 2013
• Year 1 Accomplishments
• Cloud-focused resource management for satisfying
DDDAS applications QoS (e.g., required for DDDAS
simulations, model learning), and real-time stream
processing
• Demonstrated one simple end-to-end scenario that uses
stochastic models and simulations, and resource
management using lightweight virtualization
• Publications and one doctoral student’s PhD proposal defense
• Year 2 Plans
• Mobile device-based instrumentation, real-time
stream processing, resource mgmt across a spectrum
of resources, model learning
• Collaborate with DDDAS teams for application use cases
Summary of Contributions
Summary of Publications (1/2)
Journal
• Shashank Shekhar, Hamzah Abdelaziz, Michael Walker, Faruk Caglar, Aniruddha
Gokhale, and Xenofon Koutsoukos, “A Simulation as a Service Cloud Middleware,”
Springer Annals of Telecommunications, 2014 (in submission).
• Faruk Caglar and Aniruddha Gokhale, “iOverbook: Intelligent Resource-Overbooking
to Support Soft Real-time Applications in the Cloud,” 7th International Conference on
Cloud Computing (IEEECloud), Alaska, USA, June 27, 2014 (invited to International
Journal of Cloud Computing)
Book Chapters
• Shashank Shekhar, Shweta Khare, Faruk Caglar, Aniruddha Gokhale, Douglas
Schmidt, and Xenofon Koutsoukos, “Middleware-enabled DDDAS,” Book Chapter in
Springer, 2014 (in submission).
Panel
• Aniruddha Gokhale, “Systems Software Challenges for InfoSymbiotics
Systems/DDDAS,” SuperComputing 2014 panel on InfoSymbiotic Systems/DDDAS,
New Orleans, LA, Nov 2014
Summary of Publications (2/2)
Conference & Workshop Publications
• Faruk Caglar, Shashank Shekhar, and Aniruddha Gokhale. “iPlace: An Intelligent
and Tunable Power- and Performance-Aware Virtual Machine Placement Technique
for Cloud-based Real-time Applications,” 17th IEEE Symposium on
Object/Component/Service-oriented Real-time Distributed Computing (ISORC),
Reno, Nevada, USA, June 10, 2014
• Faruk Caglar and Aniruddha Gokhale, “iOverbook: Intelligent Resource-Overbooking
to Support Soft Real-time Applications in the Cloud,” 7th International Conference on
Cloud Computing (IEEECloud), Alaska, USA, June 27, 2014
• Faruk Caglar, Shashank Shekhar, and Aniruddha Gokhale. “iTune: Engineering the
Performance of Xen Hypervisor via Autonomous and Dynamic Scheduler
Reconfiguration,” (in submission)
• Faruk Caglar, Shashank Shekhar and Aniruddha Gokhale, “Towards a Performance
Interference-aware Virtual Machine Placement Strategy for Supporting Soft Real-
time Applications in the Cloud,” 3rd International Workshop on Real-time and
Distributed Computing in Emerging Applications (REACTION 2014), Rome, Italy,
Dec 2, 2014. (to appear)
• Shweta Khare, Kyoungho An, Aniruddha Gokhale and Sumant Tambe, “Functional
Reactive Stream Processing for Data-centric Publish/Subscribe,” Submitted to
IPDPS 2015, Hyderabad, India.
• DDDAS application simulations (and model learning
algorithms) require resources to execute
Area 1: Cloud-based Resource Mgmt
[Figure: the resource pool (e.g., data center) and the DDDAS application, each linked to its model by instrument/control arrows.]
• Simulations can execute in the cloud
• Applications have different QoS requirements
• Need resource management in the cloud data center
• Apply DDDAS principles to the cloud data center
• To achieve this vision, we need to instrument a data
center and obtain resource utilization information
Cloud Data Center Instrumentation: How to?
[Figure: the Google data center (info from 2011) and the DDDAS application, each linked to its model by instrument/control arrows.]
• We used a pre-instrumented trace log from the Google data center
• Solved various resource mgmt problems
• We leveraged cluster trace made available by Google for a period of 29 days in May 2011.
• Data is available for more than 12,000 host machines
• Data comprises machine events, machine attributes, jobs, tasks, constraints, and resource usage details.
• Resource usage data contains about 1.2 billion rows
Data from an Instrumented Data Center
[Figure: Google data center trace (May 2011), machine learning techniques, and the resulting model of the Google data center.]
Cloud Data Center Architecture
• Management and orchestration of the cloud environment
• Delivery of cloud-based applications and services
• Virtual machine management on top of host machines
Focus so far is only on the compute resources;
storage and I/O to be considered later
Challenge 1: Autonomous and
Dynamic Scheduler Reconfiguration
Virtualization Layer comprises scheduling
mechanism to share the physical CPU
Scheduling mechanism is usually
configured by certain parameters in the
hypervisor
Performance of an application running in
the VM is directly impacted by the
configuration
•Finding the optimum scheduling
configuration is required
Solution to Challenge 1: iTune
iTune : An Intelligent and Autonomous Self-tuning
Middleware to Optimize the Scheduler Parameters of the
Virtualization Mechanism
• Method is applicable to all scheduling environments
• Specifically, we focus on Xen hypervisor
• Tunes the parameters of the default scheduler in the Xen
hypervisor, which is a credit-based CPU scheduler
• iTune tunes Xen’s credit scheduler parameters in response to
changing workload on the host machine
• Empirical results showed that (1) CPU utilization, (2) CPU
overbooking ratio, and (3) VM count are strong features for
workload clustering.
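The workload-clustering step can be illustrated with a small sketch. The slides do not say which clustering algorithm iTune uses, so plain k-means is an assumption here, and the host samples are invented for illustration. Each sample holds the three features named above: CPU utilization, CPU overbooking ratio, and VM count.

```python
import math
import random


def min_max_normalize(samples):
    """Rescale every feature column to [0, 1] so no single feature dominates."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in samples
    ]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means (an illustrative choice, not necessarily iTune's)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical host samples: (CPU utilization, CPU overbooking ratio, VM count).
samples = [
    (0.10, 1.0, 2), (0.15, 1.0, 3), (0.12, 1.2, 2),      # lightly loaded hosts
    (0.85, 3.0, 12), (0.90, 3.5, 14), (0.80, 3.2, 13),   # heavily loaded hosts
]
centroids, labels = kmeans(min_max_normalize(samples), k=2)
print(labels)
```

A self-tuning middleware could then keep one scheduler configuration per cluster and switch when a host's current workload moves to a different cluster; that mapping is not shown because the slides give no parameter values.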
Challenge 2: Accommodating Multiple
Tasks using Resource Overbooking
Overbooking helps to increase energy
efficiency and resource utilization.
Common practice to make the business
model more profitable (e.g. airlines,
hotels, cell phone operators)
•How to systematically identify
effective overbooking ratios?
Solution to Challenge 2: iOverbook
iOverbook : Intelligent Resource-Overbooking to Support
Soft Real-time Applications in the Cloud
Machine learning approach to making systematic and
online determination of overbooking ratios.
Utilizes historic data of tasks and host machines in the
cloud
Extracts their resource usage patterns
Predicts future resource usage and expected mean
performance of host machines.
Used cluster trace log released by Google.
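As a rough illustration of how a predicted usage level can be turned into an overbooking ratio, the sketch below swaps iOverbook's machine learning predictor for a simple exponentially weighted moving average. That substitution, the function names, the target utilization, and the ratio cap are all assumptions made for the example. The overbooking ratio is taken here to mean total reserved capacity divided by physical capacity.

```python
def ewma_forecast(history, alpha=0.5):
    """One-step-ahead forecast via an exponentially weighted moving average."""
    if not history:
        raise ValueError("history must not be empty")
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def overbooking_ratio(usage_fraction_history, target_utilization=0.8, max_ratio=4.0):
    """Suggest how far reservations may exceed physical capacity.

    usage_fraction_history holds past values of (actual usage / reserved amount)
    for the tasks on one host. If tasks are predicted to use a fraction f of
    what they reserve, reserving target_utilization / f times the physical
    capacity should keep real utilization near the target.
    """
    predicted = ewma_forecast(usage_fraction_history)
    if predicted <= 0:
        return max_ratio
    return max(1.0, min(max_ratio, target_utilization / predicted))


# Hypothetical host whose tasks consistently use 40% of what they reserve.
ratio = overbooking_ratio([0.4, 0.4, 0.4])
print(ratio)

# Tasks that use everything they reserve leave no room for overbooking.
no_room = overbooking_ratio([1.0, 1.0])
print(no_room)
```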
Challenge 3: Power- and Performance-aware VM Placement
Virtual machines are migrated in the data center to tolerate faults,
balance workload, eliminate hotspots, etc.
Power and performance tradeoffs are critical concerns faced by
cloud service providers (CSPs)
How to find a well-suited host machine for power- and
performance-aware VM placement?
Challenge 4: Performance Interference Effects on App Performance
Analyzing the performance anomalies
Cloud systems are multi-tenant
CSPs overbook physical system
resources
Resource overbooking and noisy
neighbors can lead to performance
interference and anomalies among VMs
How to predict the performance
interference and the faults that may
occur before a VM placement
decision is made?
Solutions to Challenges 3 & 4
iPlace: An Intelligent and Tunable Power- and Performance-aware
Virtual Machine Placement Middleware
• The goal of iPlace is to find a well-suited host machine by considering both the energy efficiency of the data center and the performance requirements of soft real-time applications.
• The placement decision is based on the predicted power change and the predicted performance effect on the applications.
hALT: harmonious Art of Living Together
• Performance Interference-aware Virtual Machine Placement
Strategy for Supporting Soft Real-time Applications in the Cloud
• hALT extracts the best VM collocation patterns by utilizing
features such as CPU, Memory usage, and performance.
• hALT assumes that CSPs overbook their underlying cloud
infrastructure to save energy costs.
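One way to picture a tunable power- and performance-aware placement decision is a weighted cost over candidate hosts. The sketch below is not the iPlace or hALT algorithm; the `HostEstimate` fields, the normalization, and the `power_weight` knob are hypothetical, and the per-host predictions are assumed to come from models learned elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostEstimate:
    """Hypothetical per-host predictions for placing one VM."""
    name: str
    power_increase_watts: float   # predicted extra power draw if the VM lands here
    predicted_slowdown: float     # predicted fractional slowdown from interference


def choose_host(candidates, power_weight=0.5):
    """Pick the host with the lowest weighted cost.

    power_weight near 1 favors energy savings; near 0 it favors application
    performance. Both terms are scaled by the worst candidate so that the
    weight trades off comparable quantities.
    """
    if not candidates:
        raise ValueError("no candidate hosts")
    if not 0.0 <= power_weight <= 1.0:
        raise ValueError("power_weight must be in [0, 1]")
    max_power = max(h.power_increase_watts for h in candidates) or 1.0
    max_slowdown = max(h.predicted_slowdown for h in candidates) or 1.0

    def cost(host):
        return (power_weight * host.power_increase_watts / max_power
                + (1.0 - power_weight) * host.predicted_slowdown / max_slowdown)

    return min(candidates, key=cost)


hosts = [
    HostEstimate("busy-host", power_increase_watts=10.0, predicted_slowdown=0.30),
    HostEstimate("idle-host", power_increase_watts=40.0, predicted_slowdown=0.05),
]
energy_first = choose_host(hosts, power_weight=0.9)
performance_first = choose_host(hosts, power_weight=0.1)
print(energy_first.name, performance_first.name)
```

Packing the VM onto an already busy host saves power but risks interference, while waking an idle host does the opposite; the single weight makes that tradeoff explicit.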
Problem
• Instrumented data must be processed on-the-fly
• Must handle dynamic changes in incoming sources of
streams
• Should be able to use archived data (history)
Solution
• We are using real-time stream processing
• Combining the power of real-time publish/subscribe
(i.e., sources/sinks of info) with reactive programming
• Achieve scale-out (pub/sub) and scale-up (reactive)
• Concretely, we combined the Data Distribution Service
(DDS) with Reactive Extensions for .NET (Rx.NET)
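The idea of composing reactive operators over published data can be sketched without either library. The `Stream` class below is a hypothetical stand-in written for this example; it does not use or mimic the DDS or Rx.NET APIs. A stream plays the role of a pub/sub topic, and chained operators play the role of the reactive query.

```python
from collections import deque


class Stream:
    """Minimal push-based stream: publishers call emit(), subscribers get callbacks."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        return self

    def emit(self, item):
        for callback in list(self._subscribers):
            callback(item)

    def map(self, fn):
        out = Stream()
        self.subscribe(lambda item: out.emit(fn(item)))
        return out

    def filter(self, predicate):
        out = Stream()
        self.subscribe(lambda item: out.emit(item) if predicate(item) else None)
        return out

    def sliding_mean(self, size):
        """Emit the mean of the last `size` items once the window is full."""
        out = Stream()
        window = deque(maxlen=size)

        def on_item(item):
            window.append(item)
            if len(window) == size:
                out.emit(sum(window) / size)

        self.subscribe(on_item)
        return out


readings = Stream()   # stands in for a pub/sub topic of instrumentation samples
alerts = []
(readings
    .filter(lambda r: r["sensor"] == "cpu")
    .map(lambda r: r["value"])
    .sliding_mean(3)
    .filter(lambda mean: mean > 0.8)
    .subscribe(alerts.append))

# Any number of publishers may emit into the same stream, in any order.
for value in [0.5, 0.9, 0.9, 0.9]:
    readings.emit({"sensor": "cpu", "value": value})
readings.emit({"sensor": "mem", "value": 0.99})
print(alerts)
```

Because the query is attached to the stream rather than to a particular publisher, sources can come and go without changing the processing chain.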
Area 2: Stream Processing for Model Learning
[Figure: SIMaaS Cloud Middleware. Labels: SIMaaS Manager (SM); Container Manager (CM); Result Aggregator (RA); Performance Monitor (PM); simulation cloud; host cluster of Docker hosts 1 ... k ... n, each running multiple simulation containers.]
Area 3: Simulation-as-a-Service
• Middleware to support “Simulation-as-a-Service” for users to
host their simulations (e.g., DDDAS application simulations)
• Use case: a stochastic physics model of the heating of a building,
for which a large number of parallel simulations are executed
• Resource management using Docker containers
• Virtual machines were deemed too heavyweight
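The pattern of fanning out many stochastic simulation runs and aggregating their results can be sketched as follows. The thermal model, its parameters, and the function names are hypothetical and are not taken from the project's building model; a thread pool stands in for the Docker containers, and the summary step stands in for the Result Aggregator.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def simulate_room(seed, steps=800, dt=0.01, t_start=15.0, t_outside=5.0,
                  time_constant=4.0, heater_rate=5.0, noise=0.5):
    """One Euler-Maruyama run of dT = (-(T - T_out)/tau + heater) dt + noise dW.

    Time is in hours and temperature in degrees Celsius; all parameter
    values are illustrative placeholders.
    """
    rng = random.Random(seed)          # one independent generator per run
    temperature = t_start
    for _ in range(steps):
        drift = -(temperature - t_outside) / time_constant + heater_rate
        temperature += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
    return temperature


def run_batch(seeds, workers=4):
    """Run one simulation per seed and aggregate the final temperatures."""
    # The thread pool only mimics the fan-out to containers; in CPython it
    # does not provide true CPU parallelism.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        finals = list(pool.map(simulate_room, seeds))
    return {
        "runs": len(finals),
        "mean": statistics.fmean(finals),
        "stdev": statistics.stdev(finals),
    }


summary = run_batch(range(200))
print(summary)
```

Seeding each run separately keeps the batch reproducible regardless of how the runs are scheduled, which matters when the same batch is split across a varying number of hosts.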
Emerging Context for DDDAS
• No longer a single system to be steered; multiple systems
must be steered simultaneously
• Requires trade-offs
• Must deal with uncertainty
• Large-scale Big Data and large-scale Big Computation
• Adaptive traffic lights, street lights
• Multiple interconnected systems (systems of systems)
• Emergence of the Internet of Things (IoT) (and variants)
• Multiple phases needing DDDAS systems s/w
• Feedback and adaptation among the different
phases and artifacts of system software
New Responsibilities for DDDAS Systems S/W
[Figure: DDDAS model and simulation. Labels: dynamic adaptation in sensing; dynamic discovery of info sources; scalable and real-time stream processing; provisioning of resources for learning and simulation.]
• This is the vision we started with
Lessons Learned & Our Needs
[Figure: the resource pool (e.g., data center) and the DDDAS application, each linked to its model by instrument/control arrows.]
• But we don’t have real DDDAS applications; rather, we use
emulated applications from the Google trace
• We need to use the DDDAS community’s models as our workloads
• Understand how our solutions will work with these real
applications; develop new solutions
• DDDAS Applications Community
• Utilize the application simulation models and execute them
on our cloud to create a realistic scenario of workloads
• Spoken to multiple DDDAS Applications researchers for their
applications (Yuri, Richard, Alok, Eric); more synergies
sought
• Other synergies are possible, e.g., in model learning
• DDDAS Systems Community
• Combine our work with security, parallel processing
• Spoken to systems researchers (Salim, Sanjay, Vaidy) and
utilizing mobile test bed (Shuvra)
• Industry and Govt agencies
• e.g., IBM’s work in events, stream processing, IoT
• AFRL’s work in live DBMS (communicated with Alex and
Erik)
Collaboration Opportunities
• Use DDDAS application case studies for resource
management
• Instrumentation using mobile devices
• Executing simulation models across range of
devices – not just a cloud data center
• Real-time stream processing with reactive
extensions
• Dynamic resource management
• Dynamic offloading from mobile devices
• Just-in-time resource provisioning
• Model learning
• Stochastic hybrid systems modeling
• Submitted DURIP and NSF/AFOSR proposals
Ongoing and Future Focus
• IPDPS Workshop
• IPDPS 2015 organizers have agreed to provide us
a half-day workshop slot on the last day of the
conference
• Suggest ideas, get community ideas on workshop
theme
• Participation from the community
• GPCE 2015 Conference with SPLASH
• Generative programming conference
• Gokhale is serving as program chair
• If you have ideas, please submit (due date in June,
conference in Oct in Pittsburgh, PA)
• Looking for volunteers to serve on program
committee
Upcoming Events of Interest
• Modeling and Systems Software project for
DDDAS
• Completed one year and 2 months
• Initial focus
• resource management in the cloud data center using
emulated workloads and pre-instrumented data centers
• Initial ideas on real-time stream processing for model
learning
• End-to-end scenario
• Focus for years 2 and 3
• Involve mobile entities, instrumentation, model learning
• Collaborate with DDDAS community for real workloads,
testbeds, etc
Concluding Remarks