Addressing Complexity in Emerging Cyber-Ecosystems – Experiments with Autonomic Computational Science

Manish Parashar*
Center for Autonomic Computing / The Applied Software Systems Laboratory
Rutgers, The State University of New Jersey

*In collaboration with S. Jha & O. Rana

eSI Visitor Seminar – 01/13/10
Outline of My Presentation

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks
The Cyberinfrastructure Vision
• “Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools…”
- NSF’s Cyberinfrastructure Vision for 21st Century Discovery
• A global phenomenon; several LARGE deployments
  – UK National Grid Service (NGS)/European Grid Infrastructure (EGI), TeraGrid, Open Science Grid (OSG), EGEE, Cybera, DEISA, etc.
• New capabilities for computational science and engineering
  – seamless access: resources, services, data, information, expertise, …
  – seamless aggregation
  – seamless (opportunistic) interactions/couplings
Cyberinfrastructure => Cyber-Ecosystems
21st century Science and Engineering: New Paradigms & Practices
• Fundamentally data-driven/data intensive
• Fundamentally collaborative
Unprecedented opportunities for Science/Engineering
• Knowledge-based, information/data-driven, context/content-aware, computationally intensive, pervasive applications
  – Crisis management; monitoring and predicting natural phenomena; monitoring and managing engineered systems; optimizing business processes
• Addressing applications in an end-to-end manner!
  – Opportunistically combine computations, experiments, observations, and data to manage, control, predict, adapt, optimize, …
• New paradigms and practices in science and engineering?
  – How can it benefit current applications?
  – How can it enable new thinking in science?
The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)
Detect and track changes in data during production. Invert data for reservoir properties. Detect and track reservoir changes.
Assimilate data & reservoir properties into the evolving reservoir model.
Use simulation and optimization to guide future production.
(Data driven ↔ model driven)
Many Application Areas …

• Hazard prevention, mitigation and response
  – Earthquakes, hurricanes, tornados, wildfires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
  – Condition monitoring and prediction of future capability
• Transportation of humans and goods
  – Safe, speedy, and cost-effective transportation networks and vehicles (air, ground, space)
• Energy and environment
  – Safe and efficient power grids; safe and efficient operation of regional collections of buildings
• Health
  – Reliable and cost-effective health care systems with improved outcomes
• Enterprise-wide decision making
  – Coordination of dynamic distributed decisions for supply chains under uncertainty
• Next-generation communication systems
  – Reliable wireless networks for homes and businesses
• …

Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org (Source: M. Rotea, NSF)
The Challenge: Managing Complexity, Uncertainty (I)

• Increasing application, data/information, and system complexity
  – Scale, heterogeneity, dynamism, unreliability, …
• New application formulations, practices
  – Data-intensive and data-driven, coupled, multiple physics/scales/resolutions, adaptive, compositional, workflows, etc.
• Complexity/uncertainty must be simultaneously addressed at multiple levels
  – Algorithms/application formulations: asynchronous/chaotic, failure-tolerant, …
  – Abstractions/programming systems: adaptive, application/system-aware, proactive, …
  – Infrastructure/systems: decoupled, self-managing, resilient, …
The Challenge: Managing Complexity, Uncertainty (II)

• The ability of scientists to realize the potential of computational ecosystems is severely hampered by the increasing complexity and dynamism of applications and computing environments.
• To be productive, scientists often have to comprehend and manage complex computing configurations, software tools and libraries, as well as application parameters and behaviors.
• Can autonomics and self-* help? (With the “plumbing,” for starters…)
Outline of My Presentation

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks
The Autonomic Computing Metaphor
• Current paradigms, mechanisms, and management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems and applications
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
  – self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to enable self-managing systems/applications that address these challenges using high-level guidance
  – Unlike AI, duplication of human thought is not the ultimate goal!

“Autonomic Computing: An Overview,” M. Parashar and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.
Motivations for Autonomic Computing
Source: IDC, 2006
8/12/07: 20K people + 60 planes held at LAX after computer failure prevented customs from screening arrivals
8/3/07: (EPA) datacenter energy use by 2011 will cost $7.4B and require 15 additional power plants (15 GW peak)
Source:http://www.almaden.ibm.com/almaden/talks/Morris_AC_10-02.pdf
Key Challenge: Current levels of scale, complexity and dynamism make it infeasible for humans to effectively manage and control systems and applications
2/27/07: Dow fell 546 points. Since the worst plunge took place after 2:30 pm, trading limits were not activated
8/1/06: UK NHS hit with massive computer outage. 72 primary care + 8 acute hospital trusts affected.
Autonomic Computing – A Pragmatic Approach

• Separation + Integration + Automation!
• Separation of knowledge, policies and mechanisms for adaptation
• Integration of self-configuration, self-healing, self-protection, self-optimization, …
• Self-* behaviors build on automation concepts and mechanisms
  – Increased productivity, reduced operational costs, timely and effective response
• System/application self-management is more than the sum of the self-management of its individual components
M. Parashar and S. Hariri, Autonomic Computing: Concepts, Infrastructure, and Applications, CRC Press, Taylor & Francis Group, ISBN 0-8493-9367-1, 2007.
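The separation of policies from mechanisms described above can be sketched as a minimal monitor–analyze–plan–execute loop. All names, policies, and mechanisms below are illustrative stand-ins, not an actual autonomic-computing framework:

```python
# Minimal sketch of an autonomic manager: policies are plain data,
# kept separate from the mechanisms that enact them.

def monitor(system):
    """Sense the managed element's state."""
    return {"load": system["load"], "errors": system["errors"]}

def analyze_and_plan(state, policies):
    """Match observed state against high-level policies to pick actions."""
    return [action for condition, action in policies if condition(state)]

def execute(system, actions, mechanisms):
    """Invoke low-level mechanisms, chosen and orchestrated at runtime."""
    for action in actions:
        mechanisms[action](system)

# Policies: high-level guidance, decoupled from any mechanism.
policies = [
    (lambda s: s["load"] > 0.8, "scale_out"),       # self-optimizing
    (lambda s: s["errors"] > 0, "restart_worker"),  # self-healing
]

# Mechanisms: interchangeable low-level adaptations.
mechanisms = {
    "scale_out": lambda sys: sys.update(workers=sys["workers"] + 1),
    "restart_worker": lambda sys: sys.update(errors=0),
}

system = {"load": 0.9, "errors": 1, "workers": 4}
execute(system, analyze_and_plan(monitor(system), policies), mechanisms)
print(system)  # one worker added, errors cleared
```

Because the policy list is data, it can be swapped or extended at runtime without touching the mechanisms, which is the point of the "separation + integration + automation" slogan.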
Autonomic Computing Theory

• Integrates and advances several fields
  – Distributed computing: algorithms and architectures
  – Artificial intelligence: models to characterize, predict and mine data and behaviors
  – Security and reliability: designs and models of robust systems
  – Systems and software architecture: designs and models of components at different IT layers
  – Control theory: feedback-based control and estimation
  – Systems and signal processing theory: system and data models and optimization methods
• Requires experimental validation

(From S. Dobson et al., ACM Tr. on Autonomous & Adaptive Systems, Vol. 1, No. 2, Dec. 2006.)
Autonomics for Science and Engineering?

• Autonomic computing aims to develop systems and applications that can manage and optimize themselves using only high-level guidance or intervention from users
  – dynamically adapt to changes in accordance with business policies and objectives, and take care of routine elements of management
• Separation of management and optimization policies from enabling mechanisms
  – allows a repertoire of mechanisms to be automatically orchestrated at runtime to respond to heterogeneity, dynamics, etc.
  – e.g., develop strategies capable of identifying and characterizing patterns at design time and at runtime and, using relevant (dynamically defined) policies, managing and optimizing those patterns
• Applies at the application, middleware, and infrastructure levels
• Manage application/information/system complexity – not just hide it!
• Enable new thinking, formulations
  – how do I think about/formalize my problem differently?
A Conceptual Framework for ACS
(GMAC 07, with S. Jha and O. Rana)
• Hierarchical
• Within and across levels …
Crosslayer Autonomics
Existing Autonomic Practices in Computational Science (GMAC 09, SOAR 09, with S. Jha and O. Rana)
Autonomic tuning by the application
Autonomic tuning of the application
Spatial, Temporal and Computational Heterogeneity and Dynamics in SAMR
Simulation of combustion based on SAMR (H2-Air mixture; ignition via 3 hot-spots)
(Figure: temperature and OH-profile snapshots illustrating temporal and spatial heterogeneity)
Courtesy: Sandia National Lab
Autonomics in SAMR

• Tuning by the application
  – Application level: when and where to refine
  – Runtime/middleware level: when, where, and how to partition and load balance
  – Resource level: allocate/de-allocate resources
• Tuning of the application, runtime
  – When/where to refine
  – Latency-aware ghost synchronization
  – Heterogeneity/load-aware partitioning and load balancing
  – Checkpoint frequency
  – Asynchronous formulations
  – …
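The runtime-level "when to repartition" decision can be sketched as a simple imbalance trigger: repartitioning has a cost, so it is worth doing only when refinement has concentrated enough load on a few processors. The function names and the threshold below are illustrative, not SAMR's actual heuristic:

```python
# Hypothetical repartitioning trigger for an SAMR runtime: compare each
# processor's load to the mean and repartition only past a threshold.

def imbalance(loads):
    """Relative imbalance: max processor load vs. mean load."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0

def should_repartition(loads, threshold=0.25):
    return imbalance(loads) > threshold

# After refinement, hot-spots concentrate grid patches on one processor.
balanced   = [100, 98, 102, 100]
after_refn = [100, 100, 100, 220]   # one processor holds a refined region

print(should_repartition(balanced))    # False
print(should_repartition(after_refn))  # True
```

In a real SAMR runtime the trigger would also weigh the cost of the repartitioning itself and communication locality, which is exactly the kind of policy an autonomic manager can swap at runtime.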
Outline of My Presentation

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks
Autonomics for Science and Engineering – Application-level Examples
Coupled Fusion Simulations: A Data Intensive Workflow
Autonomic Data Streaming and In-Transit Processing for Data-Intensive Workflows

• Workflow with coupled simulation codes, i.e., the edge turbulence particle-in-cell (PIC) code (GTC) and the microscopic MHD code (M3D), run simultaneously on separate HPC resources
• Data streamed and processed en route – e.g., data from the PIC code is filtered through “noise detection” processes before it can be coupled with the MHD code
• Efficient data streaming between live simulations – data should arrive just in time: if it arrives too early, time and resources are wasted buffering it; if it arrives too late, the application wastes resources waiting for it
• Opportunistic use of in-transit resources

“A Self-Managing Wide-Area Data Streaming Service,” V. Bhat*, M. Parashar, H. Liu*, M. Khandekar*, N. Kandasamy, S. Klasky, and S. Abdelwahed, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Volume 10, Issue 7, pp. 365-383, December 2007.
Autonomic Data Streaming & In-Transit Processing

• Application level
  – Proactive QoS management strategies using a model-based LLC controller
  – Capture constraints for in-transit processing using a slack metric
• In-transit level
  – Opportunistic data processing using a dynamic in-transit resource overlay
  – Adaptive run-time management at in-transit nodes based on the slack metric generated at the application level
  – Adaptive buffer management and forwarding
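The slack-metric idea can be sketched as a time budget: how much processing an in-transit node can afford before the data would arrive late at its destination. The formula and stage names below are a sketch of the concept, not the paper's exact metric:

```python
# Illustrative slack metric: time still available for in-transit
# processing of a data item before it misses its coupling deadline.

def slack(deadline_s, elapsed_s, remaining_transfer_s):
    """Time budget left for en-route processing of this data item."""
    return deadline_s - elapsed_s - remaining_transfer_s

def plan_processing(stages, available_slack):
    """Greedily keep the processing stages that fit in the slack."""
    chosen, used = [], 0.0
    for name, cost_s in stages:
        if used + cost_s <= available_slack:
            chosen.append(name)
            used += cost_s
    return chosen

stages = [("noise_filter", 0.5), ("fft", 1.0), ("sort", 0.8)]
print(plan_processing(stages, slack(5.0, 1.5, 1.0)))  # ample slack: all stages run en route
print(plan_processing(stages, slack(5.0, 2.5, 1.5)))  # tight slack: only the noise filter fits
```

This matches the reactive behavior described above: in-transit nodes shed optional processing as the application-level slack shrinks, instead of delaying the coupled simulation.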
(Figure: application-level “proactive” management – simulation, LLC controller, slack metric generator – coupled to in-transit-level “reactive” management – slack metric corrector/adjustment, budget estimation, metric updates – with data flowing from the simulations through in-transit nodes to the sink.)
Autonomics for Coupled Fusion Simulation Workflows
Autonomic Streaming: Implementation/Deployment

• Simulation workflow
  – SS = Simulation Service (GTC)
  – ADSS = Autonomic Data Streaming Service
    • CBMS = LLC-controller-based Buffer Management Service
    • DTS = Data Transfer Service
  – DAS = Data Analysis Service
  – SLAMS = Slack Manager Service
  – PS = Processing Service
  – BMS = Buffer Management Service
  – ArchS = Archiving Service (archives data at the sink)
(Figure: deployment diagram – data producers (SS at ORNL and NERSC, each with ADSS, CBMS, DTS, and DAS) stream data through in-transit nodes (SLAMS, DTS, PS, DAS, with FFT and sort/scale processing at PPPL and Rutgers University) to data consumers (VisS, DAS, BMS, and the archival sink ArchS at Rutgers).)
• Simulations execute on leadership-class machines at ORNL and NERSC
• In-transit nodes are located at PPPL and Rutgers
Adaptive Data Transfer

• No congestion in intervals 1-9
  – Data transferred over the WAN
• Congestion at intervals 9-19
  – The controller recognizes the congestion and advises the Element Manager, which in turn adapts the DTS to transfer data to local storage (LAN)
• Adaptation continues until the congestion clears
  – Data sent to local storage by the DTS falls to zero at the 19th controller interval
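The adaptation rule above amounts to a per-interval choice of transfer target driven by measured bandwidth. The threshold value and function names below are assumptions for the sketch, not the controller's actual model:

```python
# Sketch of the WAN/LAN adaptation: when the controller observes WAN
# congestion, the data-transfer service redirects output to local
# storage, and resumes WAN transfer once congestion clears.

CONGESTION_BW_MBPS = 40.0  # assumed threshold, not the paper's value

def choose_target(measured_bw_mbps):
    return "LAN" if measured_bw_mbps < CONGESTION_BW_MBPS else "WAN"

# Bandwidth observed at successive controller intervals.
bandwidth = [100, 95, 30, 20, 25, 90, 110]
targets = [choose_target(bw) for bw in bandwidth]
print(targets)  # WAN while bandwidth is healthy, LAN under congestion
```

The real system uses a model-based LLC controller rather than a fixed threshold, so it can act proactively on predicted congestion instead of reacting to measured drops.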
(Figure: data transferred by the DTS (MB) and network bandwidth (Mb/sec) per controller interval, showing DTS-to-WAN and DTS-to-LAN transfer volumes as bandwidth drops during the congestion period.)
Adaptation of the Workflow

• Create multiple instances of the Autonomic Data Streaming Service (ADSS) when the effective network transfer rate dips below a threshold (around 100 Mbps in our case)
(Figure: % network throughput and number of ADSS instances as a function of the data generation rate (Mbps); additional ADSS instances – ADSS-0, ADSS-1, ADSS-2, each with its own data-transfer buffer – are spawned as throughput degrades.)
% Network throughput is the difference between the maximum and the current network transfer rate
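The scale-out rule just described can be sketched as a small control loop. The maximum instance count and the scale-down rule are assumptions added for the sketch; only the ~100 Mbps threshold comes from the experiment above:

```python
# Illustrative ADSS scaling rule: add a streaming-service instance when
# effective throughput falls below the threshold, remove one when the
# network recovers.

MAX_INSTANCES = 5
THRESHOLD_MBPS = 100.0

def adjust_instances(current, effective_rate_mbps):
    if effective_rate_mbps < THRESHOLD_MBPS and current < MAX_INSTANCES:
        return current + 1   # spawn another streaming instance
    if effective_rate_mbps >= THRESHOLD_MBPS and current > 1:
        return current - 1   # scale back when the network recovers
    return current

n = 1
for rate in [120, 90, 80, 70, 110, 130]:
    n = adjust_instances(n, rate)
print(n)  # 2
```

A production version would add hysteresis so that rates hovering near the threshold do not cause instances to flap.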
Reservoir Characterization: EnKF-based History Matching (with S. Jha)

• Black Oil Reservoir Simulator
  – simulates the movement of oil and gas in subsurface formations
• Ensemble Kalman Filter
  – computes the Kalman gain matrix and updates the model parameters of the ensemble members
• Heterogeneous, dynamic workflows
• Based on Cactus, PETSc
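The EnKF update at the heart of history matching can be sketched with a scalar observation, where the Kalman gain weighs the ensemble's forecast spread against observation noise. This is a deliberately reduced sketch (a full EnKF uses matrix-valued gains and perturbs the observation per member, omitted here):

```python
# Sketch of an ensemble Kalman filter update: each ensemble member's
# parameter is nudged toward the observed production data, weighted by
# the Kalman gain. Here each member directly predicts the observation.

def mean(xs):
    return sum(xs) / len(xs)

def enkf_update(ensemble, obs, obs_var):
    m = mean(ensemble)
    # Sample forecast variance across the ensemble.
    forecast_var = sum((x - m) ** 2 for x in ensemble) / (len(ensemble) - 1)
    gain = forecast_var / (forecast_var + obs_var)  # scalar Kalman gain
    return [x + gain * (obs - x) for x in ensemble]

ensemble = [8.0, 10.0, 12.0]  # e.g. permeability estimates per member
updated = enkf_update(ensemble, obs=11.0, obs_var=2.0)
print(updated)        # members pulled toward the observation
print(mean(updated))  # ensemble mean moves toward 11.0
```

The workflow is heterogeneous because each member's forward run is an independent reservoir simulation, while the gain computation is a single coupled step – which is what makes autonomic scheduling across Grid and Cloud resources attractive here.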
Experiment Background and Set-Up (2/2)

• Key metrics
  – Total Time to Completion (TTC)
  – Total Cost of Completion (TCC)
• Basic assumptions
  – TG gives the best performance but is a relatively more restricted resource
  – EC2 is more freely available but not as capable
• Note that the motivation of our experiments is to understand each of the usage scenarios and their feasibility, behaviors and benefits, not to optimize the performance of any one scenario
Autonomic Integration of HPC Grids & Clouds (with S. Jha)

• Acceleration: Clouds used as accelerators to improve the application's time-to-completion
  – alleviate the impact of queue wait times, or exploit an additional level of parallelism by offloading appropriate tasks to Cloud resources
• Conservation: Clouds used to conserve HPC Grid allocations, given appropriate runtime and budget constraints
• Resilience: Clouds used to handle unexpected situations
  – unanticipated HPC Grid downtime, inadequate allocations, or unanticipated queue delays
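The three usage modes can be read as placement policies for an autonomic scheduler. The decision inputs and thresholds below are assumptions for the sketch; the actual scheduler's decision logic is more involved:

```python
# Illustrative placement policy: given the objective (acceleration,
# conservation, or resilience) and observed state, route a task to the
# HPC Grid (TG) or the Cloud (EC2).

def place_task(objective, state):
    if objective == "resilience" and state["tg_available"] is False:
        return "EC2"  # grid down or allocation exhausted: fail over
    if objective == "conservation" and state["tg_alloc_min"] < state["task_cost_min"]:
        return "EC2"  # preserve the fixed TG allocation
    if objective == "acceleration" and state["tg_queue_wait_min"] > state["max_wait_min"]:
        return "EC2"  # offload rather than wait in the TG queue
    return "TG"

print(place_task("acceleration", {"tg_available": True, "tg_queue_wait_min": 10,
                                  "max_wait_min": 5, "tg_alloc_min": 800,
                                  "task_cost_min": 50}))  # EC2
print(place_task("resilience", {"tg_available": False}))  # EC2
```

Keeping the objective as an input, rather than hard-coding one policy, is what lets the same workflow switch between acceleration, conservation, and resilience at runtime.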
Objective I: Using Clouds as Accelerators for HPC Grids (1/2)

• Explore how Clouds (EC2) can be used as accelerators for HPC Grid (TG) workloads
  – 16 TG CPUs (1 node on Ranger)
  – average queuing time for TG was set to 5 and 10 minutes
  – the number of EC2 nodes was varied from 20 to 100 in steps of 20
  – VM start-up time was about 160 seconds
Objective I: Using Clouds as Accelerators for HPC Grids (2/2)

The TTC and TCC for Objective I with 16 TG CPUs and queuing times set to 5 and 10 minutes. As expected, the more VMs that are made available, the greater the acceleration, i.e., the lower the TTC. The reduction in TTC is roughly linear, but not perfectly so, because of a complex interplay between the tasks in the workload and resource availability.
Objective II: Using Clouds for Conserving CPU-Time on the TeraGrid

• Explore how to conserve a fixed allocation of CPU hours by offloading tasks that perhaps don't need the specialized capabilities of the HPC Grid

Distribution of tasks across EC2 and TG, and the resulting TTC and TCC, as the CPU-minute allocation on the TG is increased.
Objective III: Response to Changing Operating Conditions (Resilience) (1/4)

• Explore the situation where resources that were initially planned for become unavailable at runtime, either in part or in entirety
  – How can Cloud services be used to address these situations and allow the system/application to respond to a dynamic change in resource availability?
• Initially, 16 TG CPUs were allocated for 800 minutes. After about 50 minutes of execution (i.e., 3 tasks completed on the TG), the available CPU time was changed so that only 20 CPU minutes remained.
Objective III: Response to Changing Operating Conditions (Resilience) (2/4)
Allocation of tasks to TG CPUs and EC2 nodes for usage mode III. As the 16 allocated TG CPUs become unavailable after only 70 minutes rather than the planned 800 minutes, the bulk of the tasks are completed by EC2 nodes.
Objective III: Response to Changing Operating Conditions (Resilience) (3/4)
Number of TG cores and EC2 nodes as a function of time for usage mode III. Note that the TG CPU allocation goes to zero after about 70 minutes causing the autonomic scheduler to increase the EC2 nodes by 8.
Objective III: Response to Changing Operating Conditions (Resilience) (4/4)
Overheads of resilience on TTC and TCC.
Autonomic Formulations/Programming
(Figure: an autonomic element – a computational element plus an element manager – exposing functional, control, and operational ports. The element manager uses internal and contextual state, behavior rules, and interaction rules to drive event generation and actuator/interface invocation. A composition manager composes autonomic elements into an application workflow using application strategies and application requirements.)
The Instrumented Oil Field

• Production of oil and gas can take advantage of installed sensors that monitor the reservoir's state as fluids are extracted
• Knowledge of the reservoir's state during production can result in better engineering decisions
  – economic evaluation; physical characteristics (bypassed oil, high-pressure zones); production techniques for safe operating conditions in complex and difficult areas

Detect and track changes in data during production. Invert data for reservoir properties. Detect and track reservoir changes.
Assimilate data & reservoir properties into the evolving reservoir model.
Use simulation and optimization to guide future production and future data-acquisition strategy.

“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, FGCS: The International Journal of Grid Computing: Theory, Methods and Applications, Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19-26, 2005.
Effective Oil Reservoir Management: Well Placement/Configuration

• Why is it important?
  – Better utilization/cost-effectiveness of existing reservoirs
  – Minimizing adverse effects on the environment

(Figure: better management yields less bypassed oil; bad management leaves much bypassed oil.)
Autonomic Reservoir Management: “Closing the Loop” using Optimization

(Figure: a management-decision loop. Starting from present subsurface knowledge and the numerical model, optimize economic revenue, environmental hazard, etc.; plan optimal data acquisition; acquire remote sensing data; improve knowledge of the subsurface to reduce uncertainty; update knowledge of the model and improve the numerical model. The loop couples a dynamic decision system with dynamic data-driven assimilation – data assimilation, subsurface characterization, experimental design – running over autonomic Grid middleware and Grid data management/processing middleware.)
An Autonomic Well Placement/Configuration Workflow

(Figure: the Optimization Service (SPSA, VFSA, exhaustive search) generates guesses and sends them via DISCOVER. If a guess is already in the MySQL database, the response is sent to the clients and a new guess is requested from the optimizer; otherwise the IPARS Factory instantiates IPARS with the guess as a parameter, parallel IPARS instances start and connect to DISCOVER, DISCOVER notifies the clients, and the clients interact with IPARS.)

• AutoMate Programming System/Grid Middleware
• History/archived data; sensor/context data; oil prices, weather, etc.
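The guess-caching step in this workflow – check the database before launching an expensive simulation – can be sketched as simple memoization. The simulator, objective function, and cache below are stand-ins for the actual IPARS and MySQL services:

```python
# Sketch of guess caching in the well-placement workflow: only uncached
# guesses trigger a new (expensive) reservoir simulation.

simulations_run = 0

def run_ipars(guess):
    """Stand-in for instantiating an IPARS reservoir simulation."""
    global simulations_run
    simulations_run += 1
    y, z = guess
    return (y - 3) ** 2 + (z - 4) ** 2  # toy objective, not a real model

cache = {}  # plays the role of the MySQL guess database

def evaluate(guess):
    if guess not in cache:
        cache[guess] = run_ipars(guess)  # guess not in DB: simulate
    return cache[guess]                  # guess in DB: cached response

for g in [(1, 1), (2, 3), (1, 1), (3, 4), (2, 3)]:
    evaluate(g)
print(simulations_run)  # 3 distinct guesses simulated; repeats served from cache
```

Since stochastic optimizers such as VFSA can revisit configurations, this caching directly cuts the number of full reservoir simulations the workflow must launch.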
Autonomic Oil Well Placement/Configuration
(Figure: permeability and pressure contours, 3 wells, 2D profile. Contours of NEval(y, z, 500): exhaustive search requires NY×NZ (450) evaluations to locate the minimum, which appears in the contour plot; the VFSA “walk” finds it after only 20 (81) evaluations.)
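The stochastic "walk" behind this result can be illustrated with simulated annealing: worse guesses are accepted with a probability that shrinks as the temperature cools, so far fewer evaluations are needed than exhaustive search. This sketch is generic simulated annealing on a toy surface, not the VFSA temperature schedule from the paper:

```python
# Annealing-style search over a 10x10 well-placement grid; the toy
# objective stands in for a reservoir-simulation evaluation NEval(y, z).
import math
import random

def objective(y, z):
    return (y - 3) ** 2 + (z - 4) ** 2  # minimum at (3, 4)

def anneal(steps=500, t0=5.0, seed=1):
    rng = random.Random(seed)
    y, z = rng.randint(0, 9), rng.randint(0, 9)
    best = (objective(y, z), y, z)
    for k in range(steps):
        t = t0 * 0.99 ** k  # geometric cooling schedule
        ny = min(9, max(0, y + rng.choice([-1, 0, 1])))
        nz = min(9, max(0, z + rng.choice([-1, 0, 1])))
        delta = objective(ny, nz) - objective(y, z)
        # Always accept improvements; accept worse moves with shrinking odds.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            y, z = ny, nz
        best = min(best, (objective(y, z), y, z))
    return best

print(anneal())  # best (cost, y, z) found; cost 0 means the true minimum (3, 4)
```

VFSA sharpens this idea with a Cauchy-like move distribution and a faster cooling schedule, which is what lets it locate the minimum in ~20 evaluations where exhaustive search needs 450.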
Autonomic Oil Well Placement/Configuration (VFSA)

“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255-269, 2005.
“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M.F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Volume 17, Issue 1, pp. 1-26, 2005.
Summary

• CI and emerging computational ecosystems
  – Unprecedented opportunity: new thinking and practices in science and engineering
  – Unprecedented research challenges: scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• Autonomic computing can address complexity and uncertainty
  – Separation + Integration + Automation
• Experiments with autonomics for science and engineering
  – Autonomic data streaming and in-transit data manipulation, autonomic workflows, autonomic runtime management, …
• However, there are implications
  – Added uncertainty
  – Correctness, predictability, repeatability
  – Validation