X. Liu , Z. Ni , Z. Wu , D. Yuan , J. Chen , Y. Yang, ICPADS10, 10-12-2010, Shanghai, China
An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows
Xiao Liu 1, Zhiwei Ni 2, Zhangjun Wu 2, Dong Yuan 1, Jinjun Chen 1, Yun Yang 1
1 SUCCESS (Centre for Computing and Engineering Software Systems), Swinburne University of Technology, Melbourne, Australia
2 Institute of Intelligent Management, Hefei University of Technology, Hefei, China
Outline
> Background
– Workflow Technology Group
– SwinDeW Family, SwinGrid, SwinCloud
> Brief Overview: Workflow Temporal QoS Support
> Handling Temporal Violations in Scientific Workflows
– Problem Analysis
– An Effective Light-Weight Handling Framework
– Two-Stage Local Workflow Rescheduling Strategy
> Evaluation
> Summary
Workflow Technology Group Overview
> The WT group is part of SUCCESS (Centre for Computing and Engineering Software Systems), a Tier-1 university research centre at Swinburne University of Technology. The group conducts research into workflow technologies for complex software systems and services, including peer-to-peer, grid and cloud computing based e-science, e-business, transactional and inter-organisational workflows.
> Leader: Prof Yun Yang
> Visitors (7-8/09): Prof Lee Osterweil, Prof Lori Clarke
> Researchers: Dr Jinjun Chen (Senior Lecturer), Xiao Liu (PostDoc), Dong Yuan (PhD), Gaofeng Zhang (PhD), Wenhao Li (PhD), Dahai Cao (PhD), Xuyun Zhang (PhD)
> Others: Prof Ryszard Kowalczyk, Prof Chengfei Liu, Dr Jun Yan (Wollongong), Prof Hai Jin (HUST), Prof Mingshu Li (ISCAS), Prof Qing Wang (ISCAS), Prof Zhiwei Ni (HFUT), Prof Jinpeng Huai (BUAA)
SwinDeW Family
> SwinDeW (Swinburne Decentralised Workflow): foundation prototype based on p2p
– SwinDeW – past
– SwinDeW-A (for Agents) – ARC DP06
– SwinDeW-G (for Grid) – past
– SwinDeW-V (for Verification) – current (ARC DP)
– SwinDeW-C (for cloud) – current (ARC LP)
– Others: SwinDeW-B / -S / -P / -G – past
> Current Projects:
– ARC DP110101340, Cost effective storage of massive intermediate data in cloud computing applications, Duration: 2011-2013
– ARC LP0990393, Novel cloud computing based on workflow technology for managing large numbers of process instances, Duration: 2010-2012.
SwinGrid to SwinCloud
[Figure: SwinGrid to SwinCloud architecture. Recoverable labels: Swinburne Computing Facilities (Astrophysics Supercomputer with GT4 and SuSE Linux; Swinburne CS3 and Swinburne ESR with GT4 and CentOS Linux; VMware; Cloud Simulation Environment; Data Centres with Hadoop). SwinDeW-G Grid Computing Infrastructure (Swinburne CS3, Swinburne ESR, Astrophysics Supercomputer, Beihang CROWN, UK, VPAC, Hong Kong). Commercial Cloud Infrastructure (Amazon, Google and Microsoft Data Centres). Layers: Application Layer, Platform Layer, Unified Resource Layer, Fabric Layer. SwinCloud VMs host SwinDeW-C Peers and SwinDeW-C Coordinator Peers.]
Scientific Workflows
> Scientific workflows underlie many large-scale, complex e-science applications such as climate modeling, astrophysics, structural biology and chemistry, earthquake simulation and disaster recovery.
> Scientific workflows are usually deployed in distributed high performance computing infrastructures such as clusters, grids and clouds.
> Compared with conventional business workflows, most scientific workflows are more data and/or computation intensive, involve less human interaction, and have a larger scale and more complex process structures.
Temporal QoS Support for Scientific Workflows
> Motivation: most e-science applications are time constrained with global temporal constraints (deadlines) and local temporal constraints (milestones) to achieve some pre-defined goals on schedule.
> Basic requirements: automation and cost-effectiveness.
> Challenges: highly dynamic system environments, changing process structures, and charges for resource usage.
> Solution: A Novel Probabilistic Temporal Framework and Its Strategies for Cost-Effective Delivery of High QoS in Scientific Cloud Workflow Systems [PhD Thesis - Xiao Liu]
Lifecycle Support of Temporal QoS
> At workflow build-time modeling stage
– Component 1: temporal constraint setting
• Forecasting activity durations [eScience08], [JSS10b]
• Setting both coarse-grained and fine-grained temporal constraints [BPM08], [CCPE09], [JCSS10]
– Component 2: temporal consistency monitoring
• Temporal checkpoint selection [ICSE08], [TAAS07]
• Temporal verification [CCPE07], [ToSEM09]
– Component 3: temporal violation handling
• Temporal violation handling point selection [TSE]
• Temporal violation handling [CCGrid], [JSS10a], [TSE], [ICPADS]
Problem Analysis
> Basic requirements: automation and cost-effectiveness
> 1) How to define fine-grained recoverable temporal violations
– Distinguish statistically recoverable from non-recoverable temporal violations, so that heavy-weight exception handling strategies can be avoided and light-weight ones used instead
– Divide recoverable temporal violations into fine-grained levels, so that handling strategies of different capability can be chosen (higher capability, higher cost)
> 2) Which light-weight, effective exception handling strategies to provide
– Employ or design a set of light-weight handling strategies, ranging from low capability to high capability (low cost to high cost)
An Effective Light-Weight Handling Framework
> Three levels of temporal violations
– Level I, Level II and Level III
> Corresponding three levels of temporal violation handling strategies
– TDA, ACOWR and TDA+ACOWR
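The correspondence between violation levels and handling strategies can be sketched as a simple lookup. This is only an illustration: the names `ViolationLevel`, `STRATEGY_FOR_LEVEL` and `select_strategy` are hypothetical, and the slide does not say how a violation is classified into a level, so that step is left out.

```python
from enum import Enum


class ViolationLevel(Enum):
    """Three levels of recoverable temporal violations (names are illustrative)."""
    LEVEL_I = 1
    LEVEL_II = 2
    LEVEL_III = 3


# One handling strategy per level, in the order listed on the slide.
STRATEGY_FOR_LEVEL = {
    ViolationLevel.LEVEL_I: "TDA",
    ViolationLevel.LEVEL_II: "ACOWR",
    ViolationLevel.LEVEL_III: "TDA+ACOWR",
}


def select_strategy(level: ViolationLevel) -> str:
    """Return the name of the handling strategy for a violation level."""
    return STRATEGY_FOR_LEVEL[level]
```

Keeping the mapping in one table makes the "higher level, higher capability, higher cost" ordering explicit and easy to change.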
Three Levels of Handling Strategies
> TDA (Time Deficit Allocation) [CCPE07]
– TDA actively propagates small time deficits to subsequent workflow activities, so that the deficits may be compensated by the execution time those activities save.
> ACOWR (Ant Colony Optimisation based Workflow Rescheduling) [CCGrid10]
– Based on our general two-stage local workflow rescheduling strategy
– Using ACO as the metaheuristic algorithm
> TDA+ACOWR (the hybrid strategy of TDA and ACOWR)
– TDA once, followed by multiple rounds of ACOWR (normally fewer than 3)
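The idea behind TDA can be illustrated with a minimal sketch. It assumes that each subsequent activity has a known, non-negative "slack" (the execution time it is expected to save) and that the deficit is shared in proportion to that slack. Both the allocation rule and the function name `allocate_time_deficit` are assumptions made for illustration; the exact rule used in [CCPE07] is not given on the slides.

```python
def allocate_time_deficit(deficit, slacks):
    """Share a time deficit among subsequent activities.

    deficit: time deficit detected at the current checkpoint.
    slacks:  expected saved execution time of each subsequent activity
             (hypothetical input, assumed non-negative).
    Returns (allocations, remaining), where remaining > 0 means the
    subsequent activities cannot absorb the whole deficit.
    """
    total_slack = sum(slacks)
    if total_slack <= 0:
        # Nothing can be absorbed; the whole deficit remains.
        return [0.0] * len(slacks), deficit
    # Never allocate more than the activities can absorb in total.
    covered = min(deficit, total_slack)
    allocations = [covered * s / total_slack for s in slacks]
    return allocations, deficit - covered


allocations, remaining = allocate_time_deficit(6.0, [2.0, 4.0, 6.0])
```

In this sketch a positive `remaining` value would be the signal that TDA alone is not enough and a more capable strategy such as ACOWR is needed.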
A General Two-Stage Workflow Local Rescheduling Strategy
> Handling temporal violations with workflow rescheduling
> Key objective: reduce or ideally remove the time deficit at the current checkpoint, i.e. to reduce the execution time of the subsequent activities after the checkpoint in the violated workflow segment as much as possible
> Requirement 1: striking a good balance between time deficit compensation and the completion time of other activities (workflow activities and general tasks, with or without temporal constraints) – from the overall makespan perspective
> Requirement 2: utilising available resources in the system rather than recruiting additional resources – from the overall cost perspective
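The objective and the two requirements above suggest a two-stage shape, sketched below under strong simplifying assumptions. Stage 1 searches for schedules with a good overall makespan on a fixed set of existing resources, and stage 2 picks, among those candidates, the one in which the violated workflow segment finishes earliest. Plain random search stands in for the metaheuristic (the framework itself uses ACO), and task precedence, resource speeds and monetary cost are ignored. All names (`finish_times`, `two_stage_reschedule`, `n_samples`, `keep`) are hypothetical.

```python
import random


def finish_times(assignment, durations, n_resources):
    """Finish time of each task when every resource runs its tasks in index order."""
    clock = [0.0] * n_resources
    finish = []
    for task, res in enumerate(assignment):
        clock[res] += durations[task]
        finish.append(clock[res])
    return finish


def two_stage_reschedule(durations, segment, n_resources,
                         n_samples=200, keep=10, seed=0):
    """Return (assignment, overall_makespan, segment_finish_time).

    durations: execution time of every task in the integrated task list.
    segment:   indices of the tasks in the violated workflow segment.
    """
    rng = random.Random(seed)
    # Stage 1: sample schedules on the existing resources and keep the
    # ones with the smallest overall makespan (Requirements 1 and 2).
    candidates = []
    for _ in range(n_samples):
        assignment = [rng.randrange(n_resources) for _ in durations]
        finish = finish_times(assignment, durations, n_resources)
        candidates.append((max(finish), assignment, finish))
    candidates.sort(key=lambda c: c[0])
    shortlist = candidates[:keep]
    # Stage 2: among the shortlisted schedules, choose the one in which
    # the violated segment completes earliest (the key objective).
    makespan, assignment, finish = min(
        shortlist, key=lambda c: max(c[2][t] for t in segment))
    return assignment, makespan, max(finish[t] for t in segment)


plan, makespan, seg_finish = two_stage_reschedule(
    [4.0, 2.0, 3.0, 1.0, 5.0], segment=[3, 4], n_resources=2)
```

Separating the two stages keeps the search focused on the whole task list first, so that compensating the violated segment does not come at the expense of the other activities.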
Integrated Task Resource List
Pseudo-code for An Abstract Strategy
Evaluation
> Performance analysis and comparison (with GA) for ACOWR
– Optimisation on Total Makespan
– Optimisation on Total Cost
– Time Compensation on Violated Workflow Segment
– CPU Time
> Effectiveness evaluation of the three-level handling framework
– Violation Rate of Global Temporal Constraints and Local Temporal Constraints
– Cost Analysis
Optimisation on Total Makespan
Optimisation on Total Cost
Time Compensation on Violated Workflow Segment
CPU Time
Experiment Results on Temporal Violation Rates
Cost Analysis
Summary
> Temporal QoS Support is Critical in e-Science Applications
> Temporal Violation Handling in Scientific Workflows
– Automatic, Cost-Effective
– Level I, Level II and Level III
– TDA, ACOWR, TDA+ACOWR
> A Two-Stage Workflow Local Rescheduling Strategy
– ACO, GA, PSO, many other metaheuristics
> Future Work
– Data movement cost
– More scheduling algorithms
The End – Thank You!
> Any questions or comments?
> Email: [email protected]
> Website: http://www.ict.swin.edu.au/personal/xliu/
> An extension of this paper, titled “A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems,” has been accepted by Journal of Systems and Software (JSS), http://dx.doi.org/10.1016/j.jss.2010.10.027.