Data Management in Cloud Workflow Systems
Dong Yuan
Faculty of Information and Communication Technology
Swinburne University of Technology
Outline
> Cloud Computing & Cloud Workflow Systems
– Introduction to cloud workflow systems. A brief overview of grid workflow systems.
> Data Management in Cloud Workflow Systems
– New features and research issues
> Cloud Computing Environment and SwinDeW-C
– Our simulation environment and cloud workflow system
> Cloud Computing & Cloud Workflow Systems
Cloud Computing
> Some new features of cloud computing
– Large data centres with cheap hardware
– Virtualisation
– Internet based and SOA
• SaaS, PaaS, IaaS
– Market driven and cost model
> Research of cloud computing has emerged in many areas
– Data mining, Database, Parallel computing & Scientific application, Content delivery
Cloud Workflow Systems
> Grid workflow systems
– Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON
– Gridbus, GridFlow
> Build-time: focus on data modelling.
– Kepler: actor-oriented data modelling; Taverna: Scufl; ASKALON: AGWL
> Runtime: adopt Data Grid systems
– Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn
Cloud Workflow Systems
> Architecture
– Based on Internet
– Platform as a Service
– More distributed
[Figure: cloud workflow system architecture. Labels: User, Web Portal, Platform, Unified Resources, Fabric, Workflow Application, Workflow Specification, Cloud Service, Virtual Machine, Local Data Centre, Global Cloud, Cloud Service Provider]
> Data Management in Cloud Workflow Systems
Data Management in Cloud Workflow Systems
> New features and challenges
– Independent of users and automatic
– Cost driven
• computation cost, storage cost, data transfer cost
– Data dependency
• Task – data, data – data, derivation
> Some research issues
– Data partition, placement, replication, synchronisation, provenance, catalogue, meta-data, consistency, reduction, storage, movement, etc.
Data Placement in Cloud Workflow Systems
> Data Placement: to decide where to store the application data in the distributed data centres
> Aims:
– Reduce data movement
– Reduce task waiting time
> Strategy:
– Data dependency: dataset – dataset
– Build-time: existing data, runtime: generated data (also intermediate data)
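A minimal sketch of dependency-based placement, assuming that the dependency between two datasets is measured by the number of tasks that use both. The greedy rule, the function names and the sample data are illustrative assumptions, not the strategy implemented in SwinDeW-C.

```python
from collections import defaultdict
from itertools import combinations


def dependency(tasks):
    """Count, for each pair of datasets, how many tasks use both."""
    dep = defaultdict(int)
    for used in tasks.values():
        for a, b in combinations(sorted(used), 2):
            dep[(a, b)] += 1
    return dep


def place(datasets, tasks, centres):
    """Greedy placement (hypothetical heuristic).

    datasets: {name: size}, tasks: {task: set of dataset names},
    centres: {name: capacity}. Returns {dataset: centre}.
    """
    dep = dependency(tasks)
    free = dict(centres)   # remaining capacity per data centre
    placement = {}

    # Place large datasets first; break ties by name for determinism.
    for d in sorted(datasets, key=lambda n: (-datasets[n], n)):
        candidates = [c for c in free if free[c] >= datasets[d]]
        if not candidates:
            raise ValueError(f"no data centre can hold {d}")

        def score(c):
            # Total dependency between d and the datasets already in c.
            return sum(dep.get(tuple(sorted((d, other))), 0)
                       for other, where in placement.items() if where == c)

        # Prefer the highest dependency, then the most free space.
        best = max(candidates, key=lambda c: (score(c), free[c]))
        placement[d] = best
        free[best] -= datasets[d]
    return placement


datasets = {"d1": 4, "d2": 3, "d3": 2, "d4": 1}
tasks = {"t1": {"d1", "d2"}, "t2": {"d1", "d2"}, "t3": {"d3", "d4"}}
placement = place(datasets, tasks, {"A": 10, "B": 10})
print(placement)
```

Datasets used by the same tasks end up in the same data centre, so those tasks can run without moving data between centres.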
Data Replication in Cloud Workflow Systems
> Data replication: for one dataset, store several copies in different places (data centres)
> Aims:
– Increase data security
– Fast data access
– Reduce data movement
> Strategy:
– Dynamic replication.
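One way to picture dynamic replication is a threshold rule: a data centre that keeps fetching a dataset remotely eventually receives its own copy. The sketch below assumes such a rule; the class name, the threshold and the access counts are hypothetical and are not taken from SwinDeW-C.

```python
from collections import Counter


class ReplicaManager:
    """Threshold-based dynamic replication (illustrative sketch)."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # remote accesses before replicating
        self.replicas = {}              # dataset -> set of centres with a copy
        self.remote_hits = Counter()    # (dataset, centre) -> remote accesses

    def store(self, dataset, centre):
        """Register a copy of the dataset in the given data centre."""
        self.replicas.setdefault(dataset, set()).add(centre)

    def access(self, dataset, centre):
        """Record an access from a centre; return 'local' or 'remote'."""
        holders = self.replicas[dataset]
        if centre in holders:
            return "local"
        self.remote_hits[(dataset, centre)] += 1
        if self.remote_hits[(dataset, centre)] >= self.threshold:
            holders.add(centre)         # create a replica for later accesses
        return "remote"


manager = ReplicaManager(threshold=2)
manager.store("d1", "A")
results = [manager.access("d1", "B") for _ in range(3)]
print(results, manager.replicas["d1"])
```

The first two accesses from data centre B are remote; the second one triggers replication, so the third is served locally.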
Intermediate Data Storage in Cloud Workflow Systems
> Intermediate data storage is especially important in scientific workflows
> Aim:
– Reduce system cost
> Strategy:
– Intermediate data can be regenerated with data provenance information
– Selectively store some key intermediate datasets
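The store-or-regenerate trade-off can be sketched as a cost-rate comparison along a derivation chain recorded by provenance. The rule below is a simple local heuristic under assumed prices and usage rates; all names and numbers are hypothetical, and it is not presented as the strategy used in SwinDeW-C.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float
    gen_cost: float        # cost to compute it from its direct predecessor
    uses_per_day: float    # expected usage frequency


def decide_storage(chain, storage_price):
    """Choose which datasets in a linear derivation chain to store.

    chain: datasets in derivation order (from provenance information).
    storage_price: assumed cost per GB per day.
    Returns the set of dataset names to keep.
    """
    stored = set()
    upstream = 0.0  # cost to rebuild deleted predecessors since the last stored one
    for d in chain:
        regen_cost = upstream + d.gen_cost
        storage_rate = d.size_gb * storage_price      # cost per day if stored
        regen_rate = regen_cost * d.uses_per_day      # cost per day if deleted
        if storage_rate <= regen_rate:
            stored.add(d.name)
            upstream = 0.0          # successors can start from this dataset
        else:
            upstream = regen_cost   # successors must rebuild this one too
    return stored


chain = [
    Dataset("a", size_gb=100, gen_cost=1.0, uses_per_day=0.1),
    Dataset("b", size_gb=1, gen_cost=5.0, uses_per_day=1.0),
    Dataset("c", size_gb=50, gen_cost=0.5, uses_per_day=0.2),
]
stored = decide_storage(chain, storage_price=0.05)
print(stored)
```

Here only the small, expensive and frequently used dataset is kept; the large, cheap ones are deleted and regenerated on demand.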
> Cloud Computing Environment and SwinDeW-C
Simulation Cloud
[Figure: simulation cloud in three layers (Physical Machines Layer, Virtual Machines Layer, Applications Layer). Labels: Swinburne Cluster, VMware, Data Centres with Hadoop, SwinDeW-C, Web Portal]
Related key system components of SwinDeW-C
[Figure: key system components of SwinDeW-C. Modules: User Interface Module, Data Management Module, Flow Management Module, Task Management Module, Resource Management Module, …... Components and stores: Data Placement Component, Data Replication Component, Intermediate Data Storage Component, Data Catalogue, Process Repository, Scheduler, Web Portal, Monitoring Component, Uploading Component, Meta-data Management Component, Provenance Data Collection]
End
> Questions?