TruEGrid Project Seminar
My Research Topic
Ana Cristina Alves de Oliveira
1 University of Technology of Dresden (TU Dresden)
2 Federal University of Campina Grande (UFCG)
3 Federal Institute of Education, Science and Technology of Paraíba (IFPB), Campina Grande Campus
October 25th, 2013
Ana Cristina Oliveira (TU Dresden) TruEGrid Project Seminar October 25th, 2013 1 / 17
Outline
1 My Work in TU Dresden
2 Cost Model
3 Scheduling Performance Evaluation
    Methodology
    Obtained Results
4 My PhD: Previous and Future Work
My Work in TU Dresden
Work at TU Dresden: From June to December
1 Design of a cost model for Data-as-a-Service
2 Design of a scheduling system
3 Development of a scheduling simulator
4 Performance evaluation
    Preliminary results
5 Deployment and demo of the scheduling system into a production environment (current)
6 Improvements to the scheduling system
7 Technical report writing
Cost Model
Datacenter and Scheduling Representation
Figure: Scheduling architecture
VM Access to the Replicated Data
Each query VM accesses a single partition, but the data is replicated
Figure: Query VM accessing a data replica from the storage
Definition of Variables
Figure: Query VM accessing a data replica from the storage
Performance Evaluation Methodology
Scheduling Strategies
1 Cost-based
    Based on partial information
    k and R are not known and are therefore not used to decide on the VM placement
2 Cost-based+
    Based on complete and perfect information
    k and R are known and used to decide on the VM placement
3 Random
    An i is selected at random from the replica set {A, B, C}
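The difference between the three strategies can be sketched as follows. This is only an illustration under stated assumptions: the price values, the helper names `pick_random` and `pick_cheapest`, and the idea of feeding estimated versus true prices into the same selection routine are hypothetical and are not taken from the actual scheduler or cost model.

```python
import random

def pick_random(replicas, rng):
    """Random strategy: choose any replica location, ignoring prices."""
    return rng.choice(sorted(replicas))

def pick_cheapest(price_by_replica):
    """Cost-based selection: choose the replica with the lowest price.

    Passing prices estimated without k and R mimics Cost-based
    (partial information); passing prices computed with k and R
    mimics Cost-based+ (complete information).
    """
    return min(price_by_replica, key=price_by_replica.get)

# Hypothetical prices for one query; the numbers are illustrative only.
estimated = {"A": 1.8, "B": 1.2, "C": 1.5}   # assumed: computed without k and R
true_price = {"A": 1.9, "B": 2.4, "C": 1.6}  # assumed: computed with k and R

rng = random.Random(0)
random_choice = pick_random(estimated.keys(), rng)
cost_based_choice = pick_cheapest(estimated)
cost_based_plus_choice = pick_cheapest(true_price)
```

With these made-up numbers the two cost-based variants disagree, which is the situation where complete information could pay off.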
Assumptions
One query per VM
Each query selects data from 1 file
No file is shared among queries
Each datacenter host may have at most 8 VMs running (according to VM resource requirements)
All VMs have the same computational resources
All queries have the same input and output sizes
Treatments and System Model
Table: Evaluation Treatments
Nr  Parameter (factor)  Levels
1   #DCs                40
2   #Hosts per DC       {1, 5}
Table: System Model
Nr  Parameter                    Configuration
1   v (cpu cost)                 Random double in the range [0.065, 3.41)
2   t (bandwidth cost)           Random double in the range [0.015, 0.51)
3   q                            Random double in the range [0.05, 1)
4   k                            Random double in the range [0.7, 3.5)
5   #queries = #VMs = #files     40
6   #file replicas               3 (randomly placed among datacenters)
7   Max(#VMs per host)           8
8   Output size (bytes)          100
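A minimal sketch of how one simulated configuration could be drawn from the table above. Several points are assumptions rather than facts from the slides: that v and t are drawn per datacenter, that q and k are drawn per query, that the three replicas of a file sit in three distinct datacenters, and all function and field names.

```python
import random

NUM_DCS = 40
NUM_QUERIES = 40          # also the number of VMs and files
REPLICAS_PER_FILE = 3
MAX_VMS_PER_HOST = 8

def draw(rng, low, high):
    """Random double between low and high (upper bound excluded in practice)."""
    return low + (high - low) * rng.random()

def sample_system_model(rng):
    """Draw one configuration; parameter ownership is an assumption."""
    datacenters = [
        {"v": draw(rng, 0.065, 3.41), "t": draw(rng, 0.015, 0.51)}
        for _ in range(NUM_DCS)
    ]
    queries = [
        {
            "q": draw(rng, 0.05, 1.0),
            "k": draw(rng, 0.7, 3.5),
            # Assumed: replicas of one file are placed in distinct datacenters.
            "replicas": rng.sample(range(NUM_DCS), REPLICAS_PER_FILE),
        }
        for _ in range(NUM_QUERIES)
    ]
    return {"datacenters": datacenters, "queries": queries}

def dc_capacity(hosts_per_dc):
    """Maximum number of VMs a datacenter can run at once."""
    return hosts_per_dc * MAX_VMS_PER_HOST

model = sample_system_model(random.Random(42))
```

For the two levels of the hosts-per-DC factor, `dc_capacity(1)` gives 8 VMs and `dc_capacity(5)` gives 40 VMs per datacenter.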
Performance Evaluation Obtained Results
Understanding the Simulation
Let $D$ be the set of datacenters: $D = \{d_1, d_2, \ldots, d_{|D|}\}$, where $|D| = 40$
Let $Q$ be the set of queries: $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$, where $|Q| = 40$
Let $P(q_z, d_i, d_j)$ be the price of executing query $q_z$ at datacenter $d_i$ by retrieving data from $d_j$, where $1 \le i, j \le |D|$ and $1 \le z \le |Q|$
Each experiment treatment was replicated 35 times
The results show the mean values for the normalized price with a confidence interval of 95%
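One way the mean and its 95% confidence interval over the 35 replications could be computed is sketched below. The slides do not say which interval construction was used; this sketch assumes a Student-t interval, with the critical value for 34 degrees of freedom hard-coded, and the sample values are made up.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 34 degrees of freedom (n = 35).
T_CRIT_95_DF34 = 2.032

def mean_ci95(samples):
    """Return (mean, lower, upper) for exactly 35 replications."""
    n = len(samples)
    if n != 35:
        raise ValueError("the hard-coded critical value assumes n = 35")
    mean = statistics.fmean(samples)
    half_width = T_CRIT_95_DF34 * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

# Made-up normalized prices for one treatment, one value per replication.
samples = [1.0 + 0.01 * (i % 5) for i in range(35)]
mean, lower, upper = mean_ci95(samples)
```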
Understanding the Normalized Price Metric
Normalized Price: $NP(q_z, e, r)$
The normalized price of query $q_z$ in the $r$-th replication of experiment $e$:
$$NP(q_z, e, r) = \frac{P(q_z, d_i, d_j)}{\mathit{minP}_{q_z}}, \quad \text{where } d_i \text{ and } d_j \text{ were chosen by the scheduler} \qquad (1)$$
$$\mathit{minP}_{q_z} = \min_{d_i, d_j \in D} P(q_z, d_i, d_j)$$
Note: The datacenters with available resources to schedule a VM will vary according to the number of VMs that are actually running on them. The prices are based on the initial set of DCs.
Optimal Normalized Price
The scheduling objective (to achieve the minimal price):
$$NP(q_z, e, r) \to 1, \quad \forall z, e, r \qquad (2)$$
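The normalized price metric can be illustrated with a small sketch. The price table and the function name `normalized_price` are hypothetical; only the ratio of the chosen price to the minimum price over all datacenter pairs follows the definition of the metric.

```python
def normalized_price(prices, chosen):
    """Normalized price of one query.

    prices maps a pair (i, j) to P(q_z, d_i, d_j), the price of running
    the query at datacenter d_i while reading data from d_j.
    chosen is the (i, j) pair picked by the scheduler.
    """
    min_price = min(prices.values())
    return prices[chosen] / min_price

# Hypothetical prices for a single query over two datacenters.
prices = {(1, 1): 2.0, (1, 2): 3.0, (2, 1): 4.0, (2, 2): 2.5}

np_optimal = normalized_price(prices, (1, 1))  # scheduler picked the cheapest pair
np_other = normalized_price(prices, (2, 1))    # scheduler picked a pricier pair
```

A value of 1 means the scheduler found the cheapest placement; any other placement yields a value above 1, so lower is better.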
Treatment 1: 40 DCs and 1 Host/DC (up to 8 VMs each)
Treatment 2: 40 DCs and 5 Hosts/DC (up to 40 VMs each)
My PhD and Research Interest
I worked on the design and implementation of SLA monitoring software funded by the Brazilian National Network for Research and Education (RNP - Rede Nacional de Ensino e Pesquisa) under the Just-in-Time (JiT) Clouds Project
JiT Clouds is an open source middleware to federate resources into private, public or hybrid clouds
http://jitclouds.lsd.ufcg.edu.br
Research interests:
1 Monitoring and analysis of network traffic
2 Network anomaly detection
3 Accounting and pricing of cloud computing services
Network Traffic Monitoring Architecture
Integration of the Engine with a Cloud Platform
Figure: Example suited to the JiT Clouds platform
Summary
Outlook
Currently working with VM prices based on network and CPU costs
We intend to extend the VM cost model so that VM placement also considers energy efficiency (aligned with the LEADS Project)
As part of my PhD, we also intend to model the costs of network traffic in the presence of SLA breaches, which will be taken into account by the billing service and the decision-making system