Eco-DMW: Eco-Design Methodology for Data Warehouses
Amine ROUKH (1), Ladjel BELLATRECHE (2), Ahcène BOUKORCA (2), Selma BOUARAR (2)
(1) University of Mostaganem, Algeria
(2) LIAS/ISAE-ENSMA, Futuroscope, France
Laboratoire d’Informatique et d’Automatique pour les Systèmes
Context
Energy Consumption of a Data Center
[Figure: pie chart with shares 68%, 28%, 4% and 0.01%; legend: IT Equipment, Cooling, Building and IT Power Loss, Lighting and General Receptacle]
Database Design: Big Picture
o The DBMS is one of the major energy consumers;
o Performance-oriented design.
Towards an Energy-aware DBMS Design
Energy in the DW World: Scenarios
[Figure: data warehouse design life cycle. Sources (Excel, ERP, Databases) feed Extraction, Transformation, Loading. Design phases: Conceptual Design, Logical Design, Deployment Design, Physical Design. Sample data cube with dimensions Date (1Qtr to 4Qtr), Cars (Peugeot, Toyota, Renault) and Cities (Paris, Poitiers, Tours), with Sum totals. Energy-related work shown in the figure: Query Processing: [Xu13], [Lang09], [Harizopoulos09], [Kunjir12], [Lang11]; Mirabel Project (FP7).]
Selection of Optimization Structures (OS)
Formalization. Given:
1. A DW Schema
2. A Very Large Workload Q
3. A Set of OS
4. A Set of Constraints C related to OS
5. Non-Functional Requirements (NFR)
Objective: select schemes of OS that satisfy the NFR and respect C.
Characteristics of OS
o Redundant (ROS), e.g. materialized views. Constraints: storage; maintenance.
o Non-Redundant (NROS), e.g. horizontal partitioning. Constraint: number of final fragments.
Physical Design
Revisit of the Physical Design Phase: recommendations of Stavros Harizopoulos (CIDR'09) and Goetz Graefe (HP).
Agenda
o Background & State of Art
o Contributions
o Experimental Studies
o Summary
Basic Concepts
o Energy is the ability to do work, measured in joules.
o Power is the rate of doing work, that is, energy per unit of time, measured in watts.
o Baseline power: the power consumed when the machine is idle.
o Active power: the power consumed due to the execution of the workload.
o Peak power: the maximum power.
o Average power: the average power consumed during query execution.
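A minimal sketch of how these quantities relate, assuming power is read at a fixed interval and that active power is the average reading minus the baseline. The readings, the baseline value and the function name `summarize_power` are hypothetical and not taken from the slides.

```python
def summarize_power(samples_w, baseline_w, interval_s=1.0):
    """Summarize fixed-interval power readings (watts) for one workload run."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    duration_s = len(samples_w) * interval_s      # total run time in seconds
    average_w = sum(samples_w) / len(samples_w)   # average power over the run
    return {
        "peak_w": max(samples_w),                 # peak power
        "average_w": average_w,
        "active_w": average_w - baseline_w,       # power attributed to the workload
        "energy_j": average_w * duration_s,       # energy = power x time (joules)
    }


# Hypothetical readings taken once per second, with a 48 W idle machine.
metrics = summarize_power([50.0, 58.0, 62.0, 54.0], baseline_w=48.0)
print(metrics)
```

Energy is derived here from average power and duration, which is why a run can draw more power and still use less energy if it finishes sooner.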
State-of-art Contributions Experimental Studies Summary
State of Art
o Definition of query processing cost models at the operation processing level, e.g. [Xu'13, Kunjir'12, Roukh'15].
[Figure: query plan over the tables Purchase, Person and Product with selection (σ), projection (π) and join (⋈) operators. Each node is labeled with its current node power plus its inherited power: 3 watts, 5 watts and 2 watts at the leaves, then 3 watts + 2 watts, 5 watts + 3 watts, 10 watts + 13 watts, 15 watts + 28 watts and 3 watts + 43 watts at the root. Total power: 46 watts.]
New Techniques:
o The proposition of cost-driven techniques for reducing energy.
o QED (Explicit Delay) [Lang'09], E2DBMS (Automatic Feedback Control) implemented in Postgres [Tu'11].
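The "current node power + inherited power" labels in the query plan figure can be reproduced with a small recursive sum. This is a sketch only: the tree shape and the node names are assumptions reconstructed from the wattages on the slide, not the authors' plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    name: str                 # hypothetical operator label
    power_w: float            # power of this operator alone (current node power)
    children: List["PlanNode"] = field(default_factory=list)


def inherited_power(node: PlanNode) -> float:
    """Power accumulated by everything below this node."""
    return sum(total_power(child) for child in node.children)


def total_power(node: PlanNode) -> float:
    """Current node power plus inherited power."""
    return node.power_w + inherited_power(node)


# Assumed shape: two unary operators over scans feed a join, which is joined
# with a third scan and topped by a final operator.
plan = PlanNode("top", 3, [
    PlanNode("join_2", 15, [
        PlanNode("join_1", 10, [
            PlanNode("op_a", 3, [PlanNode("scan_a", 2)]),
            PlanNode("op_b", 5, [PlanNode("scan_b", 3)]),
        ]),
        PlanNode("scan_c", 5),
    ]),
])

print(inherited_power(plan), total_power(plan))  # 43 46
```

With this shape the intermediate labels also match the slide: the first join inherits 13 watts and the second join inherits 28 watts.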
Energy @ Physical Design
Challenges:
1. Accurate cost models (power consumption and processing time);
2. A good trade-off between these two costs.
Case Study: Materialized View Selection Problem (MVSP)
Motivating Example
[Figure: MV Cube; workload: 7 Queries of SSB]
Materialized Views        Execution Time (min)    Power (watts)
C, P, D, S, L             10.83                   16.07
J32                       3.2                     19.73
J31                       5.13                    18.06
J31, J32, J30             2.28                    21.17
J29                       6.18                    17.66
J29, J31, J30             2.45                    21.01
J29, J30, J31, J32        1.9                     23.11
Materialized view selection using two objective functions: (1) query processing cost and (2) power consumption
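As a rough illustration of why both objectives matter, the sketch below multiplies execution time by power for the first and last rows of the Motivating Example table. It assumes the Power column is an average over the whole run, which the slides do not state; the function and variable names are illustrative.

```python
def energy_joules(time_min, power_w):
    """Energy = average power (W) x duration (s)."""
    return power_w * time_min * 60.0


# Values copied from the table: (execution time in minutes, power in watts).
no_views = energy_joules(10.83, 16.07)   # row "C, P, D, S, L"
all_views = energy_joules(1.9, 23.11)    # row "J29, J30, J31, J32"

energy_saving_pct = (no_views - all_views) / no_views * 100.0
print(f"{no_views:.0f} J vs {all_views:.0f} J "
      f"({energy_saving_pct:.1f}% less energy at higher power)")
```

Under this assumption the configuration with the highest power draw uses the least energy, because it runs for a much shorter time.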
Cost Models
Power required by a given query Q_i with n_i operations is (intuitively):
\[ Power(Q_i) = \beta_{cpu} \times \sum_{j=1}^{n_i} CPU\_COST_j + \beta_{io} \times \sum_{j=1}^{n_i} IO\_COST_j \]
1. IO_COST: number of I/Os required to run the specified operation;
2. CPU_COST: number of CPU cycles and buffer cache gets required.
Processing time of a given query Q_i with n_i operations:
\[ Time(Q_i) = \alpha_{cpu} \times \sum_{j=1}^{n_i} CPU\_COST_j + \alpha_{io} \times \sum_{j=1}^{n_i} IO\_COST_j \]
1. \alpha_{cpu}: CPU time of one CPU cycle.
2. \alpha_{io}: IO time to execute one IO operation.
A machine learning technique, multiple polynomial regression, is used to calculate the \beta_i:
\[ Power(Q_i) = \beta_0 + \beta_1 (IO\_COST) + \beta_2 (CPU\_COST) + \beta_3 (IO\_COST)^2 + \beta_4 (CPU\_COST)^2 + \beta_5 (IO\_COST \times CPU\_COST) + \cdots + \beta_{13} (IO\_COST)^4 + \beta_{14} (CPU\_COST)^4 + \epsilon \]
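The two cost models can be sketched as plain functions. This is an illustration under stated assumptions, not the authors' implementation: the coefficient values are hypothetical, the error term is left out, and the order of the polynomial terms is an assumption because the slide elides the middle coefficients. The slides fit the coefficients with regression in R; fitting is not shown here.

```python
from itertools import product


def query_time(cpu_costs, io_costs, alpha_cpu, alpha_io):
    """Linear time model: alpha_cpu * sum(CPU_COST_j) + alpha_io * sum(IO_COST_j)."""
    return alpha_cpu * sum(cpu_costs) + alpha_io * sum(io_costs)


def polynomial_terms(io_cost, cpu_cost, degree=4):
    """All monomials io^a * cpu^b with a + b <= degree (15 terms for degree 4).

    The constant term comes first; the order of the others is an assumption.
    """
    return [
        (io_cost ** a) * (cpu_cost ** b)
        for a, b in product(range(degree + 1), repeat=2)
        if a + b <= degree
    ]


def query_power(io_cost, cpu_cost, betas):
    """Polynomial power model: sum of beta_k * term_k (error term omitted)."""
    terms = polynomial_terms(io_cost, cpu_cost)
    if len(betas) != len(terms):
        raise ValueError(f"expected {len(terms)} coefficients, got {len(betas)}")
    return sum(beta * term for beta, term in zip(betas, terms))


# Hypothetical per-operation costs and coefficients, for illustration only.
t = query_time([1000, 2000], [10, 20], alpha_cpu=1e-6, alpha_io=5e-3)
p = query_power(30.0, 3000.0, [16.0] + [0.0] * 14)
print(t, p)
```

A degree-4 polynomial in two variables has exactly 15 coefficients, which matches the range from beta_0 to beta_14 on the slide.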
Identification of Pareto Front Set
[Figure: workload of 7 queries]
Evolutionary algorithms are used to select the non-dominated solutions, because the search space of MVSP is too large to enumerate.
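To make "non-dominated" concrete, the sketch below applies a naive pairwise dominance filter to the (execution time, power) pairs of the Motivating Example, plus one hypothetical extra configuration added to show a dominated point. It is a quadratic-time illustration of the definition, not the evolutionary algorithm used in the slides.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one (minimization)."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(configs):
    """Keep the configurations that no other configuration dominates."""
    return {
        name: objectives
        for name, objectives in configs.items()
        if not any(dominates(other, objectives) for other in configs.values())
    }


# (execution time in minutes, power in watts) from the Motivating Example.
configs = {
    "C,P,D,S,L": (10.83, 16.07),
    "J32": (3.2, 19.73),
    "J31": (5.13, 18.06),
    "J31,J32,J30": (2.28, 21.17),
    "J29": (6.18, 17.66),
    "J29,J31,J30": (2.45, 21.01),
    "J29,J30,J31,J32": (1.9, 23.11),
    "hypothetical": (6.5, 18.5),   # not from the slides; dominated by J31
}

front = pareto_front(configs)
print(sorted(front))
```

All seven configurations from the table survive the filter, since each gain in time costs power; only the hypothetical one is removed.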
Genetic Algorithm
Flow: Coding -> Fitness Evaluation -> Selection -> Crossover -> Mutation -> test Itr < Itr_Nbr (Yes: next iteration; No: Stop)
o Bit string representation, e.g. 1 1 0 1 = {J29, J30, J32}
o Pareto ranking based on a multi-objective genetic algorithm (NSGA-II)
o Flip bit mutation
o Half uniform crossover:
  P1 = 0 1 0 1 1 1 0 1 0    P2 = 0 1 1 1 1 0 1 1 0
  C1 = 0 1 0 1 1 1 1 1 0    C2 = 0 1 1 1 1 0 0 1 0
Output: Set of MV Conf -> WSOF* -> Final MV Conf
*Weighted sum of the objective functions
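The operators named on the Genetic Algorithm slide can be sketched in a few lines. This is an illustration only, since the slides rely on the MOEA Framework: all function names are hypothetical, and the min-max normalization in the weighted sum is an assumption, as the slides only say "weighted sum of the objective functions".

```python
import random

VIEWS = ["J29", "J30", "J31", "J32"]


def decode(bits, views=VIEWS):
    """Bit string -> set of selected materialized views."""
    return {view for bit, view in zip(bits, views) if bit}


def flip_bit_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def half_uniform_crossover(p1, p2, rng):
    """Swap half (rounded down) of the positions where the parents differ."""
    differing = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
    c1, c2 = list(p1), list(p2)
    for i in rng.sample(differing, len(differing) // 2):
        c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


def weighted_sum_choice(front, w_time, w_power):
    """Pick the configuration with the lowest weighted sum of normalized objectives."""
    times = [t for t, _ in front.values()]
    powers = [p for _, p in front.values()]

    def norm(x, values):
        lo, hi = min(values), max(values)
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    return min(front, key=lambda k: w_time * norm(front[k][0], times)
               + w_power * norm(front[k][1], powers))


rng = random.Random(0)
p1 = [0, 1, 0, 1, 1, 1, 0, 1, 0]   # parents from the slide
p2 = [0, 1, 1, 1, 1, 0, 1, 1, 0]
c1, c2 = half_uniform_crossover(p1, p2, rng)
print(sorted(decode([1, 1, 0, 1])))  # ['J29', 'J30', 'J32']
print(c1, c2)
```

The parents on the slide differ in three positions, so one position is swapped, which matches the children shown there differing from their parents in a single bit. Which position is swapped depends on the random seed.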
Experimental Study (I)
Environment
o Linux, Dell Precision, Intel Core i5 2.27GHz, 4GB RAM
o Oracle 11gR2, Java
o R language for regression
o MOEA Framework
Data Warehouse
o SSB benchmark
o Scale Factor = 10
Power Measurement
o Watts UP? Pro ES power meter
o 1 hertz sampling frequency
Experimental Study (II)
Study of the Quality of our Cost Models
Experimental Study (III)
Our Algorithm vs. Exhaustive Algorithm
Workload of 200 queries:
1. MOEA: 7s
2. BNL*: 4 days!
*Block-Nested Loops algorithm to get Pareto front points
Experimental Study (IV)
Cost Analysis
Size of Materialized Views vs. Performance and Power Consumption.
Experimental Study (V)
Performance and Power/Energy Saving
\[ PowerSaving_i = \frac{Power_{MV_{time}} - Power_i}{Power_{MV_{time}}} \times 100 \]
o Origin: workload without optimization
Experimental Study (VI)
Environment
o Dell PowerEdge, Intel Xeon E3 2.67GHz, 10GB RAM, 1TB HDD
Data Warehouse
o TPC-H, TPC-DS
o Scale Factor = 10, 100
[Figure: Average Power Error (Avg Error, axis from 0.0% to 4.0%) for TPC-H 10GB, TPC-H 100GB and TPC-DS 100GB]
Summary
o Energy-aware Physical Design
o Power & Query Processing Cost Models (machine learning)
o MVSP, Very Large Workload
o A multi-objective materialized view selection algorithm
o Experimental Studies: active power savings up to 38% and total energy savings up to 84%
o Generalization of the Methodology to other OS
o Study of the variation of deployment platforms of DBMS
o Integration of Energy in earlier phases of the design