
Elasca: Workload-Aware Elastic Scalability for Partition-Based Database Systems

Taha Rafiq
MMath Thesis Presentation
24/04/2013

Outline

1. Introduction & Motivation
2. VoltDB & Elastic Scale-Out Mechanism
3. Partition Placement Problem
4. Workload-Aware Optimizer
5. Experiments & Results
6. Supporting Multi-Partition Transactions
7. Conclusion

INTRODUCTION & MOTIVATION


DBMS Scalability

• Replication
• Partitioning

Traditional (DBMS) Scalability

Higher Load → Add Resources → Better Performance

Scalability: the ability of a system to be enlarged to handle a growing amount of work.

Drawback: adding resources requires expensive downtime.

Elastic (DBMS) Scalability

Higher Load → Dynamically Add Resources → Better Performance

Elasticity: the use of computing resources that vary dynamically to meet a variable workload.

Benefit: no downtime.

Elastically Scaling a Partition-Based DBMS: Re-Partitioning

[Diagram: scale out splits Partition 1 on Node 1 into Partitions 1 and 2 spread across Nodes 1 and 2; scale in merges them back onto Node 1.]

Elastically Scaling a Partition-Based DBMS: Partition Migration

[Diagram: scale out moves partitions P3 and P4 from Node 1 (hosting P1–P4) to a new Node 2; scale in moves them back.]

Partition Migration for Elastic Scalability

Mechanism: how to add/remove nodes and move partitions.

Policy/Strategy: which partitions to move, when, and where, during scale out/scale in.

Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer

VOLTDB & ELASTIC SCALE-OUT MECHANISM


What is VoltDB?

• In-memory, partition-based DBMS
  – No disk access = very fast
• Shared-nothing architecture, serial execution
  – No locks
• Stored procedures
  – No arbitrary transactions
• Replication
  – Fault tolerance & durability

VoltDB Architecture

[Diagram: three nodes, each with a Client Interface, an Initiator, and two execution-site threads (ES1, ES2) hosting partitions (P1/P2, P3/P1, P2/P3); clients connect to the nodes' client interfaces.]

Single-Partition Transactions

[Diagram: the three-node cluster from the previous slide; a single-partition transaction is routed via a Client Interface and Initiator to the one execution site that owns the target partition.]

Multi-Partition Transactions

[Diagram: the same cluster; a multi-partition transaction spans several nodes, with one execution site (ES1) acting as coordinator.]

Elastic Scale-Out Mechanism

[Diagram: execution sites ES1 and ES4 on the original node are marked as failed, and their partitions P1 and P4 are brought up on a new scale-out node, while the original node keeps serving its remaining partitions.]

Overcommitting Cores

• VoltDB suggests: partitions per node < cores per node
• This wastes resources when load is low or data access is skewed

Idea: aggregate extra partitions on each node and scale out when load increases.
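For example (illustrative numbers): a 4-core node could host 8 lightly loaded partitions, two per core, halving the nodes needed at low load; when load roughly doubles, four of those partitions migrate to a newly added node, restoring one partition per core.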


PARTITION PLACEMENT PROBLEM


Given… Cluster and System Specifications

• Number of CPU cores
• Memory
• Maximum number of nodes

Given… Load Per Partition

[Chart: requests per second for partitions P1–P8, ranging from 0 to 3000.]

Given… Size of Each Partition

[Chart: size in MB for partitions P1–P8, ranging from 0 to 1200.]

Given… Current Partition-to-Node Assignment

[Table: which of Nodes 1–3 currently hosts each of partitions P1–P8.]

Find… Optimal Partition-to-Node Assignment (For the Next Time Interval)

[Table: the placement of partitions P1–P8 across Nodes 1–3 is the unknown to be determined.]
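Concretely, the optimizer's inputs and output can be summarized in a small structure. A minimal Python sketch; the type and field names are illustrative, not from the thesis:

from dataclasses import dataclass

@dataclass
class PlacementProblem:
    """Inputs to the placement optimizer, per the 'Given...' slides above.
    All field names are illustrative."""
    cores_per_node: int        # CPU cores available on each node
    memory_per_node_mb: int    # memory available on each node
    max_nodes: int             # maximum cluster size
    load_per_partition: dict   # partition -> requests/s (next interval)
    size_per_partition_mb: dict  # partition -> size in MB
    current_assignment: dict   # partition -> node it lives on now

# The optimizer's output is simply a new partition -> node mapping
# for the next time interval.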


Optimization Objectives

• Maximize Throughput: match the performance of a static, fully provisioned system.
• Minimize Resources Used: use the minimum number of nodes required to meet performance demands.
• Minimize Data Movement: data movement adversely affects system performance and incurs network costs.
• Balance Load Effectively: minimize the risk of overloading a node during the next time interval.

WORKLOAD-AWARE OPTIMIZER

System Overview

[Figure: system overview of the workload-aware optimizer.]

Statistics Collected

• α: maximum number of transactions that can be executed on a partition per second
  – The max capacity of an execution site
• β: CPU overhead of host-level tasks
  – How much CPU capacity the Initiator uses

Effect of β

[Figure: effect of β.]

Estimating CPU Load

[Equations: CPU load generated by each partition; average CPU load of host-level tasks per node; average CPU load per node.]
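A minimal sketch of one plausible reading of these quantities, assuming a partition serving r transactions/s consumes r/α of one core (α = max execution-site throughput) and each node pays a fixed host-level overhead β:

def estimate_node_loads(rates, assignment, alpha, beta):
    """Estimate per-node CPU load for the next interval (illustrative).

    rates:      partition -> expected requests/s
    assignment: partition -> node
    alpha:      measured max transactions/s of one execution site
    beta:       CPU load of host-level tasks (Initiator etc.) per node
    Returns node -> estimated CPU load, in units of cores."""
    loads = {}
    for p, node in assignment.items():
        # Load generated by each partition: the fraction of a core it needs.
        loads[node] = loads.get(node, 0.0) + rates[p] / alpha
    # Average CPU load per node = partition work + host-level overhead.
    return {n: load + beta for n, load in loads.items()}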


Optimizer Details

• Mathematical optimization vs. heuristics
• Mixed-Integer Linear Programming (MILP)
• Can be solved using any general-purpose solver (we use IBM ILOG CPLEX)
• Applicable to a wide variety of scenarios

Objective Function

Minimizes data movement as the primary objective and balances load as the secondary objective.

Effect of ε

[Figure: effect of ε.]

Minimizing Resources Used

• Calculate the minimum number of nodes that can handle the load of all the partitions
  – Via a non-integer (fractional) assignment
• Explicitly tell the optimizer how many nodes to use
• If the optimizer can't find a solution with the minimum N nodes, it tries again with N + 1 nodes (see the sketch below)
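A sketch of that search loop; solve(n) stands in for a hypothetical call that runs the MILP restricted to n nodes and returns None when infeasible:

import math

def fewest_feasible_nodes(solve, total_load, node_capacity, max_nodes):
    """Start from the fractional (non-integer) lower bound on node count,
    then retry with one more node until the MILP is feasible (illustrative)."""
    n = math.ceil(total_load / node_capacity)  # minimum nodes, ignoring integrality
    while n <= max_nodes:
        placement = solve(n)                   # hypothetical MILP call
        if placement is not None:
            return n, placement
        n += 1                                 # infeasible: allow N + 1 nodes
    raise RuntimeError("no feasible placement within the cluster limit")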


Constraints

• Replication: replicas of a given partition must be assigned to different nodes
• CPU Capacity: the total load of the partitions assigned to a node must be less than the node's capacity
• Memory Capacity: all the partitions assigned to a node must fit in its memory
• Host-Level Tasks: the overhead of host-level tasks must not exceed the capacity of a single core
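Putting the objective and constraints together: a minimal single-replica sketch using the PuLP modeling library (the thesis uses CPLEX; the data, the weight EPS, and all variable names here are illustrative, and replication is only noted in a comment):

import pulp

P, N = range(4), range(2)                     # partitions, candidate nodes
load = [0.6, 0.4, 0.5, 0.3]                   # est. CPU load per partition (cores)
size = [400, 300, 500, 200]                   # partition sizes in MB
cur = {0: 0, 1: 0, 2: 1, 3: 1}                # current partition -> node
CORES, MEM_MB, BETA, EPS = 2, 1024, 0.2, 0.01 # capacities, host overhead, weight

m = pulp.LpProblem("placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (P, N), cat="Binary")  # x[p][n] = 1: p on node n
peak = pulp.LpVariable("peak_load", lowBound=0)       # max per-node CPU load

# Primary objective: MB moved off each partition's current node;
# secondary objective (weight EPS): keep the peak node load low.
moved = pulp.lpSum(size[p] * (1 - x[p][cur[p]]) for p in P)
m += moved + EPS * peak

for p in P:
    m += pulp.lpSum(x[p][n] for n in N) == 1  # each partition placed exactly once
    # With k replicas, x would gain a replica index, and each node could
    # host at most one replica of a partition (replication constraint).
for n in N:
    node_load = pulp.lpSum(load[p] * x[p][n] for p in P) + BETA
    m += node_load <= CORES                   # CPU capacity (BETA <= 1 core assumed)
    m += node_load <= peak                    # peak defines the load-balance term
    m += pulp.lpSum(size[p] * x[p][n] for p in P) <= MEM_MB  # memory capacity

m.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: next(n for n in N if x[p][n].value() > 0.5) for p in P})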

Staggering Scale In

• A fluctuating workload can result in excessive data movement
• Staggering scale in mitigates this problem: delay scaling in by s time steps
• Slightly more resources are used in exchange for stability (sketched below)
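A minimal sketch of one way to implement such a delay; the thesis's exact policy may differ. Scale-out requests are honored immediately, scale-in only after the optimizer has asked for fewer nodes for s consecutive steps:

def staggered_node_count(current, requested_history, s=3):
    """requested_history: node counts the optimizer asked for, oldest first,
    including the latest request. Illustrative scale-in staggering."""
    latest = requested_history[-1]
    if latest >= current:
        return latest                       # scale out (or hold) immediately
    recent = requested_history[-s:]
    if len(recent) == s and all(r < current for r in recent):
        return max(recent)                  # sustained drop: allow scale in
    return current                          # otherwise keep nodes for stability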

EXPERIMENTAL EVALUATION

Optimizers Evaluated

• ELASCA: our workload-aware optimizer
• ELASCA-S: ELASCA with staggered scale in
• OFFLINE: an offline optimizer that minimizes resources used and data movement
• GREEDY: a greedy first-fit optimizer (sketched below)
• SCO: a static, fully provisioned system (no optimization)
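As a point of reference, an illustrative greedy first-fit baseline in the spirit of GREEDY (not necessarily the thesis's exact heuristic):

def greedy_first_fit(partition_loads, node_capacity):
    """Place each partition, heaviest first, on the first node with enough
    remaining CPU capacity; open a new node when none fits (illustrative)."""
    remaining = []                              # free capacity per open node
    placement = {}
    for p, load in sorted(partition_loads.items(), key=lambda kv: -kv[1]):
        for n, free in enumerate(remaining):
            if load <= free:
                remaining[n] -= load
                placement[p] = n
                break
        else:                                   # no open node fits
            remaining.append(node_capacity - load)
            placement[p] = len(remaining) - 1
    return placement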


Benchmarks Used

• TPC-C: modified to make it cleanly partitioned and to fit in memory (3.6 GB)
• TATP: Telecommunication Application Transaction Processing benchmark (250 MB)
• YCSB: Yahoo! Cloud Serving Benchmark with a 50/50 read/write ratio (1 GB)

Dynamic Workloads

• Varying the aggregate request rate
  – Periodic waveforms: sine, triangle, sawtooth
• Skewing the data access
  – Temporal skew
  – Statistical distributions: uniform, normal, categorical, Zipfian

Temporal Skew

[Figure: per-partition load for P1–P8 at time steps t = 1 through t = 4; the load peak shifts across partitions over time, cycling back to the t = 1 pattern.]

Experimental Setup

• Each experiment runs for 1 hour, divided into 15 time intervals
  – The optimizer runs every four minutes
• Combination of simulation and actual runs
  – Exact numbers for data movement, resources used, and load balance come from simulation
• The cluster has 4 nodes, plus 2 separate client machines

Data Movement (TPC-C)

[Figure: triangle wave (f = 1).]

Data Movement (TPC-C)

[Figure: triangle wave (f = 1), Zipfian skew.]

Data Movement (TPC-C)

[Figure: triangle wave (f = 4).]

Computing Resources Saved (TPC-C)

[Figure: triangle wave (f = 1).]

Load Balance (TPC-C)

[Figure: triangle wave (f = 1).]

Database Throughput (TPC-C)

[Figure: sine wave (f = 2).]

Database Throughput (TPC-C)

[Figure: sine wave (f = 2), normal skew.]

Database Throughput (TATP)

[Figure: sine wave (f = 2).]

Database Throughput (YCSB)

[Figure: sine wave (f = 2).]

Database Throughput (TPC-C)

[Figure: triangle wave (f = 4).]

Optimizer Scalability

[Figure: optimizer scalability.]

SUPPORTING MULTI-PARTITION TRANSACTIONS


Factors Affecting Performance

• Maximum MPT throughput (η): the maximum number of transactions an execution site can coordinate per second
• Probability of MPTs (p_mpt): the percentage of transactions that are MPTs
• Partitions involved in MPTs: the number of partitions that each MPT touches

Changes to Model

The CPU load generated by each partition is the sum of:

1. Load due to transaction work (same as for SPTs)
2. Load due to coordinating MPTs
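In symbols, one plausible reading of this model (the exact formulation is in the thesis): with α the maximum single-site transaction throughput and η the maximum MPT coordination throughput from the slides, and writing r_p for partition p's transaction rate and c_p for the rate of MPTs it coordinates,

L_p = r_p/α + c_p/η

where the first term is the execution work (as for SPTs) and the second is the coordination overhead.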

Maximum MPT Throughput

[Figure: maximum MPT throughput (η).]

Probability of MPTs

[Figure: effect of the probability of MPTs (p_mpt).]

Effect on Resources Saved

[Figure: effect of MPTs on computing resources saved.]

Effect on Data Movement

[Figure: effect of MPTs on data movement.]

CONCLUSION


Related Work

• Data replication and partitioning
• Database consolidation
• Live database migration
• Key-value stores
• Data placement

Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer

Conclusion

• Elasca = Mechanism + Optimizer
• Workload-aware optimizer
  – Meets performance demands
  – Minimizes computing resources used
  – Minimizes data movement
  – Effectively balances load
• Scalable to large problem sizes in an online setting

Future Work

• Migrating to VoltDB 3.0
  – Intelligent client routing, master/slave partitions
• Supporting multi-partition transactions
• Automated parameter tuning
• Transaction mixes
• Workload prediction

Thank You

Questions?