Workload-Aware Performance Tuning for Autonomous DBMSs Zhengtong Yan, Jiaheng Lu, Naresh Chainani, Chunbin Lin ICDE 2021 Tutorial


Page 1: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload-Aware Performance Tuning for Autonomous DBMSs

Zhengtong Yan, Jiaheng Lu, Naresh Chainani, Chunbin Lin

ICDE 2021 Tutorial

Page 2: Workload-Aware Performance Tuning for Autonomous DBMSs

Outline

Part A:
● Motivation and Background
● Workload Classification
● Workload Forecasting (Prediction)

Part B:
● Workload-Based Tuning
● Amazon Redshift
● Open Challenges and Discussion


Page 4: Workload-Aware Performance Tuning for Autonomous DBMSs

Why Autonomous?

• The scale and complexity of database administration have surpassed human capacity:
  → Large number of instances (e.g., millions in cloud environments)
  → Complex tuning tasks (e.g., a large number of knobs)
  → Dynamic workloads

• 73% of DBAs think performance tuning occupies the most time
• 75% of IT management budgets are spent on database management

Ideal Solution: Autonomous Databases
➢ Find better configurations than DBAs
➢ Reduce administration cost
➢ Focus more on higher-value business activities

Page 5: Workload-Aware Performance Tuning for Autonomous DBMSs

What are Autonomous Databases?

• Academia: January 2017
  – Peloton
  – CMU Database Group

• Industry: September 2017
  – Oracle 18c Database
  – Oracle

Image Source: http://www.moreajays.com/2017/11/what-is-oracle-18c-autonomous-database.html

Page 6: Workload-Aware Performance Tuning for Autonomous DBMSs

What are Autonomous Databases? (cont.)

• The concept in Peloton by CMU:
  – A DBMS that can deploy, configure, and tune itself automatically without any human intervention.
    → Select actions to improve performance (e.g., throughput, latency, cost).
    → Choose when to apply an action.
    → Learn from these actions and refine future decision-making processes.

Source: https://15721.courses.cs.cmu.edu/spring2019/slides/25-selfdriving.pdf

Page 7: Workload-Aware Performance Tuning for Autonomous DBMSs

What are Autonomous Databases? (cont.)

• The concept by Oracle:
  – Uses machine learning to automate database tuning, security, backups, updates, and other routine management tasks without human intervention
  – Self-* characteristics:
    → Self-Driving
    → Self-Securing
    → Self-Repairing

➢ This tutorial focuses on self-tuning, which is the most important aspect of self-driving.

Source: https://www.oracle.com/database/what-is-autonomous-database.html

Page 8: Workload-Aware Performance Tuning for Autonomous DBMSs

Automatic vs. Autonomous

• Automatic ≠ Autonomous
  – A process (e.g., tuning or backup) that can be carried out automatically is still not autonomous if DBAs must make the decisions
  – The design goal of "autonomous" is to minimize or eliminate human labor, and the associated human error, and to ensure data safety and optimal performance.

➢ Most of the existing tuning tools are actually automatic, not autonomous.

Source: Oracle Autonomous Database For Dummies®, 2nd Special Edition

Page 9: Workload-Aware Performance Tuning for Autonomous DBMSs

From Autonomic Computing to Autonomous Database

• Proposed by IBM in the early 2000s
• MAPE-K control loop: Monitor, Analyse, Plan, Execute, Knowledge
• Self-management
  – Self-configuration
  – Self-optimization
  – Self-healing
  – Self-protection
• Elements can interact with each other

Figure: Structure of an autonomic element

Image Source: [KD 2003]

Page 10: Workload-Aware Performance Tuning for Autonomous DBMSs

The Age of Autonomous Database?

• Autonomous DBMS is not a new idea!
  – People have been working on autonomous databases for 45 years.

• Why is it possible now?
  – More data for DBMSs
  – Better hardware
  – Advances in AI technologies (ML, DL, and RL)

Page 11: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload-Aware Tuning Mechanism

• The behavior of an autonomous DBMS: f : Configuration × Workload → Performance
  – Configuration
    ○ resource allocation
    ○ physical design
    ○ logical design, etc.
  – Workload information
    ○ workload type
    ○ workload shifts
    ○ arrival rate, etc.
  – Performance metrics
    ○ response time
    ○ throughput
    ○ reliability, etc.

• Workload-aware tuning mechanism
  ➢ Use workload information to adjust configurations to achieve optimal performance.
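One way to read the mapping f is as an objective to maximize over configurations once the workload is known. The sketch below is only an illustration under stated assumptions: `toy_performance`, the knob names `buffer_gb` and `workers`, and the candidate list are all hypothetical and do not model any real DBMS.

```python
def toy_performance(config, workload):
    """Hypothetical stand-in for f(configuration, workload) -> throughput.

    It only rewards buffer memory for read-heavy workloads and worker
    threads for write-heavy ones; it is not a real DBMS model.
    """
    read_gain = workload["read_ratio"] * config["buffer_gb"]
    write_gain = (1.0 - workload["read_ratio"]) * config["workers"]
    return workload["arrival_rate"] * (read_gain + write_gain)


# Hypothetical candidates that all fit the same resource budget.
CANDIDATES = [
    {"buffer_gb": 8, "workers": 2},
    {"buffer_gb": 5, "workers": 5},
    {"buffer_gb": 2, "workers": 8},
]


def best_configuration(workload, candidates=CANDIDATES, performance=toy_performance):
    """Pick the candidate configuration that maximizes f for this workload."""
    return max(candidates, key=lambda config: performance(config, workload))


olap_like = {"read_ratio": 0.9, "arrival_rate": 10.0}
oltp_like = {"read_ratio": 0.2, "arrival_rate": 500.0}
print(best_configuration(olap_like))  # favors buffer memory
print(best_configuration(oltp_like))  # favors worker threads
```

The point of the toy is that the best configuration changes with the workload, which is why the workload has to be an explicit input to tuning.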

Page 12: Workload-Aware Performance Tuning for Autonomous DBMSs

Why Workload-Aware?

• Performance tuning is not a one-off task
  – DBAs need to keep a constant eye on database performance, because tuning work carried out earlier can be invalidated for a multitude of reasons.

• There is no one-size-fits-all configuration that works for all workloads
  – The first challenge in a self-driving DBMS is to understand an application's workload. [PAA 2017]

• Predictive tuning vs. reactive tuning
  – Reconfiguring reactively after a load increase is too late
  – Reconfiguring reactively places an additional burden on the already-overloaded system

Page 13: Workload-Aware Performance Tuning for Autonomous DBMSs

What Is Your Definition of Database Workload?

• Purest definition
  – The total requests made by users and applications of a system
    ➢ e.g., the workload would be all of the SQL statements that have been submitted to the instance to work on, regardless of whether they have been executed

• A more liberal definition
  – The statistics that can measure, quantify, or characterize the workload
    ➢ e.g., CPU usage, memory usage, elapsed time

➢ Thus, the definition depends on what you're trying to accomplish.

Source: https://www.databasejournal.com/features/oracle/article.php/3794731/What-Is-Your-Definition-of-Database-Workload.htm

Page 14: Workload-Aware Performance Tuning for Autonomous DBMSs

Challenges for Workload-Aware Tuning

#1 Workload's Diversity and Heterogeneity

• A DBMS may have a large number of instances

• Each instance might encompass various types of workload

• Different types have different features
  ➢ e.g., transactional workloads consist of short-running transactions that modify few records
  ➢ e.g., analytical workloads are usually long-running read-only queries that process a considerable amount of data.

Page 15: Workload-Aware Performance Tuning for Autonomous DBMSs

Challenges for Workload-Aware Tuning (cont.)

#2 Workload's Dynamic Characteristics
• Dynamic change in size
• Various patterns

Figures: Workload of B2W's databases over three days [TEL 2018]; three different workload patterns [MAH 2018]

Page 16: Workload-Aware Performance Tuning for Autonomous DBMSs

Challenges for Workload-Aware Tuning (cont.)

#3 Workload's Complex Influence on Performance

• NP-hard
  ➢ High-dimensional and continuous space of configuration parameters

• Non-linear effect of hardware/parameters on performance
  ➢ It is difficult to quantify, for example, the amount of hardware resources required for changing workloads
  ➢ It is difficult to decide how much performance is compromised for a given workload

Page 17: Workload-Aware Performance Tuning for Autonomous DBMSs

A General Workflow of Workload-Aware Tuning

• How to obtain accurate workload information
• How to properly use the workload information to optimize performance

Page 18: Workload-Aware Performance Tuning for Autonomous DBMSs

Examples of Workload-Aware Tuning System

• P-Store [TEL 2018]: Time Series Analysis + Dynamic Programming → An Elastic Database System

• Peloton [PAA 2017]

Figure annotations: predict future workload; determine when and how to reconfigure a database

Page 19: Workload-Aware Performance Tuning for Autonomous DBMSs

Examples of Workload-Aware Tuning System (cont.)

• Hyrise [JK 2018]

Page 20: Workload-Aware Performance Tuning for Autonomous DBMSs

Examples of Workload-Aware Tuning System (cont.)

• SDDP: Alibaba's Self-Driving Database Platform [FFL 2019]

Page 21: Workload-Aware Performance Tuning for Autonomous DBMSs

Related Topics and Tutorials

• SIGMOD'13: "Workload Management for Big Data Analytics" [AS 2013]

• PVLDB'19: "Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems" [LCH 2019]

➢ This tutorial (ICDE'21) focuses on workload-aware tuning: how to obtain workload information and automatically tune a DBMS based on that information

Page 22: Workload-Aware Performance Tuning for Autonomous DBMSs

References

• [AS 2013] "Workload management for big data analytics." In SIGMOD, 2013.
• [FFL 2019] "Cloud-native database systems at Alibaba: Opportunities and challenges." In PVLDB, 2019.
• [JK 2018] "Self-Driving: From General Purpose to Specialized DBMSs." In VLDB PhD Workshop, 2018.
• [KD 2003] "The vision of autonomic computing." Computer, IEEE, 2003.
• [LCH 2019] "Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems." In PVLDB, 2019.
• [MAH 2018] "Query-based Workload Forecasting for Self-Driving Database Management Systems." In SIGMOD, 2018.
• [PAA 2017] "Self-Driving Database Management Systems." In CIDR, 2017.
• [TEL 2018] "P-Store: An elastic database system with predictive provisioning." In SIGMOD, 2018.


Page 24: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload Classification

• Workload type is a key consideration in tuning the DBMS
• Workload classification is the first and key step towards an autonomous DBMS
• Workload types
  → OLTP: Online Transaction Processing
  → OLAP: Online Analytical Processing
  → DSS: Decision Support System
  → BI: Business Intelligence
  → Hybrid and interactive workloads
    ➢ e.g., mixed fast transactions and complex analytical queries that run concurrently and interact with each other

Page 25: Workload-Aware Performance Tuning for Autonomous DBMSs

OLTP vs. OLAP

OLTP
• High volume of transactions
• Fast processing
• Normalized data
• Many tables
• Many concurrent users
• Example: ATM center
• Standardized and simple queries
• "Who bought X?"

OLAP
• High volume of data
• Slow queries
• Denormalized data
• Fewer tables
• Fewer concurrent users
• Example: financial reporting system
• Complex queries involving aggregations
• "How many people bought X?"

Page 26: Workload-Aware Performance Tuning for Autonomous DBMSs

Methods for Workload Classification

• Target
  → Group similar workloads into the same class
  → Classify the workloads into different categories (e.g., OLTP, DSS, mixed type)
• Unsupervised Learning
  → Clustering based on a distance function [HGR 2009, MNE 2020]
  → SemiDiscrete Decomposition, Singular Value Decomposition [WMS 2004]
• Supervised Learning
  → Classification & Regression Tree [ZDL 2014]
  → Decision Tree Induction [EMS 2008, EM 2009]
• Reasoning
  → Case-Based Reasoning [AMM 2014, RKM 2018, SRS 2020]

Page 27: Workload-Aware Performance Tuning for Autonomous DBMSs

Unsupervised Learning (1)

• Clustering: grouping similar workload events into classes based on a distance function
  – Minkowski distance function [HGR 2009, MNE 2020]

Equation annotations: the number of elements in the union; the number of elements in the intersection; the Minkowski distance between two workloads; the distance between two features
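As a rough illustration of distance-based grouping, the sketch below implements the textbook Minkowski distance of order p between two numeric feature vectors and assigns a workload event to its nearest class centroid. It is an assumption-laden sketch: the per-feature distance built from union and intersection counts in [HGR 2009] is not reproduced here, and `assign_to_nearest` is a hypothetical helper name.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length feature vectors.

    p=1 gives the Manhattan distance and p=2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def assign_to_nearest(event, centroids, p=2.0):
    """Return the index of the closest class centroid for one workload event."""
    distances = [minkowski_distance(event, c, p) for c in centroids]
    return distances.index(min(distances))


print(minkowski_distance([0, 0], [3, 4], p=2))   # Euclidean
print(minkowski_distance([0, 0], [3, 4], p=1))   # Manhattan
```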

Page 28: Workload-Aware Performance Tuning for Autonomous DBMSs

Unsupervised Learning (1)

• An example [MNE 2020]
  – A demonstration of clustering feature vectors extracted from workload events

Figure: feature vectors clustering

Page 29: Workload-Aware Performance Tuning for Autonomous DBMSs

Unsupervised Learning (2)

• Target: group individual BI queries into general classes based on system resource usage [WMS 2004]
• Methods: view the workload dataset as a matrix $A$ and decompose the matrix
  – SVD: Singular Value Decomposition, $A = U S V^{T}$
  – SDD: SemiDiscrete Decomposition, $A = X D Y$

Page 30: Workload-Aware Performance Tuning for Autonomous DBMSs

Unsupervised Learning (2)

Figure: SVD + SDD plot of queries, dimensions 1 and 2

➢ 4 clusters of queries
➢ Queries appear to scale well
  e.g., the points for query 13 lie very close together (the same is true for most of the other queries).

Page 31: Workload-Aware Performance Tuning for Autonomous DBMSs

Supervised Learning (1)

• Idea: identify workload types from the database status variables that are highly affected by changing workloads [ZDL 2014]

Figure: Classification and Regression Tree

Page 32: Workload-Aware Performance Tuning for Autonomous DBMSs

Supervised Learning (2)

• Target: identifying a DBMS workload as either OLTP or DSS [EMS 2008, EM 2009]

• Based on the workload characteristics that differentiate OLTP from DSS

• Decision tree induction

Page 33: Workload-Aware Performance Tuning for Autonomous DBMSs

Supervised Learning (2)

• Training several classifiers
  – Classifier(C, H): trained on TPC-C and TPC-H
  – Classifier(O, B): trained on the Ordering and Browsing profiles of TPC-W
  – Hybrid Classifier (HC): trained on a mix of the TPC-H and Browsing profile workloads as a DSS sample, and a mix of the TPC-C and Ordering profile workloads as an OLTP sample
  – Graduated-Hybrid Classifier (GHC): trained with TPC-H as a Heavy DSS (HD) workload, the Browsing profile as a Light DSS (LD) workload, TPC-C as a Heavy OLTP (HO) workload, and the Ordering profile as a Light OLTP (LO) workload
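To make decision tree induction concrete, the sketch below induces a one-level tree (a decision stump) that separates OLTP from DSS snapshots by minimizing weighted Gini impurity. It is a minimal sketch, not the classifiers of [EMS 2008]: the two features (fraction of write statements, rows read per query) and the training values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)


def induce_stump(samples, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity.

    Returns (feature_index, threshold, left_label, right_label); the left
    branch holds samples whose feature value is <= threshold.
    """
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted(set(s[f] for s in samples))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [l for s, l in zip(samples, labels) if s[f] <= threshold]
            right = [l for s, l in zip(samples, labels) if s[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values to split")
    return best[1:]


def classify(stump, sample):
    """Apply the induced stump to one snapshot."""
    f, threshold, left_label, right_label = stump
    return left_label if sample[f] <= threshold else right_label


# Hypothetical snapshots: [fraction of write statements, rows read per query]
SNAPSHOTS = [[0.60, 5], [0.55, 8], [0.70, 3],
             [0.05, 90000], [0.02, 120000], [0.10, 40000]]
LABELS = ["OLTP", "OLTP", "OLTP", "DSS", "DSS", "DSS"]
STUMP = induce_stump(SNAPSHOTS, LABELS)
print(classify(STUMP, [0.50, 10]))
```

A full tree would apply the same split search recursively to each branch until the leaves are pure or a depth limit is reached.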

Page 34: Workload-Aware Performance Tuning for Autonomous DBMSs

Reasoning

• Case-Based Reasoning (CBR) [AMM 2014, RKM 2018, SRS 2020]
• Step 1: select key workload parameters
  ➢ 10 from 291 status variables of MySQL 5.1
  ➢ Com_ratio, Innodb_log_writes, key_reads, key_read_requests, key_writes, key_write_requests, opened_files, Qcache_hits, Questions, sort_rows, and table_locks_immediate.

Figures: Key_write, Key_reads, table_locks_immediate

Page 35: Workload-Aware Performance Tuning for Autonomous DBMSs

Phases of Case-Based Reasoning

Retrieve → Reuse → Revise → Retain

• Retrieve
  – match the incoming case against stored cases
  – determine the similarity of cases using Euclidean distance

• Reuse
  – find the differences between the current case and the retrieved case
  – identify the portion of the retrieved case that must be stored as a new case

• Revise
  – evaluate the reused solution and correct it by applying domain knowledge

• Retain
  – insert the valuable information of the current case into the knowledge base
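The four phases can be sketched as a small loop over a case base. This is a simplified illustration, assuming that a case is a numeric feature vector paired with a workload label; the names `CaseBase`, `characterize`, `max_distance`, and the `revise` callback are hypothetical and not taken from [AMM 2014].

```python
import math


class CaseBase:
    """Minimal case base: each case pairs a workload feature vector with a solution."""

    def __init__(self):
        self.cases = []  # list of (features, solution) tuples

    def retain(self, features, solution):
        """Retain: store a solved case for future retrieval."""
        self.cases.append((list(features), solution))

    def retrieve(self, features):
        """Retrieve: return (distance, solution) of the most similar stored case."""
        if not self.cases:
            raise LookupError("case base is empty")
        scored = [(math.dist(features, stored), solution)
                  for stored, solution in self.cases]
        return min(scored, key=lambda pair: pair[0])


def characterize(case_base, features, max_distance, revise):
    """Reuse the nearest case if it is close enough; otherwise revise and retain."""
    if case_base.cases:
        distance, solution = case_base.retrieve(features)
        if distance <= max_distance:
            return solution                   # Reuse the stored solution as-is
    solution = revise(features)               # Revise via (hypothetical) domain knowledge
    case_base.retain(features, solution)      # Retain the new case
    return solution


cb = CaseBase()
cb.retain([0.9, 0.1], "OLTP")
cb.retain([0.1, 0.9], "DSS")
print(characterize(cb, [0.15, 0.85], 0.5, lambda f: "unknown"))
```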

Page 36: Workload-Aware Performance Tuning for Autonomous DBMSs

Autonomic Computing Perspective of CBR

Page 37: Workload-Aware Performance Tuning for Autonomous DBMSs

Evaluation of CBR [AMM 2014]

• Dataset: TPC-C and TPC-H
• Database: MySQL

Page 38: Workload-Aware Performance Tuning for Autonomous DBMSs

References

• [AMM 2014] "Database Workload Management Through CBR and Fuzzy Based Characterization." Applied Soft Computing, 2014.
• [EM 2009] "The Psychic–Skeptic Prediction Framework for Effective Monitoring of DBMS Workloads." Data & Knowledge Engineering, 2009.
• [EMS 2008] "Is it DSS or OLTP: automatically identifying DBMS workloads." Journal of Intelligent Information Systems, 2008.
• [HGR 2009] "Consistent On-line Classification of DBS Workload Events." In CIKM, 2009.
• [MCJ 2013] "Performance and Resource Modeling in Highly-concurrent OLTP Workloads." In SIGMOD, 2013.
• [MNE 2020] "Feedback control loop design for workload change detection in self-tuning NoSQL wide column stores." Expert Systems with Applications, 2020.
• [RKM 2018] "Performance Prediction and Adaptation for Database Management System Workload Using Case-Based Reasoning Approach." Information Systems, 2018.
• [SRS 2020] "A Novel Optimized Case-Based Reasoning Approach With K-Means Clustering and Genetic Algorithm for Predicting Multi-Class Workload Characterization in Autonomic Database and Data Warehouse System." IEEE Access, 2020.
• [WMS 2004] "Developing A Characterization of Business Intelligence Workloads for Sizing New Database Systems." In DOLAP, 2004.
• [ZDL 2014] "Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques." In WAINA, IEEE, 2009.


Page 40: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload Characteristics

• Workload Shift → when the workload will change (change information)
• Arrival Rate → when and how many workloads will arrive (volume information)
• Resource Usage → how much CPU, memory, I/O, and storage is utilized (consumption information)
• Execution Time → how long the workloads will take to run (time information)
• The Next Query or Operator → what is the next query or operator that a user will execute

Page 41: Workload-Aware Performance Tuning for Autonomous DBMSs

Methods for Workload Forecasting (Prediction)

• Time Series Analysis
  → e.g., FFT, AR, ARMA, ARIMA, SPAR
• Analytical Modeling
  → e.g., calibrated cost model, interaction-aware analysis
• Experiment-driven
  → e.g., experiment + sampling
• Stochastic Process Modeling
  → e.g., Markov models
• Machine Learning-based
  → e.g., indicator + linear regression
  → e.g., query-plan vector + tree-structured DNN
• Graph Embedding

Page 42: Workload-Aware Performance Tuning for Autonomous DBMSs

Goals for Workload Forecasting (Prediction)

• Good accuracy
• Low cost
• Handle multiple workload types
  → e.g., OLTP, OLAP, and mixed type
  → e.g., single and concurrent queries
• Handle multiple major patterns
  → e.g., cyclic patterns, spike patterns
• Robust
  → e.g., robust to noisy workloads and workload fluctuations
• Adaptive to database system changes
  → e.g., workload changes, data changes, hardware changes

Page 43: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload Forecasting (Prediction)

Workload Shift
• Methods: stochastic process modeling, time series analysis
• Examples: n-Gram [HR 2008], Workload Periodicity Analysis [HHR 2010]

Arrival Rate
• Methods: machine learning, time series analysis
• Examples: QueryBot 5000 [MAH 2018], P-Store [TEL 2018]

Resource Usage
• Methods: machine learning, time series analysis
• Examples: CPU Usage Prediction [BRB 2016], AR-RNN [LMG 2020], Capacity planning [HHR 2020]

Execution Time
• Methods: machine learning, experiment-driven, analytic modeling, graph embedding
• Examples: PQR Tree [GMD 2008], KCCA [GKD 2009], B2L [DCP 2011], Experiment-based [ADA 2011], Analytical-based [WCH 2013], QPPNet [MO 2019], GPredictor [ZSL 2020]

The Next Query/Operator
• Methods: machine learning, reinforcement learning, stochastic process modeling
• Examples: PROMISE [CS 2000], Markov models [DYW 2009], RNN [YH 2020], RNN [MCS 2021], Exact Q-Learning [MCS 2021]


Page 45: Workload-Aware Performance Tuning for Autonomous DBMSs

n-Gram [HR 2008]

• Workload shift: a significant change in the workload that could require reconfiguration

• Idea: the workload is determined by SQL statements or statement templates, which lead to specific usage patterns.
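A minimal sketch of the n-gram idea, assuming n = 2 and add-one smoothing: a bigram model is trained on a sequence of statement-template identifiers, and a window of new statements is flagged as a possible workload shift when its perplexity under the model exceeds a threshold. The class name, the smoothing scheme, and the threshold value are assumptions for illustration, not the exact procedure of [HR 2008].

```python
import math
from collections import Counter


class BigramWorkloadModel:
    """Bigram (n = 2) model over a sequence of statement-template identifiers."""

    def __init__(self, training_sequence):
        self.vocabulary = set(training_sequence)
        # Count each template as a predecessor, and each adjacent pair.
        self.unigrams = Counter(training_sequence[:-1])
        self.bigrams = Counter(zip(training_sequence, training_sequence[1:]))

    def probability(self, previous, current):
        """P(current | previous) with add-one smoothing."""
        size = len(self.vocabulary) + 1  # +1 reserves mass for unseen templates
        return (self.bigrams[(previous, current)] + 1) / (self.unigrams[previous] + size)

    def perplexity(self, window):
        """Perplexity of an observed window; higher means a worse fit to the model."""
        pairs = list(zip(window, window[1:]))
        if not pairs:
            raise ValueError("window needs at least two statements")
        log_sum = sum(math.log(self.probability(p, c)) for p, c in pairs)
        return math.exp(-log_sum / len(pairs))


def is_workload_shift(model, window, threshold):
    """Flag a shift when the window is poorly explained by the trained model."""
    return model.perplexity(window) > threshold


model = BigramWorkloadModel(["A", "B", "C"] * 50)
familiar = ["A", "B", "C", "A", "B", "C"]
unfamiliar = ["X", "Y", "X", "Y", "X", "Y"]
print(model.perplexity(familiar), model.perplexity(unfamiliar))
```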

Page 46: Workload-Aware Performance Tuning for Autonomous DBMSs

Workload Periodicity Analysis [HHR 2010]

• Target: predict periodic workload changes
• Discrete Fourier Transform (DFT)
• Model Interval Analysis
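As an illustration of how a DFT exposes periodicity, the sketch below computes DFT magnitudes of a load series and reports the period of the strongest non-constant frequency. It is a plain O(n²) transform written for clarity; the hourly series with a 24-sample cycle is synthetic, and the function names are hypothetical rather than taken from [HHR 2010].

```python
import cmath
import math


def dft_magnitudes(series):
    """Magnitudes of the DFT of a real-valued series for frequencies 0..n/2."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the constant component
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered)))
            for k in range(n // 2 + 1)]


def dominant_period(series):
    """Length, in samples, of the strongest periodic component of the series."""
    magnitudes = dft_magnitudes(series)
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return len(series) / k


# Synthetic load: one week of hourly samples with a daily (24-sample) cycle.
hourly_load = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(24 * 7)]
print(dominant_period(hourly_load))
```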


Page 48: Workload-Aware Performance Tuning for Autonomous DBMSs

QueryBot 5000 [MAH 2018]

• Workload shifts do not provide any volume information
• Target: predict the arrival rate of queries in the future based on historical data
• Arrival rate: queries/min, queries/hour
• Provides multiple horizons (short- and long-term) with different intervals
  – Horizon: how far into the future a model can predict
  – Interval: the time granularity at which the model can predict

Source: [MAH 2018]

Page 49: Workload-Aware Performance Tuning for Autonomous DBMSs

Three Different Workload Patterns

1) Cycles
• Cyclic patterns
• Example: BusTracker

2) Growth and Spikes
• Query volume increases over time
• Example: Admission

3) Workload Evolution
• Evolves over time
• Example: MOOC

Source: [MAH 2018]

Page 50: Workload-Aware Performance Tuning for Autonomous DBMSs

Workflow of QueryBot 5000

#1 Pre-Processor
• Templatization
• Semantic equivalence check

#2 Cluster
• Arrival rate feature
• Distance function

#3 Forecaster
• Model combination
• Ensemble learning
• Hybrid model
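The pre-processing step can be sketched as follows: literal constants are replaced with placeholders so that queries differing only in their constants map to one template, and arrivals are then counted per template and time bucket. This is a deliberately simplified regex-based sketch (a real pre-processor would parse the SQL and also check semantic equivalence); `templatize` and `arrival_rates` are hypothetical names.

```python
import re
from collections import Counter


def templatize(sql):
    """Replace literal constants with '?' so similar queries share one template."""
    text = re.sub(r"'(?:[^']|'')*'", "?", sql)       # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals
    text = re.sub(r"\s+", " ", text).strip()         # normalize whitespace
    return text.upper()                              # normalize keyword case


def arrival_rates(query_log, interval_seconds=60):
    """Count queries per (template, time bucket) from (timestamp, sql) records."""
    counts = Counter()
    for timestamp, sql in query_log:
        bucket = int(timestamp // interval_seconds)
        counts[(templatize(sql), bucket)] += 1
    return counts


log = [
    (0, "SELECT * FROM bus WHERE id = 1"),
    (30, "select * from bus  where id = 2"),
    (70, "SELECT * FROM bus WHERE id = 3"),
]
rates = arrival_rates(log)
print(rates)
```

The per-template counts per bucket are the arrival-rate history that the clustering and forecasting steps consume.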

Page 51: Workload-Aware Performance Tuning for Autonomous DBMSs

Prediction Result for Cyclic Patterns

Source: [MAH 2018]

• The 1-hour horizon prediction is more accurate than the 1-week horizon

• Both predicted patterns of the arrival rates can mimic the actual patterns.

Page 52: Workload-Aware Performance Tuning for Autonomous DBMSs

Prediction Result for Spike Patterns

Source: [MAH 2018]

• ENSEMBLE and its two base models (LR and RNN) are unable to predict the spikes.

• KR is the only model that can predict the spikes.

➢ HYBRID: combine ENSEMBLE with KR to achieve both good MSE and the ability to predict spikes.
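One plausible way to combine the models is sketched below. Both the equal-weight averaging in ENSEMBLE and the switching rule in HYBRID are assumptions made for illustration, as is the threshold `gamma`; consult [MAH 2018] for the rule actually used.

```python
def ensemble_forecast(lr_prediction, rnn_prediction):
    """ENSEMBLE (assumed form): equal-weight average of the LR and RNN predictions."""
    return (lr_prediction + rnn_prediction) / 2.0


def hybrid_forecast(lr_prediction, rnn_prediction, kr_prediction, gamma=1.5):
    """HYBRID (assumed form): use KR only when it forecasts a spike.

    gamma is a hypothetical threshold: KR's prediction is used when it
    exceeds the ENSEMBLE prediction by more than this factor.
    """
    ensemble = ensemble_forecast(lr_prediction, rnn_prediction)
    if kr_prediction > gamma * ensemble:
        return kr_prediction
    return ensemble


print(hybrid_forecast(100, 120, 130))  # no spike: falls back to ENSEMBLE
print(hybrid_forecast(100, 120, 900))  # KR forecasts a spike: KR wins
```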

Page 53: Workload-Aware Performance Tuning for Autonomous DBMSs

P-Store [TEL 2018]

• Key idea: use predictive modeling to elastically reconfigure the database before workload spikes occur
  – Predictive provisioning (P-Store) vs. reactive provisioning (E-Store)
  – Use a time-series model to predict the future load
  – Use a novel dynamic programming algorithm for scheduling reconfigurations

Equation annotations: the time intervals; the cost over the time intervals

Page 54: Workload-Aware Performance Tuning for Autonomous DBMSs

Time Series Prediction in P-Store

• Sparse Periodic Auto-Regression (SPAR)
  – Captures time-dependent correlations in the data

Model terms: the number of previous periods; the number of recent load measurements; the parameters of the model; the forecasting period (how far into the future we plan to predict); a term that measures the offset of the load in the recent past
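For reference, SPAR is commonly written in the form below. The symbol names are assumptions chosen to match the annotations above and should be checked against [TEL 2018]: $y(t)$ is the load at time $t$, $T$ is the period length, $n$ is the number of previous periods, $m$ is the number of recent load measurements, $a_k$ and $b_j$ are the model parameters, and $\tau$ is the forecasting period.

```latex
y(t+\tau) \;=\; \sum_{k=1}^{n} a_k \, y(t+\tau-kT) \;+\; \sum_{j=1}^{m} b_j \, \Delta y(t-j),
\qquad
\Delta y(t-j) \;=\; y(t-j) \;-\; \frac{1}{n}\sum_{k=1}^{n} y(t-j-kT)
```

The first sum captures the periodic component (the load at the same point in each of the previous $n$ periods), and $\Delta y(t-j)$ measures how far the recent load deviates from the average load at the same offset in those periods.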


Page 56: Workload-Aware Performance Tuning for Autonomous DBMSs

CPU Usage Prediction [BRB 2016]

• Forecast resource usage
  – CPU usage (CPU rate): a measurement in units of CPU cores per time unit
• Methods
  – Time series models: Autoregressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS)
  – Regression methods: Auto-Regression (AR), Lasso
  – Machine learning methods: Multi-Layer Perceptron (MLP), Support Vector Regression (SVR)
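As a small illustration of the time series family, the sketch below implements simple exponential smoothing, the most basic member of the exponential smoothing methods, for a one-step-ahead CPU usage forecast. It is a sketch with a fixed, hypothetical smoothing factor `alpha`; the models evaluated in [BRB 2016] are richer and fit their parameters from data.

```python
def exponential_smoothing_forecast(cpu_usage, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing.

    alpha in (0, 1] controls how quickly older measurements are forgotten;
    alpha = 1 simply repeats the last observation.
    """
    if not cpu_usage:
        raise ValueError("need at least one measurement")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = cpu_usage[0]
    for observation in cpu_usage[1:]:
        level = alpha * observation + (1.0 - alpha) * level
    return level


print(exponential_smoothing_forecast([2.0, 4.0, 8.0], alpha=0.5))
```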

Page 57: Workload-Aware Performance Tuning for Autonomous DBMSs

AR-RNN [LMG 2020]

• Adaptive Recollected Recurrent Neural Network (AR-RNN)
• Predicts future workload with a recollection mechanism
• Encoder-decoder architecture
  ➢ a k-dimension multi-encoder module
  ➢ an attention-mechanism-based decoder module


Page 59: Workload-Aware Performance Tuning for Autonomous DBMSs

Query Performance Prediction (QPP)

• QPP problems
  – How long will the queries run → execution time (the most important metric)
  – How much CPU time will they need
  – How many disk I/Os will they incur
  – How many messages will they send

• QPP can be used for
  – Workload management
    ➢ e.g., admission control → run this query or not?
    ➢ e.g., query scheduling → when should we run this query?
  – System sizing / resource allocation
    ➢ e.g., how many CPUs and disks, and how much buffer pool, are needed to run?
  – Capacity planning
  – Parameter tuning
  – Physical design: e.g., what kind of index is needed?
  – ...

Page 60: Workload-Aware Performance Tuning for Autonomous DBMSs

Challenges of QPP for Single Query

• Large range of execution times
  – from milliseconds to hours
• Skewed data distributions
• Erroneous cardinality estimates
• Complex query plans
• Terabytes (or petabytes) of data

Page 61: Workload-Aware Performance Tuning for Autonomous DBMSs

Challenges of QPP for Concurrent Queries

• Queries have complex correlations (interactions)
  – global buffer sharing conflicts, lock conflicts, resource competition

• Constraints of database configurations

➢ Predicting execution time for concurrent queries is more challenging, but more important in real applications, than predicting it for single queries

Page 62: Workload-Aware Performance Tuning for Autonomous DBMSs

Timeline of Query Performance Prediction (QPP)

Figure: timeline of QPP approaches: 2008 (Machine Learning), 2009 (Machine Learning), 2011 (B2L), 2011 (Experiment-driven), 2013 (Analytic Cost Modeling), 2014 (Uncertainty Measure), 2019 (Deep Learning), 2020 (Graph Embedding). Annotations in the figure include binary tree, KCCA, indicator and linear regression, sampling and statistical models (adopted by IBM DB2), calibrated cost model and queueing network, distribution-based, and tree-structured neural network, together with the workload types each approach targets (static or dynamic, single or concurrent, DSS, OLAP and BI, analytical).

Page 63: Workload-Aware Performance Tuning for Autonomous DBMSs

PQR Tree (Prediction of Query Runtime Tree) [GMD 2008]

• Key insights
  – Insight 1: it is unnecessary to estimate a precise value for execution time; it is sufficient to estimate the query execution time in the form of a time range
  – Insight 2: use the query plan vector and load vector as the input of ML models

Figure annotations:
• Input 1: query plan vector and cost
• Input 2: load vector and execution cost
• Schedules the queries
• Output: execution time of the query under current load conditions
  – fed back to the workload manager
  – used to detect problem queries (such as runaway queries)

Page 64: Workload-Aware Performance Tuning for Autonomous DBMSs

An Example of PQR Tree

Figure: Sample PQR Tree

• Problem definition: predict a time range for each query, that is, a lower and an upper bound of an interval that contains the query execution time
• PQR tree: a binary tree
  → each node has an associated 2-class classifier
  → each node contains examples for training
  → each node has an associated accuracy
  → each node corresponds to a time range
• Drawbacks
  → time ranges are not known a priori
  → have to fix the number of time ranges a priori
• Advantages
  → different classifiers for different time ranges
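Prediction with such a tree can be sketched as a descent from the root, where the 2-class classifier at each internal node selects the child whose time range should contain the query. The sketch is illustrative only: the threshold rules stand in for trained classifiers, `features[0]` is assumed to be the optimizer's estimated plan cost, and all ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PQRNode:
    """One node of a PQR tree: a time range plus an optional 2-class classifier."""
    low: float    # lower bound of the time range, in seconds
    high: float   # upper bound of the time range, in seconds
    goes_left: Optional[Callable[[Sequence[float]], bool]] = None  # None for a leaf
    left: Optional["PQRNode"] = None
    right: Optional["PQRNode"] = None


def predict_time_range(node, features):
    """Descend from the root, letting each node's classifier pick a child."""
    while node.goes_left is not None:
        node = node.left if node.goes_left(features) else node.right
    return (node.low, node.high)


# Hypothetical tree: simple cost thresholds stand in for trained classifiers.
TREE = PQRNode(
    0.0, 3600.0, goes_left=lambda f: f[0] < 1000.0,
    left=PQRNode(0.0, 10.0),
    right=PQRNode(
        10.0, 3600.0, goes_left=lambda f: f[0] < 50000.0,
        left=PQRNode(10.0, 300.0),
        right=PQRNode(300.0, 3600.0),
    ),
)

print(predict_time_range(TREE, [20000.0]))
```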

Page 65: Workload-Aware Performance Tuning for Autonomous DBMSs

Predicting Multiple Performance Metrics [GKD 2009]

• Idea: predict multiple performance metrics of business intelligence queries using a statistical machine learning approach.
  – Uses query statement and query plan information
  – Predicts multiple metrics simultaneously
    • Elapsed time
    • Records used
    • Disk I/Os
    • CPU and memory usage
    • Message bytes
  – Predicts for both short- and long-running queries

Page 66: Workload-Aware Performance Tuning for Autonomous DBMSs

Prediction based on KCCA

• The key consideration when choosing ML models
  – The model can find multivariate correlations among the query properties and query performance metrics

• Kernel Canonical Correlation Analysis (KCCA): uses a kernel function to compute a “ ” between every pair of query vectors and performance vectors
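The kernel step can be illustrated by building a pairwise kernel matrix over a set of feature vectors; KCCA would build one such matrix over the query vectors and one over the performance vectors before looking for correlated projections. The sketch assumes a Gaussian kernel, a common choice, with a hypothetical `scale` parameter, and it does not perform the correlation analysis itself.

```python
import math


def gaussian_kernel(x, y, scale=1.0):
    """Gaussian similarity: 1.0 for identical vectors, near 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / scale)


def kernel_matrix(vectors, scale=1.0):
    """Pairwise kernel values for a list of feature vectors (symmetric N x N)."""
    return [[gaussian_kernel(u, v, scale) for v in vectors] for u in vectors]


# Hypothetical query feature vectors.
K = kernel_matrix([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]], scale=25.0)
print(K[0][1], K[0][2])
```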

Page 67: Workload-Aware Performance Tuning for Autonomous DBMSs

An Example of KCCA-based Prediction

Page 68: Workload-Aware Performance Tuning for Autonomous DBMSs

Evaluation Results

• Experiment 1: training with 1027 queries
• Experiment 2: training with 90 queries
• Experiment 3: two-step prediction with multiple models
• Experiment 4: training and testing with different tables

Figures: Experiment 1, Experiment 2, Experiment 3, Experiment 4

➢ More data in the training set is always better

➢ Fewer outliers can improve the performance

Page 69: Workload-Aware Performance Tuning for Autonomous DBMSs

B2L: Predicting Query Latency using BAL [DCP 2011]

• Problem definition: given a collection of queries that are concurrently executing on the same machine at arbitrary stages of their execution, predict when each query will finish its execution.

• Key idea: use query indicators to quantify the performance impact of concurrently executing queries
• BAL: Buffer Access Latency → a robust and good indicator of query performance, even in the presence of concurrency
  – Averaging BAL over the duration of a query captures the interactions of disk seeks, sequential reads, OS cache hits, and buffer pool hits
  – BAL is a good indicator of latency because it is averaged over many samples over the lifetime of a query

Page 70: Workload-Aware Performance Tuning for Autonomous DBMSs

System Model

Training Phase Prediction Phase

p Example: predicting the BAL of query which is being run with queries and

Ø training takes about two days
Ø the cost for prediction is negligible

Page 71: Workload-Aware Performance Tuning for Autonomous DBMSs

Prediction Results

B2cB Predictions Fit of Latency to BAL

• each line is an individual B2L model
• an average error of just 5%
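The "Fit of Latency to BAL" panel suggests that each B2L model is a line relating a query's average BAL to its end-to-end latency. A minimal sketch of such a fit, assuming ordinary least squares; the function names `fit_b2l` and `predict_latency` and all sample values are illustrative and not taken from [DCP 2011]:

```python
def fit_b2l(bal_samples, latency_samples):
    """Fit latency = slope * BAL + intercept by ordinary least squares."""
    n = len(bal_samples)
    mean_x = sum(bal_samples) / n
    mean_y = sum(latency_samples) / n
    sxx = sum((x - mean_x) ** 2 for x in bal_samples)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(bal_samples, latency_samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_latency(model, bal):
    """Apply a fitted (slope, intercept) pair to a new BAL value."""
    slope, intercept = model
    return slope * bal + intercept


# Hypothetical training points for one query template:
# average BAL observed in a run, and the latency of that run.
bal = [1.0, 2.0, 3.0, 4.0]
latency = [12.0, 22.0, 32.0, 42.0]
model = fit_b2l(bal, latency)
```

One such model would be fitted per query template, so that a predicted BAL under concurrency can be turned into a latency estimate.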

Page 72: Workload-Aware Performance Tuning for Autonomous DBMSs

An Interaction-Aware Predictor [ADA 2011]

p Key Ideas
p Idea #1 (Query interaction modeling): take query interactions into account when estimating workload completion times of concurrent queries
p Idea #2 (Experiment-driven): build performance models using an experiment-driven technique, by sampling the space of possible query mixes and fitting statistical models to the observed performance at these samples

p Problem definition: Given the query batch, scheduling policy, and multi-programming level, can we predict (ahead of time) how long the database system will take to process the entire batch of queries?

• First-in-First-out
• Shortest-Job-First

Page 73: Workload-Aware Performance Tuning for Autonomous DBMSs

System Architecture

p Sampling and Modeling (Off-line Phase)
p conducts experiments to sample the space of possible interactions
p each experiment runs a chosen query mix
p Simulating Workload Execution (On-line Phase)
p workload simulator uses a recurrence relation in conjunction with interaction-aware performance models to simulate the execution of the workload as a sequence of query mixes

Page 74: Workload-Aware Performance Tuning for Autonomous DBMSs

Calibrated Cost Model for QPP [WCZ 2013]

p Idea: use the optimizer's existing cost model to predict query execution time
p However, we cannot directly use the current cost model of the optimizers

p Naïve Scaling: Predict the execution time by scaling the cost estimate ,

p Why naive scaling fails: consider the cost model of PostgreSQL

Ø Both the cost units (the c's) and the operation counts (the n's) could be incorrect!

Image Source: [ACR 2012]

Page 75: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries: Cost Model in DBMSs

p Given a query, a cost-based optimizer uses a plan enumeration algorithm to generate candidate plans and a cost model to obtain the cost of each plan, and finally selects the plan with the lowest cost.

p Cost: units of work or resource used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work.

Image Source: [VAA 2015]

Page 76: Workload-Aware Performance Tuning for Autonomous DBMSs

Cost Model in PostgreSQL
p PostgreSQL uses a combination of I/O and CPU costs that are weighted by constant factors.
→ the I/O cost includes: seq_page_cost ($c_s$), random_page_cost ($c_r$)
→ the CPU cost includes: cpu_tuple_cost ($c_t$), cpu_index_tuple_cost ($c_i$), cpu_operator_cost ($c_o$)

p The cost of a PostgreSQL plan can be calculated as a linear combination:

$C = n_s c_s + n_r c_r + n_t c_t + n_i c_i + n_o c_o$

The accuracy of $C$ depends on both the accuracy of the $c$'s and the $n$'s

Ø The accuracy of the $n$'s is decided by cardinality estimation approaches
Ø The cost model CANNOT be directly used for predicting execution time
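To make the linear combination concrete, the sketch below multiplies operation counts by PostgreSQL's default cost constants and then applies naïve scaling. The helper names `plan_cost` and `naive_scaling`, the operation counts, and the scaling factor are assumptions for illustration; only the five default constants come from the PostgreSQL documentation.

```python
# Default planner cost constants, as documented for PostgreSQL.
DEFAULT_COST_UNITS = {
    "seq_page_cost": 1.0,
    "random_page_cost": 4.0,
    "cpu_tuple_cost": 0.01,
    "cpu_index_tuple_cost": 0.005,
    "cpu_operator_cost": 0.0025,
}


def plan_cost(counts, units=DEFAULT_COST_UNITS):
    """Linear combination: sum over cost units of (count * unit cost)."""
    return sum(counts.get(name, 0) * unit for name, unit in units.items())


def naive_scaling(cost, seconds_per_cost_unit):
    """Naive scaling: predicted time is the cost times one global factor."""
    return cost * seconds_per_cost_unit


# Hypothetical sequential scan: 1,000 pages read sequentially,
# 100,000 tuples processed, one operator evaluation per tuple.
counts = {
    "seq_page_cost": 1000,
    "cpu_tuple_cost": 100_000,
    "cpu_operator_cost": 100_000,
}
cost = plan_cost(counts)
predicted_seconds = naive_scaling(cost, seconds_per_cost_unit=0.001)
```

If either the counts or the unit costs are off, the error passes straight through the single scaling factor, which is why naïve scaling is fragile.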

Page 77: Workload-Aware Performance Tuning for Autonomous DBMSs

Constant factors in the cost model of PostgreSQL

p Used for disk-oriented systems (not in-memory systems)
p The default values are meant to reflect the relative difference between random access, sequential access and CPU costs.
Ø e.g., processing a tuple (cpu_tuple_cost) is 400x cheaper than reading it from a page (random_page_cost).

p They are best treated as averages over the entire mix of queries.
p Changing them on the basis of just a few experiments is very risky.

Cost Unit                Value
seq_page_cost            1.0
random_page_cost         4.0
cpu_tuple_cost           0.01
cpu_index_tuple_cost     0.005
cpu_operator_cost        0.0025

Source: https://www.postgresql.org/docs/10/runtime-config-query.html

Page 78: Workload-Aware Performance Tuning for Autonomous DBMSs

PostgreSQL's Assumptions for Cardinality Estimation

p Assumption #1: Uniformity → All values, except for the most-frequent ones, are assumed to have the same number of tuples
p Assumption #2: Independence → The predicates on attributes (in the same table or from joined tables) are independent
p Assumption #3: Principle of Inclusion → The domains of the join keys overlap such that the keys from the smaller domain have matches in the larger domain

Ø Unfortunately, those assumptions are frequently wrong in real-world data sets.
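A small worked example of how the independence assumption goes wrong on correlated columns. The table contents and the helper `selectivity` are made up for illustration: the car model fully determines the make, so multiplying the two per-predicate selectivities underestimates the combined selectivity.

```python
# Hypothetical table with two perfectly correlated columns (make, model).
rows = [("Toyota", "Corolla")] * 50 + [("Honda", "Civic")] * 50


def selectivity(table, predicate):
    """Fraction of rows in the table that satisfy the predicate."""
    return sum(1 for r in table if predicate(r)) / len(table)


sel_make = selectivity(rows, lambda r: r[0] == "Honda")
sel_model = selectivity(rows, lambda r: r[1] == "Civic")

# Independence assumption: multiply the individual selectivities.
estimated = sel_make * sel_model

# Ground truth: evaluate both predicates together.
actual = selectivity(rows, lambda r: r[0] == "Honda" and r[1] == "Civic")
```

Here the estimate is half of the true selectivity; with more correlated predicates or joins, such errors compound multiplicatively.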

Page 79: Workload-Aware Performance Tuning for Autonomous DBMSs

An Evaluation on Cardinality and Cost Estimation

p An experimental and analysis paper in PVLDB'15, “How Good Are Query Optimizers, Really?”

p Evaluate the correctness of cardinality estimates generated by DBMS optimizers as the number of joins increases.
→ Join Order Benchmark (JOB)
→ Five DBMSs using 100k queries

p Some key findings:
→ All cardinality estimators routinely produce large errors
→ Cost model has much less influence on query performance than the cardinality estimates
→ Better to improve cardinality estimation instead of cost estimation

Page 80: Workload-Aware Performance Tuning for Autonomous DBMSs

An Evaluation on Cardinality and Cost Estimation (cont.)

Source1: [VAA 2015] Source2: CMU 15-721, https://15721.courses.cs.cmu.edu/spring2020/slides/22-costmodels.pdf

Page 81: Workload-Aware Performance Tuning for Autonomous DBMSs

An Evaluation on Cardinality and Cost Estimation (cont.)

Source: [VAA 2015]

p PostgreSQL picks the join algorithm on a purely cost-based basis

p Being “purely cost-based” can lead to very bad query plans

p We should take into account
n the inherent uncertainty of cardinality estimates
n the asymptotic complexities of different algorithms, e.g.,
Ø Nested-loop join:
Ø Hash join:

[Figure: slowdown compared to using true cardinalities; the value 60.6% is highlighted]

Page 82: Workload-Aware Performance Tuning for Autonomous DBMSs

An Evaluation on Cardinality and Cost Estimation (cont.)

Source: [VAA 2015]

p Cost and Runtime
n Predicted cost vs. actual runtime for different cost models
n Blue straight line: fitting with the linear regression model
n Poor cardinality estimates lead to a large number of outliers and a very wide standard error area

Page 83: Workload-Aware Performance Tuning for Autonomous DBMSs

Calibrate the Cost Model [WCZ 2013]
p To overcome naïve scaling:
→ Method 1: use machine learning [ACR 2012]
→ Method 2: calibrate the c's and the n's in the cost model [WCZ 2013]

p Cost models become much more effective after calibration

Prediction by Naïve Scaling Prediction by Calibration

Page 84: Workload-Aware Performance Tuning for Autonomous DBMSs

The Calibration Framework
p How can we calibrate the c’s and the n’s?
→ Calibrate the c’s: use profiling queries.
→ Calibrate the n’s: refine cardinality estimates

Ø Calibrate c’s by running a family of profiling queries on the system

Ø Calibrate n’s by re-estimating cardinalities using a sampling-based approach
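One way to picture calibrating the c's with profiling queries: design each profiling query so that it introduces exactly one cost unit not yet calibrated, time it, and solve for that unit. The sketch below assumes this sequential-isolation scheme; the unit names (`c_t`, `c_o`, `c_s`), the counts, the timings, and the function `calibrate_units` are illustrative and not the exact procedure of [WCZ 2013].

```python
def calibrate_units(profiles):
    """Solve for cost units one at a time.

    profiles: ordered list of (counts, measured_seconds). Each profiling
    query must involve exactly one unit not calibrated by earlier queries.
    """
    calibrated = {}
    for counts, seconds in profiles:
        unknown = [u for u in counts if u not in calibrated]
        if len(unknown) != 1:
            raise ValueError("each profiling query must add exactly one new unit")
        # Subtract the time already explained by calibrated units.
        known_time = sum(counts[u] * calibrated[u]
                         for u in counts if u in calibrated)
        new_unit = unknown[0]
        calibrated[new_unit] = (seconds - known_time) / counts[new_unit]
    return calibrated


# Hypothetical measurements on the target machine.
profiles = [
    ({"c_t": 1000}, 0.002),               # in-memory scan: tuple processing only
    ({"c_t": 1000, "c_o": 1000}, 0.003),  # adds one operator evaluation per tuple
    ({"c_t": 1000, "c_s": 100}, 0.052),   # adds sequential page reads
]
units = calibrate_units(profiles)
```

The calibrated units are now in seconds per operation, so the cost formula yields a time estimate directly rather than an abstract cost.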

Page 85: Workload-Aware Performance Tuning for Autonomous DBMSs

Evaluation

Uniform Data

Skewed Data

Legend:
• c's (calibrated) + n's (true cardinalities)
• c's (calibrated) + n's (cardinalities by optimizer, without refinement)
• c's (calibrated) + n's (cardinalities by sampling, with refinement)
• naive scaling

Compare with
• Naive scaling
• SVM
• REP trees

not consistent

Page 86: Workload-Aware Performance Tuning for Autonomous DBMSs

Analytic Modeling [WCH 2013]
p The previous calibrated cost model only considers standalone queries
p However, real-world database workloads are concurrent and dynamic
p Problem Definition: at time t, predict the (remaining) execution time for each query in the mix.

Page 87: Workload-Aware Performance Tuning for Autonomous DBMSs

Main Ideas
p Recall PostgreSQL’s cost model:
→ The n's won’t change, even if the query is running together with other queries.
→ Only the c's will change!
p The c's change at boundaries of phases during execution.

contention between q1 and q2

I/O cost changes

buffer pool sharing

Time   q1          q2
t1     -           -
t2     increase    -
t3     decrease    decrease
t4     -           decrease

Page 88: Workload-Aware Performance Tuning for Autonomous DBMSs

Query Decomposition

p Pipelines division based on → blocking and nonblocking operators → execution order

Ø Decomposition result: a sequence of pipelines:

Page 89: Workload-Aware Performance Tuning for Autonomous DBMSs

Progressive Predictor
p The execution of a query mix can be thought of as
→ multiple stages of mixes of pipelines.

Ø 8 mixes of pipelines during the execution of the 3 queries
Ø Next, we need a prediction model for a mix of pipelines!

Page 90: Workload-Aware Performance Tuning for Autonomous DBMSs

Predictive Models
p The key challenge is to compute the c's in the cost model when the pipelines are concurrently running.
p Two choices:
n Machine learning-based
n Analytic model-based: model the system with a queueing network.

1. Two service centers: Disk, CPU.
2. Pipelines are customers.
3. The c's are the residence times per visit of a customer.

A queueing network
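Residence times in a closed queueing network are commonly computed with Mean Value Analysis (MVA). The sketch below is the textbook exact MVA recursion for a single customer class, used here as a simplified stand-in: [WCH 2013] treats different pipelines as different customers, and the service demands below are invented for illustration.

```python
def mean_value_analysis(service_demands, num_customers):
    """Exact MVA for a closed, single-class network of queueing centers.

    service_demands: service time a customer needs at each center per visit.
    Returns (residence_times, throughput) with num_customers in the system.
    """
    queue_len = [0.0] * len(service_demands)
    residence = list(service_demands)
    throughput = 0.0
    for n in range(1, num_customers + 1):
        # An arriving customer waits behind the queue seen with n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue_len)]
        throughput = n / sum(residence)
        # Little's law gives the mean queue length at each center.
        queue_len = [throughput * r for r in residence]
    return residence, throughput


# Hypothetical demands in seconds per visit: [disk, CPU].
demands = [0.04, 0.01]
alone, _ = mean_value_analysis(demands, 1)
shared, _ = mean_value_analysis(demands, 2)
```

With one customer the residence times equal the raw service demands; with two, the disk residence time grows the most because the disk is the bottleneck center.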

Page 91: Workload-Aware Performance Tuning for Autonomous DBMSs

Evaluation

TPC-H2 (with more expensive templates)

Competitors:
• REP trees
• Baseline: Calibrated Cost Model for QPP [WCZ 2013]
ü predict execution time as if it were the only query running

On MB1 (mixes of heavy index scans)

Page 92: Workload-Aware Performance Tuning for Autonomous DBMSs

Uncertainty Measure for QPP [WWH 2014]

p Key Ideas
p Estimations are more useful with confidence intervals
p From single point estimation to distribution-based estimation
p View the c's and n's in the cost model as random variables rather than constants

Page 93: Workload-Aware Performance Tuning for Autonomous DBMSs

QPPNet: A Deep Learning Approach [MP 2019] p Key Ideas

p Problem: Existing approaches often fail to capture the complex interactions between query operators and input relations

p Solution: use a novel plan-structured neural network which can match the structure of any optimizer-selected query execution plan

Page 94: Workload-Aware Performance Tuning for Autonomous DBMSs

Plan-Structured DNNs p Structures

p consists of operator-level neural networks (neural units)
p the query plan is modeled as a tree of neural units

p Operator-level neural units
p a neural unit is a neural network
p each unit outputs both a latency and a data vector

Neural unit corresponding to a scan operator Neural unit corresponding to a join operator

Page 95: Workload-Aware Performance Tuning for Autonomous DBMSs

Plan-Structured DNNs (cont.)

A neural network for a simple join query

p Trees of neural units
→ stack the neural units to generate a tree structure
→ each operator is replaced with its corresponding neural unit

• leaf neural unit: leaf nodes of the query plan tree

• internal neural unit: internal operator in the query plan
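The tree-of-units idea can be sketched as a recursive evaluation: each unit receives its operator's features plus the data vectors of its children and emits a latency and a data vector for its parent. In the sketch below the "units" are toy linear functions standing in for trained networks; `make_unit`, `UNITS`, `evaluate`, the weights, and the feature values are all hypothetical and not the QPPNet implementation.

```python
def make_unit(weight, bias):
    """Toy stand-in for a trained operator-level neural unit."""
    def unit(features, child_vectors):
        # Concatenate the operator's own features with the children's vectors.
        inputs = list(features)
        for vec in child_vectors:
            inputs.extend(vec)
        signal = weight * sum(inputs) + bias
        latency = max(signal, 0.0)            # keep the latency non-negative
        data_vector = [signal, signal / 2.0]  # passed up to the parent unit
        return latency, data_vector
    return unit


# One unit per operator type; the same unit is reused wherever that
# operator appears in a plan.
UNITS = {"scan": make_unit(0.5, 1.0), "join": make_unit(0.1, 2.0)}


def evaluate(plan):
    """plan: dict with 'op', 'features', 'children'. Returns (latency, vector)."""
    child_outputs = [evaluate(child) for child in plan["children"]]
    child_vectors = [vec for _, vec in child_outputs]
    return UNITS[plan["op"]](plan["features"], child_vectors)


# A simple join of two scans.
plan = {
    "op": "join", "features": [1.0],
    "children": [
        {"op": "scan", "features": [2.0], "children": []},
        {"op": "scan", "features": [4.0], "children": []},
    ],
}
root_latency, _ = evaluate(plan)
```

Because the network is assembled per plan, any plan shape the optimizer produces can be evaluated with the same set of units.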

Page 96: Workload-Aware Performance Tuning for Autonomous DBMSs

Performance Evaluation
p SVM: Support Vector Machine [ACR 2012]
p RBF: Resource-based features
p TAM: Tuned analytic model [WCZ 2013] → Calibrated Cost Model
p DNN: Non-tree structured DNN

Page 97: Workload-Aware Performance Tuning for Autonomous DBMSs

Performance Evaluation (cont.)

Error distributions for TPC-DS

Ø All of the tested techniques gave unbiased estimations

Ø QPPNet achieves a tighter (lower variance) error distribution

Multi-modal error distribution
• a common occurrence in DNNs using L2 loss functions that are not well-structured to their problem domain

Page 98: Workload-Aware Performance Tuning for Autonomous DBMSs

Performance Evaluation (cont.)

Warm cache vs. Cold cache        Concurrent queries

p BAL: Buffer Access Latency [DCP 2011]
• Warm cache has larger error
• Cache information can be utilized to improve the performance

Page 99: Workload-Aware Performance Tuning for Autonomous DBMSs

GPredictor [ZSL 2020]

p Problem Definition: Given a query workload with multiple concurrent queries, predict the execution time of each query

p Idea: Use a graph embedding network to encode the features and a prediction network to predict query performance
p each vertex in the graph model is a node in the query plan
p each edge between two vertices denotes the correlations between them

Page 100: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Graph Embedding
p Embedding
p Machine learning is all about vectors
p Representing (encoding) complex objects (text, image, speech, graph) as a vector with a reduced number of features.
p Examples: query2vec, word2vec, node2vec, graph2vec

p Graph embedding
p converts the graph data into a lower dimensional space (vectorized feature spaces)
p preserves the graph structural information and graph properties
p graph algorithms can be computed more efficiently in the low dimensional space

p Related Research Problems
p graph analytics
p representation learning

Page 101: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Graph Embedding (cont.) p Definition

Given a graph $G = (V, E)$, a graph embedding is a mapping $f: v_i \to y_i \in \mathbb{R}^d$ such that $d \ll |V|$ and the function $f$ preserves some proximity measure defined on graph $G$.

p Graph embedding techniques
p Matrix Factorization
p Random Walk
p Deep Learning: e.g., autoencoder, Graph Neural Network
p Edge Reconstruction Based Optimization
p Graph Kernel
p Generative Model

Page 102: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Graph Embedding (cont.)

Image Source: [HYL 2017]

A graph embedding example based on the encoder-decoder approach

Encoder: maps the node to a low-dimensional vector embedding,

Decoder: extracts user-specified information from the low-dimensional embedding

Page 103: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Graph Embedding (cont.) p Graph embedding in Neo4j (a graph database)

p The Neo4j Graph Data Science Library supports several graph embedding algorithms

Source: https://neo4j.com/developer/graph-data-science/graph-embeddings/

Page 104: Workload-Aware Performance Tuning for Autonomous DBMSs

Architecture of GPredictor

#1 Workload2Graph
• Extract workload features
• Collect statistics

#2 Graph Optimizer
• Update the graph
• Compact the graph

#3 Performance Predictor
• Embed the graph
• Predict query performance

Page 105: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Modeling 1/2
p Vertex modeling: obtain the query plan, and extract operators from plans as vertex features

Optimizer

Operator type

Query predicate

Sample bitmap

Estimated cost

Feature Stack

Page 106: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Modeling 2/2
p Edge modeling: compute 4 types of correlations as edges
p data dependencies between operators, data sharing, data conflicts, resource competition

Parent-Child Data Sharing Data Conflict Resource Competition

Table sharing: and use the same table aka_name

Intermediate Results Sharing: and have the same subquery

Data conflict: may wait for the lock until vertex finishes.

Table sharing: and use the same table aka_name

Page 107: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Construction Vertex Matrix

Edge Matrix

Page 108: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Embedding for QPP
p Step 1: embed graph features with learned weights

• In each graph embedding layer:

is the neighborhood vertices of every vertex is the activation function

$H^{l+1} = \sigma\left(D^{-1/2} E' D^{-1/2} W^{l} H^{l}\right)$

H3embedded matrix

Vertex Matrix

Edge Matrix

Page 109: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Embedding for QPP
p Step 2: predict the performance based on the embeddings

H3

p A three-layer perceptron
p input layer maps into preferable feature space
p hidden layer conducts data abstraction on and outputs an abstracted matrix
p output layer makes predictions on and outputs the performance matrix
Ø execution time
Ø startup time
Ø memory requirement
Ø CPU utilization

embedded matrix

Page 110: Workload-Aware Performance Tuning for Autonomous DBMSs

Graph Update and Compaction

p Handle workload changes
p Graph Update
p adding a new query
p removing some finished operators
p Graph Compaction
p compact several vertices into a compound vertex if
ü the execution time of any two vertices has overlap
ü any two vertices have no parent-child/data-sharing/data-conflict relationships
• e.g., have time overlap and do not have relationships, so compact them into a compound vertex
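The two compaction conditions can be read as a pairwise check. A minimal sketch, assuming each vertex carries an execution interval and that relationships are stored as unordered vertex pairs; the names `overlaps` and `can_compact`, the vertex ids, and the intervals are illustrative and not from [ZSL 2020].

```python
def overlaps(a, b):
    """True if two (start, end) execution intervals overlap."""
    return a[0] < b[1] and b[0] < a[1]


def can_compact(v1, v2, intervals, related_pairs):
    """Two vertices may share a compound vertex if their execution times
    overlap and they have no parent-child, data-sharing, or data-conflict
    relationship."""
    no_relationship = frozenset((v1, v2)) not in related_pairs
    return overlaps(intervals[v1], intervals[v2]) and no_relationship


# Hypothetical operators with (start, end) times in seconds.
intervals = {"v1": (0, 5), "v2": (3, 8), "v3": (6, 9)}
# v1 and v2 are related (for example, parent and child).
related_pairs = {frozenset(("v1", "v2"))}

compactable = [
    (a, b)
    for a, b in [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
    if can_compact(a, b, intervals, related_pairs)
]
```

Only `v2` and `v3` qualify here: `v1` and `v2` overlap but are related, and `v1` and `v3` do not overlap in time.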

Page 111: Workload-Aware Performance Tuning for Autonomous DBMSs

Performance Comparison

Source: [ZSL 2020]

p BAL: Buffer Access Latency [DCP 2011]
p DL: QPPNet [MP 2019]
p TLSTMCost: Tree LSTM-based cost model [SL 2019]
p Graph (U): Graph Embedding [ZSL 2020]

Page 112: Workload-Aware Performance Tuning for Autonomous DBMSs

Takeaways for Execution Time Prediction
p Real-world workloads are both dynamic and concurrent
n Static workloads → Dynamic workloads
n Standalone workloads → Concurrent workloads

p For Traditional Machine Learning
n The features used are important: indicator or query plan feature (raw feature)

p For Deep Learning
n General DNNs and decision trees cannot represent the links between nodes
n Tree-structured DNNs are better than general DNNs and decision trees
n Graph embedding may be the state-of-the-art technique for QPP

Page 113: Workload-Aware Performance Tuning for Autonomous DBMSs

A Workload Forecasting (Prediction)

Workload Forecasting (Prediction)

Workload Shift Arrival Rate Execution Time The Next Query/ Operator

• Machine learning• Time Series analysis

• QueryBot 5000 [MAH 2018]• P-Store [TEL 2018]

• Stochastic process modeling

• Time Series analysis

• Machine learning• Experiment-driven • Analytic modeling• Graph Embedding

• Machine learning• Reinforcement

learning• Stochastic process

modeling

• n-Gram [HR 2008]• Workload Periodicity

Analysis [HHR 2010]

• PQR Tree [GMD 2008]
• KCCA [GKD 2009]
• B2L [DCP 2011]
• Experiment-based [ADA 2011]
• Analytical-based [WCH 2013]
• QPPNet [MP 2019]
• GPredictor [ZSL 2020]

• PROMISE [CS 2000]• Markov models [DYW

2009]• RNN [YH 2020]• RNN [MCS 2021]• Exact Q-Learning [MCS

2021]

Resource Usage

• Machine learning• Time Series analysis

• CPU Usage Prediction [BRB 2016]

• AR-RNN [LMG 2020]• Capacity planning [HHR

2020]

Page 114: Workload-Aware Performance Tuning for Autonomous DBMSs

Predict the Next Query (Operator)

p A Summary of Methods
p Markov Modeling [CS 2000, DYW 2009]
p Machine Learning
Ø RNN [YH 2020, MCS 2021]
p Reinforcement Learning
Ø Q-Learning [MCS 2021]

Page 115: Workload-Aware Performance Tuning for Autonomous DBMSs

PROMISE [CS 2000]

p PROMISE: Predicting User Behavior in Multidimensional Information System Environments

p Target: Estimate the next query that an application will execute using Markov Models based on the analysis of user behavior

p Query → State
p A query corresponds to a state in the Markov model
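Treating each query as a state, a first-order Markov model can be estimated by counting observed transitions and predicting the most frequent successor. This is a generic sketch of that idea, not the PROMISE implementation; `fit_transitions`, `predict_next`, and the query trace are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(trace):
    """Count how often each query (state) is followed by each other query."""
    counts = defaultdict(Counter)
    for current, nxt in zip(trace, trace[1:]):
        counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequently observed successor, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical sequence of query templates issued in one user session.
trace = ["Q1", "Q2", "Q3", "Q1", "Q2", "Q4", "Q1", "Q2", "Q3"]
model = fit_transitions(trace)
next_after_q2 = predict_next(model, "Q2")
```

Dividing each count by its row total would give the transition probabilities; the most frequent successor is simply the arg-max of that row.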

Page 116: Workload-Aware Performance Tuning for Autonomous DBMSs

Markov models + Petri-net [DYW 2009]

p Target: Estimate the next transaction a user will execute
p Key Ideas
p use a Markov process to model the relationship between transactions where no parallelism exists
p use a Petri net to model the synchronization and parallel split workflow patterns

p Transaction → State
p each transaction is an individual state in a first-order Markov model

Page 117: Workload-Aware Performance Tuning for Autonomous DBMSs

Auto-Suggest [YH 2020]
p Business analysts and data scientists spend up to 80% of their time on data preparation
p Key Ideas
p learn-to-recommend data preparation steps best suited to given user data
p predict the next operator (e.g., join operator) that a user will execute

Page 118: Workload-Aware Performance Tuning for Autonomous DBMSs

ML And RL-based [MCS 2021]

p Problem definition: Given an SQL query $q^u_i$ issued at timestep $i$, predict the fragments in the next query $q^u_{i+1}$ at timestep $i+1$

p Schema-aware SQL fragment embedding mechanism
n one-hot encoding

p Predictors
n RNN
n Exact Q-Learning

Page 119: Workload-Aware Performance Tuning for Autonomous DBMSs

RNN - Temporal Predictors

p Training Phase
n pairs of embeddings as input
n batch normalization layer
n drop-out regularization

p Test Phase
n pick the top-K next query candidates from the historical pool

Page 120: Workload-Aware Performance Tuning for Autonomous DBMSs

Exact Q-Learning

p Agent: predictor model
p State/action space: set of distinct embeddings of the training queries
p Environment
p Reward

Page 121: Workload-Aware Performance Tuning for Autonomous DBMSs

A Comparison of Forecasting/Prediction Methods

Category: Experiment-driven
Strengths:
• Find good settings based on real system test runs
• Work across different system versions and hardware
Weaknesses:
• Very time consuming due to multiple actual runs
• Not cost effective for ad-hoc queries

Category: Analytic Modeling
Strengths:
• Very efficient for predicting performance
• Good accuracy in many (basic) scenarios
Weaknesses:
• Hard to capture complexity of system internals & pluggable components
• Models often based on simplified assumptions

Category: Machine Learning
Strengths:
• Ability to capture complex system and workload dynamics
• Independence from system internals and hardware
• Learning based on real observations of system performance
Weaknesses:
• Require large training sets which are expensive to collect
• Training from history logs leads to under-fitting
• Typically low accuracy for unseen queries
• Hard to choose the proper model

Page 122: Workload-Aware Performance Tuning for Autonomous DBMSs

• [ACR 2012] "Learning-based query performance modeling and prediction." In ICDE. IEEE, 2012.
• [ADA 2011] "Predicting completion times of batch query workloads using interaction-aware models and simulation." In EDBT, 2011.
• [BRB 2016] "A forecasting methodology for workload forecasting in cloud systems." IEEE Transactions on Cloud Computing, 2016.
• [CS 2000] "PROMISE: Predicting Query Behavior to Enable Predictive Caching Strategies for OLAP Systems." In DaWaK. Springer, 2000.
• [DCP 2011] "Performance prediction for concurrent database workloads." In SIGMOD, 2011.
• [DYW 2009] "Towards Workflow-driven Database System Workload Modeling." In DBTest, 2009.
• [GKD 2009] "Predicting multiple metrics for queries: Better decisions enabled by machine learning." In ICDE. IEEE, 2009.
• [GMD 2008] "PQR: Predicting query execution times for autonomous workload management." In ICAC. IEEE, 2008.
• [HHR 2010] "Towards Workload-aware Self-management: Predicting Significant Workload Shifts." In ICDE Workshops. IEEE, 2010.
• [HHR 2020] "Database Workload Capacity Planning using Time Series Analysis and Machine Learning." In SIGMOD, 2020.

References 1/3

Page 123: Workload-Aware Performance Tuning for Autonomous DBMSs

• [HR 2008] “Autonomic Databases: Detection of Workload Shifts with n-Gram-Models,” in ADBIS. Springer, 2008.

• [HYL 2017] "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584, 2017.

• [LMG 2020] "Adaptive Recollected RNN for Workload Forecasting in Database-as-a-Service." In ICSOC. Springer, 2020.

• [MAH 2018] "Query-based Workload Forecasting for Self-Driving Database Management Systems." In SIGMOD, 2018.

• [MCS 2021] "Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future." ACM Transactions on Database Systems, 2021.

• [MP 2019] "Plan-Structured Deep Neural Network Models for Query Performance Prediction." In PVLDB, 2019.

• [PJZ 2011] “On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems,” In PVLDB, 2012.

• [SAT 2020] "Adaptive learning of aggregate analytics under dynamic workloads." Future Generation Computer Systems, 2020.

• [SL 2019] "An End-to-End Learning-based Cost Estimator." In PVLDB, 2019.

References 2/3

Page 124: Workload-Aware Performance Tuning for Autonomous DBMSs

• [TEL 2018] "P-store: An elastic database system with predictive provisioning." In SIGMOD, 2018.
• [VAA 2015] "How good are query optimizers, really?" In PVLDB, 2015.
• [WCH 2013] "Towards predicting query execution time for concurrent and dynamic database workloads." In PVLDB, 2013.
• [WCZ 2013] "Predicting query execution time: Are optimizer cost models really unusable?" In ICDE. IEEE, 2013.
• [WWH 2014] "Uncertainty aware query execution time prediction." In PVLDB, 2014.
• [YH 2020] "Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks." In SIGMOD, 2020.
• [ZSL 2020] "Query performance prediction for concurrent queries using graph embedding." In PVLDB, 2020.

References 3/3

Page 125: Workload-Aware Performance Tuning for Autonomous DBMSs

Outline
Part A:
● Motivation and Background
● Workload Classification
● Workload Forecasting (Prediction)

Part B:
● Workload-Based Tuning
● Amazon Redshift
● Open Challenges and Discussion

Page 126: Workload-Aware Performance Tuning for Autonomous DBMSs

p Problem definition:
n Given an estimated workload, find the best configuration to maximize the system performance.
n Given specified goals for performance measures and workload properties, find the lowest-cost configuration that satisfies the performance goals.

p Two critical problems:
n How to extract and utilize the workload information
n How to find the optimal tuning actions (e.g., how much hardware resource is needed for future workloads)

Tuning Problem Definition

Page 127: Workload-Aware Performance Tuning for Autonomous DBMSs

p Existing Methods
n Rule-based: conversion rules modelling, hierarchy of rules, etc.
n Cost modeling
n Machine learning: DNN, Pairwise DNN, etc.
n Reinforcement learning: Deep Q-learning, Deep Deterministic Policy Gradient, etc.

p Tuning Tasks
n Database design
Ø Physical design: indexes, materialized views, partitioning, and storage
Ø Logical design: schema design
n Resource provisioning: CPU, RAM, Disk I/O, buffer pool size, etc.

Workload-based Tuning

Page 128: Workload-Aware Performance Tuning for Autonomous DBMSs

Tuning Method Trend

[Figure: Rule-based, Cost Modeling, Machine Learning, and Reinforcement Learning methods placed along a time axis (2005, 2017, 2019) and compared by training cost and by required expert knowledge on databases]

Page 129: Workload-Aware Performance Tuning for Autonomous DBMSs

p Rule-based
p Cost Modeling
p Machine Learning-based
p Reinforcement Learning-based

Workload-Based Tuning

Page 130: Workload-Aware Performance Tuning for Autonomous DBMSs

p Tuning based on rules derived from DBAs’ expertise, experience, and knowledge, or on rule-of-thumb default recommendations.

Rule-based Tuning Methods

p Representative works
n Rules based on Producer-Consumer Model [ZZS 2020]
n Conversion rules modelling for logical design [LM 2015]
n Hierarchy of rules [DLN 2016]

Page 131: Workload-Aware Performance Tuning for Autonomous DBMSs

p Problem
n automatic optimization of data placement parameters
n focus on the inter-job write-once-read-many scenario

p Idea
n Step 1: automatically predict future workloads’ data access patterns
n Step 2: tune data placement parameters accordingly to optimize the performance

WATSON [ZZS 2020]

Page 132: Workload-Aware Performance Tuning for Autonomous DBMSs

A Workload-Driven Logical Design Approach [LM 2015]

p Use rules that convert each conceptual constructor to an equivalent representation in the NoSQL document logical model

Page 133: Workload-Aware Performance Tuning for Autonomous DBMSs

Automated Scaling [DLN 2016]

p Major Ideas
n Identify a set of statistically-robust signals and propose a decision logic to improve accuracy in estimating resource demands.
n This decision logic comprises a manually constructed hierarchy of rules that use multiple signals to determine the resource demands of the tenant’s workload

Ø Good rules require a deep understanding of database engines

Page 134: Workload-Aware Performance Tuning for Autonomous DBMSs

p Rule-based
p Cost Modeling
p Machine Learning-based
p Reinforcement Learning-based

Workload-based Tuning

Page 135: Workload-Aware Performance Tuning for Autonomous DBMSs

p A cost model establishes a performance model through cost functions based on a deep understanding of system components.

Cost Modeling Tuning Methods

Page 136: Workload-Aware Performance Tuning for Autonomous DBMSs

p STMM: Adaptive Self-Tuning Memory in DB2

STMM [SGL 2006]

n Target: memory tuning
n Methods: control theory, runtime simulation modeling, cost-benefit analysis, operating system resource analysis
n Reallocate memory for several critical components (e.g., compiled statement cache, sort, and buffer pools)

Page 137: Workload-Aware Performance Tuning for Autonomous DBMSs

H2O [AIA 2014]

Problem
p No fixed data storage layout is a universally good solution
p Different workloads require different storage layouts and data access methods in order to achieve good performance

Page 138: Workload-Aware Performance Tuning for Autonomous DBMSs

H2O [AIA 2014]

p Adaptive data layouts along with hybrid query execution strategies
n Continuously adapts based on the workload
n Every single query is a trigger to decide how the respective data should be stored and how it should be accessed.
n New layouts are created or old layouts are refined on-the-fly as we process incoming queries.
n At any given point in time, there may be several different storage formats (e.g., rows, columns, groups of attributes) co-existing and several execution strategies used.
n The same piece of data may be stored in more than one format if different parts of the query workload need to access it in different ways.

Page 139: Workload-Aware Performance Tuning for Autonomous DBMSs

p Self-designing key-value store
n how a specific design change in the underlying storage of a data system would affect performance
n what would be the optimal data structure (from a given set of designs) given workload characteristics and a memory budget.

p Key Ideas
n Design Space: know all possible data structure designs
n Learned Cost Models: compute the performance properties of any given data structure design

Key-Value Store [IDQ 2019]

Page 140: Workload-Aware Performance Tuning for Autonomous DBMSs

Unified Cost Model

n Cost model: a closed-form equation for each one of the core performance metrics
n Measure the worst-case number of I/Os issued for each of the operation types

Page 141: Workload-Aware Performance Tuning for Autonomous DBMSs

Cost of Different Data Structures

Page 142: Workload-Aware Performance Tuning for Autonomous DBMSs

p Key Ideas
n Adaptively partition views based on the access patterns of a workload
n Cost-based decisions: use a cost-based view and fragment eviction policy to adapt to evolving workloads
n The cost model is based on an estimate of the runtime

p Problem Definition
n Given a workload of queries to be executed and a pool size limit , choose a sequence of configurations in order to minimize the total execution time of the workload plus the time spent on view creation:

DeepSea [DGT 2017]

Page 143: Workload-Aware Performance Tuning for Autonomous DBMSs

AQWA [AMH 2015]

p Cost Modeling for Partition
n models the cost of executing the queries
n associates with each data partition the corresponding cost
n cost function integrates both the data distribution and the query workload

Page 144: Workload-Aware Performance Tuning for Autonomous DBMSs

p Rule-based
p Cost Modeling
p Machine Learning-based
p Reinforcement Learning-based

Workload-based Tuning

Page 145: Workload-Aware Performance Tuning for Autonomous DBMSs

p Machine Learning (ML) approaches aim to tune databases automatically by taking advantage of ML methods.

Machine Learning Tuning Methods

Page 146: Workload-Aware Performance Tuning for Autonomous DBMSs

Doraemon [TDW 2019] - Index

p Key Ideas of Learned Index
n CDF: cumulative distribution function
n approximate the CDF with machine learning models
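The CDF idea behind a learned index can be sketched with a single linear model: the position of a key in a sorted array is roughly CDF(key) times the array size, so a model predicts the position and a bounded local search corrects the error. This illustrates the general learned-index technique rather than Doraemon itself; the class `LinearLearnedIndex` and the keys are hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Predict a key's position with a linear fit, then search locally."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p)
                  for p, k in enumerate(sorted_keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the indexed keys bounds the search.
        self.max_error = max(abs(self._predict(k) - p)
                             for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not indexed."""
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_error), len(self.keys))
        hi = max(lo, min(len(self.keys), guess + self.max_error + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None


keys = [10, 20, 30, 45, 50, 62, 70, 85]
index = LinearLearnedIndex(keys)
```

The better the model fits the key distribution, the smaller `max_error` becomes and the fewer comparisons each lookup needs.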

Page 147: Workload-Aware Performance Tuning for Autonomous DBMSs

DBSeer [MCJ 2013] - Resource Allocation

• Key Ideas
  - Statistical models for resource and performance analysis

• Workflow
  - Collecting logs
  - Preprocessing / clustering
  - Modeling

Page 148: Workload-Aware Performance Tuning for Autonomous DBMSs

• Key Ideas
  - Factor Analysis: transform high-dimensional parameters into a few factors
  - K-means: cluster distinct metrics
  - Lasso: rank parameters (identify important knobs)
  - Gaussian Process: predict and tune performance

OtterTune [APG 2017] - Knob Tuning

Page 149: Workload-Aware Performance Tuning for Autonomous DBMSs

iBTune [TZL 2019]

• Problem: tune buffer size in cloud databases

• Challenges
  - #1: the workload may change dynamically on each instance
  - #2: each database instance requires a different buffer size

• Key Ideas of Solution
  - Leverage information from similar workloads to find the tolerable miss ratio of each instance
  - Use the relationship between miss ratios and allocated memory sizes to optimize the target buffer pool size of each instance individually
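To make the second idea concrete, the sketch below derives a target buffer pool size from a tolerable miss ratio. It assumes a power-law relationship, miss ratio proportional to size ** (-alpha); both the relationship and the default `alpha` are illustrative assumptions, not necessarily the formula used in [TZL 2019].

```python
def target_buffer_size(cur_size_gb: float, cur_miss_ratio: float,
                       tolerable_miss_ratio: float, alpha: float = 1.2) -> float:
    """Buffer size at which the assumed power-law model reaches the tolerable miss ratio."""
    for ratio in (cur_miss_ratio, tolerable_miss_ratio):
        if not 0.0 < ratio <= 1.0:
            raise ValueError("miss ratios must be in (0, 1]")
    # miss ~ size ** (-alpha)  =>  size_target = size_cur * (miss_cur / miss_target) ** (1 / alpha)
    return cur_size_gb * (cur_miss_ratio / tolerable_miss_ratio) ** (1.0 / alpha)


if __name__ == "__main__":
    # An instance that can tolerate twice its current miss ratio can give memory back.
    print(target_buffer_size(100.0, 0.01, 0.02))
```

An instance whose tolerable miss ratio is above its current one is over-provisioned, so the target size shrinks; the opposite case grows the buffer pool.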

Page 150: Workload-Aware Performance Tuning for Autonomous DBMSs

Workflow of iBTune

• Pairwise DNN
  - Instance-to-vector embedding
  - Inputs: measurements from a pair of similar instances
  - Predicts the upper bounds of the response times (RT)

Page 151: Workload-Aware Performance Tuning for Autonomous DBMSs

p Rule-basedp Cost Modelingp Machine Learning-basedp Reinforcement Learning-based

Workload-based Tuning

Page 152: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Reinforcement Learning

• Markov Decision Process (MDP)
  - Set of states S, set of actions A, initial state s0
  - Transition model P(s' | s, a)
  - Reward function R(s, a)
  - Goal: maximize cumulative reward in the long run
  - Policy: mapping from S to A (deterministic) or from S to a distribution over A (stochastic)

• MDP solvers
  - Dynamic Programming
  - Monte Carlo Methods
  - Deep Q-learning
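As an example of the dynamic programming family of solvers, here is a minimal value iteration sketch on a made-up two-state tuning MDP. The states, actions, transition probabilities, rewards, and the discount factor `gamma` are all invented for illustration.

```python
def q_value(s, a, V, P, R, gamma):
    """One-step lookahead: immediate reward plus discounted expected value of the next state."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, V, P, R, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma)) for s in states}
    return V, policy


# Hypothetical MDP: the database is either "slow" or "fast"; the agent can "keep" or "tune".
STATES = ["slow", "fast"]
ACTIONS = ["keep", "tune"]
P = {
    "slow": {"keep": {"slow": 1.0}, "tune": {"fast": 0.8, "slow": 0.2}},
    "fast": {"keep": {"fast": 1.0}, "tune": {"fast": 0.5, "slow": 0.5}},
}
R = {
    "slow": {"keep": 0.0, "tune": -1.0},  # tuning has an up-front cost
    "fast": {"keep": 2.0, "tune": 1.0},
}

if __name__ == "__main__":
    values, policy = value_iteration(STATES, ACTIONS, P, R)
    print(values, policy)
```

This only works when the transition model and rewards are known, which is the gap that the learning-based methods on the following slides address.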

Page 153: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Reinforcement Learning (cont.)

• RL Set-Up
  - Agent interacts with the environment by taking actions and receiving feedback
  - Feedback is in the form of rewards
  - Agent’s utility is defined by the reward function
  - Must (learn to) act to maximize expected rewards

• Reinforcement learning
  - Transitions and rewards are usually not available
  - How to change the policy based on experience
  - How to explore the environment

Page 154: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Reinforcement Learning (cont.)

• Value functions
  - Estimate how good it is to perform a given action in a given state
  - State value function V(s): expected return when starting in s and following policy π
  - State-action value function Q(s, a): expected return when starting in s, performing a, and following policy π
  - Useful for finding the optimal policy: Q can be estimated from experience, and the best action in s is the one that maximizes Q(s, a)

• Bellman equation
  - Expresses a relationship between the value of a state and the values of its successor states
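In standard textbook notation (the discount factor $\gamma \in [0, 1)$ and the symbols $P$, $R$, $\pi$ are assumptions here, since the slide's own formulas are not available), the Bellman equation for a policy $\pi$ and the Bellman optimality equation for the Q-function read:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{\pi}(s') \Big]

Q^{*}(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q^{*}(s', a')
```

The first equation ties the value of a state to the values of its successors under $\pi$; the second replaces the policy average with a maximum over the next action.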

Page 155: Workload-Aware Performance Tuning for Autonomous DBMSs

Preliminaries of Reinforcement Learning (cont.)

• Q-functions
  - The Q-function captures the expected total future reward that an agent in state s can receive when taking a certain action a

• Learning algorithms
  - Value Learning: directly approximates the optimal Q-function (Bellman optimality equation)
    - independent of the policy being followed
    - only requirement: keep updating each (s, a) pair
  - Policy Learning: find the policy π directly
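Value learning can be sketched with tabular Q-learning, which moves each Q(s, a) toward the Bellman optimality target using sampled transitions only. The toy environment, the hyperparameters, and the function names below are illustrative assumptions.

```python
import random


def q_learning(step, states, actions, start, episodes=500, horizon=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(actions)                    # explore
            else:
                a = max(actions, key=lambda x: Q[(s, x)])  # exploit
            s2, r = step(s, a)
            # Off-policy target: best next action, whatever the agent does next.
            target = r + gamma * max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


def toy_step(state, action):
    """Hypothetical deterministic environment: tuning leads to 'fast', keeping stays put."""
    if action == "tune":
        return "fast", (-1.0 if state == "slow" else 1.0)
    return state, (0.0 if state == "slow" else 2.0)


if __name__ == "__main__":
    Q = q_learning(toy_step, ["slow", "fast"], ["keep", "tune"], start="slow")
    for pair, value in sorted(Q.items()):
        print(pair, round(value, 2))
```

The update never needs the transition model, only observed (s, a, r, s') tuples, which is why it suits settings where transitions and rewards are not available up front.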

Page 156: Workload-Aware Performance Tuning for Autonomous DBMSs

A Summary of RL-based Tuning

• Knob Tuning
  - CDBTune [ZLZ 2019]
  - QTune [LZL 2019]

• Index Tuning
  - NoDBA [SSD 2018]
  - DQN-based [LBP 2020]
  - DBA bandits [POR 2020]

• View Generation
  - DQM [LEK 2019]
  - [YLF 2020]
  - AutoView [HLY 2021]

• Partitioning
  - GridFormation [DPP 2018]
  - Extended-GridFormation [DPP 2019]
  - Cloud Partitioning Advisor [HBR 2020]

• Resource Allocation
  - REIM [LZL 2020]

• Workload Scheduling
  - SmartQueue [ZMK 2020]
  - Decima [MSB 2019]

• Replication
  - [FCP 2020]

Page 157: Workload-Aware Performance Tuning for Autonomous DBMSs

CDBTune [ZLZ 2019] - Knob Tuning

• Reinforcement learning
  - State: knobs and metrics
  - Reward: performance change
  - Action: recommended knobs
  - Policy: deep neural network

• Key idea
  - Feedback: trial-and-error method
  - Deep Deterministic Policy Gradient
  - Actor-critic algorithm

Page 158: Workload-Aware Performance Tuning for Autonomous DBMSs

QTune [LZL 2019] - Knob Tuning

• More tuning levels than CDBTune
  - Query-level
  - Workload-level
  - Cluster-level

Page 159: Workload-Aware Performance Tuning for Autonomous DBMSs

DBA bandits [POR 2020] - Index Tuning

• Model index tuning as a multi-armed bandit
• Achieve a worst-case safety guarantee against any optimal fixed policy
• Advantages: eschews both the DBA and the (error-prone) query optimizer
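The bandit framing can be sketched with UCB1, a classic multi-armed bandit algorithm, where each arm stands for one candidate index and the reward is the observed benefit of creating it. UCB1 is a simpler stand-in chosen for brevity, not the algorithm of [POR 2020], and the reward values below are invented.

```python
import math


def ucb1_select(counts, totals, t):
    """Pick the arm with the highest upper confidence bound; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def run_bandit(reward_fn, n_arms, rounds):
    """Play the bandit for a number of rounds and return how often each arm was pulled."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        counts[arm] += 1
        totals[arm] += reward_fn(arm)
    return counts


if __name__ == "__main__":
    # Hypothetical normalized benefit of three candidate indexes.
    benefit = [0.1, 0.9, 0.3]
    print(run_bandit(lambda arm: benefit[arm], n_arms=3, rounds=300))
```

Because rewards come from observed executions, the learner does not depend on optimizer cost estimates, which is the advantage the slide highlights.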

Page 160: Workload-Aware Performance Tuning for Autonomous DBMSs

DQM [LEK 2019] - Materialized Views Selection

• Insights
  - Selection policies can be effectively trained with an asynchronous RL algorithm
  - RL runs paired counterfactual experiments during system idle times to evaluate the incremental value of persisting certain views
  - Obviates the need for accurate cardinality estimation or hand-designed scoring heuristics

Page 161: Workload-Aware Performance Tuning for Autonomous DBMSs

AutoView [HLY 2021] - Materialized Views (MVs) Management

• Goal: automatically generate MVs by analyzing the query workload and use the MVs to optimize queries

• Modules: MV candidate generation, MV cost/benefit estimation, MV selection, and MV-aware query rewriting

Page 162: Workload-Aware Performance Tuning for Autonomous DBMSs

Encoder-Reducer DDQN Model of AutoView

• Components
  - Environment: stores the global MV selection state and calculates the total benefit
  - Agent: two neural networks and the experience replay mechanism
  - Reward: the change of total benefit after each action

Page 163: Workload-Aware Performance Tuning for Autonomous DBMSs

Cloud Partitioning Advisor [HBR 2020] - Partitioning

• Problem: finding an optimal partitioning scheme for a given schema and workload
• Idea: let a DRL agent learn the cost tradeoffs of different partitioning schemes so that it can automate the partitioning decision without relying on accurate cost models

Page 164: Workload-Aware Performance Tuning for Autonomous DBMSs

SmartQueue [ZMK 2020] - Query Scheduling

• Goal
  - Increase query performance by reducing disk reads
  - Learn a scheduling strategy that improves cache hits

• SmartQueue
  - Learns how to order the execution of queries to minimize disk access requests
  - Continuously learns from past scheduling decisions and adapts to new data access and caching patterns

Page 165: Workload-Aware Performance Tuning for Autonomous DBMSs

• [AIA 2014] “H2O: A Hands-free Adaptive Store,” in SIGMOD, 2014.
• [AMH 2015] “AQWA: adaptive query workload aware partitioning of big spatial data,” in PVLDB, 2015.
• [AMM 2014] “Database Workload Management Through CBR and Fuzzy Based Characterization,” Applied Soft Computing, 2014.
• [APG 2017] “Automatic database management system tuning through large-scale machine learning,” in SIGMOD, 2017.
• [DGT 2017] “DeepSea: Progressive Workload-Aware Partitioning of Materialized Views in Scalable Data Analytics,” in EDBT, 2017.
• [DLN 2016] “Automated Demand-driven Resource Scaling in Relational Database-as-a-Service,” in SIGMOD, 2016.
• [DPP 2018] “GridFormation: towards self-driven online data partitioning using reinforcement learning,” in aiDM, 2018.
• [DPP 2019] “Automated vertical partitioning with deep reinforcement learning,” European Conference on Advances in Databases and Information Systems, Springer, 2019.
• [FCP 2020] “Self-tunable DBMS Replication with Reinforcement Learning,” IFIP International Conference on Distributed Applications and Interoperable Systems, Springer, 2020.
• [HBR 2020] “Learning A Partitioning Advisor for Cloud Databases,” in SIGMOD, 2020.

References 1/3

Page 166: Workload-Aware Performance Tuning for Autonomous DBMSs

• [HLY 2021] “An Autonomous Materialized View Management System with Deep Reinforcement Learning,” in ICDE, 2021.
• [IDQ 2019] “Design Continuums and the Path Toward Self-Designing Key-Value Stores That Know and Learn,” in CIDR, 2019.
• [KB 2020] “Black or White? How to develop an autotuner for memory-based analytics,” in SIGMOD, 2020.
• [KKS 2018] “Workload-Aware CPU Performance Scaling for Transactional Database Systems,” in SIGMOD, 2018.
• [LBP 2020] “An Index Advisor Using Deep Reinforcement Learning,” in CIKM, 2020.
• [LEK 2019] “Opportunistic view materialization with deep reinforcement learning,” arXiv preprint arXiv:1903.01363, 2019.
• [LM 2015] “A Workload-Driven Logical Design Approach for NoSQL Document Databases,” in iiWAS, 2015.
• [LZL 2019] “QTune: A Query-aware Database Tuning System with Deep Reinforcement Learning,” in VLDB, 2019.
• [MCJ 2013] “Performance and Resource Modeling in Highly-concurrent OLTP Workloads,” in SIGMOD, 2013.
• [MSB 2019] “Learning scheduling algorithms for data processing clusters,” in SIGCOMM, 2019.

References 2/3

Page 167: Workload-Aware Performance Tuning for Autonomous DBMSs

• [POR 2020] “DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees,” arXiv preprint arXiv:2010.09208, 2020.
• [SGL 2006] “Adaptive Self-Tuning Memory in DB2,” in VLDB, 2006.
• [SSD 2018] “The Case for Automatic Database Administration Using Deep Reinforcement Learning,” arXiv preprint arXiv:1801.05643, 2018.
• [STZ 2019] “Scheduling OLTP transactions via machine learning,” arXiv preprint arXiv:1903.02990, 2019.
• [TDW 2019] “Learned Indexes for Dynamic Workloads,” arXiv preprint arXiv:1902.00655, 2019.
• [TZL 2019] “iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases,” in PVLDB, 2019.
• [YLF 2020] “Automatic view generation with deep learning and reinforcement learning,” in ICDE, 2020.
• [ZLZ 2019] “An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning,” in SIGMOD, 2019.
• [ZMK 2020] “Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning,” arXiv preprint arXiv:2007.10568, 2020.
• [ZZS 2020] “WATSON-A Workflow-based Data Storage Optimizer for Analytics,” in MSST, 2020.

References 3/3

Page 168: Workload-Aware Performance Tuning for Autonomous DBMSs

Outline

Part A:
● Motivation and Background
● Workload Classification
● Workload Forecasting (Prediction)

Part B:
● Workload-Based Tuning
● Amazon Redshift
● Open Challenges and Discussion

Page 169: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift - Introduction

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse.

Amazon Redshift has tens of thousands of customers.

Page 170: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift - Introduction

Amazon Redshift automates performance tuning by using Machine Learning algorithms.

Workload Manager (WLM)

• WLM automatically allocates resources using ML prediction models.
• WLM schedules short-running queries ahead of longer-running queries by using ML prediction models.

MV auto-refresh

• Materialized View (MV) auto-refresh uses predictive algorithms to decide the best time to refresh materialized views.

Automatic distribution keys

• Determine the appropriate distribution key by using ML algorithms.
• Build graphs and sophisticated scoring models to predict the benefit of distribution keys on query performance.

Automatic table sort

• Provide an efficient and automated way to maintain sort order of the data to optimize query performance

• Use Machine Learning to prioritize which blocks of a table to sort by analyzing historical query patterns

Page 171: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift

[Figure: an Amazon Redshift cluster with a leader node and compute nodes; the workload manager handles concurrent query processing]

Workload Manager (WLM)

• WLM allocates resources for queries
• WLM schedules queries to execute
• WLM monitors system performance

Page 172: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift

• Manual WLM (before 2017)
  - Fixed concurrency
  - Fixed memory

• Short Query Acceleration (SQA) (2017)
  - Queries shorter than 20 seconds are accelerated via an express lane

• Concurrency Scaling (CS) (Q1 2019)
  - Handle workload spikes by routing queued queries to different clusters that are automatically provisioned

• AutoWLM (Q3 2019)
  - Use Machine Learning models to predict execution time and memory
  - Flexible memory assignment and concurrency
  - Queries are assigned priorities

Page 173: Workload-Aware Performance Tuning for Autonomous DBMSs

Manual WLM | SQA | CS | AutoWLM

Simple fixed workload

• Hard to set optimal configuration
• Re-tune every time
• Idle queues waste resources
• Short queries blocked by long queries
• Concurrency is not run-time adjustable
• ……

Fixed memory and concurrency for each queue

• ETL queue: Memory 82%, Concurrency on main 15, Concurrency scaling mode -, Timeout (ms) -, Query monitoring rules (0)
• Analytic queue: Memory 18%, Concurrency on main 10, Concurrency scaling mode -, Timeout (ms) -, Query monitoring rules (0)

Page 174: Workload-Aware Performance Tuning for Autonomous DBMSs

• Use a machine learning model to identify short queries
• Route short queries to a system-created short query queue
  - Train prediction models on each cluster
  - Refresh underlying models periodically
  - Use linear regression and decision tree models
  - Time out when a prediction is incorrect
  - Elastic SQA
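The routing idea above can be sketched as follows: predict a query's runtime, send it to the short query queue if the prediction is under a threshold, and move it back to a regular queue if it overruns. The linear predictor, the feature names, and the reuse of the 20-second figure as a timeout are illustrative assumptions, not Redshift internals.

```python
SHORT_QUERY_THRESHOLD_S = 20.0  # the SQA timeline item mentions queries under 20 seconds


def predict_runtime_s(features: dict, weights: dict, bias: float) -> float:
    """Hypothetical linear runtime model: bias + sum of weight * feature value."""
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())


def route(features: dict, weights: dict, bias: float,
          threshold_s: float = SHORT_QUERY_THRESHOLD_S) -> str:
    """Send predicted-short queries to the short query queue, everything else to a regular queue."""
    return "short" if predict_runtime_s(features, weights, bias) < threshold_s else "regular"


def on_progress(queue: str, elapsed_s: float,
                timeout_s: float = SHORT_QUERY_THRESHOLD_S) -> str:
    """Misprediction guard: a 'short' query that overruns its timeout is re-queued."""
    if queue == "short" and elapsed_s > timeout_s:
        return "regular"
    return queue


if __name__ == "__main__":
    weights = {"scan_gb": 2.0, "joins": 5.0}  # made-up coefficients
    print(route({"scan_gb": 1, "joins": 1}, weights, bias=1.0))
    print(route({"scan_gb": 50, "joins": 3}, weights, bias=1.0))
    print(on_progress("short", elapsed_s=25.0))
```

The timeout keeps a wrong "short" prediction from blocking the express lane, at the cost of restarting or re-queuing the mispredicted query.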

In a busy system, a 1-second query may wait in the queue for 1 minute

[Figure: running and queued queries in the ETL queue (82% mem, 15 concurrency, 15 running queries), the Analytic queue (18% mem, 10 concurrency, 10 running queries), and the short query queue; example query durations range from 1s to 966s]

Page 175: Workload-Aware Performance Tuning for Autonomous DBMSs

• Most clusters are very busy for 1 to 2 hours a day

• Instead of provisioning compute for peak periods, Concurrency Scaling (CS) saves costs by letting the customer provision compute for steady-state needs and scale automatically to handle the peak periods

[Figure: the main cluster keeps its ETL queue (82% mem, 15 concurrency), Analytic queue (18% mem, 10 concurrency), and short query queue, while queued queries are routed to Concurrency Scaling clusters]

Page 176: Workload-Aware Performance Tuning for Autonomous DBMSs

• Automatically manage resources (memory, concurrency, …) by applying ML algorithms
• Schedule shorter queries ahead of longer queries
• More important jobs (e.g., CEO jobs) get more resources by setting higher priorities
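A minimal sketch of how these three points could fit together: queries are ordered by priority and then by predicted runtime, and each is admitted only if its predicted memory fits in the remaining budget. The priority ranking, the field names, and the single-pass admission loop are illustrative assumptions, not the actual AutoWLM algorithm.

```python
import heapq

# Assumed ranking: a lower number is served first.
PRIORITY_RANK = {"highest": 0, "high": 1, "normal": 2, "low": 3, "lowest": 4}


def admit(queries, memory_budget_mb):
    """Return (running, waiting) query ids after one admission pass."""
    heap = [
        (PRIORITY_RANK[q["priority"]], q["pred_time_s"], i, q)  # i breaks ties deterministically
        for i, q in enumerate(queries)
    ]
    heapq.heapify(heap)
    running, waiting, free_mb = [], [], memory_budget_mb
    while heap:
        _, _, _, q = heapq.heappop(heap)
        if q["pred_mem_mb"] <= free_mb:
            free_mb -= q["pred_mem_mb"]
            running.append(q["id"])
        else:
            waiting.append(q["id"])
    return running, waiting


if __name__ == "__main__":
    queries = [
        {"id": "etl", "priority": "normal", "pred_time_s": 900, "pred_mem_mb": 500},
        {"id": "ceo", "priority": "highest", "pred_time_s": 300, "pred_mem_mb": 600},
        {"id": "dash", "priority": "normal", "pred_time_s": 2, "pred_mem_mb": 100},
        {"id": "adhoc", "priority": "low", "pred_time_s": 5, "pred_mem_mb": 200},
    ]
    print(admit(queries, memory_budget_mb=1000))
```

In this sketch concurrency is not a fixed number; it falls out of how many predicted memory footprints fit in the budget at once.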

Manual WLM configuration

• ETL queue: Memory 82%, Concurrency on main 15, Concurrency scaling mode -, Timeout (ms) -, Query monitoring rules (0)
• Analytic queue: Memory 18%, Concurrency on main 10, Concurrency scaling mode -, Timeout (ms) -, Query monitoring rules (0)

AutoWLM configuration

• ETL queue: Memory Auto, Concurrency on main Auto, Concurrency scaling mode -, Query Priority Highest, Query monitoring rules (0)
• Analytic queue: Memory Auto, Concurrency on main Auto, Concurrency scaling mode -, Query Priority Low, Query monitoring rules (0)

Page 177: Workload-Aware Performance Tuning for Autonomous DBMSs

[Figure: a leader node (LN) and compute nodes (CN); the WLM uses ML models to predict time and memory]

• Apply ML models to predict execution time and memory

• Improve user experience by scheduling short queries first

• Manage memory and system concurrency adaptively

• Feedback mechanism to deal with wrong predictions

Page 178: Workload-Aware Performance Tuning for Autonomous DBMSs

[Figure: a leader node (LN) and compute nodes (CN); the WLM uses ML models to predict time and memory and orders queries marked H, N, and L by priority, with starvation control and admission control]

• Apply ML models to predict execution time and memory

• Improve user experience by scheduling short queries first

• Manage memory and system concurrency adaptively

• Feedback mechanism to deal with wrong predictions

• Allow users to specify priorities for queries

• System automatically assigns priorities if not set

With priority: the CEO job gets more resources because it has high priority.

Page 179: Workload-Aware Performance Tuning for Autonomous DBMSs


Demo

Page 180: Workload-Aware Performance Tuning for Autonomous DBMSs

Accuracy of prediction models

[Figure: accuracy of time prediction and memory prediction]

Over 97% accuracy on average for both time and memory

Page 181: Workload-Aware Performance Tuning for Autonomous DBMSs

Priority vs noPriority (TPCDS)

[Figure: response time (sec) per query-duration bucket, from [0s-1s] to (180s- ), comparing AutoWLM-DynamicPriorityAssign with AutoWLM-NoPriority on TPCDS-3T, 20 streams, 99 queries per stream]

Improves response time by ~5 times

Page 182: Workload-Aware Performance Tuning for Autonomous DBMSs

Manual WLM vs AutoWLM (TPCH)

TPC-H 3T and TPC-H 100GB datasets https://code.amazon.com/packages/Mixed-WorkLoad-tpc-h/trees/mainline

Queues/Query Groups | Manual WLM Memory % | Manual WLM Max Concurrency | Auto WLM Memory % | Auto WLM Max Concurrency
Dashboard | 24 | 5 | Auto | Auto
Report | 25 | 6 | Auto | Auto
DataScience | 25 | 4 | Auto | Auto
COPY | 25 | 3 | Auto | Auto
Default | 1 | 1 | Auto | Auto

Page 183: Workload-Aware Performance Tuning for Autonomous DBMSs

Manual WLM vs AutoWLM (TPCH) (cont.)

[Figure: results for the same TPC-H datasets and queue configuration; lower is better]

Page 184: Workload-Aware Performance Tuning for Autonomous DBMSs

AutoWLM vs well-tuned manual WLM (Customer data)

Production Manual WLM Configuration

Queue Name | Memory | Concurrency
Analyst | 72% | 12
DBA | 4% | 2
Load | 20% | 5
Default | 1% | 1

Recommended Auto WLM Configuration

Queue Name | Memory | Concurrency | Priority
Analyst | Auto | Auto | Highest
DBA | Auto | Auto | Normal
Load | Auto | Auto | Normal
Default | Auto | Auto | Normal

• Concurrency Scaling usage was reduced by 2x with Auto WLM.
• Elapsed time improved: P50 = 8.5%, P90 = 6.3%, P99 = 15.8%.
• Queue times were reduced by >2x for all levels up to P99.

Page 185: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift

Take away

Amazon Redshift utilizes Machine Learning models to predict execution time and memory for queries

+ Amazon Redshift schedules shorter queries ahead of longer queries

+ Amazon Redshift adaptively manages memory and system concurrency

= Improved throughput, better user experience, good system utilization

Page 186: Workload-Aware Performance Tuning for Autonomous DBMSs

Amazon Redshift

Naresh Chainani: [email protected]
Krishnamurthy: [email protected]
Lin: [email protected]

New features in Redshift:

• Support data sharing queries (cross-database, cross-cluster, cross-account, …)

• Redshift ML (Use SQL to train ML models and do high performance in-database inference)

• ……

Page 187: Workload-Aware Performance Tuning for Autonomous DBMSs

Outline

Part A:
● Motivation and Background
● Workload Classification
● Workload Forecasting (Prediction)

Part B:
● Workload-Based Tuning
● Amazon Redshift
● Open Challenges and Discussion

Page 188: Workload-Aware Performance Tuning for Autonomous DBMSs

• Robust workload classification, forecasting, and prediction
  - How to realize robust classifiers and predictors for problematic workloads (e.g., noisy workload patterns)?

• Tuning with inaccurate workload information
  - How to improve performance even when the workload prediction may be inaccurate?

• Insufficient training data
  - How to acquire or even generate more valid training data for machine learning approaches?

• Incremental training of tuning models
  - How to effectively retrain and update the models for new data after deploying them in a real production environment (e.g., in the cloud)?

Open Challenges for Workload-Aware Tuning

Page 189: Workload-Aware Performance Tuning for Autonomous DBMSs

Acknowledgement

• We would like to thank all the organizers of ICDE 2021 and the attendees for participating in this tutorial.

• We would also like to thank all the authors of the referenced papers for their contributions to the papers as well as the slides available on the internet. We have borrowed generously from their papers and slides for this tutorial.

• We would also like to thank the Redshift WLM team members Gaurav Saxena and Mohammad Rahman.

Page 190: Workload-Aware Performance Tuning for Autonomous DBMSs