Self-Driving Database Management Systems CIDR 2017 @andy_pavlo

Source: cidrdb.org/cidr2017/slides/p42-pavlo-cidr17-slides.pdf

Self-Driving Database Management Systems

CIDR 2017 @andy_pavlo

1920s

Cornelius Von Pavlo

1950s

Joseph Pavlo

1980s

Timothy Pavlo

Possible

» Physical Database Design

» Resource Allocation

» Query Optimization & Tuning

» Knob Configuration

What’s Different?

» Previous tools only reacted to problems that had already occurred.

» Humans still make final decisions.

» Hardware & algorithm advancements.

1 Clustering: Workload Monitor → Clusters

2 Forecasting: Historical Workload → Predicted Workload

3 Planning: Search Tree → Action Sequence

#1 – Clustering

» Group similar queries together to improve the forecasting models.

» Logical vs. Physical Features
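
One way to picture grouping similar queries by their logical features is the minimal sketch below. It is not Peloton's clustering method: the Jaccard-based similarity, the greedy assignment, the 0.5 threshold, and the helper names (`feats`, `similarity`, `cluster`) are all assumptions made for illustration. The first query's features come from the TPC-C example in these slides; the other two are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

Features = Dict[str, FrozenSet[str]]


def feats(table: Iterable[str], attributes: Iterable[str],
          orderby: Iterable[str] = (), aggregate: Iterable[str] = ()) -> Features:
    """Build a logical-feature record (table/attributes/orderby/aggregate)."""
    return {
        "table": frozenset(table),
        "attributes": frozenset(attributes),
        "orderby": frozenset(orderby),
        "aggregate": frozenset(aggregate),
    }


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x: Features, y: Features) -> float:
    """Average Jaccard similarity over every feature name in either record."""
    keys = sorted(set(x) | set(y))
    if not keys:
        return 1.0
    empty: FrozenSet[str] = frozenset()
    return sum(jaccard(x.get(k, empty), y.get(k, empty)) for k in keys) / len(keys)


def cluster(queries: List[Features], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for i, q in enumerate(queries):
        for members in clusters:
            if similarity(queries[members[0]], q) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


queries = [
    # From the slides' example query on CUSTOMER.
    feats(["CUSTOMER"], ["C_ID", "C_W_ID", "C_D_ID", "C_LAST"], ["C_FIRST"]),
    # Hypothetical variant that drops the C_LAST predicate.
    feats(["CUSTOMER"], ["C_ID", "C_W_ID", "C_D_ID"], ["C_FIRST"]),
    # Hypothetical query on a different table.
    feats(["ORDERS"], ["O_ID", "O_W_ID"]),
]
groups = cluster(queries)
print(groups)
```

Because logical features are fixed for a given query template, a grouping like this can be computed once per template; physical features would need to be re-measured as the workload changes.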

Logical Features

SELECT C_ID FROM CUSTOMER
 WHERE C_W_ID = ? AND C_D_ID = ? AND C_LAST = ?
 ORDER BY C_FIRST

table={CUSTOMER}
attributes={C_ID,C_W_ID,C_D_ID,C_LAST}
orderby={C_FIRST}
aggregate={Ø}

Physical Features

tuplesRead={##} tuplesWritten={##} cpu={##} memory={##}
lockWait={##} indexPages={##} networkRead={##} networkWritten={##}

Logical features: fixed/immutable (+), cheap to compute (+), lacks execution info (–)

Physical features: descriptive (+), identifies problems (+), unstable/changes (–)

#2 – Forecasting

» Generate forecasting models for each cluster to predict future arrival rate.

» Multiple horizons & intervals.
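
As a rough illustration of forecasting a cluster's arrival rate at more than one interval, the sketch below fits a straight line to past query counts with ordinary least squares and extrapolates it. This is a simplified stand-in for the forecasting models in the talk, not the talk's actual model; the function names (`fit_line`, `aggregate`, `forecast`) and the sample counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y[t] = a + b*t with t = 0..n-1."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def aggregate(counts: Sequence[float], interval: int) -> List[float]:
    """Sum consecutive base-interval counts into coarser intervals,
    dropping an incomplete trailing interval."""
    return [sum(counts[i:i + interval])
            for i in range(0, len(counts) - interval + 1, interval)]


def forecast(counts: Sequence[float], interval: int, steps_ahead: int) -> List[float]:
    """Predict the arrival count for the next `steps_ahead` intervals."""
    series = aggregate(counts, interval)
    a, b = fit_line(series)
    n = len(series)
    return [a + b * (n + k) for k in range(steps_ahead)]


# Hypothetical queries-per-minute history for one cluster.
history = [10, 12, 14, 16, 18, 20, 22, 24]
per_minute = forecast(history, interval=1, steps_ahead=2)
per_two_minutes = forecast(history, interval=2, steps_ahead=1)
print(per_minute, per_two_minutes)
```

Running the same history through different `interval` values gives one model per granularity, which is one simple reading of "multiple horizons & intervals".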

[Figure: real vs. predicted workload for three applications (Gaming Stats, Bus Tracking, Admissions), forecast with an LSTM RNN and linear regression (LR) over horizons of 24 hours, 7 days, and 30 or 120 days.]

#3 – Planning

» Generate optimization actions for the DBMS based on the workload forecasts.

» Select a sequence of actions that optimize the target metric.
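
The idea of choosing an action sequence that optimizes a target metric can be sketched as an exhaustive search over short sequences. Everything concrete here is an assumption for illustration: the catalog entries, their cost and benefit numbers, the rule that one action is applied per forecast interval, and the net-benefit metric. The talk itself notes there is no universal planning algorithm yet, so this is not the system's planner.

```python
from itertools import permutations
from typing import Dict, Tuple

# Hypothetical action catalog: name -> (one-time cost, benefit per interval).
CATALOG: Dict[str, Tuple[float, float]] = {
    "AddIndex(a)": (5.0, 3.0),
    "AddIndex(b)": (3.0, 1.0),
    "AddIndex(c)": (10.0, 0.5),
}


def net_benefit(sequence: Tuple[str, ...],
                catalog: Dict[str, Tuple[float, float]],
                horizon: int) -> float:
    """Score a sequence: the action applied in interval t pays its cost once,
    then earns its benefit in each of the remaining horizon - t - 1 intervals."""
    total = 0.0
    for t, name in enumerate(sequence):
        cost, benefit = catalog[name]
        total += benefit * (horizon - t - 1) - cost
    return total


def plan(catalog: Dict[str, Tuple[float, float]],
         horizon: int, max_actions: int) -> Tuple[Tuple[str, ...], float]:
    """Enumerate every ordered sequence of up to `max_actions` distinct
    actions and return the one with the highest net benefit."""
    best_seq: Tuple[str, ...] = ()
    best_net = 0.0  # doing nothing is always an option
    for length in range(1, max_actions + 1):
        for seq in permutations(catalog, length):
            score = net_benefit(seq, catalog, horizon)
            if score > best_net:
                best_seq, best_net = seq, score
    return best_seq, best_net


best_seq, best_net = plan(CATALOG, horizon=6, max_actions=2)
print(best_seq, best_net)
```

Exhaustive enumeration grows factorially with the catalog size, so a real planner would need to prune or bound the search; the sketch only shows how cost, benefit, and ordering interact.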

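[Diagram: a candidate action from the Action Catalog, e.g. AddIndex(i), is scored by cost (–) and benefit (+). The Forecast Models identify the affected clusters and the Optimizer estimates the expected resource usage. The Search Tree then yields an Action Sequence.]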

Demo

» Peloton (v2017-01)

» TPC-C with 100 warehouses

» Database loaded without indexes

Current Status

» Clusters/forecasts computed off-line.

» No universal planning algorithm.

» We lost our catalog, planner, and optimizer in the “purge”.

2017

» More Self-Driving

» TensorFlow Integration

» LLVM Execution Engine

» Cascades Optimizer

» Intra-Query Parallelism

2016

» In-Memory / NVM Storage

» Open Bw-Tree

» WAL (SSD) / WBL (NVM)

» Index / Layout Tuning

» Apache v2.0 License

Unsolved Problems

» Cluster Prioritization (OLTP vs. OLAP)

» Self-Driving Components Interference

» Human Interactions

» “Traditional” ML Problems
