
Detailed Schedule - Systems Group @ ETH · Project Part II Starting Point: Existing DB Application (Java + SQL) ideally, your system from Part I of the project otherwise, the student


Detailed Schedule

Week  Date (Wed)  Lecture topic                Exercise topic
1     20.2.2013   Introduction                 ---
2     27.2.2013   ER, UML                      ---
3     6.3.2013    Relational Model             ER
4     13.3.2013   SQL I                        Start project
5     20.3.2013   Guest Lecture, SQL II        Relational Model
6     27.3.2013   Integrity Constraints        ---
7     3.4.2013    ---                          ---
8     10.4.2013   Normal forms I               SQL
9     17.4.2013   Normal forms II              IC, Project: Part I
10    24.4.2013   Query Processing I           Normal forms
11    1.5.2013    (Query Processing II)        Normal forms, Proj.
12    8.5.2013    Transactions                 Query Processing
13    15.5.2013   Synchronization              Transactions
14    22.5.2013   Security                     Synchronization
15    29.5.2013   Object-relational Databases  End Project: Part 2

Project Part II

Starting Point: Existing DB Application (Java + SQL)

  • ideally, your system from Part I of the project
  • otherwise, the student enrollment demo app (+ queries)

Task

  • Implement a DB library: relational algebra on the file system
  • Replace all JDBC statements with calls to your library

Goal

  • Understand the internals of a database system
  • Understand that there is always an alternative
  • Brief glimpse of the "Information Systems" lecture

Info Systems Courses: Overview

[Figure: course map with Data Modelling, Information Systems, Big Data, EAI, Web Eng., Advanced Systems Lab, Info Retrieval, Data Wareh., …]

The Data Management Universe

[Figure: the data management universe, with Project Part I and Project Part 2 marked]

Components of a Database System

[Figure: components of a database system. Users: "naive" user, expert user, app developer, DB admin. Interfaces: Application, Ad-hoc Query, Compiler, Management tools. DBMS: DML-Compiler, DDL-Compiler, Query Optimizer, Runtime, Schema, TA Management, Recovery, Storage Manager. Stored data: Logs, Indexes, DB Catalogue on External Storage (e.g., disks).]

[Figure: layered DBMS architecture. The Application sends SQL to the Server (Network, App Buffers, Security) and receives {tuples}. Inside the server: Query Processor; Data Manager (Indexes, Records); Storage Manager (Pages), which issues get/put block requests to the Storage System (disks, SSDs, SAN, …). Cross-cutting components: Transactions (Locking, Logging) and Metadata Mgmt (Schema, Stats).]

Why use a DBMS?

  • Avoid redundancy and inconsistency: Query Processor, Transaction Manager, Catalog
  • Rich (declarative) access to the data: Query Processor
  • Synchronize concurrent data access: Transaction Manager
  • Recovery after system failures: Transaction Manager, Storage Layer
  • Security and privacy: Server, Catalog

[Figure: layered DBMS architecture (Application; Server with Network, App Buffers, Security; Query Processor; Data Manager with Indexes and Records; Storage Manager with Pages; Storage System with disks, SSDs, SAN, …; Transactions and Metadata Mgmt alongside). The application interface is SQL / rel. algebra in, {tuples} out. Query Processor, Data Manager, and Storage Manager are marked as Project Part II.]

What does a Database System do?

Input: SQL statement

Output: {tuples}

1. Translate SQL into a set of get/put requests to backend storage

2. Extract, process, transform tuples from blocks

Tons of optimizations

  • Efficient algorithms for SQL operators (hashing, sorting)
  • Layout of data on backend storage (clustering, free space)
  • Ordering of operators (small intermediate results)
  • Semantic rewritings of queries
  • Buffer management and caching
  • Parallel execution and concurrency
  • Outsmart the OS
  • Partitioning and replication in distributed systems
  • Indexing and materialization
  • Load and admission control

+ Security + Durability + Concurrency Control + Tools

Database Optimizations

Query Processor (based on statistics)

  • Efficient algorithms for SQL operators (hashing, sorting)
  • Ordering of operators (small intermediate results)
  • Semantic rewritings of queries
  • Parallel execution and concurrency

Storage Manager

  • Load and admission control
  • Layout of data on backend storage (clustering, free space)
  • Buffer management and caching
  • Outsmart the OS

Transaction Manager

  • Load and admission control

Tools (based on statistics)

  • Partitioning and replication in distributed systems
  • Indexing and materialization

DBMS vs. OS Optimizations

Many DBMS tasks are also carried out by the OS

  • Load control
  • Buffer management
  • Access to external storage
  • Scheduling of processes

What is the difference?

  • DBMS has intimate knowledge of the workload
  • DBMS can predict and shape the access pattern of a query
  • DBMS knows the mix of queries (all pre-compiled)
  • DBMS knows the contention between queries
  • OS does generic optimizations

Problem: OS overrides DBMS optimizations!

[Figure: layered DBMS architecture, repeated. Application; Server (Network, App Buffers, Security); Query Processor; Data Manager (Indexes, Records); Storage Manager (Pages); Storage System (disks, SSDs, SAN, …); Transactions (Locking, Logging) and Metadata Mgmt (Schema, Stats) alongside. Interfaces: SQL / rel. algebra in, {tuples} out; get/put block to storage.]

Query Processor

[Figure: query processor pipeline. Compiler: SQL → Parser → QGM → Rewrite → QGM → Optimizer → QGM++ → CodeGen → Plan. Runtime System: the Interpreter executes the Plan and returns {tuples}.]

SQL -> Relational Algebra

SQL

select A1, ..., An
from R1, ..., Rk
where P;

Relational Algebra

$\pi_{A_1, \dots, A_n}(\sigma_P(R_1 \times \dots \times R_k))$

[Figure: operator tree with $\pi_{A_1, \dots, A_n}$ at the root, $\sigma_P$ below it, and a left-deep chain of $\times$ operators over $R_1, R_2, R_3, \dots, R_k$]

Runtime System

Three approaches

  • A. Compile query into machine code
  • B. Compile query into relational algebra and interpret that
  • C. Hybrid: e.g., compile predicates into machine code

What to do?

  • A: better performance
  • B: easier debugging, better portability
  • Project: use Approach B

Query Interpreter

  • provide implementation for each algebra operator
  • define interface between operators

Algorithms for Relational Algebra

Table Access

  • scan (load one page at a time)
  • index scan (if index available)

Sorting

  • Two-phase external sorting

Joins

  • (Block) nested-loops
  • Index nested-loops
  • Sort-Merge
  • Hashing (many variants)

Group-by (~ self-join)

  • Sorting
  • Hashing
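The two group-by strategies named above can be sketched in a few lines. This is only an illustration, not the project's required interface: the function names, the dictionary-based rows, and the sample data are assumptions, and everything is kept in memory.

```python
from collections import defaultdict
from itertools import groupby

def hash_group_by(rows, key, value):
    """Sum `value` per distinct `key` using a hash table (one pass, no ordering)."""
    sums = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
    return dict(sums)

def sort_group_by(rows, key, value):
    """Same result via sorting: equal keys become adjacent and are folded together."""
    ordered = sorted(rows, key=lambda row: row[key])
    return {k: sum(row[value] for row in group)
            for k, group in groupby(ordered, key=lambda row: row[key])}

# Hypothetical sample data: credits per lecture, grouped by reader.
rows = [
    {"reader": 1, "credits": 4},
    {"reader": 2, "credits": 2},
    {"reader": 1, "credits": 3},
]
by_hash = hash_group_by(rows, "reader", "credits")
by_sort = sort_group_by(rows, "reader", "credits")
print(by_hash, by_sort)
```

Both variants compute the same groups; they differ in whether matching tuples are brought together by hashing or by ordering.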

Two-phase External Sorting

Phase I: Create Runs

1. Load allocated buffer space with tuples
2. Sort tuples in buffer pool
3. Write sorted tuples (run) to disk
4. Go to Step 1 (create next run) until all tuples are processed

Phase II: Merge Runs

  • Use priority heap to merge tuples from runs

Special cases (N: size of the input in pages)

  • buffer >= N: no merge needed
  • buffer < sqrt(N): multiple merge phases necessary

External Sort (example)

Input: 97, 17, 3, 5, 27, 16, 2, 99, 13. The buffer holds three tuples.

Phase 1 (create runs)

1. load 97, 17, 3; sort; write run 1: 3, 17, 97
2. load 5, 27, 16; sort; write run 2: 5, 16, 27
3. load 2, 99, 13; sort; write run 3: 2, 13, 99

Phase 2 (merge)

The buffer holds the head of each run: 3, 5, 2. In each step the smallest value moves to the output and is replaced by the next tuple from the same run:

  • output 2; buffer 3, 5, 13
  • output 3; buffer 17, 5, 13
  • output 5; buffer 17, 16, 13
  • output 13; and so on until all runs are exhausted
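The two phases of external sorting can be sketched as follows. This is a minimal sketch under simplifying assumptions: tuples are plain integers, the "buffer" is counted in tuples rather than pages, runs are spilled to temporary text files, and the input is passed as an in-memory list (a real implementation would stream it from disk). The function name `external_sort` is hypothetical.

```python
import heapq
import os
import tempfile

def external_sort(values, buffer_size):
    """Two-phase external sort of integers; runs are spilled to temp files."""
    run_paths = []
    # Phase I: fill the buffer, sort it, write it out as one run.
    for start in range(0, len(values), buffer_size):
        run = sorted(values[start:start + buffer_size])
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    # Phase II: merge all runs; heapq.merge keeps one head per run in a heap.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)

result = external_sort([97, 17, 3, 5, 27, 16, 2, 99, 13], buffer_size=3)
print(result)
```

With a buffer of three tuples the nine input values produce three runs, which a single merge pass combines.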

Multi-way Merge (N = 7; M = 2)

Analysis

N: size of table in pages

M: size of (available) main memory in pages

IO Cost

  • O(N): if M >= sqrt(N)
      • 2 * N: if M >= N
      • 4 * N: if N > M >= sqrt(N)
  • O(N log_M N): if M < sqrt(N)
      • Base of logarithm: not relevant in O notation, but constants matter

CPU Cost (M >= sqrt(N))

  • Phase 1 (create N/M runs of length M): O(N * log2 M)
  • Phase 2 (merge N tuples with heap): O(N * log2(N/M))

Exercise: Does the CPU cost increase or decrease with M?
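The IO cases above can be reproduced with a small calculator. It is a sketch under stated assumptions: every pass reads and writes all N pages once, run creation produces ceil(N/M) runs, and the merge fan-in is taken to be M (ignoring the page needed for output) so that the thresholds match the sqrt(N) rule. The function names are hypothetical.

```python
def merge_passes(num_runs, fan_in):
    """Number of merge passes needed to reduce num_runs sorted runs to one."""
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)  # ceiling division
        passes += 1
    return passes

def sort_io_cost(n_pages, m_pages):
    """Page IOs of external sort: each pass reads and writes all N pages once."""
    if m_pages < 2:
        raise ValueError("need at least two buffer pages to merge")
    num_runs = -(-n_pages // m_pages)  # runs created in phase 1
    return 2 * n_pages * (1 + merge_passes(num_runs, m_pages))

print(sort_io_cost(100, 100))  # M >= N: 2 * N
print(sort_io_cost(100, 10))   # N > M >= sqrt(N): 4 * N
print(sort_io_cost(100, 5))    # M < sqrt(N): more than one merge pass
```

The third call shows how the cost grows once a single merge pass no longer suffices.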

Sorting Summary

Complexity: N * log(N) theory is right, but

  • DB people care about CPU and IO complexity
  • Constants matter!
  • Buffer allocation matters! Many concurrent queries?
  • More main memory can hurt performance!

Main memory is large. Do two-way sort because…

  • Parallelize sorting on different machines
  • Or many concurrent sorts on same machine
  • But more than 2-ways very rare in practice

Knuth suggests Replacement Selection

  • Increases length of runs
  • But, higher constant for CPU usage
  • Typically, not used in practice

(Grace) Hash Join

[Figure: Grace hash join]
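As a rough illustration of the Grace hash join idea, the sketch below first partitions both inputs with the same hash function and then joins each pair of partitions with a build and probe step. It is a simplification, not the lecture's reference implementation: partitions are Python lists instead of files on disk, tuples are Python tuples, and the function name, parameters, and sample data are assumptions.

```python
from collections import defaultdict

def grace_hash_join(r, s, r_key, s_key, num_partitions=4):
    """Equi-join of r and s on r[r_key] == s[s_key] (keys are tuple positions)."""
    # Phase 1: partition both inputs with the same hash function, so that
    # matching tuples always land in partitions with the same number.
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:
        r_parts[hash(t[r_key]) % num_partitions].append(t)
    for u in s:
        s_parts[hash(u[s_key]) % num_partitions].append(u)

    # Phase 2: per partition pair, build a hash table on R and probe with S.
    out = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = defaultdict(list)
        for t in r_part:
            table[t[r_key]].append(t)
        for u in s_part:
            for t in table.get(u[s_key], []):
                out.append(t + u)
    return out

# Hypothetical sample data: (PersNr, Name) and (Title, Reader).
professor = [(1, "Popper"), (2, "Kant")]
lecture = [("Logic", 1), ("Ethics", 2), ("Epistemology", 1)]
joined = grace_hash_join(professor, lecture, r_key=0, s_key=1)
print(sorted(joined))
```

Partitioning happens before the matching step, which is the counterpart of the merge step that comes after run creation in sorting.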

Sorting vs. Hashing

Both techniques can be used for joins, group-by, …

  • binary and unary matching problems

Same asymptotic complexity: O(N log N)

  • In both IO and CPU
  • Hashing has lower constants for CPU complexity
  • IO behavior is almost identical

Merging (Sort) vs. Partitioning (Hash)

  • Merging done afterwards; Partitioning done before
  • Partitioning depends on good statistics to get right

Sorting more robust. Hashing better in average case!

Iterator Model

Plan contains many operators

  • Implement each operator independently
  • Define generic interface for each operator
  • Each operator implemented by an iterator

Three methods implemented by each iterator

  • open(): initialize the internal state (e.g., allocate buffer)
  • char* next(): produce the next result tuple
  • close(): clean-up (e.g., release buffer)

N.B. Modern DBMS use a Vector Model

  • next() returns a set of tuples
  • Why is that better?

Iterator Model at Work

Plan: NLJoin(NLJoin(scan R, scan S), scan T), with the Application on top.

1. The Application calls JDBC execute(), which calls open() on the top NLJoin. open() is passed down to the lower NLJoin and to scan R.
2. scan S is opened and closed once for each R tuple; scan T is opened and closed once for each (R, S) tuple.
3. The Application calls JDBC next(). next() is passed down to the lower NLJoin and to scan R, which returns r1.
4. The lower NLJoin calls open() and then next() on scan S and gets s1. It calls next() again and gets s2.
5. The lower NLJoin returns (r1, s2) to the top NLJoin, which calls open() and then next() on scan T and gets t1.
6. The top NLJoin returns (r1, s2, t1) to the Application.
7. Further next() calls continue with r2, r3, …; s3, s4, …; t2, t3, …
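A minimal sketch of the iterator interface with a scan and a nested-loops join is shown below. The class names, the use of `None` as the end-of-stream marker, the in-memory tables, and the `run` helper are assumptions made for illustration; the project's own interface may look different.

```python
class Scan:
    """Iterator over an in-memory table (stand-in for a table scan on disk)."""
    def __init__(self, table):
        self.table = table
        self.pos = None
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.table):
            return None              # end of stream
        row = self.table[self.pos]
        self.pos += 1
        return row
    def close(self):
        self.pos = None

class NLJoin:
    """Nested-loops join: the inner input is opened and closed per outer tuple."""
    def __init__(self, outer, inner, pred):
        self.outer, self.inner, self.pred = outer, inner, pred
        self.cur = None              # current outer tuple
    def open(self):
        self.outer.open()
        self.cur = None
    def next(self):
        while True:
            if self.cur is None:
                self.cur = self.outer.next()
                if self.cur is None:
                    return None      # outer exhausted
                self.inner.open()
            row = self.inner.next()
            if row is None:          # inner exhausted: advance outer
                self.inner.close()
                self.cur = None
            elif self.pred(self.cur, row):
                return self.cur + row
    def close(self):
        if self.cur is not None:
            self.inner.close()
        self.outer.close()

def run(plan):
    """Drive a plan the way an application would: open, next until done, close."""
    plan.open()
    rows = []
    while (row := plan.next()) is not None:
        rows.append(row)
    plan.close()
    return rows

# Hypothetical sample tables.
R = [(1,), (2,)]
S = [(1, "a"), (2, "b")]
T = [("a", "x"), ("b", "y")]
plan = NLJoin(NLJoin(Scan(R), Scan(S), lambda r, s: r[0] == s[0]),
              Scan(T), lambda rs, t: rs[2] == t[0])
rows = run(plan)
print(rows)
```

Control flows top down through the `next()` calls while tuples flow bottom up, and no operator materializes its full result.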

Iterators Summary: Easy & Costly

Principle

  • data flows bottom up in a plan (i.e., operator tree)
  • control flows top down in a plan

Advantages

  • generic interface for all operators: great information hiding
  • easy to implement iterators (clear what to do in any phase)
  • works well with JDBC and embedded SQL
  • supports DBmin and other buffer management strategies
  • no overheads in terms of main memory
  • supports pipelining: great if only subset of results consumed
  • supports parallelism and distribution: add special iterators

Disadvantages

  • high overhead of method calls
  • poor instruction cache locality

Query Processor

[Figure: query processor pipeline, repeated. Compiler: SQL → Parser → QGM → Rewrite → QGM → Optimizer → QGM++ → CodeGen → Plan. Runtime System: the Interpreter executes the Plan and returns {tuples}.]

SQL -> Relational Algebra

SQL

select A1, ..., An
from R1, ..., Rk
where P;

Relational Algebra

$\pi_{A_1, \dots, A_n}(\sigma_P(R_1 \times \dots \times R_k))$

[Figure: operator tree with $\pi_{A_1, \dots, A_n}$ at the root, $\sigma_P$ below it, and a left-deep chain of $\times$ operators over $R_1, R_2, R_3, \dots, R_k$]

SQL -> QGM

SQL

select a
from R
where a in (select b
            from S);

QGM

[Figure: two query blocks, one selecting a from R and one selecting b from S, connected by an "in" edge]

Parser

Generates rel. alg. tree for each sub-query

  • constructs graph of trees: Query Graph Model (QGM)
  • nodes are subqueries
  • edges represent relationships between subqueries

Extended rel. algebra because SQL is more than RA

  • GROUP BY: $\Gamma$ operator
  • ORDER BY: sort operator
  • DISTINCT: can be implemented with $\Gamma$ operator

Parser needs schema information

  • Why? Give examples.

Why can't a query be compiled into one tree?

SQL -> Relational Algebra

SQL

select A1, ..., An
from R1, ..., Rk
where P;

Relational Algebra

$\pi_{A_1, \dots, A_n}(\sigma_P(R_1 \times \dots \times R_k))$

[Figure: operator tree with $\pi_{A_1, \dots, A_n}$ at the root, $\sigma_P$ below it, and a left-deep chain of $\times$ operators over $R_1, R_2, R_3, \dots, R_k$]

Example: SQL -> Relational Algebra

select Title
from Professor, Lecture
where Name = 'Popper' and
      PersNr = Reader

$\pi_{\text{Title}}(\sigma_{\text{Name} = \text{'Popper'} \land \text{PersNr} = \text{Reader}}(\text{Professor} \times \text{Lecture}))$

[Figure: operator tree with $\pi_{\text{Title}}$ at the root, $\sigma_{\text{Name} = \text{'Popper'} \land \text{PersNr} = \text{Reader}}$ below it, and $\text{Professor} \times \text{Lecture}$ at the bottom]
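The canonical translation (cross product, then selection, then projection) can be evaluated naively as sketched below. This only illustrates the unoptimized plan: tuples are modelled as dictionaries with distinct attribute names, the function name `select_from_where` is hypothetical, and the two sample tables are made-up data.

```python
from itertools import product

def select_from_where(relations, predicate, attributes):
    """Naively evaluate pi_attributes(sigma_predicate(R1 x ... x Rk))."""
    result = []
    for combo in product(*relations):        # R1 x ... x Rk
        row = {}
        for part in combo:
            row.update(part)                 # concatenate the tuples
        if predicate(row):                   # sigma_P
            result.append(tuple(row[a] for a in attributes))  # pi
    return result

# Hypothetical sample data for the Professor / Lecture example.
professor = [{"PersNr": 1, "Name": "Popper"}, {"PersNr": 2, "Name": "Kant"}]
lecture = [{"Title": "Logic", "Reader": 1}, {"Title": "Ethics", "Reader": 2}]

titles = select_from_where(
    [professor, lecture],
    lambda row: row["Name"] == "Popper" and row["PersNr"] == row["Reader"],
    ["Title"],
)
print(titles)
```

The loop visits every combination of input tuples before filtering, which is the cost the following optimizations try to avoid.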

First Optimization: $\sigma$ Push-down

select Title
from Professor, Lecture
where Name = 'Popper' and
      PersNr = Reader

$\pi_{\text{Title}}(\sigma_{\text{PersNr} = \text{Reader}}((\sigma_{\text{Name} = \text{'Popper'}}\,\text{Professor}) \times \text{Lecture}))$

[Figure: operator tree with $\pi_{\text{Title}}$ at the root, $\sigma_{\text{PersNr} = \text{Reader}}$ below it, then $\times$; $\sigma_{\text{Name} = \text{'Popper'}}$ sits directly above Professor, Lecture is the other input]

Second Optimization: $\pi$ Push-down

select Title
from Professor, Lecture
where Name = 'Popper' and
      PersNr = Reader

[Figure: operator tree with $\pi_{\text{Title}}$ at the root, $\sigma_{\text{PersNr} = \text{Reader}}$ below it, then $\times$; left input $\pi_{\text{PersNr}}$ over $\sigma_{\text{Name} = \text{'Popper'}}$ over Professor; right input $\pi_{\text{Title}, \text{Reader}}$ over Lecture]

Correctness: $\pi$ Push-down

$\pi_{\text{Title}}(\sigma_{\text{PersNr} = \text{Reader}}((\sigma_{\text{Name} = \text{'Popper'}}\,\text{Professor}) \times \text{Lecture}))$

(composition of projections)

$\pi_{\text{Title}}(\pi_{\text{Title}, \text{PersNr}, \text{Reader}}(\sigma_{\dots}((\sigma_{\dots}\,\text{Professor}) \times \text{Lecture})))$

(commutativity of $\pi$ and $\sigma$)

$\pi_{\text{Title}}(\sigma_{\dots}(\pi_{\text{Title}, \text{PersNr}, \text{Reader}}((\sigma_{\dots}\,\text{Professor}) \times \text{Lecture})))$

(commutativity of $\pi$ and $\times$)

$\pi_{\text{Title}}(\sigma_{\dots}(\pi_{\text{PersNr}}(\sigma_{\dots}\,\text{Professor}) \times \pi_{\text{Title}, \text{Reader}}(\text{Lecture})))$

Second Optimization: $\pi$ Push-down

Correctness (see previous slide; the example generalizes)

Why is it good? (almost the same reason as for $\sigma$)

  • reduces size of intermediate results
  • but: only makes sense if results are materialized, e.g., for a sort
  • does not make sense if pointers are passed around in iterators

Third Optimization: $\sigma + \times = \bowtie$

select Title
from Professor, Lecture
where Name = 'Popper' and
      PersNr = Reader

[Figure: operator tree with $\pi_{\text{Title}}$ at the root and $\bowtie$ below it (replacing $\sigma_{\text{PersNr} = \text{Reader}}$ and $\times$); left input $\pi_{\text{PersNr}}$ over $\sigma_{\text{Name} = \text{'Popper'}}$ over Professor; right input $\pi_{\text{Title}, \text{Reader}}$ over Lecture]

Third Optimization: $\sigma + \times = \bowtie$

Correctness by definition of the $\bowtie$ operator

Why is this good?

$\times$ always done using nested-loops algorithm

  • $\bowtie$ can also be carried out using hashing, sorting, index support
  • choice of better algorithm may result in huge wins

$\times$ produces large intermediate results

  • results in a huge number of "next()" calls in iterator model
  • method calls are expensive

Selection, projection push-down are no-brainers

  • make sense whenever applicable
  • do not need a cost model to decide how to apply them
  • (exception: expensive selections, projections with UDF)
  • done in a phase called query rewrite, based on rules

More complex query rewrite rules…

Unnesting of Views

Example: Unnesting of Views

select A.x
from A
where y in
      (select y from B)

Example: Unnesting of Views

select A.x
from A
where exists
      (select * from B where A.y = B.y)

Both are rewritten to:

select A.x
from A, B
where A.y = B.y

Is this correct? Why is this better? (not trivial at all!!!)
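One reason the correctness question is not trivial: if `B.y` contains duplicates, the join returns a matching `A` row once per match, while the nested form returns it once. The sketch below checks this with Python's built-in `sqlite3` module; the table contents are made-up sample data, and the observation is only one aspect of the question, not a full answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table A (x integer, y integer);
    create table B (y integer);
    insert into A values (1, 10), (2, 20);
    insert into B values (10), (10), (30);   -- y = 10 appears twice
""")

nested = con.execute(
    "select A.x from A where y in (select y from B)"
).fetchall()
joined = con.execute(
    "select A.x from A, B where A.y = B.y"
).fetchall()
con.close()

print("nested:", nested)
print("joined:", joined)
```

The two result lists differ in length for this data, so a rewrite rule has to take duplicates (and keys) into account before it replaces the subquery with a join.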

Query Rewrite

Example: Predicate Augmentation

select *
from A, B, C
where A.x = B.x
  and B.x = C.x

is rewritten to

select *
from A, B, C
where A.x = B.x
  and B.x = C.x
  and A.x = C.x

Why is that useful?

Pred. Augmentation: Why useful?

A (odd numbers)    B (all numbers)    C (even numbers)
…  x               …  x               …  x
…  1               …  1               …  2
…  3               …  2               …  4
…  5               …  3               …  6
…  …               …  …               …  …

• $\mathrm{Cost}((A \bowtie C) \bowtie B) < \mathrm{Cost}((A \bowtie B) \bowtie C)$

• get second join for free ($A \bowtie C$ is empty)

• Query Rewrite does not know that, …

• but it knows that it might happen and hopes for optimizer…

• Codegen gets rid of unnecessary predicates (e.g., A.x = B.x)
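Predicate augmentation for equality predicates amounts to computing the transitive closure of the given equalities. The sketch below does this with a small union-find structure; the function name and the representation of predicates as pairs of column names are assumptions for illustration, and real rewrite engines handle many more predicate forms.

```python
from itertools import combinations

def augment_equalities(predicates):
    """Return all column pairs implied equal by the given equality predicates."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]   # path halving
            col = parent[col]
        return col

    for left, right in predicates:
        parent[find(left)] = find(right)        # merge equivalence classes

    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), []).append(col)

    implied = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            implied.add((a, b))
    return implied

given = [("A.x", "B.x"), ("B.x", "C.x")]
augmented = augment_equalities(given)
print(sorted(augmented))
```

For the example query this adds the pair `("A.x", "C.x")`, which is what allows the optimizer to consider joining A with C first.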

Query Optimization

Two tasks

  • Determine order of operators
  • Determine algorithm for each operator (hashing, sorting, …)

Components of a query optimizer

  • Search space
  • Cost model
  • Enumeration algorithm

Working principle

  • Enumerate alternative plans
  • Apply cost model to alternative plans
  • Select plan with lowest expected cost

Query Optimization: Does it matter?

A x B x C

  • size(A) = 10,000
  • size(B) = 100
  • size(C) = 1
  • cost(X x Y) = size(X) + size(Y)

cost((A x B) x C) = 1,010,101

  • cost(A x B) = 10,100
  • cost(X x C) = 1,000,001 with X = A x B

cost(A x (B x C)) = 10,201

  • cost(B x C) = 101
  • cost(A x X) = 10,100 with X = B x C

Query Opt.: Does it matter?

A x B x C

  • size(A) = 1000
  • size(B) = 1
  • size(C) = 1
  • cost(X x Y) = size(X) * size(Y)

cost((A x B) x C) = 2000

  • cost(A x B) = 1000
  • cost(X x C) = 1000 with X = A x B

cost(A x (B x C)) = 1001

  • cost(B x C) = 1
  • cost(A x X) = 1000 with X = B x C
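The plan costs in these two examples can be recomputed with a few lines. The sketch assumes, as the examples do, that the result of a cross product has size(X) * size(Y) tuples; the function names are hypothetical.

```python
def cost_ab_then_c(a, b, c, cost_fn):
    """Cost of (A x B) x C; the intermediate result A x B has a * b tuples."""
    return cost_fn(a, b) + cost_fn(a * b, c)

def cost_bc_then_a(a, b, c, cost_fn):
    """Cost of A x (B x C); the intermediate result B x C has b * c tuples."""
    return cost_fn(b, c) + cost_fn(a, b * c)

def additive(x, y):
    return x + y        # cost(X x Y) = size(X) + size(Y)

def multiplicative(x, y):
    return x * y        # cost(X x Y) = size(X) * size(Y)

# First example: sizes 10,000 / 100 / 1 with the additive cost function.
left_1 = cost_ab_then_c(10_000, 100, 1, additive)    # 10,100 + 1,000,001
right_1 = cost_bc_then_a(10_000, 100, 1, additive)   # 101 + 10,100

# Second example: sizes 1000 / 1 / 1 with the multiplicative cost function.
left_2 = cost_ab_then_c(1000, 1, 1, multiplicative)  # 1000 + 1000
right_2 = cost_bc_then_a(1000, 1, 1, multiplicative) # 1 + 1000

print(left_1, right_1, left_2, right_2)
```

In both examples the order that keeps the intermediate result small is cheaper, by roughly a factor of 100 in the first case and 2 in the second.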

Search Space: Relational Algebra

Associativity of joins: (A ⋈ B) ⋈ C = A ⋈ (B ⋈ C)

Commutativity of joins: A ⋈ B = B ⋈ A

Many more rules
• see the Kemper/Eickler or Garcia-Molina textbooks

What is better: A ⋈ B or B ⋈ A?
• it depends
• need a cost model to make the decision

Search Space: Group Bys

SELECT … FROM R, S WHERE R.a = S.a GROUP BY R.a, S.b;

Γ_{R.a, S.b}(R ⋈ S)

Γ_{S.b}(Γ_{R.a}(R) ⋈ S)

Often, there are many possible ways to split and move group-bys
• again, need a cost model to make the right decisions

Cost Model

Cost Metrics
• Response Time (consider parallelism)
• Resource Consumption: CPU, IO, network
• $ (often equivalent to resource consumption)

Principle
• Understand algorithm used by each operator (sort, hash, …)
  • estimate available main memory buffers
  • estimate the size of inputs, intermediate results
• Combine cost of operators:
  • sum for resource consumption
  • max for response time (but keep track of bottlenecks)

Uncertainties
• estimates of buffers, interference with other operators
• estimates of intermediate result size (histograms)

Equi-Width Histogram

SELECT * FROM person WHERE age > 25 AND age < 40;

Equi-Depth Histogram

[Figure: equi-depth histogram of age. Y-axis ticks: 0 to 60. Buckets: 20 to 42, 42 to 48, 48 to 53, 53 to 59, 59 to 70.]

SELECT * FROM person WHERE age > 25 AND age < 40;
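How an optimizer might turn such a histogram into a size estimate can be sketched as follows. The bucket boundaries are the ones from the equi-depth slide; everything else is an assumption for illustration: the function name `equi_depth_selectivity`, the premise that every bucket holds the same share of tuples, and the premise that values are spread uniformly inside a bucket.

```python
def equi_depth_selectivity(bounds, lo, hi):
    """Estimate the fraction of tuples with lo < value < hi.

    bounds: bucket boundaries of an equi-depth histogram, so each of the
    len(bounds) - 1 buckets is assumed to hold the same number of tuples.
    """
    n_buckets = len(bounds) - 1
    selectivity = 0.0
    for left, right in zip(bounds, bounds[1:]):
        # Length of the part of this bucket that lies inside (lo, hi).
        overlap = max(0.0, min(hi, right) - max(lo, left))
        # Uniformity assumption: that part holds a proportional share.
        selectivity += (overlap / (right - left)) / n_buckets
    return selectivity

bounds = [20, 42, 48, 53, 59, 70]   # bucket boundaries from the slide
print(round(equi_depth_selectivity(bounds, 25, 40), 3))   # 0.136
```

The range 25 to 40 lies entirely in the first bucket, so the estimate is 15/22 of one fifth of the table, about 13.6% of all persons.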

Multi-Dimensional Histogram

[Figure: two-dimensional histogram over age and salary. Age buckets: 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70. Salary buckets: 70-100, 100-150, 150-250. Count axis ticks: 0 to 60.]

SELECT * FROM person
WHERE age > 25 AND age < 40 AND salary > 200;

Enumeration Algorithms

Query optimization is NP-hard
• even ordering of Cartesian products is NP-hard
• in general, impossible to predict complexity for a given query

Overview of Algorithms
• Dynamic Programming (good plans, exponential complexity)
• Greedy heuristics (e.g., highest selectivity join first)
• Randomized algorithms (iterative improvement, simulated annealing, …)
• Other heuristics (e.g., rely on hints by programmer)
• Smaller search space (e.g., deep plans, limited group-bys)

Products
• Dynamic Programming used by many systems
• Some systems also use greedy heuristics in addition


Dynamic Programming

access_plans: enumerate all ways to scan a table (indexes, …)

join_plans: enumerate all ways to join 2 tables (algos, commut.)

prune_plans: discard sub-plans that are inferior (cost & order)
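The interplay of the three functions can be sketched in Python. Only the function names `access_plans`, `join_plans`, and `prune_plans` come from the slide; their bodies are invented for illustration. The assumptions are: one scan per table with cost equal to its cardinality, a single join algorithm with cost = input costs + |left| * |right|, and pruning that keeps only the cheapest plan per table set (interesting orders are ignored here). The cardinalities and selectivities in the example are made up.

```python
from itertools import combinations

def access_plans(table, card):
    # Assumption: one full scan per table, cost equal to its cardinality.
    return [{"tables": frozenset([table]), "plan": table,
             "cost": card, "card": card}]

def join_plans(left, right, selectivity):
    # Assumption: one join algorithm; cost = input costs + |left| * |right|.
    return [{"tables": left["tables"] | right["tables"],
             "plan": (left["plan"], right["plan"]),
             "cost": left["cost"] + right["cost"] + left["card"] * right["card"],
             "card": left["card"] * right["card"] * selectivity}]

def prune_plans(plans):
    # Simplification: keep only the cheapest plan.
    return [min(plans, key=lambda p: p["cost"])]

def selectivity_between(left, right, sel):
    # Product of all predicate selectivities linking the two sides;
    # 1.0 (a Cartesian product) if no predicate links them.
    result = 1.0
    for a in left:
        for b in right:
            result *= sel.get(frozenset([a, b]), 1.0)
    return result

def optimize(cards, sel):
    tables = sorted(cards)
    best = {frozenset([t]): prune_plans(access_plans(t, cards[t])) for t in tables}
    for size in range(2, len(tables) + 1):           # build up larger table sets
        for subset in combinations(tables, size):
            candidates = []
            for k in range(1, size):                  # every split into two parts
                for left_tables in combinations(subset, k):
                    left = frozenset(left_tables)
                    right = frozenset(subset) - left
                    s = selectivity_between(left, right, sel)
                    for lp in best[left]:
                        for rp in best[right]:
                            candidates += join_plans(lp, rp, s)
            best[frozenset(subset)] = prune_plans(candidates)
    return best[frozenset(tables)][0]

cards = {"R": 1000, "S": 100, "T": 10}                               # hypothetical
sel = {frozenset(["R", "S"]): 0.01, frozenset(["R", "T"]): 0.01}     # hypothetical
best = optimize(cards, sel)
print(best["plan"], best["cost"])
```

In this made-up example the cheapest plan joins R with T first and S last. Because this sketch considers every subset of tables and every split of each subset, its table of best plans has up to 2^n entries for n tables.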

Access Plans

SELECT * FROM R, S, T WHERE R.a = S.a AND R.b = T.b ORDER BY R.c;

Assume indexes on R.a, R.b, R.c, R.d
• scan(R): cost = 100; order = none
• idx(R.a): cost = 100; order = R.a
• idx(R.b): cost = 1000; order = R.b
• idx(R.c): cost = 1000; order = R.c
• idx(R.d): cost = 1000; order = none

Keep blue plans only. Why? And how can all that be? (Whole lecture on all this.)

Access Plans for S

SELECT * FROM R, S, T WHERE R.a = S.a AND R.b = T.b ORDER BY R.c;

Assume indexes on S.b, S.c, S.d
• scan(S): cost = 1000; order = none
• idx(S.b): cost = 10000; order = none
• idx(S.c): cost = 10000; order = none
• idx(S.d): cost = 10000; order = none

Access Plans for T

SELECT * FROM R, S, T WHERE R.a = S.a AND R.b = T.b ORDER BY R.c;

Assume indexes on T.a, T.b
• scan(T): cost = 10; order = none
• idx(T.b): cost = 100; order = T.b
• idx(T.a): cost = 100; order = none
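Pruning by cost and order can be sketched on the access plans listed for R, S, and T. The plan costs and orders are copied from the slides; the dominance rule in `dominates` is an assumption (a plan is dropped if another plan is no more expensive and provides the same order, or the dropped plan provides no order at all). Which plans the original slides highlight in blue cannot be recovered from this text, so the output is only what this particular rule yields.

```python
def dominates(q, p):
    """True if plan q makes plan p redundant under the assumed rule."""
    covers_order = p["order"] is None or p["order"] == q["order"]
    differs = (q["cost"], q["order"]) != (p["cost"], p["order"])
    return covers_order and q["cost"] <= p["cost"] and differs

def prune_plans(plans):
    # Keep every plan that no other plan dominates.
    return [p for p in plans if not any(dominates(q, p) for q in plans)]

def plan(name, cost, order=None):
    return {"name": name, "cost": cost, "order": order}

access_R = [plan("scan(R)", 100), plan("idx(R.a)", 100, "R.a"),
            plan("idx(R.b)", 1000, "R.b"), plan("idx(R.c)", 1000, "R.c"),
            plan("idx(R.d)", 1000)]
access_S = [plan("scan(S)", 1000), plan("idx(S.b)", 10000),
            plan("idx(S.c)", 10000), plan("idx(S.d)", 10000)]
access_T = [plan("scan(T)", 10), plan("idx(T.a)", 100),
            plan("idx(T.b)", 100, "T.b")]

for plans in (access_R, access_S, access_T):
    print([p["name"] for p in prune_plans(plans)])
# ['idx(R.a)', 'idx(R.b)', 'idx(R.c)']
# ['scan(S)']
# ['scan(T)', 'idx(T.b)']
```

More expensive plans such as idx(R.c) survive because their order may save a later sort (ORDER BY R.c) or enable a cheaper join; that is the reason pruning cannot look at cost alone.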

Join Plans for R join S

SELECT * FROM R, S, T WHERE R.a = S.a AND R.b = T.b ORDER BY R.c;

• Consider all combinations of (blue) access plans
• Consider all join algorithms (NL, IdxNL, SMJ, GHJ, …)
• Consider all orders: R x S, S x R
• Prune based on cost estimates, interesting orders

Some examples:
• scan(R) NLJ scan(S): cost = 100; order = none
• scan(S) IdxNL Idx(R.a): cost = 1000; order = none
• idx(R.b) GHJ scan(S): cost = 150; order = R.b
• idx(R.b) NLJ scan(S): cost = 250; order = R.b

Join Plans for three-way (+) joins

SELECT * FROM R, S, T WHERE R.a = S.a AND R.b = T.b ORDER BY R.c;

Consider all combinations of joins (assoc., commut.)
• e.g., (R ⋈ S) ⋈ T, S ⋈ (T ⋈ R), …
• sometimes even enumerate Cartesian products

Use (pruned) plans of prev. steps as building blocks
• consider all combinations

Prune based on cost estimates, interesting orders
• interesting orders for the special optimality principle here
• gets more complicated in distributed systems

Exercise: Space and time complexity of DP for DBMS.

[Figure: DBMS architecture. The Application sends SQL / rel. algebra to the Server (Network, App Buffers, Security) and receives {tuples}. Inside the server: Query Processor, Data Manager (Indexes, Records), Storage Manager (Pages), next to Transactions (Locking, Logging) and Metadata Mgmt (Schema, Stats). The server accesses the Storage System (disks, SSDs, SAN, …) via get/put block.]

Storage System Basics

Storage is organized in a hierarchy
• combine different media to mimic one ideal storage

Storage systems are distributed
• disks organized in arrays
• cloud computing: DHT over 1000s of servers (e.g., S3)
• advantages of distributed storage systems
  • cost: use cheap hardware
  • performance: parallel access and increased bandwidth
  • fault tolerance: replicate data on many machines

Storage access is non-uniform
• multi-core machines with varying distance to banks
• sequential vs. random on disk and SSDs
• place hot data in the middle of disk

Storage Hierarchy

Level         Latency   Analogy (travel time)
register      <1 ns     Idea (1 min)
L1/L2 cache   <10 ns    Building (10 min)
main memory   100 ns    City (1.5 h)
disk          3-10 ms   Pluto (2 years)
tape          secs      Andromeda (2000 years)

Why a Storage Hierarchy?

Mimics ideal storage: speed of register at cost of tape
• unlimited capacity // tape
• zero cost // tape
• persistent // tape
• zero latency for read + write // register
• infinite bandwidth // register

How does it work?
• Higher layer "buffers" data of lower layer
• Exploit spatial and temporal locality of applications

Disks: Sequential vs. Random IO

Time to read 1000 blocks of size 8 KB?

Random access:
t_rnd = 1000 * t
      = 1000 * (t_s + t_r + t_tr) = 1000 * (10 + 4.17 + 0.16) ms
      = 1000 * 14.33 ms = 14330 ms

Sequential access:
t_seq = t_s + t_r + 1000 * t_tr + N * t_track-to-track
      = t_s + t_r + 1000 * 0.16 ms + (16 * 1000)/63 * 1 ms
      = 10 ms + 4.17 ms + 160 ms + 254 ms ≈ 428 ms

(t_s: seek time, t_r: rotational delay, t_tr: transfer time per block, t_track-to-track: track-to-track seek time, N: number of track changes)

Need to consider this gap in algorithms!

[Information Systems Class]
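The two totals can be checked with a short calculation. The timing constants are the ones used on the slide; the function names and the reading of (16 * 1000)/63 as "16 sectors per 8 KB block, 63 sectors per track" are assumptions made for this sketch.

```python
def random_read_ms(blocks, seek=10.0, rotation=4.17, transfer=0.16):
    # Every block pays a full seek, a rotational delay, and the transfer.
    return blocks * (seek + rotation + transfer)

def sequential_read_ms(blocks, seek=10.0, rotation=4.17, transfer=0.16,
                       sectors_per_block=16, sectors_per_track=63,
                       track_to_track=1.0):
    # One initial seek and rotational delay, then only transfers plus a
    # short track-to-track seek whenever a track is exhausted.
    track_changes = sectors_per_block * blocks / sectors_per_track
    return seek + rotation + blocks * transfer + track_changes * track_to_track

t_rnd = random_read_ms(1000)
t_seq = sequential_read_ms(1000)
print(round(t_rnd), round(t_seq), round(t_rnd / t_seq, 1))
```

Under these parameters random access is slower by a factor of roughly 33, which is the gap that external sorting and join algorithms are designed around.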

Storage Manager

Control all access to external storage (i.e., disks)
• implements external storage hierarchy (SSD, tape, disks, …)
• optimize heterogeneity of storage
• outsmarts file system: operating system caching
• write-ahead logging for redo and undo recovery
• Oracle, Google, etc. implement their own file system

Management of files and blocks
• keep track of files associated to the database (catalog)
• group set of blocks into pages (granularity of access)

Buffer management
• segmentation of buffer pool
• clever replacement policy; e.g., MRU for sequential scans
• pin pages (no replacement while in use)

Database = { files }

A file = variable-sized sequence of blocks
• Block is the unit of transfer to disk. Typically, 512B

A page = fixed-sized sequence of blocks
• A page contains records or index entries
• (special case: blobs. One record spans multiple pages)
• typical page size: 8KB for records; 16 KB for index entries

Page is logical unit of transfer and unit of buffering
• Blocks of same page are prefetched, stored on same track on disk

Table Spaces, Files, and Tables

[Figure: ER diagram. Table (N) belongs to (1) Table Space; File (N) belongs to (1) Table Space.]

Table Spaces

Explicit management of files allocated by the database
• Each file belongs to a table space
• Each table is stored in a table space
• Cluster multiple tables in the same file

DDL Operations
• CREATE TABLESPACE wutz DATAFILE a.dat SIZE 4MB, b.dat SIZE 3MB;
• ALTER TABLESPACE wutz ADD DATAFILE …;
• CREATE TABLE husten TABLESPACE wutz;

Warning: A full table space crashes the DBMS
• Classic emergency situation in practice

Buffer Management

Keep pages in main memory as long as possible
• Minimize disk I/Os in storage hierarchy

Critical questions
• Which pages to keep in memory (replacement policy)?
• When to write updated pages back to disk (trans. mgr.)?

General-purpose replacement policies
• LRU, Clock, … (see operating systems class)
• LRU-k: replace the page whose k-th most recent access lies furthest in the past
• 2Q: keep two queues: hot queue, cold queue
  • Access moves page to hot queue
  • Replacement from cold queue
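The two-queue idea can be sketched as follows. This is a deliberately simplified variant that follows only the three bullets on the slide, not the full published 2Q algorithm; the class name `TwoQueueBuffer`, the FIFO order of the cold queue, and the fallback of evicting from the hot queue when the cold queue is empty are assumptions.

```python
from collections import OrderedDict

class TwoQueueBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cold = OrderedDict()   # pages referenced once, oldest first
        self.hot = OrderedDict()    # pages referenced again, LRU order

    def access(self, page):
        """Return True on a buffer hit, False on a miss (page gets loaded)."""
        if page in self.hot:
            self.hot.move_to_end(page)      # refresh position in hot queue
            return True
        if page in self.cold:
            del self.cold[page]             # re-access: promote to hot queue
            self.hot[page] = True
            return True
        if len(self.cold) + len(self.hot) >= self.capacity:
            # Replacement from the cold queue; hot queue only as fallback.
            victims = self.cold if self.cold else self.hot
            victims.popitem(last=False)
        self.cold[page] = True              # new pages start out cold
        return False

buf = TwoQueueBuffer(capacity=3)
buf.access("P1")
buf.access("P1")                            # P1 becomes hot
for page in ["P2", "P3", "P4", "P5"]:       # a scan passes through
    buf.access(page)
print(buf.access("P1"))                     # True: the scan did not evict P1
```

The point of the second queue is visible in the usage: pages touched once by a scan compete only with each other, so a frequently used page survives the scan.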

Access Patterns of Databases

Sequential: table scans
• P1, P2, P3, P4, P5, …

Hierarchical: index navigation
• P1, P4, P11, P1, P4, P12, P1, P3, P8, P1, P2, P7, P1, P3, P9, …

Random: index lookup
• P13, P27, P3, P43, P15, …

Cyclic: nested-loops join
• P1, P2, P3, P4, P5, P1, P2, P3, P4, P5, P1, P2, P3, P4, P5, …
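Why the access pattern matters for the replacement policy can be seen by simulating the cyclic pattern with a buffer that is one page too small. The simulator below is a minimal sketch; the function name `count_hits` and the buffer size of 4 pages are assumptions chosen for the example.

```python
def count_hits(pattern, capacity, policy):
    """Simulate a page buffer; policy is "LRU" or "MRU". Returns the hit count."""
    buffer = []                      # ordered from least to most recently used
    hits = 0
    for page in pattern:
        if page in buffer:
            hits += 1
            buffer.remove(page)      # will be re-appended as most recent
        elif len(buffer) >= capacity:
            # LRU evicts the front of the list, MRU evicts the back.
            buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(page)
    return hits

cyclic = ["P1", "P2", "P3", "P4", "P5"] * 3     # nested-loops join pattern
print(count_hits(cyclic, 4, "LRU"))   # 0
print(count_hits(cyclic, 4, "MRU"))   # 8
```

With LRU every page is evicted just before it is needed again, so the buffer never produces a hit; MRU keeps most of the cycle resident.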

DBMin [Chou, DeWitt, VLDB 1985]

Observations
• There are many concurrent queries
• Each query is composed of a set of operators
• Allocate memory for each operator of each query
• Adjust replacement policy according to access pattern

Examples
• scan(T): 4 pages, MRU replacement
• indexScan(X)
  • 200 pages for Index X, LRU replacement
  • 100 pages for Table T, Random replacement
• hashJoin(R, S)
  • 200 pages for Table S

Enables load control and further optimizations
• Economic model for buffer allocation, priority-based BM, …

DBMin: Buffer Segmentation

[Figure: the DB buffer pool is split into segments, one per operator. Query "SELECT * FROM T, S WHERE T.a=S.b" uses scan(T), index(S.b), and fetch(S); query "SELECT * FROM T" uses scan(T). Pages are loaded from the DB.]

DBMS vs. OS: Double Page Fault

DBMS needs Page X
• Page X is not in the DB buffer pool

DBMS evicts Page Y from DB buffer pool
• make room for X
• But, Page Y is not in the OS Cache
• OS reads Page Y from disk (swap)

Summary
• Latency: need to wait for (at least) two I/Os
• Cost: If Y updated, up to three I/Os to write Y to disk
• Utilization: Same page held twice in main memory

If you are interested in DB/OS co-design, …

[Figure: DBMS architecture overview, repeated: Application, Server (Network, App Buffers, Security), Query Processor, Data Manager, Storage Manager, Transactions (Locking, Logging), Metadata Mgmt (Schema, Stats), Storage System (disks, SSDs, SAN, …).]

Data Manager

Maps records to pages
• implement "record identifier" (RID)

Implementation of Indexes
• B+ trees, R trees, etc.
• Index entry ~ Record (same mechanism)

Freespace Management
• Index-organized tables (IOTs)
• Various schemes

Implementation of BLOBs (large objects)
• variants of position trees

Structure of a Record

Fixed length fields
• e.g., number(10,2), date, char[100]
• direct access to these fields

Variable length fields
• e.g., varchar[100]
• store (length, pointer) as part of a fixed-length field
• store payload information in a variable-length field
• access in two steps: retrieve pointer + chase pointer

NULL Values
• Bitmap: set 1 if value of a field is NULL

Record layout: | Bitmap | fixed-length fields | variable-length fields |

Inside a Page

record identifier (rid):
• <pageno, slotno>
• indexes use rids to reference records

record position (in page):
• slotno x bytes per slot

records can move in page
• if records grow or shrink
• if records are deleted
• no need to update indexes
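A slotted page can be sketched in a few lines. This is an in-memory illustration, not the layout required for the project: the class name `SlottedPage`, the use of Python lists instead of a byte array, and the compaction strategy on delete are all assumptions. It shows why records can move inside the page without invalidating any rid.

```python
class SlottedPage:
    """A rid is (pageno, slotno); the slot directory maps slotno to a position."""

    def __init__(self, pageno):
        self.pageno = pageno
        self.slots = []     # slotno -> position in self.data, None if deleted
        self.data = []      # record payloads in physical order

    def insert(self, record):
        self.data.append(record)
        self.slots.append(len(self.data) - 1)
        return (self.pageno, len(self.slots) - 1)

    def read(self, rid):
        position = self.slots[rid[1]]
        return None if position is None else self.data[position]

    def delete(self, rid):
        position = self.slots[rid[1]]
        if position is None:
            return                              # already deleted
        self.slots[rid[1]] = None               # slot stays, so slotnos are stable
        del self.data[position]                 # compact: later records move up
        self.slots = [p - 1 if p is not None and p > position else p
                      for p in self.slots]

page = SlottedPage(pageno=7)
r0 = page.insert("alice")
r1 = page.insert("bob")
r2 = page.insert("carol")
page.delete(r1)                  # "carol" moves physically ...
print(r2, page.read(r2))         # ... but rid (7, 2) still finds it
```

Only the slot directory is updated when records move; an index that stores (7, 2) never notices.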

What happens when a page is full?

Problem: A record grows because of an update
• E.g., a varchar field is updated so that record grows

Idea: Keep a placeholder (TID)
• Move the record to a different page
• Keep a "forward" (TID) at "home" page
• If record moves again, update TID at "home" page

Assessment
• At most two I/Os to access a record; typically, only one
• Flexibility to move records within and across pages
• No need to update references to record (i.e., indexes)
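The forwarding scheme can be sketched with pages modeled as dictionaries. The class `RecordStore` and its methods are hypothetical; the TIDs (911, 2) and (4711, 3) and the record "Wutz 23" are borrowed from the forwarding figure in these slides, while the third location (5000, 1) is made up.

```python
class RecordStore:
    """pages[pageno][slotno] is ("rec", payload) or ("fwd", tid)."""

    def __init__(self):
        self.pages = {}

    def put(self, tid, entry):
        self.pages.setdefault(tid[0], {})[tid[1]] = entry

    def read(self, tid):
        kind, value = self.pages[tid[0]][tid[1]]
        if kind == "fwd":                    # second page access at most
            kind, value = self.pages[value[0]][value[1]]
        return value

    def move(self, home_tid, new_tid):
        """Relocate a record; only the entry on the home page is rewritten."""
        kind, value = self.pages[home_tid[0]][home_tid[1]]
        if kind == "fwd":                    # record was moved before
            payload = self.pages[value[0]].pop(value[1])[1]
        else:
            payload = value
        self.put(new_tid, ("rec", payload))
        self.put(home_tid, ("fwd", new_tid))

store = RecordStore()
home = (911, 2)
store.put(home, ("rec", "Wutz 23"))
store.move(home, (4711, 3))      # record outgrew page 911
store.move(home, (5000, 1))      # moved again: forward on page 911 is updated
print(store.read(home))          # Wutz 23
```

Because the forward is always updated on the home page, chains of forwards never form, which is why a lookup needs at most two page accesses.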

TID (Forwarding) Concept

[Figure: a reference to TID (911, 2) leads to page 911, where slot 2 holds a forward to TID (4711, 3); slot 3 of page 4711 holds the record (Wutz, 23).]

Freespace Management

Find a page for a new record
• Many different heuristics conceivable
• All based on a list of pages with free space

Append Only
• Try to insert into the last page of free space list.
• If no room in last page, create a new page.

Best Fit
• Scan through list and find the page with the least free space that still fits.

First Fit, Next Fit
• Scan through list and find first / next fit

Witnesses: Classify buckets

IOT: organize all tuples in a B+ tree
• Let the B+ tree take care of splitting and freespace mgmt.
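Three of these heuristics can be sketched over a free space list of (pageno, free_bytes) pairs. The list representation, the function names, the page size of 8192 bytes, and the numbering of new pages are assumptions; the functions only choose a page and do not update the free space afterwards.

```python
def first_fit(free_list, size):
    # First page in list order with enough room.
    for pageno, free in free_list:
        if free >= size:
            return pageno
    return None

def best_fit(free_list, size):
    # Page with the least free space that still fits the record.
    fitting = [(free, pageno) for pageno, free in free_list if free >= size]
    return min(fitting)[1] if fitting else None

def append_only(free_list, size, page_size=8192):
    # Only the last page is considered; otherwise allocate a new page.
    # Assumes size <= page_size.
    if free_list and free_list[-1][1] >= size:
        return free_list[-1][0]
    new_pageno = free_list[-1][0] + 1 if free_list else 0   # hypothetical numbering
    free_list.append((new_pageno, page_size))
    return new_pageno

free_list = [(1, 50), (2, 300), (3, 120), (4, 20)]
print(first_fit(free_list, 100))     # 2
print(best_fit(free_list, 100))      # 3
print(append_only(free_list, 100))   # 5 (a new page is allocated)
```

First fit stops early but may waste large holes on small records; best fit scans the whole list to keep large holes intact; append only never scans at all but leaves holes unused.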

[Figure: DBMS architecture overview, repeated: Application, Server (Network, App Buffers, Security), Query Processor, Data Manager, Storage Manager, Transactions (Locking, Logging), Metadata Mgmt (Schema, Stats), Storage System (disks, SSDs, SAN, …).]

Meta-data Management: Catalog

All meta-data stored in tables
• Accessed internally using SQL
• Eat your own dogfood

Kinds of meta-data
• Schema: used to compile queries
• Table spaces: files used to store database
• Histograms: estimate result sizes; query optimization
• Parameters (cost of I/O, CPU speed, …): query optimization
• Compiled Queries: used for (JDBC) PreparedStatements
• Configuration: AppHeap Size, Isolation Level, …
• Users (login, password): used for security
• Workload Statistics: index advisors

What does a Database System do?

Input: SQL statement

Output: {tuples}

1. Translate SQL into a set of get/put requests to backend storage
2. Extract, process, transform tuples from blocks

Tons of optimizations
• Efficient algorithms for SQL operators (hashing, sorting)
• Layout of data on backend storage (clustering, free space)
• Ordering of operators (small intermediate results)
• Semantic rewritings of queries
• Buffer management and caching
• Parallel execution and concurrency
• Outsmart the OS
• Partitioning and Replication in distributed system
• Indexing and Materialization
• Load and admission control

+ Security + Durability + Concurrency Control + Tools

Database Optimizations

Query Processor (based on statistics)
• Efficient algorithms for SQL operators (hashing, sorting)
• Ordering of operators (small intermediate results)
• Semantic rewritings of queries
• Parallel execution and concurrency

Storage Manager
• Load and admission control
• Layout of data on backend storage (clustering, free space)
• Buffer management and caching
• Outsmart the OS

Transaction Manager
• Load and admission control

Tools (based on statistics)
• Partitioning and Replication in distributed system
• Indexing and Materialization

DBMS vs. OS Optimizations

Many DBMS tasks are also carried out by OS
• Load control
• Buffer management
• Access to external storage
• Scheduling of processes

What is the difference?
• DBMS has intimate knowledge of workload
• DBMS can predict and shape access pattern of a query
• DBMS knows the mix of queries (all pre-compiled)
• DBMS knows the contention between queries
• OS does generic optimizations

Problem: OS overrides DBMS optimizations!

What to optimize?

Feature                    Traditional   Cloud
Cost [$]                   fixed         optimize
Performance [tps, secs]    optimize      fixed
Scale-out [#cores]         optimize      fixed
Predictability [($)]       -             fixed
Consistency [%]            fixed         ???
Flexibility [#variants]    -             optimize

[Florescu & Kossmann, SIGMOD Record 2009]

Put $ on the y-axis of your graphs!!!

Experiments [Loesing et al. 2010]

TPC-W Benchmark
• throughput: WIPS
• latency: fixed depending on request type
• cost: cost / WIPS, total cost, predictability

Players
• Amazon RDS, SimpleDB
• S3/28msec [Brantner et al. 2008]
• Google AppEngine
• Microsoft Azure

Scale-up Experiments

[Figure: line plot. X-axis: EB, 0 to 9000. Y-axis: WIPS, 0 to 1200. Legend: MySQL RDS Large SDB S3 AE/C Azure Ideal. Labeled curves: SDB, AE/C, S3, Azure, RDS, MySQL.]

Cost / WIPS (m$)

               Low Load   Peak Load
Amazon RDS     1.212      0.005
S3 / 28msec    -          0.007
Google AE/C    0.002      0.028
MS Azure       0.775      0.005

What do you need for Project Part 2

Storage Manager
• Management of files
• Simple buffer management
• Free space management and new page allocation

Data Manager
• Slotted pages

Query Processor
• Implementation of scan, join, group-by
• Iterator model
• (external for extra credit)

Catalog, Server, Transaction Manager
• -

Be pragmatic! Get it running!
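For the iterator model listed under Query Processor, a minimal sketch with the usual open/next/close interface may help as a starting point. It is written in Python for brevity although the project itself is Java-based; the class names `Scan` and `NestedLoopsJoin`, the tuple representation, and the helper `run` are assumptions, not a prescribed design.

```python
class Scan:
    """Leaf operator: iterates over an in-memory list of tuples."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                      # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class NestedLoopsJoin:
    """Joins two child iterators; rescans the inner child per outer tuple."""
    def __init__(self, outer, inner, predicate):
        self.outer, self.inner, self.predicate = outer, inner, predicate
    def open(self):
        self.outer.open()
        self.inner.open()
        self.current = self.outer.next()
    def next(self):
        while self.current is not None:
            row = self.inner.next()
            if row is None:                  # inner exhausted: advance outer
                self.current = self.outer.next()
                self.inner.close()
                self.inner.open()
                continue
            if self.predicate(self.current, row):
                return self.current + row
        return None
    def close(self):
        self.outer.close()
        self.inner.close()

def run(op):
    """Pull all tuples from the root operator."""
    op.open()
    out = []
    while (row := op.next()) is not None:
        out.append(row)
    op.close()
    return out

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (1, "y"), (2, "z")]
join = NestedLoopsJoin(Scan(R), Scan(S), lambda r, s: r[0] == s[0])
print(run(join))
```

Each operator only talks to its children through the three calls, so a group-by or a different join algorithm can be swapped in without touching the rest of the plan.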