63
Data Placement Problems in Database Applications An Zhu An Zhu Stanford University

Data Placement Problems in Database Applications

Embed Size (px)

DESCRIPTION

Data Placement Problems in Database Applications. An Zhu Stanford University. Data Placement. Data objects Multiple disks Assignment of objects to disks Optimize performance Optimize I/O Handle dynamic situations. Outline. Multimedia Systems [GKKTZ 00] - PowerPoint PPT Presentation

Citation preview

Page 1: Data Placement Problems in Database Applications

Data Placement Problems in Database Applications

An ZhuAn Zhu

Stanford University

Page 2: Data Placement Problems in Database Applications

04/19/23 AZ 2

Data Placement

Data objects Multiple disks Assignment of objects to disks

Optimize performance Optimize I/O Handle dynamic situations

Page 3: Data Placement Problems in Database Applications

04/19/23 AZ 3

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 4: Data Placement Problems in Database Applications

04/19/23 AZ 4

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 5: Data Placement Problems in Database Applications

04/19/23 AZ 5

Multimedia Storage Systems

Movie objects Clients/subscribers Parallel disks

Limited storage: # of movies—Nj

Limited bandwidth: # of clients—Cj

Homogeneous system: Nj=k, Cj=L, j Uniform ratio: Cj/Nj=r, j

Page 6: Data Placement Problems in Database Applications

04/19/23 AZ 6

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Total Storage: 12, Total Capacity: 1800

Page 7: Data Placement Problems in Database Applications

04/19/23 AZ 7

An Example

000/600

000/600

400/600

100100

100 400400100100100

Total Storage: 12, Total Capacity: 1800

Page 8: Data Placement Problems in Database Applications

04/19/23 AZ 8

An Example

400/600

000/600

400/600

400400100100

Total Storage: 12, Total Capacity: 1800

Page 9: Data Placement Problems in Database Applications

04/19/23 AZ 9

Not All Clients Can be Satisfied

400/600

600/600

400/600

400

Total Satisfied Clients: 1400/1800=7/9

Page 10: Data Placement Problems in Database Applications

04/19/23 AZ 10

Sliding Window Algorithm

Consider one disk at a time Maintain an ordered list of movies The first consecutive k movies (or less)

with at least L combined clients Assign the first L clients to the disk and

reconsider leftover clients

Page 11: Data Placement Problems in Database Applications

04/19/23 AZ 11

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

100

Page 12: Data Placement Problems in Database Applications

04/19/23 AZ 12

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

200

Page 13: Data Placement Problems in Database Applications

04/19/23 AZ 13

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

400

Page 14: Data Placement Problems in Database Applications

04/19/23 AZ 14

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

400

Page 15: Data Placement Problems in Database Applications

04/19/23 AZ 15

An Example

000/600

000/600

000/600

100 100100100100100

100 400400100100100

Max window size k=4

700

Page 16: Data Placement Problems in Database Applications

04/19/23 AZ 16

An Example

000/600

000/600

600/600

100 100100100100100

100 400100 0 0 0

Max window size k=4

Page 17: Data Placement Problems in Database Applications

04/19/23 AZ 17

An Example

000/600

000/600

600/600

100 100100100100100

100 400100

Max window size k=4

Page 18: Data Placement Problems in Database Applications

04/19/23 AZ 18

An Example

600/600

000/600

600/600

100 100100100100100

Max window size k=4

400

Page 19: Data Placement Problems in Database Applications

04/19/23 AZ 19

An Example

600/600

400/600

600/600

100 100

Total Satisfied Clients: 1600/1800=8/9

Page 20: Data Placement Problems in Database Applications

04/19/23 AZ 20

Theoretical Bounds

Satisfies at least fraction of total clients

In the worst case, no algorithm can satisfy more clients

Translates to an -approximation

PTAS: (1+)-approximation, >0

21

11

k

21

11

k

Page 21: Data Placement Problems in Database Applications

04/19/23 AZ 21

Theoretical Bounds

Satisfies at least fraction of total clients

In the worst case, no algorithm can satisfy more clients

Translates to an -approximation

PTAS: (1+)-approximation, >0

21

11

k

21

11

k

Page 22: Data Placement Problems in Database Applications

04/19/23 AZ 22

Proof Sketch

Load vs. storage saturated: ML, MS

Least loaded disk: cL ML+MS=M, 0<c<1 All remaining movies each have no

more than cL/k clients Initial instance is feasible (w.l.o.g.)

Page 23: Data Placement Problems in Database Applications

04/19/23 AZ 23

An Example

600/600

400/600

600/600

100 100

Total Satisfied Clients: 1600/1800=8/9

ML=2, MS=1,

c=400/600

cL/k=100

Page 24: Data Placement Problems in Database Applications

04/19/23 AZ 24

Proof Outline

If there is a load saturated disk with less than k movies All clients are satisfied

Otherwise At most ML movies are left Satisfy at least fraction of

the clients 21

11

k

Page 25: Data Placement Problems in Database Applications

04/19/23 AZ 25

Lemma

If any of the load saturated disk has less than k objects

Any k-1 remaining movies in the list has L clients or more

Page 26: Data Placement Problems in Database Applications

04/19/23 AZ 26

Lemma

The remaining disks are all load saturated

So, all clients are satisfied

At least LAt least L

Page 27: Data Placement Problems in Database Applications

04/19/23 AZ 27

Otherwise…

Each disk has exactly k movies Total assigned movies: M·k

Initial movies: N M·k “New” movies generated: ML

# of movies left: ≤ ML

# of clients/remaining movie: ≤ cL/k Total # of remaining clients: cLML/k

Page 28: Data Placement Problems in Database Applications

04/19/23 AZ 28

Otherwise…

Total clients: ≤ M·L Assigned clients: ML·L + Ms·cL Total # of remaining clients : ≤ Ms·(1-c)L Final bound:

cLMLM

LcMkMcL

S

U

SL

SL

)1(,/min

21

11

k

Page 29: Data Placement Problems in Database Applications

04/19/23 AZ 29

Simulation Results

M=5L=100N=M·k

Zipf with=0.0( i-1 )

Page 30: Data Placement Problems in Database Applications

04/19/23 AZ 30

Recap

The problem is NP-complete PTAS: best possible approximation

bound : best possible absolute bound

Sliding window algorithm: practical with O((M+N)log(M+N)) running time

21

11

k

Page 31: Data Placement Problems in Database Applications

04/19/23 AZ 31

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the total I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 32: Data Placement Problems in Database Applications

04/19/23 AZ 32

Relational Databases

Objects: indexes, tables, views Multiple disks Minimize the total I/O access time

Page 33: Data Placement Problems in Database Applications

04/19/23 AZ 33

Past Work

Full striping Split uniformly across all available disks Utilize I/O parallelism

: transfer rate

200MB=0.05s/MB,Tt=10s200MB

Page 34: Data Placement Problems in Database Applications

04/19/23 AZ 34

=0.05s/MB,Tt=2.5s

Past Work

Full striping Split uniformly across all available disks Utilize I/O parallelism

: transfer rate

200MB =0.05s/MB,Tt=10s50MB50MB50MB

50MB50MB 50MB 50MB

Page 35: Data Placement Problems in Database Applications

04/19/23 AZ 35

Past Work

Co-accessed objects with Random I/O Seek time/per block size: 0.01s/0.1MB Seek rate: =0.1s/MB Smaller object dominates

50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB

AB

Ts=50·2=10s

Page 36: Data Placement Problems in Database Applications

04/19/23 AZ 36

Past Work

Combined access time Transfer time: Tt=(50+100)·=7.5s Seek time: Ts=min(50,100)·=10s Combined time: Tt+Ts=17.5s

50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB

AB

Page 37: Data Placement Problems in Database Applications

04/19/23 AZ 37

Past Work

Fully striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03’]

Combined time: 200·=10s

100MB 100MB200MB 200MB

Page 38: Data Placement Problems in Database Applications

04/19/23 AZ 38

Data Layout Problem

Work Load (SQL DML) A set of queries and/or updates A set of co-accessed objects (pairwise) Access stats (pairwise) Minimize the estimated I/O access time

Page 39: Data Placement Problems in Database Applications

04/19/23 AZ 39

Theoretical Questions

Approximation and its hardness Transfer time: P Seek time: Very Hard Combined time

Hard Minimizing transfer time alone is a “good”

approximation

Page 40: Data Placement Problems in Database Applications

04/19/23 AZ 40

Transfer Time

Heterogeneous disks Different rate: j

Storage constraint: cj

Objects Different size: si

Access frequency: i,i’

Solvable using Linear Programming (LP)

Page 41: Data Placement Problems in Database Applications

04/19/23 AZ 41

LP

',',',

,',',

,',,',

,

,

,

min

',,

,',,)(

,

,

,,0

iiiiii

jiiii

jjijijii

ji

ji

jiji

ji

T

iitT

jiixxt

jcx

isx

jix

Amount of object i assigned to disk j

Each object must be completely assigned

Each disk’s storage limit is kept

Transfer time for (i,i’) on disk j

Overall transfer time for (i,i’)

Minimize the total transfer time

Page 42: Data Placement Problems in Database Applications

04/19/23 AZ 42

Seek Time

Hard even on disks with no storage constraint

Integral assignment Each object is assigned to one machine

only Conversion from a fraction assignment

with no loss

Page 43: Data Placement Problems in Database Applications

04/19/23 AZ 43

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 Want: each file is spread uniformly

across a subset of disks

100MB 100MB200MB 200MB

150MB100MB

A B ABC C

Page 44: Data Placement Problems in Database Applications

04/19/23 AZ 44

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002+1252

100MB 100MB200MB 200MB

125MB125MB

A B ABC C

Page 45: Data Placement Problems in Database Applications

04/19/23 AZ 45

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002

100MB 100MB200MB 200MB

125MB125MB250MB

A B ABC C

Page 46: Data Placement Problems in Database Applications

04/19/23 AZ 46

Conversion

f( , )=1, f( , )=1, f( , )=0 Total seek cost: 0 Each file resides on only one disk

200MB400MB250MB

100MB 100MB200MB200MB250MB

A B ABC C

Page 47: Data Placement Problems in Database Applications

04/19/23 AZ 47

Implications

A polynomial time algorithm Equivalent to Minimum Edge Deletion

k-Partition NP-Hard to approximate: O(n2) Forces combined time be hard to

approximate

Page 48: Data Placement Problems in Database Applications

04/19/23 AZ 48

Combined Time

Let

Hard to approximate: ·, 1>>0 Optimize transfer time alone gives 1+

j

jj

max

)(),min(2)(

))(1(),min(2)(

212121

212121

xxxxxx

xxxxxx

Page 49: Data Placement Problems in Database Applications

04/19/23 AZ 49

Outline

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 50: Data Placement Problems in Database Applications

04/19/23 AZ 50

Load Rebalancing

Access pattern changes Initial layout no longer balanced

2

1

54

3 6

7

8

9

1011

MAX LOAD

Page 51: Data Placement Problems in Database Applications

04/19/23 AZ 51

Load Rebalancing

Relocate objects Minimize the max load with k moves

2

1

543 6

78

9

1011

MAX LOAD

Page 52: Data Placement Problems in Database Applications

04/19/23 AZ 52

Simple Algorithm (O(nlogn))

Step 1: Repeat k times Remove the largest object from the most

loaded disk The resulting max load: L(1)

Step2: Relocate the removed k objects Assign each object to the least loaded

disk The resulting max load: L(2)

Page 53: Data Placement Problems in Database Applications

04/19/23 AZ 53

Example (k=3)

Step1: L(1) OPT

2

1

543 6

78

9

1011

MAX LOAD

9

MAX LOAD

1

6

L(1)

Page 54: Data Placement Problems in Database Applications

04/19/23 AZ 54

Example (k=3)

Step2: L(2) OPT + S 2OPT Overall: max(L(1),L(2)) 2OPT

2 543

78

1011

91

6

MIN LOAD6 MIN LOAD

1 9

L(2)

Page 55: Data Placement Problems in Database Applications

04/19/23 AZ 55

Can We Do Better?

Blindly remove the large object is not wise

2

1

543 6

78

9

1011

MAX LOAD

Page 56: Data Placement Problems in Database Applications

04/19/23 AZ 56

How can we do better

Take care of large objects Large objects: size >1/2OPT Small objects: size 1/2OPT

2

1

543 6

78

9

1011

OPT

Page 57: Data Placement Problems in Database Applications

04/19/23 AZ 57

Revising The Plan

Step 1: Repeat k times Remove the largest object from the most

loaded disk The resulting max load: L(1) OPT

Step2: Relocate the removed k objects Assign each object to the least loaded

disk The resulting max load: L(2) OPT +S

2OPT

Page 58: Data Placement Problems in Database Applications

04/19/23 AZ 58

Revised Plan

Step 1: with no more than k moves Shuffle large objects and remove small

objects The resulting max load: L(1) 3/2 OPT

Step2: Relocate the removed objects Assign each object to the least loaded

disk (they are all small) The resulting max load: L(2) OPT +S

3/2 OPT just to fill in the space

Page 59: Data Placement Problems in Database Applications

04/19/23 AZ 59

Example

Step 1

2

1

543 6

78

9

1011

MAX LOAD

2

1

10 11

3/2 OPT

Page 60: Data Placement Problems in Database Applications

04/19/23 AZ 60

2

Example

Step 2

543 6

78

9 MIN LOAD

2

1

10 11

11 MIN LOAD10 MIN LOAD

OPT+S

Page 61: Data Placement Problems in Database Applications

04/19/23 AZ 61

Recap

Fast 1.5-approximation (O(nlogn)) NP-complete PTAS: generalized cost

Page 62: Data Placement Problems in Database Applications

04/19/23 AZ 62

Summary

Multimedia Systems [GKKTZ 00] Maximize the total clients served

Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time

Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed

moves

Page 63: Data Placement Problems in Database Applications

04/19/23 AZ 63

Other Research Interests

Algorithms for mobile, sensor networks and privacy preserving databases

Online Algorithms: queue management, packet switching, web caching, scheduling

Approximation Algorithms: network design, multi-product pricing

Streaming Algorithms