Upload
nathaniel-glenn
View
22
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Data Placement Problems in Database Applications. An Zhu Stanford University. Data Placement. Data objects Multiple disks Assignment of objects to disks Optimize performance Optimize I/O Handle dynamic situations. Outline. Multimedia Systems [GKKTZ 00] - PowerPoint PPT Presentation
Citation preview
Data Placement Problems in Database Applications
An ZhuAn Zhu
Stanford University
04/19/23 AZ 2
Data Placement
Data objects Multiple disks Assignment of objects to disks
Optimize performance Optimize I/O Handle dynamic situations
04/19/23 AZ 3
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 4
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 5
Multimedia Storage Systems
Movie objects Clients/subscribers Parallel disks
Limited storage: # of movies—Nj
Limited bandwidth: # of clients—Cj
Homogeneous system: Nj=k, Cj=L, j Uniform ratio: Cj/Nj=r, j
04/19/23 AZ 6
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 7
An Example
000/600
000/600
400/600
100100
100 400400100100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 8
An Example
400/600
000/600
400/600
400400100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 9
Not All Clients Can be Satisfied
400/600
600/600
400/600
400
Total Satisfied Clients: 1400/1800=7/9
04/19/23 AZ 10
Sliding Window Algorithm
Consider one disk at a time Maintain an ordered list of movies The first consecutive k movies (or less)
with at least L combined clients Assign the first L clients to the disk and
reconsider leftover clients
04/19/23 AZ 11
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
100
04/19/23 AZ 12
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
200
04/19/23 AZ 13
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
400
04/19/23 AZ 14
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
400
04/19/23 AZ 15
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
700
04/19/23 AZ 16
An Example
000/600
000/600
600/600
100 100100100100100
100 400100 0 0 0
Max window size k=4
04/19/23 AZ 17
An Example
000/600
000/600
600/600
100 100100100100100
100 400100
Max window size k=4
04/19/23 AZ 18
An Example
600/600
000/600
600/600
100 100100100100100
Max window size k=4
400
04/19/23 AZ 19
An Example
600/600
400/600
600/600
100 100
Total Satisfied Clients: 1600/1800=8/9
04/19/23 AZ 20
Theoretical Bounds
Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients
Translates to an -approximation
PTAS: (1+)-approximation, >0
21
11
k
21
11
k
04/19/23 AZ 21
Theoretical Bounds
Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients
Translates to an -approximation
PTAS: (1+)-approximation, >0
21
11
k
21
11
k
04/19/23 AZ 22
Proof Sketch
Load vs. storage saturated: ML, MS
Least loaded disk: cL ML+MS=M, 0<c<1 All remaining movies each have no
more than cL/k clients Initial instance is feasible (w.l.o.g.)
04/19/23 AZ 23
An Example
600/600
400/600
600/600
100 100
Total Satisfied Clients: 1600/1800=8/9
ML=2, MS=1,
c=400/600
cL/k=100
04/19/23 AZ 24
Proof Outline
If there is a load saturated disk with less than k movies All clients are satisfied
Otherwise At most ML movies are left Satisfy at least fraction of
the clients 21
11
k
04/19/23 AZ 25
Lemma
If any of the load saturated disk has less than k objects
Any k-1 remaining movies in the list has L clients or more
04/19/23 AZ 26
Lemma
The remaining disks are all load saturated
So, all clients are satisfied
At least LAt least L
04/19/23 AZ 27
Otherwise…
Each disk has exactly k movies Total assigned movies: M·k
Initial movies: N M·k “New” movies generated: ML
# of movies left: ≤ ML
# of clients/remaining movie: ≤ cL/k Total # of remaining clients: cLML/k
04/19/23 AZ 28
Otherwise…
Total clients: ≤ M·L Assigned clients: ML·L + Ms·cL Total # of remaining clients : ≤ Ms·(1-c)L Final bound:
cLMLM
LcMkMcL
S
U
SL
SL
)1(,/min
21
11
k
04/19/23 AZ 29
Simulation Results
M=5L=100N=M·k
Zipf with=0.0( i-1 )
04/19/23 AZ 30
Recap
The problem is NP-complete PTAS: best possible approximation
bound : best possible absolute bound
Sliding window algorithm: practical with O((M+N)log(M+N)) running time
21
11
k
04/19/23 AZ 31
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the total I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 32
Relational Databases
Objects: indexes, tables, views Multiple disks Minimize the total I/O access time
04/19/23 AZ 33
Past Work
Full striping Split uniformly across all available disks Utilize I/O parallelism
: transfer rate
200MB=0.05s/MB,Tt=10s200MB
04/19/23 AZ 34
=0.05s/MB,Tt=2.5s
Past Work
Full striping Split uniformly across all available disks Utilize I/O parallelism
: transfer rate
200MB =0.05s/MB,Tt=10s50MB50MB50MB
50MB50MB 50MB 50MB
04/19/23 AZ 35
Past Work
Co-accessed objects with Random I/O Seek time/per block size: 0.01s/0.1MB Seek rate: =0.1s/MB Smaller object dominates
50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB
AB
Ts=50·2=10s
04/19/23 AZ 36
Past Work
Combined access time Transfer time: Tt=(50+100)·=7.5s Seek time: Ts=min(50,100)·=10s Combined time: Tt+Ts=17.5s
50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB
AB
04/19/23 AZ 37
Past Work
Fully striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03’]
Combined time: 200·=10s
100MB 100MB200MB 200MB
04/19/23 AZ 38
Data Layout Problem
Work Load (SQL DML) A set of queries and/or updates A set of co-accessed objects (pairwise) Access stats (pairwise) Minimize the estimated I/O access time
04/19/23 AZ 39
Theoretical Questions
Approximation and its hardness Transfer time: P Seek time: Very Hard Combined time
Hard Minimizing transfer time alone is a “good”
approximation
04/19/23 AZ 40
Transfer Time
Heterogeneous disks Different rate: j
Storage constraint: cj
Objects Different size: si
Access frequency: i,i’
Solvable using Linear Programming (LP)
04/19/23 AZ 41
LP
',',',
,',',
,',,',
,
,
,
min
',,
,',,)(
,
,
,,0
iiiiii
jiiii
jjijijii
ji
ji
jiji
ji
T
iitT
jiixxt
jcx
isx
jix
Amount of object i assigned to disk j
Each object must be completely assigned
Each disk’s storage limit is kept
Transfer time for (i,i’) on disk j
Overall transfer time for (i,i’)
Minimize the total transfer time
04/19/23 AZ 42
Seek Time
Hard even on disks with no storage constraint
Integral assignment Each object is assigned to one machine
only Conversion from a fraction assignment
with no loss
04/19/23 AZ 43
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 Want: each file is spread uniformly
across a subset of disks
100MB 100MB200MB 200MB
150MB100MB
A B ABC C
04/19/23 AZ 44
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002+1252
100MB 100MB200MB 200MB
125MB125MB
A B ABC C
04/19/23 AZ 45
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002
100MB 100MB200MB 200MB
125MB125MB250MB
A B ABC C
04/19/23 AZ 46
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 0 Each file resides on only one disk
200MB400MB250MB
100MB 100MB200MB200MB250MB
A B ABC C
04/19/23 AZ 47
Implications
A polynomial time algorithm Equivalent to Minimum Edge Deletion
k-Partition NP-Hard to approximate: O(n2) Forces combined time be hard to
approximate
04/19/23 AZ 48
Combined Time
Let
Hard to approximate: ·, 1>>0 Optimize transfer time alone gives 1+
j
jj
max
)(),min(2)(
))(1(),min(2)(
212121
212121
xxxxxx
xxxxxx
04/19/23 AZ 49
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 50
Load Rebalancing
Access pattern changes Initial layout no longer balanced
2
1
54
3 6
7
8
9
1011
MAX LOAD
04/19/23 AZ 51
Load Rebalancing
Relocate objects Minimize the max load with k moves
2
1
543 6
78
9
1011
MAX LOAD
04/19/23 AZ 52
Simple Algorithm (O(nlogn))
Step 1: Repeat k times Remove the largest object from the most
loaded disk The resulting max load: L(1)
Step2: Relocate the removed k objects Assign each object to the least loaded
disk The resulting max load: L(2)
04/19/23 AZ 53
Example (k=3)
Step1: L(1) OPT
2
1
543 6
78
9
1011
MAX LOAD
9
MAX LOAD
1
6
L(1)
04/19/23 AZ 54
Example (k=3)
Step2: L(2) OPT + S 2OPT Overall: max(L(1),L(2)) 2OPT
2 543
78
1011
91
6
MIN LOAD6 MIN LOAD
1 9
L(2)
04/19/23 AZ 55
Can We Do Better?
Blindly remove the large object is not wise
2
1
543 6
78
9
1011
MAX LOAD
04/19/23 AZ 56
How can we do better
Take care of large objects Large objects: size >1/2OPT Small objects: size 1/2OPT
2
1
543 6
78
9
1011
OPT
04/19/23 AZ 57
Revising The Plan
Step 1: Repeat k times Remove the largest object from the most
loaded disk The resulting max load: L(1) OPT
Step2: Relocate the removed k objects Assign each object to the least loaded
disk The resulting max load: L(2) OPT +S
2OPT
04/19/23 AZ 58
Revised Plan
Step 1: with no more than k moves Shuffle large objects and remove small
objects The resulting max load: L(1) 3/2 OPT
Step2: Relocate the removed objects Assign each object to the least loaded
disk (they are all small) The resulting max load: L(2) OPT +S
3/2 OPT just to fill in the space
04/19/23 AZ 59
Example
Step 1
2
1
543 6
78
9
1011
MAX LOAD
2
1
10 11
3/2 OPT
04/19/23 AZ 60
2
Example
Step 2
543 6
78
9 MIN LOAD
2
1
10 11
11 MIN LOAD10 MIN LOAD
OPT+S
04/19/23 AZ 61
Recap
Fast 1.5-approximation (O(nlogn)) NP-complete PTAS: generalized cost
04/19/23 AZ 62
Summary
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 63
Other Research Interests
Algorithms for mobile, sensor networks and privacy preserving databases
Online Algorithms: queue management, packet switching, web caching, scheduling
Approximation Algorithms: network design, multi-product pricing
Streaming Algorithms