Online Bigtable merge compaction. Claire Mathieu (CNRS Paris), Carl Staelin (Google Haifa), Neal E. Young¹ (UC Riverside), Arman Yousefi (UCLA). Northeastern University, September 17, 2015. ¹Funded by faculty re$earch award.

Online Bigtable merge compaction - UCRneal/Slides/bigtable_merge_compaction.pdf · 2015-09-22 · BIGTABLE — data storage at Google Maps, Search/Crawl, Gmail ...use BIGTABLE to


Online Bigtable merge compaction
(work in progress)

Claire Mathieu, CNRS Paris
Carl Staelin, Google Haifa (instigator)
Neal E. Young¹, UC Riverside (me)
Arman Yousefi, UCLA (my student)

Northeastern University (you are here), September 17, 2015 (this is now)

¹Funded by faculty re$earch award

BIGTABLE: data storage at Google

Google Maps, Search/Crawl, Gmail, ... use BIGTABLE to store data.

- 24,500 Bigtable servers
- 1.2 million requests per second
- 16 GB/s of outgoing RPC traffic
- over a petabyte of data just for Google Crawl and Analytics
- these figures are from 2006

Similar to other "NoSQL" databases: Accumulo, AsterixDB, Cassandra, HBase, Hypertable, Spanner, ...

Used by Adobe, eBay, Facebook, GitHub, Meetup, Netflix, Twitter, ...

"Log-structured merge tree" architecture, for high-volume, highly reliable, distributed, real-time data storage.

BIGTABLE: implements dictionary data type

Operations supported by a Bigtable instance:

- write(key, value)
- read(key): return most recent value written for key
- ... there's more, but not today ...

BIGTABLE: writes and flushes

write(key, value):

1. Store key/value pair in cache (e.g. hash table in RAM).

Environment periodically forces flush of cache to new immutable disk file.

Example

write(1, a); write(2, b); write(3, c); write(4, d); flush();
write(5, e); write(6, f); write(7, g); flush();
write(8, h); write(9, i); flush();

Each write adds its pair to the cache. Each flush empties the cache and appends its contents to the file sequence as one new file. After the three flushes:

cache: (empty)

file sequence:
  (1, a) (2, b) (3, c) (4, d)    from 1st flush
  (5, e) (6, f) (7, g)           from 2nd flush
  (8, h) (9, i)                  from 3rd flush

Environment forces flushes at arbitrary times.
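The write path above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `TinyStore`, the use of a `dict` for the cache, and the choice to skip flushes of an empty cache are all illustrative and not taken from Bigtable itself.

```python
class TinyStore:
    """Toy model of the write path: an in-memory cache plus immutable files."""

    def __init__(self):
        self.cache = {}   # key -> value, held in RAM (hypothetical representation)
        self.files = []   # oldest first; each file is a read-only tuple of pairs

    def write(self, key, value):
        # Step 1 of write(key, value): store the pair in the cache.
        self.cache[key] = value

    def flush(self):
        # Forced by the environment: the cache contents become one new file.
        # Assumption: flushing an empty cache creates no file.
        if self.cache:
            self.files.append(tuple(sorted(self.cache.items())))
            self.cache = {}


store = TinyStore()
for key, value in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    store.write(key, value)
store.flush()
for key, value in [(5, "e"), (6, "f"), (7, "g")]:
    store.write(key, value)
store.flush()
for key, value in [(8, "h"), (9, "i")]:
    store.write(key, value)
store.flush()

print(store.cache)        # {}
print(len(store.files))   # 3
```

Replaying the slide's nine writes and three flushes leaves an empty cache and three files, one per flush.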

BIGTABLE: reads and compactions

cache: (empty)

file sequence before compaction:
  (1, a) (2, b) (3, c) (4, d)    from 1st flush
  (5, e) (6, f) (7, g)           from 2nd flush
  (8, h) (9, i)                  from 3rd flush

read(key):

1. Check cache for key.
2. If not found, check files (most recent first).    cost = O(#files)

compaction(): asynchronous background process, to reduce read costs

Periodically select files to merge.    cost = O(SIZE of merged files) !!

file sequence after compaction:
  (1, a) (2, b) (3, c) (4, d)           from 1st flush
  (5, e) (6, f) (7, g) (8, h) (9, i)    merge of 2nd and 3rd

goals: (i) keep read costs low
       (ii) keep compaction costs low

constraint: each merge must merge a contiguous subsequence of files

Bigtable Merge Compaction (bmc): formal definition

given: Sequence $x_1, x_2, \ldots, x_n$, where $x_t$ is the size of the file resulting from flush $t$.
       Integer $k > 0$ (tuned to workload; typically 3–40).

choose: Compactions. Ensure number of files never exceeds $k$.

objective: Minimize total compaction cost.

If $k = \infty$, problem is easy: never merge.

after flush 1, 2, 3, 4, ...: all files are kept, nothing is merged

Total compaction cost = 0.

If $k = 1$, problem is easy: must merge everything each time.

after flush 1: (single file, nothing to merge)
after flush 2: too many files! compaction cost $x_1 + x_2$
after flush 3: too many files! compaction cost $x_1 + x_2 + x_3$
...
after flush n: compaction cost $x_1 + \cdots + x_n$

Total compaction cost $\sum_{i=2}^{n} (x_1 + x_2 + \cdots + x_i) \approx \sum_{i=1}^{n} (n - i + 1)\, x_i$.
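The k = 1 total can be checked numerically. The short sketch below (function names are illustrative) computes the exact cost as a sum of prefix sums and compares it with the weighted form $\sum_i (n - i + 1)\, x_i$. The two differ only by $x_1$, because flush 1 triggers no merge.

```python
from itertools import accumulate


def cost_k1(xs):
    """Total compaction cost for k = 1: after each flush t >= 2, merge everything."""
    prefix = list(accumulate(xs))   # prefix[i-1] = x_1 + ... + x_i
    return sum(prefix[1:])          # flushes 2..n each cost the full prefix sum


def weighted_form(xs):
    """The approximation: sum over i of (n - i + 1) * x_i."""
    n = len(xs)
    return sum((n - i + 1) * x for i, x in enumerate(xs, start=1))


xs = [3, 1, 4, 1, 5]
print(cost_k1(xs), weighted_form(xs))   # 35 38
```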

Google's default compaction algorithm:

Merge minimal suffix so as to maintain (i) #files ≤ k and (ii) each file's size exceeds total size of files to the right.

Example: k = 2, on uniform input x = 1, 1, 1, ...:

[Figure: the file sequence after flushes 1, 2, 3, 4, 5, ..., with a bracket labeled n/2.]

Total compaction cost = $\Theta(n^2)$.

For general k, cost is $\Theta(n^2 / 3^{k-1})$.
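One way to experiment with this rule is the small simulation below. It is a sketch based only on the one-line description above, not Google's implementation. After each flush it merges the shortest suffix (possibly none) that restores both invariants, and it charges each merge the size of the resulting file. The function names are hypothetical and file sizes are assumed to be positive.

```python
def invariants_hold(files, k):
    """(i) at most k files; (ii) each file is larger than everything to its right."""
    if len(files) > k:
        return False
    return all(files[i] > sum(files[i + 1:]) for i in range(len(files)))


def default_compaction_cost(xs, k):
    """Total merge cost of the suffix-merging rule on flush sizes xs (all > 0)."""
    files, total = [], 0
    for x in xs:
        files.append(x)
        # m = number of trailing files merged into one; m = 1 means "no merge".
        for m in range(1, len(files) + 1):
            candidate = files[:len(files) - m] + [sum(files[-m:])]
            if invariants_hold(candidate, k):
                if m > 1:
                    total += candidate[-1]   # cost = size of the merged file
                files = candidate
                break
    return total


print(default_compaction_cost([1] * 8, 2))   # 19
for n in (100, 200, 400):
    print(n, default_compaction_cost([1] * n, 2))
```

On uniform input with k = 2, doubling n should roughly quadruple the printed cost, in line with the quadratic growth stated above.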

OPTIMAL solution for k = 2, uniform x = 1, 1, 1, ...

[Figure: the file sequence after flushes 1, 2, 3, 4, ..., with a bracket labeled $\sqrt{n}$.]

"big" merges: $O(\sqrt{n})$ of them, each of size $O(n)$

"small" merges: $O(n)$ of them, each of size $O(\sqrt{n})$

Total compaction cost = $O(n^{3/2})$.

For general k, opt cost is $\Theta(k\, n^{1+1/k})$.
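The big/small merge counts can be reproduced with a simple periodic schedule for k = 2 on unit-size flushes. It merges every flush into the small file, and every `period` flushes merges everything into the big file. This is a sketch of a schedule with the stated $O(n^{3/2})$ shape, not necessarily the exact optimal one. The function name and the fixed period are assumptions.

```python
import math


def periodic_schedule_cost(n, period):
    """Cost of a two-file schedule on n unit-size flushes (k = 2).

    Every `period`-th flush triggers a big merge of all files; every other
    flush is absorbed into the small file by a small merge.
    """
    if period < 2:
        raise ValueError("period must be at least 2")
    big = small = 0      # sizes of the older and newer file (0 = file absent)
    total = 0
    for t in range(1, n + 1):
        if t % period == 0:
            big += small + 1     # big merge: old big file + small file + new flush
            small = 0
            total += big
        elif small == 0:
            small = 1            # new flush becomes the small file, no merge
        else:
            small += 1           # small merge: small file + new flush
            total += small
    return total


n = 100
print(periodic_schedule_cost(n, math.isqrt(n)))   # 990
```

With period $\sqrt{n}$ and $n$ a perfect square, the total works out to $n^{3/2} - \sqrt{n}$, here 1000 - 10 = 990.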

Definition: c-competitive online algorithm

A compaction algorithm is c-competitive if, on any input (k, x), its solution costs at most c times the optimal cost.

A compaction algorithm is online if its choice of merge after flush t depends only on k and $x_1, x_2, \ldots, x_t$ (the files flushed so far).

- Default's cost can be n times opt cost (for any k).
- So default is no better than n-competitive.

→ May have high compaction cost even for "easy" inputs.

Theorem 1. There is a k-competitive online algorithm for bmc. (today)

Theorem 2. No deterministic online algorithm is less than k-competitive.

Idea behind 2-competitive online algorithm (for k = 2)...

Q: At each step, do "big" merge or small merge?

A: Do big merge when cost C of previous big merge ≈ total cost of small merges since then.

[Figure: timeline from step s (previous big merge, cost C) to step t (next big merge); alg. cost during interval is 2C.]

Why 2-competitive? Focus on a time interval between two big merges.

case 1 (during this interval, opt does a big merge):
  Opt's cost for big merge during interval is at least C.

case 2 (during this interval, opt does no big merge):
  Opt's cost for small merges during interval is at least C.

In both cases opt pays at least C during the interval while the algorithm pays about 2C, so the ratio is about 2.
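The rule can be made concrete with the sketch below for k = 2. It is one possible reading of the slide, not the authors' exact algorithm. The threshold test (`>=` on the running small-merge cost, counting the merge about to be made) and the name `rent_or_buy_cost_k2` are assumptions.

```python
def rent_or_buy_cost_k2(xs):
    """Total merge cost of a rent-or-buy style rule with at most two files."""
    big = small = 0          # sizes of the older and newer file (0 = file absent)
    last_big_cost = 0        # C: cost of the previous big merge
    small_cost = 0           # total cost of small merges since that big merge
    total = 0
    for x in xs:
        if big == 0:
            big = x          # first file, nothing to merge
        elif small == 0:
            small = x        # second file, still within k = 2
        elif small_cost + small + x >= last_big_cost:
            # Small merges have (about) caught up with C: do a big merge.
            big += small + x
            small = 0
            last_big_cost = big
            small_cost = 0
            total += big
        else:
            # Small merge: absorb the new flush into the small file.
            small += x
            small_cost += small
            total += small
    return total


print(rent_or_buy_cost_k2([1] * 10))   # 26
```

On ten unit-size flushes this does big merges after flushes 3, 6 and 10 (costs 3, 6 and 10) and small merges in between.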

Idea behind k-competitive online algorithm for general k

Idea: Do big merge, then recurse with k = k − 1.

Q: When to do next big merge?

A: When cost of previous big merge ≈ (cost for recursion)/(k − 1).

[Figure: the steps after the big merge, labeled "Recurse with k = k − 1 to handle this part."]

"Balanced rent-or-buy algorithm (brb)"

Recap of analyses in worst-case model

Bigtable default is at best n-competitive...

Theorem 1. Brb is a k-competitive online algorithm for bmc. (today)

Theorem 2. No deterministic online algorithm is less than k-competitive.

What about "typical" inputs?

Preliminary benchmarks (one example with k = 5)

[Figure: two plots of cost per step against n. Left panel: Default, BRB, and Optimal for n up to 2000. Right panel: Default and BRB for n up to 8e+04.]

The $x_t$'s are i.i.d. from a log-normal distribution.

Conjectures

1. Brb and Opt cost per time step $\sim x\, k\, n^{1/k} / e$.

2. Default cost per time step $\sim x\, n / (2 \cdot 3^{k-1})$.
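Taking the two conjectured expressions at face value (they are asymptotic conjectures, so this is only a rough illustration), dividing the second by the first shows how the gap between Default and Brb would grow with n. The factor $x$ cancels.

```latex
\[
  \frac{x\, n / (2 \cdot 3^{k-1})}{x\, k\, n^{1/k} / e}
  \;=\; \frac{e\, n^{\,1 - 1/k}}{2k \cdot 3^{k-1}} .
\]
% For k = 5 the denominator is 2 * 5 * 81 = 810, so the ratio equals 1 when
% n^{4/5} = 810 / e (about 298), that is, when n is roughly 1,200.
\[
  k = 5: \qquad \frac{e\, n^{4/5}}{810} = 1
  \iff n = \left(\frac{810}{e}\right)^{5/4} \approx 1.2 \times 10^{3} .
\]
```

Under these assumptions, for k = 5 the conjectured Default cost overtakes the conjectured Brb cost once n passes about 1,200, and the ratio then grows like $n^{4/5}$.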

Lots of work in progress

theoretical:

- average-case analyses: absolute and relative costs on i.i.d. inputs
- randomized online algorithms (o(k)-competitive?)
- optimal compaction schedules ≡ optimal binary search trees

practical:

- realistic testing... on AsterixDB, then at Google

problem variants:

- allow expiration/deletion of key/value pairs (done)
- allowing k to vary: bmc w/ read costs... (open!)

Working paper available on arxiv.org

(Search web for "bigtable merge compaction".)

Bmc with read costs (geometric interpretation)

given: Staircase step-lengths and step-heights $(x_1, y_1), (x_2, y_2), \ldots$

do: Partition region below staircase into axis-parallel rectangles.

objective: Minimize the sum of the widths and heights of the rectangles.

[Figure: staircase with seven steps, step-lengths $x_1, \ldots, x_7$ and step-heights $y_1, \ldots, y_7$.]

open problem: is there an O(1)-competitive online algorithm?


Thank you

A geometric interpretation of bmc

given: Uneven staircase with step-lengths $x_1, x_2, \ldots, x_n$. Integer $k > 0$.

do: Partition region below staircase into axis-parallel rectangles, so no row has more than k rectangles.

objective: Minimize the sum of the widths of the rectangles.

[Figure: input, an uneven staircase with 10 steps of lengths $x_1, \ldots, x_{10}$; k = 2.]

[Figure: a solution for this input.]

[Figure: not a solution. This partition is cheaper... but not valid for k = 2.]