
DATABASE SYSTEMS
15-721

Lecture #07 – Indexing (OLTP)

Andy Pavlo // Carnegie Mellon University // Spring 2016

TODAY'S AGENDA

Latch Implementations
Modern OLTP Indexes

COMPARE-AND-SWAP

Atomic instruction that compares the contents of a memory location M to a given value V
→ If the values are equal, installs a new given value V' in M
→ Otherwise the operation fails

Example: __sync_bool_compare_and_swap(&M, 20, 30)
→ &M is the address, 20 is the compare value, and 30 is the new value.
→ If M currently holds 20, the CAS succeeds and M becomes 30. A later
  __sync_bool_compare_and_swap(&M, 25, 35) then fails, because M no longer
  holds the compare value, and M is left unchanged.
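The same two outcomes can be sketched with C++11 std::atomic instead of the GCC builtin shown on the slide; the variable name M and the constants 20/30/25/35 come from the slide, everything else is illustrative:

#include <atomic>
#include <cstdio>

int main() {
    std::atomic<int> M(20);

    // Succeeds: M currently holds the compare value 20, so 30 is installed.
    int expected = 20;
    bool ok = M.compare_exchange_strong(expected, 30);
    std::printf("first CAS:  %s, M = %d\n", ok ? "success" : "fail", M.load());

    // Fails: M now holds 30, not 25, so nothing is installed and 'expected'
    // is overwritten with the value that was actually found (30).
    expected = 25;
    ok = M.compare_exchange_strong(expected, 35);
    std::printf("second CAS: %s, M = %d\n", ok ? "success" : "fail", M.load());
    return 0;
}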

LATCH IMPLEMENTATIONS

Blocking OS Mutex
Test-and-Set Spinlock
Queue-based Spinlock
Reader-Writer Locks

Source: Anastasia Ailamaki

LATCH IMPLEMENTATIONS

Choice #1: Blocking OS Mutex
→ Simple to use
→ Non-scalable (about 25ns per lock/unlock invocation)
→ Example: pthread_mutex_t (calls futex)

pthread_mutex_t lock;
⋮
pthread_mutex_lock(&lock);
// Do something special...
pthread_mutex_unlock(&lock);

LATCH IMPLEMENTATIONS

Choice #2: Test-and-Set Spinlock (TAS)
→ Very efficient (single instruction to lock/unlock)
→ Non-scalable, not cache friendly
→ Example: std::atomic_flag

std::atomic_flag lock;
⋮
while (lock.test_and_set(…)) {
  // Yield? Abort? Retry?
}
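A minimal spinlock sketch that fills in the "…" from the snippet above; the class name and the yield-and-retry policy are illustrative choices, not part of the lecture:

#include <atomic>
#include <thread>

class TASLatch {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set() returns the previous value, so the loop exits only
        // for the thread that flipped the flag from clear to set.
        while (flag.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();   // one possible policy: yield, then retry
        }
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};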

LATCH IMPLEMENTATIONS

Choice #3: Queue-based Spinlock (MCS)
→ More efficient than mutex, better cache locality
→ Non-trivial memory management
→ Example: std::atomic_flag

[Diagram: the base lock holds a "next" pointer. Each arriving CPU (CPU1, then
CPU2, then CPU3) appends its own lock node to the queue and spins on that
node, forming a chain Base Lock → CPU1 Lock → CPU2 Lock → ...]
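The slide does not show the MCS lock itself; the following is a rough sketch of the textbook formulation, where each CPU appends a queue node and spins only on its own node (all names are illustrative):

#include <atomic>

struct MCSNode {
    std::atomic<MCSNode*> next{nullptr};
    std::atomic<bool> waiting{false};
};

class MCSLatch {
    std::atomic<MCSNode*> tail{nullptr};   // the "Base Lock" next pointer
public:
    void lock(MCSNode* me) {
        me->next.store(nullptr, std::memory_order_relaxed);
        me->waiting.store(true, std::memory_order_relaxed);
        MCSNode* prev = tail.exchange(me, std::memory_order_acq_rel);
        if (prev != nullptr) {
            prev->next.store(me, std::memory_order_release);   // link behind predecessor
            while (me->waiting.load(std::memory_order_acquire)) { /* spin on own node */ }
        }
    }
    void unlock(MCSNode* me) {
        MCSNode* succ = me->next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            MCSNode* expected = me;
            // No known successor: try to swing the queue back to empty.
            if (tail.compare_exchange_strong(expected, nullptr, std::memory_order_acq_rel))
                return;
            // A successor is mid-way through linking itself; wait for the link.
            while ((succ = me->next.load(std::memory_order_acquire)) == nullptr) { }
        }
        succ->waiting.store(false, std::memory_order_release);  // hand the latch over
    }
};

Note that the caller supplies the queue node and must keep it alive for the whole critical section, which is part of the non-trivial memory management mentioned above.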

LATCH IMPLEMENTATIONS

Choice #4: Reader-Writer Locks
→ Allows for concurrent readers
→ Have to manage read/write queues to avoid starvation
→ Can be implemented on top of spinlocks

[Diagram: a latch with separate read and write counters for active and
waiting threads, all initially =0. Two readers enter (read count =1, then =2);
a later writer cannot proceed and is recorded as waiting (=1).]
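A simple reader-writer latch on top of atomics, as a hedged sketch; it does not implement the waiting queues needed to prevent the starvation the slide calls out:

#include <atomic>

class RWLatch {
    std::atomic<int>  readers{0};    // number of active readers
    std::atomic<bool> writer{false}; // is a writer active?
public:
    void lock_read() {
        for (;;) {
            while (writer.load()) { /* wait for the active writer */ }
            readers.fetch_add(1);
            if (!writer.load()) return;   // no writer slipped in; read latch held
            readers.fetch_sub(1);         // a writer arrived; back off and retry
        }
    }
    void unlock_read() { readers.fetch_sub(1); }

    void lock_write() {
        bool expected = false;
        while (!writer.compare_exchange_weak(expected, true)) expected = false;
        while (readers.load() != 0) { /* wait for active readers to drain */ }
    }
    void unlock_write() { writer.store(false); }
};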

MODERN INDEXES

Bw-Tree (Hekaton)
Concurrent Skip Lists (MemSQL)
ART Index (HyPer)

BW-TREE

Latch-free B+Tree index
→ Threads never need to set latches or block.

Key Idea #1: Deltas
→ No updates in place
→ Reduces cache invalidation.

Key Idea #2: Mapping Table
→ Allows for CAS of physical locations of pages.

THE BW-TREE: A B-TREE FOR NEW HARDWARE, ICDE 2013

BW-TREE: MAPPING TABLE

[Diagram: a Mapping Table with PID → Addr entries for pages 101–104. An index
page references other pages by logical pointers (PIDs such as 102 and 104);
the mapping table translates each PID into the physical pointer to that page.]
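One way to sketch the mapping table: an array of atomic physical pointers indexed by the logical PID, so the location behind a page ID can be swapped with a single CAS. The Page type and the fixed capacity are placeholders, not the paper's layout:

#include <array>
#include <atomic>
#include <cstdint>

struct Page;                  // base page or delta record (placeholder)
using PID = uint64_t;         // logical page identifier

class MappingTable {
    static constexpr size_t kCapacity = 1 << 20;            // fixed capacity for simplicity
    std::array<std::atomic<Page*>, kCapacity> slots{};
public:
    Page* get(PID pid) { return slots[pid].load(std::memory_order_acquire); }

    // Swap the physical pointer for a PID only if nobody changed it first.
    bool cas(PID pid, Page* expected, Page* desired) {
        return slots[pid].compare_exchange_strong(expected, desired,
                                                  std::memory_order_acq_rel);
    }
};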

BW-TREE: DELTA UPDATES

Each update to a page produces a new delta.
→ The delta physically points to the base page.
→ Install the delta's address in the page's physical-address slot of the
  mapping table using CAS.
→ Later updates chain onto the previous delta (e.g., ▲Insert 50, then
  ▲Delete 48 in front of Page 102), forming a delta chain.

Source: Justin Levandoski
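The CAS install step from the slide, as a self-contained sketch; the Page/Delta layouts are invented for illustration:

#include <atomic>

struct Page { };                    // base page or delta record (placeholder)
struct Delta : Page {
    Page* next = nullptr;           // physical pointer to the previous chain head
    // ... payload: the record being inserted/deleted/updated ...
};

// Try to prepend one delta to the chain whose head is stored in 'slot'
// (the page's mapping-table entry). A single CAS either publishes the delta
// or fails because another thread changed the chain first.
bool install_delta(std::atomic<Page*>& slot, Delta* delta) {
    Page* head = slot.load(std::memory_order_acquire);   // current head
    delta->next = head;                                   // delta points at the old head
    return slot.compare_exchange_strong(head, delta,
                                        std::memory_order_acq_rel);
}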

BW-TREE: SEARCH

Traverse the tree like a regular B+Tree.
→ If the mapping table points to a delta chain, stop at the first occurrence
  of the search key.
→ Otherwise, perform a binary search on the base page.
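A sketch of the leaf-level lookup once the mapping-table entry has been resolved; the delta and base-page layouts below are invented for illustration and only handle insert/delete deltas:

#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Key = uint64_t;
using Value = uint64_t;

enum class DeltaType { Insert, Delete };
struct LeafDelta {
    DeltaType type;
    Key key;
    Value value;
    const LeafDelta* next = nullptr;   // nullptr = next is the base page
};
struct BasePage {
    std::vector<std::pair<Key, Value>> records;   // sorted by key
};

std::optional<Value> lookup(const LeafDelta* chain, const BasePage& base, Key key) {
    // Walk the delta chain: the first delta mentioning the key wins.
    for (const LeafDelta* d = chain; d != nullptr; d = d->next) {
        if (d->key == key)
            return d->type == DeltaType::Insert ? std::optional<Value>(d->value)
                                                : std::nullopt;
    }
    // Otherwise binary search the base page.
    auto it = std::lower_bound(base.records.begin(), base.records.end(),
                               std::make_pair(key, Value{0}));
    if (it != base.records.end() && it->first == key) return it->second;
    return std::nullopt;
}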

BW-TREE: CONTENTION UPDATES

Threads may try to install updates to the same state of the page (e.g., one
thread CAS-ing in ▲Delete 48 while another tries ▲Insert 16 on top of the
same ▲Insert 50 delta).
→ The winner succeeds; any losers must retry or abort.
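Continuing the hypothetical install_delta() sketch from the delta-update slide, a losing thread simply re-reads the new chain head and tries again (or gives up and aborts, depending on policy):

// Retry until our delta sits on top of whatever chain head we observe.
void install_with_retry(std::atomic<Page*>& slot, Delta* delta) {
    while (!install_delta(slot, delta)) {
        // CAS failed: another thread won the race. install_delta() re-reads
        // the head on the next iteration, so we just loop.
    }
}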

BW-TREE: DELTA TYPES

Record Update Deltas
→ Insert/Delete/Update of a record on a page

Structure Modification Deltas
→ Split/Merge information

BW-TREE: CONSOLIDATION

Consolidate updates by creating a new page with the deltas applied (e.g., the
chain ▲Insert 55 → ▲Delete 48 → ▲Insert 50 → Page 102 is collapsed into a
single "New 102" page).
→ CAS-ing the mapping table address ensures no deltas are missed.
→ The old page and its deltas are marked as garbage.
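A rough outline of the consolidation step under the same hypothetical types; building the new page and retiring the old one are elided (the two helper functions are placeholders, not real APIs):

// Collapse the chain rooted at the observed head into a fresh page, then try
// to publish it. The CAS only succeeds if the head is still the one we read,
// which is how no concurrently-installed delta can be missed.
bool consolidate(std::atomic<Page*>& slot) {
    Page* observed = slot.load(std::memory_order_acquire);
    Page* fresh = build_consolidated_page(observed);       // apply deltas to base (not shown)
    if (slot.compare_exchange_strong(observed, fresh, std::memory_order_acq_rel)) {
        retire_for_epoch_gc(observed);                      // old page + deltas become garbage (not shown)
        return true;
    }
    return false;   // a new delta arrived first; the caller can retry later
}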

BW-TREE: GARBAGE COLLECTION

Operations are tagged with an epoch.
→ Each epoch tracks the threads that are part of it and the objects that can
  be reclaimed.
→ A thread joins an epoch prior to each operation and posts objects that can
  be reclaimed to the current epoch (not necessarily the one it joined).

Garbage for an epoch is reclaimed only when all threads have exited the epoch.

BW-TREE: GARBAGE COLLECTION

[Diagram: CPU1 and CPU2 register in the Epoch Table while operating on page
102's delta chain. Once both threads have exited the epoch, the superseded
chain (▲Insert 55 → ▲Delete 48 → ▲Insert 50 → Page 102) is reclaimed, leaving
only the consolidated "New 102" page reachable from the mapping table.]
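A minimal epoch-tracking sketch, assuming one global epoch counter and a fixed-size table of per-thread epochs; this is a simplified stand-in for the Bw-Tree's epoch mechanism rather than its actual implementation:

#include <atomic>
#include <cstdint>

constexpr int kMaxThreads = 64;

std::atomic<uint64_t> global_epoch{1};
std::atomic<uint64_t> thread_epoch[kMaxThreads];    // 0 = thread not inside any epoch

struct Retired { void* object; uint64_t epoch; };   // a page/delta posted as garbage

void enter_epoch(int tid) { thread_epoch[tid].store(global_epoch.load()); }
void exit_epoch(int tid)  { thread_epoch[tid].store(0); }

// Garbage posted in epoch e may be reclaimed once no thread is still inside
// an epoch <= e, i.e., everyone who could have seen the old object has left.
bool safe_to_reclaim(const Retired& g) {
    for (int i = 0; i < kMaxThreads; i++) {
        uint64_t e = thread_epoch[i].load();
        if (e != 0 && e <= g.epoch) return false;
    }
    return true;
}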

BW-TREE: STRUCTURE MODIFICATIONS

Page sizes are elastic
→ No hard physical threshold for splitting.
→ This allows the tree to split a page when convenient.

The tree supports "half-split" without latching
→ Install the split at the child level by creating a new page.
→ Install a new separator key and pointer at the parent level.

BW-TREE: STRUCTURE MODIFICATIONS

[Diagram: an inner page (PID 101) has three leaf children (PIDs 102, 103,
104) covering key ranges [∞,3), [3,7), and [7,∞) and holding keys 1 2,
3 4 5 6, and 7 8. To half-split the leaf covering [3,7): (1) create a new
page 105 holding keys 5 6 and register it in the mapping table; (2) CAS a
▲Split delta onto the old leaf so keys 5 6 are logically removed from it;
(3) CAS a ▲Separator delta with range [5,7) onto parent 101 so searches for
those keys are routed to page 105.]

BW-TREE: PERFORMANCE

Throughput in operations/sec (millions):

             Xbox    Synthetic   Deduplication
Bw-Tree      10.4    3.83        2.84
B+Tree       0.56    0.66        0.33
Skip List    4.23    1.02        0.72

Source: Justin Levandoski

CONCURRENT SKIP LIST

Can implement insert and delete without locks using only CAS operations.
Perform lazy deletion of towers.

CONCURRENT MAINTENANCE OF SKIP LISTS, Univ. of Maryland Tech Report, 1990

SKIP LISTS: INSERT

Operation: Insert K5

[Diagram: a skip list whose levels have decreasing density (P=N, P=N/2,
P=N/4). The bottom level stores all key/value pairs (K1–K4, K6); K2 and K4
have taller towers. Inserting K5 builds a tower for K5 (a bottom-level K5/V5
node plus upper-level K5 nodes) and links each node into the corresponding
level between K4 and K6.]
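The per-level CAS that the insert above performs once for each node of the new tower, as a sketch (node layout and names are illustrative):

#include <atomic>
#include <cstdint>

struct SkipNode {
    uint64_t key;
    std::atomic<SkipNode*> next{nullptr};
};

// Link 'node' between 'pred' and 'succ' at one level of the skip list.
// Returns false if another thread changed pred->next first; the caller then
// re-finds the insertion point at this level and retries.
bool link_at_level(SkipNode* pred, SkipNode* succ, SkipNode* node) {
    node->next.store(succ, std::memory_order_relaxed);
    SkipNode* expected = succ;
    return pred->next.compare_exchange_strong(expected, node,
                                              std::memory_order_acq_rel);
}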

SKIP LISTS: DELETE

Operation: Delete K5

[Diagram: every node carries a "Del" flag, initially false. Deleting K5 first
flips the flag on K5's tower to true (lazy deletion); the tower's nodes are
then unlinked from each level, leaving K4 pointing directly to K6.]
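A heavily simplified sketch of the two phases of lazy deletion (the full algorithm needs more care with concurrent inserts than this shows): flip the node's delete flag with a CAS so exactly one deleter wins and searches start skipping it, then unlink the tower's nodes level by level afterwards.

#include <atomic>

struct TowerNode {
    std::atomic<bool> deleted{false};
    std::atomic<TowerNode*> next{nullptr};
};

// Phase 1: logical delete. Only one caller observes the false -> true flip.
bool mark_deleted(TowerNode* node) {
    bool expected = false;
    return node->deleted.compare_exchange_strong(expected, true);
}

// Phase 2 (later, once per level): physically unlink the marked node.
bool unlink(TowerNode* pred, TowerNode* node) {
    TowerNode* expected = node;
    return pred->next.compare_exchange_strong(expected, node->next.load());
}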

SKIP LISTS: REVERSE SEARCH

Txn #1: Find [K4,K2]

[Diagram: a reverse range scan from K4 down to K2. The search descends the
levels comparing the lower bound against tower keys (K2<K5, K2=K2, K2<K4) to
locate K2, then walks forward along the bottom level pushing the visited keys
onto a stack (K2, K3, K4); popping the stack yields the entries in reverse
order.]

Source: Mark Papadakis

ADAPTIVE RADIX TREE (ART)

Uses a digital representation of keys to examine key prefixes one-by-one
instead of comparing the entire key.

Radix tree properties:
→ The height of the tree depends on the length of keys.
→ Does not require rebalancing.
→ The path to a leaf node represents the key of the leaf.
→ Keys are stored implicitly and can be reconstructed from paths.

THE ADAPTIVE RADIX TREE: ARTFUL INDEXING FOR MAIN-MEMORY DATABASES, ICDE 2013

TRIE VS. RADIX TREE

Keys: hello, hat, have

[Diagram: a trie stores one character per node (H → E → L → L → O¤;
H → A → T¤; H → A → V → E¤), while a radix tree collapses single-child chains
into one node per substring (H → ELLO¤; H → A → T¤; H → A → VE¤). "¤" marks
the end of a key.]

ART INDEX: MODIFICATIONS

Operation: Insert hair
→ Adds a new IR¤ node under the existing H → A prefix.

Operation: Delete hat, have
→ Removes the T¤ and VE¤ nodes; the A node is then left with a single child,
  so it is merged with IR into a single AIR¤ node.

Note: The ART index described in 2013 is not latch-free.

ART INDEX: BINARY COMPARABLE KEYS

Not all attribute types can be decomposed into binary comparable digits for a
radix tree.
→ Unsigned Integers: Byte order must be flipped for little-endian machines.
→ Signed Integers: Flip two's-complement so that negative numbers are smaller
  than positive.
→ Floats: Classify into group (neg vs. pos, normalized vs. denormalized),
  then store as unsigned integer.
→ Compound: Transform each attribute separately.
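A sketch of the transformation for signed 64-bit integers on a little-endian machine: flip the sign bit so negatives order below positives, then emit the bytes most-significant first so a byte-wise (radix) comparison matches numeric order. The helper name is invented:

#include <cstdint>

// Produce 8 binary-comparable bytes for a signed 64-bit key: memcmp() order
// on the output equals numeric order on the input.
void to_binary_comparable(int64_t key, uint8_t out[8]) {
    uint64_t u = static_cast<uint64_t>(key) ^ (1ULL << 63);   // flip the sign bit
    for (int i = 0; i < 8; i++)
        out[i] = static_cast<uint8_t>(u >> (56 - 8 * i));      // big-endian byte order
}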

PARTING THOUGHTS

The Bw-Tree is probably the most dank database data structure in recent years.

The Skip List is really easy to implement.

NEXT CLASS

Indexing for OLAP workloads.
→ More from Microsoft Research…

Project #2 Announcement
