Upload
atner-yegorov
View
346
Download
0
Tags:
Embed Size (px)
Citation preview
Fractal Tree® Technology OverviewThe Art of Indexing
Martín Farach-ColtonCo-founder & Chief Technology Officer
The Art of Indexing®
Not all indexing is the same
B-tree is the basis for almost all DB systems• Data structure invented in 1972
• Has not kept up with hardware trends‣ Works poorly on modern rotational disks‣ Works poorly on SSD
Fractal Tree Indexes is the basis of TokuDB• Scales with hardware
• Fast Indexing ➔ More Indexing ➔ Faster Queries
• Great Compression
• No Fragmentation
• Reduced wear on SSDs
How do Fractal Tree Indexes outperform
B-trees?
How do Fractal Tree Indexes outperform
B-trees?
First, some facts about storage systems
The Art of Indexing®
Storage is quirky
Hard disks are slow for random I/O but fast for
sequential I/O
Difference causes problems like fragmentation, ...
The Art of Indexing®
Storage is quirky
Hard disks are slow for random I/O but fast for
sequential I/O
SSDs are fast for random I/O but expensive for
sequential.
Garbage collection causes artefacts: increased
wear, write cliffs...
Difference causes problems like fragmentation, ...
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No Fragmentation
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No Fragmentation SSDs: Less Garbage
Collection & Wear
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No Fragmentation SSDs: Less Garbage
Collection & Wear
But you only get the goodies if each I/O has lots of new bytes
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No Fragmentation SSDs: Less Garbage
Collection & Wear
But you only get the goodies if each I/O has lots of new bytes
If you read a big block, change one byte, then write it, you get terrible wear problems
The Art of Indexing®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No Fragmentation SSDs: Less Garbage
Collection & Wear
Both: Better Compression
But you only get the goodies if each I/O has lots of new bytes
If you read a big block, change one byte, then write it, you get terrible wear problems
The Art of Indexing®
Caching is great• But you can’t cache all your data
• For stuff not in memory, you have to go to disk
Goal: Do the best we can for the stuff on disk
Data is big, RAM is Small
Data
RAM
Now, What’s a B-tree? & a Fractal Tree Index
The Art of Indexing®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Search Tree
The Art of Indexing®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
The Art of Indexing®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group saves I/Os
The Art of Indexing®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group saves I/Os
Most leaves are on disk, not in RAM
The Art of Indexing®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group saves I/Os
Most leaves are on disk, not in RAM Inserting into a leaf that’s not in memory will require an I/O per insertion
The Art of Indexing®
B-tree Delivery Service
If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York
The Art of Indexing®
B-tree Delivery Service
If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York to St Louis
The Art of Indexing®
B-tree Delivery Service
If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York to St Louis
Each item gets its own round trip!
The Art of Indexing®
Real-world delivery
Keep regional warehouses• Only move stuff when you can move a lot
The Art of Indexing®
Real-world delivery
Keep regional warehouses• Only move stuff when you can move a lot
Each item gets moved several times
The Art of Indexing®
Real-world delivery
Keep regional warehouses• Only move stuff when you can move a lot
Each item gets moved several times
But each trip is vastly cheaper
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
Flush a buffer when it fills
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
Flush a buffer when it fills
A flush might take an I/O, but it does lots of useful work
The Art of Indexing®
Fractal Tree Indexes
B
B B BB
Each node has pivots & Buffers
Buffers fill as updates arrive
Flush a buffer when it fills
A flush might take an I/O, but it does lots of useful work
More changes per write ➔ fewer changes for same write
load ➔ less SSD wear
The Art of Indexing®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have messages
The Art of Indexing®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have messages
But query follows root-leaf path
The Art of Indexing®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have messages
But query follows root-leaf path
So every query has the most up-to-date information
The Art of Indexing®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have messages
But query follows root-leaf path
So every query has the most up-to-date information
Messages can be insert, update, delete
The Art of Indexing®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are broadcast messages
The Art of Indexing®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are broadcast messages
The Art of Indexing®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are broadcast messages
The Art of Indexing®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are broadcast messages
Insertion into root is fast, changes are immediate
The Art of Indexing®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are broadcast messages
Leaves get rewritten when broadcast message
makes it that far
Insertion into root is fast, changes are immediate
Schema change message still on root-
leaf path
The Art of Indexing®
Analysis
Delivery system gives goodies:• Messages get moved, but each I/O pays for a lot of
movement
• You get very fast inserts
• You get Hot Schema Changes
Each flush carries lots of useful information• So it’s worth it to make nodes big
• No fragmentation, Better Compression
• Much better wear on SSDs