1
MongoDB and Fractal Tree® Indexes
Tim Callaghan*, VP/Engineering, Tokutek
MongoDB Boston 2012
* not [yet] a MongoDB expert
2
B-trees
B-tree Definition
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches,
sequential access, insertions, and deletions in logarithmic time.
http://en.wikipedia.org/wiki/B-tree
B-tree Overview
I will use a simple single-pivot example throughout this presentation
5
Basic B-tree
Internal Nodes - Path to data
Leaf Nodes - Actual Data
Pointers
Pivots
B-tree example
22
10 99
2, 3, 4 10,20 22,25 99
* Pivot Rule is >=
B-tree - insert
22
10 99
2, 3, 4 10,15,20 22,25 99
“Insert 15”
Value stored in leaf node
B-tree - search
22
10 99
2, 3, 4 10,20 22,25 99
“Find 25”
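The “Find 25” walk can be sketched in a few lines of JavaScript. This is a minimal illustration that hard-codes the example tree from these slides; the object layout (`pivot`, `left`, `right`, `keys`) and the `find` helper are assumptions made for the sketch, not MongoDB internals.

```javascript
// The example tree from the slides. Pivot rule is >=: keys greater than or
// equal to a pivot live in the right subtree.
const exampleTree = {
  pivot: 22,
  left: { pivot: 10, left: { keys: [2, 3, 4] }, right: { keys: [10, 20] } },
  right: { pivot: 99, left: { keys: [22, 25] }, right: { keys: [99] } },
};

// Walk from the root to a leaf, recording each pivot that was compared.
function find(node, key, visitedPivots = []) {
  if (node.keys) {
    // Leaf node: the actual data lives here.
    return { found: node.keys.includes(key), visitedPivots };
  }
  visitedPivots.push(node.pivot);
  const child = key >= node.pivot ? node.right : node.left;
  return find(child, key, visitedPivots);
}

const find25 = find(exampleTree, 25);
console.log(find25.found, find25.visitedPivots);
```

Only one leaf is read per lookup, so the cost is dominated by whether that leaf is already in RAM.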
DISK
RAM
RAM
B-tree - storage
22
10 99
2, 3, 4 10,20 22,25 99
Performance is IO-limited once the tree is bigger than RAM: try to fit all internal nodes and some leaf nodes in RAM
DISK
RAM
RAM
B-tree – serial insertions
22
10 99
2, 3, 4 10,20 22,25 99
Serial insertion workloads stay in memory because every new key lands in the rightmost leaf; think MongoDB’s “_id” index
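A rough way to see why serial insertions behave well is to count how many distinct leaves a batch of inserts touches. The sketch below assumes a fully built tree in which each leaf holds 100 consecutive keys; `KEYS_PER_LEAF`, `leafFor`, and the small pseudo-random generator are all hypothetical helpers for illustration.

```javascript
// Assumption: each leaf holds KEYS_PER_LEAF consecutive keys.
const KEYS_PER_LEAF = 100;
const leafFor = (key) => Math.floor(key / KEYS_PER_LEAF);

// Small deterministic pseudo-random generator (an LCG) so runs are repeatable.
function makeRandom(seed) {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Number of different leaves a batch of inserts has to read and write.
function distinctLeavesTouched(keys) {
  return new Set(keys.map(leafFor)).size;
}

const INSERTS = 1000;
const serialKeys = Array.from({ length: INSERTS }, (_, i) => i);
const random = makeRandom(42);
const randomKeys = Array.from({ length: INSERTS }, () => Math.floor(random() * 1000000));

const serialLeaves = distinctLeavesTouched(serialKeys);
const randomLeaves = distinctLeavesTouched(randomKeys);
console.log({ serialLeaves, randomLeaves });
```

The serial batch keeps reusing a handful of leaves, while the random batch touches a different leaf for most inserts, which is where disk IO comes from once the leaves no longer fit in RAM.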
11
Fractal Tree Indexes
Fractal Tree Indexes
Similar to B-trees
• Store data in leaf nodes
• Use PK for ordering
message buffer
message buffer
message buffer
All internal nodes have message buffers
Different than B-trees
• Message buffer in all internal nodes
• Doesn’t need to update leaf node immediately
• Much larger nodes (4MB vs. 8KB*)
13
Fractal Tree Indexes – “insert 15”
22
10 99
2, 3, 4 10, 20 22, 25 99
insert(15)
No IO is required: the message goes into a buffer in an internal node, and all internal nodes usually fit in RAM
14
Fractal Tree Indexes – “find 25”
22
10 99
2, 3, 4 10 22, 25 99
insert(15)
insert(20) insert(25)
delete(3)
15
Fractal Tree Indexes – “insert 8”
22
10 99
2, 3, 4 10 22, 25 99
insert(15)
Buffer is full, push messages down to next level.
insert(20) insert(25)
delete(3)
16
Fractal Tree Indexes – “insert 8”
22
10 99
2, 4, 8 10, 20, 25 22, 25 99
insert(15)
Inserted 8, 20, 25. Deleted 3.
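The buffering idea from the last few slides can be sketched as follows. This is a deliberately simplified model, not Tokutek's implementation: it flattens the example into a single internal node sitting directly above the leaves, and the `BufferedNode` class, its method names, and the buffer capacity of 4 are assumptions made for illustration.

```javascript
// One internal node with a message buffer over sorted leaves (simplified model).
class BufferedNode {
  constructor(pivots, leaves, bufferCapacity) {
    this.pivots = pivots;               // e.g. [10, 22, 99]
    this.leaves = leaves;               // one more leaf than pivots
    this.bufferCapacity = bufferCapacity;
    this.buffer = [];                   // pending { op, key } messages
    this.leavesTouched = 0;             // leaves visited by flushes so far
  }

  // Pivot rule is >=, as in the B-tree slides.
  leafIndexFor(key) {
    let index = 0;
    while (index < this.pivots.length && key >= this.pivots[index]) index++;
    return index;
  }

  // Writes only append a message; leaves are updated when the buffer is full.
  send(op, key) {
    if (this.buffer.length === this.bufferCapacity) this.flush();
    this.buffer.push({ op, key });
  }

  // Push all buffered messages down to the leaves in one batch.
  flush() {
    const touched = new Set();
    for (const { op, key } of this.buffer) {
      const index = this.leafIndexFor(key);
      const leaf = this.leaves[index];
      const position = leaf.indexOf(key);
      if (op === "insert" && position === -1) {
        leaf.push(key);
        leaf.sort((a, b) => a - b);
      } else if (op === "delete" && position !== -1) {
        leaf.splice(position, 1);
      }
      touched.add(index);
    }
    this.leavesTouched += touched.size;
    this.buffer = [];
  }

  // Reads must consult pending messages first; the newest message wins.
  has(key) {
    for (let i = this.buffer.length - 1; i >= 0; i--) {
      if (this.buffer[i].key === key) return this.buffer[i].op === "insert";
    }
    return this.leaves[this.leafIndexFor(key)].includes(key);
  }
}

const node = new BufferedNode([10, 22, 99], [[2, 3, 4], [10], [22, 25], [99]], 4);
node.send("insert", 15);
node.send("insert", 20);
node.send("insert", 25);
node.send("delete", 3);
const sees3BeforeFlush = node.has(3); // delete(3) is still only a message
node.send("insert", 8);               // buffer was full: flush, then buffer insert(8)
console.log(node.leaves, node.buffer, node.leavesTouched);
```

Because this model has a single buffer, the flush also applies insert(15) and leaves insert(8) pending, which differs from the multi-level picture on the slide. The point it shares with the slides is that several messages reach a leaf in one batch instead of one IO per insert.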
17
Fractal Tree Indexes – compression
• Large node size (4MB) leads to high compression ratios.
• Supports zlib, quicklz, and lzma compression algorithms.
• Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.
• Significantly less disk space needed
• Fewer writes, bigger writes
• Both of which are great for SSDs
• Reads are highly compressed, more data per IO
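To get a feel for why node size matters for compression, the toy experiment below compresses the same data once in 8KB pieces and once in 4MB pieces using Node.js's built-in `zlib` module. The records are invented for the sketch, and the result only illustrates the trend with zlib's deflate; it says nothing about the ratios a real Fractal Tree Index achieves.

```javascript
const zlib = require("zlib");

// Hypothetical records, loosely modeled on URI/name documents.
const records = [];
for (let i = 0; i < 4000; i++) {
  records.push(
    JSON.stringify({ URI: `http://example.com/items/${i}`, name: `name-${i}`, origin: "web" })
  );
}
const data = Buffer.from(records.join("\n"));

// Compress the data in independent chunks of nodeSize bytes and add up the sizes.
function compressedSize(buffer, nodeSize) {
  let total = 0;
  for (let offset = 0; offset < buffer.length; offset += nodeSize) {
    total += zlib.deflateSync(buffer.subarray(offset, offset + nodeSize)).length;
  }
  return total;
}

const smallNodes = compressedSize(data, 8 * 1024);         // 8KB nodes
const largeNodes = compressedSize(data, 4 * 1024 * 1024);  // 4MB nodes
console.log({ raw: data.length, smallNodes, largeNodes });
```

Each small chunk is compressed on its own, so the compressor has to rediscover the repeated structure every time; a large node lets it reuse what it has already seen.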
18
So what does this have to do with MongoDB?
* Watch Tyler Brock’s presentation “Indexing and Query Optimization”
20
MongoDB Storage
25
10 99
(2,ptr2), (4,ptr4)
(10,ptr10) (25,ptr25), (98,ptr98)
(101,ptr101)
85
40 120
(2,ptr10), (35,ptr101)
(55,ptr4) (90,ptr2) (2599,ptr98)
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})
PK index (_id + pointer) Secondary Index (foo + pointer)
The “pointer” tells MongoDB where to look in the data files for the actual document data.
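The pointer indirection can be sketched with plain JavaScript data structures. The key/pointer pairs below are copied from the secondary index in the diagram; the `dataFile` map, the document contents, and the `findByFoo` helper are assumptions for illustration, and a linear `find` stands in for the B-tree search.

```javascript
// Data file: pointer -> document (pointers are opaque labels in this sketch).
const dataFile = new Map([
  ["ptr2", { _id: 2, foo: 90 }],
  ["ptr4", { _id: 4, foo: 55 }],
  ["ptr10", { _id: 10, foo: 2 }],
  ["ptr98", { _id: 98, foo: 2599 }],
  ["ptr101", { _id: 101, foo: 35 }],
]);

// Secondary index on foo: sorted (foo, pointer) pairs, as in the diagram.
const secondaryIndex = [
  [2, "ptr10"],
  [35, "ptr101"],
  [55, "ptr4"],
  [90, "ptr2"],
  [2599, "ptr98"],
];

let dataFileReads = 0;

// Step 1: search the index for the key. Step 2: follow the pointer to the document.
function findByFoo(value) {
  const entry = secondaryIndex.find(([key]) => key === value);
  if (!entry) return null;
  dataFileReads++;
  return dataFile.get(entry[1]);
}

const doc = findByFoo(55);
console.log(doc, dataFileReads);
```

Every match costs one extra read in the data files, on top of the index search itself.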
21
MongoDB Storage
Both the PK index and the secondary index shown on the previous slide are B-trees.
22
Who is Tokutek and what have we done?
• Tokutek’s Fractal Tree Index Implementations
– MySQL Storage Engine (TokuDB)
– BerkeleyDB API
– File System (TokuFS)
• Recently added Fractal Tree Indexes to MongoDB 2.2
– Existing indexes are still supported
– Source changes are available via our blog at www.tokutek.com/tokuview
– This is a work in progress (see roadmap slides)
23
MongoDB and Fractal Tree Indexes
as simple as
db.test.ensureIndex({foo:1}, {v:2})
24
Indexing Options #1
db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• Node size (blocksize), defaults to 4MB.
25
Indexing Options #2
db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• Basement node size (basementsize), defaults to 128K.
• Smallest retrievable unit of a leaf node, efficient point queries
26
Indexing Options #3
db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• Compression algorithm, defaults to quicklz.
• Supports quicklz, lzma, zlib, and none.
• LZMA provides 40% additional compression beyond quicklz, needs more CPU.
• Decompression performance of quicklz and lzma is similar.
27
Indexing Options #4
db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• Clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document)
• Always “cover” a query, no need to retrieve the document data
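The difference between a pointer payload and a document payload shows up most clearly on a range query. The sketch below is a toy model under stated assumptions: `buildIndex`, `rangeQuery`, and the sample documents are hypothetical, and an array position stands in for a data-file pointer.

```javascript
// Sample documents; their array position plays the role of the data-file pointer.
const documents = [
  { _id: 1, foo: 30, body: "c" },
  { _id: 2, foo: 10, body: "a" },
  { _id: 3, foo: 20, body: "b" },
  { _id: 4, foo: 40, body: "d" },
];

// A clustering index stores the whole document as the payload;
// a non-clustering index stores only a pointer to it.
function buildIndex(docs, clustering) {
  const entries = docs
    .map((doc, position) => ({ key: doc.foo, payload: clustering ? doc : position }))
    .sort((a, b) => a.key - b.key);
  return { clustering, entries };
}

// Return all documents with foo >= minKey, counting reads outside the index.
function rangeQuery(index, docs, minKey) {
  let dataFileReads = 0;
  const results = [];
  for (const entry of index.entries) {
    if (entry.key < minKey) continue;
    if (index.clustering) {
      results.push(entry.payload);        // document is already in the index
    } else {
      dataFileReads++;                    // one random lookup per match
      results.push(docs[entry.payload]);
    }
  }
  return { results, dataFileReads };
}

const viaPointers = rangeQuery(buildIndex(documents, false), documents, 20);
const viaClustering = rangeQuery(buildIndex(documents, true), documents, 20);
console.log(viaPointers.dataFileReads, viaClustering.dataFileReads);
```

Both queries return the same documents; only the pointer-based index pays one data-file lookup per result.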
28
How well does it perform?
Three Benchmarks
• Benchmark 1: Raw insertion performance
• Benchmark 2: Insertion plus queries
• Benchmark 3: Covered indexes vs. clustering indexes
29
Benchmarks…
Race Results
• First Place = John
• Second Place = Tim
• Third Place = Frank
Frank can say the following: “I finished third, but Tim was second to last.”
Understand benchmark specifics and review all results.
32
Benchmark 1 : Overview
• Measure single-threaded insertion performance
• Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)
• Secondary indexes on URI, name, origin, expiration
• Machine specifics:
– Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
– Ubuntu 10.04 Server (64-bit), ext4 filesystem
– MongoDB v2.2.RC0
33
Benchmark 1 : Without Journaling
34
Benchmark 1 : With Journaling
35
Benchmark 1 : Observations
• Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x without journaling
• Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions
• B-tree performance is great until the working data set > RAM
36
Benchmark 2 : Overview
• Measure single threaded insertion performance while querying for 1000 documents with a URI greater than or equal to a randomly selected value once every 60 seconds
• Document is same as benchmark 1
• Secondary indexes on URI, name, origin, expiration
• Fractal Tree Index on URI is clustering
– clustering indexes store entire document inline
– Compression controls disk usage
– no need to get document data from elsewhere
– db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})
• Same hardware as benchmark 1
37
Benchmark 2 : Insertion Performance
38
Benchmark 2 : Query Latency
39
Benchmark 2 : Observations
• Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing
• Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing
• B-tree performance is great until the working data set > RAM
• Random lookups are bad
...but what about MongoDB’s covered indexes?
40
Benchmark 3 : Overview
• Same workload and hardware as benchmark 2
• Create a MongoDB covered index on URI to eliminate lookups in the data files.
– db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})
41
Benchmark 3 : Insertion Performance
42
Benchmark 3 : Query Latency
43
Benchmark 3 : Observations
• Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing
• Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)
• B-tree performance is great until the working data set > RAM
• MongoDB’s covered indexes can help a lot
– But what happens when I add new fields to my document?
o Do I drop and re-create by including my new field?
o Do I live without it?
– Clustered Fractal Tree Indexes keep on covering your queries!
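The “new field” problem can be made concrete with a simplified coverage check. The rule used here (an index covers a query when every field the query filters on or returns is part of the index) is a simplification of MongoDB's actual covered-query conditions; `isCovered` is a hypothetical helper and `owner` is an invented field name.

```javascript
// Simplified rule: a query is covered when every field it filters on or
// returns is stored in the index.
function isCovered(indexFields, filterFields, returnedFields) {
  const indexed = new Set(indexFields);
  return [...filterFields, ...returnedFields].every((field) => indexed.has(field));
}

// Fields of the covered index used in benchmark 3.
const coveredIndexFields = ["URI", "creation", "name", "origin"];

const beforeNewField = isCovered(coveredIndexFields, ["URI"], ["URI", "name", "origin"]);
// "owner" is a hypothetical field added to the documents later.
const afterNewField = isCovered(coveredIndexFields, ["URI"], ["URI", "name", "owner"]);
console.log(beforeNewField, afterNewField);
```

Once a query needs the new field, the covered index stops covering it until the index is rebuilt, whereas a clustering index carries the whole document and is unaffected.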
44
Roadmap : Continuing the Implementation
• Optimize Indexing Insert/Update/Delete Operations
– Each of our secondary indexes is currently creating and committing a transaction for each operation
– A single transaction envelope will improve performance
45
Roadmap : Continuing the Implementation
• Add Support for Parallel Array Indexes
– MongoDB does not support indexing the following two fields:
o {a: [1, 2], b: [1, 2]}
– “it could get out of hand”
– Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
– Benchmark coming soon…
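One plausible reading of “it could get out of hand” is that a compound index over two array fields needs one entry per combination of values, so the number of entries grows as the product of the array lengths. The sketch below assumes that reading; `compoundKeys` is a hypothetical helper and not how MongoDB generates index keys.

```javascript
// Build every combination of values across the indexed fields (cartesian product).
function compoundKeys(doc, fields) {
  return fields.reduce(
    (keys, field) => {
      const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
      return keys.flatMap((key) => values.map((value) => [...key, value]));
    },
    [[]]
  );
}

const smallDoc = { a: [1, 2], b: [1, 2] };
const smallKeys = compoundKeys(smallDoc, ["a", "b"]);

// Two arrays of 100 values each in a single document.
const hundred = Array.from({ length: 100 }, (_, i) => i);
const largeKeys = compoundKeys({ a: hundred, b: hundred }, ["a", "b"]);
console.log(smallKeys.length, largeKeys.length);
```

The two-by-two example from the slide needs four entries, while a single document with two 100-element arrays would need 10,000.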
46
Roadmap : Continuing the Implementation
• Add Crash Safety
– Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
– MongoDB journal is separate from Fractal Tree Index logs.
– Need to create a transactional envelope around both of them
47
Roadmap : Continuing the Implementation
• Replace MongoDB data store and PK index
– A clustering index on _id eliminates the need for two storage systems
– Compression greatly reduces disk footprint
– This is a large task
49
Questions?
Tim Callaghan [email protected]
@tmcallaghan
More detailed benchmark information in my blogs at www.tokutek.com/tokuview