TokuMX for Dolphins
Gerardo “Gerry” Narvaja, Technical Services
Who Are We?
• Tokutek builds high-performance database software!
• TokuDB - storage engine for MySQL and MariaDB
• TokuMX – high-performance version of MongoDB
Better Indexing Improves Performance
• TokuDB® and TokuMX® use Fractal Tree® technology:
  – Internal nodes are similar to B-trees, but they also have message buffers
  – Large block size (4M) enables better compression performance and range queries
  – Basement nodes support point queries
    o Default size 128K
  – Optimal I/O utilization:
    o Reads: highly compressed data
    o Writes: aggregation of multiple operations + high compression
B-Tree
• Rule >=
[Diagram: B-tree with root 22, internal pivots 10 and 99, and leaves (2,3,4) (10,20) (22,25) (99)]
• Insert 15
[Diagram: the same tree after the insert, with leaves (2,3,4) (10,15,20) (22,25) (99)]
• In RAM
• Find 25
• From disk
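The insert and lookup drawn on the B-tree slides can be sketched in a few lines. This is only an illustration: it assumes the “>=” rule means that a key greater than or equal to a pivot goes to the right-hand child, the names `Node`, `descend`, `find` and `insert` are made up for this example, and the insert leaves out node splitting.

```python
from bisect import bisect_right, insort


class Node:
    """A B-tree node: sorted keys plus child nodes (no children means leaf)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []


def descend(root, key):
    """Follow pivots from the root to the leaf that should hold `key`."""
    path = [root]
    node = root
    while node.children:
        # bisect_right sends key >= pivot to the right-hand child.
        node = node.children[bisect_right(node.keys, key)]
        path.append(node)
    return path


def find(root, key):
    """True if `key` is stored in the leaf reached by the descent."""
    return key in descend(root, key)[-1].keys


def insert(root, key):
    """Insert into the target leaf; splitting full leaves is left out."""
    insort(descend(root, key)[-1].keys, key)


# The tree drawn on the slides.
root = Node([22], [
    Node([10], [Node([2, 3, 4]), Node([10, 20])]),
    Node([99], [Node([22, 25]), Node([99])]),
])

insert(root, 15)                                   # the "Insert 15" slide
visited = [n.keys for n in descend(root, 25)]      # the "Find 25" slide
print(visited)
```

Every node in `visited` is a node the lookup has to touch. Nodes already in RAM cost nothing extra, while nodes that are not cached have to be read from disk.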
Fractal Tree Indexes
Each node has pivots & Buffers
Buffers fill as updates arrive
Flush a buffer when it fills
A flush might take an I/O, but it does lots of useful work
More changes per write ➔ fewer writes for the same write load ➔ less SSD wear
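A toy model can make the “more changes per write” argument concrete. The sketch below is not the Fractal Tree implementation: it assumes a single root buffer in front of fixed-range leaves, counts one write per leaf touched by a flush, and uses invented names (`BufferedIndex`, `leaf_writes`, `load`).

```python
from collections import defaultdict


class BufferedIndex:
    """Toy two-level tree: one root buffer in front of fixed-range leaves."""

    def __init__(self, buffer_capacity, leaf_width):
        self.buffer_capacity = buffer_capacity
        self.leaf_width = leaf_width      # each leaf covers this many keys
        self.buffer = []                  # pending (key, value) messages
        self.leaves = defaultdict(dict)   # leaf number -> {key: value}
        self.leaf_writes = 0              # stand-in for write I/Os

    def insert(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        """Push every buffered message down; one write per touched leaf."""
        touched = set()
        for key, value in self.buffer:
            leaf = key // self.leaf_width
            self.leaves[leaf][key] = value
            touched.add(leaf)
        self.leaf_writes += len(touched)
        self.buffer.clear()


def load(index, n=1000):
    for i in range(n):
        index.insert((i * 37) % n, i)     # scattered but deterministic keys
    index.flush()                         # drain whatever is still buffered
    return index


# A capacity of 1 behaves like "one write per operation".
unbuffered = load(BufferedIndex(buffer_capacity=1, leaf_width=100))
buffered = load(BufferedIndex(buffer_capacity=100, leaf_width=100))
print(unbuffered.leaf_writes, buffered.leaf_writes)
```

Both indexes end up with the same contents, but the buffered one pays at most one write per leaf per flush instead of one write per insert.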
Fractal Tree Indexes: Queries
Lots of buffers have messages
But query follows root-leaf path
So every query has the most up-to-date information
Messages can be insert, update, delete
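One way to picture why a query still sees current data is to replay the pending messages found on its root-to-leaf path. The sketch below assumes that messages deeper in the tree are older than messages nearer the root, and that an update merges fields into the existing document; the `read` function and the message tuples are hypothetical.

```python
def read(key, leaf, path_buffers):
    """Return the current value of `key`, or None if it does not exist.

    `leaf` is a dict of values already applied at the leaf.
    `path_buffers` lists the message buffers on the root-to-leaf path,
    root first; messages deeper in the tree are assumed to be older.
    """
    value = leaf.get(key)
    for buffer in reversed(path_buffers):      # oldest buffer first
        for op, msg_key, payload in buffer:    # arrival order inside a buffer
            if msg_key != key:
                continue
            if op == "insert":
                value = payload
            elif op == "update" and value is not None:
                value = {**value, **payload}
            elif op == "delete":
                value = None
    return value


leaf = {7: {"foo": 1}}
child_buffer = [("update", 7, {"foo": 2, "bar": 5}), ("insert", 9, {"foo": 10})]
root_buffer = [("delete", 9, None), ("update", 7, {"foo": 3})]

current_7 = read(7, leaf, [root_buffer, child_buffer])
current_9 = read(9, leaf, [root_buffer, child_buffer])
print(current_7, current_9)
```

Only the buffers on the path are consulted, so messages sitting elsewhere in the tree do not slow the query down.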
Gimme, gimme, gimme …
Storage: MongoDB and TokuMX
MongoDB Storage - Overview
• db.test.insert({foo:55})
• db.test.ensureIndex({foo:1})
[Diagram: two B-tree indexes pointing into the memory mapped heap]
PK index (_id + pointer): root 18; pivots 4, 5555; leaves (1,ptr5) (4,ptr1),(12,ptr8) (19,ptr7) (10000,ptr2)
Secondary index (foo + pointer): root 85; pivots 40, 120; leaves (2,ptr5),(22,ptr6) (50,ptr4) (100,ptr7) (222,ptr3)
The “pointer” tells MongoDB where to look in the heap for the document.
MongoDB Storage - Maintenance
[Diagram: the same PK index (_id + pointer) and secondary index (foo + pointer) over the memory mapped heap, partly shaded]
• Shaded areas represent what is in memory.
• db.test.insert({foo:1}) requires IO
TokuMX = MongoDB + Fractal Tree Indexes
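• db.test.insert({foo:55})
• db.test.ensureIndex({foo:1})
[Diagram: two Fractal Tree indexes; the slide also carries the label “memory mapped heap”]
PK index (_id + document): root 18; pivots 4, 5555; leaves (1,doc) (4,doc),(12,doc) (19,doc) (10000,doc)
Secondary index (foo + _id): root 85; pivots 40, 120; leaves (2,4), (22,12) (50,19) (100,10000) (222,1)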
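The difference between the two layouts shows up in how a document is fetched. In this sketch plain dictionaries stand in for the trees, the documents are inferred from the (foo, _id) pairs on the TokuMX slide, and the heap offsets `ptr1` to `ptr5` as well as all function names are made up for illustration.

```python
# Documents implied by the TokuMX slide: secondary entries are (foo, _id).
docs = [
    {"_id": 4, "foo": 2},
    {"_id": 12, "foo": 22},
    {"_id": 19, "foo": 50},
    {"_id": 10000, "foo": 100},
    {"_id": 1, "foo": 222},
]

# MongoDB-style layout: both indexes hold a pointer into a separate heap.
heap = {f"ptr{n}": doc for n, doc in enumerate(docs, start=1)}  # hypothetical offsets
mongo_pk = {doc["_id"]: ptr for ptr, doc in heap.items()}
mongo_foo = {doc["foo"]: ptr for ptr, doc in heap.items()}

# TokuMX-style layout: the PK index stores the document itself,
# and the secondary index stores the _id.
toku_pk = {doc["_id"]: doc for doc in docs}
toku_foo = {doc["foo"]: doc["_id"] for doc in docs}


def mongo_find_by_id(_id):
    return heap[mongo_pk[_id]]         # PK index -> pointer -> heap


def mongo_find_by_foo(foo):
    return heap[mongo_foo[foo]]        # secondary index -> pointer -> heap


def toku_find_by_id(_id):
    return toku_pk[_id]                # PK index already holds the document


def toku_find_by_foo(foo):
    return toku_pk[toku_foo[foo]]      # secondary index -> _id -> PK index


print(toku_find_by_foo(50), mongo_find_by_foo(50))
```

Both layouts return the same document; what changes is which structure has to be visited to reach it.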
TokuMX Storage - Maintenance
[Diagram: the same PK index (_id + document) and secondary index (foo + _id), partly shaded, with the label “insert messages injected”]
• Shaded areas represent what is in memory.
• db.test.insert({foo:1}) does not require IO
Performance - Index Maintenance
• Fractal Tree Indexes are far superior to B-trees for maintaining indexes that are larger than RAM
  – Message buffers delay IO and cache disruption
  – Not just inserts … updates and deletes too
Performance - Inserts
• 100 million inserts into a collection with 3 secondary indexes
Performance - Inserts on indexed arrays
• Indexed insertion: multikey (100 inserts per doc)
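The “100 inserts per doc” figure follows from how an index on an array field works: each array element becomes its own index entry. A minimal sketch, assuming entries of the form (value, _id) and ignoring edge cases such as empty arrays or duplicate elements; `multikey_entries` is a hypothetical helper.

```python
def multikey_entries(doc, field):
    """Index entries (value, _id) that one document adds to an index on `field`."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return [(v, doc["_id"]) for v in values]


array_doc = {"_id": 1, "tags": list(range(100))}
scalar_doc = {"_id": 2, "tags": 7}

array_entries = multikey_entries(array_doc, "tags")
scalar_entries = multikey_entries(scalar_doc, "tags")
print(len(array_entries), len(scalar_entries))
```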
Performance - Replication
• TokuMX replication allows secondary servers to process replication without IO
  – Simply injecting messages into the Fractal Tree Indexes on the secondary server
  – The “Hard Work” was done on the primary
    • Uniqueness checking
    • Transactional locking
  – Elimination of replication lag.
    • Benchmarks to come
• Your secondaries are fully available for read scaling!
Performance - Lock Refinement
• MongoDB originally implemented a global write lock
  – 1 writer at a time
• MongoDB v2.2 moved this lock to the database level
  – 1 writer at a time in each database
• TokuMX performs locking at the document level
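The three locking schemes differ in which lock a write has to hold, and therefore in how many writes can run at once. The sketch below is a simplified model, not how either server implements locking: a write is a (database, _id) pair, and writes that need different locks are assumed to be able to proceed together.

```python
def lock_key(write, granularity):
    """Which lock a write must hold under each scheme from the slide."""
    database, _id = write
    if granularity == "global":
        return "global"
    if granularity == "database":
        return database
    if granularity == "document":
        return (database, _id)
    raise ValueError(granularity)


def max_concurrent_writers(writes, granularity):
    """Writes holding different locks can proceed at the same time."""
    return len({lock_key(w, granularity) for w in writes})


writes = [("shop", 1), ("shop", 2), ("shop", 3), ("logs", 1)]

for granularity in ("global", "database", "document"):
    print(granularity, max_concurrent_writers(writes, granularity))
```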
• Sysbench benchmark (> RAM)
  – lock refinement introduced in v0.1.0
• Sysbench loading (in-memory)
  – lock refinement introduced in v0.1.0
Performance - Clustered Indexes
• In TokuMX, the primary key (_id) is clustered
  – Ordered by _id, co-located with the document
• Lookups by _id require no additional IO to retrieve the document
  – MongoDB must retrieve via memory mapped heap
• Secondary indexes can optionally be created as “clustering”
  – Ordered by secondary index field(s)
  – Additional copy of the document is co-located
  – Lookups using this index also require no additional IO to retrieve the document
  – Good for point lookups, even better for range scans
  – Compression and efficient index maintenance reduce the cons
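A range scan shows what a clustering secondary index saves. In this sketch sorted Python lists stand in for the indexes, the sample documents are invented, and each access to `pk` is counted as one extra lookup that could cost an IO; none of the names come from TokuMX.

```python
from bisect import bisect_left, bisect_right

# Invented sample data: 50 documents with distinct foo values 0..49.
docs = {i: {"_id": i, "foo": (i * 7) % 50, "payload": f"doc-{i}"} for i in range(50)}
pk = docs  # PK index: _id -> document (clustered on _id)

# Plain secondary index: sorted (foo, _id) pairs.
plain = sorted((d["foo"], d["_id"]) for d in docs.values())
# Clustering secondary index: sorted (foo, copy of the document) pairs.
clustering = sorted(((d["foo"], dict(d)) for d in docs.values()), key=lambda e: e[0])


def range_scan_plain(lo, hi):
    keys = [foo for foo, _ in plain]
    start, stop = bisect_left(keys, lo), bisect_right(keys, hi)
    found, lookups = [], 0
    for _, _id in plain[start:stop]:
        found.append(pk[_id])            # one extra PK lookup per match
        lookups += 1
    return found, lookups


def range_scan_clustering(lo, hi):
    keys = [foo for foo, _ in clustering]
    start, stop = bisect_left(keys, lo), bisect_right(keys, hi)
    # Documents are stored inline, so no extra lookups are needed.
    return [doc for _, doc in clustering[start:stop]], 0


plain_docs, plain_lookups = range_scan_plain(10, 19)
clustered_docs, clustered_lookups = range_scan_clustering(10, 19)
print(plain_lookups, clustered_lookups)
```

The price is the additional copy of each document, which is the trade-off the last bullet refers to.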
Performance - Large Block Size
• Data is stored in 64K chunks (basement nodes)
• 4MB of these chunks are compressed, grouped and written as a block
  – * both of these values are user definable
• As a result, range scans perform sequential IO rather than random IO
Performance - Memory Management
• Two approaches to memory management
  – MongoDB = memory-mapped files
    • Operating system determines what data is important
  – TokuMX = managed cache
    • User defined size
    • TokuMX determines what data is important
• Run multiple TokuMX instances on a single server
  – Each has its own fixed cache size
Performance - Reduced IO
• Message based architecture of Fractal Tree Indexes allows several operations per IO
  – Applied when buffer is flushed to leaf nodes
  – MongoDB is 1-to-1
• Reads and writes are highly compressed
  – Big/infrequent writes are flash friendly
• Indexed insertion benchmark
Performance - Shard Migration
– In sharded collections, range queries in TokuMX are optimized thanks to the use of a clustering index for the shard key
– Shard migration between TokuMX servers imposes very low I/O overhead
– This makes low-entropy keys good candidates for sharding
Compression
• MongoDB does not offer compression
  – Workarounds: compressed file systems and shortened field names
• TokuMX easily achieves 5x-10x compression
  – Buy less disk or flash
  – Compressed reads and writes reduce overall IO
• TokuMX supports 3 compression types
  – zlib, quicklz, lzma (size vs. speed)
  – all data is compressed
• Use descriptive field names!
  – They are easy to compress
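The advice about descriptive field names can be checked with a rough experiment using the zlib and lzma modules from the Python standard library (quicklz is not included there, so it is left out). This is only a sketch under loose assumptions: it compresses newline-separated JSON rather than TokuMX blocks, and the field names and document shape are invented.

```python
import json
import lzma
import zlib


def make_docs(id_name, total_name, n=1000):
    """Serialize n small documents that differ only in their field names."""
    docs = [{id_name: i, total_name: i * 7} for i in range(n)]
    return "\n".join(json.dumps(d) for d in docs).encode("utf-8")


short_raw = make_docs("a", "b")
long_raw = make_docs("customer_identifier", "order_total_cents")

sizes = {}
for label, raw in (("short", short_raw), ("long", long_raw)):
    sizes[label] = {
        "raw": len(raw),
        "zlib": len(zlib.compress(raw)),
        "lzma": len(lzma.compress(raw)),
    }

# Extra bytes caused by the longer names, before and after compression.
raw_penalty = sizes["long"]["raw"] - sizes["short"]["raw"]
zlib_penalty = sizes["long"]["zlib"] - sizes["short"]["zlib"]
print(sizes, raw_penalty, zlib_penalty)
```

Uncompressed, the longer names add 34 bytes to every document; after compression the repeated names cost far less than that, because each repetition is encoded as a short back-reference.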
• Chart shows space used for 51 million mostly random documents
• 46GB vs. ~15GB
ACID + MVCC
• ACID
  – In MongoDB, multi-insertion operations allow for partial success
    • Asked to store 5 documents, 3 succeeded
  – We offer “all or nothing” behavior
  – Document level locking
• MVCC
  – In MongoDB, queries can be interrupted by writers.
    • The effects of these writers are visible to the reader
  – TokuMX offers MVCC
    • Reads are consistent as of the operation start
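The difference between partial success and “all or nothing” can be shown with a toy in-memory collection. The sketch below is not how either server is implemented: the `Collection` class is invented, the failure is a duplicate _id, and the rollback simply restores a snapshot.

```python
class DuplicateKey(Exception):
    pass


class Collection:
    """Toy collection keyed by _id, just enough to show both behaviours."""

    def __init__(self):
        self.docs = {}

    def _insert_one(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = doc

    def insert_partial(self, batch):
        """Stop at the first error; earlier documents stay stored."""
        stored = 0
        try:
            for doc in batch:
                self._insert_one(doc)
                stored += 1
        except DuplicateKey:
            pass
        return stored

    def insert_all_or_nothing(self, batch):
        """Insert in place, restoring a snapshot if any insert fails."""
        snapshot = dict(self.docs)
        try:
            for doc in batch:
                self._insert_one(doc)
        except DuplicateKey:
            self.docs = snapshot          # roll back
            return 0
        return len(batch)


# Five documents; the fourth repeats an _id and therefore fails.
batch = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": 1}, {"_id": 5}]

partial = Collection()
partial_count = partial.insert_partial(batch)

atomic = Collection()
atomic_count = atomic.insert_all_or_nothing(batch)
print(partial_count, atomic_count)
```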
Multi-statement Transactions
• TokuMX brings the following to MongoDB
  – db.runCommand({"beginTransaction": 1, "isolation": "mvcc"})
  – ... perform 1 or more operations
  – db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")
• Zardosht has some great blogs
  – http://www.tokutek.com/2013/04/mongodb-transactions-yes/
  – http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
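From application code, the begin / work / commit-or-rollback sequence could be wrapped roughly as follows. This is a sketch under assumptions: `db` is any object with a pymongo-style `command(name, **fields)` method, the server is assumed to accept the command document `{beginTransaction: 1, isolation: "mvcc"}`, and all commands are assumed to travel over the same connection. `run_in_transaction` and `RecordingDb` are hypothetical names; the recording class only exists so the example runs without a server.

```python
def run_in_transaction(db, work, isolation="mvcc"):
    """Run `work(db)` between beginTransaction and commit/rollback.

    `db` is any object with a pymongo-style `command(name, **fields)` method.
    """
    db.command("beginTransaction", isolation=isolation)
    try:
        result = work(db)
    except Exception:
        db.command("rollbackTransaction")   # undo everything on failure
        raise
    db.command("commitTransaction")
    return result


class RecordingDb:
    """Stand-in for a database handle that only records the commands sent."""

    def __init__(self):
        self.sent = []

    def command(self, name, **fields):
        self.sent.append({name: 1, **fields})


ok = RecordingDb()
outcome = run_in_transaction(ok, lambda db: "done")

failed = RecordingDb()
try:
    run_in_transaction(failed, lambda db: 1 / 0)
except ZeroDivisionError:
    pass

print(ok.sent, failed.sent)
```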
New v1.0.3 - Today!
• MongoDB to TokuMX migration tool
  – mongo2toku
  – Reads and replays vanilla MongoDB replication
  – Allows TokuMX to sync from vanilla MongoDB
• Leif has a great blog explaining the process
  – http://www.tokutek.com/2013/07/tokumx-1-0-3-seamless-migrations-from-mongodb/
Open Source Resources
• Repository in GitHub
  – https://github.com/tokutek
• Google Groups
  – http://groups.google.com
  – tokumx-user: community users and support
  – tokumx-dev: contributors
• IRC
  – #tokutek
We’ll help you to find solutions …
Time for Hands On …
Contact Information
• Web site
  – http://tokutek.com
• IRC
  – #tokutek
• Google Groups
  – http://groups.google.com
  – tokumx-user, tokumx-dev
• GitHub
  – https://github.com/Tokutek
• Twitter
  – @tokutek, @seattlegaucho
• Email
  – [email protected]
Thank You!
We’re Hiring!
Looking for Quality Assurance and Support Ninjas!
* Boston Area