TokuMX for Dolphins
Gerardo “Gerry” Narvaja, Technical Services

Gerardo “Gerry” Narvaja Technical Servicesfiles.meetup.com/107604/TokuMX 20130724.pdf · • TokuDB - storage engine for MySQL and MariaDB • TokuMX – high performance version



Who Are We?


•  Tokutek builds high-performance database software!

•  TokuDB - storage engine for MySQL and MariaDB

•  TokuMX – high performance version of MongoDB


Better Indexing Improves Performance

•  TokuDB® and TokuMX® use Fractal Tree® technology:
   –  Internal nodes are similar to B-tree nodes, but they also have message buffers
   –  Large block size (4MB) enables better compression performance and faster range queries
   –  Basement nodes support point queries
      o  Default size 128K
   –  Optimal I/O utilization:
      o  Reads: highly compressed data
      o  Writes: aggregation of multiple operations + high compression


B-Tree

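•  Rule >=: a key equal to a pivot goes to the child on the pivot's right

[Figure: B-tree with root pivots 10, 22, 99 and leaves (2,3,4) (10,20) (22,25) (99)]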


B-Tree

•  Insert 15

[Figure: 15 lands in the leaf to the right of pivot 10; leaves become (2,3,4) (10,15,20) (22,25) (99)]


B-Tree

•  In RAM

[Figure: the same B-tree (root pivots 10, 22, 99; leaves (2,3,4) (10,15,20) (22,25) (99)), shaded to show the nodes held in RAM]


B-Tree

•  Find 25

[Figure: the search compares 25 with the root pivots (25 >= 22, 25 < 99) and descends to the leaf (22,25)]


B-Tree

•  From disk

[Figure: if the leaf (22,25) is not in RAM, it has to be read from disk]
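The B-tree walkthrough above (insert 15, find 25) can be sketched in a few lines of JavaScript, the language of the mongo shell commands used later in this deck. This is a minimal sketch assuming a single root over four leaves with the slide's values and the “>=” rule; node splitting and disk pages are left out, and the function names are hypothetical.

```javascript
// Minimal two-level B-tree sketch (hypothetical; no node splitting).
// Pivot rule ">=": a key equal to a pivot belongs to the child on its right.
const tree = {
  pivots: [10, 22, 99],
  leaves: [[2, 3, 4], [10, 20], [22, 25], [99]],
};

// Return the index of the child whose key range contains `key`.
function childIndex(pivots, key) {
  let i = 0;
  while (i < pivots.length && key >= pivots[i]) i++;
  return i;
}

// Point query: descend from the root to exactly one leaf.
function find(tree, key) {
  const leaf = childIndex(tree.pivots, key);
  return { leaf, found: tree.leaves[leaf].includes(key) };
}

// Insert: descend to the leaf and keep it sorted. In a B-tree larger than
// RAM, touching that leaf is what may cost a disk read.
function insert(tree, key) {
  const leaf = tree.leaves[childIndex(tree.pivots, key)];
  leaf.push(key);
  leaf.sort((a, b) => a - b);
}

insert(tree, 15);
console.log(tree.leaves[1]);  // [ 10, 15, 20 ]
console.log(find(tree, 25));  // { leaf: 2, found: true }
```

Every insert and every lookup ends at a leaf; when that leaf is not cached, the operation pays for a disk read, which is the cost the Fractal Tree slides address.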


Fractal Tree Indexes

Each node has pivots & buffers

Buffers fill as updates arrive


Flush a buffer when it fills

A flush might take an I/O, but it does lots of useful work

More changes per write ➔ fewer writes for the same workload ➔ less SSD wear
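A toy model of the buffer-and-flush idea: a single internal node keeps one message buffer per child and only touches a leaf when that child's buffer is full. The class name, the buffer capacity of 4 and counting one flush as one leaf write are assumptions for illustration; real Fractal Tree nodes have many levels and much larger buffers.

```javascript
// Hypothetical sketch of Fractal Tree-style buffering: one internal node with
// a message buffer per child. Capacity and names are made up for illustration.
class BufferedNode {
  constructor(pivots, bufferCapacity) {
    this.pivots = pivots;                 // ">=" rule, as in the B-tree slides
    this.capacity = bufferCapacity;
    const children = pivots.length + 1;
    this.buffers = Array.from({ length: children }, () => []);
    this.leaves = Array.from({ length: children }, () => new Set());
    this.flushes = 0;                     // each flush ~ one leaf write
  }

  childIndex(key) {
    let i = 0;
    while (i < this.pivots.length && key >= this.pivots[i]) i++;
    return i;
  }

  // An insert only appends a message to an in-memory buffer.
  insert(key) {
    const i = this.childIndex(key);
    this.buffers[i].push({ type: "insert", key });
    if (this.buffers[i].length >= this.capacity) this.flush(i);
  }

  // A flush applies every buffered message for one child in a single batch.
  flush(i) {
    for (const msg of this.buffers[i]) this.leaves[i].add(msg.key);
    this.buffers[i] = [];
    this.flushes++;
  }
}

const node = new BufferedNode([100], 4);
for (let k = 0; k < 8; k++) node.insert(k);     // 8 inserts for child 0
for (let k = 100; k < 104; k++) node.insert(k); // 4 inserts for child 1
console.log(node.flushes); // 3
```

Twelve inserts cost three leaf writes here, where a B-tree would modify a leaf on each of the twelve inserts; that ratio is what “more changes per write” refers to.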


Fractal Tree Indexes: Queries

Lots of buffers have messages

But a query follows the root-to-leaf path, picking up the pending messages for its key

So every query has the most up-to-date information

Messages can be insert, update, delete
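How a query still sees buffered changes can be sketched like this, assuming one leaf and a single buffer on its root-to-leaf path holding the three message types from the slide. Updates are simplified to “set the value”, and the names are hypothetical.

```javascript
// Hypothetical sketch: a point query combines a leaf with the messages still
// buffered on its root-to-leaf path.
const leaf = new Map([["a", 1], ["b", 2]]); // data already flushed to the leaf

// Pending messages for this path, oldest first. In a real tree, messages deeper
// on the path are older, so they would be applied bottom-up.
const pendingMessages = [
  { type: "insert", key: "c", value: 3 },
  { type: "update", key: "a", value: 10 },
  { type: "delete", key: "b" },
];

function query(key) {
  let value = leaf.get(key); // undefined if the leaf has no entry
  for (const msg of pendingMessages) {
    if (msg.key !== key) continue;
    // insert and update both set the value in this simplified model
    value = msg.type === "delete" ? undefined : msg.value;
  }
  return value;
}

console.log(query("a")); // 10
console.log(query("b")); // undefined
console.log(query("c")); // 3
```

The leaf itself still holds the old values; the query result is correct because every message that could affect the key sits on the path the query walks anyway.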


Gimme, gimme, gimme …


Storage: MongoDB and TokuMX


MongoDB Storage - Overview

[Figure: two B-tree indexes pointing into a memory-mapped heap.
PK index (_id + pointer): root pivot 18, inner pivots 4 and 5555, leaf entries (1,ptr5) (4,ptr1) (12,ptr8) (19,ptr7) (10000,ptr2).
Secondary index (foo + pointer): root pivot 85, inner pivots 40 and 120, leaf entries (2,ptr5) (22,ptr6) (50,ptr4) (100,ptr7) (222,ptr3).
Example operations: db.test.insert({foo:55}), db.test.ensureIndex({foo:1})]

The “pointer” tells MongoDB where to look in the heap for the document.


MongoDB Storage - Maintenance

[Figure: the MongoDB PK index (_id + pointer) and secondary index (foo + pointer) over the memory-mapped heap, with only some nodes in memory]

•  Shaded represents what is in memory.
•  db.test.insert({foo:1}) requires IO


TokuMX = MongoDB + Fractal Tree Indexes

[Figure: TokuMX indexes.
PK index (_id + document): root pivot 18, inner pivots 4 and 5555, leaf entries (1,doc) (4,doc) (12,doc) (19,doc) (10000,doc).
Secondary index (foo + _id): root pivot 85, inner pivots 40 and 120, leaf entries (2,4) (22,12) (50,19) (100,10000) (222,1).
Example operations: db.test.insert({foo:55}), db.test.ensureIndex({foo:1})]
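The two layouts can be contrasted with JavaScript Maps standing in for the index trees and the heap. The (foo, _id) pairs come from the TokuMX slide; the heap slots and the probe counter are assumptions for illustration, and each probe stands for one index or heap access that may or may not need IO.

```javascript
// Hypothetical sketch: Maps stand in for index trees; probe() counts accesses.
let probes = 0;
function probe(structure, key) {
  probes++;
  return structure.get(key);
}

// (foo, _id) pairs from the TokuMX storage slide.
const docs = [[2, 4], [22, 12], [50, 19], [100, 10000], [222, 1]]
  .map(([foo, _id]) => ({ _id, foo }));

// MongoDB layout: both indexes map a key to a heap slot (the "pointer").
const heap = new Map(docs.map((d, slot) => [slot, d]));
const mongoPk = new Map(docs.map((d, slot) => [d._id, slot]));
const mongoFoo = new Map(docs.map((d, slot) => [d.foo, slot]));

// TokuMX layout: the _id index stores the document, the foo index stores _id.
const tokuPk = new Map(docs.map((d) => [d._id, d]));
const tokuFoo = new Map(docs.map((d) => [d.foo, d._id]));

const lookups = {
  mongoById: (id) => probe(heap, probe(mongoPk, id)),
  mongoByFoo: (foo) => probe(heap, probe(mongoFoo, foo)),
  tokuById: (id) => probe(tokuPk, id),
  tokuByFoo: (foo) => probe(tokuPk, probe(tokuFoo, foo)),
};

// Run one lookup and report the document plus the number of probes it took.
function measure(name, key) {
  probes = 0;
  const doc = lookups[name](key);
  return { doc, probes };
}

console.log(measure("tokuByFoo", 50));        // { doc: { _id: 19, foo: 50 }, probes: 2 }
console.log(measure("tokuById", 19).probes);  // 1
console.log(measure("mongoById", 19).probes); // 2
```

In this model an _id lookup takes one probe in the TokuMX layout and two (index, then heap) in the MongoDB layout, while a lookup by foo takes two in both: TokuMX trades the heap pointer for the _id.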


TokuMX Storage - Maintenance

[Figure: the TokuMX PK index (_id + document) and secondary index (foo + _id), with insert messages injected into the trees' buffers]

•  Shaded represents what is in memory.
•  db.test.insert({foo:1}) does not require IO


Performance - Index Maintenance

•  Fractal Tree Indexes are far better than B-trees at maintaining indexes larger than RAM
   –  Message buffers delay IO and cache disruption
   –  Not just inserts … updates and deletes too


Performance - Inserts

•  100 million inserts into a collection with 3 secondary indexes


Performance - Inserts on indexed arrays

•  Indexed insertion: multikey (100 inserts per doc)


Performance - Replication

•  TokuMX replication allows secondary servers to process replication without IO
   –  They simply inject messages into the Fractal Tree Indexes on the secondary server
   –  The “Hard Work” was done on the primary
      •  Uniqueness checking
      •  Transactional locking
   –  Elimination of replication lag
      •  Benchmarks to come
•  Your secondaries are fully available for read scaling!


Performance - Lock Refinement

•  MongoDB originally implemented a global write lock
   –  1 writer at a time
•  MongoDB v2.2 moved this lock to the database level
   –  1 writer at a time in each database
•  TokuMX performs locking at the document level
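The three lock granularities can be compared with a tiny exclusive-lock table whose only difference is what a write locks: everything, the database, or the document. This is a hypothetical sketch; read locks, waiting and deadlock handling are left out, and it is not how TokuMX implements its locks.

```javascript
// Hypothetical sketch: exclusive write locks at three granularities.
// Only keyFor() differs: it decides which resource a write must lock.
class LockTable {
  constructor(keyFor) {
    this.keyFor = keyFor;
    this.held = new Set();
  }

  tryLock(db, docId) {
    const key = this.keyFor(db, docId);
    if (this.held.has(key)) return false; // another writer holds it
    this.held.add(key);
    return true;
  }

  unlock(db, docId) {
    this.held.delete(this.keyFor(db, docId));
  }
}

// Two writers in the same database, each updating a different document.
function bothCanWrite(table) {
  const first = table.tryLock("shop", 1);
  const second = table.tryLock("shop", 2);
  return first && second;
}

console.log(bothCanWrite(new LockTable(() => "global")));                  // false
console.log(bothCanWrite(new LockTable((db) => db)));                      // false
console.log(bothCanWrite(new LockTable((db, docId) => `${db}/${docId}`))); // true
```

Only the document-level table lets both writers proceed, which is the progression the slide describes from a global lock to per-database to per-document locking.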


Performance - Lock Refinement

•  Sysbench benchmark (> RAM)
   –  Lock refinement introduced in v0.1.0


Performance - Lock Refinement

•  Sysbench loading (in-memory)
   –  Lock refinement introduced in v0.1.0


Performance - Clustered Indexes

•  In TokuMX, the primary key (_id) is clustered
   –  Ordered by _id, co-located with the document
•  Lookups by _id require no additional IO to retrieve the document
   –  MongoDB must retrieve the document via the memory-mapped heap
•  Secondary indexes can optionally be created as “clustering”
   –  Ordered by secondary index field(s)
   –  An additional copy of the document is co-located
   –  Lookups using this index also require no additional IO to retrieve the document
   –  Good for point lookups, even better for range scans
   –  Compression and efficient index maintenance reduce the cost of the extra copy (space and write overhead)
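A minimal sketch of why a clustering secondary index helps range scans, using the (foo, _id) pairs from the storage slides: a plain secondary index needs one extra _id lookup per match, while a clustering index already holds a copy of each document. Sorted arrays and filter() stand in for the trees and the range scan; all names are hypothetical.

```javascript
// Hypothetical sketch: range query on foo with a plain vs. a clustering
// secondary index. Sorted arrays stand in for the index trees.
const docs = [
  { _id: 4, foo: 2 }, { _id: 12, foo: 22 }, { _id: 19, foo: 50 },
  { _id: 10000, foo: 100 }, { _id: 1, foo: 222 },
];
const pk = new Map(docs.map((d) => [d._id, d])); // clustered _id index

// Plain secondary index: (foo, _id) pairs, sorted by foo.
const fooIndex = docs.map((d) => ({ foo: d.foo, _id: d._id }))
  .sort((a, b) => a.foo - b.foo);
// Clustering secondary index: (foo, copy of the document), sorted by foo.
const fooClustering = docs.map((d) => ({ foo: d.foo, doc: { ...d } }))
  .sort((a, b) => a.foo - b.foo);

// filter() stands in for the range scan over the index.
function rangePlain(lo, hi) {
  const hits = fooIndex.filter((e) => e.foo >= lo && e.foo <= hi);
  // one extra _id lookup per matching entry
  return { docs: hits.map((e) => pk.get(e._id)), extraLookups: hits.length };
}

function rangeClustering(lo, hi) {
  const hits = fooClustering.filter((e) => e.foo >= lo && e.foo <= hi);
  // the documents are already in the index entries
  return { docs: hits.map((e) => e.doc), extraLookups: 0 };
}

console.log(rangePlain(20, 100).extraLookups);      // 3
console.log(rangeClustering(20, 100).extraLookups); // 0
```

Both return the same documents; the clustering version pays for it with the extra copy, which is the cost the slide says compression and cheap index maintenance keep down.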


Performance - Large Block Size

•  Data is stored in 64K chunks (basement nodes)
•  4MB of these chunks are compressed, grouped and written as a block
   –  Both of these values are user definable
•  As a result, range scans perform sequential IO rather than random IO
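The arithmetic behind these block sizes, using the values on this slide. The 5x compression ratio is an assumption taken from the low end of the Compression slide's 5x-10x range, and the 4MB block is treated as the size before compression.

```javascript
// Arithmetic for the block sizes on the slide (both are configurable).
const KB = 1024;
const MB = 1024 * KB;

const basementNodeSize = 64 * KB; // unit read for point queries
const blockSize = 4 * MB;         // uncompressed size of one written block
const compressionRatio = 5;       // assumption: low end of 5x-10x

const basementNodesPerBlock = blockSize / basementNodeSize;
const onDiskBlockSize = blockSize / compressionRatio;

// Scanning 1GB of (uncompressed) data in key order:
const scanned = 1024 * MB;
const blockReads = scanned / blockSize;
const bytesReadFromDisk = blockReads * onDiskBlockSize;

console.log(basementNodesPerBlock);                       // 64
console.log(blockReads);                                  // 256
console.log((bytesReadFromDisk / MB).toFixed(1) + " MB"); // 204.8 MB
```

Each block covers 64 basement nodes, so under these assumptions a 1GB range scan becomes 256 large, contiguous reads of roughly 0.8MB each.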


Performance - Memory Management

•  Two approaches to memory management
   –  MongoDB = memory-mapped files
      •  The operating system determines what data is important
   –  TokuMX = managed cache
      •  User-defined size
      •  TokuMX determines what data is important
•  Run multiple TokuMX instances on a single server
   –  Each has its own fixed cache size
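A managed cache with a fixed, user-defined size can be sketched as below. Eviction here is plain LRU purely for illustration (the slide does not say which policy TokuMX uses), sizes are in arbitrary units, and the class name is hypothetical.

```javascript
// Hypothetical sketch of a managed cache with a fixed, user-defined size.
// Eviction is plain LRU for illustration; sizes are in arbitrary units.
class ManagedCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.used = 0;
    this.entries = new Map(); // key -> { value, size }, least recently used first
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined; // miss: the database would read from disk
    this.entries.delete(key);     // re-insert to mark as most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  put(key, value, size) {
    if (this.entries.has(key)) {
      this.used -= this.entries.get(key).size;
      this.entries.delete(key);
    }
    this.entries.set(key, { value, size });
    this.used += size;
    // The database, not the OS, decides what leaves memory when over budget.
    for (const [k, entry] of this.entries) {
      if (this.used <= this.capacity) break;
      this.entries.delete(k);
      this.used -= entry.size;
    }
  }
}

const cache = new ManagedCache(100);    // one instance's fixed budget
cache.put("n1", "leaf-1", 40);
cache.put("n2", "leaf-2", 40);
cache.get("n1");                        // n1 becomes most recently used
cache.put("n3", "leaf-3", 40);          // over budget: evicts n2
console.log([...cache.entries.keys()]); // [ 'n1', 'n3' ]
console.log(cache.used);                // 80
```

Because each instance gets its own capacity, several instances on one server each stay within their own budget instead of competing for the OS page cache.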


Performance - Reduced IO

•  The message-based architecture of Fractal Tree Indexes allows several operations per IO
   –  Applied when a buffer is flushed to leaf nodes
   –  MongoDB is 1-to-1 (one operation per IO)
•  Reads and writes are highly compressed
   –  Big, infrequent writes are flash friendly


Performance - Reduced IO

•  Indexed insertion benchmark


Performance - Shard Migration

–  In sharded collections, range queries in TokuMX are optimized thanks to the use of a clustering index for the shard key

–  Shard migration between TokuMX servers imposes very low I/O overhead

–  This makes low-entropy keys good candidates for sharding


Compression

•  MongoDB does not offer compression
   –  Workarounds: compressed file systems and shortened field names
•  TokuMX easily achieves 5x-10x compression
   –  Buy less disk or flash
   –  Compressed reads and writes reduce overall IO
•  TokuMX supports 3 compression types
   –  zlib, quicklz, lzma (size vs. speed)
   –  All data is compressed
•  Use descriptive field names!
   –  They are easy to compress
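A quick way to see why descriptive field names are cheap is to compress two versions of the same synthetic documents with Node's built-in zlib module (zlib is one of the three codecs listed above). The documents and field names are made up, and exact compressed sizes depend on the zlib version and settings.

```javascript
// Illustration with Node's built-in zlib: repeated field names cost little
// once compressed. The documents are synthetic.
const zlib = require("zlib");

function makeDocs(nameField, totalField, n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push(JSON.stringify({ [nameField]: "user" + i, [totalField]: (i * 7) % 1000 }));
  }
  return docs.join("\n");
}

const terse = makeDocs("n", "t", 1000);
const descriptive = makeDocs("customerName", "orderTotal", 1000);

// Extra bytes caused by the longer names, before and after compression.
const rawExtra = Buffer.byteLength(descriptive) - Buffer.byteLength(terse);
const compressedExtra =
  zlib.deflateSync(descriptive).length - zlib.deflateSync(terse).length;

console.log(rawExtra);        // 20000
console.log(compressedExtra); // a small fraction of rawExtra
```

The longer names add 20 bytes per document uncompressed, but because they repeat in every document the compressor encodes them as back-references, so most of that overhead disappears.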


Compression

•  Chart shows space used for 51 million mostly random documents
•  46GB (MongoDB) vs. ~15GB (TokuMX)


ACID + MVCC

•  ACID
   –  In MongoDB, multi-insertion operations allow for partial success
      •  Asked to store 5 documents, 3 succeeded
   –  We offer “all or nothing” behavior
   –  Document-level locking
•  MVCC
   –  In MongoDB, queries can be interrupted by writers
      •  The effects of these writers are visible to the reader
   –  TokuMX offers MVCC
      •  Reads are consistent as of the operation start
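A minimal MVCC sketch: every write adds a version stamped with a counter, and a reader captures the counter when it starts, so later writes stay invisible to it. This is a hypothetical illustration of snapshot reads, not TokuMX's implementation; deletes and garbage collection of old versions are left out.

```javascript
// Hypothetical MVCC sketch: writes add versions stamped with a commit counter;
// a reader only sees versions committed before it started.
class MvccStore {
  constructor() {
    this.clock = 0;
    this.versions = new Map(); // key -> [{ ts, value }], oldest first
  }

  write(key, value) {
    this.clock++;
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ ts: this.clock, value });
  }

  // Start a read: capture the current clock as the snapshot point.
  snapshot() {
    const asOf = this.clock;
    return (key) => {
      let visible;
      for (const v of this.versions.get(key) || []) {
        if (v.ts <= asOf) visible = v.value; // newest version not after asOf
      }
      return visible;
    };
  }
}

const store = new MvccStore();
store.write("balance", 100);
const read = store.snapshot();            // a long-running query starts here
store.write("balance", 50);               // a concurrent writer
console.log(read("balance"));             // 100
console.log(store.snapshot()("balance")); // 50
```

The long-running reader keeps seeing 100 even after the concurrent write, which is what “reads are consistent as of the operation start” means; a query started afterwards sees 50.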


Multi-statement Transactions

•  TokuMX brings the following to MongoDB
   –  db.runCommand({"beginTransaction": 1, "isolation": "mvcc"})
   –  ... perform 1 or more operations
   –  db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")
•  Zardosht has some great blogs
   –  http://www.tokutek.com/2013/04/mongodb-transactions-yes/
   –  http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
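The “all or nothing” behavior behind beginTransaction, commitTransaction and rollbackTransaction can be sketched with an undo log: each operation in a transaction records how to undo itself, and rollback replays those entries in reverse. The method names below only mirror the commands; this is a hypothetical single-threaded model, not the TokuMX API, and isolation is not modeled.

```javascript
// Hypothetical sketch of "all or nothing": operations inside a transaction
// record undo entries; rollback replays them in reverse. Not the TokuMX API.
class Collection {
  constructor() {
    this.docs = new Map();
    this.undo = null; // null when no transaction is open
  }

  beginTransaction() { this.undo = []; }

  insert(doc) {
    if (this.docs.has(doc._id)) throw new Error("duplicate key " + doc._id);
    this.docs.set(doc._id, doc);
    if (this.undo) this.undo.push(() => this.docs.delete(doc._id));
  }

  commitTransaction() { this.undo = null; }

  rollbackTransaction() {
    for (const op of this.undo.reverse()) op();
    this.undo = null;
  }
}

// Insert a batch atomically: either every document is stored or none is.
function insertAll(coll, batch) {
  coll.beginTransaction();
  try {
    for (const doc of batch) coll.insert(doc);
    coll.commitTransaction();
    return true;
  } catch (e) {
    coll.rollbackTransaction();
    return false;
  }
}

const coll = new Collection();
coll.insert({ _id: 3 });
// Five documents; the fourth collides with an existing _id.
const ok = insertAll(coll, [{ _id: 1 }, { _id: 2 }, { _id: 4 }, { _id: 3 }, { _id: 5 }]);
console.log(ok, [...coll.docs.keys()]); // false [ 3 ]
```

Unlike the “5 requested, 3 stored” case on the ACID slide, the failed batch leaves the collection exactly as it was.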


New v1.0.3 - Today!

•  MongoDB to TokuMX migration tool – Mongo2toku
   –  Reads and replays vanilla MongoDB replication
   –  Allows TokuMX to sync from vanilla MongoDB
•  Leif has a great blog explaining the process
   –  http://www.tokutek.com/2013/07/tokumx-1-0-3-seamless-migrations-from-mongodb/
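The replay idea behind Mongo2toku can be sketched as applying oplog entries in order to a target collection. The entries use MongoDB's oplog fields (op, ns, o, o2), but this is a hypothetical simplification: only inserts, deletes, full-document updates and $set updates are handled, and namespaces, timestamps and commands are ignored.

```javascript
// Hypothetical sketch of oplog replay. Entries use MongoDB's oplog fields
// (op, o, o2); only a few operation types are handled and ns is ignored.
function replay(oplog, target) {
  for (const entry of oplog) {
    switch (entry.op) {
      case "i": // insert: o is the document
        target.set(entry.o._id, { ...entry.o });
        break;
      case "u": { // update: o2 selects the document by _id
        const id = entry.o2._id;
        if (entry.o.$set) target.set(id, { ...target.get(id), ...entry.o.$set });
        else target.set(id, { _id: id, ...entry.o }); // full replacement
        break;
      }
      case "d": // delete: o holds the _id
        target.delete(entry.o._id);
        break;
      default: // "n", "c", ... are skipped in this sketch
        break;
    }
  }
  return target;
}

const oplog = [
  { op: "i", ns: "test.test", o: { _id: 1, foo: 55 } },
  { op: "i", ns: "test.test", o: { _id: 2, foo: 1 } },
  { op: "u", ns: "test.test", o2: { _id: 1 }, o: { $set: { foo: 56 } } },
  { op: "d", ns: "test.test", o: { _id: 2 } },
];

console.log(replay(oplog, new Map())); // Map(1) { 1 => { _id: 1, foo: 56 } }
```

Replaying entries in oplog order is what keeps the target in step with the source; a real tool also has to track its position in the oplog and handle every operation type.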


Open Source Resources

•  Repository in GitHub –  https://github.com/tokutek

•  Google Groups –  http://groups.google.com
   –  tokumx-user: community users and support
   –  tokumx-dev: contributors

•  IRC –  #tokutek

We’ll help you to find solutions …


Time for Hands On …


Contact Information

•  Web site –  http://tokutek.com

•  IRC –  #tokutek

•  Google Groups –  http://groups.google.com –  tokumx-user, tokumx-dev

•  GitHub –  https://github.com/Tokutek

•  Twitter –  @tokutek, @seattlegaucho

•  Email –  [email protected]



Thank You!


We’re Hiring!

Looking for Quality Assurance and Support Ninjas!

* Boston Area
