Gerardo “Gerry” Narvaja Technical Servicesfiles.meetup.com/107604/TokuMX 20130724.pdf · •...

TokuMX for Dolphins Gerardo “Gerry” Narvaja

Technical Services

Who Are We?

•  Tokutek builds high-performance database software!

•  TokuDB - storage engine for MySQL and MariaDB

•  TokuMX – high performance version of MongoDB

Better Indexing Improves Performance

•  TokuDB® and TokuMX® use Fractal Tree® technology:
   –  Internal nodes are similar to B-trees, but they also have message buffers
   –  Large block size (4M) enables better compression performance and range queries.
   –  Basement nodes support point queries
      o  Default size 128K
   –  Optimal I/O utilization:
      o  Reads: highly compressed data
      o  Writes: aggregation of multiple operations + high compression.

B-Tree

•  Rule >=

2,3,4 10,20 22,25 99

B-Tree

•  Insert 15

2,3,4 10,15,20 22,25 99

B-Tree

•  In RAM

2,3,4 10,15,20 22,25 99

B-Tree

•  Find 25

2,3,4 10,15,20 22,25 99

B-Tree

•  From disk

2,3,4 10,15,20 22,25 99

Fractal Tree Indexes

Each node has pivots & Buffers

Buffers fill as updates arrive

Flush a buffer when it fills

A flush might take an I/O, but it does lots of useful work

More changes per write ➔ fewer writes for the same load ➔ less SSD wear
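The buffering idea above can be sketched with a toy model. This is not TokuMX code: the two-level shape, the buffer capacity of 4 and the class name are assumptions chosen only to show how several changes share one leaf write.

```javascript
// Toy model of a root node whose children each have a message buffer.
// BUFFER_CAPACITY and the two-level tree are illustrative assumptions.
const BUFFER_CAPACITY = 4;

class ToyFractalTree {
  constructor(pivots) {
    this.pivots = pivots;                                 // [10, 22] => 3 children
    this.buffers = pivots.concat([null]).map(() => []);   // one buffer per child
    this.leaves = pivots.concat([null]).map(() => new Map());
    this.leafWrites = 0;                                  // simulated leaf I/Os
  }

  // Pick the child whose key range contains `key`.
  childFor(key) {
    const i = this.pivots.findIndex((p) => key < p);
    return i === -1 ? this.pivots.length : i;
  }

  // An insert only appends a message; no leaf is touched yet.
  insert(key, value) {
    const child = this.childFor(key);
    this.buffers[child].push({ key, value });
    if (this.buffers[child].length >= BUFFER_CAPACITY) this.flush(child);
  }

  // One flush applies every buffered message with a single leaf write.
  flush(child) {
    for (const { key, value } of this.buffers[child]) {
      this.leaves[child].set(key, value);
    }
    this.buffers[child] = [];
    this.leafWrites += 1;
  }
}

const tree = new ToyFractalTree([10, 22]);
for (const key of [2, 3, 4, 5, 15, 12, 25, 99]) tree.insert(key, `doc${key}`);
console.log(`inserts: 8, leaf writes: ${tree.leafWrites}`);
```

In this run eight inserts cause one leaf write; the other four messages are still waiting in buffers. A plain B-tree model would touch a leaf on every insert.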

Fractal Tree Indexes: Queries

Lots of buffers have messages

But query follows root-leaf path

So every query has the most up-to-date information

Messages can be insert, update, delete
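A minimal sketch of how a query can still see current data while messages sit in buffers. The message types come from the slide; the message shape, the function names and the "oldest buffer first" ordering are assumptions made for illustration.

```javascript
// Apply one buffered message to the current view of a document.
function applyMessage(doc, msg) {
  if (msg.type === "insert") return { ...msg.doc };
  if (msg.type === "update") {
    return doc === undefined ? undefined : { ...doc, ...msg.changes };
  }
  if (msg.type === "delete") return undefined;
  throw new Error(`unknown message type: ${msg.type}`);
}

// Read the leaf value, then replay the messages found on the root-to-leaf
// path. `buffersOldestFirst` lists buffers from the one nearest the leaf
// (oldest messages) up to the root (newest messages).
function query(leaf, buffersOldestFirst, key) {
  let doc = leaf.get(key);
  for (const buffer of buffersOldestFirst) {
    for (const msg of buffer) {
      if (msg.key === key) doc = applyMessage(doc, msg);
    }
  }
  return doc;
}

// Hypothetical state: the leaf is stale, the buffers hold pending changes.
const leaf = new Map([
  [25, { _id: 25, foo: 1 }],
  [99, { _id: 99, foo: 7 }],
]);
const innerBuffer = [{ type: "update", key: 25, changes: { foo: 2 } }];
const rootBuffer = [
  { type: "delete", key: 99 },
  { type: "insert", key: 15, doc: { _id: 15, foo: 55 } },
];
const path = [innerBuffer, rootBuffer];

console.log(query(leaf, path, 25));
console.log(query(leaf, path, 99));
console.log(query(leaf, path, 15));
```

The leaf itself is never modified by the query; the pending messages are only overlaid on the result.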

Gimme, gimme, gimme …

Storage:

MongoDB and TokuMX

MongoDB Storage - Overview

PK index (_id + pointer): root keys 4, 5555; entries (1,ptr5), (4,ptr1), (12,ptr8), (19,ptr7), (10000,ptr2)

Secondary index (foo + pointer): root keys 40, 120; entries (2,ptr5), (22,ptr6), (50,ptr4), (100,ptr7), (222,ptr3)

The “pointer” tells MongoDB where to look in the heap for the document.

db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

memory mapped heap

MongoDB Storage - Maintenance

PK index (_id + pointer): root keys 4, 5555; entries (1,ptr5), (4,ptr1), (12,ptr8), (19,ptr7), (10000,ptr2)

Secondary index (foo + pointer): root keys 40, 120; entries (2,ptr5), (22,ptr6), (50,ptr4), (100,ptr7), (222,ptr3)

memory mapped heap

•  Shaded represents what is in memory.
•  db.test.insert({foo:1}) requires IO

TokuMX = MongoDB + Fractal Tree Indexes

PK index (_id + document): root keys 4, 5555; entries (1,doc), (4,doc), (12,doc), (19,doc), (10000,doc)

Secondary index (foo + _id): root keys 40, 120; entries (2,4), (22,12), (50,19), (100,10000), (222,1)

db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

memory mapped heap
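The two storage layouts can be compared with a small in-memory sketch. The (foo, _id) pairs are the ones on the TokuMX slide; the pointer names ("ptr0", ...) and the use of Map objects in place of real indexes are assumptions.

```javascript
// Documents implied by the TokuMX slide's secondary entries (foo, _id):
// (2,4), (22,12), (50,19), (100,10000), (222,1).
const docs = [
  { _id: 1, foo: 222 },
  { _id: 4, foo: 2 },
  { _id: 12, foo: 22 },
  { _id: 19, foo: 50 },
  { _id: 10000, foo: 100 },
];

// MongoDB-style layout: documents live in a heap and both indexes
// store a pointer into it. Pointer values are made up for this sketch.
const heap = new Map(docs.map((d, i) => [`ptr${i}`, d]));
const mongoPk = new Map(docs.map((d, i) => [d._id, `ptr${i}`]));
const mongoFoo = new Map(docs.map((d, i) => [d.foo, `ptr${i}`]));

// TokuMX-style layout: the PK index holds the document itself and
// the secondary index holds the _id.
const tokuPk = new Map(docs.map((d) => [d._id, d]));
const tokuFoo = new Map(docs.map((d) => [d.foo, d._id]));

// Secondary lookup, MongoDB style: index -> pointer -> heap.
function mongoFindByFoo(foo) {
  const ptr = mongoFoo.get(foo);
  return ptr === undefined ? undefined : heap.get(ptr);
}

// Secondary lookup, TokuMX style: index -> _id -> PK index (document inside).
function tokuFindByFoo(foo) {
  const id = tokuFoo.get(foo);
  return id === undefined ? undefined : tokuPk.get(id);
}

// Lookup by _id, TokuMX style: the PK index entry already is the document.
function tokuFindById(id) {
  return tokuPk.get(id);
}

console.log(mongoFindByFoo(50), tokuFindByFoo(50), tokuFindById(19));
```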

TokuMX Storage - Maintenance

PK index (_id + document): root keys 4, 5555; entries (1,doc), (4,doc), (12,doc), (19,doc), (10000,doc)

Secondary index (foo + _id): root keys 40, 120; entries (2,4), (22,12), (50,19), (100,10000), (222,1)

insert messages injected

•  Shaded represents what is in memory.
•  db.test.insert({foo:1}) does not require IO

Performance - Index Maintenance

•  Fractal Tree Indexes are far superior to B-trees for maintaining indexes larger than RAM
   –  Message buffers delay IO and cache disruption
   –  Not just inserts … updates and deletes too

Performance - Inserts

•  100 million inserts into a collection with 3 secondary indexes

•  Indexed Insertion : Multikey (100 inserts per doc)

Performance - Inserts on indexed arrays

Performance - Replication

•  TokuMX replication allows secondary servers to process replication without IO
   –  Simply injecting messages into the Fractal Tree Indexes on the secondary server
   –  The “Hard Work” was done on the primary
      •  Uniqueness checking
      •  Transactional locking
   –  Elimination of replication lag.
      •  Benchmarks to come
•  Your secondaries are fully available for read scaling!

Performance - Lock Refinement

•  MongoDB originally implemented a global write lock
   –  1 writer at a time
•  MongoDB v2.2 moved this lock to the database level
   –  1 writer at a time in each database
•  TokuMX performs locking at the document level
•  Sysbench benchmark (> RAM)
   –  lock refinement introduced in v0.1.0
•  Sysbench loading (in-memory)
   –  lock refinement introduced in v0.1.0

Performance - Clustered Indexes

•  In TokuMX, the primary key (_id) is clustered
   –  Ordered by _id, co-located with the document
•  Lookups by _id require no additional IO to retrieve the document
   –  MongoDB must retrieve via memory mapped heap
•  Secondary indexes can optionally be created as “clustering”
   –  Ordered by secondary index field(s)
   –  Additional copy of the document is co-located
   –  Lookups using this index also require no additional IO to retrieve the document
   –  Good for point lookups, even better for range scans
   –  Compression and efficient index maintenance reduce cons
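A rough sketch of why a clustering secondary index helps range scans. Sorted arrays stand in for the indexes, the sample documents are hypothetical, and "pkLookups" simply counts how often the scan has to go back to the primary key index.

```javascript
// Hypothetical documents; a Map stands in for the clustered PK index.
const docs = [
  { _id: 1, foo: 222 },
  { _id: 4, foo: 2 },
  { _id: 12, foo: 22 },
  { _id: 19, foo: 50 },
  { _id: 10000, foo: 100 },
];
const pk = new Map(docs.map((d) => [d._id, d]));
const sortedByFoo = [...docs].sort((a, b) => a.foo - b.foo);

// Non-clustering secondary index: (foo, _id) entries only.
const plainIndex = sortedByFoo.map((d) => ({ foo: d.foo, _id: d._id }));
// Clustering secondary index: (foo, copy of the whole document).
const clusteringIndex = sortedByFoo.map((d) => ({ foo: d.foo, doc: { ...d } }));

// Each match needs an extra trip to the PK index to fetch the document.
function rangeScanPlain(lo, hi) {
  const results = [];
  let pkLookups = 0;
  for (const entry of plainIndex) {
    if (entry.foo >= lo && entry.foo <= hi) {
      results.push(pk.get(entry._id));
      pkLookups += 1;
    }
  }
  return { results, pkLookups };
}

// The document is already in the index entry, so no extra lookups.
function rangeScanClustering(lo, hi) {
  const results = clusteringIndex
    .filter((entry) => entry.foo >= lo && entry.foo <= hi)
    .map((entry) => entry.doc);
  return { results, pkLookups: 0 };
}

const plain = rangeScanPlain(20, 100);
const clustered = rangeScanClustering(20, 100);
console.log(plain.pkLookups, clustered.pkLookups);
```

The price is the extra copy of each document, which is the cost the last bullet above says compression and cheap index maintenance reduce.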

Performance - Large Block Size

•  Data is stored in 64K chunks (basement nodes)
•  4MB of these chunks are compressed, grouped and written as a block
   –  * both of these values are user definable
•  As a result, range scans perform sequential IO rather than random IO
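The arithmetic behind these sizes, as a small sketch. The two constants are the values quoted on this slide; the helper functions and the assumption that a scanned range is contiguous and block-aligned are simplifications added here.

```javascript
// Sizes as quoted on the slide (both are described as user definable).
const BASEMENT_NODE_BYTES = 64 * 1024;      // 64K chunk
const BLOCK_BYTES = 4 * 1024 * 1024;        // 4MB of chunks per block

// Number of basement nodes grouped into one block.
const basementNodesPerBlock = BLOCK_BYTES / BASEMENT_NODE_BYTES;

// Units needed to cover `bytes` of uncompressed data, assuming the
// range is contiguous and aligned (a simplification).
function blocksForScan(bytes) {
  return Math.ceil(bytes / BLOCK_BYTES);
}
function basementNodesForScan(bytes) {
  return Math.ceil(bytes / BASEMENT_NODE_BYTES);
}

const scanBytes = 100 * 1024 * 1024;        // a 100MB range scan
console.log(basementNodesPerBlock);         // 64
console.log(blocksForScan(scanBytes));      // 25 large sequential reads
console.log(basementNodesForScan(scanBytes)); // 1600 chunk-sized reads
```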

Performance - Memory Management

•  Two approaches to memory management
   –  MongoDB = memory-mapped files
      •  Operating system determines what data is important
   –  TokuMX = managed cache
      •  User defined size
      •  TokuMX determines what data is important
•  Run multiple TokuMX instances on a single server
   –  Each has its own fixed cache size

Performance - Reduced IO

•  Message based architecture of Fractal Tree Indexes allows several operations per IO
   –  Applied when buffer is flushed to leaf nodes
   –  MongoDB is 1-to-1
•  Reads and writes are highly compressed
   –  Big/infrequent writes are flash friendly
   –  Indexed insertion benchmark

Performance - Reduced IO

Performance - Shard Migration

–  In sharded collections, range queries in TokuMX are optimized thanks to the use of a clustering index for the shard key

–  Shard migration between TokuMX servers imposes very low I/O overhead

–  This makes low-entropy keys good candidates for sharding

Compression

•  MongoDB does not offer compression
   –  Workarounds: compressed file systems and shortened field names
•  TokuMX easily achieves 5x-10x compression
   –  Buy less disk or flash
   –  Compressed reads and writes reduce overall IO
•  TokuMX supports 3 compression types
   –  zlib, quicklz, lzma (size vs. speed)
   –  all data is compressed
•  Use descriptive field names!
   –  They are easy to compress
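A quick way to see why descriptive field names cost little once data is compressed. This sketch is only a stand-in: it uses Node.js's built-in zlib on JSON text, whereas TokuMX compresses BSON data on disk, so real ratios will differ. The field names and document shape are made up.

```javascript
const zlib = require("zlib"); // Node.js built-in module

// Serialize the documents and report raw vs. deflate-compressed size.
function sizes(documents) {
  const raw = Buffer.from(JSON.stringify(documents));
  return { raw: raw.length, compressed: zlib.deflateSync(raw).length };
}

// Same values, once with descriptive names and once with terse names.
const N = 1000;
const descriptive = [];
const terse = [];
for (let i = 0; i < N; i++) {
  const balance = (i * 37) % 1000;
  descriptive.push({ customerIdentifier: i, accountBalanceInCents: balance });
  terse.push({ c: i, b: balance });
}

const d = sizes(descriptive);
const t = sizes(terse);
console.log("descriptive:", d);
console.log("terse:", t);
console.log("raw overhead of long names:", d.raw - t.raw);
console.log("compressed overhead of long names:", d.compressed - t.compressed);
```

The repeated field names are exactly the kind of redundancy a general-purpose compressor removes, so the overhead of long names shrinks sharply after compression.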

Compression

•  Chart shows space used for 51 million mostly random documents

•  46GB vs. ~15GB

ACID + MVCC

•  ACID
   –  In MongoDB, multi-insertion operations allow for partial success
      •  Asked to store 5 documents, 3 succeeded
   –  We offer “all or nothing” behavior
   –  Document level locking
•  MVCC
   –  In MongoDB, queries can be interrupted by writers.
      •  The effects of these writers are visible to the reader
   –  TokuMX offers MVCC
      •  Reads are consistent as of the operation start

Multi-statement Transactions

•  TokuMX brings the following to MongoDB
   –  db.runCommand({"beginTransaction": 1, "isolation": "mvcc"})
   –  ... perform 1 or more operations
   –  db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")
•  Zardosht has some great blogs
   –  http://www.tokutek.com/2013/04/mongodb-transactions-yes/
   –  http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
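One way the begin / work / commit-or-rollback pattern could be wrapped, shown as a hedged sketch. The command names and the "mvcc" isolation value come from the slide; the helper withTransaction, the value 1 for beginTransaction and the fake db object are assumptions so the example runs without a server. A real shell may report failures through the command result rather than an exception.

```javascript
// Hypothetical helper: run `work` between begin and commit, and roll
// back if `work` throws. `db` only needs a runCommand method.
function withTransaction(db, work) {
  // The value 1 is an assumption following the usual command-document form.
  db.runCommand({ beginTransaction: 1, isolation: "mvcc" });
  let result;
  try {
    result = work(db);
  } catch (err) {
    db.runCommand("rollbackTransaction");
    throw err;
  }
  db.runCommand("commitTransaction");
  return result;
}

// Stand-in for a shell `db` object: it only records the commands it receives.
function makeFakeDb() {
  const log = [];
  return {
    log,
    runCommand(cmd) {
      log.push(cmd);
      return { ok: 1 };
    },
  };
}

// Successful path: begin, then commit.
const okDb = makeFakeDb();
const outcome = withTransaction(okDb, () => "done");

// Failing path: begin, then rollback, and the error is re-thrown.
const failingDb = makeFakeDb();
let caught = null;
try {
  withTransaction(failingDb, () => {
    throw new Error("boom");
  });
} catch (err) {
  caught = err;
}

console.log(okDb.log, failingDb.log);
```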

New v1.0.3 - Today!

•  MongoDB to TokuMX migration tool
   –  Mongo2toku
   –  Reads and replays vanilla MongoDB replication
   –  Allows TokuMX to sync from vanilla MongoDB
•  Leif has a great blog explaining the process
   –  http://www.tokutek.com/2013/07/tokumx-1-0-3-seamless-migrations-from-mongodb/

Open Source Resources

•  Repository in GitHub
   –  https://github.com/tokutek
•  Google Groups
   –  http://groups.google.com
   –  tokumx-user: community users and support
   –  tokumx-dev: contributors
•  IRC
   –  #tokutek

We’ll help you to find solutions …

Time for Hands On …

Contact Information

•  Web site
   –  http://tokutek.com
•  IRC
   –  #tokutek
•  Google Groups
   –  http://groups.google.com
   –  tokumx-user, tokumx-dev
•  GitHub
   –  https://github.com/Tokutek
•  Twitter
   –  @tokutek, @seattlegaucho
•  Email
   –  support@tokutek.com

Thank You!

We’re Hiring!

Looking for Quality Assurance and Support Ninjas!

* Boston Area
