51
ToroDB internals How to create a NoSQL db on top of SQL @NoSQLonSQL Álvaro Hernández <[email protected]>

ToroDB internalsinfo.citusdata.com/rs/235-CNE-301/images/ToroDB... · 2020. 10. 26. · ToroDB @NoSQLonSQL ToroDB in one slide Document-oriented, JSON, NoSQL db Open source (AGPL)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • ToroDB internalsHow to create a NoSQL db

    on top of SQL

    @NoSQLonSQL

    Álvaro Hernández

  • ToroDB @NoSQLonSQL

    About *8Kdata*

    ● Research & Development in databases

    ● Consulting, Training and Support in PostgreSQL

    ● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)

    ● About myself: CTO at 8Kdata:@ahachetehttp://linkd.in/1jhvzQ3

    www.8kdata.com

  • ToroDB @NoSQLonSQL

    ToroDB in one slide

    ● Document-oriented, JSON, NoSQL db

    ● Open source (AGPL)

    ● MongoDB compatibility (wire protocol level)

    ● Uses PostgreSQL as a storage backend

  • ToroDB @NoSQLonSQL

    How to make a NoSQL database

    ● Varying schema (aka “schema-less”)

    ● Document-oriented API

    ● Replication

    ● Horizontal scalability

  • ToroDB @NoSQLonSQL

    Varying schema(aka “schema-less”)

  • ToroDB @NoSQLonSQL

    ToroDB's Document 2 Relational transformation

    ● Data is stored in tables. No blobs

    ● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately

    ● Subdocuments are classified by “type”. Each “type” maps to a different table

  • ToroDB @NoSQLonSQL

    ● A “structure” table keeps the subdocument “schema”

    ● Keys in JSON are mapped to attributes, which retain the original name

    ● Tables are created dinamically and transparently to match the exact types of the documents

    ToroDB's Document 2 Relational transformation

  • ToroDB @NoSQLonSQL

    ToroDB storage internals

    { "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } }}

  • ToroDB @NoSQLonSQL

    ToroDB storage internals

    The document is split into the following subdocuments:

    { "name": "ToroDB", "data": {}, "nested": {} }

    { "a": 42, "b": "hello world!"}

    { "j": 42, "deeper": {}}

    { "a": 21, "b": "hello"}

  • ToroDB @NoSQLonSQL

    ToroDB storage internals

    select * from demo.t_3┌─────┬───────┬────────────────────────────┬────────┐│ did │ index │ _id │ name │├─────┼───────┼────────────────────────────┼────────┤│ 0 │ ¤ │ \x5451a07de7032d23a908576d │ ToroDB │└─────┴───────┴────────────────────────────┴────────┘select * from demo.t_1┌─────┬───────┬────┬──────────────┐│ did │ index │ a │ b │├─────┼───────┼────┼──────────────┤│ 0 │ ¤ │ 42 │ hello world! ││ 0 │ 1 │ 21 │ hello │└─────┴───────┴────┴──────────────┘select * from demo.t_2┌─────┬───────┬────┐│ did │ index │ j │├─────┼───────┼────┤│ 0 │ ¤ │ 42 │└─────┴───────┴────┘

  • ToroDB @NoSQLonSQL

    ToroDB storage internals

    select * from demo.structures┌─────┬────────────────────────────────────────────────────────────────────────────┐│ sid │ _structure │├─────┼────────────────────────────────────────────────────────────────────────────┤│ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │└─────┴────────────────────────────────────────────────────────────────────────────┘

    select * from demo.root;┌─────┬─────┐│ did │ sid │├─────┼─────┤│ 0 │ 0 │└─────┴─────┘

  • ToroDB @NoSQLonSQL

    Why not jsonb?

    ● jsonb is really cool

    ● But keys (metadata) are repeated most of the time (storage, I/O savings)

    ● Docuements are not classified (normalized)

    ● ToroDB does all that

  • ToroDB @NoSQLonSQL

    Metadata repetition

    { “a”: 1, “b”: 2 }{ “a”: 3 }{ “a”: 4, “c”: 5 }{ “a”: 6, “b”: 7 }{ “b”: 8 }{ “a”: 9, “b”: 10 }{ “a”: 11, “b”: 12, “j”: 13 }{ “a”: 14, “c”: 15 }

    Counting “document types” in collections of millions: at most, 1000s of different types

  • ToroDB @NoSQLonSQL

    ToroDB storage and I/O savings

    29% - 68% storage required,compared to Mongo 2.6

  • ToroDB @NoSQLonSQL

    ToroDB: query “by structure”

    ● ToroDB is effectively partitioning by type

    ● Structures (schemas, partitioning types) are cached in ToroDB memory

    ● Queries only scan a subset of the data

    ● Negative queries are served directly from memory

  • ToroDB @NoSQLonSQL

    How data is stored in schema-less

    Data normalization

  • ToroDB @NoSQLonSQL

    This is how we store in ToroDB

  • ToroDB @NoSQLonSQL

    Document oriented-API

  • ToroDB @NoSQLonSQL

    MongoDB WP and API

    ● ToroDB speaks MongoDB protocol natively, and implements Mongo API

    ● All MongoDB tools, drivers, GUIs, whould work as is on ToroDB

    ● Offer a widely used API on top of your current infrastructure

  • ToroDB @NoSQLonSQL

    ● NoSQL is trying to get back to SQL

    ● ToroDB is SQL native!

    ● Insert with Mongo, query with SQL!

    ● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc

    ToroDB: native SQL

  • ToroDB @NoSQLonSQL

    ToroDB: native SQL

  • ToroDB @NoSQLonSQL

    db.toroviews.insert({id:5, person: {

    name: "Alvaro", surname: "Hernandez",contact: { email: "[email protected]", verified: true }}

    })

    db.toroviews.insert({id:6, person: {

    name: "Name", surname: "Surname", age: 31,contact: { email: "[email protected]" }}

    })

    ToroDB VIEWs

  • ToroDB @NoSQLonSQL

    torodb$ select * from toroviews.person ;┌─────┬───────────┬────────┬─────┐│ did │ surname │ name │ age │├─────┼───────────┼────────┼─────┤│ 0 │ Hernandez │ Alvaro │ ¤ ││ 1 │ Surname │ Name │ 31 │└─────┴───────────┴────────┴─────┘(2 rows)

    torodb$ select * from toroviews."person.contact";┌─────┬──────────┬────────────────────────┐│ did │ verified │ email │├─────┼──────────┼────────────────────────┤│ 0 │ t │ [email protected] ││ 1 │ ¤ │ [email protected] │└─────┴──────────┴────────────────────────┘(2 rows)

    ToroDB VIEWs

  • ToroDB @NoSQLonSQL

    VIEWs, ToroDB from any SQL tool

  • ToroDB @NoSQLonSQL

    Replication

  • ToroDB @NoSQLonSQL

    Replication protocol choice

    ● ToroDB is based on PostgreSQL

    ● PostgreSQL has either binary streaming replication (async or sync) or logical replication

    ● MongoDB has logical replication

    ● ToroDB uses MongoDB's protocol

  • ToroDB @NoSQLonSQL

    MongoDB's replication protocol

    ● Every change is recorded in JSON documents, idempotent format (collection: local.oplog.rs)

    ● Slaves pull these documents from master (or other slaves) asynchronously

    ● Changes are applied and feedback is sent upstream

  • ToroDB @NoSQLonSQL

    MongoDB replication implementation

    https://github.com/stripe/mosql

    https://github.com/stripe/mosql

  • ToroDB @NoSQLonSQL

    ToroDB v0.4● ToroDB works as a secondary slave of a MongoDB master (or slave, chained rep)

    ● Implements the full replication protocol (not as an oplog tailable query)

    ● Replicates from Mongo to a PostgreSQL

    ● Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)

  • ToroDB @NoSQLonSQL

    MongoDB's oplog & tailable cursors

    Oplog is a system, capped collection which contains idempotent DML commands

    Tailable cursors are “open queries” where data is streamed as it appears on the database

    Both are used a lot (replication, Meteor...)

  • ToroDB @NoSQLonSQL

    How to do oplog & tailable with PG?

    Thanks to 9.4's logical decoding, it's easy

    Replication (master) is just plain logical decoding, transformed to MongoDB's json format

    Tailable cursors would be logical decoding plus filtering (to make sure streamed data matches query filters)

  • ToroDB @NoSQLonSQL

    Horizontal scalability(aka sharding)

  • ToroDB @NoSQLonSQL

    Write scalability(sharding)

    ● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)

    ● Will use MongoDB's mongos without modification, as well as config servers

    ● That might change in the future (pg_shard?)

  • ToroDB @NoSQLonSQL

    Horizontal scalability(storage level)

    ● Another non-exclusive option is to have ToroDB store data in a distributed database

    ● Requires a distributed database like GreenPlum, CitusDb or RedShift

    ● Paired with replication as a slave:DW in NoSQL enabler

  • ToroDB @NoSQLonSQL

    ● GreenPlum was open sourced Oct/15

    ● ToroDB already runs on GP!CitusDB 5.0 port coming soon!

    ● Goal: ToroDB v0.4 replicates from a MongoDB master onto Toro-Greenplum and runs the reporting queries using distributed SQL

    ToroDB on GP: SQL DW for MongoDB

  • ToroDB @NoSQLonSQL

    ● Amazon reviews datasetImage-based recommendations on styles and substitutesJ. McAuley, C. Targett, J. Shi, A. van den HengelSIGIR, 2015

    ● AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD EBS

    ● 4x shards, 3x config; 4x segments GP

    ● 83M records, 65GB plain json

    Benchmark

  • ToroDB @NoSQLonSQL

    Disk usage

    Mongo 3.0, WT, SnappyGP columnar, zlib level 9

    table size index size total size0

    10000000000

    20000000000

    30000000000

    40000000000

    50000000000

    60000000000

    70000000000

    80000000000

    Storage requirements

    MongoDB vs ToroDB on Greenplum

    MongoToroDB on GP

    byte

    s

  • ToroDB @NoSQLonSQL

    SELECT count( distinct( "reviewerID" ))FROM reviews;

    Queries: which one is easier?

    db.reviews.aggregate([{ $group: { _id: "reviewerID"}},{ $group: {_id: 1, count: { $sum: 1}}}])

  • ToroDB @NoSQLonSQL

    SELECT "reviewerName", count(*) as reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;

    Queries: which one is easier?

    db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})

  • ToroDB @NoSQLonSQL

    Query times

    3 different queriesQ3 on MongoDB: aggregate fails

    27,95 74,87 00

    200

    400

    600

    800

    1000

    1200

    9691007

    035 13 31

    Query duration (s)

    MongoDB vs ToroDB on Greenplum

    MongoDB ToroDB on GP

    speedup

    seco

    nds

  • ToroDB @NoSQLonSQL

    Doing what MongoDBcan't offer

  • ToroDB @NoSQLonSQL

    Atomic operations

    ● There is no support for atomic bulk insert/update/delete operations

    ● Not even with $isolated:“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”http://docs.mongodb.org/manual/reference/operator/update/isolated/

    http://docs.mongodb.org/manual/reference/operator/update/isolated/

  • ToroDB @NoSQLonSQL

    Cheap single-node durability

    ● Without journaling, MongoDB is not durable nor crash-safe

    ● MongoDB requires “j: true” for true single-node durability. But who guarantees its consistent usage? Who uses it by default?j:true creates I/O storms equivalent to SQL CHECKPOINTs

  • ToroDB @NoSQLonSQL

    “Clean” reads

    Oh really?

  • ToroDB @NoSQLonSQL

    “Clean” reads

    http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

    “MongoDB will allow clients to read the results of a write operation before the write operation returns.”

    “If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”

    Thus, MongoDB suffers from dirty reads. But let's call them just “tainted reads”.

    http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

  • ToroDB @NoSQLonSQL

    “Clean” reads

    What about $snapshot? Nope:

    “The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”

    http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

    Cursors in ToroDB run in repeatable read, read-only mode:globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ"); globalCursorDataSource.setReadOnly(true);

    http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

  • ToroDB @NoSQLonSQL

    https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

    Replication dirty and stale reads(“call me maybe”)

    https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

  • ToroDB @NoSQLonSQL

    MongoDB's dirty and stale reads (repl)

    Dirty readsA primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?)

    Stale readsA primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority

  • ToroDB @NoSQLonSQL

    Data discoverability, SQL connectors

    ● They are two of the major announcements for MongoDB 3.2

    ● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)

    ● SQL connectors: native, no emulation

  • ToroDB @NoSQLonSQL

    Download, clone, PR, star it!

    https://github.com/torodb/torodb

  • Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24Slide 25Slide 26Slide 27Slide 28Slide 29Slide 30Slide 31Slide 32Slide 33Slide 34Slide 35Slide 36Slide 37Slide 38Slide 39Slide 40Slide 41Slide 42Slide 43Slide 44Slide 45Slide 46Slide 47Slide 48Slide 49Slide 50Slide 51