ToroDB @NoSQLonSQL
About $self and 8Kdata
The world has changed...
http://chasingafterdear.com/wp-content/uploads/2013/05/how-the-world-has-changed.png
Say you were…
● A happy DBA, managing your RDBMS
● Bofhing your users when required
● Just having to fight devs who don't know who Mr. Bobby Tables is
… and then NoSQL came
And you started receiving questions like:
I want NoSQL!
Install MongoDB!
My app is web scale!
Fear no more! You can now
supercharge your RDBMS with MongoDB superpowers
ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
Mapping unstructured data to relational
ToroDB storage internals
{ "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } }}
ToroDB storage internals
The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
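The split above can be sketched in a few lines of Python. This is only an illustration of the idea shown on the slide, not ToroDB's actual code: the function name `split_document` is hypothetical, arrays are ignored, and the depth-first ordering is an assumption chosen to match the order of the subdocuments listed above.

```python
def split_document(doc):
    """Return the subdocuments of `doc` in depth-first order.

    Every nested object is replaced by an empty placeholder {} in its
    parent and emitted as a subdocument of its own.
    (Hypothetical helper; arrays are not handled in this sketch.)
    """
    flat = {}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            flat[key] = {}          # placeholder left in the parent
            children.append(value)  # nested object is processed later
        else:
            flat[key] = value
    result = [flat]
    for child in children:
        result.extend(split_document(child))
    return result


doc = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}

subdocs = split_document(doc)
for subdoc in subdocs:
    print(subdoc)
```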
ToroDB storage internals
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │            _id             │  name  │
├─────┼───────┼────────────────────────────┼────────┤
│   0 │     ¤ │ \x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │      b       │
├─────┼───────┼────┼──────────────┤
│   0 │     ¤ │ 42 │ hello world! │
│   0 │     1 │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│   0 │     ¤ │ 42 │
└─────┴───────┴────┘
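In the tables above, the two subdocuments that share the attributes `a` and `b` land in the same table, told apart by `index`. A rough Python sketch of that grouping follows. It rests on assumptions read off the example rows only: that subdocuments with the same scalar attribute names and types share a table, and that the first subdocument of a document in a table gets a null index while later ones are numbered from 1. The names `signature` and `assign_tables` are hypothetical, and the generated `_id` column is left out.

```python
from collections import defaultdict


def signature(subdoc):
    """Scalar attribute names with their Python type names, sorted by name."""
    return tuple(sorted(
        (key, type(value).__name__)
        for key, value in subdoc.items()
        if not isinstance(value, dict)   # nested placeholders are not columns
    ))


def assign_tables(did, subdocs):
    """Group subdocuments of document `did` into tables keyed by signature."""
    tables = defaultdict(list)
    for subdoc in subdocs:
        rows = tables[signature(subdoc)]
        # Assumption: index stays null for the first row of this document
        # in a table, then counts up from 1.
        earlier = sum(1 for row in rows if row["did"] == did)
        row = {"did": did, "index": earlier if earlier else None}
        row.update((k, v) for k, v in subdoc.items() if not isinstance(v, dict))
        rows.append(row)
    return dict(tables)


subdocs = [
    {"name": "ToroDB", "data": {}, "nested": {}},
    {"a": 42, "b": "hello world!"},
    {"j": 42, "deeper": {}},
    {"a": 21, "b": "hello"},
]

tables = assign_tables(0, subdocs)
for sig, rows in tables.items():
    print(sig, rows)
```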
ToroDB storage internals
select * from demo.structures;
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │                                 _structure                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│   0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│   0 │   0 │
└─────┴─────┘
How data is stored in schema-less
Data normalization
This is how we store it in ToroDB
Advantages over MongoDB
ToroDB: native SQL
ToroDB VIEWs
torodb$ select * from toroviews.person ;
┌─────┬───────────┬────────┬─────┐
│ did │  surname  │  name  │ age │
├─────┼───────────┼────────┼─────┤
│   0 │ Hernandez │ Alvaro │   ¤ │
│   1 │ Surname   │ Name   │  31 │
└─────┴───────────┴────────┴─────┘
(2 rows)
torodb$ select * from toroviews."person.contact";
┌─────┬──────────┬───────────────────┐
│ did │ verified │       email       │
├─────┼──────────┼───────────────────┤
│   0 │ t        │ [email protected] │
│   1 │ ¤        │ [email protected] │
└─────┴──────────┴───────────────────┘
(2 rows)
VIEWs, ToroDB from any SQL tool
Mix-and-match relational & NoSQL
● Use the same database for both your relational data and ToroDB
● Just use separate schemas (if you will)
● Don't write to ToroDB data or metadata tables
● Query with SQL, do joins, whatever!
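To make the mix-and-match idea concrete, here is a small sketch of a join between document-derived data and an ordinary relational table. It uses Python's built-in `sqlite3` purely so the example is runnable; ToroDB itself sits on an RDBMS such as PostgreSQL. The `person` and `person_contact` tables mimic the shape of the views shown earlier, while the `orders` table, its columns, and the email addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for the document-derived views (read-only in practice).
    CREATE TABLE person (did INTEGER, surname TEXT, name TEXT, age INTEGER);
    CREATE TABLE person_contact (did INTEGER, verified INTEGER, email TEXT);
    -- Hypothetical plain relational table living in another schema.
    CREATE TABLE orders (person_did INTEGER, total REAL);

    INSERT INTO person VALUES (0, 'Hernandez', 'Alvaro', NULL),
                              (1, 'Surname', 'Name', 31);
    INSERT INTO person_contact VALUES (0, 1, 'alvaro@example.com'),
                                      (1, NULL, 'name@example.com');
    INSERT INTO orders VALUES (0, 10.0), (0, 5.5), (1, 3.0);
""")

# Join document data with relational data on the document id.
rows = conn.execute("""
    SELECT p.name, c.email, SUM(o.total) AS spent
    FROM person p
    JOIN person_contact c ON c.did = p.did
    JOIN orders o ON o.person_did = p.did
    GROUP BY p.did, p.name, c.email
    ORDER BY p.did
""").fetchall()

for row in rows:
    print(row)
```

The join only reads the document-derived tables, in line with the advice not to write to ToroDB data or metadata tables.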
And much more!
● Atomic batch-operations
● Clean reads
● Within node… transactions! (coming soon)
Data discoverability, SQL connectors
● They are two of the major announcements for MongoDB 3.2
● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)
● SQL connectors: native, no emulation
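The "join with root if you want a count" remark can be sketched as a single query. The example below again uses Python's `sqlite3` only to stay runnable; the `structures` and `root` tables follow the columns shown in the storage internals slides, and the extra rows (a second structure and two more documents) are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structures (sid INTEGER PRIMARY KEY, _structure TEXT);
    CREATE TABLE root (did INTEGER PRIMARY KEY, sid INTEGER);

    INSERT INTO structures VALUES
        (0, '{"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}}'),
        (1, '{"t": 3}');            -- hypothetical second document shape
    INSERT INTO root VALUES (0, 0), (1, 0), (2, 1);   -- dids 1 and 2 are made up
""")

# One row per document shape, with the number of documents that have it.
shapes = conn.execute("""
    SELECT s.sid, s._structure, count(r.did) AS documents
    FROM structures s
    LEFT JOIN root r ON r.sid = s.sid
    GROUP BY s.sid, s._structure
    ORDER BY documents DESC
""").fetchall()

for sid, structure, documents in shapes:
    print(sid, documents, structure)
```

No sampling is involved: every distinct shape is a row in `structures`, so the counts are exact.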
Replication
ToroDB v0.4
● ToroDB works as a secondary slave of a MongoDB master (or of another slave, via chained replication)
● Implements the full replication protocol (not as an oplog tailable query)
● Open source: github.com/torodb/torodb (devel branch, version 0.4-SNAPSHOT)
Horizontal scalability (aka sharding)
Write scalability (sharding)
● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)
● Will use MongoDB's mongos without modification, as well as config servers
● That might change in the future (pg_shard?)
Horizontal scalability (storage level)
● Another non-exclusive option is to have ToroDB store data in a distributed database
● Requires a distributed database like Greenplum, CitusDB or Redshift
● Paired with replication as a slave: an enabler for DW in NoSQL
Enabling Data Warehousing for the NoSQL World
Benchmark
● Amazon reviews dataset
  Image-based recommendations on styles and substitutes
  J. McAuley, C. Targett, J. Shi, A. van den Hengel
  SIGIR, 2015
● AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD
● 4x shards, 3x config; 4x segments GP
● 83M records, 65GB plain JSON
Disk usage
Mongo 3.0, WT, Snappy vs GP columnar, zlib level 9
[Bar chart: Storage requirements, MongoDB vs ToroDB on Greenplum. Series: Mongo, ToroDB on GP. Categories: table size, index size, total size. Y axis: bytes, 0 to 80,000,000,000.]
Queries: which one is easier?
SELECT count( distinct( "reviewerID" )) FROM reviews;
db.reviews.aggregate([{ $group: { _id: "$reviewerID"}},{ $group: {_id: 1, count: { $sum: 1}}}])
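The two-stage `$group` pipeline is less obvious than `count(distinct ...)`, so here is a toy emulation in plain Python. It is only a sketch of the grouping semantics, not MongoDB code: `group_stage` is a hypothetical helper that handles just a field path or a constant as `_id`, and the sample reviews are invented. It also shows why the `$` prefix matters: a string such as `"$reviewerID"` is a field reference, whereas a bare `"reviewerID"` is a constant that puts every document in one group.

```python
def group_stage(docs, id_expr):
    """Toy stand-in for {$group: {_id: id_expr, count: {$sum: 1}}}.

    A string starting with "$" is treated as a field path; anything else
    is a constant, so all documents fall into a single group.
    """
    groups = {}
    for doc in docs:
        if isinstance(id_expr, str) and id_expr.startswith("$"):
            key = doc.get(id_expr[1:])
        else:
            key = id_expr
        groups[key] = groups.get(key, 0) + 1
    return [{"_id": key, "count": n} for key, n in groups.items()]


reviews = [
    {"reviewerID": "A1"},
    {"reviewerID": "A2"},
    {"reviewerID": "A1"},
    {"reviewerID": "A3"},
]

# Stage 1: one group per reviewer. Stage 2: count those groups.
per_reviewer = group_stage(reviews, "$reviewerID")
distinct_reviewers = group_stage(per_reviewer, 1)

# Without the "$", stage 1 yields a single group and the final count is 1.
literal_key = group_stage(group_stage(reviews, "reviewerID"), 1)

print(distinct_reviewers)
print(literal_key)
```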
Queries: which one is easier?
SELECT "reviewerName", count(*) as reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;
db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})
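For readers who want to try the SQL form on a small scale, the sketch below runs the same top-10 query on a handful of invented rows and checks it against a plain Python count. It uses `sqlite3` from the standard library only for convenience; the benchmark itself ran on Greenplum, and the reviewer names here are made up.

```python
import sqlite3
from collections import Counter

# Invented sample data; counts are distinct so the ordering is unambiguous.
names = ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE reviews ("reviewerName" TEXT)')
conn.executemany("INSERT INTO reviews VALUES (?)", [(n,) for n in names])

# Same shape as the SQL on the slide: group, order by the count, keep 10.
top = conn.execute("""
    SELECT "reviewerName", count(*) AS reviews
    FROM reviews
    GROUP BY "reviewerName"
    ORDER BY reviews DESC
    LIMIT 10
""").fetchall()

# The group / sort / limit steps correspond to a frequency count.
expected = Counter(names).most_common(10)

print(top)
```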
Query times
3 different queries. Q3 on MongoDB: aggregate fails
[Bar chart: Query duration (s), MongoDB vs ToroDB on Greenplum. Series: MongoDB, ToroDB on GP, speedup. Y axis: seconds, 0 to 1200. Data labels: 969, 1007 (MongoDB); 35, 13, 31 (ToroDB on GP); 27.95, 74.87 (speedup).]
Announcing today…
MyToro! (experimental)