33
NewSQL Overview Ivan Glushkov [email protected] @gliush

Иван Глушков (Echo)

  • Upload
    ontico

  • View
    1.365

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Иван Глушков (Echo)

NewSQL Overview

Ivan Glushkov [email protected]

@gliush

Page 2: Иван Глушков (Echo)

About myself• MIPT • MCST, Elbrus compiler project • Echo, real-time social platform

(PaaS)

Page 3: Иван Глушков (Echo)

History of SQL• Relational Model in 1970:

disk-oriented, row, sql • “One size fits all” doesn’t

work: – Column-oriented for OLAP. – Key-Value storages,

Document storages

Page 4: Иван Глушков (Echo)

Startups lifecycle• Start: no money, no users,

open source • Middle: more users, storage

optimization • Final: plenty of users, storage

failure

Page 5: Иван Глушков (Echo)

Sharding in application• Additional logic in application • Difficulties with cross-sharding

transactions • More servers to maintain • More components — higher

prob for breakdown

Page 6: Иван Глушков (Echo)

New requirements• Huge and growing data sets

9M msg per hour in Facebook 50M msg per day in Twitter

• Generated by devices • High concurrency • Data model with some relations • Transactional integrity

Page 7: Иван Глушков (Echo)

Architecture change

Client Side

Server Side

Cloud Storage

Client Side

Server Side

Database

Page 8: Иван Глушков (Echo)

Architecture change

• ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection,

limitations in operations, recovery

Page 9: Иван Глушков (Echo)

NoSQL• CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document,

Graph, …

Page 10: Иван Глушков (Echo)

NewSQL: definition

“A DBMS that delivers the scalability !and flexibility promised by NoSQL!while retaining the support for SQL !queries and/or ACID, or to improve !performance for appropriate workloads.”!

451 Group

Page 11: Иван Глушков (Echo)

NewSQL: definition

❖ SQL as the primary interface!❖ ACID support for transactions!❖ Non-locking concurrency control!❖ High per-node performance!❖ Parallel, shared nothing architecture!

Michael Stonebraker

Page 12: Иван Глушков (Echo)

Shared nothing• Each node is independent and

self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests

Page 13: Иван Глушков (Echo)

Column-oriented DBMS• Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large

records with a lot of fields

Page 14: Иван Глушков (Echo)

• Useful work is only 12%

• By Stonebraker and research group

Traditional DBMS

12%

10%

11%

18% 20%

29%

Page 15: Иван Глушков (Echo)

In-memory storage• High throughput • Low latency • No Buffer Management • If serialized, no Locking or

Latching

Page 16: Иван Глушков (Echo)

In-memory storage• Amazon price reduction !!!!

• Current price for 1TB: $14K

Page 17: Иван Глушков (Echo)

NewSQL: categories• New approaches: VoltDB,

Clustrix, NuoDB • New storage engines: TokuDB,

ScaleDB • Transparent clustering:

ScaleBase, dbShards

Page 18: Иван Глушков (Echo)

NuoDB• Multi-tier architecture: – Administrative: managing,

stats, cli, web-ui – Transactional: ACID except

‘D’, cache – Storage: key-value store

Page 19: Иван Глушков (Echo)

NuoDB• Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage

Page 20: Иван Глушков (Echo)

YCSB• Yahoo Cloud Serving

Benchmark • Key-value:

insert/read/update/scan • Measures – Performance – Scaling

Page 21: Иван Глушков (Echo)

NuoDB: YCSBThroughput, tps/nodes

0

275 000

550 000

825 000

1 100 000

1 2 4 8 16 24

Update latency, s

0

25

50

1 2 4 8 16 24

Read latency, s

0

1.5

3

1 2 4 8 16 24

Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads

Page 22: Иван Глушков (Echo)

VoltDB• In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging

Page 23: Иван Глушков (Echo)

VoltDB• Open-source • Partitioning and Replication

control • Java + C++

Page 24: Иван Глушков (Echo)

VoltDB: kv bench

90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors

Page 25: Иван Глушков (Echo)

VoltDB: sql bench

26 SQL statements "per transaction

Page 26: Иван Глушков (Echo)

• Shared data • No single point

of failure

ScaleDBApplication

MySQL MySQL MySQL…

Mirrored"Storage! Mirrored"

Storage!

… Mirrored"Storage!

Page 27: Иван Глушков (Echo)

ClustrixDB• “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query

fragments • Partition, replication: “slices”

Page 28: Иван Глушков (Echo)

ClustrixDB• “Move query to the data” • Dynamic and transparent data

layout • Linear scale

Page 29: Иван Глушков (Echo)

TPC-C

• OLTP benchmark • 9 types of tables • “New-order transaction”

Page 30: Иван Глушков (Echo)

ClustrixDB• ~ 400GB • linear

scale

Page 31: Иван Глушков (Echo)

ClustrixDB: examples• 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month

writes/reads • 42 nodes, 336 cores, 2TB

memory, 46TB SSD

Page 32: Иван Глушков (Echo)

Conclusions• NewSQL is an established trend

with a number of options • Hard to pick one because

they're not on a common scale • No silver bullet • Need more efficient ways to

store and process it

Page 33: Иван Глушков (Echo)

Questions?Ivan Glushkov [email protected] @gliush