ROCKSDB CLOUD DHRUBA BORTHAKUR, ROCKSET PRESENTED AT PERCONA-LIVE, APRIL 2017

Source: …borthakur.com/ftp/rocksdb-cloud_dhruba_borthakur.pdf

Page 1

ROCKSDB CLOUD

DHRUBA BORTHAKUR, ROCKSET

PRESENTED AT PERCONA-LIVE, APRIL 2017

Page 2

WHAT ARE WE TALKING ABOUT?

OUTLINE

▸ Why RocksDB-Cloud?

▸ Differences from RocksDB

▸ Goals, Design, Architecture

▸ Next Steps

Page 3

OUR INHERITANCE

ROCKSDB STORAGE ENGINE

▸ Open-sourced by Facebook Engineering

▸ Log-Structured Merge-Tree

▸ Embedded C++/Java/Go library

▸ Available as MyRocks and MongoRocks

▸ Used at Microsoft, Yahoo, Netflix, …

Page 4

OUR INHERITANCE

ROCKSDB FOCUS ON EFFICIENCY

▸ MyRocks on FB workload

▸ 50% smaller

[Chart: DB Size (GB), InnoDB vs. RocksDB; axis from 0 to 2200]

Page 5

OUR INHERITANCE

ROCKSDB FOCUS ON EFFICIENCY

▸ MyRocks on FB workload

▸ 50% smaller

▸ 50% fewer I/Os

[Chart: GB Written, Day 1 through Day 7, InnoDB vs. RocksDB; axis from 0 to 7000]

MyRocks is supported by Percona

Page 6

WHY ROCKSDB-CLOUD?

HAVE YOU USED ROCKSDB FOR CLOUD APPS?

▸ Rockset started to use RocksDB on AWS

▸ Needed to build our own replication engine

▸ Needed to build our own backup system

▸ Needed custom code for hot/cold placement

▸ RAM, NVMe, SSD, disk

▸ “Shared Nothing is dead.” DeWitt @ MIT, 2017, http://mitdbg.github.io/nedbday/2017/talks/dewitt.pptx

Page 7

WHY ROCKSDB-CLOUD?

ROCKSDB-CLOUD IS CHEAPER

▸ RocksDB-Cloud uses locally attached SSD and S3

▸ 3x cheaper than 3-way replication

▸ If I redesigned HDFS today

▸ It wouldn't use 3-way replication

▸ n times cheaper than EBS, n > 1

Page 8

WHY ROCKSDB-CLOUD?

VISION FOR ROCKSDB-CLOUD

▸ Optimized for Cloud Applications

▸ Support AWS, Google Storage, Azure

▸ Pluggability for other cloud vendors

Page 9

ROCKSDB-CLOUD

GOALS

▸ Durability of data in spite of machine failures

▸ Replication of data across machines

Page 10

ROCKSDB-CLOUD

GOALS

▸ Durability of data in spite of machine failures

▸ Replication of data across machines

▸ Auto placement of hot/cold data on cloud storage hierarchy

Copyright ©1996-2017 Computer History Museum

Page 11

ROCKSDB-CLOUD

GOALS

▸ Durability of data in spite of machine failures

▸ Replication of data across machines

▸ Auto placement of hot/cold data on cloud storage hierarchy

▸ Portability across cloud vendors

Page 12

ARCHITECTURE 1: APPLICATION LOG IN FRONT OF DATABASE

[Diagram: a Cloud Application embeds RocksDB-Cloud (memtable cache, block cache, persistent read cache on SSD). Writes arrive by tailing data from distributed log storage; reads serve queries. SST files are flushed to local SSD and to Cloud Storage.]

Page 13

ARCHITECTURE 1

TAILING A LOG

▸ Tail data from Kafka topic into RocksDB-Cloud

▸ Open (local_directory, S3 bucket name)

▸ SST file copied to S3 at the time of file close

▸ Keep or delete local sst file

▸ Every change to MANIFEST is copied to S3

▸ Kafka state stored in RocksDB-Cloud
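
The write path on this slide can be sketched as a toy in-memory model. Everything below is hypothetical: `ToyCloudDB`, its methods, the `__log_offset__` key, and the dicts standing in for the local directory and the S3 bucket are illustration only, not the rocksdb-cloud API. The sketch only shows the ordering the bullets describe: an SST file is copied to the bucket when it is closed, the MANIFEST is copied after every change, and the log position travels with the data.

```python
class ToyCloudDB:
    """Toy model of the slide's write path; NOT the rocksdb-cloud API."""

    def __init__(self, local_dir, bucket, keep_local=True, memtable_limit=2):
        self.local = local_dir      # dict standing in for the local directory
        self.bucket = bucket        # dict standing in for the S3 bucket
        self.keep_local = keep_local
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.manifest = []          # ordered list of live SST file names
        self.next_file = 0

    def put(self, key, value, log_offset):
        # The log position is stored next to the data, so both are flushed
        # together ("Kafka state stored in RocksDB-Cloud").
        self.memtable[key] = value
        self.memtable["__log_offset__"] = log_offset
        if len(self.memtable) > self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        name = f"{self.next_file:06d}.sst"
        self.next_file += 1
        sst = dict(self.memtable)
        self.local[name] = sst
        self.bucket[name] = sst                 # copied at file close
        if not self.keep_local:
            del self.local[name]                # optionally drop the local copy
        self.manifest.append(name)
        self.bucket["MANIFEST"] = list(self.manifest)  # every change is copied
        self.memtable = {}


local, s3 = {}, {}
db = ToyCloudDB(local, s3, keep_local=False)
for offset, (k, v) in enumerate([("a", 1), ("b", 2), ("c", 3)]):
    db.put(k, v, log_offset=offset)
db.flush()
```

With `keep_local=False` the local dict ends up empty while the bucket holds both SST files and the latest MANIFEST, which is the "keep or delete" choice from the slide.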

Page 14

ARCHITECTURE 1

TAILING A LOG

▸ Recover data when machine fails

▸ Open (local_directory, S3 bucket name)

▸ MANIFEST downloaded from S3

▸ Download data from sst files on demand

▸ Local SSD/disk as persistent cache

▸ Restart tailing from Kafka
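
The recovery steps on this slide can be sketched the same way. This is a minimal illustration under stated assumptions, not the library's implementation: `recover`, the `__log_offset__` key, and the dict-based bucket and cache are all hypothetical names. It shows the MANIFEST being read first, SST files being fetched only when a read needs them, and tailing resuming just past the last stored log position.

```python
def recover(bucket, local_cache):
    """Rebuild read state from the bucket; return (get, resume_offset)."""
    manifest = list(bucket["MANIFEST"])          # MANIFEST downloaded first

    def fetch(name):
        # Download an SST only when a read needs it, then keep it locally.
        if name not in local_cache:
            local_cache[name] = dict(bucket[name])
        return local_cache[name]

    def get(key):
        for name in reversed(manifest):          # newest file wins
            sst = fetch(name)
            if key in sst:
                return sst[key]
        return None

    last = get("__log_offset__")
    resume_offset = 0 if last is None else last + 1
    return get, resume_offset


# Hypothetical bucket contents left behind by a failed machine.
bucket = {
    "MANIFEST": ["000000.sst", "000001.sst"],
    "000000.sst": {"a": 1, "b": 2, "__log_offset__": 1},
    "000001.sst": {"c": 3, "__log_offset__": 2},
}
cache = {}
get, resume_offset = recover(bucket, cache)
```

Right after `recover` returns, only the newest SST has been pulled into the cache; the older file is downloaded the first time a key that lives in it is read.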

Page 15

ARCHITECTURE 1: ZERO COPY CLONES

▸ Instantaneous clone creation

▸ Both machines run at their own speeds

▸ True master-master configuration

[Diagram: Server (RocksDB-Cloud) tails data from distributed log storage and reads/writes Cloud Bucket A. Cloned Server (RocksDB-Cloud) also tails data from distributed log storage, reads from Cloud Bucket A, and reads/writes Cloud Bucket B. Queries are served by either server.]
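
A zero-copy clone can be illustrated with a small sketch. The names here (`ToyClone`, `add_sst`, `read_sst`) and the dict-based buckets are assumptions made for illustration, not the rocksdb-cloud API. The point is that creating the clone copies only the list of file names, reads fall through to the source bucket, and new files land in the clone's own bucket.

```python
class ToyClone:
    """Reads fall through to the source bucket; writes go to the clone's own."""

    def __init__(self, src_bucket, dest_bucket):
        self.src = src_bucket
        self.dest = dest_bucket
        # Creation copies only the list of file names, never the file bodies.
        self.manifest = list(src_bucket["MANIFEST"])
        self.dest["MANIFEST"] = list(self.manifest)

    def add_sst(self, name, contents):
        # New files are written to the clone's bucket only.
        self.dest[name] = contents
        self.manifest.append(name)
        self.dest["MANIFEST"] = list(self.manifest)

    def read_sst(self, name):
        if name in self.dest:
            return self.dest[name]
        return self.src[name]       # inherited file, still in the source bucket


bucket_a = {"MANIFEST": ["000000.sst"], "000000.sst": {"a": 1}}
bucket_b = {}
clone = ToyClone(bucket_a, bucket_b)
clone.add_sst("000001.sst", {"b": 2})
```

The source bucket is never modified by the clone, which is why both machines can run at their own speeds.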

Page 16

ARCHITECTURE 1: MASTERLESS REPLICATION

ZERO COPY CLONES

▸ Purger runs on every machine

▸ Deletes sst files that are not part of any clones

▸ Both machines run at their own speeds

▸ True master-master configuration
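
The purger rule above amounts to set arithmetic over the MANIFESTs of all clones. The sketch below is an assumption-laden illustration (the function `purge` and the file names are hypothetical): a file is deleted only if no clone's MANIFEST still lists it.

```python
def purge(bucket_files, manifests):
    """Return (kept, deleted) given all SST names and each clone's MANIFEST."""
    live = set()
    for manifest in manifests:
        live.update(manifest)       # a file is live if any clone references it
    deleted = sorted(name for name in bucket_files if name not in live)
    kept = sorted(name for name in bucket_files if name in live)
    return kept, deleted


files = ["000000.sst", "000001.sst", "000002.sst", "000003.sst"]
server_manifest = ["000002.sst", "000003.sst"]   # e.g. after a compaction
clone_manifest = ["000000.sst", "000002.sst"]    # clone still needs 000000
kept, deleted = purge(files, [server_manifest, clone_manifest])
```

Here only `000001.sst` is deleted, because the clone still references `000000.sst` even though the original server no longer does.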

Page 17

ARCHITECTURE 2: DATABASE WAL ENABLED

[Diagram: a Cloud Application sends updates and queries to RocksDB-Cloud (memtable cache, block cache, persistent read cache on SSD). Writes go to a WAL kept in Cloud Log Storage (Kinesis). SST files are flushed to local SSD and to Cloud Storage.]

Page 18

ARCHITECTURE 2: ZERO COPY CLONES

[Diagram: Server (RocksDB-Cloud) serves queries and updates, sends WAL writes to Cloud Log Storage (Kinesis), and reads/writes Cloud Bucket A. Cloned Server (RocksDB-Cloud) serves queries, runs a WAL tailer, reads from Cloud Bucket A, and reads/writes Cloud Bucket B.]

Clones are read-only replicas

Page 19

ARCHITECTURE 2

READ WRITE DB

▸ Master machine:

▸ read-write

▸ write WAL on AWS-Kinesis

▸ Slave machine:

▸ read-only

▸ tail Kinesis and apply
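
The division of roles above can be sketched with a plain Python list standing in for the Kinesis stream. The classes `Master` and `ReadReplica` and their methods are hypothetical, chosen only to illustrate the idea: the read-write machine appends each update to the WAL, and the read-only machine tails the WAL and applies records in order.

```python
class Master:
    """Read-write machine: logs every update to the shared WAL."""

    def __init__(self, wal):
        self.wal = wal          # list standing in for a Kinesis stream
        self.data = {}

    def put(self, key, value):
        self.wal.append((key, value))   # log first, then apply locally
        self.data[key] = value


class ReadReplica:
    """Read-only machine: tails the WAL and applies records in order."""

    def __init__(self, wal):
        self.wal = wal
        self.data = {}
        self.applied = 0        # index of the next WAL record to apply

    def tail(self):
        for key, value in self.wal[self.applied:]:
            self.data[key] = value
        self.applied = len(self.wal)

    def get(self, key):
        return self.data.get(key)


wal = []
master = Master(wal)
replica = ReadReplica(wal)
master.put("a", 1)
master.put("a", 2)
stale = replica.get("a")    # replica has not tailed yet
replica.tail()
```

Before `tail()` runs the replica sees nothing for key `a`; afterwards it sees the latest value, which shows that the replica lags the master by whatever it has not yet tailed.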

Page 20

HIERARCHICAL STORAGE

AUTO PLACEMENT OF HOT/COLD DATA

▸ All Levels L0 - Ln reside in S3

▸ Levels L0 - L2 typically reside in local SSD and S3

▸ Cache data from S3 for reads:

▸ persistent cache on locally attached SSD

▸ Support for Intel NVMe
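
The placement rule on this slide can be sketched as a small tiered store. This is an illustration under assumptions, not the library's placement logic: `TieredStore`, `LOCAL_LEVELS`, and the dicts standing in for S3, local SSD, and the persistent read cache are hypothetical. Every file goes to S3, files in the low levels are also kept locally, and colder files are cached locally the first time they are read.

```python
LOCAL_LEVELS = {0, 1, 2}     # assumption: the slide says "typically" L0 - L2


class TieredStore:
    def __init__(self):
        self.s3 = {}             # every level lives here
        self.local = {}          # hot levels, written at flush time
        self.read_cache = {}     # persistent cache for colder data read from S3

    def write_sst(self, level, name, contents):
        self.s3[name] = contents
        if level in LOCAL_LEVELS:
            self.local[name] = contents

    def read_sst(self, name):
        """Return (contents, tier), where tier names the layer that served it."""
        if name in self.local:
            return self.local[name], "local"
        if name in self.read_cache:
            return self.read_cache[name], "cache"
        contents = self.s3[name]
        self.read_cache[name] = contents     # populate the cache on a miss
        return contents, "s3"


store = TieredStore()
store.write_sst(0, "hot.sst", {"a": 1})
store.write_sst(5, "cold.sst", {"z": 26})
tiers = [store.read_sst("hot.sst")[1],
         store.read_sst("cold.sst")[1],
         store.read_sst("cold.sst")[1]]
```

The second read of the cold file is served from the local cache rather than from S3.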

Page 21

PORTABILITY ACROSS CLOUD VENDORS

SEAMLESS COPY AMONG S3, AZURE, GOOGLE

▸ App on Azure can access AWS S3 Storage

▸ App on Google Cloud can access Azure Storage

▸ Same API on all cloud platforms

[Diagram labels: RocksDB Cloud App; Azure; AWS S3; Google Cloud; write/read]
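
"Same API on all cloud platforms" suggests a storage interface that each vendor backend implements. The sketch below is one way such an interface could look; `ObjectStore`, `InMemoryStore`, and `copy_all` are hypothetical names, and the in-memory backends stand in for real vendor SDK calls. It is not taken from the rocksdb-cloud source.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical vendor-neutral interface."""

    @abstractmethod
    def put(self, name, data): ...

    @abstractmethod
    def get(self, name): ...

    @abstractmethod
    def names(self): ...


class InMemoryStore(ObjectStore):
    """Stand-in for a vendor backend; a real one would call that vendor's SDK."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.objects = {}

    def put(self, name, data):
        self.objects[name] = data

    def get(self, name):
        return self.objects[name]

    def names(self):
        return sorted(self.objects)


def copy_all(src, dst):
    """Copy every object using only the shared interface."""
    for name in src.names():
        dst.put(name, src.get(name))


s3 = InMemoryStore("aws-s3")
azure = InMemoryStore("azure")
s3.put("000000.sst", b"sst-bytes")
s3.put("MANIFEST", b"manifest-bytes")
copy_all(s3, azure)
```

Because `copy_all` touches only the interface, the same function would work between any pair of backends that implement it.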

Page 22

LOW ADOPTION COST

COMPATIBILITY

▸ Pure Open Source

▸ API compatible with stock RocksDB

▸ Data format compatible with stock RocksDB

▸ License compatible with stock RocksDB

▸ BSD License

Page 23

FUTURE WORK

NEXT STEPS

▸ Support for large objects

▸ Support encryption-at-rest

Page 24

OPEN SOURCE COMMUNITY

COLLABORATORS

Page 25

COME HACK WITH US

REFERENCES

▸ Source code: https://github.com/rockset/rocksdb-cloud

▸ Dev discussions: https://groups.google.com/d/forum/rocksdb-cloud

▸ Slack Channel: #rocksdb-cloud @ https://rockset-io.slack.com