ScyllaDB deck, July 2017


Dor Laor, CEO, co-founder

ScyllaDB - Achieve No-Compromise Performance and Availability

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Introduction

+ Founded by the KVM hypervisor creators
+ Q2 2014 - Pivot to the database world
+ Q3 2015 - Decloak during Cassandra Summit 2015, beta
+ Q1 2016 - General availability
+ Q3 2016 - First Scylla Summit: 100+ attendees
+ Q1 2017 - Completed B round; $25MM in funding
+ HQs: Palo Alto, CA; Herzliya, Israel
+ 42+ employees, hiring!

What we do: Scylla, towards the best NoSQL

+ > 1 million OPS per node

+ < 1 ms 99th-percentile latency

+ Auto tuned

+ Scale up and out

+ Open source

+ Large community (piggyback on Cassandra)

+ Blends into the ecosystem: Spark, Presto, time series, search, ...

Where is Scylla deployed?

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Why we started Scylla?

Scylla benchmark by Samsung

[Chart: throughput in op/s]

Why we started Scylla? (cont.)

+ Originally it was about performance/efficiency only
+ Over time, we understood we could deliver more:
  + An SLA between background and foreground tasks
  + Working well on any given hardware (back pressure)
  + Consistent, low 99th-percentile latency
  + Reduced admin effort
  + Low latency in the face of failures (hot-cache load balancing)
  + High observability

              Cassandra                               Scylla
Throughput:   Cannot utilize multi-core efficiently   Scales linearly - shard-per-core
Latency:      High due to Java and the JVM's GC       Low and consistent - own cache
Complexity:   Intricate tuning and configuration      Auto-tuned, dynamic scheduling
Admin:        Maintenance impacts performance         SLA guarantee for admin vs. serving

+ Outbrain is the world’s largest content discovery platform.

+ Over 557 million unique visitors from across the globe.

+ 250 billion personalized content recommendations every month.

Case study: Document column family

Outbrain: Cassandra plus Memcached

+ First read from memcached; go to Cassandra on misses.
+ Pain:
  + Stale data from cache
  + Complexity
  + Cold cache -> C* gets the full volume

[Diagram: read microservice and write process paths]

Scylla/Cassandra side-by-side deployment

+ Writes go in parallel to both C* and Scylla (see the sketch below)
+ Reads are done in parallel:
  + Memcached + Cassandra
  + Scylla (no cache at all)
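A minimal sketch of that dual-write path, assuming hypothetical stand-ins write_cassandra/write_scylla for the real driver calls:

#include <future>
#include <string>

bool write_cassandra(const std::string&) { return true; }  // stub for the C* driver
bool write_scylla(const std::string&)    { return true; }  // stub for the Scylla driver

// Issue the same mutation to both clusters concurrently; the write only
// counts once both sides acknowledge, keeping the clusters in lockstep.
bool dual_write(const std::string& mutation) {
    auto c = std::async(std::launch::async, write_cassandra, mutation);
    auto s = std::async(std::launch::async, write_scylla, mutation);
    bool ok_c = c.get();
    bool ok_s = s.get();
    return ok_c && ok_s;
}

int main() { return dual_write("INSERT ...") ? 0 : 1; }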


Scylla (w/o cache) vs. Cassandra + Memcached

                  Scylla        Cassandra                                Diff
Requests/minute   12,000,000    500,000 (memcached handles 11,500,000)   24x
Avg latency       4 ms          8 ms                                     2x
Max latency       8 ms          35 ms                                    3.5x
Hardware          9 machines    30+9 machines                            4.3x

[Graphs: Cassandra's latency vs. Scylla's latency]

What does it mean for a non-Cassandra user?

+ Throughput, latency, and scale benefits
+ Wide range of big-data integrations: KairosDB, Spark, JanusGraph, Presto, Kafka, Elastic
+ Best HA/DR in the industry
+ Stop using caches in front of the database
+ Consolidate HBase, Redis, MySQL, Mongo, and others

Assorted Quotes

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Design decisions: #1 The trivials

+ SSTable file format
+ Configuration file format
+ CQL language
+ CQL native protocol
+ JMX management protocol
+ Management command line

Design decisions: #2 Compatibility

Double cluster - migration without downtime

[Diagram: the app writes through a CQL proxy to both Cassandra and Scylla]

Design decisions: #3 All things async

[Diagram: threads vs. shards]

Design decisions: #4 Shard per core

ScyllaDB: network comparison

Traditional stack (Cassandra): the kernel owns the TCP/IP stack and the scheduler, and many application threads contend on shared queues in front of the NIC queues. The result is lock contention, cache contention, and NUMA-unfriendly memory access.

Seastar's sharded stack (Scylla): every core runs the application, a userspace TCP/IP stack, and its own task scheduler, with per-core smp queues and a dedicated NIC queue driven by DPDK; the kernel isn't involved. The result is no contention, linear scaling, and NUMA-friendly memory access. A simplified sketch of the shard-per-core routing idea follows below.

[Diagram: traditional kernel stack vs. Seastar's per-core sharded stack]
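A minimal sketch of shard-per-core routing, under simplifying assumptions: hash the partition key and map it to a single owning core. Scylla's real algorithm shards the token ring; owning_shard here is purely illustrative:

#include <algorithm>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>

// Pick the one core that owns this partition; only that shard ever touches
// the data, so no locks are needed anywhere on the read/write path.
unsigned owning_shard(const std::string& partition_key, unsigned nshards) {
    return std::hash<std::string>{}(partition_key) % nshards;
}

int main() {
    // One shard per core; hardware_concurrency() may return 0, so clamp it.
    unsigned nshards = std::max(1u, std::thread::hardware_concurrency());
    std::printf("key 'user:1234' -> shard %u of %u\n",
                owning_shard("user:1234", nshards), nshards);
}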

Scylla has its own task scheduler (traditional stack vs. Scylla's stack):

[Diagram: Scylla's stack - each CPU core runs its own queue of (promise, task) pairs]

A promise is a pointer to an eventually computed value; a task is a pointer to a lambda function (see the toy model below).
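A toy model of that promise/task pairing, compilable as C++14 to match the stack described. Illustrative only; Seastar's real types are seastar::promise<T> and seastar::future<T>:

#include <functional>
#include <iostream>
#include <utility>

// The promise holds the eventually computed value; the task is the lambda
// that runs once the value is ready. No threads, no stacks: just pointers.
template <typename T>
struct toy_promise {
    bool ready = false;
    T value{};                          // eventually computed value
    std::function<void(T)> task;        // continuation lambda, if any
    void set_value(T v) {
        value = std::move(v);
        ready = true;
        if (task) task(value);          // value arrived: run the queued task
    }
    void then(std::function<void(T)> t) {
        if (ready) t(value);            // already resolved: run immediately
        else task = std::move(t);       // otherwise queue the lambda
    }
};

int main() {
    toy_promise<int> p;
    p.then([](int v) { std::cout << "row count: " << v << "\n"; });
    p.set_value(42);                    // resolving the promise runs the task
}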

[Diagram: the traditional stack - each CPU's scheduler juggles many threads, each with its own stack]

A thread is a function pointer; a stack is a byte array from 64 KB to megabytes. Context-switch cost is high and large stacks pollute the caches; promise/task pairs, by contrast, share nothing between cores and scale to millions of parallel events.

SCYLLA IS DIFFERENT

❏ Thread per core, lock-free
❏ Task scheduler
❏ Reactor programming
❏ C++14
❏ DMA
❏ Log-structured merge tree
❏ DB-aware cache
❏ Userspace I/O scheduler
❏ NUMA friendly
❏ Log-structured allocator
❏ Zero copy
❏ Multi-queue, poll mode
❏ Userspace TCP/IP

[Graph: Scylla vs. C* latency]

Design Decision: #5 Unified cache vs. C* cache plethora

Cassandra: key cache, row cache, on-heap/off-heap, Linux page cache, SSTables - complex tuning
Scylla: unified cache over SSTables


Parasitic rows: the Linux page cache works in whole 4 KB SSTable pages, so caching a 300-byte row drags a full page into memory; only 300 B / 4096 B ≈ 7% of that cache space is your data.


Reading through the Linux page cache also costs context switches: the app thread takes a page fault and is suspended, the kernel initiates the I/O (context switch), the SSD completes and raises an interrupt (another context switch), and the kernel maps the page and resumes the thread. Compare the kernel-bypass sketch below.
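For contrast, the kind of read path Scylla's design favors: O_DIRECT files plus Linux AIO, which keep the page cache and its faults out of the way so the database's own unified cache stays in charge. A minimal libaio sketch, assuming libaio is installed (link with -laio); "data.bin" is a placeholder and error handling is trimmed:

#include <libaio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>

int main() {
    // O_DIRECT bypasses the kernel page cache: no page faults and no
    // surprise context switches on the read path.
    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;

    io_context_t ctx = 0;
    io_setup(1, &ctx);                     // room for one in-flight request

    void* buf = nullptr;
    posix_memalign(&buf, 4096, 4096);      // O_DIRECT requires aligned buffers

    struct iocb cb;
    struct iocb* cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0);  // async read of the first 4 KB page
    io_submit(ctx, 1, cbs);                // returns immediately; no blocking

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, nullptr); // reap the completion (real code polls)

    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}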

Cassandra Streaming configuration

Design decisions: #6 I/O scheduler

Scylla I/O scheduling: queries, the commitlog, and compaction each get their own queue in front of a userspace I/O scheduler that feeds the disk. The scheduler dispatches only up to the disk's maximum useful concurrency; past that point I/O would just queue up in the filesystem/device, so Scylla keeps the excess in its own queues instead. A sketch of the idea follows.

[Diagram: query/commitlog/compaction queues -> userspace I/O scheduler -> disk]
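A hedged sketch of such a priority-class scheduler: each class has a queue and a share weight, and dispatch stops at the disk's useful concurrency. All names and the credit scheme are illustrative, not Scylla's implementation:

#include <functional>
#include <queue>
#include <vector>

struct io_request { std::function<void()> submit; };

struct io_class {
    unsigned shares;                  // relative priority weight
    unsigned deficit = 0;             // credit accumulated while waiting
    std::queue<io_request> q;
};

class io_scheduler {
    std::vector<io_class> classes_;
    unsigned inflight_ = 0;
    unsigned max_concurrency_;        // the disk's max useful concurrency
public:
    explicit io_scheduler(unsigned max_conc) : max_concurrency_(max_conc) {}
    unsigned register_class(unsigned shares) {
        classes_.push_back({shares, 0, {}});
        return static_cast<unsigned>(classes_.size()) - 1;
    }
    void enqueue(unsigned cls, io_request r) { classes_[cls].q.push(std::move(r)); }
    void complete() { --inflight_; }  // call when an I/O finishes

    // Dispatch while the disk still benefits from more parallelism; among
    // classes with pending work, pick the one with the most accumulated credit.
    void poll() {
        while (inflight_ < max_concurrency_) {
            io_class* best = nullptr;
            for (auto& c : classes_) {
                if (c.q.empty()) continue;
                c.deficit += c.shares;          // waiting classes earn credit
                if (!best || c.deficit > best->deficit) best = &c;
            }
            if (!best) return;                  // nothing queued anywhere
            best->deficit = 0;
            io_request r = std::move(best->q.front());
            best->q.pop();
            ++inflight_;
            r.submit();
        }
    }
};

int main() {
    io_scheduler sched(4);                        // pretend the disk likes 4 in flight
    unsigned query = sched.register_class(100);   // foreground gets more shares
    unsigned compaction = sched.register_class(20);
    sched.enqueue(query, { [] { /* submit read */ } });
    sched.enqueue(compaction, { [] { /* submit write */ } });
    sched.poll();                                 // dispatches both, query first
}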

[Graph: I/O scheduler result]

[Diagram: the Seastar scheduler feeds memtable, compaction, query, repair, and commitlog work to the SSD, CPU, and WAN; a compaction backlog monitor and a memory monitor feed back to adjust priorities]

Design Decision: #8..#10 Workload conditioning

Workload conditioning in practice: when the disk can't keep up, workload conditioning figures out the right request rate (a sketch of such a feedback loop follows).
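One way such a feedback loop can look, with illustrative names and constants (not Scylla's actual controller): monitor the compaction backlog and adjust that I/O class's shares so background work keeps up without starving foreground queries.

#include <algorithm>
#include <cstdio>

// Grow compaction's shares when the backlog exceeds the target, shrink them
// when compaction is comfortably ahead, and hold steady in between.
unsigned condition_compaction_shares(double backlog_bytes,
                                     double target_bytes,
                                     unsigned shares) {
    if (backlog_bytes > target_bytes)
        return std::min(shares * 2, 1000u);   // falling behind: boost compaction
    if (backlog_bytes < target_bytes / 2)
        return std::max(shares / 2, 10u);     // ahead: give queries more room
    return shares;                            // within the band: leave as is
}

int main() {
    unsigned shares = 100;
    shares = condition_compaction_shares(2e9, 1e9, shares);  // backlog over target
    std::printf("compaction shares -> %u\n", shares);        // prints 200
}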

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability

Upcoming releases

+ Enterprise release, based on 1.6
+ 1.7 - May 2017
  + Counters
  + New intra-node sharding algorithm
  + sstableloader from 2.2/3.x
  + Debian
+ 2.0 - August 2017
  + Materialized views
  + Execution blocks (a CPU-cache optimization that boosts performance)
  + Partial row cache (for wide-row streaming)

Scylla beyond Cassandra

[Diagram: roadmap along vertical, horizontal, and core-database axes]

Q&A

Resources

slideshare.net/ScyllaDB

dor@scylladb.com (@DorLaor)

avi@scylladb.com (@AviKivity)

@scylladb

http://bit.ly/2oHAfok

youtube.com/c/scylladb

scylladb.com/blog

NoSQL GOES NATIVE

Thank you.
