Back to the future with C++ and Seastar


Cloudius Systems presents:

Seastar
Avi Kivity, April 13, 2015

● New tech, runs on physical machines, VMs, Linux/OSv
● Multi-million IOPS, fully scalable
● Perfect building block for a database/filesystem/cache
● Share-nothing, fully asynchronous model
● Open source

SeaStar Technology

SeaStar current performance

SeaStar
[Diagram: Before: the thread model. After: SeaStar shards.]

Problem with today’s programming model

+ Single-core performance (frequency, IPC) is no longer growing
+ The number of cores grows, but it’s hard to utilize them: apps don’t scale
+ Locks have costs even without contention
+ Data is allocated on one core, then copied and used on others
+ Software can’t keep up with recent hardware (SSDs, line rate on 10 Gbps links, NUMA, etc.)

[Diagram: the traditional stack. Application threads sit on top of the kernel’s TCP/IP stack and scheduler, with per-thread queues, NIC queues, and shared memory.]

SeaStar Framework: linear scaling by core count

+ One engine runs on each core
+ Shared-nothing per-core design
+ Fits the existing shared-nothing distributed-application model
+ Full kernel bypass, supports zero-copy
+ No threads, no context switches and no locks
+ Instead, asynchronous lambda invocation (see the minimal application sketch after the diagram below)

[Diagram: SeaStar’s sharded stack. Each core runs its own application shard with a user-space TCP/IP stack, a task scheduler, per-core queues and SMP message queues, and drives its own NIC queue through DPDK. Everything runs in userspace; the kernel isn’t involved.]
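To make the shard-per-core model concrete, here is a minimal sketch of a SeaStar program. It is not taken from the slides and assumes the app_template API of the open-source Seastar tree (header paths and the seastar:: namespace differ in older versions); run() boots one engine per core and executes the supplied lambda on shard 0.

#include <seastar/core/app-template.hh>
#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>
#include <iostream>

int main(int argc, char** argv) {
    seastar::app_template app;
    // run() starts one reactor (engine) per core and keeps each of them
    // polling its own queues; the lambda below runs on shard 0.
    return app.run(argc, argv, [] {
        std::cout << "Running on " << seastar::smp::count << " shards\n";
        return seastar::make_ready_future<>();
    });
}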

SeaStar Framework Comparison

[Diagram: the traditional stack side by side with SeaStar’s sharded stack]

Traditional stack: lock contention, cache contention, NUMA unfriendly.
SeaStar’s sharded stack: no contention, linear scaling, NUMA friendly.

SeaStar handles 1,000,000s of connections in parallel!
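The shards exchange work only by explicit message passing over the SMP queues shown in the diagram. As a hedged illustration (not from the slides, assuming the smp::submit_to() and this_shard_id() helpers of the open-source Seastar tree), a lambda can be queued on another shard and its result returned as a future:

#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>

// Run a small computation on the neighbouring shard. The lambda is placed
// on that shard's SMP queue, executes there without any locks, and the
// result comes back to the calling shard as a future<int>.
seastar::future<int> double_on_neighbour(int x) {
    unsigned target = (seastar::this_shard_id() + 1) % seastar::smp::count;
    return seastar::smp::submit_to(target, [x] {
        return x * 2;
    });
}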

[Diagram: in SeaStar’s model, each CPU runs a per-core scheduler that drives many small promise/task pairs; in the thread model, each CPU runs a handful of threads, each with its own stack]

Futures and promises:
A promise is a pointer to an eventually computed value.
A task is a pointer to a lambda function.
No sharing: millions of parallel events can be in flight.

Threads:
A thread is a function pointer.
A stack is a byte array, from 64 KB up to megabytes.
Context-switch cost is high, and large stacks pollute the caches.

SeaStar current performance

[Chart: stock TCP stack vs. SeaStar’s native TCP stack]

Basic model

■ Futures
■ Promises
■ Continuations

F-P-C defined: Future

A future is the result of a computation that may not be available yet:
■ a data buffer from the network
■ a timer expiration
■ completion of a disk write
■ the result of a computation that requires the values from one or more other futures

F-P-C defined: Promise

A promise is an object or function that provides you with a future, with the expectation that it will fulfil the future.
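As a hedged sketch (not from the slides), the lowest-level way to fulfil a future in Seastar is an explicit promise object: the producer keeps the promise and hands the matching future to the consumer.

#include <seastar/core/future.hh>

seastar::future<int> delayed_answer() {
    seastar::promise<int> pr;
    seastar::future<int> fut = pr.get_future();   // consumer side
    // The producer fulfils the promise, often much later and from another
    // continuation; here it is done immediately for brevity.
    pr.set_value(42);
    return fut;
}

// usage: delayed_answer().then([] (int v) { /* v == 42 */ });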

Basic future/promise

future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        put(value + 1).then([] {
            std::cout << "value stored successfully\n";
        });
    });
}

Chaining

future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        return put(value + 1);
    }).then([] {
        std::cout << "value stored successfully\n";
    });
}
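Errors propagate along the same chain. As a hedged addition (not from the slides, reusing the get() and put() declared above and assuming Seastar’s handle_exception() combinator), a failed future can be intercepted once at the end of the chain:

// If get() or put() fails, the intermediate .then() continuations are
// skipped and the exception flows to the handler at the end of the chain.
void g() {
    get().then([] (int value) {
        return put(value + 1);
    }).then([] {
        std::cout << "value stored successfully\n";
    }).handle_exception([] (std::exception_ptr ep) {
        std::cout << "value could not be stored\n";
    });
}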

Zero copy friendly

future<temporary_buffer> connected_socket::read(size_t n);

■ temporary_buffer points at driver-provided pages if possible

■ discarded after use

Zero copy friendly (2)

future<size_t> connected_socket::write(temporary_buffer);

■ Future becomes ready when the TCP window allows sending more data (usually immediately)
■ temporary_buffer discarded after the data is ACKed
■ can call delete[] or decrement a reference count
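Putting the two calls together, here is a hedged sketch of a zero-copy echo step using the read() and write() signatures shown above (current Seastar versions wrap these in input_stream/output_stream, so treat this as a sketch of the model rather than of today’s exact API); socket is assumed to be an already connected connected_socket.

future<> echo_once(connected_socket& socket) {
    return socket.read(4096).then([&socket] (temporary_buffer buf) {
        // buf may point straight at driver-provided pages; write() takes
        // ownership and releases the pages only after the data is ACKed.
        return socket.write(std::move(buf));
    }).then([] (size_t bytes_sent) {
        // the window had room for everything; nothing left to do
    });
}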

Dual Networking Stack

[Diagram: a single networking API sits on top of two interchangeable stacks. The Seastar (native) stack runs a user-space TCP/IP implementation over an interface layer with DPDK, Virtio, Xen, igb and ixgb back ends; the POSIX (hosted) stack goes through the Linux kernel (sockets).]

Disk I/O

■ Zero copy using Linux AIO and O_DIRECT
■ Some operations using worker threads (open() etc.)
■ Plans for direct NVMe support
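A hedged sketch of the zero-copy file path follows; it is not from the slides and assumes the open_file_dma()/dma_read() API of the open-source Seastar tree (offsets and sizes must respect the device’s DMA alignment).

#include <seastar/core/seastar.hh>            // open_file_dma()
#include <seastar/core/file.hh>
#include <seastar/core/temporary_buffer.hh>
#include <seastar/core/sstring.hh>

// Read the first 4 KB of a file through Linux AIO + O_DIRECT.
seastar::future<> read_head(seastar::sstring name) {
    return seastar::open_file_dma(name, seastar::open_flags::ro)
        .then([] (seastar::file f) {
            return f.dma_read<char>(0, 4096)
                .then([f] (seastar::temporary_buffer<char> buf) mutable {
                    // ... consume buf: it owns a properly aligned DMA buffer ...
                    return f.close();
                });
        });
}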

Rich APIs

● HTTP Server
● HTTP Client
● RPC client/server
● map_reduce
● parallel_for_each
● distributed<>
● when_all()
● timers
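As one hedged example of these helpers (not from the slides; the header location varies between Seastar versions), parallel_for_each starts one future-returning operation per element and resolves when all of them have completed:

#include <seastar/core/future.hh>
#include <seastar/core/loop.hh>    // parallel_for_each (future-util.hh in older trees)
#include <seastar/core/sleep.hh>
#include <chrono>
#include <vector>

// Launch all the sleeps concurrently on the current shard; the returned
// future resolves once every per-element future has resolved.
seastar::future<> sleep_all(std::vector<std::chrono::milliseconds> delays) {
    return seastar::parallel_for_each(delays, [] (std::chrono::milliseconds d) {
        return seastar::sleep(d);
    });
}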

More info

■ http://github.com/cloudius-systems/seastar
■ http://seastar-project.com

Thank you
@CloudiusSystems
