Stepan Koltsov — Message passing: multithreaded...


DESCRIPTION

Stepan Koltsov, Yandex. The most widely used multithreaded synchronization primitives are the mutex and the condvar. These primitives perform poorly under contention (i.e. when several threads enter the same critical section): lock acquire and release operations become orders of magnitude slower and put a noticeable load on the CPU, system performance degrades unpredictably, and other problems appear. An alternative approach to multithreaded programming is message passing. Stepan will explain how mutexes work, why these problems arise, and how to implement message passing efficiently.

Citation preview

Concurrency without mutexes

What’s wrong with mutex?

• Hard to write safe code

• Mutexes are slow

• Hard to parallelize

Hard to write safe code

void first() {
  Guard<Mutex> guard(mutex1);
  ...
}

void second() {
  Guard<Mutex> guard(mutex2);
  ...
}

void third() {
  Guard<Mutex> guard(mutex1);
  ...
  second(); // possible deadlock
}

void fourth() {
  Guard<Mutex> guard(mutex2);
  ...
  first(); // possible deadlock
}

void foo() {
  // must be locked
}

void bar() { Guard guard; foo(); }
void baz() { Guard guard; foo(); }
void qux() { foo(); }
void quux() { Guard guard; qux(); }
void corge() { quux(); }
// grault does not lock
void grault() { qux(); }

Mutexes are expensive

• mutex lock/unlock takes about 1us under contention

• under high load there is almost always contention

• spinlocks are not worse

struct spinlock lock = SPINLOCK_INIT;

void do_smth() {
  spinlock_lock(&lock);
  …
  spinlock_unlock(&lock);
}

Spinlock API

struct spinlock {
  int locked;
};

#define SPINLOCK_INIT { 0 }

void spinlock_lock(struct spinlock* spinlock) {
  while (!atomic_compare_exchange(&spinlock->locked, 0, 1)) {
  }
}

void spinlock_unlock(struct spinlock* spinlock) {
  atomic_store(&spinlock->locked, 0, __ATOMIC_SEQ_CST);
}

Spinlock impl

Code examples

github.com/stepancheg/no-mutex-c
github.com/stepancheg/no-mutex

struct mutex lock = MUTEX_INIT;

void do_smth() {
  mutex_lock(&lock);
  …
  mutex_unlock(&lock);
}

Mutex API

struct mutex {
  int locked; // 1 if locked
  int count;  // number of threads requesting a lock
};

#define MUTEX_INIT { 0, 0 }

void mutex_lock(struct mutex* mutex) {
  atomic_add_fetch(&mutex->count, 1);
  while (!atomic_compare_exchange(&mutex->locked, 0, 1)) {
    futex(&mutex->locked, FUTEX_WAIT, 1);
  }
}

void mutex_unlock(struct mutex* mutex) {
  int left = atomic_add_fetch(&mutex->count, -1);
  atomic_store(&mutex->locked, 0);
  if (left > 0) {
    futex(&mutex->locked, FUTEX_WAKE, 1);
  }
}

Mutex impl

Numbers

lock cmpxchg                     8 ns
uncontended mutex lock/unlock   11 ns
futex_wake                     400 ns
contended mutex lock          ~500 ns

Hard to parallelize

• We want some app to use 5 cores. How many threads should we allocate?

There’s a solution!

Message passing / Actor model

class BlockingQueue<T> {
  void Enqueue(T elem) { … }
  // block if empty
  Vector<T> DequeueAll() { … }
}

BlockingQueue

class BlockingQueue<T> {
  Mutex mutex;
  CondVar condVar;
  Vector<T> elements;

  void Enqueue(T elem) {
    mutex.lock();
    elements.push(elem);
    condVar.signal();
    mutex.unlock();
  }
}

class BlockingQueue<T> {
  Mutex mutex;
  CondVar condVar;
  Vector<T> elements;

  Vector<T> DequeueAll() {
    mutex.lock();
    while (elements.empty()) {
      condVar.wait(mutex);
    }
    Vector<T> r = move(elements);
    mutex.unlock();
    return r;
  }
}

Simple message passing with dedicated thread

// non-blocking queue
// mutex+condvar
BlockingQueue<Request> queue;

void runProcessingThread() {
  for (;;) {
    Vector<Request> requests = queue.dequeueAll();
    // process requests
  }
}

void start(Request request) {
  queue.enqueue(request);
}

Actors

interface Runnable {
  void run();
}

interface ThreadPoolExecutor {
  void submit(Runnable);
}

Executor

abstract class Actor {
  Actor(Executor executor);

  // is not called in parallel
  protected abstract void act();

  // execute act()
  // at least once
  void schedule() { … }
}

Actor

class MyReqProcessor: Actor {
  MyReqProcessor(Executor exec) {
    super(exec);
  }

  NonBlockingQueue<Request> queue;

  override void act() {
    // is not called in parallel
    Vector<Request> reqs = queue.dequeueAll();
    // process reqs
  }

  // may be called from different threads
  void addWork(Request request) {
    queue.enqueue(request);
    schedule();
  }
}

MyReqProcessor

enum ETaskState {
  WAITING,
  RUNNING,
  RUNNING_GOT_TASKS,
};

class Actor: Runnable {
  Atomic<TaskState> taskState;

  void schedule() {
    if (AtomicSwap(RGT) == WAITING) {
      executor.submit(this);
    }
  }

  …
}

Actor.schedule

enum ETaskState {
  WAITING,
  RUNNING,
  RUNNING_GOT_TASKS,
};

class Actor: Runnable {
  Atomic<TaskState> taskState;

  override void run() {
    for (;;) {
      while (CAS(RGT -> RUNNING)) {
        // fetch tasks
        // act
      }
      if (CAS(RUNNING -> WAITING)) {
        return;
      }
    }
  }
}

Actor.run

Thanks
