Quickthreads for GNU-2013

Quickthreads

FROM STRANGER TO MOTHER

About me

Name: Frankie Onuonga

Contributions: Fedora and openSUSE

IRC: frankieonuonga

Email:[email protected]

Introduction

● Programming is a tricky art of trading off flexibility against overhead

● Constructs often trade off portability against overhead

● We therefore aim to program in a way that achieves high portability, flexibility and performance together

Techniques

● There are two main ways to build thread packages:

– Build the package around a thread abstract data type that has a machine-independent interface and encapsulates only the most machine-dependent operations: initialization and execution

– Expose portable details of the thread package implementation so that higher-level thread operations can be optimized

Let's dive in and swim

● Let's go deep into the two approaches and introduce Quickthreads along the way.

Basics

● Quickthreads is not a standalone threads package

● It is a threads package core that is used to build non-preemptive user-space threads packages

● It provides an interface to machine-dependent code that performs initialization and context switches

● It lets the client do the rest (implement synchronization and scheduling)

Goals

● Make it easy to write and port thread packages by providing the machine-dependent code for creating, running and stopping threads; the client only has to reconfigure the package to run it on a different architecture, which gives high portability

● Make thread packages with replaceable components and with performance close to that of hand-coded packages with fixed policies

● Flexibility can lead to slowness, but because Quickthreads is minimalist its cost is often close to that of hand-coded operations

● The benefits show on machines with many registers, where saving and restoring the registers is the dominating factor

● In fact, in experiments with two thread packages no change from the norm was noticed, which for a start is a good thing.

Design Decisions

● The trick is to push synchronization to the clients so that the thread core only handles context switching: “oh no, we have given up some performance for flexibility”

● “Let us see if we can get it back”

Synchronization

● Quickthreads avoids embedded synchronization by pushing all of the sync functions to the client

● The goal is to ensure that the thread core neither prohibits nor requires a lock on a context switch

● Otherwise we would have race conditions, which are bad.

Scheduler Threads

● Each blocking operation returns control to a central scheduler thread; there is one scheduler thread per processor

● The scheduler thread enqueues a blocked thread only once it is completely blocked and its stack is frozen, which ensures there are no races

● The scheduler thread is never queued with other threads, so there are no races during context switches to and from scheduler threads
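The scheduler-thread model above can be sketched in a few lines. This is only an illustration, not Quickthreads code: it swaps in the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) and assumes a platform such as Linux with glibc that still ships `<ucontext.h>`. All names (`run_scheduler_demo`, `yield_to_scheduler`, `trace`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)
#define QCAP 8

static ucontext_t sched_ctx;            /* the central scheduler "thread" */
static ucontext_t thr_ctx[NTHREADS];    /* saved state of each worker */
static void *stacks[NTHREADS];          /* stacks are allocated by the client */
static int run_queue[QCAP];
static int q_head, q_tail;
static int current = -1;                /* id of the worker that is running */
static int finished[NTHREADS];
static int trace[16];                   /* order in which workers ran */
static int trace_len;

static void enqueue(int id) { run_queue[q_tail++ % QCAP] = id; }

static int dequeue(void)
{
    return q_head == q_tail ? -1 : run_queue[q_head++ % QCAP];
}

/* Every blocking operation goes back to the scheduler, never to a peer. */
static void yield_to_scheduler(void)
{
    swapcontext(&thr_ctx[current], &sched_ctx);
}

static void worker(void)
{
    for (int step = 0; step < 2; step++) {
        trace[trace_len++] = current;
        yield_to_scheduler();
    }
    finished[current] = 1;              /* returning resumes uc_link (the scheduler) */
}

int run_scheduler_demo(void)
{
    for (int i = 0; i < NTHREADS; i++) {
        stacks[i] = malloc(STACK_SIZE);
        if (stacks[i] == NULL)
            return -1;
        getcontext(&thr_ctx[i]);
        thr_ctx[i].uc_stack.ss_sp = stacks[i];
        thr_ctx[i].uc_stack.ss_size = STACK_SIZE;
        thr_ctx[i].uc_link = &sched_ctx;
        makecontext(&thr_ctx[i], worker, 0);
        enqueue(i);
    }

    int id;
    while ((id = dequeue()) != -1) {
        current = id;
        swapcontext(&sched_ctx, &thr_ctx[id]);
        /* Back on the scheduler's stack: worker `id` is fully saved and its
         * stack is frozen, so it is now safe to make it runnable again. */
        if (!finished[id])
            enqueue(id);
    }

    for (int i = 0; i < NTHREADS; i++)
        free(stacks[i]);
    return trace_len;
}
```

Calling `run_scheduler_demo()` once runs the two workers in alternation. Note that every hand-off costs two switches (worker to scheduler, scheduler to next worker), which is the overhead the next slide complains about.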

Disadvantages with scheduler threads

● Each blocking operation requires a context switch to the scheduler and then a second context switch to the next thread

● On machines without per-processor private memory it is difficult to locate per-processor schedulers cheaply, and shared schedulers must be protected with locks

Trick to locking

● We can avoid the extra context switch to the scheduler if the blocking thread locks the queue until its stack is no longer in use.

● The unlock can be performed in the middle of a context switch, which requires either hand coding it or calling it carefully out of line (with the problem of longer lock hold times)

● Relying on separate lock and unlock functions removes atomicity, which is bad practice in this case.

Register States

● Stack sharing problems can also be avoided if all transitional state can be held in machine registers.

● This means that processor A can run the old thread even while processor B, which blocked it, is still starting a new runnable thread

Problem

● Space is limited to the machine registers, which limits the kinds of operations that can be performed. E.g. a stackless thread cannot perform procedure calls, and some architectures and operating systems do not allow a thread to be temporarily stackless.

Preswitch technique

● Another method is to block the old thread, switch to the new thread and then run some code on the new thread's stack on behalf of the old thread.

● This is good but also has some limitations:

– Operations must be transparent

– The new thread must have a stack, so lazy allocation is not possible here
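A minimal sketch of the preswitch idea, under the same assumption as before that POSIX `ucontext` stands in for the real machine-dependent switch code. The switch primitive takes a helper; the helper runs only after the switch, on the new thread's stack, so the old thread is already fully saved when it is put on a queue. The names `block_with_helper`, `run_pending_helper` and `enqueue_old` are hypothetical and are not the Quickthreads API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

typedef void (*helper_fn)(void *old_thread, void *arg);

struct thread {
    ucontext_t ctx;
    void *stack;                        /* NULL for the initial (main) thread */
};

/* Hand-off slot: written by the old thread just before the switch and
 * consumed by whichever thread runs next. */
static helper_fn pending_helper;
static void *pending_old;
static void *pending_arg;

static void run_pending_helper(void)
{
    helper_fn h = pending_helper;
    pending_helper = NULL;
    if (h != NULL)
        h(pending_old, pending_arg);
}

/* Block `old`, switch to `next`; `helper(old, arg)` then runs on next's stack. */
static void block_with_helper(struct thread *old, struct thread *next,
                              helper_fn helper, void *arg)
{
    pending_helper = helper;
    pending_old = old;
    pending_arg = arg;
    swapcontext(&old->ctx, &next->ctx);
    /* We were resumed by someone else's switch: run their helper, if any. */
    run_pending_helper();
}

static struct thread main_thread, worker;
static struct thread *ready[4];
static int ready_len;
static uintptr_t helper_addr;           /* address of a local inside the helper */

/* Helper: make the (already blocked) old thread runnable. */
static void enqueue_old(void *old, void *arg)
{
    char marker = 0;
    (void)arg;
    helper_addr = (uintptr_t)&marker;
    ready[ready_len++] = old;
}

static void worker_entry(void)
{
    run_pending_helper();               /* first act on a fresh stack */
}

int run_preswitch_demo(void)
{
    worker.stack = malloc(STACK_SIZE);
    if (worker.stack == NULL)
        return -1;
    getcontext(&worker.ctx);
    worker.ctx.uc_stack.ss_sp = worker.stack;
    worker.ctx.uc_stack.ss_size = STACK_SIZE;
    worker.ctx.uc_link = &main_thread.ctx;   /* resume main when worker returns */
    makecontext(&worker.ctx, worker_entry, 0);

    block_with_helper(&main_thread, &worker, enqueue_old, NULL);

    uintptr_t lo = (uintptr_t)worker.stack;
    int on_worker_stack = helper_addr >= lo && helper_addr < lo + STACK_SIZE;
    free(worker.stack);
    return on_worker_stack && ready_len == 1 && ready[0] == &main_thread;
}
```

`run_preswitch_demo()` returns 1 when the helper really did execute on the worker's stack and queued the main thread. It also shows the limitation from the slide: the new thread needs a stack before the switch, because the helper has to run somewhere.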

Stateless Schedulers

● Another technique is to create “lightweight” scheduler threads that consist of stack space but no initialized state.

● A context switch saves the old thread and then switches to the scheduler stack, but no scheduler state is stored

● The scheduler is used as a place to run a function for the thread that just got blocked

● It follows that when a new thread is started no scheduler state is saved

● This is faster than using “heavy” schedulers because no scheduler state is saved or restored.

Slight Challenge

● It may be hard to locate schedulers cheaply on machines without per-processor private memory

● Lightweight schedulers are slightly slower, but they have the advantage that they can perform function calls.

● A stateless scheduler is similar to storing all transitional state in registers, in so far as scheduler state exists only during the context switch

Our choice

● Preswitch: it is designed so that it can emulate all the other models except storing transitional state in registers

● Remember that locking is not part of a Quickthreads context switch, so threads must perform any locking at the end of the context switch

● We have also seen that hand coding may improve performance, but as previously discussed it is not taken as an option because:

– Avoiding embedded synchronization improves portability

– It keeps the programming model simple

– Synchronization is not always needed

– It would be harder to perform sync inline in Quickthreads than in hand-coded systems

– Besides, an out-of-line call would make the switches slower

– Thus we perform switches without any locking.

Flexibility and simplicity

● Flexibility and simplicity can usually be achieved in a number of ways:

– Using powerful operations

– Using customization operations

– Stripping away operations that limit flexibility

Disadvantages

● Customizable operations are often slow, and most existing languages do not provide fine-grained customization

● Stripping removes operations that are inflexible but serve a useful purpose, forcing the client to re-implement that functionality somehow

● In our design we strip everything, leaving only the essentials: initialization and context switch.

Key decisions

● The operations (initialize, start and stop) are simple enough that their cost is close to that of fixed alternatives

● Quickthreads does not depend on other routines: the client does not need to worry about it introducing spurious races or deadlocks

● The client is given the choice of how to handle storage; this is an advantage because the client chooses what is best for it and its users

● Quickthreads implements no scheduling mechanism. This is left to the client, which chooses what suits it best.

● It also lacks semaphores, monitors, non-blocking I/O and so on, so that clients that don't use them do not pay the price for those that do

● Quickthreads provides basic operations to save and restore thread state.

● A big bottleneck compared to other packages is that scheduling and locking are provided by the client and executed via indirect procedure calls

● Quickthreads is designed to minimize procedure calls on each context switch to just two

● Hand-coded thread operations are faster because locking and scheduling policies are fixed, simple and inlined to minimize procedure call overhead and lock holding time
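To make the "indirect procedure calls" point concrete, here is a small sketch of a client-supplied scheduling policy. It is plain C with no real context switch; `switch_away` only stands in for the point where a thread core would call back into the client. The struct and function names are assumptions made for illustration and do not come from Quickthreads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_READY 8

/* Hypothetical ready set, owned and allocated by the client. */
struct ready_set {
    int ids[MAX_READY];
    size_t len;
};

/* The scheduling policy is a pair of function pointers chosen by the client. */
struct sched_ops {
    void (*make_ready)(struct ready_set *rs, int tid);
    int (*pick_next)(struct ready_set *rs);     /* returns -1 when empty */
};

static void push_back(struct ready_set *rs, int tid)
{
    if (rs->len < MAX_READY)
        rs->ids[rs->len++] = tid;
}

/* FIFO policy: run the thread that has waited longest. */
static int take_oldest(struct ready_set *rs)
{
    if (rs->len == 0)
        return -1;
    int tid = rs->ids[0];
    for (size_t i = 1; i < rs->len; i++)
        rs->ids[i - 1] = rs->ids[i];
    rs->len--;
    return tid;
}

/* LIFO policy: run the thread that was queued most recently. */
static int take_newest(struct ready_set *rs)
{
    if (rs->len == 0)
        return -1;
    return rs->ids[--rs->len];
}

static const struct sched_ops fifo_ops = { push_back, take_oldest };
static const struct sched_ops lifo_ops = { push_back, take_newest };

/* Stand-in for the switch path: both scheduling steps are indirect calls,
 * so the policy can be replaced without touching this function. */
static int switch_away(const struct sched_ops *ops, struct ready_set *rs,
                       int old_tid)
{
    ops->make_ready(rs, old_tid);
    return ops->pick_next(rs);
}
```

The policy is replaceable at run time, which is the flexibility the slides argue for; the price is that the compiler cannot inline `make_ready` or `pick_next` the way a hand-coded package with a fixed policy could.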

Flexibility and Simplicity

● Flexibility can be achieved in several ways:

– Using powerful operations

– Using customization operations

– Removing operations that limit flexibility

Disadvantages

● Powerful operations are often slower

● Customizable operations are slower, and existing languages do not support fine-grained customization

● Stripping could remove parts of the code that serve a useful purpose

● Quickthreads takes away all operations, leaving behind only context switching and thread initialization

● A design decision is that within the library the start, stop and initialize operations are cheap, with a cost close to that of fixed alternatives

Variant argument lists

● Thread creation primitives often take as arguments a function pointer and zero or more arguments to the function

● This is difficult because when a thread function is called some memory has to be set aside to find and store the parameters

● From the point of view of a thread package it is easier to take a single argument that points to a data structure; the client can store as many values in that structure as it needs.

● Even though the varargs interface does accept single arguments, it sadly makes threads slower to initialize

● It is therefore necessary to provide two interfaces, one for varargs and one fast one

● Using two interfaces makes the thread package harder to understand
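The difference between the two interfaces can be seen in a short sketch. It is plain C and does not start real threads; `struct start_record` is a hypothetical stand-in for whatever a thread package writes onto a new thread's stack, and the names `init_single`, `init_vargs` and `start_thread_body` are invented for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define MAX_VARGS 4

typedef long (*single_fn)(void *arg);
typedef long (*vargs_fn)(size_t nargs, const long *args);

/* Hypothetical record describing how a new thread should begin. */
struct start_record {
    single_fn single;
    void *arg;
    vargs_fn vfn;
    size_t nargs;
    long args[MAX_VARGS];               /* storage needed only by the varargs path */
};

/* Fast interface: one pointer argument, nothing to copy. */
static void init_single(struct start_record *r, single_fn fn, void *arg)
{
    r->single = fn;
    r->arg = arg;
    r->vfn = NULL;
    r->nargs = 0;
}

/* Varargs interface: every argument must be found and copied into storage.
 * Returns 0 on success, -1 if there are too many arguments. */
static int init_vargs(struct start_record *r, vargs_fn fn, size_t nargs, ...)
{
    if (nargs > MAX_VARGS)
        return -1;
    va_list ap;
    va_start(ap, nargs);
    for (size_t i = 0; i < nargs; i++)
        r->args[i] = va_arg(ap, long);
    va_end(ap);
    r->single = NULL;
    r->arg = NULL;
    r->vfn = fn;
    r->nargs = nargs;
    return 0;
}

/* What the new thread would do first: call its body with the stored arguments. */
static long start_thread_body(const struct start_record *r)
{
    return r->vfn != NULL ? r->vfn(r->nargs, r->args) : r->single(r->arg);
}

/* Example bodies. */
struct pair { long a, b; };

static long add_pair(void *arg)
{
    struct pair *p = arg;
    return p->a + p->b;
}

static long sum_all(size_t nargs, const long *args)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

With the single-argument form the client packs its values into a structure such as `struct pair` and passes one pointer. With the varargs form, a call like `init_vargs(&r, sum_all, 3, 1L, 2L, 3L)` has to walk and copy the argument list, which is the extra initialization cost mentioned above.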

The cool beans part

● As noted earlier, Quickthreads does not perform any allocation: it relies on the client thread package to allocate stacks, threads, queues and any auxiliary data structures

● This means it also does not implement semaphores, monitors, etc.

● It performs the context switch and lets the client worry about clean-up and allocation in a queue

● A thread may be in various states: uninitialized; initialized and ready to run but not started; running on a processor; blocked and waiting to be awakened; or aborted, in which case it is dead and cannot be awakened

● Initialized threads are started the same way blocked threads are started: when the distinction is unimportant they are both considered runnable

● All thread manipulation is done using a thread's stack pointer

● A client creates a thread by allocating a stack region; whether the stack grows up or down depends on the machine

● The client then initializes the thread by passing in the address and size of the stack region and getting back the stack pointer of the uninitialized thread.

● The client initializes it by calling a QT initialization primitive, which initializes the stack with the functions and arguments to be used when the thread is started.

● As noted, an initialized thread and a suspended thread are then no different.

● The thread's stack pointer is passed to the context switch primitive along with a helper function and some arguments that assist in cleaning up the old thread once the switch is done.

● The helper function is a parameter to the context switch primitive and can therefore be changed dynamically

● Threads are usually kept in a run queue, but here we do something different: we allow arbitrary data structures, so runnable threads can be embedded in whatever data structure the client uses
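The lifecycle described above (client allocates the stack, initializes the thread, then starts it exactly as it would resume a blocked one) can be sketched as follows. As in the earlier sketches this uses POSIX `ucontext` as a stand-in and assumes a platform that provides `<ucontext.h>`. One deliberate difference: the handle here is a client-owned `struct thread` rather than a bare stack pointer. The names `thread_init`, `thread_resume` and `thread_block` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Client-owned thread record; the core allocates nothing itself. */
struct thread {
    ucontext_t ctx;
    void (*fn)(void *);
    void *arg;
};

static ucontext_t client_ctx;           /* where control returns when a thread blocks */
static struct thread *current;          /* thread that is running right now */

static void trampoline(void)
{
    current->fn(current->arg);
}

/* Initialize a thread on a stack region that the client allocated. */
static void thread_init(struct thread *t, void *stack, size_t size,
                        void (*fn)(void *), void *arg)
{
    getcontext(&t->ctx);
    t->ctx.uc_stack.ss_sp = stack;
    t->ctx.uc_stack.ss_size = size;
    t->ctx.uc_link = &client_ctx;       /* resume the client when fn returns */
    t->fn = fn;
    t->arg = arg;
    makecontext(&t->ctx, trampoline, 0);
}

/* One entry point for both cases: a freshly initialized thread and a
 * blocked thread are resumed in exactly the same way. */
static void thread_resume(struct thread *t)
{
    current = t;
    swapcontext(&client_ctx, &t->ctx);
}

static void thread_block(void)
{
    swapcontext(&current->ctx, &client_ctx);
}

static void count_twice(void *arg)
{
    int *counter = arg;
    (*counter)++;
    thread_block();
    (*counter)++;
}

int run_lifecycle_demo(void)
{
    int counter = 0;
    struct thread t;
    void *stack = malloc(STACK_SIZE);   /* allocation is the client's job */
    if (stack == NULL)
        return -1;

    thread_init(&t, stack, STACK_SIZE, count_twice, &counter);
    thread_resume(&t);                  /* starts the initialized thread */
    int after_first = counter;
    thread_resume(&t);                  /* resumes the blocked thread */

    free(stack);                        /* clean-up is the client's job too */
    return after_first * 10 + counter;
}
```

`run_lifecycle_demo()` encodes the counter after each resume as two digits, so a result of 12 means the thread ran once up to its block and once more to completion. The same `thread_resume` call served both times, which is the sense in which an initialized thread and a suspended thread are no different.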

Programming interface

● Quickthreads is written in C and assembly language

● It must be bundled with the executable

● Include path options are used to tell the compiler where to locate the header

● The basic interface consists of routines (functions or macros) to create, initialize, run and stop threads

Technology
