Multithreading patterns
Cristian Nicola, Development Manager
Net Evidence (SLM) Ltd, http://www.tonicola.com, [email protected]
1. Introduction to multithreading
2. Multithreading patterns
1. Introduction to multithreading
In this section…
• Why do multithreading?
• When and when not to use threads?
• Multithreading basic structures (critical sections, mutexes, events, semaphores and timers)
• Multithreading problems (atomic operations, race conditions, priority inversion, deadlocks, livelocks, boxcar / lock convoys / thundering herd)
Why multi-threading?
• Multi-core / multi-CPU machines are now standard
• Makes programming more fun
When to use threads?
• The work-tasks are clearly defined and long enough to justify the threading overhead
• The data needed to complete the work-tasks does not overlap (or overlaps only a little)
• Generally UI interaction is not needed – background tasks
When NOT to use threads?
• Work-tasks are not clearly defined
• There is a lot of shared data between the tasks
• UI interaction is a requirement
• Work-tasks are small
• You do not have a good reason to use threads
Multithreading structures
Jobs, processes, threads, fibers
[Diagram: a job groups processes 1…N; each process contains threads 1…M; a thread can host fibers 1…X]
What we need…a way to
i. …avoid simultaneous access to a common resource (mutexes, critical sections)
ii. …signal an occurrence or an action (events)
iii. …restrict/throttle the access to some shared resources (semaphores)
iv. …signal a due time – sometimes periodically (timers)
Critical sections
• User object - lightweight
• Their number is limited by memory
• Re-entrant
• Very fast when uncontended (tens of instructions)
• Falls back to a kernel object only when there is contention
• No time-out
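The re-entrancy of a critical section can be illustrated with Python's threading.RLock, which behaves the same way: the owning thread may acquire it again without deadlocking. A minimal sketch, with Python standing in for the native Win32 API:

```python
import threading

lock = threading.RLock()  # re-entrant, like a Win32 critical section

def inner():
    with lock:            # same thread re-acquires: no deadlock
        return "inner done"

def outer():
    with lock:            # first acquisition
        return inner()

print(outer())  # → inner done
```

With a plain non-re-entrant lock, the nested acquisition inside inner() would deadlock.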
Mutexes
• Kernel object
• Can be named for inter-process communication
• Can have security flags
• Can be inherited by child processes
• Can be acquired/released
Events
• Kernel object
• Can be named for inter-process communication
• Can have security flags
• Can be inherited by child processes
• Holds a state: signalled, non-signalled
• Can be auto-reset; PulseEvent also exists but should not be used (it can lose wake-ups)
• Auto-reset events are NOT re-entrant
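Python's threading.Event models a manual-reset event: it stays signalled until explicitly cleared, releasing every waiter. A minimal signalling sketch (an auto-reset event would instead atomically release one waiter and flip back to non-signalled):

```python
import threading

event = threading.Event()     # manual-reset: stays signalled until clear()
results = []

def waiter():
    event.wait()              # block until the event is signalled
    results.append("woken")

t = threading.Thread(target=waiter)
t.start()
event.set()                   # signal: releases all current and future waiters
t.join()
print(results)                # → ['woken']
```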
Semaphores
• Kernel object
• Can be named for inter-process communication
• Can have security flags
• Can be inherited by child processes
• Have a count, but it cannot be queried (it is only decremented on wait and incremented on release)
• Signalled when count > 0
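Throttling access to a shared resource with a semaphore can be sketched as follows (Python for illustration; MAX_CONCURRENT, peak and the guard lock are illustrative names, not part of any API):

```python
import threading

MAX_CONCURRENT = 2
sem = threading.Semaphore(MAX_CONCURRENT)   # count starts at 2
active = 0
peak = 0
guard = threading.Lock()                    # protects the counters

def worker():
    global active, peak
    with sem:                               # blocks while count == 0
        with guard:
            active += 1
            peak = max(peak, active)
        # ... use the throttled resource here ...
        with guard:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)   # never exceeds MAX_CONCURRENT
```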
Timers
• Kernel object
• Can be named for inter-process communication
• Can have security flags
• Can be inherited by child processes
• Can be auto-reset
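A timer signalling a due time can be approximated with Python's threading.Timer, which is one-shot (a periodic/auto-reset timer would re-arm itself; in this sketch that would mean re-creating the Timer from the callback):

```python
import threading

fired = threading.Event()

# one-shot timer: runs the callback on its own thread after the delay
timer = threading.Timer(0.1, fired.set)
timer.start()

fired.wait(timeout=5)     # wait for the due time
print(fired.is_set())     # → True
```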
Kernel-land / User-land
• Kernel transition – expensive
• User transition – fast
• Avoid kernel transitions when possible (system calls, use of kernel objects, unneeded thread creation or destruction)
Multithreading problems
Atomic operations
• A set of operations that must be executed as a whole, so they appear to the rest of the system to be a single operation
• There are only 2 possible outcomes: success, or failure (in which case the state is left as if nothing had happened)
For example, the code:

    I = J + 1;

can be compiled as:

    MOV EAX, [EBP-$10]   ; read J into a register
    ;                    <- possible task switch
    INC EAX              ; increment the register
    ;                    <- possible task switch
    MOV [EBP-$0C], EAX   ; write the register back to I

A task switch between any two of these instructions leaves the operation half-done.

Solution:

    Lock;
    I = J + 1;
    Unlock;
Race conditions
• A task switch can occur at any time
• Occur when 2 threads race to change the same data

Problem:
Unpredictable results
Example: 2 threads incrementing a variable by 1
Input: A = 1 (starting from 1, the expected result is 3)

Thread 1: read A=1 into a register; increment the register; write the register (=2) back to A
Thread 2: read A=1 into a register; increment the register; write the register (=2) back to A

Output: A = 2 – one increment has been lost
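The lost update above disappears once the read-modify-write sequence is protected by a lock; a runnable sketch (Python for illustration, ITERATIONS is an arbitrary choice):

```python
import threading

ITERATIONS = 100_000
counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(ITERATIONS):
        with lock:            # without this lock the read-modify-write
            counter += 1      # can interleave and lose updates

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 200000
```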
Priority inversion
• A thread with a higher priority waits for a resource used by a thread with a lower priority
Problem:
A high priority thread is executed less often than a lower priority thread
Example: 2 threads accessing the same file

Thread 1 (low priority): lock the file, write some data into it, do some more work with it, release the file
Thread 2 (high priority): wait for the file to become available, then use the file

Out of 3 context switches, the low-priority thread gets 2 and the high-priority thread only 1.
Deadlock
• 2 or more actions depend on each other for completion, and as a result none finishes
Problem:
One or more threads stop working for an indefinite amount of time
Deadlock conditions
1. Mutual exclusion: resources are locked exclusively
2. Hold and wait: resources are held while others are waited for
3. No pre-emption: a resource cannot be forcibly taken away from the thread holding it
4. A circular wait condition exists
Deadlock
Example: 2 threads accessing the same resources

Thread 1: lock resource A, then wait for resource B to become available
Thread 2: lock resource B, then wait for resource A to become available

Both threads are now stopped, with no way to wake up.
Livelock
• Like a deadlock, except that deadlock detection/prevention keeps waking the threads up – without any of them making progress

Problem:
One or more threads make no progress – they just spin

• Analogy: 2 people travelling in opposite directions in a corridor; each politely moves aside to make space for the other, and neither can pass as they keep stepping from side to side
Boxcar / Lock Convoys / Thundering herd
• Can carry a serious performance penalty, even though the application still works correctly
• A flag wakes up many threads, but only the first one to run has any work to do

Problem:
Threads wake up, wait on a resource, and then find there is no work to do
Example: 2 threads wake up to use the same resource

The flag is signalled.
Thread 1: sleeps waiting for the event; wakes, locks the data, uses it, unlocks it, goes back to sleep
Thread 2: sleeps waiting for the event; wakes, waits for the data lock, locks the data, finds nothing to do, unlocks it, goes back to sleep
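One common mitigation is to wake a single waiter instead of the whole herd; with a Python Condition, notify() wakes one thread while notify_all() would recreate the convoy. A sketch (the timeouts and names are illustrative):

```python
import threading
import queue
import time

work = queue.Queue()
cond = threading.Condition()
handled = []

def worker():
    with cond:
        cond.wait(timeout=1)       # sleep until the flag is signalled
    try:                           # only one waiter finds work; any other
        handled.append(work.get_nowait())  # would have woken for nothing
    except queue.Empty:
        pass

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
time.sleep(0.2)                    # let the workers reach wait()
work.put("item")
with cond:
    cond.notify()                  # wake ONE waiter, not notify_all()
for t in threads:
    t.join()
print(handled)                     # → ['item']
```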
2. Multithreading patterns
In this section…
• What is a design pattern?
• Groups of patterns (control-flow patterns, data patterns, resource patterns, exception/error patterns)
• Multithreading patterns sources
What is a design pattern?
• A design pattern is a reusable solution to a recurring problem in the context of object-oriented development
• Patterns can be about other topics too
Groups of patterns
• Control-flow: aspects related to control and flow dependencies between various threads (e.g. parallelism, choice, synchronization)
• Data perspective: passing of information, scoping of variables, etc.
• Resource perspective: resource-to-thread allocation, delegation, etc.
• Exception handling: the various causes of exceptions and the actions that need to be taken when they occur
Control-flow patterns
Worker threads
• Sometimes referred to as "Active Object", "Cyclic Executive" or "Concurrency Pattern"
• Generic threads doing some work without being aware of what kind of work they do
• They share a common work queue
• Very useful in highly parallel systems
Worker threads
• Windows Vista/Server has API support for creating thread pools (CreateThreadpool)
• Use a semaphore to limit the number of active threads to a figure related to the CPU count (usually 2 x CPUs)

Worker threads - variants
• Background Worker Pattern – notifies when the thread completes and provides updates on the status of the operation; may need to support cancelling the operation
• Asynchronous Results Pattern – you are more interested in the result than in the status of the operation
Worker threads - termination
• Implicit termination – the worker has finished its work and can end
• Explicit termination – the worker is asked to terminate
Scheduler
• Explicitly control when threads may execute single-threaded code (sequences waiting threads)
• Independent mechanism to implement a scheduling policy
• Read/Write lock is usually implemented using the scheduler pattern to ensure fairness in scheduling
• Adds significant overhead
Thread pool
• A number of threads are created to perform a number of tasks, usually organized in a queue
• There are many more tasks than threads
• When a thread completes its task:
  – If more tasks exist -> it requests the next task from the queue
  – If no more tasks exist -> it terminates, or sleeps
• Number of threads used is a parameter that can be tuned - can be dynamic based on the number of waiting tasks
Thread pool
• The thread creation/destruction policy impacts overall performance:
  – Create too many threads = resources and time are wasted
  – Destroy too many threads = time is spent re-creating them
  – Create threads too slowly = poor client performance
  – Destroy threads too slowly = starvation of resources
• Avoids repeated thread creation and destruction overhead
• Better performance and better system stability
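The queue-driven pool and its explicit termination can be sketched in Python (queue.Queue stands in for the work queue; the STOP sentinel is an illustrative convention, not a library feature):

```python
import threading
import queue

STOP = object()                 # sentinel: explicit termination request

def worker(tasks, results):
    while True:
        task = tasks.get()      # request the next task from the queue
        if task is STOP:
            break               # asked to terminate
        results.put(task * task)

tasks, results = queue.Queue(), queue.Queue()
pool = [threading.Thread(target=worker, args=(tasks, results))
        for _ in range(4)]      # far fewer threads than tasks
for t in pool:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in pool:
    tasks.put(STOP)             # one sentinel per worker
for t in pool:
    t.join()

total = sum(results.get() for _ in range(10))
print(total)  # → 285
```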
Thread pool - triggers
• Transient trigger
  – Offers the capability to signal currently running threads
  – Lost if not acted upon right away
• Persistent trigger
  – Generally results in pool actions
  – Persisted, and eventually handled
Message Queuing
• Asynchronous communication, implemented via queued messages
• Simple, without mutual exclusion problems
• No resource is shared by reference
• The shared information is passed by value
Interrupt
• Occurs when the event of interest occurs
• Executes very quickly and with little overhead
• Provides a means for timely response to urgent needs
• There are circumstances where interrupt use can lead to system failure
• Asynchronous procedure calls (APC)
Guarded Call
• Used when it may not be possible to wait for an asynchronous rendezvous
• Calling a method of an object in another thread can lead to mutual exclusion problems if the called object is currently busy doing something else
• The Guarded Call Pattern handles this case through the use of a mutual-exclusion semaphore
Rendezvous
• Concerned with modelling the preconditions for synchronization or rendezvous of threads
• Ready threads register with the Rendezvous class, then block until the Rendezvous class releases them to run
• Builds a collaboration structure that allows an arbitrary set of preconditions to be met for thread synchronization
• Independent of task phrasings, scheduling policies, and priorities
Data patterns
Thread-Specific Storage
• Also called thread-local storage (TLS)
• TLS is allocated per thread: any function in that thread reads the same value
• Similar to global storage, except that functions in other threads will not see the same value
• "Thread-specific storage" sometimes refers to the private virtual address space of a running task
Static Allocation
• Dynamic memory has two problems: non-deterministic timing of allocation/de-allocation, and memory fragmentation
• A simple approach that avoids both: disallow dynamic memory allocation altogether
• Only usable in simple systems with highly predictable and consistent loads
• All objects are allocated during system initialization (the system takes longer to initialize, but it operates well during execution)
Pool Allocation
• Involves creating pools of objects at start-up
• Does not address arbitrary dynamic-memory needs
• The pools are not necessarily initialized at start-up
• Objects from the pools are handed out upon request
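A pool allocator can be sketched as a free list of pre-allocated buffers handed out on request (BufferPool and its methods are illustrative names, not an existing API):

```python
import threading
from collections import deque

class BufferPool:
    """Objects are created once, up front; acquire/release replaces allocate/free."""
    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if not self._free:
                raise MemoryError("pool exhausted")
            return self._free.popleft()

    def release(self, buf):
        with self._lock:
            self._free.append(buf)

pool = BufferPool(count=4, size=1024)
buf = pool.acquire()
buf[0] = 0xFF            # use the buffer
pool.release(buf)
print(len(pool._free))   # → 4 (all buffers back in the pool)
```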
Fixed Sized Buffer
• Memory fragmentation occurs when memory is allocated in various sizes from the heap and the order of allocation is independent of the release order
• Used when we cannot tolerate dynamic-allocation problems such as fragmentation
• Gives fragmentation-free dynamic allocation, at the cost of less optimal memory usage
• Similar to dynamic allocation, but only fixed, pre-defined sizes can be allocated
Garbage Collection
• Solves memory leaks and dangling pointers
• Does not address memory fragmentation
• Takes the programmer out of the loop
• Adds run-time overhead
• Adds a loss of execution predictability
Garbage Compactor
• Removes memory fragmentation
• Maintains two memory segments in the heap
• Moves live objects from one segment to the other
• The free memory in one of the segments is always a contiguous block
Resource patterns
Locked structures
• Structures that use a locking mechanism
• Easy to implement, easy to debug
• Can deadlock
• Do not scale well
Lock-free structures
• They do not need to lock
• They need hardware support (e.g. compare-and-swap instructions)
• They can “burn” CPU
• Hard to implement and debug
Wait-free structures
• Same as lock-free structures, but with a guarantee that every operation finishes in a bounded number of steps
• All wait-free structures are lock-free
• Very difficult to implement
• Very few real life applications
Single writer / multi reader
• A special kind of lock that allows multiple readers concurrent access to the data but only a single writer (exclusive write access)
• Promoting from read to write is problematic (reader starvation, writer starvation) – the Scheduler pattern can enforce fairness
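A minimal single-writer/multi-reader lock can be built from two plain locks. A sketch only: this simple version favours readers and can starve writers, which is exactly the fairness problem the Scheduler pattern addresses (ReadWriteLock and its methods are illustrative names):

```python
import threading

class ReadWriteLock:
    """Many concurrent readers OR one exclusive writer; no fairness policy."""
    def __init__(self):
        self._readers = 0
        self._count_lock = threading.Lock()   # guards the reader count
        self._write_lock = threading.Lock()   # held by the writer, or on behalf of readers

    def acquire_read(self):
        with self._count_lock:
            self._readers += 1
            if self._readers == 1:
                self._write_lock.acquire()    # first reader blocks writers

    def release_read(self):
        with self._count_lock:
            self._readers -= 1
            if self._readers == 0:
                self._write_lock.release()    # last reader lets writers in

    def acquire_write(self):
        self._write_lock.acquire()            # exclusive access

    def release_write(self):
        self._write_lock.release()

rw = ReadWriteLock()
rw.acquire_read(); rw.acquire_read()          # two readers at once: fine
rw.release_read(); rw.release_read()
rw.acquire_write(); rw.release_write()        # writer gets exclusive access
print(rw._readers)  # → 0
```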
Double-checked locking
• Also known as the "Double-Checked Locking Optimization"
• Reduces the overhead of acquiring a lock
• Used to implement "lazy initialization" in a multi-threaded environment

    If check failed then
        Lock
        If check failed then
            Initialize
        Unlock
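Double-checked locking sketched in Python (safe here thanks to CPython's GIL; in languages with weaker memory models this pattern famously requires a memory barrier or volatile to be correct):

```python
import threading

_instance = None
_lock = threading.Lock()

def get_instance():
    global _instance
    if _instance is None:             # first check, no lock (fast path)
        with _lock:
            if _instance is None:     # second check, under the lock
                _instance = object()  # the expensive lazy initialization
    return _instance

print(get_instance() is get_instance())  # → True (only one instance ever made)
```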
Shared Memory
• A common memory area addressable by multiple processors
• Almost always involves a combined hardware/software solution
• If the shared data is read-only, concurrency-protection mechanisms may not be required
• Used when responses to messages and events are not desired or are too slow
Simultaneous Locking
• Avoids deadlocks
• Works in an all-or-none fashion: either all the needed resources are locked at once, or none are
• Prevents the condition of holding some resources while requesting others
• Allows higher-priority tasks to run if they don't need any of the locked resources
Ordered Locking
• Eliminates deadlocks
• Orders the resources and enforces a policy in which resources must be allocated in that order
• If the policy is enforced, no circular-wait condition can ever occur
• Resources are explicitly locked and released, so the potential for neglecting to unlock a resource exists
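Ordered locking can be sketched by always acquiring locks in one global order (here, sorted by object identity, an arbitrary but consistent choice; the helper names are illustrative):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def acquire_in_order(*locks):
    # every thread acquires in the same global order,
    # so a circular wait can never arise
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def use_both():
    held = acquire_in_order(lock_a, lock_b)
    try:
        pass  # ... touch both resources safely ...
    finally:
        for lock in reversed(held):
            lock.release()

t1 = threading.Thread(target=use_both)
t2 = threading.Thread(target=use_both)
t1.start(); t2.start()
t1.join(); t2.join()
print(lock_a.locked(), lock_b.locked())  # → False False
```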
Exception/error patterns
Exceptions/errors
• Work failure
• Deadline expiry
• Resource unavailability
• External trigger
• Constraint violation

Handling:
• Continue
• Remove the work item
• Remove all items

Recovery:
• No action
• Rollback
• Compensate
Balking
• Executes an action on an object only when the object is in a particular state
• An attempt to use the object outside its legal state results in an "Illegal State Exception"
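A balking object in Python (Connection and its methods are illustrative names; Python has no built-in IllegalStateException, so a RuntimeError stands in):

```python
import threading

class Connection:
    """Balking: requests made while the object is in the wrong state are refused."""
    def __init__(self):
        self._open = False
        self._lock = threading.Lock()

    def open(self):
        with self._lock:
            self._open = True

    def send(self, data):
        with self._lock:
            if not self._open:
                # balk: refuse immediately rather than wait for a legal state
                raise RuntimeError("illegal state: connection not open")
            return len(data)

conn = Connection()
try:
    conn.send(b"hello")           # balks: the connection is not open yet
except RuntimeError as exc:
    print(exc)                    # → illegal state: connection not open
conn.open()
print(conn.send(b"hello"))        # → 5
```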
Triple Modular Redundancy
• Used when there is no fail-safe state
• Based on an odd number of channels operating in parallel
• The computational results or resulting actuation signals are compared; if there is a disagreement, a two-out-of-three majority wins
• The deviating channel's computation is discarded
Watchdog
• Lightweight and inexpensive
• Minimal coverage
• Watches out over processing of another component
• Usually checks a computation time base, or ensures that computation steps are proceeding in a predefined order
Multithreading patterns sources
• http://www.workflowpatterns.com
• "Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems" by Bruce Powel Douglass
Questions ?
Big thank you!