Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Page 1: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Scheduling Memory Transactions

Parallel computing day, Ben-Gurion University, October 20, 2009

Page 2: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Synchronization alternatives: Transactional Memory

A (memory) transaction is a sequence of memory reads and writes executed by a single thread that either commits or aborts

If a transaction commits, all the reads and writes appear to have executed atomically

If a transaction aborts, none of its operations take effect

Transaction operations aren't visible until they commit (if they do)
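The commit/abort semantics above can be illustrated with a minimal sketch. It assumes a toy, single-threaded store with no conflict detection; `Store`, `Transaction`, and their methods are hypothetical names for illustration and do not come from any of the TM systems discussed in this talk.

```python
class Store:
    """Toy shared memory: a dict of named locations (illustrative only)."""
    def __init__(self, **cells):
        self.cells = dict(cells)


class Transaction:
    """Buffers writes privately so they stay invisible until commit."""
    def __init__(self, store):
        self.store = store
        self.writes = {}          # private write buffer
        self.status = "ACTIVE"

    def read(self, key):
        # A transaction sees its own pending writes first.
        return self.writes.get(key, self.store.cells[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Publish every buffered write together.
        self.store.cells.update(self.writes)
        self.status = "COMMITTED"

    def abort(self):
        # Discard the buffer: none of the operations take effect.
        self.writes.clear()
        self.status = "ABORTED"


store = Store(a=100, b=0)

t1 = Transaction(store)
t1.write("a", t1.read("a") - 30)
t1.write("b", t1.read("b") + 30)
visible_before_commit = dict(store.cells)   # the store still holds the old values
t1.commit()

t2 = Transaction(store)
t2.write("a", 0)
t2.abort()                                  # the store is left untouched
```

A real TM additionally has to detect conflicting accesses by concurrent transactions, which is where contention management and scheduling come in.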

Page 3: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Transactional Memory Implementations

Hardware Transactional Memory

o Transactional Memory [Herlihy & Moss, '93]

o Transactional Memory Coherence and Consistency [Hammond et al., '04]

o Unbounded transactional memory [Ananian, Asanovic, Kuszmaul, Leiserson, Lie, '05]…

Software Transactional Memory

o Software Transactional Memory [Shavit & Touitou, '97]

o DSTM [Herlihy, Luchangco, Moir, Scherer, '03]

o RSTM [Marathe et al., '06]

o WSTM [Harris & Fraser, '03], OSTM [Fraser, '04], ASTM [Marathe, Scherer, Scott, '05], SXM [Herlihy]…

Page 4: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

“Conventional” STM system high-level structure

[Diagram: OS-scheduler-controlled threads on top of a TM system made up of contention detection and a contention manager. Arrow labels: arbitrate; proceed; abort/retry, wait.]
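As a rough illustration of the arbitration step in such a conventional structure, the sketch below shows a contention manager that returns one of the three outcomes named on the slide. The timestamp policy (the older transaction wins, the younger waits a bounded number of times and then aborts itself) is an assumption chosen for brevity; `Decision`, `Tx`, `arbitrate`, and `max_waits` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"            # caller wins the conflict
    ABORT_RETRY = "abort/retry"    # caller aborts itself and retries
    WAIT = "wait"                  # caller backs off and re-checks later


@dataclass
class Tx:
    name: str
    start_time: int                # smaller value = older transaction
    waits: int = 0


def arbitrate(caller, other, max_waits=2):
    """Assumed timestamp-style policy: the older transaction wins."""
    if caller.start_time < other.start_time:
        return Decision.PROCEED
    if caller.waits < max_waits:
        caller.waits += 1
        return Decision.WAIT
    return Decision.ABORT_RETRY


old, young = Tx("old", start_time=1), Tx("young", start_time=2)
old_decision = arbitrate(old, young)
young_decisions = [arbitrate(young, old) for _ in range(3)]
```

Note that nothing here controls which thread runs next: after an abort the OS scheduler may immediately rerun the loser, which is the gap the rest of the talk addresses.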

Page 5: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Talk outline

Preliminaries

Memory Transactions Scheduling: Rationale

CAR-STM

Adaptive TM Schedulers

TM-scheduling OS support

Page 6: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

TM-ignorant schedulers are problematic!

TM-ignorant scheduling:

1) Does not permit serializing contention management and collision avoidance.

2) Makes it difficult to dynamically reduce the concurrency level.

3) Hurts TM performance stability/predictability.

Page 7: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Enter TM schedulers

“Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA'08]

“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]

“Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari et al., HiPEAC'09]

“Preventing versus curing: avoiding conflicts in transactional memories” [Dragojevic, Guerraoui, Singh & Singh, PODC'09]

“Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS'09]

“On the impact of Serializing Contention Management on STM performance” [Heber, Hendler & Suissa, OPODIS '09, to appear]

“Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP'10]

Page 8: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Our work

“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]

“On the impact of Serializing Contention Management on STM performance” [Heber, Hendler & Suissa, OPODIS '09]

“Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP'10]

Page 9: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CAR-STM (Collision Avoidance and Resolution for STM) Design Goals

Limit parallelism to a single transaction per core (or hardware thread)

Serialize conflicting transactions

Contention avoidance

Page 10: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CAR-STM high-level architecture

[Diagram: cores #1 to #k, each with its own transaction queue (#1 to #k) and TQ thread; a transaction thread with its T-Info; the dispatcher; the collision avoider; and the serializing contention manager.]

Page 11: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

TQ-Entry Structure

[Diagram: the CAR-STM architecture figure, with one transaction-queue entry expanded.]

A TQ entry holds:

o Wrapper method

o Transaction data

o T-Info

o Trans. thread

o Lock, condition var

Page 12: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Transaction dispatching process

Enqueue transaction in most-conflicting queue. Put thread to sleep, notify TQ thread.
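The queue-selection part of dispatching can be sketched as follows. The slide does not say how conflicts are estimated, so this sketch assumes each transaction's T-Info carries a declared data set and counts overlaps; `estimated_conflicts`, `dispatch`, and the dict layout are hypothetical. Putting the calling thread to sleep and notifying the TQ thread are omitted.

```python
from collections import deque


def estimated_conflicts(t_info, queue):
    """Hypothetical collision avoider: count queued transactions whose
    declared data set overlaps the new transaction's data set."""
    return sum(1 for queued in queue if queued["data"] & t_info["data"])


def dispatch(t_info, queues):
    """Enqueue in the most-conflicting queue; ties go to the shortest queue."""
    best = max(
        range(len(queues)),
        key=lambda i: (estimated_conflicts(t_info, queues[i]), -len(queues[i])),
    )
    queues[best].append(t_info)
    return best


queues = [deque(), deque()]
first = dispatch({"name": "T1", "data": {"x"}}, queues)        # both empty: queue 0
second = dispatch({"name": "T2", "data": {"y"}}, queues)       # no overlap: shorter queue 1
third = dispatch({"name": "T3", "data": {"y", "z"}}, queues)   # overlaps T2: queue 1
```

Placing likely-conflicting transactions in the same queue means they run one after another on one core, so they cannot collide with each other.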

Page 13: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Transaction execution

[Diagram: the TQ thread on core #i, transaction queue #i, and a TQ entry (wrapper method, transaction data, T-Info, trans. thread, lock and condition var).]

Page 14: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Serializing Contention Managers

When two transactions collide, fail the newer transaction and move it to the TQ of the older transaction

Fast elimination of live-lock scenarios

Two SCMs implemented

o Basic (BSCM) – move the failed transaction to the end of the other transaction's TQ

o Permanent (PSCM) – make the failed transaction a subordinate transaction of the other transaction
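The difference between the two policies can be sketched as below. This is a simplified model rather than the CAR-STM implementation: queues are plain deques, colliding transactions are assumed to still sit in their queues, and `Tx`, `pick_winner`, `resolve_bscm`, `resolve_pscm`, and `queue_of` are hypothetical names.

```python
from collections import deque


class Tx:
    def __init__(self, name, start_time):
        self.name = name
        self.start_time = start_time      # smaller value = older transaction
        self.subordinates = []            # used by PSCM only


def pick_winner(a, b):
    """The older transaction wins; the newer one fails."""
    return (a, b) if a.start_time < b.start_time else (b, a)


def resolve_bscm(a, b, queue_of):
    """Basic SCM: move the loser to the end of the winner's queue."""
    winner, loser = pick_winner(a, b)
    queue_of[loser.name].remove(loser)
    queue_of[winner.name].append(loser)
    queue_of[loser.name] = queue_of[winner.name]
    return winner


def resolve_pscm(a, b, queue_of):
    """Permanent SCM: the loser and its subordinates follow the winner."""
    winner, loser = pick_winner(a, b)
    queue_of.pop(loser.name).remove(loser)
    winner.subordinates += [loser] + loser.subordinates
    loser.subordinates = []
    return winner


# BSCM: Ta (newer) collides with Tb (older) and moves to the end of Tb's queue.
ta, tb = Tx("Ta", start_time=2), Tx("Tb", start_time=1)
q1, qk = deque([ta]), deque([tb])
resolve_bscm(ta, tb, {"Ta": q1, "Tb": qk})
bscm_queue_k = [t.name for t in qk]

# PSCM: Ta already has subordinate Tc; both become subordinates of Tb.
ta, tb, tc = Tx("Ta", 2), Tx("Tb", 1), Tx("Tc", 3)
ta.subordinates = [tc]
q1, qk = deque([ta]), deque([tb])
resolve_pscm(ta, tb, {"Ta": q1, "Tb": qk})
pscm_subordinates = [t.name for t in tb.subordinates]
```

Under BSCM the loser is only queued behind the winner once, whereas under PSCM the relationship persists, so the loser stays attached to the winner from then on.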

Page 15: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

PSCM

[Diagram: transaction queue #1 (TQ thread, core #1) holding Ta and transaction queue #k (TQ thread, core #k) holding Tb, with further transactions Tc, Td, and Te.]

Transactions a and b collide, b is older

Page 16: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

PSCM

[Diagram: transaction queues #1 and #k after the collision; Ta and Tc are now attached to Tb, alongside Td and Te.]

Losing transaction and its subordinates are made subordinates of winning transaction

Page 17: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Execution time: STMBench7 R/W dominated workloads

Page 18: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Throughput: STMBench7 R/W dominated workloads

Page 19: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CAR-STM Shortcomings

May restrict parallelism too much

o At most a single transactional thread per core/hardware-thread

o Transitive serialization

High overhead

Non-adaptive

Page 20: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Talk outline

Preliminaries

Memory Transactions Scheduling: Rationale

CAR-STM

Adaptive TM Scheduling

TM-scheduling OS support

Page 21: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

“On the impact of Serializing Contention Management on STM performance”

CBench – synthetic benchmark generating workloads with pre-determined length and abort probability.

A low-overhead serialization mechanism

Better understanding of adaptive serialization algorithms
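The idea behind a CBench-style workload (fixed transaction length, fixed abort probability) can be sketched as below. This is not the authors' benchmark: `run_synthetic_workload` and its parameters are hypothetical, the run is single-threaded, and "efficiency" is assumed here to mean commits divided by all attempts. `abort_prob` must be below 1 for the loop to terminate.

```python
import random


def run_synthetic_workload(num_tx, length, abort_prob, seed=0):
    """Run num_tx transactions; every attempt costs `length` steps and
    aborts with probability abort_prob, in which case it is retried."""
    rng = random.Random(seed)
    commits = aborts = work = 0
    for _ in range(num_tx):
        while True:
            work += length                  # aborted attempts are wasted work
            if rng.random() < abort_prob:
                aborts += 1                 # retry the same transaction
            else:
                commits += 1
                break
    return {
        "commits": commits,
        "aborts": aborts,
        "work": work,
        "efficiency": commits / (commits + aborts),   # assumed definition
    }


no_contention = run_synthetic_workload(num_tx=100, length=10, abort_prob=0.0)
contended = run_synthetic_workload(num_tx=100, length=10, abort_prob=0.5)
```

Fixing both knobs makes it possible to measure the overhead of a serialization mechanism at a known contention level, independently of any particular application.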

Page 22: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

A Low Overhead Serialization Mechanism (LO-SER)

[Diagram: transactional threads and their condition variables.]

Page 23: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

A Low Overhead Serialization Mechanism (cont'd)

1) t identifies a collision

2) t calls the contention manager: ABORT_OTHER

3) t changes the status of t' to ABORT (and records that t is the winner)

4) t' identifies that it was aborted

Page 24: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

A Low Overhead Serialization Mechanism (cont'd)

5) t' rolls back transaction and goes to sleep on the condition variable of t

6) Eventually t commits and broadcasts on its condition variable…

Page 25: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

A Low Overhead Serialization Mechanism (cont'd)

Page 26: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Requirements for serialization mechanism

Commit broadcasts only if transaction won a collision since last broadcast (or start of transaction)

No waiting cycles (deadlock-freedom)

Avoid race conditions
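The six steps and the first requirement can be sketched with Python's `threading.Condition`. This is an illustration under stated assumptions, not the LO-SER algorithm from the paper: `TxThread` and its fields are hypothetical, a per-thread commit counter is used so that a loser arriving after the winner has already committed does not sleep forever, and cycle avoidance (deadlock-freedom) is not handled.

```python
import threading


class TxThread:
    """Per-thread context; field names are illustrative."""
    def __init__(self, name):
        self.name = name
        self.cond = threading.Condition()
        self.commits = 0             # bumped on every commit, read by waiters
        self.won_collision = False   # set when this thread aborts another
        self.broadcasts = 0
        self.aborted_by = None       # (winner, winner's commit count) or None

    def abort_other(self, loser):
        """Steps 2-3: mark `loser` aborted and record who won."""
        with self.cond:
            self.won_collision = True
            loser.aborted_by = (self, self.commits)

    def wait_for_winner(self):
        """Steps 4-5: after rolling back, sleep until the winner commits."""
        winner, seen = self.aborted_by
        with winner.cond:
            while winner.commits == seen:
                winner.cond.wait()
        self.aborted_by = None

    def commit(self):
        """Step 6: broadcast only if a collision was won since the last one."""
        with self.cond:
            self.commits += 1
            if self.won_collision:
                self.won_collision = False
                self.broadcasts += 1
                self.cond.notify_all()


t, t_prime = TxThread("t"), TxThread("t'")
t.abort_other(t_prime)

log = []

def loser_body():
    t_prime.wait_for_winner()
    log.append("t' retries")

worker = threading.Thread(target=loser_body)
worker.start()
log.append("t commits")
t.commit()          # wakes t'
worker.join()
t.commit()          # no collision won since the last broadcast: no broadcast
```

Checking the counter under the winner's lock is what closes the race between "t' decides to sleep" and "t commits"; skipping the broadcast when no collision was won keeps the common, uncontended commit path cheap.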

Page 27: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

LO-SER algorithm: data structures

Page 28: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

LO-SER algorithm: pseudo-code

Page 29: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

LO-SER algorithm: pseudo-code (cont'd)

Page 30: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

LO-SER algorithm: pseudo-code (cont'd)

Page 31: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Adaptive algorithms

Collect (local or global) statistics on contention level.

Apply serialization only when contention is high. Otherwise, apply a “conventional” contention-management algorithm.

We find that stabilized adaptive algorithms perform better.

First adaptive TM scheduler: “Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA'08]
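One way to picture an adaptive, stabilized policy is sketched below. It assumes contention is tracked as an exponentially weighted average of abort (1) and commit (0) outcomes, and it interprets "stabilized" as switching modes only after the estimate has disagreed with the current mode for several consecutive observations. Both choices, and the names `AdaptiveSerializer`, `alpha`, `threshold`, and `stable_rounds`, are assumptions for illustration rather than the algorithms evaluated in the talk.

```python
class AdaptiveSerializer:
    def __init__(self, alpha=0.5, threshold=0.5, stable_rounds=3):
        self.alpha = alpha                  # weight given to history
        self.threshold = threshold          # serialize above this intensity
        self.stable_rounds = stable_rounds  # observations needed to switch
        self.intensity = 0.0
        self.serializing = False
        self._streak = 0                    # consecutive votes for switching

    def record(self, aborted):
        """Feed one transaction outcome; return whether to serialize now."""
        sample = 1.0 if aborted else 0.0
        self.intensity = self.alpha * self.intensity + (1 - self.alpha) * sample
        wants_serialize = self.intensity > self.threshold
        if wants_serialize != self.serializing:
            self._streak += 1
            if self._streak >= self.stable_rounds:
                self.serializing = wants_serialize
                self._streak = 0
        else:
            self._streak = 0
        return self.serializing


policy = AdaptiveSerializer()
during_aborts = [policy.record(aborted=True) for _ in range(5)]
during_commits = [policy.record(aborted=False) for _ in range(3)]
```

With these defaults the policy turns serialization on at the fourth consecutive abort and off again at the third consecutive commit; the delay avoids oscillating between modes on short bursts.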

Page 32: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CBench Evaluation

CAR-STM incurs high overhead compared with other algorithms

Always serializing is bad in medium contention

Always serializing is best in high contention

Always serializing incurs no overhead in the absence of contention

Page 33: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CBench Evaluation

Adaptive serialization fares well for all contention levels

Page 34: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CBench Evaluation

Conventional CM performance degrades for high contention

Page 35: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

CBench Evaluation (cont'd)

CAR-STM has best efficiency but worst throughput

Page 36: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

RandomGraph Evaluation

Stabilized algorithm improves throughput by up to 30%

Throughput and efficiency of conventional algorithms are bad

Page 37: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Talk outline

Preliminaries

Memory Transactions Scheduling: Rationale

CAR-STM

Adaptive TM Schedulers

TM-scheduling OS support

Page 38: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

“Scheduling Support for Transactional Memory Contention Management”

Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris)

o (Strict) serialization

o Soft serialization

o Time-slice extension

Different mechanisms for communication between user-level STM library and kernel scheduler

Page 39: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

TM Library / Kernel Communication via Shared Memory Segment (Ser-k)

User code notifies kernel on events such as: transaction start, commit and abort (in which case thread yields)

Kernel code handles moving thread between ready and blocked queues

Page 40: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Soft Serialization

Instead of blocking, reduce loser thread priority and yield

Efficient in scenarios where loser transactions may take a different execution path when retrying (non-determinism)

Priority should be restored upon commit or when conflicting transactions terminate
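Soft serialization can be pictured with a small user-level simulation. It models priorities as plain numbers and does not call any kernel interface; `SoftSerializer`, `on_conflict`, `on_commit`, and `pick_next` are hypothetical names, and the sketch only restores priorities on commit, whereas a fuller version would also restore losers when the winner aborts.

```python
class SoftSerializer:
    """Tracks simulated priorities; a higher number is scheduled first."""
    def __init__(self, threads, normal=10, lowered=1):
        self.normal, self.lowered = normal, lowered
        self.priority = {t: normal for t in threads}
        self.demoted_by = {}                 # loser -> winner it conflicted with

    def on_conflict(self, winner, loser):
        # Instead of blocking the loser, demote it; it stays runnable.
        self.priority[loser] = self.lowered
        self.demoted_by[loser] = winner

    def on_commit(self, thread):
        # Restore the committing thread and every loser it had demoted.
        self.priority[thread] = self.normal
        self.demoted_by.pop(thread, None)
        for loser, winner in list(self.demoted_by.items()):
            if winner == thread:
                self.priority[loser] = self.normal
                del self.demoted_by[loser]

    def pick_next(self, runnable):
        return max(runnable, key=lambda t: self.priority[t])


sched = SoftSerializer(["A", "B", "C"])
sched.on_conflict(winner="A", loser="B")
next_with_competition = sched.pick_next(["B", "C"])   # C outranks demoted B
next_alone = sched.pick_next(["B"])                   # B may still run
sched.on_commit("A")
restored = sched.priority["B"]
```

Because the loser remains runnable, it can use an otherwise idle core and may succeed if its retry takes a different execution path, which strict serialization would rule out.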

Page 41: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Time-slice extension

Preemption in the midst of a transaction increases conflict “window of vulnerability”

Defer preemption of transactional threads

o Avoid CPU monopolization by bounding the number of extensions and yielding after commit

May be combined with serialization/soft serialization
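The bounded-extension rule can be sketched as a small decision helper. It is a user-level model of the policy rather than kernel code; `TimeSliceExtender`, `max_extensions`, and the two callbacks are hypothetical names.

```python
class TimeSliceExtender:
    def __init__(self, max_extensions=2):
        self.max_extensions = max_extensions
        self.extensions_used = 0

    def on_slice_expired(self, in_transaction):
        """Return True to preempt now, False to grant one more slice."""
        if in_transaction and self.extensions_used < self.max_extensions:
            self.extensions_used += 1
            return False
        self.extensions_used = 0
        return True

    def on_commit(self):
        """Return True if the thread should yield right after committing."""
        must_yield = self.extensions_used > 0
        self.extensions_used = 0
        return must_yield


ext = TimeSliceExtender(max_extensions=2)
long_transaction = [ext.on_slice_expired(in_transaction=True) for _ in range(3)]

granted = ext.on_slice_expired(in_transaction=True)   # one extension granted
yield_after_commit = ext.on_commit()                  # pay it back by yielding
yield_without_extension = ext.on_commit()
preempt_outside_tx = ext.on_slice_expired(in_transaction=False)
```

The bound keeps a long or looping transaction from monopolizing the CPU, and yielding after commit returns the borrowed time to other threads.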

Page 42: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Evaluation (STMBench7, 16 core machine)

Conventional CM deteriorates when threads > cores

Serializing by local spinning is efficient as long as threads ≤ cores

Page 43: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Evaluation - STMBench7 throughput

Serializing by sleeping on condition var is best when threads > cores, since system call overhead is negligible (long transactions)

Page 44: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Evaluation - STMBench7 aborts data

Page 45: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Conclusions

Scheduling-based CM results in

o Improved throughput in high contention

o Improved efficiency in all contention levels

LO-SER-based serialization incurs no visible overhead

Lightweight kernel support can improve performance and efficiency

Dynamically selecting best CM algorithm for workload at hand is a challenging research direction

Page 46: Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Thank you. Any questions?
