Benjamin Engel Transactional Memory 1 / 28
Transactional Memory
or
How to do multiple things at once
Transactional Memory: Architectural Support for Lock-Free Data Structures
M. Herlihy and J. E. B. Moss (ISCA'93)
And
Transactional Memory Architecture and Implementation for IBM System z
C. Jacobi, T. Slegel, and D. Greiner (MICRO'12)
Observations
● Single-thread performance stalls
● SMP everywhere
– Speedup by efficient parallelism
– Coarse locks (contention on the lock)
– Fine-grained locking (hard, lock order, overhead)
– Lock-free data structures (very hard, complexity)
● Amdahl's Law: Max. speedup limited by sequential (non-parallelizable) code
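As a worked illustration of the Amdahl's Law bullet: if a fraction p of the runtime parallelizes perfectly over n cores, the speedup is 1 / ((1 - p) + p / n), and it can never exceed 1 / (1 - p) no matter how many cores are added. A minimal sketch; the function names are illustrative:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when a fraction p of the runtime is parallelizable."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    serial = 1.0 - p                 # the part that stays sequential
    return 1.0 / (serial + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as the number of cores grows without bound."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


# Even with 95 % parallel code, the sequential 5 % caps the speedup.
for cores in (1, 6, 36, 144):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
print("limit", round(amdahl_limit(0.95), 2))
```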
Locks
● Priority inversion (lock holder preemption)
– A low-priority lock holder gets preempted by a higher-priority process, which then requests the lock and has to wait for the low-priority holder
● Convoying (lock holder descheduled)
– The descheduled lock holder cannot run, while the other lock requesters, which could run, have to wait for it
● Deadlocks
– Avoidance can be hard, especially if the involved objects and their dependencies are unknown
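A common way to avoid deadlock in the classic bank-transfer case is to acquire locks in one fixed global order. A minimal sketch using Python's threading module; the Account class and its integer ids are illustrative assumptions:

```python
import threading


class Account:
    def __init__(self, acct_id: int, balance: int) -> None:
        self.acct_id = acct_id          # unique id that defines the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    if src is dst:
        return                          # a non-reentrant Lock cannot be taken twice
    # Always lock the account with the smaller id first. Opposite transfers
    # (A->B and B->A) then request the locks in the same order, so neither
    # thread can hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.acct_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def worker(src: Account, dst: Account, n: int) -> None:
    for _ in range(n):
        transfer(src, dst, 1)


a, b = Account(1, 1000), Account(2, 1000)
threads = [
    threading.Thread(target=worker, args=(a, b, 10_000)),
    threading.Thread(target=worker, args=(b, a, 10_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)
```

This only works when every lock a thread will need is known up front, which is exactly what becomes hard when the objects and their dependencies are unknown.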
Lock-free data structures
● Singly linked list ✔
● RCU ✔
● Doubly linked list or bank account transfer ✘
● CAS (compare-and-swap) updates a single word → DCAS (two words), m-CAS (m words)
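A single-word CAS is enough to push onto a singly linked list (a Treiber-style stack): read the head, link a new node to it, and CAS the head from the old node to the new one, retrying on failure. Python exposes no user-level CAS, so this sketch models the hardware instruction with an internal lock; AtomicRef is a hypothetical stand-in, not a library API:

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


class AtomicRef:
    """Models one machine word that supports an atomic compare-and-swap."""

    def __init__(self, value: Any = None) -> None:
        self._value = value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def get(self) -> Any:
        return self._value

    def compare_and_swap(self, expected: Any, new: Any) -> bool:
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


@dataclass
class Node:
    value: int
    next: Optional["Node"]


def push(head: AtomicRef, value: int) -> None:
    while True:
        old = head.get()
        node = Node(value, old)
        if head.compare_and_swap(old, node):   # only one word changes: the head
            return                             # otherwise another push won; retry


head = AtomicRef(None)


def pusher(base: int) -> None:
    for k in range(1000):
        push(head, base * 1000 + k)


threads = [threading.Thread(target=pusher, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

values = []
node = head.get()
while node is not None:
    values.append(node.value)
    node = node.next
print(len(values))
```

Inserting into a doubly linked list must update prev.next and next.prev as one atomic step; a single-word CAS cannot do that, hence DCAS or m-CAS.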
Transactions
Two concurrent transactions:
TX 1: TX begin → TX load [mem1] → TX store [mem2] → TX end
TX 2: TX begin → TX store [mem1] → TX end
Transactions
● Finite set of machine instructions, executed by one process
● Serializable, instructions of multiple transactions never appear interleaved
● Atomic, multiple memory accesses (reads and writes) either all commit (become visible at the same time), or abort (writes get discarded)
Instructions for Memory Access
● Load-transactional (LT) reads a value from shared memory into a private register
● Load-transactional-exclusive (LTX) is like LT, but signals the intention to write that location later
● Store-transactional (ST) writes a value to shared memory; the write becomes visible to other processors only at commit
Instructions for Management
● Commit ends the transaction and tries to make its writes permanent; it either succeeds or fails
● Abort discards the writes and explicitly ends the transaction prematurely
● Validate returns true or false, indicating whether the ongoing transaction has not aborted yet. A failed validate discards the write set immediately
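To make LT, LTX, ST, COMMIT, ABORT and VALIDATE concrete, here is a single-threaded software model. It is a sketch under assumptions: the Memory and Transaction classes and the per-address version numbers are modeling choices, not how the 1993 hardware works (it detects conflicts through the cache coherence protocol). The demo follows the counting-benchmark pattern: read the counter, store the increment, retry if commit fails.

```python
class Memory:
    def __init__(self):
        self.values = {}     # address -> committed value
        self.versions = {}   # address -> number of committed writes

    def version(self, addr):
        return self.versions.get(addr, 0)


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}    # address -> version seen at first access
        self.write_set = {}   # address -> tentative value, private until commit
        self.aborted = False

    def _track(self, addr):
        self.read_set.setdefault(addr, self.mem.version(addr))

    def lt(self, addr):
        """Load-transactional; sees this transaction's own tentative writes."""
        self._track(addr)
        if addr in self.write_set:
            return self.write_set[addr]
        return self.mem.values.get(addr, 0)

    def ltx(self, addr):
        """Same as LT in this model; in hardware it fetches the line exclusively."""
        return self.lt(addr)

    def st(self, addr, value):
        """Store-transactional: buffered, invisible to others until commit."""
        self._track(addr)
        self.write_set[addr] = value

    def validate(self):
        ok = not self.aborted and all(
            self.mem.version(a) == v for a, v in self.read_set.items())
        if not ok:
            self.abort()          # a failed validate discards the write set
        return ok

    def abort(self):
        self.aborted = True
        self.write_set.clear()

    def commit(self):
        if not self.validate():
            return False
        for addr, value in self.write_set.items():   # all writes appear together
            self.mem.values[addr] = value
            self.mem.versions[addr] = self.mem.version(addr) + 1
        return True


mem = Memory()
t1, t2 = Transaction(mem), Transaction(mem)
t1.st(0, t1.lt(0) + 1)      # both transactions read counter 0 ...
t2.st(0, t2.lt(0) + 1)
first = t1.commit()         # ... t1 commits first
second = t2.commit()        # t2 read a stale version, so its commit fails


def increment(mem, addr):
    while True:             # retry until the commit succeeds
        tx = Transaction(mem)
        tx.st(addr, tx.ltx(addr) + 1)
        if tx.commit():
            return


increment(mem, 0)           # t2's work, redone in a fresh transaction
print(first, second, mem.values[0])
```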
MESI Protocol
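As a reference for the MESI protocol, a sketch of the standard transitions for one cache line as seen by one cache: local processor reads and writes, plus snooped bus reads and read-for-ownership requests from other caches. The event names are illustrative, and others_have_copy models the "shared" signal on a read miss:

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Events: 'read', 'write' (local CPU); 'bus_read', 'bus_rdx' (other CPU)."""
    if event == "read":
        if state is State.I:
            return State.S if others_have_copy else State.E
        return state                    # hit in M, E or S: no change
    if event == "write":
        # From S or I the other copies are invalidated first; E -> M is silent.
        return State.M
    if event == "bus_read":
        # Another cache reads the line: M writes it back, M and E downgrade to S.
        return State.I if state is State.I else State.S
    if event == "bus_rdx":
        return State.I                  # another cache wants to write: invalidate
    raise ValueError(f"unknown event {event!r}")
```

HTM designs piggyback on these messages: a snooped read-for-ownership for a line the transaction has read, or any snooped request for a line it has written, signals a conflict.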
Basics
● Data versioning to undo speculative writes
● Buffering writes vs. undo log
● Transaction start: implicit (begins with the first TX load or TX store) vs. explicit (TX begin and TX commit)
● Eager vs. lazy conflict detection
● Granularity of conflict detection: individual reads/writes, objects, cache lines
● Resolution: when to abort, and whom
● HTM vs. STM
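The two data-versioning options differ in where speculative values live. A sketch with hypothetical class names, using a dict as memory: a write buffer keeps new values aside and publishes them at commit (abort is cheap), while an undo log writes in place and restores old values on abort (commit is cheap, but uncommitted values sit in memory, so conflicts must be detected eagerly).

```python
_MISSING = object()   # marks addresses that did not exist before the write


class WriteBufferTx:
    """Lazy versioning: new values stay private until commit."""

    def __init__(self, mem: dict) -> None:
        self.mem, self.buffer = mem, {}

    def read(self, addr):
        return self.buffer.get(addr, self.mem.get(addr, 0))

    def write(self, addr, value) -> None:
        self.buffer[addr] = value

    def commit(self) -> None:
        self.mem.update(self.buffer)   # publish all writes at once
        self.buffer.clear()

    def abort(self) -> None:
        self.buffer.clear()            # nothing in memory to undo


class UndoLogTx:
    """Eager versioning: write in place, log the old values."""

    def __init__(self, mem: dict) -> None:
        self.mem, self.log = mem, []

    def read(self, addr):
        return self.mem.get(addr, 0)

    def write(self, addr, value) -> None:
        self.log.append((addr, self.mem.get(addr, _MISSING)))
        self.mem[addr] = value

    def commit(self) -> None:
        self.log.clear()               # new values are already in memory

    def abort(self) -> None:
        for addr, old in reversed(self.log):   # undo the newest write first
            if old is _MISSING:
                del self.mem[addr]
            else:
                self.mem[addr] = old
        self.log.clear()


for cls in (WriteBufferTx, UndoLogTx):
    mem = {"x": 1}
    tx = cls(mem)
    tx.write("x", 2)
    tx.write("y", 3)
    tx.abort()
    print(cls.__name__, mem)
```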
Hardware Transactional Memory
● TCC (Transactional Memory Coherence and Consistency model)
– Buffers its write set locally
– On commit, the bus arbiter decides which processor may broadcast its buffered writes; the other processors snoop them and abort on a conflict → lazy conflict detection
● LogTM
– Observation: commits usually succeed and aborts are rare → optimize the common case
– Writes go directly to memory; old values are kept in an undo log
– Eager conflict detection
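The difference between TCC's lazy and LogTM's eager conflict detection is when an overlap is noticed. This sketch (hypothetical names, single-threaded, accesses replayed in a fixed order) reports the step at which each policy detects the first conflict: eager checks every access against the other active transactions, lazy checks only when a committer broadcasts its write set. Aborting the requester under eager detection is one possible resolution; real designs may stall it instead.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def overlaps(a: Tx, b: Tx) -> bool:
    """Read-write, write-read or write-write overlap (matters for in-place writes)."""
    return bool(a.writes & (b.reads | b.writes) or a.reads & b.writes)


def replay(trace, policy):
    """trace: (tx_name, op, addr) tuples, op in {'r', 'w', 'commit'}.
    Returns (step, aborted_tx) for the first detected conflict, else None."""
    active = {}
    for step, (name, op, addr) in enumerate(trace):
        tx = active.setdefault(name, Tx())
        if op == "r":
            tx.reads.add(addr)
        elif op == "w":
            tx.writes.add(addr)
        if policy == "eager" and op != "commit":
            # checked on every access; here the requester aborts
            for other in active.values():
                if other is not tx and overlaps(tx, other):
                    return step, name
        if op == "commit":
            if policy == "lazy":
                # the committer broadcasts its writes; transactions that read them abort
                for other_name, other in active.items():
                    if other is not tx and tx.writes & other.reads:
                        return step, other_name
            del active[name]
    return None


trace = [
    ("A", "r", "x"),
    ("B", "w", "x"),
    ("B", "w", "y"),
    ("A", "w", "z"),
    ("B", "commit", None),
    ("A", "commit", None),
]
print("eager:", replay(trace, "eager"))
print("lazy: ", replay(trace, "lazy"))
```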
Software Transactional Memory
● Per-thread view of the heap
● Conflict detection and resolution in software
● Memory organization
– Transactional and ordinary data kept separate vs. mixed → different object format, e.g. TX metadata kept in the object header
– Register all reads and writes to TX data
– Shadow copies of modified data, discarded on abort
– In-place updates with undo log
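A minimal object-based STM along the lines of these bullets: each object carries TX metadata (a version number) in its header, every transactional read is registered with the version seen, writes go to shadow copies, and commit validates the read set before installing the shadows. TObject, STMTx and atomically are hypothetical names; a single global commit lock keeps the sketch short, where real STMs typically use finer-grained locking.

```python
import copy
import threading

_commit_lock = threading.Lock()


class TObject:
    def __init__(self, data):
        self.version = 0      # header metadata used for conflict detection
        self.data = data      # committed data is replaced, never mutated


class ConflictError(Exception):
    pass


class STMTx:
    def __init__(self):
        self.read_versions = {}   # TObject -> version seen at first access
        self.shadows = {}         # TObject -> private copy being modified

    def read(self, obj):
        if obj in self.shadows:
            return self.shadows[obj]
        self.read_versions.setdefault(obj, obj.version)
        return obj.data

    def open_for_write(self, obj):
        if obj not in self.shadows:
            self.read_versions.setdefault(obj, obj.version)
            self.shadows[obj] = copy.deepcopy(obj.data)
        return self.shadows[obj]

    def commit(self):
        with _commit_lock:
            for obj, seen in self.read_versions.items():
                if obj.version != seen:
                    raise ConflictError("object changed since it was read")
            for obj, shadow in self.shadows.items():
                obj.data = shadow          # install the shadow copy
                obj.version += 1


def atomically(fn, retries=100):
    for _ in range(retries):
        tx = STMTx()
        result = fn(tx)
        try:
            tx.commit()
            return result
        except ConflictError:
            continue                       # shadows are simply dropped
    raise RuntimeError("too many conflicts")


a, b = TObject({"balance": 100}), TObject({"balance": 0})


def move_30(tx):
    src, dst = tx.open_for_write(a), tx.open_for_write(b)
    src["balance"] -= 30
    dst["balance"] += 30


atomically(move_30)
print(a.data, b.data)
```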
Evaluation (Simulated)
● Counting benchmark (increment a shared counter)
● Producer/Consumer
● Double-Linked List
[Figures (2 slides): simulated benchmark results, snoop-based vs. directory-based architecture]
… break
This was 1993; next comes an IBM paper (2012) reporting on the then-latest System z HTM support
IBM Blue Gene/Q and z Series
● TBEGIN and TEND
● TBEGIN with a register mask specifying which registers to restore on abort
● Aborts jump to the instruction after TBEGIN and set the condition code (CC)
● Retry with backoff: the PPA (Perform Processor Assist) instruction delays for an unspecified amount of time that is "optimal" for the given architecture (microcode assisted)
● Nested transactions are flattened (TBEGIN and TEND count the nesting depth, also in microcode)
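Flattening can be modeled with a depth counter: only the outermost TBEGIN starts a transaction, only the matching outermost TEND commits, and an abort at any depth discards everything. A sketch with hypothetical names; on System z this counting is done in microcode:

```python
class FlatNestingTx:
    def __init__(self, mem: dict) -> None:
        self.mem = mem
        self.depth = 0
        self.buffer = {}

    def tbegin(self) -> None:
        self.depth += 1                # inner TBEGINs only bump the counter

    def store(self, addr, value) -> None:
        if self.depth == 0:
            raise RuntimeError("store outside a transaction")
        self.buffer[addr] = value

    def tend(self) -> bool:
        """Returns True if this TEND committed (it was the outermost one)."""
        if self.depth == 0:
            raise RuntimeError("TEND without TBEGIN")
        self.depth -= 1
        if self.depth == 0:
            self.mem.update(self.buffer)   # one commit for the whole nest
            self.buffer.clear()
            return True
        return False

    def abort(self) -> None:
        self.depth = 0                 # an abort unwinds all levels at once
        self.buffer.clear()


mem = {}
tx = FlatNestingTx(mem)
tx.tbegin()
tx.store("a", 1)
tx.tbegin()                            # nested: no new transaction
tx.store("b", 2)
inner = tx.tend()                      # inner TEND: nothing visible yet
visible_after_inner = dict(mem)
outer = tx.tend()                      # outermost TEND commits both stores
print(inner, visible_after_inner, outer, mem)
```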
System Background
● 6 cores per chip x 6 chips per module x 4 modules = 144 coherent SMP cores
● 96 KB L1 and 1 MB L2: private, write-through → never dirty
● 48 MB L3 and 384 MB off-chip L4: shared, write-back
● All 4 levels are inclusive
● Transactional tracking per cache line: a tx-read and a tx-dirty bit
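A sketch of how per-line tx-read and tx-dirty bits turn into conflict decisions when another core requests a line: an exclusive (for-write) request conflicts with any line the transaction has read or written, a shared (read-only) request only with lines it has written. The names are hypothetical and the real logic lives in the cache hardware; the 256-byte line size is an assumption based on System z's line size.

```python
from dataclasses import dataclass


@dataclass
class LineTxState:
    tx_read: bool = False
    tx_dirty: bool = False


class TxFootprint:
    LINE = 256                         # assumed cache-line size in bytes

    def __init__(self) -> None:
        self.lines = {}                # line number -> LineTxState

    def _line(self, addr: int) -> LineTxState:
        return self.lines.setdefault(addr // self.LINE, LineTxState())

    def tx_load(self, addr: int) -> None:
        self._line(addr).tx_read = True

    def tx_store(self, addr: int) -> None:
        self._line(addr).tx_dirty = True

    def remote_request_conflicts(self, addr: int, exclusive: bool) -> bool:
        state = self.lines.get(addr // self.LINE)
        if state is None:
            return False               # line not in the transaction's footprint
        return state.tx_dirty or (exclusive and state.tx_read)

    def end(self) -> None:
        self.lines.clear()             # commit or abort resets all tx bits


fp = TxFootprint()
fp.tx_load(0x1000)
fp.tx_store(0x2000)
print(fp.remote_request_conflicts(0x1010, exclusive=False),
      fp.remote_request_conflicts(0x1010, exclusive=True))
```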
Interrupt Filtering
● Some exceptions/interrupts can be filtered: instead of trapping into the OS, they abort the ongoing transaction
● Memory
– Page faults: no need to check for null pointers; the transaction simply aborts if it encounters one
● Arithmetic
– No check for division by zero or NaN; again, the transaction aborts in this unlikely case
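Filtering can be mimicked in software: a fault inside the transactional region is not propagated to a handler but turns into an abort, and the buffered writes are dropped. A sketch with hypothetical names, where Python exceptions stand in for program interrupts (KeyError for an access exception, ZeroDivisionError for a divide exception); catching them this broadly is a modeling shortcut:

```python
def run_filtered(mem: dict, body) -> bool:
    """Run body(read, write) transactionally; True = committed, False = aborted."""
    buffer = {}

    def read(addr):
        # a missing address raises KeyError, standing in for an access exception
        return buffer[addr] if addr in buffer else mem[addr]

    def write(addr, value):
        buffer[addr] = value

    try:
        body(read, write)
    except (KeyError, ZeroDivisionError):
        return False                  # filtered: abort instead of trapping
    mem.update(buffer)                # commit
    return True


mem = {"total": 10, "count": 0, "avg": None}


def compute_avg(read, write):
    write("avg", read("total") // read("count"))   # no explicit zero check


first = run_filtered(mem, compute_avg)    # divides by zero: aborts, avg untouched
mem["count"] = 2
second = run_filtered(mem, compute_avg)
print(first, second, mem["avg"])
```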
Testability and Debugging
● Abort path is rarely taken
– Added random abort mode: the CPU randomly aborts some or all transactions before they commit
● Breakpoints (exceptions)
– Abort the transaction → cannot debug within a transaction
– NTSTG (non-transactional store) is not rolled back on abort and can be used to pass data out of a transaction
● Transactional Diagnostic Block
– Buffer for debugging the abort reason and internal state of the aborted transaction (instruction pointer, ...)
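The random-abort mode and non-transactional stores can be imitated in a test harness: a seeded random source aborts a configurable fraction of transactions just before commit, so the abort path actually runs, while "NTSTG" entries go to a log that survives the abort. A sketch; DebugTx, ntstg and abort_probability are hypothetical names:

```python
import random


class DebugTx:
    def __init__(self, mem: dict, debug_log: list,
                 abort_probability: float, rng: random.Random) -> None:
        self.mem, self.debug_log = mem, debug_log
        self.p, self.rng = abort_probability, rng
        self.buffer = {}

    def store(self, addr, value) -> None:
        self.buffer[addr] = value        # transactional: rolled back on abort

    def ntstg(self, entry) -> None:
        self.debug_log.append(entry)     # non-transactional: kept on abort

    def end(self) -> bool:
        if self.rng.random() < self.p:   # injected abort right before commit
            self.buffer.clear()
            return False
        self.mem.update(self.buffer)
        return True


def increment_with_retries(mem, log, p, rng) -> int:
    attempts = 0
    while True:
        attempts += 1
        tx = DebugTx(mem, log, p, rng)
        tx.store("x", mem.get("x", 0) + 1)
        tx.ntstg(f"attempt {attempts}")
        if tx.end():
            return attempts


mem, log = {}, []
n = increment_with_retries(mem, log, 0.5, random.Random(7))
print(n, mem, log)
```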
Example Transaction
Constrained Transactions
● No progress guarantee with TBEGIN
● Many transactions are short/small → TBEGINC
● CPU gives a progress guarantee, but no strict upper limit on retries
● Max. 32 instructions in max. 256 consecutive bytes of text
● Only relative forward jumps
● Max. 32 bytes (4 x 8 byte) data
● No complex instructions (like floating point)
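The static limits on a TBEGINC body (instruction count, text span, forward-only relative branches) are simple enough to check mechanically. This sketch uses a hypothetical instruction representation (byte offset, length, optional relative branch displacement); it is not an assembler API and leaves out the data-footprint and instruction-class rules.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_INSTRUCTIONS = 32
MAX_TEXT_BYTES = 256


@dataclass
class Insn:
    offset: int                          # byte offset from the start of the body
    length: int                          # instruction length in bytes (2, 4 or 6 on z)
    branch_disp: Optional[int] = None    # relative branch displacement, if a branch


def check_constrained(body: List[Insn]) -> List[str]:
    """Return a list of violated constraints (empty if the body is acceptable)."""
    problems = []
    if len(body) > MAX_INSTRUCTIONS:
        problems.append(f"{len(body)} instructions > {MAX_INSTRUCTIONS}")
    if body:
        span = max(i.offset + i.length for i in body) - min(i.offset for i in body)
        if span > MAX_TEXT_BYTES:
            problems.append(f"text spans {span} bytes > {MAX_TEXT_BYTES}")
    for i in body:
        if i.branch_disp is not None and i.branch_disp <= 0:
            problems.append(f"backward branch at offset {i.offset}")
    return problems


loop = [Insn(0, 4), Insn(4, 4, branch_disp=-4)]
print(check_constrained(loop))
```

Forbidding backward branches rules out loops, so the body is bounded in length, which helps make the progress guarantee possible.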
TBEGINC
● On abort, execution jumps directly back to TBEGINC (instead of to the instruction after it, as with TBEGIN) → no abort path needed
● Microcode
– counts retries
– resets the count on a successful TEND
– increases the random delay
– reduces the amount of speculative execution
– last resort: broadcasts to all CPUs to synchronize, thereby stopping conflicting accesses
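The escalation above can be written as a retry policy. A sketch under assumptions: the threshold, the delay formula, and the global lock standing in for the broadcast that stops other CPUs are illustrative choices, not IBM's microcode, and the reduction of speculative execution is not modeled.

```python
import random
import threading
import time

_serialize = threading.Lock()   # stand-in for "broadcast to all CPUs to sync"


def run_constrained(attempt, max_speculative=4, base_delay=1e-6, rng=None):
    """attempt(serialized) -> True if the transaction committed.
    Returns the number of tries; always finishes, like TBEGINC's guarantee."""
    rng = rng or random.Random()
    retries = 0                  # each call starts at 0, like the reset on a successful TEND
    while retries < max_speculative:
        if attempt(False):
            return retries + 1
        retries += 1
        # random delay whose upper bound grows with every retry
        time.sleep(rng.uniform(0, base_delay * 2 ** retries))
    with _serialize:
        # last resort: conflicting accesses are stopped, so this attempt is
        # assumed to commit
        attempt(True)
    return retries + 1


outcomes = iter([False, False, True])
calls = []


def flaky(serialized):
    calls.append(serialized)
    return True if serialized else next(outcomes)


print(run_constrained(flaky), calls)
```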
… break
And briefly Intel Haswell (2013)
Intel Haswell
● Hardware Lock Elision (HLE)
– XACQUIRE and XRELEASE
– Backwards compatible: uses the REPNE/REPE prefixes (older CPUs ignore them)
– Runs the critical section without actually taking (writing) the lock; if the elided execution fails, it is redone with the lock taken
● Restricted Transactional Memory (RTM)
– XBEGIN, XEND, XTEST, and XABORT
– More powerful, but requires code changes
– Nesting with flattening
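The elision idea, written as the usual RTM pattern with a lock fallback (HLE does the equivalent transparently via the XACQUIRE/XRELEASE prefixes). Python cannot execute XBEGIN/XEND, so SimulatedRTM is a hypothetical stand-in whose xbegin() either starts a transaction or reports a scripted abort; in C the corresponding intrinsics are _xbegin, _xend and _xabort from immintrin.h. The key detail is that the transaction reads the lock and aborts if it is held, so a thread that really takes the lock forces concurrent elided sections to abort.

```python
import threading
from collections import ChainMap

STARTED = "started"


class SimulatedRTM:
    """Hypothetical stand-in for XBEGIN/XEND/XABORT; aborts are scripted."""

    def __init__(self, scripted_aborts: int = 0) -> None:
        self.scripted_aborts = scripted_aborts
        self.buffer = None

    def xbegin(self) -> str:
        if self.scripted_aborts > 0:
            self.scripted_aborts -= 1
            return "aborted"             # e.g. a conflict or capacity abort
        self.buffer = {}                 # speculative write set
        return STARTED

    def xabort(self) -> None:
        self.buffer = None               # discard speculative writes

    def xend(self, mem: dict) -> None:
        mem.update(self.buffer)          # commit: all writes visible at once
        self.buffer = None


class ElidedLock:
    def __init__(self, rtm: SimulatedRTM, mem: dict, retries: int = 3) -> None:
        self.lock = threading.Lock()
        self.rtm, self.mem, self.retries = rtm, mem, retries

    def run(self, body) -> str:
        for _ in range(self.retries):
            if self.rtm.xbegin() != STARTED:
                continue                 # aborted: try speculation again
            if self.lock.locked():       # reading the lock puts it in the read set
                self.rtm.xabort()        # real XABORT jumps to the abort path; modeled by continue
                continue
            body(ChainMap(self.rtm.buffer, self.mem))   # reads fall through, writes are buffered
            self.rtm.xend(self.mem)
            return "elided"
        with self.lock:                  # fallback path: actually take the lock
            body(self.mem)
            return "locked"


mem = {"counter": 0}


def increment(state):
    state["counter"] += 1


el = ElidedLock(SimulatedRTM(), mem)
r1 = el.run(increment)                   # runs speculatively, lock never written
el.rtm.scripted_aborts = 5
r2 = el.run(increment)                   # all 3 attempts abort, so the lock is taken
print(r1, r2, mem)
```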
In a Nutshell
● Improve performance through parallelization
● Fine-grained locking is hard and error-prone, lock-free programming even more so
● Transactional memory: a way to build atomic custom read-modify-write operations
● Optimistic (assumes conflicts are rare), while locking is pessimistic (assumes conflicts)
● Can help to drastically improve performance and code maintainability