Benjamin Engel Transactional Memory 1 / 28 Transactional Memory or How to do multiple things at once

New Transactional Memory - TU Dresden · os.inf.tu-dresden.de/.../SS2014/08-TransactionalMemory.pdf · 2014. 6. 17.


Benjamin Engel Transactional Memory 1 / 28

Transactional Memory

or

How to do multiple things at once


Transactional Memory: Architectural Support for Lock-Free Data Structures

M. Herlihy and J. E. B. Moss (ISCA'93)

and

Transactional Memory Architecture and Implementation for IBM System z

C. Jacobi, T. Slegel, and D. Greiner (MICRO'12)


Observations

● Single-thread performance stalls

● SMP everywhere

– Speedup by efficient parallelism

– Coarse locks (contention on the lock)

– Fine-grained locking (hard, lock order, overhead)

– Lock-free data structures (very hard, complexity)

● Amdahl's Law: Max. speedup limited by sequential (non-parallelizable) code
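
Amdahl's Law can be made concrete with a few lines of code. The sketch below evaluates S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n the number of cores. The function name and the 95 % figure are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical workload: 95 % of the run time parallelizes perfectly.
    for cores in (2, 8, 144):
        print(cores, "cores ->", round(amdahl_speedup(0.95, cores), 2))
```

Even with arbitrarily many cores, the 5 % sequential part caps the speedup below 1 / 0.05 = 20, which is why shrinking lock-protected sequential sections matters.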


Locks

● Priority inversion (lock holder preemption)

– Low-priority lock holder gets preempted by a higher-priority process that requests the lock

● Convoying (lock holder descheduled)

– Lock holder cannot run while other lock requesters could do so

● Deadlocks

– Avoidance can be hard, especially if the involved objects and their dependencies are unknown
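
One common way to avoid deadlocks with fine-grained locks is a global lock order. The following is a minimal sketch under assumptions: `Account` and `transfer` are hypothetical names, and ordering by `id()` is just one possible total order. It also shows why this is hard in general, since every code path must know all involved objects up front.

```python
import threading


class Account:
    """Hypothetical account with its own fine-grained lock."""

    def __init__(self, balance: int):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> bool:
    """Move `amount` from src to dst; returns False if funds are insufficient."""
    if src is dst:
        raise ValueError("source and destination must differ")
    # Always acquire the two locks in the same global order (here: by id),
    # so two opposite transfers can never wait on each other in a cycle.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        if src.balance < amount:
            return False
        src.balance -= amount
        dst.balance += amount
        return True
```

Without the `sorted` step, one thread running `transfer(a, b, ...)` and another running `transfer(b, a, ...)` could each hold one lock and wait forever for the other.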


Lock-free data structures

● Singly linked list ✔

● RCU ✔

● Doubly linked list or bank account transfer ✘

● CAS (compare and swap) → DCAS, m-CAS
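
To illustrate why a singly linked structure works with plain CAS, here is a sketch of a CAS retry loop on a stack head. Python has no hardware CAS, so `AtomicRef` is a hypothetical stand-in whose internal lock only simulates the atomicity of a single CAS instruction; all class names are assumptions.

```python
import threading


class AtomicRef:
    """Simulated single-word CAS cell (the lock models instruction atomicity)."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new) -> bool:
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, item, next_node=None):
        self.item = item
        self.next = next_node


class LockFreeStack:
    """Singly linked stack: each update changes exactly one pointer (head)."""

    def __init__(self):
        self.head = AtomicRef(None)

    def push(self, item):
        while True:  # retry until our CAS wins
            old = self.head.load()
            if self.head.compare_and_swap(old, Node(item, old)):
                return

    def pop(self):
        while True:
            old = self.head.load()
            if old is None:
                return None  # empty stack
            if self.head.compare_and_swap(old, old.next):
                return old.item
```

A doubly linked list or a bank transfer must change two locations atomically, which a single-word CAS cannot express; that is the gap DCAS, m-CAS, and transactions aim to close.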


Transactions

[Figure: two concurrent transactions. One executes TX begin, TX load [mem1], TX store [mem2], TX end; the other executes TX begin, TX store [mem1], TX end.]


Transactions

● Finite set of machine instructions, executed by one process

● Serializable, instructions of multiple transactions never appear interleaved

● Atomic, multiple memory accesses (reads and writes) either all commit (become visible at the same time), or abort (writes get discarded)


Instructions for Memory Access

● Load-transactional (LT) reads a value from shared memory into a private register

● Load-transactional-exclusive (LTX) like LT, but with the intention to later write that location

● Store-transactional (ST) writes a value to shared memory; the write becomes visible to other processors only at commit


Instructions for Management

● Commit ends the transaction and tries to make its writes permanent; it either succeeds or fails

● Abort drops writes and manually ends the transaction prematurely

● Validate returns true or false, denoting whether the ongoing transaction has not aborted yet. A failed validate discards the write set immediately
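
The interplay of LT, LTX, ST, Validate, Commit, and Abort can be mimicked in software. The sketch below is a single-threaded model under assumptions: it detects conflicts with per-location version numbers, whereas the proposed hardware relies on the cache coherence protocol. `TxMemory` and `Transaction` are hypothetical names, and `ltx` is treated as a pure hint.

```python
class TxMemory:
    """Shared memory model: address -> (value, version)."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, (0, 0))


class Transaction:
    def __init__(self, mem: TxMemory):
        self.mem = mem
        self.read_versions = {}  # addr -> version seen at first read
        self.write_buffer = {}   # addr -> tentative value, invisible to others

    def lt(self, addr):
        """Load-transactional: read own tentative write or shared memory."""
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        value, version = self.mem.read(addr)
        self.read_versions.setdefault(addr, version)
        return value

    ltx = lt  # in this model the "exclusive" variant is only a hint

    def st(self, addr, value):
        """Store-transactional: buffered until commit."""
        self.write_buffer[addr] = value

    def validate(self) -> bool:
        ok = all(self.mem.read(a)[1] == v for a, v in self.read_versions.items())
        if not ok:
            self.write_buffer.clear()  # failed validate discards the write set
        return ok

    def abort(self):
        self.write_buffer.clear()
        self.read_versions.clear()

    def commit(self) -> bool:
        if not self.validate():
            self.read_versions.clear()
            return False
        for addr, value in self.write_buffer.items():
            _, version = self.mem.read(addr)
            self.mem.cells[addr] = (value, version + 1)  # writes become visible
        self.abort()  # reset the read and write sets for reuse
        return True


def increment(mem: TxMemory, addr):
    """Counter increment as a retry loop around one small transaction."""
    while True:
        tx = Transaction(mem)
        tx.st(addr, tx.ltx(addr) + 1)
        if tx.commit():
            return
```

Two transactions that both read the same counter and then commit will see exactly one commit succeed; the loser retries, which is the pattern `increment` wraps.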


MESI Protocol


Basics

● Data versioning to undo speculative writes

● Buffering writes vs. undo log

● Begin TX implicit or explicit (starts with first TX load or TX store vs. TX begin and TX commit)

● Eager vs. lazy conflict detection

● Level of granularity for conflict detection: individual reads/writes, objects, cache lines

● Resolution: when to abort and whom

● HTM vs. STM


Hardware Transactional Memory

● TCC (Transactional Memory Coherence and Consistency model)

– Buffers its write set locally

– Upon commit, the bus arbitrates who is allowed to broadcast its stored writes; other processors snoop and abort → lazy conflict detection

● LogTM

– Observation: commits often succeed, aborts are rare → optimize the good case

– Write to memory, keep undo log

– Eager conflict detection


Software Transactional Memory

● Per-thread view on the heap

● Conflict detection and resolving in software

● Memory organization

– Transactional and ordinary data separate vs. mixed → different object format, e.g. keep TX meta data in object header

– Register all reads and writes to TX data

– Shadow copies of modified data, discard at abort

– In-place updates with undo log
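
The two data-versioning options from the list, shadow copies and in-place updates with an undo log, differ mainly in which path is cheap. The sketch below contrasts them on a plain dictionary standing in for the heap; conflict detection is deliberately left out, and both class names are hypothetical.

```python
class ShadowCopyTx:
    """Writes go to a private shadow copy; commit publishes, abort is free."""

    def __init__(self, heap: dict):
        self.heap = heap
        self.shadow = {}

    def read(self, key):
        return self.shadow[key] if key in self.shadow else self.heap.get(key)

    def write(self, key, value):
        self.shadow[key] = value

    def commit(self):
        self.heap.update(self.shadow)  # commit does the copying work
        self.shadow.clear()

    def abort(self):
        self.shadow.clear()            # nothing to undo


class UndoLogTx:
    """Writes go in place; commit is free, abort replays the undo log."""

    def __init__(self, heap: dict):
        self.heap = heap
        self.undo_log = []  # entries: (key, key existed before, old value)

    def read(self, key):
        return self.heap.get(key)

    def write(self, key, value):
        self.undo_log.append((key, key in self.heap, self.heap.get(key)))
        self.heap[key] = value

    def commit(self):
        self.undo_log.clear()          # nothing to copy

    def abort(self):
        # Restore old values in reverse order of the writes.
        for key, existed, old in reversed(self.undo_log):
            if existed:
                self.heap[key] = old
            else:
                self.heap.pop(key, None)
        self.undo_log.clear()
```

The undo-log variant favors workloads where commits dominate, at the price of a slower abort and of tentative values sitting in the shared heap until then.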


Evaluation (Simulated)

● Counting benchmark (increment a shared counter)

● Producer/Consumer

● Doubly linked list


[Figures: simulated benchmark results, panels "Snoop based" and "Directory based"]


… break

That was 1993; next follows an IBM paper reporting on the latest z Series HTM support (2012)


IBM Blue Gene/Q and z Series

● TBEGIN and TEND

● TBEGIN with a register mask that specifies what to restore

● Aborts jump to the instruction after TBEGIN and set the condition code (CC)

● Retry with backoff: the PPA instruction delays for an undefined amount of time, which is “optimal” for a given architecture (microcode assisted)

● Nested transactions are flattened (TBEGIN and TEND count the nesting depth, also in microcode)
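
Flattened nesting boils down to a depth counter: only the outermost TEND commits, and an abort anywhere unwinds everything. The following toy model illustrates that bookkeeping; the class and its `tbegin`/`tend` methods merely borrow the instruction names and are not an interface of the real hardware.

```python
class FlattenedNesting:
    """Toy model of nesting-depth bookkeeping for flattened transactions."""

    def __init__(self):
        self.depth = 0
        self.commits = 0

    def tbegin(self) -> bool:
        """Returns True if this call started the outermost transaction."""
        self.depth += 1
        return self.depth == 1

    def tend(self) -> bool:
        """Returns True if this call committed the whole flattened transaction."""
        if self.depth == 0:
            raise RuntimeError("TEND outside of a transaction")
        self.depth -= 1
        if self.depth == 0:
            self.commits += 1
            return True
        return False  # inner TEND: nothing becomes visible yet

    def abort(self):
        """An abort at any nesting level rolls back to the outermost TBEGIN."""
        self.depth = 0
```

A library function that uses a transaction internally can therefore be called from inside another transaction without special handling; its inner TEND simply does not commit.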


System Background

● 6 cores per chip x 6 chips per module x 4 modules = 144 coherent SMP cores

● 96K L1, 1M L2, private, write through → never dirty

● 48M L3, 384M off-chip L4, shared, write back

● All 4 levels are inclusive

● Tracking per cache line, tx-read and tx-dirty bit
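
Per-cache-line tx-read and tx-dirty bits imply that conflicts are detected at line granularity, not per variable. The sketch below models that rule; `TxCacheBits` is a hypothetical name and the 256-byte line size is an assumption made for the example.

```python
LINE_SIZE = 256  # bytes per cache line (assumed for this sketch)


class TxCacheBits:
    """Models the tx-read / tx-dirty marks of one core's private cache."""

    def __init__(self):
        self.tx_read = set()   # lines read inside the current transaction
        self.tx_dirty = set()  # lines written inside the current transaction

    @staticmethod
    def line_of(addr: int) -> int:
        return addr // LINE_SIZE

    def local_load(self, addr: int):
        self.tx_read.add(self.line_of(addr))

    def local_store(self, addr: int):
        self.tx_dirty.add(self.line_of(addr))

    def remote_access_conflicts(self, addr: int, is_write: bool) -> bool:
        """True if another core's access forces the local transaction to abort."""
        line = self.line_of(addr)
        if line in self.tx_dirty:
            return True  # any remote access to a speculatively written line
        return is_write and line in self.tx_read  # remote write to a read line
```

Note the false-sharing effect: a remote write to a different variable in the same line still counts as a conflict.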


Interrupt Filtering

● Some exceptions/interrupts can be filtered: instead of trapping into the OS, they abort the ongoing transaction

● Memory

– Page faults: no need to check for null pointers, but abort the transaction if encountering one

● Arithmetic

– No check for div-by-zero or NaN, but again abort in the unlikely case


Testability and Debugging

● Abort path rarely taken

– Added random abort mode: the CPU will randomly abort some or every transaction before it commits

● Breakpoints (exception)

– TX abort, cannot debug within a transaction

– NTSTG (non-transactional stores) are not rolled back on abort, can be used to pass data out of a transaction

● Transactional Diagnostic Block

– Buffer to debug abort reason and internal state of the aborted transaction (instruction pointer, ...)
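
The idea behind a random abort mode, forcing the rarely taken abort path so it gets tested, can be imitated in software. This is only a sketch under assumptions: the transaction runs on a private copy of a dictionary, aborts are injected from a seeded random generator, and `run_with_random_aborts` is a hypothetical helper.

```python
import random


def run_with_random_aborts(tx_body, state: dict, abort_rate: float,
                           rng: random.Random, max_retries: int = 1000) -> int:
    """Run tx_body speculatively; returns how many injected aborts occurred."""
    aborts = 0
    for _ in range(max_retries):
        working = dict(state)        # speculative copy of the shared state
        tx_body(working)
        if rng.random() < abort_rate:
            aborts += 1              # injected abort: drop the working copy
            continue
        state.clear()
        state.update(working)        # commit: publish all writes together
        return aborts
    raise RuntimeError("transaction never committed")
```

Running a workload with a high abort rate and checking that the final state matches the abort-free result is a cheap way to catch abort handlers that leak partial updates.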


Example Transaction


Constrained Transactions

● No progress guarantee with TBEGIN

● Many transactions are short/small → TBEGINC

● CPU gives progress guarantee, but no strict upper limit on retries

● Max. 32 instructions in max. 256 consecutive bytes of text

● Only relative forward jumps

● Max. 32 bytes (4 x 8 byte) data

● No complex instructions (like floating point)


TBEGINC

● On abort, jumps directly back to TBEGINC instead of to the instruction after TBEGIN → no abort path

● Microcode

– Counts retries

– Reset by successful TEND

– Increases random delay

– Reduces amount of speculative execution

– Last resort: broadcast to all CPUs to sync, thereby stopping conflicting accesses
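
The retry behaviour listed above (count retries, grow a random delay, reset on success) resembles randomized exponential backoff. The sketch below models just that part in software; the doubling window, its cap, and the function name are assumptions, and the real escalation steps in microcode are not reproduced.

```python
import random


def constrained_retry(attempt, rng: random.Random,
                      base_delay: int = 1, max_doublings: int = 6):
    """Retry `attempt` until it commits; returns (retries, total_delay_units).

    `attempt` is a callable returning True on commit and False on abort.
    Delays are only accumulated as abstract units, nothing actually sleeps.
    """
    retries = 0
    total_delay = 0
    while not attempt():
        retries += 1
        # The random window doubles with each retry, up to a fixed cap.
        window = base_delay << min(retries, max_doublings)
        total_delay += rng.randrange(window)
    return retries, total_delay  # the counter starts fresh on the next call
```

Like the constrained transaction itself, the loop has no abort path: the caller only ever sees the successful outcome, but there is no fixed bound on how many retries that takes.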


… break

And briefly Intel Haswell (2013)


Intel Haswell

● Hardware Lock Elision (HLE)

– XACQUIRE and XRELEASE

– Backwards compatible, uses REPNE/REPE prefixes (older CPUs will ignore them)

– Try without actually taking (writing) the lock; if commit fails at unlock, redo with the lock

● Restricted Transactional Memory (RTM)

– XBEGIN, XEND, XTEST, and XABORT

– More powerful, requires code adaptation

– Nesting with flattening


In a Nutshell

● Improve performance through parallelisation

● Fine-grained locking is hard and error prone, lock-free even more

● Transactional memory is a solution to build atomic custom read-modify-write operations

● Optimistic, assuming rare conflicts, while locking is pessimistic (assumes conflict)

● Can help to drastically improve performance and code maintainability