Hardware Transactional Memory Shimin Chen (LBA Reading Group)

Page 1: Hardware Transactional Memory

Hardware Transactional Memory

Shimin Chen (LBA Reading Group)

Page 2: Hardware Transactional Memory

Outline

• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary

Page 3: Hardware Transactional Memory

Transaction

• A finite sequence of instructions
• Atomicity: all or nothing
• Serializability (Isolation): steps of one transaction never appear to be interleaved with the steps of another.
• A and B cannot be concurrent if ReadSet(A) ∩ WriteSet(B) ≠ ∅, or WriteSet(A) ∩ ReadSet(B) ≠ ∅, or WriteSet(A) ∩ WriteSet(B) ≠ ∅
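The three read-set/write-set conditions on this slide can be written as one predicate. This is a minimal sketch that assumes each set is a Python set of addresses. The function name `conflicts` and the example addresses are illustrative and not from the slides.

```python
def conflicts(read_a, write_a, read_b, write_b):
    """Return True if transactions A and B cannot run concurrently.

    A conflict exists when one transaction writes a location that the
    other transaction reads or writes.
    """
    return (
        bool(read_a & write_b)
        or bool(write_a & read_b)
        or bool(write_a & write_b)
    )


# A reads x and writes y; B reads y and writes z: they conflict on y.
print(conflicts({"x"}, {"y"}, {"y"}, {"z"}))

# Two read-only transactions on the same location do not conflict.
print(conflicts({"x"}, set(), {"x"}, set()))
```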

Page 4: Hardware Transactional Memory

A simple HTM

New hardware mechanisms to:
• checkpoint register state
  ◦ Checkpoint register renaming table
• buffer transactional writes in private cache
• record transactional read-set and write-set
  ◦ R bit and W bit per cache line
  ◦ Or dedicated state buffer on the side
• detect conflict
  ◦ leverage cache coherence protocol
• resolve conflict
  ◦ e.g. requester wins

Page 5: Hardware Transactional Memory

Simple HTM Operations

• TxBegin
  ◦ Checkpoint register state
• Load/Store
  ◦ Set state bits in cache; abort upon cache eviction
• Incoming coherence message
  ◦ Check conflicts with state bits; abort if conflicted
• TxCommit
  ◦ Flash clear state bits
• Abort
  ◦ Flash invalidate write sets and read sets
  ◦ Restore register checkpoint
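The operations on this slide can be modeled in software as a toy, single-core simulation. This is a sketch under stated assumptions, not a hardware design: the class `SimpleHTMCore`, the `capacity` limit standing in for cache eviction, and the dict-based memory are all hypothetical. An incoming conflicting coherence message aborts the local transaction, which corresponds to the "requester wins" policy.

```python
class SimpleHTMCore:
    """Toy model of one core in a simple HTM (illustrative only)."""

    def __init__(self, memory, capacity=4):
        self.memory = memory      # shared dict: address -> value
        self.capacity = capacity  # assumed number of lines the cache can track
        self.registers = {}
        self.active = False

    def tx_begin(self):
        self.checkpoint = dict(self.registers)  # checkpoint register state
        self.r_bits = set()                     # lines with the R bit set
        self.w_buf = {}                         # lines with the W bit set, buffered privately
        self.active = True

    def _touch(self, addr):
        # A new line that does not fit models a cache eviction: abort.
        tracked = self.r_bits | set(self.w_buf)
        if addr not in tracked and len(tracked) >= self.capacity:
            self.abort()
            return False
        return True

    def load(self, addr):
        if not (self.active and self._touch(addr)):
            return None
        self.r_bits.add(addr)
        return self.w_buf.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        if self.active and self._touch(addr):
            self.w_buf[addr] = value

    def on_coherence(self, addr, remote_is_write):
        # A remote access conflicts with our W bit; a remote write also
        # conflicts with our R bit. The requester wins, so we abort.
        if self.active and (addr in self.w_buf or (remote_is_write and addr in self.r_bits)):
            self.abort()

    def tx_commit(self):
        if not self.active:
            return False
        self.memory.update(self.w_buf)  # buffered writes become visible
        self._flash_clear()
        return True

    def abort(self):
        self._flash_clear()                     # discard read and write sets
        self.registers = dict(self.checkpoint)  # restore register checkpoint

    def _flash_clear(self):
        self.r_bits, self.w_buf, self.active = set(), {}, False


memory = {"a": 1}
core = SimpleHTMCore(memory)
core.registers["r1"] = 10

core.tx_begin()
core.registers["r1"] = core.load("a") + 1
core.store("a", core.registers["r1"])
core.on_coherence("a", remote_is_write=True)  # remote store hits our write set
first_commit = core.tx_commit()               # fails: the transaction was aborted

core.tx_begin()                               # retry without interference
core.store("a", core.load("a") + 1)
second_commit = core.tx_commit()
```

After the aborted attempt, memory and registers are unchanged; the retry commits and publishes the buffered write.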

Page 6: Hardware Transactional Memory

Outline

• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary

Page 7: Hardware Transactional Memory

“The Common Case Transactional Memory Behavior of Multithreaded Programs”. Stanford Team (Kozyrakis, Olukotun, and their students: Chung, Chafi, Minh, McDonald, Carlstrom). HPCA 2006.

• Studied 35 applications
  ◦ Java, C+Pthread, C+OpenMP, Parallel Processing Macros
• Assume high level parallelism structure remains the same: convert lock/unlock into begin/end etc.
• Trace-based analysis

Page 8: Hardware Transactional Memory

Non-blocking synchronization

Page 9: Hardware Transactional Memory

ReadSet and WriteSet Size

• For 95% of transactions, RS < 4KB, WS < 1KB
• Weighted by time: 52KB RS, 30KB WS needed for covering 80% of the time (assuming 32B cache lines)
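To relate these sizes to the per-line R and W bits of the simple HTM, the byte counts can be converted to cache-line counts. A small worked calculation, assuming the 32-byte line size stated on the slide; the helper name `lines_needed` is illustrative.

```python
LINE_BYTES = 32  # cache line size assumed on the slide


def lines_needed(kilobytes):
    """Number of 32-byte cache lines needed to hold the given size in KB."""
    return kilobytes * 1024 // LINE_BYTES


# 95% of transactions: read set under 4KB, write set under 1KB.
print(lines_needed(4), lines_needed(1))    # 128 32

# Covering 80% of execution time: 52KB read set, 30KB write set.
print(lines_needed(52), lines_needed(30))  # 1664 960
```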

Page 10: Hardware Transactional Memory

Nesting

• Nesting distance could be high
  ◦ Partial rollback may be needed
• Two levels of nesting are common

Page 11: Hardware Transactional Memory

Speculative Parallelization

Page 12: Hardware Transactional Memory

Outline

• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary

Page 13: Hardware Transactional Memory

Directions

• Dealing with overflows
  ◦ Virtualizing HTM
• Mixing HTM with STM
  ◦ Two code paths
  ◦ Use hardware mechanisms to speed up STM

Page 14: Hardware Transactional Memory

Terminology

• Conflict Detection
  ◦ Eager: at coherence message
  ◦ Lazy: at commit time
• Version Management
  ◦ Eager: save old version, update in place
  ◦ Lazy: buffer updates
• Conflict Resolution
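The two version-management policies differ in where the new value lives while the transaction is in flight. The sketch below contrasts them on a dict standing in for memory. The class names and the assumption that every address already exists in memory are illustrative, not from the slides.

```python
class EagerVersioning:
    """Update in place and save old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # list of (address, old value)

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value  # new value is visible in place

    def commit(self):
        self.undo_log.clear()      # fast: only discard the log

    def abort(self):
        while self.undo_log:       # slow: walk the log backwards
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyVersioning:
    """Buffer updates privately and publish them at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}

    def write(self, addr, value):
        self.buffer[addr] = value  # memory still holds the old value

    def read(self, addr):
        return self.buffer.get(addr, self.memory[addr])

    def commit(self):
        self.memory.update(self.buffer)  # slower: copy buffered values out
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()              # fast: only discard the buffer


eager_mem = {"x": 1}
eager = EagerVersioning(eager_mem)
eager.write("x", 2)
eager_in_flight = eager_mem["x"]  # already 2 before commit
eager.abort()

lazy_mem = {"x": 1}
lazy = LazyVersioning(lazy_mem)
lazy.write("x", 2)
lazy_in_flight = lazy_mem["x"]    # still 1 before commit
lazy.commit()
```

Eager versioning makes commits cheap and aborts expensive; lazy versioning does the reverse.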

Page 15: Hardware Transactional Memory

Outline

• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary

Page 16: Hardware Transactional Memory

• “Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.

• “Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.

• “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). MICRO 2001.

• “Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002.

• “Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004.

• “Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.

• “Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005.

• “LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.

• “Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.

• “Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.

Page 17: Hardware Transactional Memory

• “Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.

• “Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.

• “Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.

• “Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006.

• “Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.

• “Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.

• “An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.

• “An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007.

• “Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.

Page 18: Hardware Transactional Memory

Non-overflowed HTM

Page 19: Hardware Transactional Memory

“Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.

• First HTM paper
• Similar to the simple HTM
  ◦ Transactional cache alongside the L1 data cache
• Abort, roll-back: not fully automatic
  ◦ HW discards transactional updates
  ◦ SW jumps back and retries transaction (w/ exponential backoff)
• Conflict detection: eager (coherence)
• Conflict resolution: requester aborts
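Because the rollback is not fully automatic, software wraps the transaction in a retry loop. The following is a minimal sketch of such a loop with randomized exponential backoff. The function `run_with_backoff`, its parameters, and the callable `attempt` (which returns True on commit) are hypothetical and do not reproduce the paper's instruction set.

```python
import random
import time


def run_with_backoff(attempt, max_retries=8, base_delay=1e-6, sleep=time.sleep):
    """Retry `attempt` until it commits; return the number of attempts used.

    `attempt` is any callable that runs the transaction body once and
    returns True if it committed, False if hardware aborted it.
    """
    for n in range(max_retries):
        if attempt():
            return n + 1
        # The wait window doubles after every failed attempt.
        sleep(random.uniform(0, base_delay * (2 ** n)))
    raise RuntimeError("transaction did not commit")


# Example: a transaction that aborts twice, then commits.
outcomes = iter([False, False, True])
attempts_used = run_with_backoff(lambda: next(outcomes), sleep=lambda _: None)
print(attempts_used)  # 3
```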

Page 20: Hardware Transactional Memory

“Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.

• Single reservation: LL-SC
• Multiple reservations: all or nothing, transactions w/ read-modify-writes
• Oklahoma update (in the musical “Oklahoma!”, there is a song titled “All er Nothin”)
• Similar to the simple HTM
  ◦ Batch updates and detection at commit time

Page 21: Hardware Transactional Memory

“Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). MICRO 2001. (SLE)

• Idea: speculate lock-unlock critical section while eliding locks
  ◦ using simple HTM
  ◦ fall back to locking upon conflicts & overflows
• Novelty: recognizing lock and unlock
  ◦ Lock: LL-SC with predictors
  ◦ Unlock: a store to restore value changed by LL-SC

Page 22: Hardware Transactional Memory

“Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002. (TLR)

• SLE + resolve conflicts
• Timestamp
  ◦ <# of committed TLR on the local cpu, cpu ID>
  ◦ Stall or abort the younger transaction upon conflicts
• Non-trivial addition to cache coherence protocol for avoiding deadlocks
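The timestamp rule can be illustrated with tuple comparison. This sketch assumes that a smaller commit count means an older transaction and that the CPU ID only breaks ties. The function name `resolve` and the returned dict keys are illustrative.

```python
def resolve(ts_a, ts_b):
    """Decide which of two conflicting transactions proceeds.

    Each timestamp is a (committed_count, cpu_id) pair. Python compares
    tuples lexicographically, so the commit count is compared first and
    the CPU ID breaks ties, which makes every pair of timestamps ordered.
    """
    older, younger = sorted([ts_a, ts_b])
    return {"proceeds": older, "stalls_or_aborts": younger}


# Same commit count on both CPUs: the lower CPU ID wins the tie.
print(resolve((3, 1), (3, 0)))

# CPU 5 has committed fewer transactions, so it counts as older and proceeds.
print(resolve((2, 5), (3, 0)))
```

Since the ordering is total and every transaction defers to an older one, no cycle of transactions waiting on each other can form.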

Page 23: Hardware Transactional Memory

“Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004. (TCC)

• Conflict detection: lazy
• Novelty: propose to use transactional memory to replace cache coherence
  ◦ Illusion of shared memory
  ◦ Batch communication like message passing

Page 24: Hardware Transactional Memory

“Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.

• Conflict detection: lazy
• Use Bloom filter signature to do batch detection
  ◦ 2000-bit Bloom filter, avg 70 read lines and 20 write lines per transaction
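Signature-based batch detection can be sketched with a plain Bloom filter stored in a Python integer. This is a simplified illustration, not the paper's signature encoding: the hash construction (SHA-256 with two bit positions per address) and the function names are assumptions; only the 2000-bit size comes from the slide. The test is conservative, so it may report false conflicts but never misses a real one.

```python
import hashlib

SIG_BITS = 2000  # signature size from the slide
NUM_HASHES = 2   # assumed number of bits set per address


def _bit_positions(line_addr):
    """Map a cache-line address to NUM_HASHES bit positions."""
    digest = hashlib.sha256(str(line_addr).encode()).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % SIG_BITS
        for i in range(NUM_HASHES)
    ]


def signature(line_addrs):
    """Encode a set of line addresses as one SIG_BITS-wide bit vector."""
    sig = 0
    for addr in line_addrs:
        for pos in _bit_positions(addr):
            sig |= 1 << pos
    return sig


def may_conflict(committer_write_sig, other_read_sig, other_write_sig):
    """Check a committing transaction's writes against another transaction.

    If an address is in both sets, all of its bits are set in both
    signatures, so the bitwise AND is nonzero.
    """
    return bool(committer_write_sig & (other_read_sig | other_write_sig))


committer_writes = signature([0x1000, 0x1040])
other_reads = signature([0x1040, 0x2000])  # shares line 0x1040
print(may_conflict(committer_writes, other_reads, signature([])))
```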

Page 25: Hardware Transactional Memory

Virtualizing HTM

Page 26: Hardware Transactional Memory

How?

• Generally: save transaction states in virtual memory
  ◦ Read set, write set
  ◦ Or readers, writers per block in memory
• Conflict detection needs to check this structure
• Question: how to make it efficient?

Page 27: Hardware Transactional Memory

“Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.

• First paper on overflowed transactions
• UTM (“Unbounded TM”):
  ◦ Idealized (very complicated)
• LTM (“Large TM”):
  ◦ Lazy versioning
  ◦ Limitations: less than a time slice, no migration, smaller than physical memory

Page 28: Hardware Transactional Memory

“Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005. (VTM)

• A fairly complete description
• Novelty:
  ◦ XSW: transaction status word
  ◦ load/store entries point to XSW; can change transaction state with a single atomic update
  ◦ Filter for conflict detection
• Lazy versioning (buffer updates)
• Eager conflict detection

Page 29: Hardware Transactional Memory

“LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.

• Overflow handling
• Eager versioning: per-thread undo log
  ◦ Update in place, save old values in log
  ◦ Favors commits
• Eager conflict detection
  ◦ Cache has a single overflow bit
  ◦ Use directory to remember the transactional access to a line even if the line is evicted from cache

Page 30: Hardware Transactional Memory

“Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.

• Provide support to call software callbacks
  ◦ Commit, abort, violation
• Nested transactions
  ◦ Flattening: a violation rolls back to the beginning of the top-most transaction
  ◦ Closed nesting: allow partial roll-backs
  ◦ Open nesting: allow partial commits

Page 31: Hardware Transactional Memory

“Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.

• Undo log is organized as transaction log frames (just like stack frames)
• LIFO
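A stack of log frames gives partial rollback almost for free: aborting the innermost transaction undoes only the top frame. The sketch below is a software illustration under assumptions. The class `NestedUndoLog`, the dict-based memory, and merging a committed child frame into its parent (closed nesting) are illustrative choices, not the paper's hardware layout.

```python
class NestedUndoLog:
    """Undo log organized as a LIFO stack of per-transaction frames."""

    def __init__(self, memory):
        self.memory = memory
        self.frames = []  # each frame is a list of (address, old value)

    def begin(self):
        self.frames.append([])  # push a new frame, like a stack frame

    def write(self, addr, value):
        self.frames[-1].append((addr, self.memory[addr]))
        self.memory[addr] = value  # eager versioning: update in place

    def abort_innermost(self):
        frame = self.frames.pop()
        while frame:  # undo only this level, newest entry first
            addr, old = frame.pop()
            self.memory[addr] = old

    def commit_innermost(self):
        frame = self.frames.pop()
        if self.frames:
            # Closed nesting: the parent inherits the child's undo entries,
            # so a later parent abort also undoes the child's writes.
            self.frames[-1].extend(frame)


mem = {"x": 0, "y": 0}
log = NestedUndoLog(mem)
log.begin()
log.write("x", 1)
log.begin()
log.write("y", 2)
log.abort_innermost()   # partial rollback: only y is restored
after_inner_abort = dict(mem)

log.begin()
log.write("y", 3)
log.commit_innermost()  # child commits into the parent
log.abort_innermost()   # the outer abort undoes both x and y
after_outer_abort = dict(mem)
```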

Page 32: Hardware Transactional Memory

“Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.

• Shadow page + home page
• Conflict detection: special cache for overflow info before traversing memory structure

Page 33: Hardware Transactional Memory

“Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.

• Making the fast case common:
  ◦ Permission-only cache
  ◦ Cache RW bits for overflowed cache lines
• Making the uncommon case simple:
  ◦ Allow only a single overflowed transaction
  ◦ OneTM-serialized: stall all other xactions
  ◦ OneTM-concurrent: allow other non-overflowed xactions
  ◦ Each block in memory requires RW bits + transaction ID
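The "single overflowed transaction" rule behaves like a system-wide token. This is a hypothetical sketch of that rule only: the class `OverflowToken` and its methods are illustrative and say nothing about how the paper implements the token in hardware.

```python
class OverflowToken:
    """At most one transaction may be in overflowed mode at a time."""

    def __init__(self):
        self.owner = None  # ID of the overflowed transaction, if any

    def try_overflow(self, tx_id):
        """Return True if tx_id may run overflowed; otherwise it must wait."""
        if self.owner is None or self.owner == tx_id:
            self.owner = tx_id
            return True
        return False

    def release(self, tx_id):
        """Called when the overflowed transaction commits or aborts."""
        if self.owner == tx_id:
            self.owner = None


token = OverflowToken()
first = token.try_overflow("T1")   # T1 overflows its cache and takes the token
second = token.try_overflow("T2")  # T2 must stall or stay non-overflowed
token.release("T1")
third = token.try_overflow("T2")   # the token is free again
```

With only one overflowed transaction, each memory block needs to identify at most one overflowed owner, which keeps the per-block metadata small.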

Page 34: Hardware Transactional Memory

“Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.

• Seven pathological scenarios in which different HTMs may do poorly
  ◦ Livelock cases, starvation, convoy, futile stalling for a xaction that eventually aborts
• Enhancements:
  ◦ Conflict resolution: back-offs, priorities
  ◦ Predicting writes in a transaction, so that one can get ownership at reads

Page 35: Hardware Transactional Memory

Combining HTM and STM

Page 36: Hardware Transactional Memory

“Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.

• Enhance the Dynamic STM (Herlihy et al.: wrap objects with indirection/replication)
  ◦ HTM mode
  ◦ STM mode
  ◦ Tries HTM first
• A trick for conflict detection between HTM and STM:
  ◦ STM also starts a hardware xaction
  ◦ But only accesses a single state word transactionally
  ◦ Performs all other actions nontransactionally

Page 37: Hardware Transactional Memory

“Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006. (XTM)

• Two modes: all in hardware, all in software
  ◦ If HTM overflows, abort it and run it in software mode
• Software mode:
  ◦ Per-transaction page table
  ◦ Copy-on-first-access: check if read data is not changed at commit
  ◦ Copy-on-write: buffer transactional writes
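The software mode can be approximated with two private page copies per transaction. This sketch rests on assumptions: pages are Python lists, conflict detection is at whole-page granularity, and the class `SoftwareModeTx` with its dicts standing in for the per-transaction page table is illustrative rather than the paper's mechanism.

```python
class SoftwareModeTx:
    """Page-granularity software transaction (illustrative sketch)."""

    def __init__(self, master):
        self.master = master  # shared: page ID -> list of words
        self.snapshot = {}    # copy-on-first-access: page as first seen
        self.private = {}     # copy-on-write: buffered transactional writes

    def _first_access(self, page):
        if page not in self.snapshot:
            self.snapshot[page] = list(self.master[page])

    def read(self, page, offset):
        self._first_access(page)
        source = self.private.get(page, self.snapshot[page])
        return source[offset]

    def write(self, page, offset, value):
        self._first_access(page)
        if page not in self.private:
            self.private[page] = list(self.snapshot[page])
        self.private[page][offset] = value

    def commit(self):
        # Validate: every page we touched must be unchanged since first access.
        for page, snap in self.snapshot.items():
            if self.master[page] != snap:
                return False
        # Publish the buffered writes.
        for page, copy in self.private.items():
            self.master[page] = copy
        return True


master = {0: [1, 2]}
t1 = SoftwareModeTx(master)
t2 = SoftwareModeTx(master)

t1.read(0, 0)
t1.write(0, 1, 20)
t2.write(0, 0, 10)
t2_committed = t2.commit()  # t2 validates and publishes first
t1_committed = t1.commit()  # t1 fails validation: page 0 has changed
```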

Page 38: Hardware Transactional Memory

“Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.

• Compiler generates two code paths, choose at runtime:
  ◦ STM
  ◦ HTM
• Word-based
• Metadata access per memory operation required even for HTM (to detect conflict with STM)

Page 39: Hardware Transactional Memory

“An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.

• SigTM: enhance an STM system with hardware signatures

Page 40: Hardware Transactional Memory

“An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007. (RTM)

• Two hardware mechanisms to improve the performance of an STM (RSTM):
  ◦ Alert-on-update: allow software callbacks for invalidation and eviction of selected cache lines
  ◦ Programmable data isolation: control cache to hold transactional blocks

Page 41: Hardware Transactional Memory

Summary

• Simple HTM is nice
• Major complexity comes in because of space and time limitations
  ◦ Logs, shadow pages, filters, caches, etc.
• Combine HTM and STM