Upload
buidat
View
216
Download
1
Embed Size (px)
Citation preview
Improved Single Global Lock Fallback for Best-effort Hardware
Transactional Memory
Irina CalciuTatiana ShpeismanGilles PokamMaurice Herlihy
TRANSACT 2014
Multicore Performance Scaling
2
Intel’s Haswell TSX: RTM & HLE
3
Low overhead (cache based)
IBM’s Blue Gene/Q & System Z & Power Architecture
Hardware Transactional Memory (HTM)
Haswell RTM
if (_xbegin() == _XBEGIN_STARTED)
_xend()
Speculate Execution
Speculate Execution, without any locks
Read and Write Sets
4
Abort on memory conflict
else
Abort Handler
Haswell RTM
5
_xbegin()
_xend()
Read X
Write Y
Add to Read Set
Add to Write Set
_xbegin()
_xend()
Write X
Write YAdd to Write Set
Make the change to Y visibleCOMMIT
Add to Write SetABORT
if (_xbegin() == _XBEGIN_STARTED)
_xend()
Speculate Execution
Lock Elision
<HLE_Aquire_Prefix> Lock(L)
<HLE_Release_Prefix> Release(L)
Atomic region executed as a transaction or mutually exclusive on L
Execute optimistically, without any locks
Track Read and Write Sets
6
Abort on memory conflict: rollback acquire lock
[Anand Tech]7
Best-effort
OverflowUnsupported InstructionsInterrupts
Conflicts
8
Small & Medium Transactions
Haswell RTM
Needs software fallback
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
9
Try_SPEC:Wait until Lock is freeTransactional_Read(Lock)If Lock is taken ABORTSpeculate critical sectionEnd speculation
Single Global Lock HyTM (simple and common)
10
EndHW txn
BeginHW txnRead L
Begin SW txn
Acquire L
Release LEnd
SW txn
On_ABORT:If try_lock(Lock)
Critical sectionRelease(Lock)
Else Try_SPEC
Does not abort!
Begin SW txn
Acquire L
Release LEnd
SW txn
BeginHW txnRead L
EndHW txn
(1)
BeginHW txnRead L
EndHW txn
(2)
BeginHW txnRead L
BeginHW txnRead L
EndHW txn
(3) EndHW txn
(4)
XX
X
X
Legend:X = ABORT
Single Global Lock HyTM (simple and common)
Time
11
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Acquire(L)
Release(L)
CRITICAL SECTION(SW TXN)
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Time
Thread 1 Thread 2
Execution Time 1 12
Thread 1 Thread 2
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Acquire(L)
Release(L)
CRITICAL SECTION(SW TXN)
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Begin_HW_TXN (L)
End_HW_TXN (L)
CRITICAL SECTION
Execution Time 1
Time
Execution Time 2
13
Try_SPEC:Speculate critical sectionTransactional_Read(Lock)If Lock is taken ABORTEnd speculation
Lazy SGL
1414
Begin SW txn
Acquire L
Release LEnd
SW txn
On_ABORT:If try_lock(Lock)
Critical sectionRelease(Lock)
Else Try_SPEC
Does not abort!
Read LEnd
HW txn
BeginHW txn
Begin SW txn
Acquire L
Release LEnd
SW txn
BeginHW txn
Read LEnd
HW txn(1)
BeginHW txn
Read LEnd
HW txn(2)
BeginHW txn
BeginHW txn
Read LEnd
HW txn(3)
Read LEnd
HW txn(4)
XX
Legend:X = ABORT
COMMIT
COMMIT
Lazy SGL
Time
15
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
16
Transactional Memory Correctness
Transaction 1SW
Transaction 2HW
Time
Order T2 AFTER T1
Order T2 BEFORE T1
COMMIT
COMMIT
17
Thread 1(SW)
Acquire Lock…
X = a
…
Release Lock
TXN_BEGIN
…
X = b…
TXN_END
Thread 2(HW)
Correct: a Actual: b
Time
Case 1: HW begins SW begins HW ends SW ends
X value:
Check Lock
ABORT
18
Acquire Lock…
X = a
…
Release Lock
TXN_BEGIN…
X = b…
TXN_END
Thread 1(SW)
Thread 2(HW)
Case 2: SW beginsHW beginsHW endsSW ends
Correct: a Actual: b
Time
Check Lock
ABORT
X value:
19
Acquire Lock…
X = a…
Release Lock
TXN_BEGIN…
X = b…
TXN_END
Case 3: SW beginsHW beginsSW endsHW ends
Thread 1(SW)
Thread 2(HW)
TimeX value:
Correct: b Actual: b
Check Lock
COMMIT
20
Acquire Lock…
X = a…
Release Lock
TXN_BEGIN
…
X = b…
TXN_END
Case 4: HW beginsSW beginsSW endsHW ends
Thread 1(SW)
Thread 2(HW)
TimeX value:
Correct: b Actual: b
Check Lock
COMMIT21
22
Thread 1(SW)
X = 5; Y = 6Acquire Lock
…++X
…
++Y…
Release Lock
TXN_BEGIN
…
Z = 1/(Y-X)
…
TXN_END
Thread 2(HW)
Z = 1/0 !!!Time
Hardware Sandboxing
Indirect Jumps
Thread 1(SW)
X = 5; Y = 6Acquire Lock
…++X
…
++Y…
Release Lock
_xbegin
…
if (X == Y) *p = garbagep()
…if (lock) abort_xend
Thread 2(HW)
_xend
Time
23
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
24
0
0.5
1
1.5
2
2.5
3
3.5
1 2 4 8
Spee
dup
Threads
Ssca2 (small txns)
00.51
1.52
2.53
3.54
1 2 4 8
Spee
dup
Threads
Labyrinth (large txns)
25
Intruder (medium txns)
0
0.5
1
1.5
2
2.5
3
1 2 4 8
Speedup
Threads
TL2SGLHLEE-SGLL-SGL
Better
Improved Lock Acquisition Rate
26
Vacation Low (medium txns)
Kmeans High (small txns)
Intruder (medium txns)
Labyrinth (large txns)
0
5
10
15
20
25
30
1 2 4 8
% lock acquisitions
Threads
0
10
20
30
40
50
60
70
1 2 4 8
% lock acquisitions
Threads
051015202530354045
1 2 4 8
% lock acquisitions
Threads
HLE
E-SGL
L-SGL
0
10
20
30
40
50
60
70
80
1 2 4 8
% lock acquisitions
Threads
HLE
E-SGL
L-SGL
Better
No single thread overhead
27
Slowdown relative to sequential for 1 thread
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
Slowdown
TL2SGLHLEE-SGLL-SGL
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
28
Bloom Filters
• Efficient probabilistic data structure to compute fast set intersection
• Can admit false positives
• No false negatives
• Used in TM for Conflict Detection
29
Begin SW txn
Acquire L
Release LEnd
SW txn
BeginHW txn
Read LEnd
HW txn(1)
BeginHW txn
Read LEnd
HW txn(2)
BeginHW txn
BeginHW txn
Read LEnd
HW txn(3)
Read LEnd
HW txn(4)
XX
Legend:X = ABORT
COMMIT
COMMIT
Lazy SGL
Time
30
Begin SW txn
Acquire L
Release LEnd
SW txn
BeginHW txn
Check BFEnd
HW txn(1)
BeginHW txn
Check BFEnd
HW txn(2)
BeginHW txn
BeginHW txn
Read LEnd
HW txn(3)
Read LEnd
HW txn(4)
Legend:X = ABORT
COMMIT
COMMIT
BF SGL
Time
31
Thread 1(SW)
Acquire Lock…
X = a
…
Release Lock
TXN_BEGIN
…
X = b…
TXN_END
Thread 2(HW)
Time
Case 1: HW begins SW begins HW ends SW ends
X value:a
Correct: a Actual: a
Check BF
If BFs intersect: ABORTElse: COMMIT
32
Acquire Lock…
X = a
…
Release Lock
TXN_BEGIN…
X = b…
TXN_END
Thread 1(SW)
Thread 2(HW)
Case 2: SW beginsHW beginsHW endsSW ends
Correct: a Actual: b
TimeX value:
Check BF
If BFs intersect: ABORTElse: COMMIT 33
Conclusions
• HTMs are becoming more available
• Best-effort – need software fallback
• Eager SGL• simple and fast fallback, • often preferred to more efficient solutions
34
Conclusions
• Lazy SGL • as simple as Eager SGL• more efficient
• Bloom Filter SGL • more accurate conflict detection• Slower
• Can be implemented directly in hardware
35