Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy
TRANSACT 2014
Multicore Performance Scaling

Hardware Transactional Memory (HTM)
• Intel’s Haswell TSX: RTM & HLE
• IBM’s Blue Gene/Q & System Z & Power Architecture
• Low overhead (cache based)
Haswell RTM
if (_xbegin() == _XBEGIN_STARTED) {
    Speculate Execution
    _xend()
} else {
    Abort Handler
}
• Speculative execution, without any locks
• Tracks Read and Write Sets
• Abort on memory conflict
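Haswell RTM
[Figure: two concurrent transactions, each bracketed by _xbegin() and _xend(). The first does Read X (Add to Read Set) and Write Y (Add to Write Set); the second does Write X (Add to Write Set) and Write Y (Add to Write Set). The outcomes shown are COMMIT (Make the change to Y visible) and ABORT.]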
Lock Elision
<HLE_Acquire_Prefix> Lock(L)
    Atomic region executed as a transaction or mutually exclusive on L
<HLE_Release_Prefix> Release(L)
• Execute optimistically, without any locks
• Track Read and Write Sets
• Abort on memory conflict: roll back, acquire lock
[Figure credit: Anand Tech]
Haswell RTM
• Best-effort: transactions may abort on Overflow, Unsupported Instructions, Interrupts, or Conflicts
• Small & Medium Transactions
• Needs software fallback
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
Single Global Lock HyTM (simple and common)
Try_SPEC:
    Wait until Lock is free
    Transactional_Read(Lock)
    If Lock is taken ABORT
    Speculate critical section
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
[Figure: HW txn = Begin HW txn, Read L, …, End HW txn. SW txn = Begin SW txn, Acquire L, …, Release L, End SW txn. Does not abort!]
Single Global Lock HyTM (simple and common)
[Timeline figure: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns (1) to (4), each running Begin HW txn, Read L, End HW txn. All four HW txns are marked X. Legend: X = ABORT.]
[Timeline figure, Thread 1 and Thread 2: five critical sections run as HW txns (Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L)) and one runs as an SW txn (Acquire(L), CRITICAL SECTION, Release(L)). The total is labeled Execution Time 1.]
[Timeline figure, Thread 1 and Thread 2: the same six critical sections in a second schedule, with Execution Time 1 shown next to Execution Time 2.]
Lazy SGL
Try_SPEC:
    Speculate critical section
    Transactional_Read(Lock)
    If Lock is taken ABORT
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
[Figure: HW txn = Begin HW txn, …, Read L, End HW txn. SW txn = Begin SW txn, Acquire L, …, Release L, End SW txn. Does not abort!]
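The two Try_SPEC variants differ only in where Lock is read, which the following C sketch tries to make concrete. It is a minimal illustration under assumptions, not the authors' implementation: the names `sgl_execute`, `sgl_mode`, `try_spec`, `rtm_usable`, and `increment` are hypothetical, the lock is a plain atomic flag, and there is no retry limit or backoff. The RTM path is compiled only when the compiler defines `__RTM__` (for example with `-mrtm`) and is taken only when CPUID reports RTM; otherwise every call falls through to the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compile the RTM path only when the compiler was asked for it (e.g. -mrtm). */
#if defined(__RTM__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#include <immintrin.h>
#define SKETCH_HAVE_RTM 1
#else
#define SKETCH_HAVE_RTM 0
#endif

typedef enum { SGL_EAGER, SGL_LAZY } sgl_mode;  /* hypothetical names */
typedef void (*critical_fn)(void *);

static atomic_int sgl_lock;  /* the single global lock L: 0 = free, 1 = held */

/* True only if the CPU reports RTM (CPUID leaf 7, subleaf 0, EBX bit 11). */
static bool rtm_usable(void) {
#if SKETCH_HAVE_RTM
    unsigned eax, ebx, ecx, edx;
    return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 11));
#else
    return false;
#endif
}

/* Try_SPEC: one speculative attempt; returns true if it committed. */
static bool try_spec(sgl_mode mode, critical_fn cs, void *arg) {
#if SKETCH_HAVE_RTM
    if (mode == SGL_EAGER)
        while (atomic_load(&sgl_lock)) { }      /* wait until Lock is free */
    if (_xbegin() == _XBEGIN_STARTED) {
        if (mode == SGL_EAGER && atomic_load(&sgl_lock))
            _xabort(0x01);                      /* eager: read L first */
        cs(arg);                                /* speculate critical section */
        if (mode == SGL_LAZY && atomic_load(&sgl_lock))
            _xabort(0x01);                      /* lazy: read L just before commit */
        _xend();                                /* end speculation */
        return true;
    }
#else
    (void)mode; (void)cs; (void)arg;
#endif
    return false;                               /* aborted, or RTM unavailable */
}

/* Run cs(arg) atomically; returns true if it committed in hardware. */
bool sgl_execute(sgl_mode mode, critical_fn cs, void *arg) {
    assert(cs && "critical section must not be NULL");
    for (;;) {
        if (rtm_usable() && try_spec(mode, cs, arg))
            return true;
        int expected = 0;                       /* On_ABORT: try_lock(Lock) */
        if (atomic_compare_exchange_strong(&sgl_lock, &expected, 1)) {
            cs(arg);                            /* critical section under the lock */
            atomic_store(&sgl_lock, 0);         /* Release(Lock) */
            return false;
        }
        /* Else: Try_SPEC again */
    }
}

/* Example critical section (hypothetical): increment a plain counter. */
void increment(void *counter) { ++*(int *)counter; }
```

A call such as `sgl_execute(SGL_LAZY, increment, &counter)` runs one critical section; the return value only reports whether it committed in hardware or under the lock.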
Lazy SGL
[Timeline figure: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns (1) to (4), each running Begin HW txn, then Read L just before End HW txn. Two HW txns are marked X and two are marked COMMIT. Legend: X = ABORT.]
Transactional Memory Correctness
[Timeline figure: Transaction 1 (SW) and Transaction 2 (HW) overlap in time and both COMMIT. The two valid outcomes are: Order T2 AFTER T1, or Order T2 BEFORE T1.]
Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Without a lock check: X value: a, then b. Correct: a, Actual: b
With Check Lock: ABORT. Correct: a, Actual: a

Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Without a lock check: Correct: a, Actual: b
With Check Lock: ABORT. Correct: a, Actual: a

Case 3: SW begins, HW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
X value: a, then b. Correct: b, Actual: b
With Check Lock: COMMIT

Case 4: HW begins, SW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Correct: b, Actual: b
With Check Lock: COMMIT
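The four cases reduce to one question: what is the state of the lock at the moment the HW txn reads L? The C sketch below models that with logical timestamps. It is an illustrative model under assumptions, not code from the talk: the `schedule` struct, `eager_commits`, `lazy_commits`, and `four_cases` are hypothetical names, and the model covers only the lock check. It ignores data conflicts on X, which the hardware would detect on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Logical times of the four events in one schedule (all distinct). */
typedef struct { int hw_begin, hw_end, sw_begin, sw_end; } schedule;

/* Eager SGL: L joins the read set at hw_begin, so the HW txn survives only
   if the SW txn never holds L anywhere inside [hw_begin, hw_end]. */
bool eager_commits(schedule s) {
    assert(s.hw_begin < s.hw_end && s.sw_begin < s.sw_end);
    return s.sw_end < s.hw_begin || s.hw_end < s.sw_begin;
}

/* Lazy SGL: L is read only at hw_end, so only the lock state at that
   instant decides between ABORT and COMMIT. */
bool lazy_commits(schedule s) {
    assert(s.hw_begin < s.hw_end && s.sw_begin < s.sw_end);
    bool lock_held_at_hw_end = s.sw_begin < s.hw_end && s.hw_end < s.sw_end;
    return !lock_held_at_hw_end;
}

/* The four interleavings from the slides: {hw_begin, hw_end, sw_begin, sw_end}. */
const schedule four_cases[4] = {
    {1, 3, 2, 4},  /* Case 1: HW begins, SW begins, HW ends, SW ends */
    {2, 3, 1, 4},  /* Case 2: SW begins, HW begins, HW ends, SW ends */
    {2, 4, 1, 3},  /* Case 3: SW begins, HW begins, SW ends, HW ends */
    {1, 4, 2, 3},  /* Case 4: HW begins, SW begins, SW ends, HW ends */
};
```

Under this model the eager check aborts the HW txn in all four cases, while the lazy check aborts it only in Cases 1 and 2, where the HW txn ends while the SW txn still holds the lock.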
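Hardware Sandboxing
Initially: X = 5; Y = 6
Thread 1 (SW): Acquire Lock … ++X … ++Y … Release Lock
Thread 2 (HW): TXN_BEGIN … Z = 1/(Y-X) … TXN_END
If Thread 2 sees X after ++X but Y before ++Y, then Y-X = 0 and Z = 1/0 !!!
Hardware sandboxing: the fault aborts the HW txn instead of reaching the program.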
Indirect Jumps
Initially: X = 5; Y = 6
Thread 1 (SW): Acquire Lock … ++X … ++Y … Release Lock
Thread 2 (HW):
    _xbegin
    …
    if (X == Y) *p = garbage
    p()
    …
    if (lock) abort
    _xend
Indirect jump to garbage location; a second _xend appears in the figure.
[Figure: Speedup vs. Threads (1, 2, 4, 8) for Ssca2 (small txns), Labyrinth (large txns), and Intruder (medium txns). Legend: TL2, SGL, HLE, E-SGL, L-SGL. Higher is Better.]
Improved Lock Acquisition Rate
[Figure: % lock acquisitions vs. Threads (1, 2, 4, 8) for Vacation Low (medium txns), Kmeans High (small txns), Intruder (medium txns), and Labyrinth (large txns). Legend: HLE, E-SGL, L-SGL. An arrow marks the Better direction.]
No single thread overhead
[Figure: Slowdown relative to sequential for 1 thread on bayes, genome, intruder, km_low, km_high, labyrinth, vacation_low, vacation_high, ssca2, and yada. Legend: TL2, SGL, HLE, E-SGL, L-SGL.]
Bloom Filters
• Efficient probabilistic data structure to compute fast set intersection
• Can admit false positives
• No false negatives
• Used in TM for Conflict Detection
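A small C sketch may help show why intersection can report false positives but never false negatives. Everything in it is an assumption made for illustration and not taken from the talk: the filter size `BF_BITS`, the use of two bit positions per address, the `mix64` hash (the splitmix64 finalizer), and the names `bloom`, `bf_add`, `bf_may_contain`, and `bf_may_intersect`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BF_BITS  1024u            /* filter size in bits (assumed; power of two) */
#define BF_WORDS (BF_BITS / 64u)

typedef struct { uint64_t bits[BF_WORDS]; } bloom;

/* 64-bit mixing function (splitmix64 finalizer). */
static uint64_t mix64(uint64_t x) {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Derive two bit positions for an address from the halves of one hash. */
static void bf_indices(const void *addr, uint32_t *i1, uint32_t *i2) {
    uint64_t h = mix64((uint64_t)(uintptr_t)addr);
    *i1 = (uint32_t)(h & (BF_BITS - 1));
    *i2 = (uint32_t)((h >> 32) & (BF_BITS - 1));
}

void bf_clear(bloom *f) {
    assert(f != NULL);
    memset(f->bits, 0, sizeof f->bits);
}

/* Record an address, e.g. one that a transaction read or wrote. */
void bf_add(bloom *f, const void *addr) {
    uint32_t i1, i2;
    bf_indices(addr, &i1, &i2);
    f->bits[i1 / 64] |= UINT64_C(1) << (i1 % 64);
    f->bits[i2 / 64] |= UINT64_C(1) << (i2 % 64);
}

/* false: definitely never added. true: added, or a false positive. */
bool bf_may_contain(const bloom *f, const void *addr) {
    uint32_t i1, i2;
    bf_indices(addr, &i1, &i2);
    return ((f->bits[i1 / 64] >> (i1 % 64)) & 1) &&
           ((f->bits[i2 / 64] >> (i2 % 64)) & 1);
}

/* Conservative set-intersection test: any shared address sets the same bits
   in both filters, so a result of false proves the sets are disjoint. */
bool bf_may_intersect(const bloom *a, const bloom *b) {
    for (unsigned w = 0; w < BF_WORDS; ++w)
        if (a->bits[w] & b->bits[w])
            return true;
    return false;
}
```

The intersection test is a word-wise AND, so its cost depends on the filter size rather than on how many addresses were recorded. A larger `BF_BITS` lowers the false-positive rate at the price of a longer scan.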
Lazy SGL
[Timeline figure, repeated for comparison: four HW txns (1) to (4) overlap one SW txn and Read L just before End HW txn. Two are marked X and two are marked COMMIT. Legend: X = ABORT.]

BF SGL
[Timeline figure: the same schedule. HW txns (1) and (2) Check BF before End HW txn, while (3) and (4) Read L. Two COMMIT marks are shown and no X. Legend: X = ABORT.]
Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Without a check: X value: a, then b. Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: If BFs intersect: ABORT, Else: COMMIT

Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Without a check: Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: If BFs intersect: ABORT, Else: COMMIT
Conclusions
• HTMs are becoming more available
• Best-effort: need software fallback
• Eager SGL
  • simple and fast fallback
  • often preferred to more efficient solutions
• Lazy SGL
  • as simple as Eager SGL
  • more efficient
• Bloom Filter SGL
  • more accurate conflict detection
  • slower
• Can be implemented directly in hardware
Medium transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8) for Intruder, Vacation Low, Vacation High, and Genome. Legend: TL2, SGL, HLE, Hyswell.]
Small transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8) for Kmeans Low, Kmeans High, and Ssca2. Legend: TL2, SGL, HLE, Hyswell.]
Large transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8) for Bayes, Labyrinth, and Yada. Legend: TL2, SGL, HLE, Hyswell.]
[Figure: Speedup over sequential for 8 threads on bayes, genome, intruder, kmeans low, kmeans high, labyrinth, ssca2, vacation low, vacation high, and yada. Legend: TL2, SGL, HLE, Hyswell.]
      Software   Hardware   Outcome
(1)   Read(x)    Read(x)    Not a conflict
(2)   Read(x)    Write(x)   Software transaction ordered before hardware transaction -> CORRECT
(3)   Read(x)    Write(x)   Hardware abort
(4)   Write(x)   Read(x)    Software transaction ordered before hardware transaction -> CORRECT
(5)   Write(x)   Read(x)    Hardware abort
(6)   Write(x)   Write(x)   Software transaction ordered before hardware transaction -> CORRECT
(7)   Write(x)   Write(x)   Hardware abort
Each conflicting access pair appears twice, once for each outcome.