Pay-to-use strong atomicity on conventional hardware
Martín Abadi, Tim Harris, Mojtaba Mehrara
Microsoft Research
Our approach
• Strong semantics: atomic, retry, ... What, ideally, should these constructs do?
• Programming discipline(s): What does it mean for a program to use the constructs correctly?
• Low-level semantics & actual implementations: transactions, optimistic concurrency, program transformations, weak memory models, ...
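Programming disciplines
[Figure: nested classes of programs. All programs ⊃ violation-free programs ⊃ programs obeying dynamic separation ⊃ programs obeying static separation. Stricter disciplines give more implementation flexibility; looser ones let more programs count as correctly synchronized.]
• Which programs are correctly synchronized?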
Strong atomicity
• Direct (non-transactional) accesses behave like single-access transactions
• We would like:
  – Implementation flexibility: ongoing innovation in STM/hybrid techniques, optimizations, ...
    • Invisible / visible readers
    • In-place / deferred updates
    • Eager / lazy conflict detection
  – No overhead on direct accesses
  – Robust performance, not dependent on the success of static analyses
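To make "direct accesses behave like single-access transactions" concrete, here is a minimal C sketch of one heap word under a toy in-place-update STM with an undo log (one of the design options listed above). The model and the names `Word`, `tx_write`, `direct_read_weak` and `direct_read_strong` are hypothetical. Under weak atomicity a direct read can observe a speculative value that a later abort rolls back; under strong atomicity it must not.

```c
#include <assert.h>
#include <stdbool.h>

/* One heap word under a toy in-place-update STM (hypothetical model). */
typedef struct {
    int  value;       /* current memory contents, possibly speculative */
    int  committed;   /* last committed value, saved by the writer (undo log) */
    bool tx_owned;    /* true while an unfinished transaction has written it */
} Word;

/* Transactional write: save the committed value once, then update in place. */
void tx_write(Word *w, int v) {
    if (!w->tx_owned) {
        w->committed = w->value;
        w->tx_owned = true;
    }
    w->value = v;
}

/* Commit: the speculative value becomes the committed one. */
void tx_commit(Word *w) {
    w->committed = w->value;
    w->tx_owned = false;
}

/* Abort: roll the word back to its committed value. */
void tx_abort(Word *w) {
    if (w->tx_owned) {
        w->value = w->committed;
        w->tx_owned = false;
    }
}

/* Weak atomicity: a direct read is a plain load and may see speculative state. */
int direct_read_weak(const Word *w) {
    return w->value;
}

/* Strong atomicity: the direct read acts as a one-access transaction, so it
   never sees uncommitted state. Returning the committed value serializes the
   read before the running transaction. */
int direct_read_strong(const Word *w) {
    return w->tx_owned ? w->committed : w->value;
}
```

Returning the saved committed value is only one valid choice; blocking the direct read until the writer finishes would also give strong atomicity.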
Strong atomicity: implementation
[Figure: the heap's physical address space is mapped twice into the virtual address space: once as the Direct-heap, used by direct memory accesses, and once as the Tx-heap, used by memory accesses from atomic blocks.]
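Because both views alias the same physical pages, translating an address between them is a constant shift. The C sketch below assumes the Tx-heap sits 0x40000000 bytes above the Direct-heap (the displacement that appears in the sample code later in these slides) and 4 KiB pages; `TX_HEAP_OFFSET`, `VIEW_PAGE_SIZE` and the helper names are hypothetical. Creating the actual double mapping (for example with `mmap` on POSIX or `MapViewOfFile` on Windows) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the Tx-heap view sits a fixed distance above the
   Direct-heap view (0x40000000, as in the sample code) and pages are 4 KiB. */
#define TX_HEAP_OFFSET ((uintptr_t)0x40000000u)
#define VIEW_PAGE_SIZE ((uintptr_t)4096u)

/* Both views alias the same physical memory, so translation is a constant shift. */
static inline uintptr_t to_tx_view(uintptr_t direct_addr) {
    return direct_addr + TX_HEAP_OFFSET;
}

static inline uintptr_t to_direct_view(uintptr_t tx_addr) {
    return tx_addr - TX_HEAP_OFFSET;
}

/* Virtual page number. Because the offset is a multiple of the page size, an
   object's two views occupy corresponding pages, so protection can be changed
   on the Direct-heap page without touching the Tx-heap page. */
static inline uintptr_t page_of(uintptr_t addr) {
    return addr / VIEW_PAGE_SIZE;
}
```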
Writes from atomic blocks
1. An atomic block attempts to write to a field of an object
2. Revoke direct access to the page holding the direct view of the object
3. Use underlying STM write primitives
4. Restore direct access once the underlying transaction has finished and an access violation (AV) occurs
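The four steps can be simulated in C without real page tables. This sketch rests on simplifying assumptions: a tiny hypothetical page size, one transaction at a time, in-place writes with no logging or rollback, and direct reads only. A page whose `direct_ok` flag is cleared stands in for a revoked Direct-heap mapping, and a read that hits one stands in for the access violation. `Heap`, `tx_write`, `tx_finish` and `direct_read` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES     4
#define PAGE_WORDS 8   /* hypothetical tiny pages, measured in words */

typedef struct {
    int  mem[NPAGES * PAGE_WORDS]; /* shared "physical" memory behind both views */
    bool direct_ok[NPAGES];        /* is the Direct-heap view of this page accessible? */
    bool tx_active[NPAGES];        /* has an unfinished transaction written this page? */
    int  av_count;                 /* simulated access violations so far */
} Heap;

void heap_init(Heap *h) {
    for (size_t i = 0; i < NPAGES * PAGE_WORDS; i++)
        h->mem[i] = 0;
    for (size_t p = 0; p < NPAGES; p++) {
        h->direct_ok[p] = true;
        h->tx_active[p] = false;
    }
    h->av_count = 0;
}

static size_t page_index(size_t addr) {
    assert(addr < NPAGES * PAGE_WORDS);
    return addr / PAGE_WORDS;
}

/* Steps 1-3: an atomic block writes a field. */
void tx_write(Heap *h, size_t addr, int value) {
    size_t p = page_index(addr);
    h->direct_ok[p] = false;   /* step 2: revoke direct access to the page */
    h->tx_active[p] = true;
    h->mem[addr] = value;      /* step 3: STM write primitive (no logging here) */
}

/* The transaction finishes. Protection is deliberately left revoked. */
void tx_finish(Heap *h) {
    for (size_t p = 0; p < NPAGES; p++)
        h->tx_active[p] = false;
}

/* A direct read. Returns false when the reader would have to wait for a
   still-running transaction; a real AV handler would block and retry. */
bool direct_read(Heap *h, size_t addr, int *out) {
    size_t p = page_index(addr);
    if (!h->direct_ok[p]) {
        h->av_count++;                 /* the direct access faults */
        if (h->tx_active[p])
            return false;
        h->direct_ok[p] = true;        /* step 4: restore direct access */
    }
    *out = h->mem[addr];
    return true;
}
```

Step 4 is lazy here: finishing the transaction does not touch page protection, and the next faulting direct access restores it, so pages that no direct code touches never pay for a protection change.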
Avoiding Access Violations
1. Safe accesses in runtime system code
   – Virtual method tables and array lengths
   – Memory allocation structures (e.g. free list)
   – STM implementation structures
   – GC implementation
   → Forward all these to the TX-heap at compile time
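Compile-time forwarding of the runtime-system accesses listed above amounts to choosing, per access, which view's address to emit. The C sketch below is hypothetical (the `AccessKind` categories and `emitted_address` are illustrative names) and assumes, following the diagram, that only the Direct-heap view is ever protected, so an access routed through the Tx-heap can never raise an AV. The 0x40000000 offset is taken from the sample code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_HEAP_OFFSET ((uintptr_t)0x40000000u)  /* as in the sample code */

/* Hypothetical classification of the memory accesses a compiler emits. */
typedef enum {
    ACC_VTABLE,        /* virtual method table pointer or entry */
    ACC_ARRAY_LENGTH,  /* array length field */
    ACC_ALLOCATOR,     /* memory allocation structures, e.g. the free list */
    ACC_STM_METADATA,  /* STM implementation structures */
    ACC_GC,            /* GC implementation structures */
    ACC_USER_DATA      /* ordinary program fields and array elements */
} AccessKind;

/* True for the runtime-system accesses that are always safe to forward. */
static bool is_runtime_system_access(AccessKind k) {
    switch (k) {
    case ACC_VTABLE:
    case ACC_ARRAY_LENGTH:
    case ACC_ALLOCATOR:
    case ACC_STM_METADATA:
    case ACC_GC:
        return true;
    case ACC_USER_DATA:
        return false;
    }
    return false;
}

/* Address emitted for an access whose Direct-heap address is direct_addr:
   runtime-system accesses go through the (never protected) Tx-heap view. */
uintptr_t emitted_address(AccessKind k, uintptr_t direct_addr) {
    return is_runtime_system_access(k) ? direct_addr + TX_HEAP_OFFSET
                                       : direct_addr;
}
```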
2. Safe accesses in normal code
   – Normal writes to locations that haven’t been read or written in a TX
   – Normal reads from locations that haven’t been written in a TX
   → Forward to the TX-heap
3. Safe accesses in TX code
   – TX writes to locations that haven’t been read or written outside TXs
   – TX reads from locations that haven’t been written outside TXs
   → Avoid page-level tracking
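The rules in items 2 and 3 share one intuition: a conflict needs at least one write, so an access is safe when the other kind of code never makes a conflicting access to the same location. A minimal C sketch of the four predicates, assuming a hypothetical per-location `UseSummary` produced by some earlier analysis:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how a location (or group of locations) is used. */
typedef struct {
    bool read_in_tx;       /* read inside some atomic block */
    bool written_in_tx;    /* written inside some atomic block */
    bool read_outside;     /* read by normal (non-transactional) code */
    bool written_outside;  /* written by normal code */
} UseSummary;

/* Item 2: normal-code accesses that can be forwarded to the TX-heap. */
bool safe_normal_write(UseSummary s) { return !s.read_in_tx && !s.written_in_tx; }
bool safe_normal_read(UseSummary s)  { return !s.written_in_tx; }

/* Item 3: transactional accesses that need no page-level tracking. */
bool safe_tx_write(UseSummary s) { return !s.read_outside && !s.written_outside; }
bool safe_tx_read(UseSummary s)  { return !s.written_outside; }
```

For example, a location that is read both inside and outside transactions but never written is safe for reads from both kinds of code, while any write to it would not be.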
Sample Code

  private int ComputeUniqueSegments(int nthreads) {
      int numUniqueSegment = 0;
      for (int i = 0; i < nthreads; i++)
          numUniqueSegment += this.uniqueSegments[i].Count;
      return numUniqueSegment;
  }

  Genome_Sequencer_ComputeUniqueSegments::loop:
      mov  eax, dword ptr [edi+0x20]        // Load uniqueSegments array reference
      cmp  ebx, dword ptr [eax+0x4]         // Check index against array length
      jae  outOfRange
      mov  ecx, dword ptr [eax+ebx*4+0x08]  // Load array element
      mov  eax, dword ptr [ecx]             // Load the element's vtable pointer
      call dword ptr [eax+0x88]             // Call the Count (get) function through the vtable
      add  ebp, eax                         // Add it to numUniqueSegment
      add  ebx, 1
      cmp  ebx, esi
      jl   loop

Forwarded to the TX-heap (changed instructions only):

  Access immutable runtime-system data:
      cmp  ebx, dword ptr [eax+0x40000004]        // Check index against array length
      mov  eax, dword ptr [ecx+0x40000000]        // Load the element's vtable pointer
      call dword ptr [eax+0x40000088]             // Call the Count (get) function
  Safe normal access:
      mov  eax, dword ptr [edi+0x40000020]        // Load uniqueSegments array reference
      mov  ecx, dword ptr [eax+ebx*4+0x40000008]  // Load array element
Exploiting Safe Accesses
• Implemented by extending Steensgaard’s points-to analysis
• Only safe accesses from normal code were beneficial
• Little benefit from identifying safe accesses from inside atomic blocks

Number of page-table changes:

            Genome   Delaunay   Labyrinth   Vacation
  Before    31 K     43         147         41 K
  After     31 K     39         36          38 K
  Ratio     99%      90%        36%         92%
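The slides do not describe how Steensgaard's points-to analysis was extended, so the following is only a hypothetical C sketch of the idea. Steensgaard's analysis groups locations that may alias into equivalence classes using union-find, so the normal-code safety rules can be checked per class: if any member of a class is written inside a transaction, normal reads through any pointer into that class are unsafe. A full Steensgaard analysis also derives the unifications from pointer assignments and tracks points-to edges between classes; that part is omitted, and all names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCS 64

/* Alias classes over abstract locations, with per-class summaries of
   transactional use (meaningful only at class representatives). */
typedef struct {
    int  parent[MAX_LOCS];
    bool accessed_in_tx[MAX_LOCS];
    bool written_in_tx[MAX_LOCS];
} AliasClasses;

void classes_init(AliasClasses *c, int n) {
    assert(n <= MAX_LOCS);
    for (int i = 0; i < n; i++) {
        c->parent[i] = i;
        c->accessed_in_tx[i] = false;
        c->written_in_tx[i] = false;
    }
}

/* Find the class representative, halving the path as we go. */
int class_of(AliasClasses *c, int x) {
    while (c->parent[x] != x) {
        c->parent[x] = c->parent[c->parent[x]];
        x = c->parent[x];
    }
    return x;
}

/* Steensgaard-style unification: locations that may alias end up in one
   class, and the merged class inherits both summaries. */
void unify(AliasClasses *c, int a, int b) {
    int ra = class_of(c, a), rb = class_of(c, b);
    if (ra == rb)
        return;
    c->parent[rb] = ra;
    c->accessed_in_tx[ra] = c->accessed_in_tx[ra] || c->accessed_in_tx[rb];
    c->written_in_tx[ra]  = c->written_in_tx[ra]  || c->written_in_tx[rb];
}

void note_tx_read(AliasClasses *c, int x) {
    c->accessed_in_tx[class_of(c, x)] = true;
}

void note_tx_write(AliasClasses *c, int x) {
    int r = class_of(c, x);
    c->accessed_in_tx[r] = true;
    c->written_in_tx[r] = true;
}

/* The normal-code rules, applied to whole alias classes. */
bool safe_normal_read(AliasClasses *c, int x)  { return !c->written_in_tx[class_of(c, x)]; }
bool safe_normal_write(AliasClasses *c, int x) { return !c->accessed_in_tx[class_of(c, x)]; }
```

Because unification only ever merges classes, the analysis is conservative: aliasing can make an access unsafe, never the reverse.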
Patching access violations
• Patch the sites of AVs
• Our heuristic:
  – Patch on the first AV
  – Also change page protection as normal
• Future work:
  – Remove patches if they become unnecessary
  – Make multiple patches to bound worst-case performance
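The slide does not say what a patch installs. The C sketch below assumes the patched site performs an explicit software check against transactional state instead of relying on page protection; `Page`, `AccessSite`, `ReadResult` and `site_read` are hypothetical names. It illustrates why the heuristic helps: after its first AV a hot site never faults again, whereas an unpatched site would trap every time it touched a protected page.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated protection state of one page of the Direct-heap view. */
typedef struct {
    bool direct_ok;   /* direct-view access currently allowed */
    bool tx_active;   /* an unfinished transaction has written the page */
} Page;

/* One hypothetical load instruction in compiled normal code. */
typedef struct {
    bool patched;     /* set once the site has been patched */
    int  av_count;    /* access violations taken at this site */
} AccessSite;

typedef enum { READ_OK, READ_MUST_WAIT } ReadResult;

ReadResult site_read(AccessSite *s, Page *pg) {
    if (s->patched) {
        /* Assumed patched path: an explicit software check against
           transactional state instead of relying on page protection. */
        return pg->tx_active ? READ_MUST_WAIT : READ_OK;
    }
    if (pg->direct_ok)
        return READ_OK;               /* ordinary unprotected direct access */
    s->av_count++;                    /* access violation at this site */
    s->patched = true;                /* heuristic: patch on the first AV */
    if (pg->tx_active)
        return READ_MUST_WAIT;
    pg->direct_ok = true;             /* also change page protection as normal */
    return READ_OK;
}
```

In this sketch a patched site keeps paying for the software check even after its page goes quiet, which is the cost the first future-work item (removing unnecessary patches) would address.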
Results - Vacation
[Figure: execution time (s) of Vacation under WA (weak atomicity) and three strong-atomicity (SA) configurations: SA, conservative + analysis; SA, handle AVs + analysis; SA, patch AVs + analysis.]
Results - Delaunay
[Figure: execution time (s) of Delaunay for WA; SA, conservative + analysis; SA, handle AVs + analysis; SA, patch AVs + analysis.]
Results - Genome
[Figure: execution time (s) of Genome for WA; SA, conservative + analysis; SA, handle AVs + analysis; SA, patch AVs + analysis.]
Results - Labyrinth
[Figure: execution time (s) of Labyrinth for WA; SA, conservative + analysis; SA, handle AVs + analysis; SA, patch AVs + analysis. The y-axis spans only 7.8 s to 9.2 s.]
Scaling
[Figure: normalized execution time vs. #threads (1 to 8) for Labyrinth, Vacation, Delaunay and Genome, comparing "SA – patch AV + analysis" with WA.]
Conclusion
• Weak atomicity is an obstacle to providing clear semantics for TM models
• We use conventional memory protection hardware to provide strong atomicity
• This comes at a low performance cost… but a high cost in runtime complexity
• The performance hit can be lowered by compile-time analysis