ANALYZING ABORTS IN SOFTWARE TRANSACTIONAL MEMORY
Presented by: Ofer Kiselov & Omer Kiselov
Supervised by: Dmitri Perelman
Final Presentation
Overview
Recap of the midterm presentation on the following subjects:
* Software Transactional Memory abstraction
* STM implementation example - TL2 overview
* Aborts in STM
* Unnecessary aborts in STM
* Project goal
* Implementation overview
New in this presentation:
* Online part – implementation
* Online logging
* Evaluation: hardware, Deuce, benchmarks, results
* Conclusion and analysis
* Nice to have
* Future work
Importance Of Parallel Programming
* Frequency barrier – the performance of a single-core processor can no longer improve.
* Switch to multi-cores.
* Parallel programs allow utilizing multi-core processors.
* Need for synchronization when accessing shared data.
Transactional Memory – why?
Current synchronization – locks
* Coarse-grained – limits parallelism
* Fine-grained – high programming complexity
* Error-prone (deadlocks / livelocks)
Transactional memory solution
* Intuitive for a programmer
* Provides a “transaction” abstraction for a critical section (operations executed atomically)
* Implemented in both software and hardware
Why Do Aborts Happen?
[Diagram: transactions T1 to T4 accessing OBJECT1 (O1) and OBJECT2 (O2), each marked Aborted or Committed]
* T1, T2 and T3 read from O1.
* T1, T2 and T3 write to O2.
* T4 reads from O2 and writes to O1.
* To maintain consistency, if T4 commits then T1, T2 and T3 must abort!
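The scenario above can be replayed with a minimal, single-threaded sketch of version-based validation. It is only loosely modelled on the idea of a global version clock and is not the TL2 or Deuce code; the names `Obj`, `Clock`, `Tx` and `runScenario` are hypothetical, and locking is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

class VersionedAbortDemo {
    /** A shared object that only carries the version of its last committed write. */
    static final class Obj {
        int version = 0;
    }

    /** Hypothetical global version clock, advanced by every successful commit. */
    static final class Clock {
        int now = 0;
    }

    /** A transaction that records what it read and what it intends to write. */
    static final class Tx {
        private final Clock clock;
        private final int readVersion; // clock value sampled when the transaction starts
        private final List<Obj> readSet = new ArrayList<>();
        private final List<Obj> writeSet = new ArrayList<>();

        Tx(Clock clock) {
            this.clock = clock;
            this.readVersion = clock.now;
        }

        void read(Obj o) { readSet.add(o); }

        void write(Obj o) { writeSet.add(o); }

        /** Returns true on commit, false if the read set is no longer valid (abort). */
        boolean tryCommit() {
            for (Obj o : readSet) {
                if (o.version > readVersion) {
                    return false; // somebody committed a newer version of an object we read
                }
            }
            int writeVersion = ++clock.now;
            for (Obj o : writeSet) {
                o.version = writeVersion;
            }
            return true;
        }
    }

    /** Replays the slide: T1..T3 read O1 and write O2, T4 reads O2 and writes O1, T4 commits first. */
    static List<Boolean> runScenario() {
        Clock clock = new Clock();
        Obj o1 = new Obj();
        Obj o2 = new Obj();
        Tx t1 = new Tx(clock), t2 = new Tx(clock), t3 = new Tx(clock), t4 = new Tx(clock);
        for (Tx t : new Tx[] {t1, t2, t3}) {
            t.read(o1);
            t.write(o2);
        }
        t4.read(o2);
        t4.write(o1);

        List<Boolean> committed = new ArrayList<>();
        committed.add(t4.tryCommit());
        committed.add(t1.tryCommit());
        committed.add(t2.tryCommit());
        committed.add(t3.tryCommit());
        return committed;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // order: T4, T1, T2, T3
    }
}
```

In this sketch T4 commits and bumps the version of O1, so the read sets of T1, T2 and T3 fail validation and all three abort.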
Unnecessary Aborts
* Aborts are bad: work is lost, resources are wasted, throughput decreases.
* Some aborts are necessary: continuing the run would violate correctness.
* And some aborts are not: analyzing whether the algorithm should abort is too expensive.
* “Unnecessary” abort: one that could be avoided, e.g. by keeping more versions or by a better check of transactional dependencies.
[Diagram: transactions T1, T2 and T3 accessing objects o1 and o2, with markers C and A]
Project Goals
Build a software analysis tool that:
* measures abort statistics for a given run
* evaluates how many of the aborts were unnecessary
* evaluates the damage to performance
“Will it pay off to add designs to stop the unnecessary aborts?”
Project Formation
An offline part for analyzing the run:
* reads the log of the run
* gathers statistics
* analyzes unnecessary aborts
An online part for logging the run:
* is inserted into a specific algorithm run in a benchmark
* flushes the run info to an XML log file
Offline Part
[Diagram components: XML Log, Parser, Analyzer, RUN DESCRIPTOR, Abort Analyzer, Matlab histograms and final analysis]
The output of the analysis is a precedence graph showing the transactions and their actions.
Parser
* Every log line represents a transactional action and is represented by the LogLine abstract class.
* Parser responsibility: iterate over the XML and create the appropriate LogLine instances.
* LogLine factories for the different operation types: transactional start, read operation, write operation, transactional commit.
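A minimal sketch of how such a parser could be organised, with one factory per operation type. The XML layout (a `log` root with `start`, `read`, `write` and `commit` elements carrying `tx` and `obj` attributes) is an assumption made for illustration, since the real log format is not shown here; the subclass names and `describe()` are hypothetical too.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LogParserSketch {
    /** One transactional action read from the log. */
    abstract static class LogLine {
        final String tx;
        LogLine(String tx) { this.tx = tx; }
        abstract String describe();
    }

    static final class StartLine extends LogLine {
        StartLine(String tx) { super(tx); }
        String describe() { return "T" + tx + " start"; }
    }

    static final class ReadLine extends LogLine {
        final String obj;
        ReadLine(String tx, String obj) { super(tx); this.obj = obj; }
        String describe() { return "T" + tx + " read " + obj; }
    }

    static final class WriteLine extends LogLine {
        final String obj;
        WriteLine(String tx, String obj) { super(tx); this.obj = obj; }
        String describe() { return "T" + tx + " write " + obj; }
    }

    static final class CommitLine extends LogLine {
        CommitLine(String tx) { super(tx); }
        String describe() { return "T" + tx + " commit"; }
    }

    /** Factories keyed by the (assumed) XML element name. */
    private static final Map<String, Function<Element, LogLine>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("start", e -> new StartLine(e.getAttribute("tx")));
        FACTORIES.put("read", e -> new ReadLine(e.getAttribute("tx"), e.getAttribute("obj")));
        FACTORIES.put("write", e -> new WriteLine(e.getAttribute("tx"), e.getAttribute("obj")));
        FACTORIES.put("commit", e -> new CommitLine(e.getAttribute("tx")));
    }

    /** Iterates over the children of the root element and builds one LogLine per element. */
    static List<LogLine> parse(String xml) {
        NodeList children;
        try {
            children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getChildNodes();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse log", e);
        }
        List<LogLine> lines = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between elements
            }
            Element element = (Element) node;
            Function<Element, LogLine> factory = FACTORIES.get(element.getTagName());
            if (factory == null) {
                throw new IllegalArgumentException("Unknown log element: " + element.getTagName());
            }
            lines.add(factory.apply(element));
        }
        return lines;
    }
}
```

Keeping the factories in a map means a new operation type, such as an abort line, only needs one more subclass and one more map entry.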
Analyzer
* Gives basic statistics regarding the transactions run:
  * counts aborts per reason
  * counts reads and writes
  * counts transactions
* Inserts the path into the Run Descriptor ADT struct.
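The counting side of the analyzer can be sketched as below. The three abort reasons are the ones that appear in the result charts; the class and method names are hypothetical, and the commit ratio is assumed here to be commits divided by commits plus aborts.

```java
import java.util.EnumMap;
import java.util.Map;

class RunStatistics {
    /** Abort reasons as named in the result charts. */
    enum AbortReason { VERSION_TOO_HIGH, OBJECT_LOCKED, READSET_INVALID }

    private int transactions;
    private int reads;
    private int writes;
    private int commits;
    private int aborts;
    private final Map<AbortReason, Integer> abortsByReason = new EnumMap<>(AbortReason.class);

    void onStart() { transactions++; }
    void onRead() { reads++; }
    void onWrite() { writes++; }
    void onCommit() { commits++; }

    void onAbort(AbortReason reason) {
        aborts++;
        abortsByReason.merge(reason, 1, Integer::sum);
    }

    int transactions() { return transactions; }
    int reads() { return reads; }
    int writes() { return writes; }

    int abortsFor(AbortReason reason) {
        return abortsByReason.getOrDefault(reason, 0);
    }

    /** Assumed definition: fraction of finished transaction attempts that committed. */
    double commitRatio() {
        int finished = commits + aborts;
        return finished == 0 ? 0.0 : (double) commits / finished;
    }

    /** Fraction of all aborts that were caused by the given reason. */
    double shareOf(AbortReason reason) {
        return aborts == 0 ? 0.0 : (double) abortsFor(reason) / aborts;
    }
}
```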
RUN DESCRIPTOR
In order to create the graph, we needed to establish a way to turn the basic run into a graph.
[Diagram: transactions T1 and T4 attached as Reader and Writer to OBJECT1 and OBJECT2 and to OBJECT1 Version2 and OBJECT2 Version2; edges labelled WaR]
ABORTS ANALYZER
* Searches for unnecessary aborts in the RUN DESCRIPTOR.
* Speculatively adds the edges of the aborted transaction to the RUN DESCRIPTOR.
* Using DFS, finds cycles in the precedence graph. Cycles represent necessary aborts.
* Removes the edges at the end of the analysis.
* Built as a visitor pattern: flexible for more complex analysis.
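The steps above can be sketched as follows: add the edges the aborted transaction would have contributed, run a DFS cycle check, then take the edges out again. This is an illustrative sketch and not the project's code; `PrecedenceGraph` and its method names are hypothetical, transactions are identified by plain strings, and the visitor structure is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PrecedenceGraph {
    /** Adjacency sets: an edge a -> b means "a must precede b". */
    private final Map<String, Set<String>> edges = new HashMap<>();

    boolean hasEdge(String from, String to) {
        return edges.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    void removeEdge(String from, String to) {
        Set<String> out = edges.get(from);
        if (out != null) {
            out.remove(to);
        }
    }

    /** Depth-first search that reports a cycle when it meets a node still on the DFS stack. */
    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, finished, onStack)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String node, Set<String> finished, Set<String> onStack) {
        if (onStack.contains(node)) {
            return true;
        }
        if (finished.contains(node)) {
            return false;
        }
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (dfs(next, finished, onStack)) {
                return true;
            }
        }
        onStack.remove(node);
        finished.add(node);
        return false;
    }

    /**
     * Speculatively adds the edges of an aborted transaction (pairs {from, to}),
     * checks for a cycle, and removes exactly the edges it added.
     * A cycle means the abort was necessary.
     */
    boolean abortWasNecessary(List<String[]> speculativeEdges) {
        List<String[]> added = new ArrayList<>();
        for (String[] edge : speculativeEdges) {
            if (!hasEdge(edge[0], edge[1])) {
                addEdge(edge[0], edge[1]);
                added.add(edge);
            }
        }
        boolean cycle = hasCycle();
        for (String[] edge : added) {
            removeEdge(edge[0], edge[1]);
        }
        return cycle;
    }
}
```

For example, if aborted T1 read an object before committed T4 overwrote it (edge T1 -> T4) and T4 read an object that T1 would have written (edge T4 -> T1), the two edges form a cycle and the abort counts as necessary. With only the first edge there is no cycle and the abort counts as unnecessary.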
Online part
Our goals:
* Run benchmarks to prepare the statistics for the offline part.
* Be sure that the measurements don’t distort the scheduling picture.
Platform Supporting STM
* Deuce STM is an open source Java STM environment.
* With Deuce STM, if the method
  public void doThing() {…}
  is not thread-safe…
  @Atomic public void doThing() {…}
  is!!
Introducing: Deuce STM!!!
Created by: Guy Korland, Nir Shavit, Pascal Felber, Igor Berman
Source Code
final public class Context implements org.deuce.transaction.Context {
    private static String objectId(Object reference, long field) {
        return Long.toString(System.identityHashCode(reference) + field);
    }
    final static AtomicInteger clock = new AtomicInteger(0);
How To Utilize Deuce for Logging
* Modified the code to call the logging utilities.
* Added more exception types to distinguish between the different abort types.
[Diagram: transaction code (Start, Read, Write, Commit) running on the Deuce framework with the TL2 algorithm and the Logger; labels: TL2 Work Method With Logging, Deuce Framework]
A Perfectly Scalable Code
Online Part Implementation – Version 1
Main problem: adding to a priority queue damages parallelism and lowers performance.
Online Part Implementation – Version 2
* The Back End: Collector
* The threads don’t do any extra actions to log the run.
[Diagram states: The Loglines have ended; The program has ended]
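One way to keep logging off the shared path is sketched below, purely as an assumption about how a back-end collector might work and not as the design used in the project: every worker appends to its own private buffer, and a collector merges and orders the buffers only after the workers have finished. `BufferedRunLogger`, `LogRecord` and the shared sequence counter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class BufferedRunLogger {
    /** One logged action, stamped with a global sequence number. */
    static final class LogRecord {
        final long seq;
        final String text;
        LogRecord(long seq, String text) {
            this.seq = seq;
            this.text = text;
        }
    }

    private final AtomicLong sequence = new AtomicLong();
    private final ConcurrentLinkedQueue<List<LogRecord>> allBuffers = new ConcurrentLinkedQueue<>();

    /** Each thread lazily registers one private buffer; appends need no lock. */
    private final ThreadLocal<List<LogRecord>> localBuffer = ThreadLocal.withInitial(() -> {
        List<LogRecord> buffer = new ArrayList<>();
        allBuffers.add(buffer);
        return buffer;
    });

    /** Called by worker threads: only touches the caller's own buffer. */
    void log(String text) {
        localBuffer.get().add(new LogRecord(sequence.getAndIncrement(), text));
    }

    /** Collector step: call only after all worker threads have been joined. */
    List<LogRecord> collect() {
        List<LogRecord> merged = new ArrayList<>();
        for (List<LogRecord> buffer : allBuffers) {
            merged.addAll(buffer);
        }
        merged.sort(Comparator.comparingLong((LogRecord r) -> r.seq));
        return merged;
    }

    /** Runs a few workers that each log some actions, then collects the ordered log. */
    static List<LogRecord> demo(int threads, int recordsPerThread) {
        BufferedRunLogger logger = new BufferedRunLogger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    logger.log("thread " + id + " action " + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return logger.collect();
    }
}
```

Compared with inserting every record into one shared priority queue, the ordering work moves to the end of the run, at the cost of holding all records in memory until then.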
Testbenches
* SSCA2 – short transactions, low contention, high memory utilization.
* Vacation – high contention, medium-length transactions, mostly reads.
* AVL tree – customizable contention, medium-length transactions. Created by us.
  * Random choice between add, remove or search for a random integer in the tree.
  * Ability to change the integer range for custom contention.
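The AVL workload described above can be sketched as a simple operation generator. This is not the benchmark itself: it is single-threaded, has no transactions, and uses `java.util.TreeSet` (a red-black tree) as a stand-in for the AVL tree. The name `TreeWorkloadSketch` and its parameters are hypothetical; a smaller `keyRange` makes operations hit the same keys more often, which is the contention knob.

```java
import java.util.NavigableSet;
import java.util.Random;
import java.util.TreeSet;

class TreeWorkloadSketch {
    /**
     * Performs `operations` random add / remove / search calls on keys in [0, keyRange).
     * Returns how many operations of each kind were issued: {adds, removes, searches}.
     */
    static int[] run(NavigableSet<Integer> tree, int operations, int keyRange, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[3];
        for (int i = 0; i < operations; i++) {
            Integer key = random.nextInt(keyRange);
            int op = random.nextInt(3);
            switch (op) {
                case 0:
                    tree.add(key);
                    break;
                case 1:
                    tree.remove(key);
                    break;
                default:
                    tree.contains(key);
                    break;
            }
            counts[op]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run(new TreeSet<>(), 1000, 16, 42L);
        System.out.println("adds=" + counts[0] + " removes=" + counts[1] + " searches=" + counts[2]);
    }
}
```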
Simulation Results – AVL tree
All graphs are a function of the number of threads.
[Figure, four panels, x-axis: Number Of Threads (logarithmic scale):
 Commit Ratio (y-axis: Percentage Of Successful Commits);
 Amount of Aborts & Unnecessary Aborts (y-axis: Amount Of Unnecessary Aborts);
 Percentage of Unnecessary Aborts;
 Percentage of Wasted Reads]
Simulation Results – SSCA2
All graphs are a function of the number of threads.
[Figure, four panels, x-axis: Number Of Threads (logarithmic scale):
 Commit Ratio (y-axis: Percentage Of Successful Commits);
 Amount of Aborts & Unnecessary Aborts (y-axis: Amount Of Unnecessary Aborts);
 Percentage of Unnecessary Aborts;
 Percentage of Wasted Reads]
Simulation Results – Vacation
All graphs are a function of the number of threads.
[Figure, four panels, x-axis: Number Of Threads (logarithmic scale):
 Commit Ratio (y-axis: Percentage Of Successful Commits);
 Amount of Aborts & Unnecessary Aborts (y-axis: Amount Of Unnecessary Aborts);
 Percentage of Unnecessary Aborts;
 Percentage of Wasted Reads]
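Simulation Results – AVL tree
All graphs are a function of the number of threads.
[Figure: pie charts of abort reasons, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice percentages per chart:]
* 2 threads: 46%, 11%, 43%
* 4 threads: 43%, 14%, 43%
* 8 threads: 39%, 25%, 36%
* 16 threads: 51%, 28%, 22%
* 32 threads: 57%, 27%, 16%
* 64 threads: 16%, 79%, 5%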
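Simulation Results – SSCA2
All graphs are a function of the number of threads.
[Figure: pie charts of abort reasons, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice percentages per chart:]
* 2 threads: 23%, 12%, 65%
* 4 threads: 26%, 14%, 60%
* 8 threads: 22%, 19%, 60%
* 16 threads: 36%, 18%, 45%
* 32 threads: 35%, 24%, 41%
* 64 threads: 28%, 36%, 36%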
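Simulation Results – Vacation
All graphs are a function of the number of threads.
[Figure: pie charts of abort reasons, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice percentages per chart:]
* 2 threads: 55%, 12%, 34%
* 4 threads: 61%, 10%, 29%
* 8 threads: 62%, 6%, 32%
* 16 threads: 68%, 5%, 27%
* 32 threads: 62%, 15%, 23%
* 64 threads: 38%, 49%, 13%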
Simulation Results – AVL tree
All graphs are a function of the number of threads.
[Figure, two panels, x-axis: log2 of Number Of Threads (1 to 6), series: Version Too High, Object Locked, Readset Invalid:
 Amount of Aborts by types;
 Percentage of Aborts by types]
Simulation Results – SSCA2
All graphs are a function of the number of threads.
[Figure, two panels, x-axis: log2 of Number Of Threads (1 to 6), series: Version Too High, Object Locked, Readset Invalid:
 Amount of Aborts by types;
 Percentage of Aborts by types]
Simulation Results – Vacation
All graphs are a function of the number of threads.
[Figure, two panels, x-axis: log2 of Number Of Threads (1 to 6), series: Version Too High, Object Locked, Readset Invalid:
 Amount of Aborts by types;
 Percentage of Aborts by types]
Logger impact on performance
* Logger access obviously demands more from the Deuce framework:
  * more memory accesses
  * more exception types
  * on every read & write
* How much distortion does the logger cause?
[Figure: AVL test with logging – commit ratio. x-axis: Number Of Threads (logarithmic scale); y-axis: Percentage Of Successful Commits; series: With logger, Without logger]
Conclusions
* As parallelism increases, the abort rate, the unnecessary abort rate and the wasted work rate increase as well.
* As parallelism increases, more aborts are caused by locked objects.
* To improve STM performance on highly parallel workloads, algorithms may be improved to prevent unnecessary aborts.
Nice To Have
* Automatically drawing the precedence graph in Microsoft Visio.
* Possibility to analyze according to abort types.
* GUI.
* Expansion of the simulation to more algorithms and test benches, which makes it possible to compare performance between algorithms.
Future Work
* Drop in abort rates after 128 threads due to a drop in concurrency – further analysis is required.
* Unfit versions cause a lot of aborts. The new SMV algorithm may solve this problem.
BIBLIOGRAPHY
I. Keidar and D. Perelman. On avoiding spare aborts in transactional memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 59–68, 2009.
I. Keidar and D. Perelman. SMV: Selective Multi-Versioning STM.
D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194–208, 2006.
M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for dynamic-sized data structures. In Proceedings of the twenty-second annual symposium on Principles of distributed computing, pages 92–101, 2003.