ANALYZING ABORTS IN SOFTWARE TRANSACTIONAL MEMORY

Presented by: Ofer Kiselov & Omer Kiselov
Supervised by: Dmitri Perelman
Final Presentation


Overview

* Repeating midterm presentation on the following subjects
    * Software Transactional Memory abstraction
    * STM implementation example - TL2 overview
    * Aborts in STM
    * Unnecessary aborts in STM
* Project goal
* Implementation
    * Overview
    * Online part – implementation
    * Online logging
* Evaluation
    * Hardware
    * Deuce
    * Benchmarks
    * Results
* Conclusion and analysis
* Nice to have
* Future work

Importance Of Parallel Programming

* Frequency barrier: single-core processor performance cannot keep improving.
* Switch to multi-cores.
* Parallel programs allow utilizing multi-core processors.
* Need for synchronization when accessing shared data.

Transactional Memory – why?

Current synchronization: locks
* Coarse-grained: limits parallelism
* Fine-grained: high programming complexity
* Error-prone (deadlocks / livelocks)

Transactional memory solution
* Intuitive for a programmer
* Provides a "transaction" abstraction for a critical section (operations executed atomically)
* Implemented in both software and hardware.

Why Do Aborts Happen?

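[Figure: transactions T1, T2, T3 and T4 accessing OBJECT1 (O1) and OBJECT2 (O2), with Committed and Aborted markers.]

* T1, T2 and T3 read from O1.
* T1, T2 and T3 write to O2.
* T4 reads from O2 and writes to O1.
* To maintain consistency, if T4 commits, then T1, T2 and T3 must abort!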
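The conflict on this slide can be reproduced with a minimal, single-threaded sketch. It assumes a simplified version-validation model (a per-object version number checked at commit time), not the actual TL2 code; the names VersionedObject and SketchTransaction are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared object: a value plus a version bumped by every committed write.
class VersionedObject {
    int value;
    int version;
}

// Hypothetical transaction with a read set and a buffered write set.
// Single-threaded illustration only: no locking, no global clock.
class SketchTransaction {
    // Version observed when each object was first read.
    private final Map<VersionedObject, Integer> readSet = new HashMap<>();
    // Writes are buffered and applied only on a successful commit.
    private final Map<VersionedObject, Integer> writeSet = new HashMap<>();

    int read(VersionedObject o) {
        Integer pending = writeSet.get(o);
        if (pending != null) {
            return pending; // read-your-own-write
        }
        readSet.putIfAbsent(o, o.version);
        return o.value;
    }

    void write(VersionedObject o, int newValue) {
        writeSet.put(o, newValue);
    }

    // Returns true if the transaction commits, false if it must abort.
    boolean tryCommit() {
        // Validation: every object read must still have the version we saw.
        for (Map.Entry<VersionedObject, Integer> e : readSet.entrySet()) {
            if (e.getKey().version != e.getValue()) {
                return false;
            }
        }
        // Apply buffered writes and bump versions.
        for (Map.Entry<VersionedObject, Integer> e : writeSet.entrySet()) {
            e.getKey().value = e.getValue();
            e.getKey().version++;
        }
        return true;
    }
}
```

If T1 reads O1 and buffers a write to O2, and T4 reads O2, writes O1 and commits first, then T1's validation fails because the version of O1 has changed, so T1 aborts, as on the slide.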

Unnecessary Aborts

* Aborts are bad: work is lost, resources are wasted, throughput decreases.
* Some aborts are necessary: continuing the run would violate correctness.
* And some aborts are not: analyzing whether the transaction really has to abort is too expensive.
* "Unnecessary" abort: an abort that could have been avoided, for example by keeping more versions or by a better check of transactional dependencies.

[Figure: transactions T1, T2 and T3 accessing objects o1 and o2, with C (commit) and A (abort) markers.]

Project Goals

Build a software analysis tool that:
* measures abort statistics for a given run
* evaluates how many of the aborts were unnecessary
* evaluates the damage to performance

"Will it pay off to add designs to stop the unnecessary aborts?"

Project Formation

An offline part for analyzing the run:
* reads the log of the run
* gathers statistics
* analyzes unnecessary aborts

An online part for logging the run:
* is inserted into a specific algorithm run in a benchmark
* flushes the run info to an XML log file

Offline Part

[Figure: offline pipeline with the blocks XML Log, Parser, Analyzer, RUN DESCRIPTOR, Abort Analyzer, and Matlab histograms and final analysis.]

The output of the analysis is a precedence graph showing the transactions and their actions.

Parser

* Every log line represents a transactional action and is represented by the LogLine abstract class.
* Parser responsibility:
    * iterate over the XML
    * create the appropriate LogLine instances
* LogLine factories for the different operation types:
    * transactional start
    * read operation
    * write operation
    * transactional commit
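The parser structure above might be sketched as follows. The slides do not show the log schema, so the element names (start, read, write, commit), the attributes (tx, obj) and all class names other than LogLine are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// One transactional action from the log.
abstract class LogLine {
    final int txId;
    LogLine(int txId) { this.txId = txId; }
    abstract String kind();
}

class StartLine extends LogLine {
    StartLine(int txId) { super(txId); }
    String kind() { return "start"; }
}

class ReadLine extends LogLine {
    final String objectId;
    ReadLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
    String kind() { return "read"; }
}

class WriteLine extends LogLine {
    final String objectId;
    WriteLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
    String kind() { return "write"; }
}

class CommitLine extends LogLine {
    CommitLine(int txId) { super(txId); }
    String kind() { return "commit"; }
}

class LogParser {
    // One factory per operation type, keyed by the (assumed) XML element name.
    private static final Map<String, Function<Element, LogLine>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("start", e -> new StartLine(tx(e)));
        FACTORIES.put("read", e -> new ReadLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("write", e -> new WriteLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("commit", e -> new CommitLine(tx(e)));
    }

    private static int tx(Element e) {
        return Integer.parseInt(e.getAttribute("tx"));
    }

    // Iterates over the children of the root element and builds LogLine instances.
    static List<LogLine> parse(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement().getChildNodes();
            List<LogLine> lines = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() != Node.ELEMENT_NODE) continue;
                Element e = (Element) n;
                Function<Element, LogLine> factory = FACTORIES.get(e.getTagName());
                if (factory != null) lines.add(factory.apply(e)); // unknown elements are skipped
            }
            return lines;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Malformed log", ex);
        }
    }
}
```

Keeping the factories in a map means that a new operation type needs only a new LogLine subclass and one extra map entry.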

Analyzer

* Gives basic statistics regarding the transactions run:
    * counts aborts per reason
    * counts reads and writes
    * counts transactions
* Inserts the path into the Run Descriptor ADT struct.
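A minimal sketch of such counters is shown below. The abort reasons are the three that appear in the result charts of this presentation; the class and method names are hypothetical and the Run Descriptor itself is not modeled.

```java
import java.util.EnumMap;
import java.util.Map;

// Abort reasons as named in the result charts.
enum AbortReason { VERSION_TOO_HIGH, OBJECT_LOCKED, READSET_INVALID }

// Hypothetical statistics holder, updated once per parsed log line.
class RunStatistics {
    int transactions;
    int reads;
    int writes;
    int commits;
    final Map<AbortReason, Integer> abortsByReason = new EnumMap<>(AbortReason.class);

    void onStart() { transactions++; }
    void onRead() { reads++; }
    void onWrite() { writes++; }
    void onCommit() { commits++; }

    void onAbort(AbortReason reason) {
        // merge() inserts 1 on first sight of a reason and adds 1 afterwards.
        abortsByReason.merge(reason, 1, Integer::sum);
    }

    int totalAborts() {
        int total = 0;
        for (int n : abortsByReason.values()) total += n;
        return total;
    }
}
```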

Transactional Dependencies

Run Descriptor is a precedence graph!

[Figure: RUN DESCRIPTOR. Transactions T1 and T4 appear as Reader and Writer of OBJECT1, OBJECT2, OBJECT1 Version2 and OBJECT2 Version2; the transactions are connected by WaR edges.]

In order to create the graph, we needed to establish a way to turn the basic run into a graph.

ABORTS ANALYZER

* Searches for unnecessary aborts in the RUN DESCRIPTOR.
* Speculatively adds the edges of the aborted transaction to the RUN DESCRIPTOR.
* Uses DFS to find cycles in the precedence graph. Cycles represent necessary aborts.
* Removes the edges at the end of the analysis.
* Built as a visitor pattern, so it is flexible for more complex analysis.
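The speculative check can be sketched as below. This is an illustration under assumptions, not the project's code: transactions are identified by strings, the graph is a plain adjacency map, and every speculative edge is assumed to touch the aborted transaction, so a new cycle must pass through it. The visitor structure is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical precedence graph: an edge A -> B means A must precede B.
class PrecedenceGraph {
    private final Map<String, List<String>> successors = new HashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    void removeEdge(String from, String to) {
        List<String> out = successors.get(from);
        if (out != null) out.remove(to); // removes one occurrence only
    }

    // True if some path leaves 'start' and comes back to it.
    boolean hasCycleThrough(String start) {
        return dfs(start, start, new HashSet<>());
    }

    private boolean dfs(String node, String target, Set<String> visited) {
        for (String next : successors.getOrDefault(node, Collections.emptyList())) {
            if (next.equals(target)) return true;
            if (visited.add(next) && dfs(next, target, visited)) return true;
        }
        return false;
    }

    // Adds the aborted transaction's edges, looks for a cycle, then restores the graph.
    boolean abortWasNecessary(String abortedTx, List<String[]> speculativeEdges) {
        for (String[] e : speculativeEdges) addEdge(e[0], e[1]);
        boolean cycle = hasCycleThrough(abortedTx);
        for (String[] e : speculativeEdges) removeEdge(e[0], e[1]);
        return cycle;
    }
}
```

For example, if T1 read an object that T4 later overwrote (edge T1 -> T4) and T1 wanted to overwrite an object that T4 had read (edge T4 -> T1), the two edges close a cycle and the abort is classified as necessary. A transaction that contributes only the first kind of edge closes no cycle, so its abort is counted as unnecessary.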

Online part

Our goals:
* Run benchmarks to prepare the statistics for the offline part.
* Be sure that the measurements don't distort the scheduling picture.

Platform Supporting STM

Deuce STM is an open source Java STM environment.

With Deuce STM, if the method:
    public void doThing() {…}
is not thread-safe…
    @Atomic
    public void doThing() {…}
is!!

Introducing: Deuce STM!!!
Created by: Guy Korland, Nir Shavit, Pascal Felber, Igor Berman

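Source Code

    import java.util.concurrent.atomic.AtomicInteger;

    final public class Context implements org.deuce.transaction.Context {

        // Identifier used in the log for a memory location:
        // identity hash of the owning object plus the field value.
        private static String objectId(Object reference, long field) {
            return Long.toString(System.identityHashCode(reference) + field);
        }

        // Global version clock shared by all transactions.
        final static AtomicInteger clock = new AtomicInteger(0);

        // ...
    }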

[Figure: TL2 work and the method with logging, inside the Deuce framework.]

How To Utilize Deuce for Logging

* Modified the code to call logging utils.
* Added more exception types to distinguish between the different abort types.

[Figure: Logger, Deuce Framework and TL2 Algorithm around the transaction's code: Start, Read, Write, Commit.]

A Perfectly Scalable Code

Online Part Implementation Version 1

Main problem: adding to a priority queue damages parallelism and lowers performance.

Online Part Implementation Version 2

[Figure: the back end with a Collector, with the states "The log lines have ended" and "The program has ended".]

The threads don't do any extra actions to log the run.
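The slides do not describe how the Version 2 back end works internally. One possible way to keep worker threads off a shared queue is sketched below: each thread appends to its own buffer, and a collector merges the buffers once the workers are done. Everything here (BufferedLogger, LogRecord, the use of System.nanoTime() for ordering) is an assumption made for illustration, not the project's design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical log record with a timestamp used only for merging.
class LogRecord {
    final long timestamp;
    final String text;
    LogRecord(long timestamp, String text) {
        this.timestamp = timestamp;
        this.text = text;
    }
}

class BufferedLogger {
    // Registry of all per-thread buffers; touched once per thread, not once per record.
    private final ConcurrentLinkedQueue<List<LogRecord>> allBuffers = new ConcurrentLinkedQueue<>();

    private final ThreadLocal<List<LogRecord>> local = ThreadLocal.withInitial(() -> {
        List<LogRecord> buffer = new ArrayList<>();
        allBuffers.add(buffer);
        return buffer;
    });

    // Hot path: an append to a thread-private list, no shared data structure.
    void log(String text) {
        local.get().add(new LogRecord(System.nanoTime(), text));
    }

    // Collector: call only after the worker threads have finished.
    List<LogRecord> collect() {
        List<LogRecord> merged = new ArrayList<>();
        for (List<LogRecord> buffer : allBuffers) merged.addAll(buffer);
        merged.sort(Comparator.comparingLong((LogRecord r) -> r.timestamp));
        return merged;
    }

    // Small driver: runs worker threads that only log, then collects.
    static List<LogRecord> demo(int threads, int recordsPerThread) {
        BufferedLogger logger = new BufferedLogger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    logger.log("T" + id + " op " + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return logger.collect();
    }
}
```

Compared with a shared priority queue, the ordering cost moves from every log call to a single sort in the collector, at the price of holding the whole log in memory until collection.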

What Do we Check?

* Commit rate
* Unnecessary aborts (classified by types)
* Wasted work
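The slides do not define these metrics formally. A sketch under assumed definitions: the commit ratio is commits over all transaction attempts, the unnecessary share is unnecessary aborts over all aborts, and wasted reads are reads performed by transactions that later aborted, over all reads. The class and parameter names are hypothetical.

```java
// Hypothetical helpers; each ratio is 0.0 when its denominator is zero.
class RunMetrics {
    static double ratio(long part, long whole) {
        return whole == 0 ? 0.0 : (double) part / whole;
    }

    // Fraction of transaction attempts that ended in a commit.
    static double commitRatio(long commits, long aborts) {
        return ratio(commits, commits + aborts);
    }

    // Fraction of aborts that the offline analysis classified as unnecessary.
    static double unnecessaryShare(long unnecessaryAborts, long aborts) {
        return ratio(unnecessaryAborts, aborts);
    }

    // Fraction of reads that were performed inside transactions that aborted.
    static double wastedReadShare(long readsInAbortedTx, long totalReads) {
        return ratio(readsInAbortedTx, totalReads);
    }
}
```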

Testbenches

* SSCA2 – short transactions, low contention, high memory utilization.
* Vacation – high contention, medium-length transactions, mostly reads.
* AVL tree – customizable contention, medium-length transactions. Created by us.
    * Random choice between add, remove or search for a random integer in the tree.
    * Ability to change the integer range for custom contention.
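The shape of the AVL workload can be sketched as follows. This is a single-threaded illustration with no STM involved; java.util.TreeSet (a red-black tree) stands in for the AVL tree, and the class name, the uniform choice between the three operations and the seed parameter are assumptions.

```java
import java.util.Random;
import java.util.TreeSet;

// Hypothetical workload driver: random add / remove / search on random integers.
class TreeWorkload {
    final TreeSet<Integer> tree = new TreeSet<>();
    int adds;
    int removes;
    int searches;

    // A smaller keyRange makes operations hit the same keys more often,
    // which in a concurrent run would mean higher contention.
    void run(int operations, int keyRange, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < operations; i++) {
            int key = rnd.nextInt(keyRange);
            switch (rnd.nextInt(3)) {
                case 0:
                    tree.add(key);
                    adds++;
                    break;
                case 1:
                    tree.remove(key);
                    removes++;
                    break;
                default:
                    tree.contains(key);
                    searches++;
                    break;
            }
        }
    }
}
```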

Hardware

Benchmarks run on Trinity:
* 8 quad-cores
* 132 GB RAM
* The machine was idle apart from our runs.

Simulation Results – AVL tree

All graphs are a function of the number of threads.

[Figure: four plots against Number Of Threads (logarithmic axis, 10^0 to 10^2): Commit Ratio (percentage of successful commits), Amount of Aborts & Unnecessary Aborts, Percentage of Unnecessary Aborts, Percentage of Wasted Reads.]

Simulation Results – SSCA2

All graphs are a function of the number of threads.

[Figure: four plots against Number Of Threads (logarithmic axis, 10^0 to 10^2): Commit Ratio (percentage of successful commits), Amount of Aborts & Unnecessary Aborts, Percentage of Unnecessary Aborts, Percentage of Wasted Reads.]

Simulation Results – Vacation

All graphs are a function of the number of threads.

[Figure: four plots against Number Of Threads (logarithmic axis, 10^0 to 10^2): Commit Ratio (percentage of successful commits), Amount of Aborts & Unnecessary Aborts, Percentage of Unnecessary Aborts, Percentage of Wasted Reads.]

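Simulation Results – AVL tree

All graphs are a function of the number of threads.

[Figure: pie charts of abort causes, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice values per chart (the order of the slices relative to the legend is not preserved):]
    2 threads: 46%, 11%, 43%
    4 threads: 43%, 14%, 43%
    8 threads: 39%, 25%, 36%
    16 threads: 51%, 28%, 22%
    32 threads: 57%, 27%, 16%
    64 threads: 16%, 79%, 5%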

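Simulation Results – SSCA2

All graphs are a function of the number of threads.

[Figure: pie charts of abort causes, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice values per chart (the order of the slices relative to the legend is not preserved):]
    2 threads: 23%, 12%, 65%
    4 threads: 26%, 14%, 60%
    8 threads: 22%, 19%, 60%
    16 threads: 36%, 18%, 45%
    32 threads: 35%, 24%, 41%
    64 threads: 28%, 36%, 36%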

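Simulation Results – Vacation

All graphs are a function of the number of threads.

[Figure: pie charts of abort causes, one per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice values per chart (the order of the slices relative to the legend is not preserved):]
    2 threads: 55%, 12%, 34%
    4 threads: 61%, 10%, 29%
    8 threads: 62%, 6%, 32%
    16 threads: 68%, 5%, 27%
    32 threads: 62%, 15%, 23%
    64 threads: 38%, 49%, 13%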

Simulation Results – AVL tree

All graphs are a function of the number of threads.

[Figure: two plots against log2 of Number Of Threads (1 to 6), each with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types, Percentage of Aborts by types.]

Simulation Results – SSCA2

All graphs are a function of the number of threads.

[Figure: two plots against log2 of Number Of Threads (1 to 6), each with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types, Percentage of Aborts by types.]

Simulation Results – Vacation

All graphs are a function of the number of threads.

[Figure: two plots against log2 of Number Of Threads (1 to 6), each with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types, Percentage of Aborts by types.]

Logger impact on performance

Logger access obviously demands more from the Deuce framework:
* More memory accesses
* More exception types
* On every read & write

How much distortion does the logger cause?

[Figure: AVL test with logging – commit ratio. Percentage of successful commits against Number Of Threads (logarithmic axis, 10^0 to 10^2), with logger and without logger.]

Conclusions

* Parallelism increases → the abort rate, the unnecessary abort rate and the wasted work rate increase as well.
* Parallelism increases → more aborts are caused by locked objects.
* To improve STM performance over highly parallel workloads, algorithms may be improved to prevent unnecessary aborts.

Nice To Have

* Drawing the precedence graph automatically as a Microsoft Visio drawing.
* Possibility to analyze according to abort types.
* GUI.
* Expansion of the simulation to more algorithms and test benches, which makes it possible to compare performance between algorithms.

Future Work

* Drop in abort rates after 128 threads due to a drop in concurrency: further analysis is required.
* Unfit versions cause a lot of aborts. The new SMV algorithm may solve this problem.

BIBLIOGRAPHY

I. Keidar and D. Perelman. On avoiding spare aborts in transactional memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 59–68, 2009.

I. Keidar and D. Perelman. SMV: Selective Multi-Versioning STM.

D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194–208, 2006.

M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for dynamic-sized data structures. In Proceedings of the twenty-second annual symposium on Principles of distributed computing, pages 92–101, 2003.

QUESTIONS?