TM performance: seeing the whole picture or Looking back over the first 500 papers

Tim Harris (MSR Cambridge)

How might we compare TM systems?

Where might TM be most useful?

Extending Dan’s GC analogy

Concurrent GC algorithm (run GC in small steps in amongst mutators)

“Here’s a way to reduce the pause times...”

“Here’s a way to support pinned objects...”

“Here’s a way to improve the throughput (total app runtime)...”

Min mutator utilization

[Figure: minimum mutator utilization plotted against time interval (ms) for Algorithm A and Algorithm B]

Five dimensions to TM behavior

• Sequential overhead
• Scalability (to longer transactions)
• Scalability (to more cores)
• Tx-supported operations
• Semantics

Scaling to large transactions

[Figure: performance plotted against Tx size; 1.0 = optimized sequential code (no tx, no locks)]

Scaling: n*1-core copies

[Figure: plotted against #cores, 0 to 10]

Scaling: 1*n-core copy

[Figure: plotted against #cores, 0 to 10]

How might we compare TM systems?

Where might TM be most useful?

Application model #1

Sequential Parallelizable

f = fraction of original program that is parallelizable

Sequential

Parallel

f = fraction of original program that is parallelizable
n = num parallel threads

Sequential

Parallel, transactional

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
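The slides do not spell out a formula for this model. One plausible reading, assuming an Amdahl's-law style cost model in which the parallel fraction runs x times slower but is divided across n threads, is sketched below. The function name `speedup` and the exact form of the formula are assumptions of this sketch, not taken from the talk.

```python
def speedup(f: float, n: int, x: float) -> float:
    """Speedup over the original sequential program (assumed Amdahl-style model).

    f: fraction of the original program that is parallelizable
    n: number of parallel threads
    x: straight-line transactional slow-down (1.0 = no overhead)
    """
    sequential_time = 1.0 - f      # unchanged, runs on one thread
    parallel_time = f * x / n      # slowed by x, spread over n threads
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # Illustrative values only: f = 0.95 on 16 threads, increasing overhead.
    for x in (1.0, 2.0, 4.0):
        print(f"f=0.95, n=16, x={x}: speedup = {speedup(0.95, 16, x):.2f}")
```

With f = 1 and x = 1 the function returns n, which is the ideal case; any x above 1 eats directly into the parallel term.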

Conflict model

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

[Diagram: alternatives numbered 1 to 6]

Fixed number of alternatives, execute different alternatives in parallel

Execute conflicting operations in series
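The talk does not say how c is derived from the number of alternatives. One interpretation, which is an assumption of this sketch: each of the n threads picks one alternative uniformly at random, threads that picked the same alternative run one after another, and c is the mean size of the largest such group. The function name `mean_max_group` is hypothetical.

```python
import random
from collections import Counter


def mean_max_group(n_threads: int, n_alternatives: int,
                   trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean size of the largest group of
    threads that picked the same alternative (assumed conflict model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        picks = Counter(rng.randrange(n_alternatives) for _ in range(n_threads))
        # Conflicting operations execute in series, so the largest group
        # sets the number of serial steps in this round.
        total += max(picks.values())
    return total / trials


if __name__ == "__main__":
    for k in (1024, 256, 64, 16):
        print(f"n=16, alternatives 1..{k}: c ~ {mean_max_group(16, k):.2f}")
```

Under this assumption the estimates come out roughly in line with the c values that the later chart titles pair with the ranges 1..1024, 1..256, 1..64 and 1..16, which suggests (but does not confirm) that the charts used a similar model.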

n=16, c=1.0, vary f, vary x

[Figure: chart over x (straight-line transactional slow-down, ticks from about 1.46 to 5.56) and f (75% to 100%)]

n=16, c=1.0

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

8x on 16 threads => 95% parallelizable
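As a rough cross-check, assuming plain Amdahl's law with no transactional overhead (x = 1) and no conflicts (c = 1), the parallel fraction needed for a target speedup s on n threads follows from s = 1 / ((1 - f) + f / n). The helper name below is hypothetical.

```python
def required_parallel_fraction(target_speedup: float, n: int) -> float:
    """Parallel fraction f needed to reach target_speedup on n threads.

    Solves target_speedup = 1 / ((1 - f) + f / n) for f.
    Assumes plain Amdahl's law: no transactional overhead, no conflicts.
    """
    if n < 2:
        raise ValueError("need at least two threads")
    if not 1.0 <= target_speedup <= n:
        raise ValueError("target speedup must lie between 1 and n")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    f = required_parallel_fraction(8.0, 16)
    print(f"8x on 16 threads needs f of about {f:.3f}")
```

For 8x on 16 threads this gives f of about 0.93, in the same region as the roughly 95% read off the chart.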

n=16, c=1.0

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

Straight-line slow-down bites quickly

n=16, c=1.1 (1..1024)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

n=16, c=1.4 (1..256)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

n=16, c=2.0 (1..64)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

n=16, c=3.1 (1..16)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%)]

If Amdahl and overheads don’t get you then conflicts still can...
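To see the three effects stack up, the sketch below applies them one at a time. It assumes that every attempt of a transaction pays the full slow-down x, so transactional work costs x * c in total; that cost model and the values f = 0.95 and x = 2 are illustrative assumptions, while c = 1.4 is one of the values from the chart titles.

```python
def speedup(f: float, n: int, x: float = 1.0, c: float = 1.0) -> float:
    """Assumed model: sequential part unchanged, parallel part costs
    x per attempt and c attempts on average, spread over n threads."""
    return 1.0 / ((1.0 - f) + f * x * c / n)


# Each stage adds one more source of loss on 16 threads.
stages = [
    ("ideal (f=1, x=1, c=1)", speedup(1.0, 16)),
    ("+ Amdahl (f=0.95)", speedup(0.95, 16)),
    ("+ overhead (x=2)", speedup(0.95, 16, x=2.0)),
    ("+ conflicts (c=1.4)", speedup(0.95, 16, x=2.0, c=1.4)),
]

for label, value in stages:
    print(f"{label:24s} speedup = {value:5.2f}")
```

Each line of output is lower than the one before, which is the point of the slide: fixing any one of the three losses still leaves the other two.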

n=16, c=1.0, scaling of large tx

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%), with a second axis from 0.0 to 4.0]

n=16, c=1.0, x*(f+(f^1.25)/4)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%), with a second axis from 0.0 to 4.0]

n=16, c=1.0, x*(f+(f^2)/4)

[Figure: chart over x (ticks from about 1.46 to 5.56) and f (75% to 100%), with a second axis from 0.0 to 4.0]

Application model #2: 100% parallel

t = fraction of original program that is transactional
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

Non-tx

Tx Non-tx
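A minimal sketch of this second model, assuming the whole program is divided evenly across n threads and that the transactional fraction t costs x per attempt with c attempts on average. Both function names and the cost formula are assumptions of this sketch, not taken from the talk.

```python
def speedup_fully_parallel(t: float, n: int, x: float, c: float = 1.0) -> float:
    """Speedup over the sequential program when everything runs in
    parallel and a fraction t of the work runs inside transactions."""
    per_thread_time = ((1.0 - t) + t * x * c) / n
    return 1.0 / per_thread_time


def max_tx_fraction(target: float, n: int, x: float, c: float = 1.0) -> float:
    """Largest t that still reaches the target speedup (clamped to [0, 1]).

    From n / (1 + t * (x*c - 1)) >= target.
    """
    cost = x * c
    if cost <= 1.0:
        return 1.0 if target <= n else 0.0
    t = (n / target - 1.0) / (cost - 1.0)
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    # Illustrative: how much of the program may be transactional
    # while still reaching 8x on 16 threads.
    for x in (2.0, 3.0, 5.0):
        print(f"x={x}: t can be at most {max_tx_fraction(8.0, 16, x):.2f}")
```

In this sketch the permitted transactional fraction shrinks in proportion to 1 / (x*c - 1), so even moderate overheads limit how much of the program can sit inside transactions.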

Workloads (ASPLOS ’10)

[Figure: workloads Labyrinth, Genome, JBBAtomic, Vacation and MaxFlow placed on a chart of x (ticks from about 1.46 to 5.56) against a percentage scale from 0% to 100%]

n=16, c=1.0 (no conflicts)

[Figure: chart over x (ticks from about 1.46 to 5.56) and a percentage scale from 0% to 100%]

Overheads rapidly reduce the amount that transactions can be used

n=16, c=1.1 (1..1024)

[Figure: chart over x (ticks from about 1.46 to 5.56) and a percentage scale from 0% to 100%]

n=16, c=1.4 (1..256)

[Figure: chart over x (ticks from about 1.46 to 5.56) and a percentage scale from 0% to 100%]

n=16, c=2.0 (1..64)

[Figure: chart over x (ticks from about 1.46 to 5.56) and a percentage scale from 0% to 100%]

Conclusions

• Bad things come in threes...
  – Amdahl’s law
  – Sequential overhead
  – Conflicts
• When developing TM systems we need to be careful about tradeoffs between these
• There’s a risk of “chasing around the TM design space”
  – Sequential overhead
  – Scaling without conflicts
  – Scaling with conflicts
