A Dynamic Binary-Rewriting Approach to Software Transactional Memory

Marek Olszewski, Jeremy Cutler, Greg Steffan
University of Toronto

Appeared in PACT 2007, Brasov, Romania

The Parallel Programming Challenge

Coarse-grained locking
  Easy to program
  Scales poorly
Fine-grained locking
  Scales well
  Hard to get right (e.g., deadlock, priority inversion)
The promise of Transactional Memory
  As easy to program as coarse-grained locking
  Performance/scalability of fine-grained locking

Transactional Memory (TM)

Source code:

    ...
    atomic {
        ...
        access_shared_data();
        ...
    }
    ...

Programmer:
  Specifies threads/transactions in source code
TM system:
  Executes transactions optimistically in parallel
  1) Checkpoints execution
  2) Detects conflicts
  3) Commits or aborts and re-executes

TM Implementations

Flavors of TM:
  Hardware (HTM), Software (STM), Hybrid (HyTM)
STM is especially compelling
  Exploit current commodity hardware (multicores)
  Learn about real TM systems and apps
Current STM systems:
  Java: DSTM, ASTM
  C or C++: McRT icc, TL2, RSTM, OSTM
  object-based or programmer intensive (or both)
Our focus: arbitrary C/C++, realistic environment

Programming with STM

Source code:

    #include <glib.h>

    GTree *tree;
    ...
    atomic {
        g_tree_insert(tree, &key, &val);
    }
    ...

[Diagram: the STM compiler turns the source code into the executable my_app. The running application consists of my_app, the shared library glib, the loader, and the kernel.]

Not handled by current compiler/library-based STMs:
  "Legacy locks"
  Pre-compiled binaries
  System calls

JudoSTM: An Overview

Key design choices:
  1) Dynamic Binary Rewriting (DBR): insert instrumentation to implement STM
  2) Value-based conflict detection
Resulting key features:
  1) Privileged transactions (support system calls)
  2) Legacy lock elision
  3) Efficient invisible readers

JudoSTM Design Choice 1

Dynamic Binary Rewriting (DBR)
  Judo DBR Framework (user-space version of JIFL†)

† JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007

Dynamic Binary Rewriting

[Diagram, three animation frames: the original code consists of basic blocks bb1, bb2, bb3, and bb4. As execution reaches a block for the first time, Judo copies it into the code cache: first bb1, then bb2, then bb4. bb3 does not appear in the code cache in any frame.]
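The translate-on-first-use behaviour in these frames can be sketched as a toy dispatcher in C. This is only an illustration under simplifying assumptions: the names `code_cache`, `translate_block`, and `run_block` are hypothetical and not taken from Judo, and a "translated block" is reduced to a flag instead of rewritten machine code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 4   /* bb1..bb4 from the diagram, indexed 0..3 */

/* Hypothetical stand-in for a translated basic block. */
typedef struct {
    bool translated;   /* has this block been copied into the cache? */
    int  original_id;  /* which original block this copy came from */
} cached_block;

static cached_block code_cache[NUM_BLOCKS];
static size_t translations;   /* how many times the rewriter had to run */

/* Translate (and, in a real system, instrument) a block on first use. */
static void translate_block(int id) {
    code_cache[id].translated = true;
    code_cache[id].original_id = id;
    translations++;
}

/* Run a block from the cache, translating it first on a miss. */
static int run_block(int id) {
    if (!code_cache[id].translated)
        translate_block(id);
    return code_cache[id].original_id;
}
```

Running bb1, bb2, bb4 and then bb1 and bb2 again triggers three translations in this sketch; the repeat executions hit the cache, which is where the rewriting cost gets amortized.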

Judo - Performance

[Figure: bar chart of normalized runtime overhead (y-axis, 0.80 to 2.40) for Judo and DynamoRIO on mcf, gcc, vpr, gzip, bzip2, vortex, twolf, crafty, eon, and their geometric mean. Bar labels, in extraction order: 0.93, 1.26, 1.06, 1.04, 1.03, 1.41, 0.95, 1.25, 1.53, 1.15, 1.03, 2.20, 1.11, 1.05, 1.07, 1.50, 1.05, 1.41, 1.31, 1.26.]

Overhead low enough to implement STM?

DBR-Based STM

Goal: Perform These Efficiently

For all non-stack write instructions:
  Track write addresses and values (write-set)
  Write-buffer the values from regular memory
For all non-stack read instructions:
  Redirect to the write-buffer
  If miss: track read addresses and values (read-set)
When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)
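The three groups of steps on this slide can be sketched in plain C. This is a single-threaded illustration, not JudoSTM's implementation: the names `tx_read`, `tx_write`, `tx_commit`, and `commit_lock` are hypothetical, the sets are small linear arrays with no overflow handling, and the commit lock is a plain flag where a real system would need an atomic lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64   /* assumed capacity; overflow is not handled */

typedef struct { uint32_t *addr; uint32_t value; } tx_entry;

typedef struct {
    tx_entry reads[MAX_ENTRIES];   /* read-set: address and value first seen */
    tx_entry writes[MAX_ENTRIES];  /* write-set: address and buffered value */
    size_t   n_reads, n_writes;
} txn;

static int commit_lock;   /* stand-in for the commit lock(s); not thread-safe */

static tx_entry *find_entry(tx_entry *set, size_t n, uint32_t *addr) {
    for (size_t i = 0; i < n; i++)
        if (set[i].addr == addr) return &set[i];
    return NULL;
}

/* Transactional write: buffer the value instead of touching memory. */
static void tx_write(txn *t, uint32_t *addr, uint32_t value) {
    tx_entry *e = find_entry(t->writes, t->n_writes, addr);
    if (!e) { e = &t->writes[t->n_writes++]; e->addr = addr; }
    e->value = value;
}

/* Transactional read: try the write-buffer; on a miss, log address and value. */
static uint32_t tx_read(txn *t, uint32_t *addr) {
    tx_entry *w = find_entry(t->writes, t->n_writes, addr);
    if (w) return w->value;
    tx_entry *r = find_entry(t->reads, t->n_reads, addr);
    if (r) return r->value;
    r = &t->reads[t->n_reads++];
    r->addr = addr;
    r->value = *addr;
    return r->value;
}

/* Commit: lock, validate the read-set by value, publish writes, unlock. */
static bool tx_commit(txn *t) {
    bool ok = true;
    commit_lock = 1;
    for (size_t i = 0; ok && i < t->n_reads; i++)
        ok = (*t->reads[i].addr == t->reads[i].value);
    for (size_t i = 0; ok && i < t->n_writes; i++)
        *t->writes[i].addr = t->writes[i].value;
    commit_lock = 0;
    t->n_reads = t->n_writes = 0;   /* reset for a retry or the next transaction */
    return ok;
}
```

In this sketch a failed commit simply returns false and leaves memory untouched; the caller would re-execute the transaction body.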

DBR: Attractive Properties for STM

Performance: overheads are amortized
  code cache
Can handle arbitrary code and shared libraries
  any/all code is transactionalized as it executes
Sandboxed transactions
  Typical STM: inconsistent values could lead execution astray
    i.e., stray to non-transactionalized code (very bad!)
    solution: frequent & costly read-set validation
  DBR-based STM: any/all code is transactionalized as it executes

Tough problems for conventional STMs addressed by DBR

JudoSTM Design Choice 2

Value-Based Conflict Detection (as opposed to location-based)

Location-Based Conflict Detection

[Diagram, five animation frames. Main memory is divided into strips holding the values 2, 3, 5, and 6, each with strip version 0. Transaction 1 reads 2, 3, and 5 and records the versions of those strips. Transaction 2 reads 2 and 6 and writes 9 to the strip that held 3.]

Transaction 2 commits:
  Commit step 1) Validate read set
  Commit step 2) Publish writes (and increment version #s): the written strip now has version 1

Transaction 1 commits:
  Commit step 1) Validate read set: Abort!

Note: all transactions must maintain strip version #s
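For comparison with the value-based scheme that follows, version-based validation can be sketched in C. This is a simplified single-threaded model under stated assumptions: one word per strip, hypothetical names (`loc_read`, `loc_write`, `loc_commit`), and no locking; it is not code from any of the STM systems named in the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STRIPS 4

static uint32_t memory[NUM_STRIPS];          /* one word per strip, for brevity */
static uint32_t strip_version[NUM_STRIPS];   /* bumped on every committed write */

typedef struct {
    bool     did_read[NUM_STRIPS];      /* strips this transaction has read */
    uint32_t seen_version[NUM_STRIPS];  /* version observed at first read */
    bool     written[NUM_STRIPS];       /* strips with a buffered write */
    uint32_t new_value[NUM_STRIPS];     /* the buffered values */
} loc_txn;

/* Read a strip, remembering its version the first time. */
static uint32_t loc_read(loc_txn *t, size_t s) {
    if (t->written[s]) return t->new_value[s];
    if (!t->did_read[s]) {
        t->did_read[s] = true;
        t->seen_version[s] = strip_version[s];
    }
    return memory[s];
}

/* Buffer a write; memory is untouched until commit. */
static void loc_write(loc_txn *t, size_t s, uint32_t v) {
    t->written[s] = true;
    t->new_value[s] = v;
}

/* Validate by version, then publish writes and bump their strip versions. */
static bool loc_commit(loc_txn *t) {
    for (size_t s = 0; s < NUM_STRIPS; s++)
        if (t->did_read[s] && t->seen_version[s] != strip_version[s])
            return false;   /* someone committed to a strip we read: abort */
    for (size_t s = 0; s < NUM_STRIPS; s++)
        if (t->written[s]) {
            memory[s] = t->new_value[s];
            strip_version[s]++;
        }
    return true;
}
```

Every writer has to bump a version for every strip it touches, which is the bookkeeping the slide's closing note refers to.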

Value-Based Conflict Detection

[Diagram, four animation frames. Main memory holds the values 2, 3, 5, and 6, with no version numbers. Transaction 1 reads 2, 3, and 5. Transaction 2 reads 2 and 6 and writes 9.]

Transaction 2 commits:
  Commit step 1) Validate read set
  Commit step 2) Publish writes

Transaction 1 commits:
  Commit step 1) Validate read set: Abort!

Note: no version information to maintain

JudoSTM Feature 1: Privileged Transactions

Privileged transactions
  Can execute (but not roll back) system calls
  Grab commit lock(s) when about to make a syscall
  Release when transaction completes
  Only one privileged transaction exists at a time

Privileged Transactions

[Diagram, three animation frames. Main memory holds the values 2, 3, 5, and 6. Transaction 1 reads 2, 3, and 5. Transaction 2 reads 2 and 6, becomes privileged, and writes 9 directly to memory.]

Privileged: can write directly to memory
  (privileged code and syscalls may be uninstrumented)

Transaction 1 commits:
  Commit step 1) Validate read set: Abort!

Value-based conflict detection facilitates system calls within transactions!

JudoSTM Feature 2: Legacy Lock Elision

Safely ignore locks within legacy code

Legacy Lock Elision

[Diagram, seven animation frames. Main memory holds a lock word (initially 0) and data. Each transaction runs a legacy critical section: it reads the lock word as 0 and writes 1 to it in its own write buffer (lock acquire), performs its reads and writes, then writes 0 back to the lock word (lock release). At commit, each transaction validates its read set (commit step 1) and publishes its writes (commit step 2). The write to the lock word is a silent store: it stores the 0 that memory already holds, so the other transaction's read of the lock still validates and it commits as well.]

Value-based conflict detection facilitates the elision of legacy locks!
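Why a silent store to the lock word does not abort other transactions can be shown with a small C sketch. The names `read_entry`, `still_valid`, and `elided_section` are hypothetical, and the write buffer is reduced to two local words; this models only the value comparison, not JudoSTM's instrumentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* One read-set entry: the address read and the value seen there. */
typedef struct { const uint32_t *addr; uint32_t seen; } read_entry;

/* Value-based validation: a read is valid while memory still holds the
   value that was seen, no matter how often it was rewritten in between. */
static bool still_valid(const read_entry *log, int n) {
    for (int i = 0; i < n; i++)
        if (*log[i].addr != log[i].seen) return false;
    return true;
}

/* Hypothetical legacy critical section running inside a transaction.
   The lock word is set and cleared in the write buffer only, so the
   value that reaches memory at commit is the unlocked value again. */
static void elided_section(uint32_t *buffered_lock,
                           uint32_t *buffered_data, uint32_t v) {
    *buffered_lock = 1;   /* lock acquire */
    *buffered_data = v;   /* protected update */
    *buffered_lock = 0;   /* lock release */
}
```

If one transaction commits such a section while another has the lock word (seen as 0) in its read-set, the second still validates; only a read of the changed data word would fail. A version-based scheme would instead see a bumped version on the lock's strip and abort.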

JudoSTM Feature 3:

Efficient Invisible Readers

Supporting Invisible Readers

Invisible readers: don't report reads to others
  good performance
  but can lead to inconsistent read data: errors!
Data errors: segfault, divide by zero
  Cheap solution: catch with trap/signal handlers
Control errors: jump to non-instrumented code
  Typical solution: verify read-set after every load
    Expensive! O(N^2)
  DBR solution: prevented by sandboxing
    DBR instruments all code as it executes
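The "catch with trap/signal handlers" idea for data errors can be sketched with POSIX signals. This is an assumption-laden illustration, not JudoSTM's handler: `run_sandboxed`, `on_fault`, `faulty_body`, and `clean_body` are hypothetical names, the fault is simulated with `raise`, and the sketch is single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

static sigjmp_buf abort_point;   /* where an aborted transaction resumes */

/* Signal handler: treat the fault as a transaction abort. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(abort_point, 1);
}

/* Runs body(); returns false if it raised SIGSEGV or SIGFPE. */
static bool run_sandboxed(void (*body)(void)) {
    struct sigaction sa, old_segv, old_fpe;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    bool ok;
    if (sigsetjmp(abort_point, 1) == 0) {   /* 1: also save the signal mask */
        body();
        ok = true;
    } else {
        ok = false;                          /* arrived here via siglongjmp */
    }

    sigaction(SIGSEGV, &old_segv, NULL);     /* restore previous handlers */
    sigaction(SIGFPE, &old_fpe, NULL);
    return ok;
}

/* Stand-in for a transaction that faults on inconsistent data. */
static void faulty_body(void) { raise(SIGSEGV); }

/* Stand-in for a transaction that runs cleanly. */
static void clean_body(void) { }
```

A real system would discard the read and write buffers and re-execute the transaction where this sketch returns false. Control errors are not covered here; the slide's point is that DBR handles those by instrumenting whatever code execution reaches.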

JudoSTM Details

Implementation

(reminder) Goal: Perform These Efficiently

For all non-stack write instructions:
  Track write addresses and values (write-set)
  Buffer the values from regular memory
For all non-stack read instructions:
  Redirect to the write-buffer
  If miss: track read addresses and values (read-set)
When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)

Read/Write Buffer Implementation

[Diagram: a read hashtable indexing the read buffer and a write hashtable indexing the write buffer, both keyed by address.]

Linear probed open-addressed hashtables
Efficient lookup: 5 instructions for a hit (+ state-saving?)
Efficient validate and commit?
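A linear-probed, open-addressed table keyed by address can be sketched in C as follows. The table size, hash function, and names (`addr_table`, `probe`, `table_put`, `table_get`) are assumptions for illustration; the slides do not give JudoSTM's layout, and this sketch assumes the table never fills and that address 0 is never used as a key.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256u   /* power of two, so masking replaces modulo */

typedef struct {
    uintptr_t addr;   /* key: accessed address; 0 marks an empty bucket */
    uint32_t  slot;   /* payload: index of the entry in the read or write buffer */
} bucket;

typedef struct { bucket b[TABLE_SIZE]; } addr_table;

/* Assumed hash: drop the two alignment bits, keep the low index bits. */
static size_t hash_addr(uintptr_t a) {
    return (size_t)((a >> 2) & (TABLE_SIZE - 1));
}

/* Returns the bucket holding addr, or the empty bucket where it belongs. */
static bucket *probe(addr_table *t, uintptr_t addr) {
    size_t i = hash_addr(addr);
    while (t->b[i].addr != 0 && t->b[i].addr != addr)
        i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing: try the next bucket */
    return &t->b[i];
}

/* Insert or update the buffer slot recorded for addr. */
static void table_put(addr_table *t, uintptr_t addr, uint32_t slot) {
    bucket *bk = probe(t, addr);
    bk->addr = addr;
    bk->slot = slot;
}

/* Returns 1 and sets *slot on a hit, 0 on a miss. */
static int table_get(addr_table *t, uintptr_t addr, uint32_t *slot) {
    bucket *bk = probe(t, addr);
    if (bk->addr != addr) return 0;
    *slot = bk->slot;
    return 1;
}
```

On a hit with no collision, `probe` does a shift, a mask, one load, and one compare, which gives a feel for how a hit can cost only a handful of instructions.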

Efficient Commit: Executable Write-Buffer

Pre-allocated buffer of move instructions
Emit value-address pairs as transaction executes

[Diagram, five animation frames: the write hashtable and a top pointer refer into the write buffer. The buffer starts as eight zeroed move instructions followed by ret, and is filled from the bottom up, one instruction per transactional write. Final state:]

    movl $0x00000000,0x00000000
    movl $0x00000000,0x00000000
    movl $0x00000000,0x00000000
    movl $0x00000000,0x00000000
    movl $0x00000000,0x00000000
    movl $0x80B10CFC,0x80B10CA4
    movl $0x0000ab42,0x80B10BCC
    movl $0x00000025,0x80B10BB8
    ret

Execute the write-buffer to commit!
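How such a buffer might be filled can be sketched by encoding the instructions as bytes. The sketch below uses the standard IA-32 encoding of `movl $imm32, addr32` (`C7 05`, then the address, then the immediate, both little-endian) and `ret` (`C3`). The struct and function names are hypothetical, the bottom-up fill order is read off the animation, and the sketch only builds the bytes: it never executes them, which would need executable memory in a 32-bit process.

```c
#include <stddef.h>
#include <stdint.h>

#define MOV_LEN  10   /* C7 05 <addr32> <imm32> */
#define MAX_MOVS 8    /* the slides show eight pre-allocated moves */

typedef struct {
    uint8_t code[MAX_MOVS * MOV_LEN + 1];  /* moves, then one byte for ret */
    size_t  top;                           /* offset of the first live instruction */
} write_buffer;

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Start with an empty buffer: just the trailing ret. */
static void wb_init(write_buffer *wb) {
    wb->top = sizeof wb->code - 1;
    wb->code[wb->top] = 0xC3;              /* ret */
}

/* Prepend "movl $value, addr"; returns 0 if the buffer is full. */
static int wb_emit(write_buffer *wb, uint32_t addr, uint32_t value) {
    if (wb->top < MOV_LEN) return 0;
    wb->top -= MOV_LEN;
    uint8_t *p = wb->code + wb->top;
    p[0] = 0xC7;                           /* mov r/m32, imm32 */
    p[1] = 0x05;                           /* ModRM: absolute 32-bit address */
    put_le32(p + 2, addr);
    put_le32(p + 6, value);
    return 1;
}
```

Committing would then amount to calling into `code + top`: the stores run back to back and fall through to `ret`. Updating an earlier entry when the same address is written twice (the job of the write hashtable) is left out of this sketch.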

Efficient Validation: Executable Read-Buffer

Pre-allocated buffer of compare & jump instructions
Emit value-address pairs as transaction executes

[Diagram, five animation frames: the read hashtable and a top pointer refer into the read buffer. The buffer starts as four zeroed compare-and-jump pairs followed by ret, and is filled from the bottom up, one pair per logged read. Final state:]

    cmp $0x00000000, 0x00000000
    jne,pn judostm_trans_abort
    cmp $0x00000100, 0x80B10BCC
    jne,pn judostm_trans_abort
    cmp $0x00000005, 0x80B10BB8
    jne,pn judostm_trans_abort
    cmp $0x00000a34, 0x80B10CA4
    jne,pn judostm_trans_abort
    ret

Execute the read-buffer to validate the read-set!
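One compare-and-jump pair can be encoded in the same byte-level style. The sketch uses the standard IA-32 encodings: `81 3D <addr32> <imm32>` for `cmpl $imm32, addr32`, the `2E` prefix for the "predict not taken" hint written as `,pn`, and `0F 85 <rel32>` for `jne`, whose displacement is relative to the address of the next instruction. The function name `emit_check` and the `pc` and `abort_addr` parameters are hypothetical; the bytes are only built, never executed.

```c
#include <stddef.h>
#include <stdint.h>

#define CHECK_LEN 17   /* cmp (10 bytes) + hinted jne rel32 (7 bytes) */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Writes "cmpl $seen, addr; jne,pn abort_addr" at p.
   pc is the 32-bit address that p will have when the buffer runs. */
static void emit_check(uint8_t *p, uint32_t pc, uint32_t addr,
                       uint32_t seen, uint32_t abort_addr) {
    p[0] = 0x81;                  /* cmp r/m32, imm32 */
    p[1] = 0x3D;                  /* ModRM: /7, absolute 32-bit address */
    put_le32(p + 2, addr);
    put_le32(p + 6, seen);
    p[10] = 0x2E;                 /* branch hint: predict not taken */
    p[11] = 0x0F;                 /* jne rel32 ... */
    p[12] = 0x85;
    /* ... with the displacement measured from the end of this pair */
    put_le32(p + 13, abort_addr - (pc + CHECK_LEN));
}
```

Validation then needs no loop and no table walk: calling into the buffer runs the compares in sequence, and the first mismatch branches to the abort routine.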

Evaluation

JudoSTM performance
Comparison with Rochester's RSTM†

† http://www.cs.rochester.edu/research/synchronization/rstm

RSTM vs JudoSTM: Design

                     RSTM                                 JudoSTM
Language             C++                                  C/C++
Programming model    Library API, rewrite code            atomic{…}
Conflict detection   Object-level location-based          Value-based
Memory allocation    Custom                               "Hoard" scalable parallel allocator
Fast commit          Object-cloning & pointer-switching   Executable write-buffer

JudoSTM more flexible, less intrusive; but performance?

Experimental Framework

RSTM micro-benchmarks
  Linked List, Hash Table, RBTree
  Equal mix of insert, remove, and lookup
  Measure throughput (transactions/sec)
Test platform
  4-way SMP Intel Pentium 4 Xeon - 2.8GHz
  L1d/L2/L3 cache sizes: 8KB/512KB/2MB
  Linux 2.6.17.13 with per thread signal handler support

Linked List

[Figure: throughput in transactions/second (millions; y-axis 0.0 to 4.0) versus number of processors (1 to 4) for Coarse-Grained Locking, Fine-Grained Locking, RSTM, Judo (Single Lock), and Judo (Distributed Lock).]

Coarse-grained locking best, but not scaling

Linked List – Zoomed in

[Figure: throughput in transactions/second (millions; y-axis 0.0 to 1.0) versus number of processors (1 to 4) for Coarse-Grained Locking, Fine-Grained Locking, RSTM, Judo (Single Lock), and Judo (Distributed Lock).]

Single-lock JudoSTM scaling nicely; RSTM flatlined

Hash Table

[Figure: throughput in transactions/second (millions; y-axis 0.0 to 7.0) versus number of processors (1 to 4) for Coarse-Grained Locking, Fine-Grained Locking, RSTM, Judo (Single Lock), and Judo (Distributed Lock).]

Distributed-lock JudoSTM beats CG-locking, tracks RSTM

RBTree

[Figure: throughput in transactions/second (millions; y-axis 0.0 to 6.0) versus number of processors (1 to 4) for Coarse-Grained Locking, RSTM, Judo (Single Lock), and Judo (Distributed Lock).]

JudoSTM on track to scale past CG-locking; RSTM flatlined

Conclusions

Judo: highly-efficient DBR framework
  Beats DynamoRIO on SPEC benchmarks
JudoSTM: first STM based on DBR
  Value-based conflict detection
  Executable read/write buffers
Desirable features:
  Efficient invisible readers (sandboxing)
  Legacy lock elision
  Privileged transactions (system call support)
Performance comparable to RSTM

Facilitates STM for real programs & environments!

Backups

JudoSTM Details

Programming with JudoSTM

Programming with JudoSTM

Source code:

    #include <glib.h>
    #include <judostm.h>
    GTree *tree;
    ...
    atomic {
        g_tree_insert(tree, &key, &val);
    }
    ...

After the atomic macro is expanded:

    #include <glib.h>
    #include <judostm.h>
    GTree *tree;
    ...
    judostm_start();
    g_tree_insert(tree, &key, &val);
    judostm_stop();
    ...

[Diagram: gcc compiles the source code into the executable my_app, which uses the judoSTM library and the shared library glib. In the running application, on top of the loader and the kernel, the code cache holds the instrumented my_app + glib.]

Easy to use, with no compiler support!

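#ifndef JUDOSTM_H
#define JUDOSTM_H

extern void judostm_start(void);
extern void judostm_stop(void);

/* The empty asm statement only lists clobbers, so the compiler must not
   keep values cached in those registers or in memory across this point.
   The for loop runs the block once and then calls judostm_stop(). */
#define atomic \
    asm __volatile__ ("":::"eax", "ecx", "edx", "ebx", "edi", \
                      "esi", "flags", "memory"); \
    int __count = 0; \
    judostm_start(); \
    for (; __count < 1; judostm_stop(), __count++)

#endif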
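The control-flow trick behind the macro can be tried without the JudoSTM library. The sketch below is a variation, not the header above: it stubs `judostm_start` and `judostm_stop` with call counters, leaves out the x86-specific clobber statement, and scopes the loop counter to the `for` statement (C99). The name `judo_once` and the function `increment_twice` are made up for the example.

```c
/* Stub runtime: the real judostm_start/judostm_stop live in the JudoSTM
   library; these stand-ins only count how often they are called. */
static int starts, stops;
static void judostm_start(void) { starts++; }
static void judostm_stop(void)  { stops++; }

/* Same idea as the slide's macro: call start, run the block exactly once,
   then call stop from the loop's increment expression. */
#define atomic \
    for (int judo_once = (judostm_start(), 0); judo_once < 1; \
         judostm_stop(), judo_once++)

static int counter;

static void increment_twice(void) {
    atomic { counter++; }
    atomic { counter++; }   /* a second block in the same scope still compiles */
}
```

Scoping the counter to the loop lets two `atomic` blocks share a scope, which a counter declared by the macro itself would not allow. With either form, leaving the block through `break`, `return`, or `goto` would skip `judostm_stop()`.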