Tongping Liu, Charlie Curtsinger, Emery Berger

DTHREADS: Efficient Deterministic Multithreading

Insanity: Doing the same thing over and over again and expecting different results.

Dthreads: Efficient Deterministic Multithreading


DESCRIPTION

Dthreads is an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. It is easy to use: just link your program with -ldthread instead of -lpthread. Dthreads can be downloaded from its source code repository on GitHub (https://github.com/plasma-umass/dthreads). A technical paper describing Dthreads appeared at SOSP 2011 (https://github.com/plasma-umass/dthreads/blob/master/doc/dthreads-sosp11.pdf?raw=true).

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical: they require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This talk presents Dthreads. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system and, for a majority of the benchmarks evaluated, matches and occasionally exceeds the performance of pthreads.


Page 1: Dthreads: Efficient Deterministic Multithreading

Tongping Liu, Charlie Curtsinger, Emery Berger

DTHREADS: Efficient Deterministic Multithreading

Insanity: Doing the same thing over and over again and expecting different results.

Page 2: Dthreads: Efficient Deterministic Multithreading

2

In the Beginning…

Page 3: Dthreads: Efficient Deterministic Multithreading

3

There was the Core.

Page 4: Dthreads: Efficient Deterministic Multithreading

4

And it was Good.

Page 5: Dthreads: Efficient Deterministic Multithreading

5

It gave us our Daily Speed.

Page 6: Dthreads: Efficient Deterministic Multithreading

6

Until the Apocalypse.

Page 7: Dthreads: Efficient Deterministic Multithreading

7

And the Speed was no Moore.

Page 8: Dthreads: Efficient Deterministic Multithreading

8

And then came a False Prophet…

Page 9: Dthreads: Efficient Deterministic Multithreading

9

Page 10: Dthreads: Efficient Deterministic Multithreading

10

Want speed?

Page 11: Dthreads: Efficient Deterministic Multithreading

11

I BRING YOU THE GIFT OF PARALLELISM!

Page 12: Dthreads: Efficient Deterministic Multithreading

12

JUST USE THREADS…

color = [color swatch]; row = 0; // globals
void nextStripe(){
  for (c = 0; c < Width; c++)
    drawBox (c,row,color);
  color = (color == [color swatch])? [color swatch] : [color swatch];
  row++;
}
for (n = 0; n < 9; n++)
  pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
  pthread_join(t[n]);
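The slide's code is shorthand: pthread_create and pthread_join take more arguments than shown, and nine threads update color and row with no synchronization. As a minimal sketch of the conventional pthreads repair (not the Dthreads approach, which leaves the program unmodified), the version below guards the shared state with a mutex so that each stripe lands on its own row. COLOR_A, COLOR_B, WIDTH, the canvas array, and the helper names are assumptions made for illustration; the slide's actual color values are not recoverable.

```c
#include <assert.h>
#include <pthread.h>

enum { WIDTH = 8, STRIPES = 9 };          /* WIDTH is an assumed value */
enum { COLOR_A = 'A', COLOR_B = 'B' };    /* hypothetical stand-ins for the slide's colors */

static char canvas[STRIPES][WIDTH];       /* hypothetical drawing surface */
static int color = COLOR_A;               /* globals, as on the slide */
static int row = 0;
static pthread_mutex_t stripe_lock = PTHREAD_MUTEX_INITIALIZER;

static void draw_box(int c, int r, int col) {
    assert(r >= 0 && r < STRIPES && c >= 0 && c < WIDTH);
    canvas[r][c] = (char)col;
}

/* Thread body: paint one stripe, then flip the color and advance the row.
 * Holding the lock keeps the updates to color and row from interleaving. */
static void *next_stripe(void *unused) {
    (void)unused;
    pthread_mutex_lock(&stripe_lock);
    for (int c = 0; c < WIDTH; c++)
        draw_box(c, row, color);
    color = (color == COLOR_A) ? COLOR_B : COLOR_A;
    row++;
    pthread_mutex_unlock(&stripe_lock);
    return NULL;
}

/* Paint all stripes with one thread each; returns 0 on success, -1 otherwise. */
int draw_stripes(void) {
    pthread_t t[STRIPES];
    int started = 0;
    color = COLOR_A;
    row = 0;
    for (int n = 0; n < STRIPES; n++) {
        if (pthread_create(&t[n], NULL, next_stripe, NULL) != 0)
            break;
        started++;
    }
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return started == STRIPES ? 0 : -1;
}

/* Color painted on a given row (read from its first box). */
char stripe_color(int r) {
    assert(r >= 0 && r < STRIPES);
    return canvas[r][0];
}
```

After draw_stripes() returns 0, stripe_color(r) should alternate between the two colors from row 0 to row 8, whichever thread happens to run first. The cost is that the stripes are painted one at a time, so the lock gives up the parallelism the threads were meant to provide.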

Page 13: Dthreads: Efficient Deterministic Multithreading

13

Page 14: Dthreads: Efficient Deterministic Multithreading

14

Page 15: Dthreads: Efficient Deterministic Multithreading

15

Page 16: Dthreads: Efficient Deterministic Multithreading

16

Page 17: Dthreads: Efficient Deterministic Multithreading

17

Page 18: Dthreads: Efficient Deterministic Multithreading

18

pthreads

race conditions

atomicity violations

deadlock

order violations

Page 19: Dthreads: Efficient Deterministic Multithreading

19

Salvation?

Page 20: Dthreads: Efficient Deterministic Multithreading

20

Page 21: Dthreads: Efficient Deterministic Multithreading

21

pthreads

race conditions

atomicity violations

deadlock

order violations

DTHREADS

deterministic

race conditions

atomicity violations

deadlock

order violations

Page 22: Dthreads: Efficient Deterministic Multithreading

22

DTHREADS Enables…

Race-free Executions

Replay Debugging w/o Logging

Replicated State Machines

Page 23: Dthreads: Efficient Deterministic Multithreading

23

DTHREADS: Efficient Determinism

Usually faster than the state of the art

[Bar chart: runtime relative to pthreads (y-axis 0 to 6) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean. Callout "Overhead with CoreDet" with values 8.4 and 7.8.]

Page 24: Dthreads: Efficient Deterministic Multithreading

24

DTHREADS: Efficient Determinism

Generally as fast or faster than pthreads

[Bar chart: runtime relative to pthreads (y-axis 0 to 6) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean. Callout "Overhead with CoreDet" with values 8.4 and 7.8.]

Page 25: Dthreads: Efficient Deterministic Multithreading

25

DTHREADS: Easy to Use

% g++ myprog.cpp -ldthread    (replacing -lpthread)

Page 26: Dthreads: Efficient Deterministic Multithreading

26

Isolation

shared address space (threads) vs. disjoint address spaces (processes)
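To make the contrast on this slide concrete, here is a minimal sketch, assuming a POSIX system with fork() and anonymous mmap(). A child process writes to an ordinary global and to an explicitly shared mapping; the parent sees only the second write. The function names are hypothetical, and this illustrates the isolation property only, not the Dthreads runtime itself.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int private_slot;   /* ordinary global: becomes copy-on-write after fork() */

/* Fork a child that stores 42 into *slot, wait for it, and return what the
 * parent then reads from *slot (or -1 if fork or wait fails). */
static int parent_view_after_child_store(volatile int *slot) {
    assert(slot != NULL);
    *slot = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child process */
        *slot = 42;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return *slot;
}

/* Disjoint address spaces: the child's write stays in the child's copy. */
int view_of_private_memory(void) {
    return parent_view_after_child_store(&private_slot);
}

/* Explicitly shared mapping: the child's write is visible to the parent. */
int view_of_shared_mapping(void) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    int seen = parent_view_after_child_store(shared);
    munmap(shared, sizeof *shared);
    return seen;
}
```

view_of_private_memory() should return 0 and view_of_shared_mapping() should return 42. With threads, every write behaves like the second case; with processes, sharing has to be requested explicitly, which gives a runtime control over when updates become visible.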

Page 27: Dthreads: Efficient Deterministic Multithreading

27

Performance: Processes vs. Threads

[Chart: normalized execution time (y-axis 0.0 to 1.4) against thread execution time in ms (x-axis 1 to 1024, doubling), comparing threads and processes.]

Page 28: Dthreads: Efficient Deterministic Multithreading

28

Performance: Processes vs. Threads

[Chart: normalized execution time (y-axis 0.0 to 1.4) against thread execution time in ms (x-axis 1 to 1024, doubling), comparing threads and processes.]

Page 29: Dthreads: Efficient Deterministic Multithreading

29

Performance: Processes vs. Threads

[Chart: normalized execution time (y-axis 0.0 to 1.4) against thread execution time in ms (x-axis 1 to 1024, doubling), comparing threads and processes.]

Page 30: Dthreads: Efficient Deterministic Multithreading

30

“Shared Memory”

Page 31: Dthreads: Efficient Deterministic Multithreading

31

Snapshot pages before modifications

“Shared Memory”

Page 32: Dthreads: Efficient Deterministic Multithreading

32

Write back diffs

“Shared Memory”
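The snapshot-and-diff idea on the last two slides can be sketched in a few lines. This is a simplified model, not the Dthreads implementation: the page is a small byte array, the "threads" are plain structs, and comparing at byte granularity is an assumption of the sketch. The names private_page, page_begin, page_commit, and commit_two_writers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_PAGE_BYTES = 16 };   /* tiny on purpose; real pages are commonly 4096 bytes */

typedef struct {
    unsigned char twin[DEMO_PAGE_BYTES];     /* snapshot taken before any modification */
    unsigned char working[DEMO_PAGE_BYTES];  /* private copy the "thread" writes to */
} private_page;

/* Begin: snapshot the shared page and make a private working copy. */
void page_begin(private_page *p, const unsigned char *shared) {
    assert(p != NULL && shared != NULL);
    memcpy(p->twin, shared, DEMO_PAGE_BYTES);
    memcpy(p->working, shared, DEMO_PAGE_BYTES);
}

/* Commit: write back only the bytes that differ from the snapshot.
 * Returns the number of bytes written. */
size_t page_commit(const private_page *p, unsigned char *shared) {
    assert(p != NULL && shared != NULL);
    size_t written = 0;
    for (size_t i = 0; i < DEMO_PAGE_BYTES; i++) {
        if (p->working[i] != p->twin[i]) {
            shared[i] = p->working[i];
            written++;
        }
    }
    return written;
}

/* Two writers modify the same page privately, then commit in a fixed order
 * (first, then second). The final page contents are copied into out. */
void commit_two_writers(unsigned char out[DEMO_PAGE_BYTES]) {
    unsigned char shared[DEMO_PAGE_BYTES] = {0};
    private_page first, second;
    page_begin(&first, shared);
    page_begin(&second, shared);
    first.working[0] = 1;    /* disjoint bytes: both updates survive */
    second.working[1] = 2;
    first.working[5] = 7;    /* same byte: the later commit wins */
    second.working[5] = 9;
    page_commit(&first, shared);
    page_commit(&second, shared);
    memcpy(out, shared, DEMO_PAGE_BYTES);
}
```

Because only changed bytes are written back, the second commit does not erase the first writer's update to byte 0, and the conflicting byte 5 always ends up with the value from whichever writer commits last. Fixing the commit order therefore fixes the outcome of the race.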

Page 33: Dthreads: Efficient Deterministic Multithreading

33

“Thread” 1

“Thread” 2

“Thread” 3

Parallel  Serial  Parallel

Update in Deterministic Time & Order

mutex_lock

cond_wait

pthread_create
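One way to picture "update in deterministic order" is a token that each thread must hold before publishing its changes. The sketch below is a simplification under stated assumptions: the token is a plain turn counter handed out in ascending thread id, and the commit_token type and function names are hypothetical rather than taken from Dthreads.

```c
#include <assert.h>
#include <pthread.h>

enum { NTHREADS = 3 };

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  turn_changed;
    int next_id;              /* id of the thread allowed to commit next */
    int order[NTHREADS];      /* ids in the order their commits happened */
    int committed;
} commit_token;

typedef struct {
    commit_token *token;
    int id;
} worker_arg;

static void *worker(void *p) {
    worker_arg *w = p;
    commit_token *t = w->token;
    /* Parallel phase: private, isolated work would run here. */
    pthread_mutex_lock(&t->lock);
    while (t->next_id != w->id)                 /* wait for our turn */
        pthread_cond_wait(&t->turn_changed, &t->lock);
    t->order[t->committed++] = w->id;           /* serial phase: publish updates */
    t->next_id = w->id + 1;                     /* pass the token on */
    pthread_cond_broadcast(&t->turn_changed);
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Runs NTHREADS workers and records their commit order in order_out.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_ordered_commits(int order_out[NTHREADS]) {
    commit_token tok = { .next_id = 0, .committed = 0 };
    worker_arg args[NTHREADS];
    pthread_t threads[NTHREADS];
    int started = 0;

    assert(order_out != NULL);
    pthread_mutex_init(&tok.lock, NULL);
    pthread_cond_init(&tok.turn_changed, NULL);
    /* Ids are created in ascending order, so a failed create never leaves a
     * running thread waiting on an id that does not exist. */
    for (int n = 0; n < NTHREADS; n++) {
        args[n].token = &tok;
        args[n].id = n;
        if (pthread_create(&threads[n], NULL, worker, &args[n]) != 0)
            break;
        started++;
    }
    for (int n = 0; n < started; n++)
        pthread_join(threads[n], NULL);
    for (int n = 0; n < tok.committed; n++)
        order_out[n] = tok.order[n];
    pthread_mutex_destroy(&tok.lock);
    pthread_cond_destroy(&tok.turn_changed);
    return started == NTHREADS ? 0 : -1;
}
```

Whatever order the scheduler lets the workers reach the lock, the recorded commit order is 0, 1, 2. The parallel work itself is unconstrained; only the moment of publication is serialized.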

Page 34: Dthreads: Efficient Deterministic Multithreading

34

DTHREADS performance analysis

[Bar chart: runtime relative to pthreads (y-axis 0 to 4) for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 35: Dthreads: Efficient Deterministic Multithreading

35

Thread 1

Main Memory

Core 1

Thread 2

Core 2

Invalidate

The Culprit: False Sharing

Page 36: Dthreads: Efficient Deterministic Multithreading

36

Thread 1 Thread 2

Invalidate

Main Memory

Core 1 Core 2

The Culprit: False Sharing

20x
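False sharing arises when independent data used by different threads sits on the same cache line. In ordinary pthreads code the usual manual remedy is padding or alignment, sketched below in C11. The 64-byte line size and the struct and function names are assumptions for illustration; this is the hand-written fix, whereas the next slide describes how Dthreads sidesteps the problem by giving each thread its own process.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

enum { CACHE_LINE = 64 };   /* assumed line size; common on x86-64 but not universal */

/* Two per-thread counters packed next to each other. */
struct packed_counters {
    long thread1_hits;
    long thread2_hits;
};

/* The same counters, each aligned to the start of its own cache line. */
struct padded_counters {
    alignas(CACHE_LINE) long thread1_hits;
    alignas(CACHE_LINE) long thread2_hits;
};

static_assert(sizeof(struct padded_counters) == 2 * CACHE_LINE,
              "each padded counter should occupy a full line");

/* Do two byte offsets, measured from a cache-line-aligned base, share a line? */
int same_cache_line(size_t a, size_t b) {
    return a / CACHE_LINE == b / CACHE_LINE;
}

int packed_counters_share_line(void) {
    return same_cache_line(offsetof(struct packed_counters, thread1_hits),
                           offsetof(struct packed_counters, thread2_hits));
}

int padded_counters_share_line(void) {
    return same_cache_line(offsetof(struct padded_counters, thread1_hits),
                           offsetof(struct padded_counters, thread2_hits));
}
```

packed_counters_share_line() returns 1 and padded_counters_share_line() returns 0. In the packed layout, a write by one thread invalidates the line holding the other thread's counter even though the two never touch the same variable.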

Page 37: Dthreads: Efficient Deterministic Multithreading

37

Process 1 Process 2

Global State

Core 1 Core 2

Process 2

Process 1

DTHREADS: Eliminates False Sharing!

Page 38: Dthreads: Efficient Deterministic Multithreading

38

DTHREADS: Detailed Analysis

[Bar chart: runtime relative to pthreads (y-axis 0 to 6) for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 39: Dthreads: Efficient Deterministic Multithreading

39

DTHREADS: Detailed Analysis

[Bar chart: runtime relative to pthreads (y-axis 0 to 6) for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 40: Dthreads: Efficient Deterministic Multithreading

40

DTHREADS: Detailed Analysis

[Bar chart: runtime relative to pthreads (y-axis 0 to 6) for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 41: Dthreads: Efficient Deterministic Multithreading

41

DTHREADS: Scalable Determinism

[Bar chart "Scalability": speedup of 8 cores over 2 cores (y-axis 0 to 4) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 42: Dthreads: Efficient Deterministic Multithreading

42

DTHREADS: Scalable Determinism

[Bar chart "Scalability": speedup of 8 cores over 2 cores (y-axis 0 to 4) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 43: Dthreads: Efficient Deterministic Multithreading

43

DTHREADS: Scalable Determinism

[Bar chart "Scalability": speedup of 8 cores over 2 cores (y-axis 0 to 4) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), plus hmean.]

Page 44: Dthreads: Efficient Deterministic Multithreading

44

DTHREADS

% g++ myprog.cpp -ldthread    (replacing -lpthread)

Page 45: Dthreads: Efficient Deterministic Multithreading

45

End