Dthreads: Efficient Deterministic Multithreading

DESCRIPTION

Dthreads is an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. It is easy to use: just link your program with -ldthread instead of -lpthread. Dthreads can be downloaded from its source code repository on GitHub (https://github.com/plasma-umass/dthreads). A technical paper describing Dthreads appeared at SOSP 2011 (https://github.com/plasma-umass/dthreads/blob/master/doc/dthreads-sosp11.pdf?raw=true).

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This talk presents Dthreads. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks we evaluated, matches and occasionally exceeds the performance of pthreads.

Tongping Liu, Charlie Curtsinger, Emery Berger

DTHREADS: Efficient Deterministic Multithreading

Insanity: Doing the same thing over and over again and expecting different results.

2

In the Beginning…

3

There was the Core.

4

And it was Good.

5

It gave us our Daily Speed.

6

Until the Apocalypse.

7

And the Speed was no Moore.

8

And then came a False Prophet…

9

10

Want speed?

11

I BRING YOU THE GIFT OF PARALLELISM!

12

JUST USE THREADS…

color = ; row = 0; // globals
void nextStripe() {
  for (c = 0; c < Width; c++)
    drawBox(c, row, color);
  color = (color == ) ? : ;
  row++;
}
for (n = 0; n < 9; n++)
  pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
  pthread_join(t[n]);

13
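The slide's code leaves out the color values and shortens the pthread calls. Below is a minimal runnable sketch of the same racy pattern. The names ColorA, ColorB, canvas, and runStripes, and the value of Width, are assumptions added for illustration. The shared variables are std::atomic only so that the unsynchronized read-then-write stays well-defined C++; updates can still be lost.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

constexpr int Width = 16;    // assumed canvas width
constexpr int Stripes = 9;   // one thread per stripe, as on the slide
enum Color { ColorA = 0, ColorB = 1 };  // placeholders for the elided colors

// Globals shared by all threads, as on the slide.
std::atomic<int> color{ColorA};
std::atomic<int> row{0};
std::atomic<int> canvas[Stripes][Width];  // hypothetical drawing surface

void drawBox(int c, int r, int col) { canvas[r][c].store(col); }

void* nextStripe(void*) {
  for (int c = 0; c < Width; c++)
    drawBox(c, row.load(), color.load());
  // Load and store are separate steps: another thread can run in between,
  // so stripes may overlap and increments of row may be lost.
  color.store(color.load() == ColorA ? ColorB : ColorA);
  row.store(row.load() + 1);
  return nullptr;
}

// Runs the nine threads and returns the final value of row.
int runStripes() {
  color.store(ColorA);
  row.store(0);
  pthread_t t[Stripes];
  for (int n = 0; n < Stripes; n++) {
    int rc = pthread_create(&t[n], nullptr, nextStripe, nullptr);
    assert(rc == 0);
    (void)rc;
  }
  for (int n = 0; n < Stripes; n++)
    pthread_join(t[n], nullptr);
  return row.load();
}
```

Under pthreads, runStripes() can return anything from 1 to 9, and the canvas contents can differ from run to run. That is the kind of nondeterminism the talk is concerned with.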

14

15

16

17

18

pthreads

race conditions

atomicity violations

deadlock

order violations

19

Salvation?

20

21

pthreads

race conditions

atomicity violations

deadlock

order violations

DTHREADS

deterministic

race conditions

atomicity violations

deadlock

order violations

22

DTHREADS Enables…

Race-free Executions

Replay Debugging w/o Logging

Replicated State Machines

23

DTHREADS: Efficient Determinism

Usually faster than the state of the art

[Chart: Overhead with CoreDet. Runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads on the PHOENIX benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), the PARSEC benchmarks (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Two off-scale bars are labeled 8.4 and 7.8.]

24

DTHREADS: Efficient Determinism

Generally as fast or faster than pthreads

[Chart: Overhead with CoreDet. Runtime relative to pthreads for CoreDet, dthreads, and pthreads across the PHOENIX and PARSEC benchmarks and hmean; off-scale bars labeled 8.4 and 7.8.]

25

DTHREADS: Easy to Use

% g++ myprog.cpp -ldthread

(-ldthread replaces -lpthread)

26

Isolation

shared address space disjoint address spaces

27
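The difference between a shared and a disjoint address space can be seen with fork and two kinds of mapping. This is a minimal sketch for Linux-like systems, not the Dthreads implementation: it uses anonymous mappings for brevity, and the names Observed and observeChildWrites are assumptions.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

struct Observed {
  int shared_value;   // what the parent sees in the MAP_SHARED page
  int private_value;  // what the parent sees in the MAP_PRIVATE page
};

// A child process writes 2 into both pages; the parent reports what it sees.
Observed observeChildWrites() {
  void* s = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  void* p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(s != MAP_FAILED && p != MAP_FAILED);
  int* shared = static_cast<int*>(s);
  int* priv = static_cast<int*>(p);
  *shared = 1;
  *priv = 1;

  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {   // child: the "thread" running as a process
    *shared = 2;    // visible to the parent
    *priv = 2;      // copy-on-write: lands in the child's own copy
    _exit(0);
  }
  waitpid(pid, nullptr, 0);

  Observed result{*shared, *priv};
  munmap(s, sizeof(int));
  munmap(p, sizeof(int));
  return result;
}
```

The parent sees the child's write to the shared page (2) but not the write to the private page (still 1). With private mappings, one process's updates stay invisible to the others until something explicitly publishes them.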

Performance: Processes vs. Threads

[Chart: Normalized Execution Time (y-axis, 0.0 to 1.4) versus Thread Execution Time in ms (x-axis, 1 to 1024 in powers of two), with one series for threads and one for processes.]

28

“Shared Memory”

31

Snapshot pages before modifications

“Shared Memory”

32
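One conventional way to snapshot a page just before its first modification is to write-protect it and take the copy inside the fault handler. The sketch below assumes Linux, where a write to a read-only page raises SIGSEGV; the names g_page, g_twin, onFault, and snapshotOnFirstWrite are hypothetical and this is not the Dthreads source.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

static char* g_page = nullptr;   // the tracked page
static char* g_twin = nullptr;   // snapshot taken at the first write
static std::size_t g_pageSize = 0;
static volatile sig_atomic_t g_dirty = 0;

static void onFault(int, siginfo_t* info, void*) {
  char* addr = static_cast<char*>(info->si_addr);
  if (g_page == nullptr || addr < g_page || addr >= g_page + g_pageSize)
    _exit(1);  // fault outside the tracked page: a genuine crash
  std::memcpy(g_twin, g_page, g_pageSize);               // snapshot first
  g_dirty = 1;                                           // remember the write
  mprotect(g_page, g_pageSize, PROT_READ | PROT_WRITE);  // then allow it
}

struct SnapshotResult { bool dirty; char before; char after; };

SnapshotResult snapshotOnFirstWrite() {
  g_pageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* page = mmap(nullptr, g_pageSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  void* twin = mmap(nullptr, g_pageSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(page != MAP_FAILED && twin != MAP_FAILED);
  g_page = static_cast<char*>(page);
  g_twin = static_cast<char*>(twin);
  g_page[0] = 'a';
  mprotect(g_page, g_pageSize, PROT_READ);  // start tracking writes

  struct sigaction sa, old;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = onFault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old);

  g_dirty = 0;
  volatile char* p = g_page;
  p[0] = 'b';  // faults once; the handler snapshots, unprotects, store retries
  std::atomic_signal_fence(std::memory_order_seq_cst);

  SnapshotResult r{g_dirty == 1, g_twin[0], static_cast<char>(p[0])};
  sigaction(SIGSEGV, &old, nullptr);
  munmap(page, g_pageSize);
  munmap(twin, g_pageSize);
  g_page = g_twin = nullptr;
  return r;
}
```

After the call, the twin holds the old byte ('a'), the page holds the new one ('b'), and the dirty flag records that the page was written.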

Write back diffs

“Shared Memory”

33
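Writing back diffs can be sketched as a byte-wise comparison of a thread's working copy against its snapshot (often called a twin), copying only the changed bytes into shared memory. The Page alias and the commitDiff function are assumptions for illustration, not names from the Dthreads source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Page = std::vector<std::uint8_t>;

// Copies into `shared` only the bytes that differ between the thread's
// working copy and its twin (the snapshot taken before modification).
// Returns the number of bytes written back.
std::size_t commitDiff(const Page& twin, const Page& working, Page& shared) {
  assert(twin.size() == working.size() && working.size() == shared.size());
  std::size_t written = 0;
  for (std::size_t i = 0; i < working.size(); ++i) {
    if (working[i] != twin[i]) {
      shared[i] = working[i];
      ++written;
    }
  }
  return written;
}
```

If two threads start from the same twin and change different bytes of a page, committing both diffs keeps both updates. Copying whole pages back would let the second commit overwrite the first thread's change.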

“Thread” 1

“Thread” 2

“Thread” 3

Update in Deterministic Time & Order

[Diagram: phases alternate Parallel, Serial, Parallel; synchronization operations shown: mutex_lock, cond_wait, pthread_create]

34
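A fixed update order can be sketched with a token that threads pass in ID order: each thread works in parallel, then waits for its turn before committing. This is a simplified illustration; the Token, WorkerArg, worker, and commitOrder names are assumptions, and ordering by thread ID stands in for whatever order the runtime actually uses.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

struct Token {
  pthread_mutex_t lock;
  pthread_cond_t passed;
  int turn;                // id of the thread allowed to commit next
  std::vector<int> order;  // commit order actually observed
};

struct WorkerArg { Token* token; int id; };

void* worker(void* p) {
  WorkerArg* arg = static_cast<WorkerArg*>(p);
  Token* tok = arg->token;
  // Parallel phase: thread-local work would run here, unsynchronized.
  pthread_mutex_lock(&tok->lock);
  while (tok->turn != arg->id)       // wait for the token
    pthread_cond_wait(&tok->passed, &tok->lock);
  tok->order.push_back(arg->id);     // serial phase: commit updates
  tok->turn = arg->id + 1;           // pass the token to the next id
  pthread_cond_broadcast(&tok->passed);
  pthread_mutex_unlock(&tok->lock);
  return nullptr;
}

// Starts n threads and returns the order in which they committed.
std::vector<int> commitOrder(int n) {
  Token tok;
  pthread_mutex_init(&tok.lock, nullptr);
  pthread_cond_init(&tok.passed, nullptr);
  tok.turn = 0;
  std::vector<pthread_t> threads(n);
  std::vector<WorkerArg> args(n);
  for (int i = 0; i < n; ++i) {
    args[i] = WorkerArg{&tok, i};
    int rc = pthread_create(&threads[i], nullptr, worker, &args[i]);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < n; ++i)
    pthread_join(threads[i], nullptr);
  pthread_cond_destroy(&tok.passed);
  pthread_mutex_destroy(&tok.lock);
  return tok.order;
}
```

However the scheduler interleaves the threads, commitOrder(4) yields 0, 1, 2, 3: the commit order depends only on the token, not on timing.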

DTHREADS performance analysis

[Chart: runtime relative to pthreads (y-axis, 0 to 4) for dthreads and pthreads on the PHOENIX benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), the PARSEC benchmarks (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]

35

The Culprit: False Sharing

[Diagram labels: Thread 1, Core 1, Thread 2, Core 2, Main Memory, Invalidate]

36

The Culprit: False Sharing

[Diagram labels: Thread 1, Thread 2, Core 1, Core 2, Main Memory, Invalidate; callout: 20x]

37

DTHREADS: Eliminates False Sharing!

[Diagram labels: Process 1, Process 2, Core 1, Core 2, Global State]

38
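False sharing arises when threads update different variables that happen to sit on the same cache line. A minimal sketch of such a layout next to the usual manual fix under pthreads (padding each variable to its own line) follows. The 64-byte line size and the names PackedCounters, PaddedCounters, bump, and runPair are assumptions.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size
constexpr long kIters = 1000000;

// Two counters that will usually share one cache line.
struct PackedCounters { long a; long b; };

// The same counters, each aligned to its own cache line.
struct PaddedCounters {
  alignas(kCacheLine) long a;
  alignas(kCacheLine) long b;
};
static_assert(sizeof(PaddedCounters) >= 2 * kCacheLine,
              "each counter should occupy its own cache line");

// Each thread increments only its own counter, so there is no data race;
// any slowdown in the packed case comes from cache-line invalidations.
void* bump(void* p) {
  volatile long* counter = static_cast<long*>(p);
  for (long i = 0; i < kIters; ++i)
    *counter = *counter + 1;
  return nullptr;
}

template <typename Counters>
long runPair() {
  Counters c{};
  pthread_t t1, t2;
  int rc1 = pthread_create(&t1, nullptr, bump, &c.a);
  int rc2 = pthread_create(&t2, nullptr, bump, &c.b);
  assert(rc1 == 0 && rc2 == 0);
  (void)rc1;
  (void)rc2;
  pthread_join(t1, nullptr);
  pthread_join(t2, nullptr);
  return c.a + c.b;
}
```

Both runPair<PackedCounters>() and runPair<PaddedCounters>() compute the same sum; timing them on a multicore machine typically shows the packed version running slower, by an amount that depends on the hardware. With Dthreads, each process updates its own private copies of pages, so the program does not need this source-level padding.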

DTHREADS: Detailed Analysis

[Chart: runtime relative to pthreads (y-axis, 0 to 6) for three configurations (ordering only, isolation only, dthreads) on the PHOENIX benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), the PARSEC benchmarks (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]

39

DTHREADS: Scalable Determinism

[Chart: Scalability. Speedup of 8 cores over 2 cores (y-axis, 0 to 4) for CoreDet, dthreads, and pthreads on the PHOENIX benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), the PARSEC benchmarks (blackscholes, dedup, ferret, streamcluster, swaptions), and hmean.]

42

DTHREADS

% g++ myprog.cpp -ldthread

45
