Transcript
Page 1: Dthreads: Efficient Deterministic Multithreading

Tongping Liu, Charlie Curtsinger, Emery Berger

DTHREADS: Efficient Deterministic Multithreading

Insanity: Doing the same thing over and over again and expecting different results.

Page 2: Dthreads: Efficient Deterministic Multithreading


In the Beginning…

Page 3: Dthreads: Efficient Deterministic Multithreading


There was the Core.

Page 4: Dthreads: Efficient Deterministic Multithreading


And it was Good.

Page 5: Dthreads: Efficient Deterministic Multithreading


It gave us our Daily Speed.

Page 6: Dthreads: Efficient Deterministic Multithreading


Until the Apocalypse.

Page 7: Dthreads: Efficient Deterministic Multithreading


And the Speed was no Moore.

Page 8: Dthreads: Efficient Deterministic Multithreading


And then came a False Prophet…

Page 10: Dthreads: Efficient Deterministic Multithreading


Want speed?

Page 11: Dthreads: Efficient Deterministic Multithreading


I BRING YOU THE GIFT OF PARALLELISM!

Page 12: Dthreads: Efficient Deterministic Multithreading


JUST USE THREADS…

color = ; row = 0; // globals
void nextStripe(){
  for (c = 0; c < Width; c++)
    drawBox (c,row,color);
  color = (color == )? : ;
  row++;
}
for (n = 0; n < 9; n++)
  pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
  pthread_join(t[n]);
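The slide code is shorthand: the color values are blank in the transcript, and the pthread calls omit arguments. The following is a minimal compilable sketch of the same idea, not the speaker's code. The RED and WHITE colors, the WIDTH value, the text canvas, and the helper names reset_canvas, draw_serial and draw_threaded are all assumptions made for illustration. Because color and row are unsynchronized globals, draw_threaded contains a data race and may paint a different picture on each run, while draw_serial always alternates stripes.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define WIDTH 8     /* hypothetical canvas width */
#define STRIPES 9   /* the slide starts 9 threads */

enum { RED = 'R', WHITE = 'W' };  /* hypothetical colors; the slide's values are blank */

static char canvas[STRIPES][WIDTH];
static int color = RED;  /* shared globals, deliberately unsynchronized */
static int row = 0;

/* Stand-in for the slide's drawBox: record the color in a text canvas. */
static void drawBox(int c, int r, int col) {
    assert(c >= 0 && c < WIDTH);
    if (r >= 0 && r < STRIPES)
        canvas[r][c] = (char)col;
}

/* Same logic as the slide, with the signature pthread_create requires. */
static void *nextStripe(void *unused) {
    (void)unused;
    for (int c = 0; c < WIDTH; c++)
        drawBox(c, row, color);
    color = (color == RED) ? WHITE : RED;
    row++;
    return NULL;
}

static void reset_canvas(void) {
    memset(canvas, 0, sizeof canvas);
    color = RED;
    row = 0;
}

/* Reference behavior: one stripe after another, always alternating. */
void draw_serial(void) {
    reset_canvas();
    for (int n = 0; n < STRIPES; n++)
        nextStripe(NULL);
}

/* Racy version from the slide: the result depends on thread interleaving. */
void draw_threaded(void) {
    pthread_t t[STRIPES];
    reset_canvas();
    for (int n = 0; n < STRIPES; n++) {
        int rc = pthread_create(&t[n], NULL, nextStripe, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int n = 0; n < STRIPES; n++)
        pthread_join(t[n], NULL);
}
```

Calling draw_serial and then inspecting canvas gives the intended flag. Calling draw_threaded repeatedly is a quick way to see stripes go missing or repeat.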

Page 18: Dthreads: Efficient Deterministic Multithreading


pthreads

race conditions

atomicity violations

deadlock

order violations

Page 19: Dthreads: Efficient Deterministic Multithreading


Salvation?

Page 21: Dthreads: Efficient Deterministic Multithreading


pthreads

race conditions

atomicity violations

deadlock

order violations

DTHREADS

deterministic

race conditions

atomicity violations

deadlock

order violations

Page 22: Dthreads: Efficient Deterministic Multithreading

DTHREADS Enables…

Race-free Executions

Replay Debugging w/o Logging

Replicated State Machines

Page 23: Dthreads: Efficient Deterministic Multithreading

[Figure: runtime relative to pthreads for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Y-axis from 0 to 6; the values 8.4 and 7.8 are labeled on the chart. Annotation: Overhead with CoreDet.]

DTHREADS: Efficient Determinism

Usually faster than the state of the art

Page 24: Dthreads: Efficient Deterministic Multithreading

[Figure: runtime relative to pthreads for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Y-axis from 0 to 6; the values 8.4 and 7.8 are labeled on the chart. Annotation: Overhead with CoreDet.]

DTHREADS: Efficient Determinism

Generally as fast or faster than pthreads

Page 25: Dthreads: Efficient Deterministic Multithreading

DTHREADS: Easy to Use

% g++ myprog.cpp -lpthread

Page 26: Dthreads: Efficient Deterministic Multithreading


Isolation

shared address space disjoint address spaces
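The contrast between a shared address space and disjoint address spaces can be seen with plain fork(). The sketch below is an illustration under assumptions, not Dthreads code: the global shared_counter and the function name are hypothetical. A child process writes to a global, and the parent still sees its own value because the write landed in the child's private copy of the page.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_counter = 0;  /* hypothetical global, for illustration only */

/* Fork a child that overwrites the global, wait for it, and return the
   value the parent still sees. Returns -1 if fork or waitpid fails. */
int parent_view_after_child_write(void) {
    shared_counter = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        shared_counter = 42;  /* modifies only the child's copy */
        _exit(shared_counter == 42 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    assert(shared_counter == 1);  /* the child's write never reached us */
    return shared_counter;
}
```

With threads, the same write would be visible to every other thread immediately, which is where the races on the earlier slides come from.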

Page 27: Dthreads: Efficient Deterministic Multithreading

Performance: Processes vs. Threads

[Figure: normalized execution time (y-axis, 0.0 to 1.4) versus thread execution time in ms (x-axis, 1 to 1024 in powers of two), with one series for threads and one for processes.]

Page 30: Dthreads: Efficient Deterministic Multithreading


“Shared Memory”

Page 31: Dthreads: Efficient Deterministic Multithreading

Snapshot pages before modifications

“Shared Memory”

Page 32: Dthreads: Efficient Deterministic Multithreading


Write back diffs

“Shared Memory”
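The two steps named on these slides, snapshotting a page before it is modified and later writing back only the differences, can be sketched as follows. This is a simplified illustration under assumptions: the page size, the function names, and the byte-by-byte comparison are all hypothetical and are not taken from the Dthreads source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Step 1: before a "thread" modifies its private copy of a page,
   keep an unmodified twin to compare against later. */
void snapshot_page(const unsigned char *working, unsigned char *twin) {
    assert(working != NULL && twin != NULL);
    memcpy(twin, working, SKETCH_PAGE_SIZE);
}

/* Step 2: copy to the shared page only the bytes this "thread" changed,
   so bytes changed by others are left alone.
   Returns the number of bytes written back. */
size_t commit_diffs(const unsigned char *working,
                    const unsigned char *twin,
                    unsigned char *shared) {
    assert(working != NULL && twin != NULL && shared != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            changed++;
        }
    }
    return changed;
}
```

Because only changed bytes are written back, two "threads" that touch different bytes of the same page can both commit without overwriting each other's updates.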

Page 33: Dthreads: Efficient Deterministic Multithreading


“Thread” 1

“Thread” 2

“Thread” 3

Update in Deterministic Time & Order

Phases: Parallel, Serial, Parallel

Synchronization points: mutex_lock, cond_wait, pthread_create
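The idea of working in parallel and then applying updates in a fixed order can be illustrated with a token that is handed from one worker to the next. This sketch uses ordinary pthreads and a hypothetical commit_log; it is an assumption-laden illustration of ordered commits, not how Dthreads itself is implemented.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4  /* hypothetical worker count */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int token = 0;               /* id of the worker allowed to commit */
static int commit_log[NTHREADS];    /* order in which results were committed */
static int commits = 0;

static void *worker(void *arg) {
    int id = *(int *)arg;
    int private_result = id * id;   /* parallel phase: private work */

    pthread_mutex_lock(&lock);      /* serial phase: wait for the token */
    while (token != id)
        pthread_cond_wait(&turn_changed, &lock);
    commit_log[commits++] = private_result;
    token++;                        /* pass the token to the next id */
    pthread_cond_broadcast(&turn_changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Run one round and return the number of commits made. */
int run_round(void) {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];

    token = 0;
    commits = 0;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return commits;
}
```

However the scheduler interleaves the workers, commit_log always ends up ordered by worker id, so repeated runs produce the same shared state.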

Page 34: Dthreads: Efficient Deterministic Multithreading

[Figure: runtime relative to pthreads for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Y-axis from 0 to 4.]

DTHREADS performance analysis

Page 35: Dthreads: Efficient Deterministic Multithreading

The Culprit: False Sharing

[Diagram labels: Thread 1 on Core 1, Thread 2 on Core 2, Main Memory, Invalidate]

Page 36: Dthreads: Efficient Deterministic Multithreading

The Culprit: False Sharing

[Diagram labels: Thread 1 on Core 1, Thread 2 on Core 2, Main Memory, Invalidate, 20x]
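False sharing arises when two threads update different variables that happen to sit in the same cache line, so each write invalidates the other core's copy. A common manual workaround in pthreads programs is to pad the data so each variable gets its own line. The sketch below shows that layout difference. The 64-byte line size and the struct names are assumptions, and the actual line size varies by CPU.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size; varies by CPU */

/* Adjacent counters: a and b are only sizeof(long) bytes apart, so they
   will usually share one cache line. */
struct packed_counters {
    long a;
    long b;
};

/* One counter padded out to a full cache line. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Padded layout: the two values are a full line apart. */
struct padded_counters {
    struct padded_counter a;
    struct padded_counter b;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "padded counter should fill exactly one assumed cache line");

/* Distance in bytes between the two counters in each layout. */
size_t packed_gap(void) {
    return offsetof(struct packed_counters, b) - offsetof(struct packed_counters, a);
}

size_t padded_gap(void) {
    return offsetof(struct padded_counters, b) - offsetof(struct padded_counters, a);
}
```

Two values that are a full line apart can never fall in the same cache line, whatever the base address of the struct.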

Page 37: Dthreads: Efficient Deterministic Multithreading

DTHREADS: Eliminates False Sharing!

[Diagram labels: Process 1 on Core 1, Process 2 on Core 2, Global State]

Page 38: Dthreads: Efficient Deterministic Multithreading

[Figure: runtime relative to pthreads for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Y-axis from 0 to 6.]

DTHREADS: Detailed Analysis

Page 41: Dthreads: Efficient Deterministic Multithreading

[Figure: scalability, shown as speedup of 8 cores over 2 cores for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), and hmean. Y-axis from 0 to 4.]

DTHREADS: Scalable Determinism

Page 44: Dthreads: Efficient Deterministic Multithreading

DTHREADS

% g++ myprog.cpp -lpthread

Page 45: Dthreads: Efficient Deterministic Multithreading


End

