DESCRIPTION
Dthreads is an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. It is easy to use: just link your program with -ldthread instead of -lpthread. Dthreads can be downloaded from its source code repository on GitHub (https://github.com/plasma-umass/dthreads). A technical paper describing Dthreads appeared at SOSP 2011 (https://github.com/plasma-umass/dthreads/blob/master/doc/dthreads-sosp11.pdf?raw=true).

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical: they require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This talk presents Dthreads. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system and, for a majority of the benchmarks evaluated, matches and occasionally exceeds the performance of pthreads.
Tongping Liu, Charlie Curtsinger, Emery Berger
DTHREADS: Efficient Deterministic Multithreading
Insanity: Doing the same thing over and over again and expecting different results.
2
In the Beginning…
3
There was the Core.
4
And it was Good.
5
It gave us our Daily Speed.
6
Until the Apocalypse.
7
And the Speed was no Moore.
8
And then came a False Prophet…
9
10
Want speed?
11
I BRING YOU THE GIFT OF PARALLELISM!
12
color = ; row = 0; // globals
void nextStripe(){
  for (c = 0; c < Width; c++)
    drawBox (c,row,color);
  color = (color == )? : ;
  row++;
}
for (n = 0; n < 9; n++)
  pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
  pthread_join(t[n]);
JUST USE THREADS…
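The slide's code is shorthand: the color values were swatches on the slide, and the pthread calls omit arguments. A compilable sketch of the same program might look like the following. It assumes two hypothetical colors (RED and WHITE), a hypothetical draw_box that records into an array instead of drawing, and a width of 8; none of these come from the talk. The globals are deliberately left unsynchronized, as on the slide, so the threaded result may differ from the serial reference from one run to the next.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { WIDTH = 8, STRIPES = 9 };                 /* assumed sizes */
typedef enum { NONE = 0, RED, WHITE } color_t;   /* hypothetical colors */

static color_t canvas[STRIPES][WIDTH];  /* stand-in for the screen */
static color_t color = RED;             /* globals shared by every thread */
static int row = 0;

/* Hypothetical drawBox: records the color instead of drawing it. */
static void draw_box(int c, int r, color_t col) {
    if (r >= 0 && r < STRIPES) canvas[r][c] = col;
}

/* Same logic as the slide: reads and writes the globals with no lock. */
static void *next_stripe(void *arg) {
    (void)arg;
    for (int c = 0; c < WIDTH; c++)
        draw_box(c, row, color);
    color = (color == RED) ? WHITE : RED;
    row++;
    return NULL;
}

static void reset(void) {
    for (int r = 0; r < STRIPES; r++)
        for (int c = 0; c < WIDTH; c++)
            canvas[r][c] = NONE;
    color = RED;
    row = 0;
}

/* Reference result: one stripe after another, no concurrency. */
void draw_serial(void) {
    reset();
    for (int n = 0; n < STRIPES; n++)
        next_stripe(NULL);
}

/* The slide's version: nine unsynchronized threads. Returns 0 on success. */
int draw_threaded(void) {
    pthread_t t[STRIPES];
    int started = 0;
    reset();
    while (started < STRIPES &&
           pthread_create(&t[started], NULL, next_stripe, NULL) == 0)
        started++;
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return started == STRIPES ? 0 : -1;
}
```

Calling draw_serial() always yields alternating stripes starting with RED. Calling draw_threaded() gives no such guarantee, because the threads race on color and row.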
13
14
15
16
17
18
pthreads
race conditions
atomicity violations
deadlock
order violations
19
Salvation?
20
21
pthreads
race conditions
atomicity violations
deadlock
order violations
DTHREADS
deterministic
race conditions
atomicity violations
deadlock
order violations
22
DTHREADS Enables…
Race-free Executions
Replay Debugging w/o Logging
Replicated State Machines
23
DTHREADS: Efficient Determinism
Usually faster than the state of the art
[Chart: Overhead with CoreDet. Runtime relative to pthreads (y-axis 0 to 6, with two off-scale bars labeled 8.4 and 7.8) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]
24
DTHREADS: Efficient Determinism
Generally as fast or faster than pthreads
[Chart repeated: Overhead with CoreDet. Runtime relative to pthreads for CoreDet, dthreads, and pthreads across the PHOENIX and PARSEC benchmarks.]
25
% g++ myprog.cpp -ldthread   (in place of -lpthread)
DTHREADS: Easy to Use
26
Isolation
shared address space vs. disjoint address spaces
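The difference between the two pictures can be seen with plain POSIX calls. The sketch below is not Dthreads code, and its function names are made up for illustration. It assumes a POSIX system with fork and anonymous shared mappings. After fork, an ordinary global lives in a private copy-on-write page, so the child's write never reaches the parent; only memory mapped with MAP_SHARED stays visible to both.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter;   /* ordinary global: private to each process after fork */

/* Returns the parent's view of `counter` after a child changed its own copy. */
int private_view_after_child_write(void) {
    counter = 1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {          /* child: a "thread" in its own address space */
        counter = 42;        /* lands in the child's copy-on-write page */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0) return -1;
    return counter;          /* the parent still sees 1 */
}

/* Same experiment, but the integer lives in an explicitly shared mapping. */
int shared_view_after_child_write(void) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 1;
    pid_t pid = fork();
    if (pid == 0) {
        *shared = 42;        /* visible to the parent */
        _exit(0);
    }
    int seen = -1;
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        seen = *shared;      /* the parent sees 42 */
    munmap(shared, sizeof *shared);
    return seen;
}
```

Isolation is therefore the default between processes, and sharing has to be reintroduced deliberately, which is what makes controlled, ordered updates possible.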
27
Performance: Processes vs. Threads
[Chart: normalized execution time (y-axis, 0.0 to 1.4) for threads and processes as thread execution time (x-axis) grows from 1 to 1024 ms.]
28
“Shared Memory”
31
Snapshot pages before modifications
“Shared Memory”
32
Write back diffs
“Shared Memory”
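The snapshot-and-diff idea from these slides can be sketched in a few lines. This is a simplified model under stated assumptions, not Dthreads' implementation: the 4096-byte page size, the byte-granularity comparison, and every function name here are illustrative. Each worker keeps a snapshot (a "twin") of a page taken before its first write, and at commit time copies back only the bytes that differ from the twin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PAGE_BYTES = 4096 };   /* assumed page size */

/* Copy a page before its first modification so later changes can be found. */
void snapshot_page(unsigned char *twin, const unsigned char *working) {
    memcpy(twin, working, PAGE_BYTES);
}

/* Write back only the bytes this worker changed since the snapshot.
   Returns the number of bytes committed. */
size_t commit_diffs(unsigned char *shared, const unsigned char *working,
                    const unsigned char *twin) {
    size_t changed = 0;
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            changed++;
        }
    }
    return changed;
}

/* Two workers modify different bytes of the same page; both writes survive.
   Returns 1 when the merged page holds both updates. */
int demo_two_writers(void) {
    static unsigned char shared[PAGE_BYTES];
    static unsigned char work_a[PAGE_BYTES], twin_a[PAGE_BYTES];
    static unsigned char work_b[PAGE_BYTES], twin_b[PAGE_BYTES];

    memset(shared, 0, PAGE_BYTES);
    memcpy(work_a, shared, PAGE_BYTES);
    snapshot_page(twin_a, work_a);
    memcpy(work_b, shared, PAGE_BYTES);
    snapshot_page(twin_b, work_b);

    work_a[0] = 1;   /* worker A's private write */
    work_b[1] = 2;   /* worker B's private write */

    size_t na = commit_diffs(shared, work_a, twin_a);
    size_t nb = commit_diffs(shared, work_b, twin_b);
    return na == 1 && nb == 1 && shared[0] == 1 && shared[1] == 2;
}
```

Because worker B commits only the byte it changed, its stale copy of byte 0 does not overwrite worker A's update.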
33
“Thread” 1
“Thread” 2
“Thread” 3
Phases: Parallel, Serial, Parallel
Update in Deterministic Time & Order
Synchronization points: mutex_lock, cond_wait, pthread_create
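One way to picture "deterministic order" is to apply pending updates by thread ID rather than by arrival time. The sketch below is a toy model of the serial phase, not the Dthreads commit protocol; the update_t record and the function names are hypothetical. Two threads race to write the same byte, and the outcome is the same whichever one reaches the synchronization point first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one pending write from one thread. */
typedef struct {
    int thread_id;
    size_t offset;
    unsigned char value;
} update_t;

static int by_thread_id(const void *a, const void *b) {
    const update_t *x = a, *y = b;
    return (x->thread_id > y->thread_id) - (x->thread_id < y->thread_id);
}

/* Serial phase: apply updates in thread-ID order, ignoring arrival order.
   Assumes each thread has at most one pending value per offset. */
void commit_in_order(unsigned char *mem, size_t mem_len,
                     update_t *pending, size_t n) {
    qsort(pending, n, sizeof *pending, by_thread_id);
    for (size_t i = 0; i < n; i++)
        if (pending[i].offset < mem_len)
            mem[pending[i].offset] = pending[i].value;
}

/* Threads 1 and 2 both write offset 0; swap_arrival flips who arrives first.
   Returns the byte that ends up in memory. */
unsigned char racy_result(int swap_arrival) {
    unsigned char mem[4] = {0};
    update_t from_t1 = {1, 0, 'A'};
    update_t from_t2 = {2, 0, 'B'};
    update_t pending[2];
    pending[0] = swap_arrival ? from_t2 : from_t1;
    pending[1] = swap_arrival ? from_t1 : from_t2;
    commit_in_order(mem, sizeof mem, pending, 2);
    return mem[0];
}
```

In this model the race is still there, but it resolves the same way on every run: the higher thread ID commits last, so racy_result returns 'B' for either arrival order.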
34
DTHREADS performance analysis
[Chart: runtime relative to pthreads (y-axis 0 to 4) for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]
35
Thread 1
Main Memory
Core 1
Thread 2
Core 2
Invalidate
The Culprit: False Sharing
36
Thread 1 Thread 2
Invalidate
Main Memory
Core 1 Core 2
The Culprit: False Sharing
20x
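False sharing is a matter of data layout: two variables that are logically independent but sit in the same cache line force the cores to invalidate each other's copy on every write. The sketch below only inspects layouts and measures nothing. It assumes 64-byte cache lines, and the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

enum { CACHE_LINE = 64 };   /* assumption: 64-byte cache lines */

/* Two per-thread counters packed side by side: they share one cache line. */
struct adjacent_counters {
    long a;   /* written by thread 1 */
    long b;   /* written by thread 2 */
};

/* The usual manual fix: give each counter a cache line of its own. */
struct padded_counters {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};

/* Offsets are relative to a struct assumed to start on a line boundary. */
int same_cache_line(size_t off1, size_t off2) {
    return off1 / CACHE_LINE == off2 / CACHE_LINE;
}
```

With the adjacent layout both fields fall in the same line, while the padded layout places them 64 bytes apart. The padding is the manual workaround under pthreads; the next slide shows how separate processes avoid the problem without changing the layout.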
37
Process 1 Process 2
Global State
Core 1 Core 2
Process 2
Process 1
DTHREADS: Eliminates False Sharing!
38
DTHREADS: Detailed Analysis
[Chart: runtime relative to pthreads (y-axis 0 to 6) for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]
39
DTHREADS: Scalable Determinism
[Chart: Scalability. Speedup of 8 cores over 2 cores (y-axis 0 to 4) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), and hmean.]
42
DTHREADS
% g++ myprog.cpp -ldthread   (in place of -lpthread)
45
End