Sound and Precise Analysis of Parallel Programs through Schedule Specialization
Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang
Columbia University


Motivation

[Figure: soundness (# of analyzed schedules / # of total schedules) versus precision. Static analysis covers the total schedules; dynamic analysis covers only the analyzed schedules; a "?" marks the goal of being both sound and precise.]

• Analyzing parallel programs is difficult: an analysis must account for the many possible thread schedules.


Schedule Specialization

• Precision: Analyze the program over a small set of schedules.
• Soundness: Enforce these schedules at runtime.

[Figure: the same soundness/precision plot with schedule specialization added. Its analyzed schedules are also its enforced schedules, so restricting the analysis to them does not sacrifice soundness.]


Enforcing Schedules Using Peregrine

• Deterministic multithreading
  – e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
  – Performance overhead
    • e.g. Kendo: 16%, Tern & Peregrine: 39.1%
• Peregrine
  – Records schedules and reuses them on a wide range of inputs.
  – Represents schedules explicitly.
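Peregrine's actual mechanism is not described on this slide. As a rough illustration of what enforcing a schedule means, the C sketch below (an assumption, not Peregrine's implementation) replays a recorded total order of synchronizations with a global turn counter: a thread may perform its next synchronization only when the schedule says it is that thread's turn. The `schedule` array, `wait_turn`, `pass_turn`, `ids`, and `run_workers` are hypothetical names; `worker` is a stripped-down version of the running example's worker.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical recorded schedule (not Peregrine's actual format):
   schedule[k] is the id of the thread that performs the k-th
   synchronization. Here: T1 lock, T1 unlock, T2 lock, T2 unlock. */
static const int schedule[] = {1, 1, 2, 2};
#define SCHEDULE_LEN ((int)(sizeof schedule / sizeof schedule[0]))

static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_cv = PTHREAD_COND_INITIALIZER;
static int turn = 0; /* index of the next synchronization in the schedule */

/* Block until the next scheduled synchronization belongs to thread tid.
   Once the schedule is exhausted, threads are no longer ordered. */
static void wait_turn(int tid) {
    pthread_mutex_lock(&turn_lock);
    while (turn < SCHEDULE_LEN && schedule[turn] != tid)
        pthread_cond_wait(&turn_cv, &turn_lock);
    pthread_mutex_unlock(&turn_lock);
}

/* Mark the current synchronization as done and wake all waiting threads. */
static void pass_turn(void) {
    pthread_mutex_lock(&turn_lock);
    turn++;
    pthread_cond_broadcast(&turn_cv);
    pthread_mutex_unlock(&turn_lock);
}

static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;
static int global_id = 0;
static int ids[3]; /* ids[tid]: the my_id value obtained by thread tid */

/* Running-example worker with each synchronization wrapped in a turn. */
static void *worker(void *arg) {
    int tid = *(const int *)arg;
    wait_turn(tid);
    pthread_mutex_lock(&global_id_lock);
    pass_turn();
    int my_id = global_id++;
    wait_turn(tid);
    pthread_mutex_unlock(&global_id_lock);
    pass_turn();
    ids[tid] = my_id;
    return NULL;
}

/* Create thread 2 before thread 1: the schedule, not the creation
   order, decides which thread enters the critical section first. */
static void run_workers(void) {
    static int tids[2] = {1, 2};
    pthread_t t[2];
    pthread_create(&t[1], NULL, worker, &tids[1]);
    pthread_create(&t[0], NULL, worker, &tids[0]);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
}
```

After `run_workers()` returns, `ids[1]` is always 0 and `ids[2]` is always 1, whatever the OS scheduler does. Only the workers' lock and unlock are ordered in this sketch; the creates and joins are not.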


Framework

• Extract control flow and data flow enforced by a set of schedules

[Figure: the schedule specialization framework. Inputs: Program (a C/C++ program with Pthread) and Schedule (a total order of synchronizations). Outputs: Specialized Program and extra def-use chains.]
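The framework's schedule input is a total order of synchronizations. The C sketch below shows one possible encoding; the `sync_event` struct, `example_schedule`, and `project` are hypothetical, not the framework's actual format. The schedule is an array of (thread, operation) pairs, and projecting it onto a single thread yields the sequence that thread's code must follow, e.g. create, create, join, join for thread 0 of the running example.

```c
#include <stddef.h>

/* Hypothetical encoding of a schedule: (thread, operation) pairs
   listed in the enforced total order. */
enum sync_op { OP_CREATE, OP_JOIN, OP_LOCK, OP_UNLOCK };

struct sync_event {
    int tid;          /* thread that performs the synchronization */
    enum sync_op op;  /* which synchronization it performs */
};

/* One total order consistent with the running example for p = 2. */
static const struct sync_event example_schedule[] = {
    {0, OP_CREATE}, {0, OP_CREATE},
    {1, OP_LOCK},   {1, OP_UNLOCK},
    {2, OP_LOCK},   {2, OP_UNLOCK},
    {0, OP_JOIN},   {0, OP_JOIN},
};

/* Copy thread tid's operations, in schedule order, into out[].
   Returns how many were written (never more than cap). */
static size_t project(const struct sync_event *s, size_t n, int tid,
                      enum sync_op *out, size_t cap) {
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++)
        if (s[i].tid == tid)
            out[k++] = s[i].op;
    return k;
}
```

Projecting onto thread 1 yields lock, unlock. The control-flow specialization slides walk through this per-thread view for thread 0.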


Outline

• Example
• Control-Flow Specialization
• Data-Flow Specialization
• Results
• Conclusion


Running Example

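#include <pthread.h>
#include <stdlib.h>

#define p_max 16          /* hypothetical bound; the slide does not give a value */

int compute(int id);      /* the per-thread computation, defined elsewhere */
void *worker(void *arg);

int results[p_max];
int global_id = 0;
pthread_t child[p_max];
pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;

int main(int argc, char *argv[]) {
    int i;
    int p = atoi(argv[1]);  /* number of workers; assumed 1 <= p <= p_max */
    for (i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
    for (i = 0; i < p; ++i)
        pthread_join(child[i], 0);
    return 0;
}

void *worker(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;  /* each worker claims a distinct id */
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}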

[Figure: a schedule for p = 2. Thread 0: create, create, join, join. Thread 1: lock, unlock. Thread 2: lock, unlock.]

Is the program race-free?


Control-Flow Specialization

[Figure: the control-flow graph of main (atoi, i = 0, i < p, create, ++i; then i = 0, i < p, join, ++i; return) beside Thread 0's synchronizations in the schedule: create, create, join, join. The specialization builds a path through this graph that performs exactly these synchronizations, in order.]


[Figure: the path through main's control-flow graph that performs Thread 0's synchronizations (create, create, join, join) in order: atoi → i = 0 → i < p → create → ++i → i < p → create → ++i → i < p → i = 0 → i < p → join → ++i → i < p → join → ++i → i < p → return.]


Control-Flow Specialized Program

int main(int argc, char *argv[]) {
    int i;
    int p = atoi(argv[1]);
    i = 0;
    // i < p == true
    pthread_create(&child[i], 0, worker.clone1, 0);
    ++i;
    // i < p == true
    pthread_create(&child[i], 0, worker.clone2, 0);
    ++i;
    // i < p == false
    i = 0;
    // i < p == true
    pthread_join(child[i], 0);
    ++i;
    // i < p == true
    pthread_join(child[i], 0);
    ++i;
    // i < p == false
    return 0;
}
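The comments in the specialized main record how each `i < p` test must evaluate for the schedule to occur. A toy model (an assumption, much simpler than real control-flow specialization) makes this reasoning concrete: encode Thread 0's synchronizations as a string (`'C'` for `pthread_create`, `'J'` for `pthread_join`). Because each loop iteration in main performs exactly one synchronization, the schedule fixes both loops' trip counts and therefore the value of `p`. `required_p` is a hypothetical helper.

```c
#include <stddef.h>

/* Toy model: ops is Thread 0's projected synchronizations, e.g. "CCJJ".
   main() runs one create per iteration of its first loop and one join
   per iteration of its second loop, so the schedule fixes each loop's
   trip count. Returns the value p must have, or -1 if main() cannot
   produce the sequence. */
static int required_p(const char *ops) {
    size_t creates = 0, joins = 0, i = 0;
    while (ops[i] == 'C') { creates++; i++; }  /* i < p was true */
    while (ops[i] == 'J') { joins++; i++; }    /* i < p was true */
    if (ops[i] != '\0' || creates != joins)
        return -1;  /* leftover ops, or the loops disagree about p */
    return (int)creates;  /* both loops exit once i < p is false */
}
```

For the running example's schedule, `required_p("CCJJ")` is 2, which matches the true, true, false pattern recorded for both loops; a sequence such as `"CCJ"` cannot come from this main and yields -1.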



More Challenges on Control-Flow Specialization

• Ambiguity
  [Figure: a caller and a callee connected by call and ret edges, with synchronizations S1 and S2.]
• A schedule has too many synchronizations


Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}

void *worker.clone2(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}

[Figure: the schedule annotated with data flow. Initially global_id = 0. Thread 0: create, create, join, join. Thread 1 and Thread 2 each execute lock; my_id = global_id; global_id++; unlock.]


Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    global_id = 1;
    pthread_mutex_unlock(&global_id_lock);
    results[0] = compute(0);
    return 0;
}

void *worker.clone2(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    global_id = 2;
    pthread_mutex_unlock(&global_id_lock);
    results[1] = compute(1);
    return 0;
}

[Figure: the same schedule after propagating values in lock order. Thread 1's critical section yields my_id = 0, global_id = 1; Thread 2's then yields my_id = 1, global_id = 2.]
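Data-flow specialization relies on the schedule fixing the order of the critical sections on `global_id_lock`. The toy replay below (a hypothetical helper, not the framework's algorithm) visits those critical sections in schedule order and tracks `global_id`, which is how the constants 0 and 1 in the specialized clones arise.

```c
/* Toy model: lock_order[k] is the thread that runs the k-th critical
   section on global_id_lock; my_id[tid] receives the value that thread
   reads. Returns the final value of global_id. */
static int replay_global_id(const int *lock_order, int n, int *my_id) {
    int global_id = 0;                       /* int global_id = 0; */
    for (int k = 0; k < n; k++)
        my_id[lock_order[k]] = global_id++;  /* int my_id = global_id++; */
    return global_id;
}
```

With the order {1, 2}, thread 1 reads 0 and thread 2 reads 1, as on the slide. Reversing the order swaps the constants, which is why the specialized program is valid only while the runtime enforces the same schedule.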


More Challenges on Data-Flow Specialization

• Must/May alias analysis
  – global_id
• Reasoning about integers
  – results[0] = compute(0)
  – results[1] = compute(1)
• Many def-use chains
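Why do constant indices matter to the race detector? Writes to `results[a]` and `results[b]` from different threads can race only if `a == b`. The sketch below (the `index_val` type and `may_alias` are hypothetical) models a conservative index check. On the original program `my_id` is unknown, so the check must report a possible race; on the specialized clones the indices are the constants 0 and 1, so the accesses are provably disjoint.

```c
#include <stdbool.h>

/* Hypothetical abstraction of an array index seen by a static
   analysis: either a known constant or an unknown value. */
struct index_val {
    bool is_const;
    int value;  /* meaningful only when is_const is true */
};

/* results[a] and results[b] may name the same element unless both
   indices are known constants that differ. */
static bool may_alias(struct index_val a, struct index_val b) {
    if (a.is_const && b.is_const)
        return a.value == b.value;
    return true;  /* unknown index: must assume they may overlap */
}
```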


Evaluation

• Applications
  – Static race detector
  – Alias analyzer
  – Path slicer
• Programs
  – PBZip2 1.1.5
  – aget 0.4.1
  – 8 programs in SPLASH2
  – 7 programs in PARSEC


Static Race Detector: # of False Positives

Program         Original  Specialized
aget                  72            0
PBZip2               125            0
fft                   96            0
blackscholes           3            0
swaptions            165            0
streamcluster          4            0
canneal               21            0
bodytrack              4            0
ferret                 6            0
raytrace             215            0
cholesky              31            7
radix                 53           14
water-spatial       2447         1799
lu-contig             18           18
barnes               370          369
water-nsquared       354          333
ocean                331          292
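The table can be summarized with simple arithmetic. The C helper below transcribes the table's rows (the struct and function names are illustrative) and computes the total false positives in each column and the number of programs for which specialization removes every false positive.

```c
#include <stddef.h>

/* One row of the Static Race Detector table. */
struct fp_row {
    const char *program;
    int original;     /* false positives on the original program */
    int specialized;  /* false positives on the specialized program */
};

/* Values transcribed from the table. */
static const struct fp_row fp_table[] = {
    {"aget", 72, 0},               {"PBZip2", 125, 0},
    {"fft", 96, 0},                {"blackscholes", 3, 0},
    {"swaptions", 165, 0},         {"streamcluster", 4, 0},
    {"canneal", 21, 0},            {"bodytrack", 4, 0},
    {"ferret", 6, 0},              {"raytrace", 215, 0},
    {"cholesky", 31, 7},           {"radix", 53, 14},
    {"water-spatial", 2447, 1799}, {"lu-contig", 18, 18},
    {"barnes", 370, 369},          {"water-nsquared", 354, 333},
    {"ocean", 331, 292},
};
#define FP_ROWS (sizeof fp_table / sizeof fp_table[0])

/* Sum one column: specialized == 0 for Original, nonzero for Specialized. */
static int total_fp(int specialized) {
    int sum = 0;
    for (size_t i = 0; i < FP_ROWS; i++)
        sum += specialized ? fp_table[i].specialized : fp_table[i].original;
    return sum;
}

/* Count programs whose false positives all disappear after specialization. */
static int programs_fully_cleared(void) {
    int n = 0;
    for (size_t i = 0; i < FP_ROWS; i++)
        if (fp_table[i].original > 0 && fp_table[i].specialized == 0)
            n++;
    return n;
}
```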


Static Race Detector: Harmful Races Detected

• 4 in aget
• 2 in radix
• 1 in fft


Precision of Schedule-Aware Alias Analysis


Conclusion and Future Work

• Designed and implemented a schedule specialization framework
  – Analyzes the program over a small set of schedules
  – Enforces these schedules at runtime
• Built and evaluated three applications
  – Easy to use
  – Precise
• Future work
  – More applications
  – Similar specialization ideas on sequential programs


Related Work

• Program analysis for parallel programs
  – Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09)
• Slicing
  – Horgan (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04)
• Deterministic multithreading
  – DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
• Program specialization
  – Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)


Backup Slides


Specialization Time


Handling Races

• We do not assume data-race freedom.
• We could assume it if our only goal were optimization.


Input Coverage

• Use runtime verification for the inputs not covered

• A small set of schedules can cover a wide range of inputs
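The slide does not say how runtime verification is done. One minimal way to picture it is a precondition check before dispatch, assuming the specialized program's precondition is the one visible in the running example (`p == 2`, from its hard-coded loop conditions). If the input fails the check, the original program runs instead. `run_specialized`, `run_original`, and `dispatch` are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: each returns a marker showing which version
   ran (1 = specialized, 0 = original). */
static int run_specialized(int p) { (void)p; return 1; }
static int run_original(int p)    { (void)p; return 0; }

/* Use the specialized program only when the input satisfies the
   precondition it was built under; otherwise fall back to the
   original program. */
static int dispatch(const char *arg) {
    int p = atoi(arg);
    if (p == 2)
        return run_specialized(p);
    return run_original(p);
}
```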
