Dancing With Uncertainty
Saša Misailović
Stelios Sidiroglou
Martin Rinard
MIT CSAIL
Example
Water: Simulates a system of water molecules
[Figure: water molecules (H2O)]
Dubstep
Explores the effects of selectively removing synchronization
Dubstep Highlights
1. Removing locks and opportunistic barriers trade accuracy for performance
2. Automatically explores the tradeoff space induced by candidate transformations
3. Uses statistical analysis to characterize impact of transformations on accuracy
Dubstep Workflow
Prepare
Find
Transform
Analyze
Navigate
Dubstep Workflow: Prepare
1. Prepare representative inputs
2. Prepare accuracy model
   – Output abstraction (important parts of output)
   – Accuracy bound (amount of tolerable error)
Dubstep Workflow: Find
Loops with parallel constructs
• Profiling: performance & memory
   – Interf (56.4%)
   – Poteng (43.4%)
Dubstep Workflow: Transform
Removing synchronization
void scratchPad::updateForces(double R[3][3]) {
    mutex_lock(this->lock);
    this->H1force.vecAdd(R[0]);
    this->Oforce.vecAdd(R[1]);
    this->H2force.vecAdd(R[2]);
    mutex_unlock(this->lock);
}
Removing synchronization (transformed version, lock removed)
void scratchPad::updateForces(double R[3][3]) {
    this->H1force.vecAdd(R[0]);
    this->Oforce.vecAdd(R[1]);
    this->H2force.vecAdd(R[2]);
}
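The lock-removal transformation can be sketched as a compile-time switch on the same method. This is only an illustration: `Vec3`, the `ScratchPad` template, the `UseLock` parameter, and `checkVariantsAgree` are hypothetical names, and `std::mutex` stands in for the slide's `mutex_lock`/`mutex_unlock`. Note that in standard C++ concurrent unsynchronized updates are a data race (formally undefined behavior); the deck treats the resulting lost updates as accuracy loss to be measured.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the force vectors used on the slide.
struct Vec3 {
    double v[3] = {0.0, 0.0, 0.0};
    void vecAdd(const double r[3]) {
        for (int i = 0; i < 3; ++i) v[i] += r[i];
    }
};

// UseLock = true mirrors the original method; false mirrors the transformed one.
template <bool UseLock>
struct ScratchPad {
    Vec3 H1force, Oforce, H2force;
    std::mutex lock;

    void updateForces(const double R[3][3]) {
        if (UseLock) lock.lock();
        H1force.vecAdd(R[0]);
        Oforce.vecAdd(R[1]);
        H2force.vecAdd(R[2]);
        if (UseLock) lock.unlock();
    }
};

// Single-threaded sanity check: with no contention, both variants agree.
inline void checkVariantsAgree() {
    const double R[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    ScratchPad<true> locked;
    ScratchPad<false> unlocked;
    for (int i = 0; i < 4; ++i) {
        locked.updateForces(R);
        unlocked.updateForces(R);
    }
    for (int k = 0; k < 3; ++k) {
        assert(locked.H1force.v[k] == unlocked.H1force.v[k]);
        assert(locked.Oforce.v[k] == unlocked.Oforce.v[k]);
        assert(locked.H2force.v[k] == unlocked.H2force.v[k]);
    }
}
```

The two variants differ only under contention, which is why the workflow follows the transformation with a separate analysis step.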
Dubstep Workflow: Transform
Opportunistic barriers
void ensemble::interf() {
    parallel_for(interf_body, 0, NumMol-1);
}
• Schedule threads
• Execute interf_body in parallel
• Wait for all threads to complete
Opportunistic barriers (transformed version)
void ensemble::interf() {
    parallel_for*(interf_body, 0, NumMol-1);
}
• Schedule threads
• Execute interf_body in parallel
• Wait for half of threads to complete
• Instruct remaining threads to stop
[Rinard, OOPSLA 2007]
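The four steps listed for `parallel_for*` can be sketched with standard C++ threads. This is a minimal sketch under assumptions, not the runtime used in the talk: the function name `parallelForOpportunistic`, the cyclic assignment of iterations to workers, and the stop flag polled between iterations are all illustrative choices.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Runs body(i) for i in [first, last] on numThreads workers, waits until half
// of the workers have finished, then tells the remaining workers to stop.
// Returns the number of iterations that actually ran.
inline int parallelForOpportunistic(const std::function<void(int)>& body,
                                    int first, int last, int numThreads) {
    assert(numThreads > 0 && first <= last);
    std::atomic<bool> stop{false};
    std::atomic<int> executed{0};
    std::mutex m;
    std::condition_variable cv;
    int finished = 0;  // guarded by m

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            // Cyclic schedule: worker t handles first+t, first+t+numThreads, ...
            for (int i = first + t; i <= last; i += numThreads) {
                if (stop.load()) break;  // asked to abandon remaining work
                body(i);
                ++executed;
            }
            std::lock_guard<std::mutex> guard(m);
            ++finished;
            cv.notify_one();
        });
    }

    // Opportunistic barrier: wait for half of the workers, not all of them.
    const int quorum = (numThreads + 1) / 2;
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return finished >= quorum; });
    }
    stop.store(true);
    for (auto& w : workers) w.join();  // stragglers exit after their current iteration
    return executed.load();
}
```

With 4 workers over 1000 iterations, at least the two workers that reached the barrier ran all of their iterations, so between 500 and 1000 iterations execute; the exact count varies from run to run.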
Dubstep Workflow: Analyze
Analyze transformed program:
• Criticality – Memory safety, integrity
• Performance – Speedup comparison
• Accuracy – Statistical analysis
Dubstep Workflow: Analyze
[Figure: the same input is run through the original program and the transformed program; an application-specific output abstraction is applied to each output, and the difference between the two is compared against the bound δ.]
Dubstep Workflow: Navigate
Navigate the tradeoff space:
• Transform and analyze one location at a time
   – 3 locations in water
• Transform multiple locations in the same candidate program
   – Guided by the results of the previous step
Search Space Exploration
[Figure: Average Accuracy Loss vs. Speedup. x-axis: accuracy loss (0 to 0.07); y-axis: relative speedup (1 to 1.25). Plotted points: LI, BI, BR, LI+BI, LI+BP, BI+BP, LI+BI+BP.]
LI – Synchronization Interf
BI – Barrier Interf
BP – Barrier Poteng
Baseline: original parallel program runs 6.2 times faster than sequential on 8 cores
How confident can we be about these observations?
Execution Reliability
The probability p that the transformed program, on the given input, produces a result with error within the bound δ:
$$p = \Pr\left[\left|\frac{\mathit{Res}_O - \mathit{Res}_T}{\mathit{Res}_O}\right| \le \delta\right]$$
While we cannot model p, we can specify a minimum acceptable reliability r.
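As a small illustration of the quantity inside the probability, the check for a single run can be written as below. The function names `relativeError` and `withinBound` are hypothetical, and the sketch assumes the abstracted output is a single nonzero number; real output abstractions are application-specific.

```cpp
#include <cassert>
#include <cmath>

// Relative difference between the original and transformed results.
inline double relativeError(double resO, double resT) {
    assert(resO != 0.0);  // the formula divides by the original result
    return std::fabs((resO - resT) / resO);
}

// One observation for the reliability estimate: true when the run is acceptable.
inline bool withinBound(double resO, double resT, double delta) {
    return relativeError(resO, resT) <= delta;
}
```

For example, with an original result of 100 and a bound of 0.05, a transformed result of 97 is acceptable and a result of 90 is not.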
Execution Reliability
Determine if program's reliability p > r:
• Repeat execution N times
• Observations: 1 if the error is within the bound δ, else 0
• Compute statistic p'
• Return Yes if p' > r + ε (ε: tolerance region)
• Return No otherwise
How to pick N?
How Many Runs Are Enough?
Procedure that determines that p > r:
• Returns correct result most of the time
   – Wrong decision rate
   – Tolerance region ε
• Quickly determines extreme (very good or bad) transformations
Statistical Analysis
Sequential Probability Ratio Test
Two hypotheses:
H0: p > r + ε
H1: p < r
• Collects one observation in every iteration
• Updates likelihoods of H0 and H1 based on the previous observation
• Stops when the probability of a wrong decision is less than the specified rate
• N is not fixed; it depends on the observations
Statistical Analysis
Sequential Probability Ratio Test
• r (acceptable reliability) = 0.90
• wrong decision rate = 0.10
• ε (tolerance region) = 0.02
• If program always produces acceptable result, test says Yes after 100 runs
• If program never produces acceptable result, test says No after 10 runs
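The run counts above can be reproduced with a textbook form of Wald's sequential probability ratio test. The sketch below is an assumption about the setup rather than the talk's implementation: it tests p = r + ε against p = r, uses the single wrong decision rate for both error types, and the names `sprt`, `SprtResult`, `runOnce`, and `maxRuns` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct SprtResult {
    bool accepted;  // true = "Yes" (reliability judged above r)
    int runs;       // number of executions consumed
};

// Wald's SPRT for a Bernoulli success rate.
// runOnce() returns true when one execution is within the accuracy bound.
// alpha is used for both error rates here, which is an assumption.
inline SprtResult sprt(const std::function<bool()>& runOnce,
                       double r, double eps, double alpha, int maxRuns) {
    assert(r > 0.0 && r + eps < 1.0 && alpha > 0.0 && alpha < 0.5);
    const double pHi = r + eps;
    const double pLo = r;
    const double upper = std::log((1.0 - alpha) / alpha);         // decide "Yes"
    const double lower = std::log(alpha / (1.0 - alpha));         // decide "No"
    const double stepOk = std::log(pHi / pLo);                    // evidence from a success
    const double stepBad = std::log((1.0 - pHi) / (1.0 - pLo));   // evidence from a failure
    double llr = 0.0;  // accumulated log-likelihood ratio
    for (int n = 1; n <= maxRuns; ++n) {
        llr += runOnce() ? stepOk : stepBad;
        if (llr >= upper) return {true, n};
        if (llr <= lower) return {false, n};
    }
    return {false, maxRuns};  // undecided within the budget: conservatively "No"
}
```

With r = 0.90, ε = 0.02, and a rate of 0.10, each success adds ln(0.92/0.90) ≈ 0.022 toward the threshold ln 9 ≈ 2.197, so an always-acceptable program is accepted on run 100; each failure adds ln(0.08/0.10) ≈ −0.223, so a never-acceptable program is rejected on run 10.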
Bound (δ)    Best Transformation
0.01         LI
0.05         LI
0.10         LI+BI+BR
0.15         LI+BI+BR
Exploring Tradeoff Space
Start: Sequential program with for loops
Transformations:
• Parallel loop introduction
• Synchronization, Replication
Quickstep [MIT-TR-2010-38, TECS/PEC 2012]
Exploring Tradeoff Space
Start: Program with for loops
Transformations:
• Skip loop iterations (multiple forms)
Loop Perforation [ICSE 2010, ONWARD 2010, SAS 2011, FSE 2011]
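One simple way to skip loop iterations is to visit only every k-th element. The sketch below shows that strided form on a mean computation; it is an illustrative assumption (the slide mentions multiple forms without listing them), and `perforatedMean` and `stride` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of the data, visiting only every `stride`-th element.
// stride == 1 is the original loop; larger strides skip iterations.
inline double perforatedMean(const std::vector<double>& data, std::size_t stride) {
    assert(stride >= 1 && !data.empty());
    double sum = 0.0;
    std::size_t visited = 0;
    for (std::size_t i = 0; i < data.size(); i += stride) {
        sum += data[i];
        ++visited;
    }
    return sum / static_cast<double>(visited);
}
```

On the values 0 through 99, the full loop gives a mean of 49.5, while a stride of 2 does half the work and gives 49.0, a relative error of about 1%.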
Exploring Tradeoff Space
Start: Program with command line parameters
Transformations:
• Alternate function versions activated by CL parameters
Dynamic Knobs [ASPLOS 2011]
Exploring Tradeoff Space
Start: Program is a tree of Map-Reduce type tasks
Transformations:
• Function Substitution
• Reduction Sampling
NapRed [POPL 2012]
Exploring Tradeoff Space
Start: Parallel program with for loops
Transformations:
• Removing Locks
• Opportunistic Barriers
Dubstep [Today: RACES 2012]
Reasoning About Accuracy
Exploring levels of accuracy guarantees:
• Logic-based
• Probabilistic
• Statistical
• Empirical