GRASS : Trimming Stragglers in Approximation …...GRASS : Trimming Stragglers in Approximation...

GRASS: Trimming Stragglers in

Approximation Analytics

Ganesh Ananthanarayanan, Michael Hung,

Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu

Next Generation of Analytics

• Timely results, even if approximate

– Data deluge makes this necessary

Optimal Scheduler

Approximation Dimensions

� Error: Minimize time

to get desired accuracy

“#cars sold to the nearest

thousand”

�Deadline: Maximize

accuracy within deadline

“Pick the best ad to display

within 2s”

*w.r.t. state-of-the-art schedulers (production workloads from

Facebook and Bing)

Improve accuracy by 48% Speedup by 40%

• Prioritize tasks

– Subset of tasks to complete

– #tasks » #slots (multi-waved jobs)

(NP-Hard but many known heuristics…)

• Straggler tasks

– Slowest task can be 8x slower than median task

– Speculation: Spawn a duplicate, earliest wins

• Google[OSDI’04], FB[OSDI’08], Microsoft[OSDI’10]

Scheduling Challenge

Challenge: dynamically prioritize between

speculative & unscheduled tasks

to meet deadline/error bound

Speculative copies consume extra resources

T3T3T3T3

Opportunity Cost

T2T2T2T2

timetimetimetime

Slot 1Slot 1Slot 1Slot 1

Slot 3Slot 3Slot 3Slot 3 T1T1T1T1

55550000 101010109999

Is speculation

worth the

payoff?

T1T1T1T1

Roadmap

1. Two natural scheduling designs

2. GRASS: Combining the two designs

3. Evaluation of GRASS

Greedy Scheduling (GS)

Greedily improve accuracy, i.e., earliest finishing task

T1T1T1T1

T2T2T2T2 T3T3T3T3timetimetimetime

11110000

T4T4T4T4 T5T5T5T5 T6T6T6T6

T7T7T7T7

Task ID T1 T2 T3 T4 T5 T6 T7 T8 T9

remaining

5 --- --- --- --- --- --- --- ---

New copy 2 --- 1 1 1 1 1 1 3

T1T1T1T1Deadline = 6

(at time =1 )

Accuracy = 7/9Straggler

Resource Aware Scheduling (RAS)

Speculate only if it saves time and resources

timetimetimetime

T1T1T1T1

T2T2T2T2 T1T1T1T1

11110000

T6T6T6T6

T3T3T3T3

T4T4T4T4

T5T5T5T5

3333 6666

T7T7T7T7

T8T8T8T8

Task ID T1 T2 T3 T4 T5 T6 T7 T8 T9

remaining

5 --- --- --- --- --- --- --- ---

New copy 2 --- 1 1 1 1 1 1 3

T1T1T1T1

Deadline = 6

(at time =1 )

Accuracy = 8/9

One copy for 5s (vs.)

Two copies for 2s

Straggler

GS vs. RAS

T1T1T1T1

T2T2T2T2 T3T3T3T3timetimetimetime

11110000 3333

T4T4T4T4 T5T5T5T5 T6T6T6T6

T7T7T7T7

Deadline = 6

Accuracy = 7/9

timetimetimetime

T1T1T1T1

T2T2T2T2 T1T1T1T1

T6T6T6T6

T3T3T3T3

T4T4T4T4

T5T5T5T5

T7T7T7T7

T8T8T8T8Deadline = 6

Accuracy = 8/9

Deadline = 3

Accuracy = 3/9T1T1T1T1

Accuracy = 2/9

11110000 3333 6666

Neither GS nor RAS is uniformly better

Intuition:

Use RAS early in the job (be “conservative”),

switch to GS towards the end (be “aggressive”)

Theoretical Scheduling Model

• Multi-waved scheduling of tasks

– Constant wave-width

– Agnostic to fairness policies

– Heavy-tailed (Pareto) distribution of task durations

• Speculation: GS, RAS, Switching, Optimal

Theorem:

Using RAS when >2 waves of tasks remain, and GS when ≤2 waves of tasks remain

is “near-optimal”

How to estimate two remaining waves?

• Wave boundaries are not strict

– Non-uniform task durations

• Wave-width is not constant

Start with RAS and switch to GS close to the

deadline/error-bound

GSRASRAS GSRAS

Learning the switching point

• GS-only and RAS-only job samples

– “Exploration vs. Exploitation”

– Multi-armed bandit solution, ɛ = 0.1

66664444

RAS[4s]+GS[2s]

RAS[5s]+GS[1s]

RAS[6s]

Switch

Deadline5555

GRASS (= GS + RAS) Scheduler

• Opportunity Cost in speculation for stragglers

– GS � Greedy Scheduling

– RAS � Resource Aware Scheduling

• Switch RAS�GS close to deadline/error-bound

– Learn switching point empirically from job samples

• Provably near-optimal in theoretical model

Implementation

• Hadoop 0.20.2 and Spark 0.7.3

– Modified Fair Scheduler

– Job bins with GS-only and RAS-only samples

• Task Estimators

– Remaining time is extrapolated from data-to-process

• progress reports at 5% intervals

– New copy’s time is sampled from completed tasks

How well does GRASS perform?

• Workload from Facebook and Bing traces

– Hadoop and Dryad production jobs

– Added deadlines and error bounds

• Baselines: LATE & Mantri

• 200 node EC2 deployment (m2.2xlarge instances)

Accuracy of deadline-bound jobs

improve by 47%

Gains hold across deadlines (lenient and stringent )

GRASS is 22% better than statically

picking GS or RAS

… and is near-optimal

Error-bound Jobs

• Overall speedup of 38% (optimal is 40%)

– Gains hold across all error bounds

• Exact jobs (0% error-bound) speed up by 34%

Unified Straggler Mitigation

Conclusion

• Next gen. of analytics: Approximate but timely results

• Challenge: Dynamic and unpredictable stragglers

• GRASS – Conservative speculation early in the job;

aggressive towards its end

• Evaluation with Hadoop & Spark

– Accuracy of deadline-bound jobs improve by 47%

– Error-bound jobs speed up by 38%

GRASS : Trimming Stragglers in Approximation …...GRASS : Trimming Stragglers in Approximation...

Documents

GRASS: Trimming Stragglers in Approximation Analytics Ganesh Ananthanarayanan, Michael Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu

Ecology of Blue Stragglers: Closing Thoughts and Discussion

GRASS: Trimming Stragglers in Approximation Analyticsusers.cms.caltech.edu/~adamw/papers/straggler.pdfSpark [15] (for interactive jobs). We evaluate GRASS using production workloads

Approximation Algorithms. 2301681Approximation Algorithms2 Outlines Why approximation algorithm? Approximation ratio Approximation vertex cover

Tree Trimming

TREEPOWER programsfrom acrossthenation · portanceoftree-trimming,care andplacementtoensuresafety. APPA’sTREEPOWERprogram ... Utah annuallysendsoutflyersontree trimming,“PlantingtheRightTree”

On data skewness, stragglers, and MapReduce …On data skewness, stragglers, and MapReduce progress indicators Emilio Coppa Irene Finocchi Computer Science Department, Sapienza University

Tree Trimming Sydney

Instruction Manual - The Sharper Image · 2016-10-03 · Fine-tune Trimming Forward/Backward trimming Sideward ﬂy trimming Turn left/right trimming MODE 1 Operating direction Hover

Essex Stragglers Orienteering Society (SOS)stragglers.info/newsletter/vol21no9.pdf · Essex Stragglers Orienteering Society (SOS) accredited Newsletter Volume 21 No. 9 ... which is

Trimming Nuclear Excess

Beyond Blue Stragglers - Kepler & K2 Science Center · 2018. 1. 23. · Beyond Blue Stragglers: K2 Observations Reveal Post-Mass-Transfer Binaries Hidden on the M67 Main-Sequence

Bonsai Tools TOOL i RIMMING SCISSORS 17nmm TRIMMING SCISSORS TRIMMING SCISSORS SCISSORS TRIMMING TRIMMING SCISSORS 185m BOW SHEARS BONSAI SHEARS SHEARS BONSAI SHEARS PRUNING

The blue stragglers formed via mass transfer in old open clusters

Self-Calibration And Digital-Trimming Of Successive ... · Successive Approximation Register Analog-to-Digital Converter (SAR ADC) uses binary search algorithm for converting an analog

GRASS : Trimming Stragglers in Approximation Analytics

Form Trimming Systems

Barium Enhanced Blue Stragglers in Open Cluster NGC · PDF fileBarium Enhanced Blue Stragglers in Open Cluster NGC 6819 Katelyn Milliman University of Wisconsin-Madison K. Milliman,

Self-Calibration And Digital-Trimming Of Successive ... · PDF fileSelf-Calibration And Digital-Trimming Of Successive Approximation Analog-To-Digital Converters by ... Charge Scaling

Blue Stragglers