CLARINET: WAN-Aware Optimization for Analytics Queries

Presented By Robert Claus

Slides: pages.cs.wisc.edu/~shivaram/cs744-slides/cs744-robert-clarinet.pdf




Agenda

1. The Problem
2. Clarinet
3. Optimizing WAN Queries
4. Results



Low Application Latency Requires Localized Servers

Servers must be close to clients to keep latency low.

Wide Area Networks (WANs) are necessary.

Collecting data into a central datastore for analytics is costly and slow.


Geode Focused On Execution

Previous work focused on executing queries smartly.

Caching / Sending Deltas

Choosing efficient distributed join algorithms

Minimizing bandwidth rather than optimizing performance

Allowing servers to adjust their sub-query execution plans


Wide Area Networks Are Heterogeneous

Sites may have different data available.

Links vary by 20x in latency.

Link properties are relatively constant.

Bandwidth is finite.


Example Query Planned Sub-optimally

[Figure: example query plan, with panels labeled Hash Join Results and Select Results.]


Central Planning Is Necessary

Execution plans limit flexibility during execution.

Need to consider the network before the execution plan.


Agenda

1. The Problem
2. Clarinet
3. Optimizing WAN Queries
4. Results


Clarinet Focuses on Planning

Clarinet adds network considerations into logical query plan optimization.

Allows global optimization across queries.

Introduces optimizations not possible at execution stage.

Optimizes execution time rather than resource usage.


Combining Optimization and Scheduling


Agenda

1. The Problem
2. Clarinet
3. Optimizing WAN Queries
4. Results


Optimizing WAN Queries Is Hard

There are too many options to search for an absolute optimum:

Breaking queries into sub-queries

Where each subquery will be run

How each subquery will be run

Network properties are a shared resource across all queries


Heuristic Optimization Algorithm

1. Assign where tasks run first:
   a. Place tasks with no dependencies (Mappers) where the data is.
   b. Just optimize where dependent tasks (Reducers) run based on network capacity.
      i. Also consider just putting all reducers on the node with the most mappers.
2. Estimate how long each DAG should take:
   a. Insert “shuffle” nodes into the DAG whenever data is moved over the network.
      i. Network properties
      ii. Currently running tasks
   b. Calculate the total length the DAG will take using an LP (linear program).
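Step 1 of the heuristic can be sketched roughly as follows. This is an illustration under stated assumptions, not Clarinet's implementation: the function names, the task names, the data volumes, and the link speeds are all hypothetical, and the shuffle cost model (parallel transfers, slowest link dominates) is a simplification chosen for the example.

```python
from collections import Counter

def incoming_shuffle_seconds(dest, out_gb, bandwidth_gbps):
    """Time for every other site to ship its mapper output to dest.

    Assumes transfers run in parallel, so the slowest one dominates.
    """
    return max(
        (gb * 8 / bandwidth_gbps[(src, dest)]
         for src, gb in out_gb.items() if src != dest),
        default=0.0,
    )

def candidate_reducer_sites(mapper_site, mapper_out_gb, bandwidth_gbps):
    # Step 1a: mappers stay where their data is, so per-site output is fixed.
    out_gb = Counter()
    for mapper, site in mapper_site.items():
        out_gb[site] += mapper_out_gb[mapper]

    # Step 1b: the site with the cheapest incoming shuffle.
    by_network = min(
        sorted(out_gb),
        key=lambda s: incoming_shuffle_seconds(s, out_gb, bandwidth_gbps),
    )
    # Step 1b-i: the site that already hosts the most mappers.
    by_mappers = Counter(mapper_site.values()).most_common(1)[0][0]

    # Deduplicate while keeping order; step 2 would cost each candidate.
    return list(dict.fromkeys([by_network, by_mappers]))

# Hypothetical inputs: where each mapper's data lives and how much it emits.
mapper_site = {"scan_ss": "DC1", "scan_ws": "DC2",
               "scan_cs_a": "DC3", "scan_cs_b": "DC3"}
mapper_out_gb = {"scan_ss": 300, "scan_ws": 200, "scan_cs_a": 5, "scan_cs_b": 5}
links = {("DC1", "DC2"): 80, ("DC1", "DC3"): 100, ("DC2", "DC3"): 40}
bandwidth_gbps = {**links, **{(b, a): g for (a, b), g in links.items()}}

candidates = candidate_reducer_sites(mapper_site, mapper_out_gb, bandwidth_gbps)
print(candidates)  # ['DC1', 'DC3']
```

Here the network-aware choice (DC1) and the most-mappers choice (DC3) differ, so both would be carried forward to the time estimate in step 2.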


Example Query Planning

[Figure: query plan. A Hash Join over Select A=1 on Scan SS and Select A=1 on Scan WS; a Broadcast Join with Select A=1 on Scan CS.]


Assign Mappers

[Figure: the query plan (Hash Join, Broadcast Join, Select A=1, Scan SS, Scan WS, Scan CS) with operators assigned to DC1, DC2, and DC3.]


Compress Compute Operators

[Figure: DC1 Work, DC2 Work, and DC3 Work nodes, with the Hash Join and Broadcast Join operators.]


Compress Compute Operators

[Figure: DC1 Work, DC2 Work, and DC3 Work nodes, with the Hash Join and Broadcast Join operators.]

On what server do these operators take place?


Compress Compute Operators

[Figure: DC1 Work, DC2 Work, and DC3 Work nodes, with the Hash Join and Broadcast Join operators.]

On what server do these operators take place?

[Figure annotations: 200 GB, 80 Gbps; 100 Gbps to DC1 or 40 Gbps to DC2; 200 GB, 80 Gbps; OR.]



Shuffle Operators

[Figure: a Shuffle Operator connecting Data on Server 1 and Data on Server 2 to Operation on Server 1 and Operation on Server 2.]

This operation’s cost can be estimated from the volume of data and network bandwidth.
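As a rough worked example of that estimate, the sketch below divides data volume by link bandwidth. The function name is hypothetical, the 200 GB volume and the 100, 80, and 40 Gbps rates are borrowed from the earlier figure annotations purely as illustrative inputs, and the model assumes decimal gigabytes and ignores latency and contention from other transfers.

```python
def transfer_seconds(volume_gb: float, bandwidth_gbps: float) -> float:
    """Estimated shuffle time: gigabytes to gigabits, divided by link rate."""
    return volume_gb * 8 / bandwidth_gbps

for gbps in (100, 80, 40):
    print(f"200 GB over {gbps} Gbps: {transfer_seconds(200, gbps):.0f} s")
# 200 GB over 100 Gbps: 16 s
# 200 GB over 80 Gbps: 20 s
# 200 GB over 40 Gbps: 40 s
```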


Introduce “Shuffle” Operators

[Figure: DC1 Work, DC2 Work, and DC3 Work nodes with the Hash Join and Broadcast Join operators; link labels 80 Gbps and 100 Gbps.]


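Compute Cost Estimate

[Figure: DC1 Work, DC2 Work, and DC3 Work nodes with the Hash Join and Broadcast Join operators; link labels 80 Gbps and 100 Gbps; time estimates on nodes and edges: 120s, 60s, 60s, 180s, 120s, 120s, 60s.]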
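One simple way to turn per-node estimates into a total for the DAG is a longest-path (critical path) calculation, sketched below. The slides state that Clarinet uses an LP for this step; the longest-path pass here is a swapped-in simplification that ignores contention between concurrent transfers. The node names and durations are hypothetical and are not a reading of the figure.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def dag_seconds(duration, deps):
    """Finish time of the last node when each node starts after all its inputs.

    duration: {node: seconds}; deps: {node: set of predecessor nodes}.
    """
    finish = {}
    for node in TopologicalSorter(deps).static_order():
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        finish[node] = start + duration[node]
    return max(finish.values())

# Hypothetical DAG: per-DC work, shuffle nodes for cross-DC moves, two joins.
duration = {
    "dc1_work": 60, "dc2_work": 120, "dc3_work": 60,
    "shuffle_to_hash_join": 120, "hash_join": 60,
    "shuffle_to_broadcast_join": 60, "broadcast_join": 120,
}
deps = {
    "shuffle_to_hash_join": {"dc2_work"},
    "hash_join": {"dc1_work", "shuffle_to_hash_join"},
    "shuffle_to_broadcast_join": {"dc3_work"},
    "broadcast_join": {"hash_join", "shuffle_to_broadcast_join"},
}
print(dag_seconds(duration, deps))  # 420
```

The critical path in this made-up example is dc2_work, shuffle_to_hash_join, hash_join, broadcast_join (120 + 120 + 60 + 120 seconds).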


Dynamically Scheduling Resources

Allow scheduling tasks from any of the next k queries if resources are available.

Efficiently uses available resources.

k must be tuned to avoid over-scheduling tasks with no dependencies.

Queries selected based on relative deadline proximity.
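The scheduling rule above might look something like the following sketch. Everything here is an assumption made for illustration: the query records, the `free_slots` resource model, and the use of an absolute deadline value as the ordering key (the slides do not define "relative deadline proximity" precisely).

```python
def schedule_round(queries, k, free_slots):
    """Pick tasks to launch this round from the k most urgent queries.

    queries: list of dicts with 'name', 'deadline', and 'ready'
             (tasks whose dependencies are already satisfied).
    """
    # Only the k queries closest to their deadlines are eligible.
    window = sorted(queries, key=lambda q: q["deadline"])[:k]
    launched = []
    for query in window:
        for task in query["ready"]:
            if free_slots == 0:
                return launched
            launched.append((query["name"], task))
            free_slots -= 1
    return launched

queries = [
    {"name": "q1", "deadline": 30, "ready": ["a"]},
    {"name": "q2", "deadline": 10, "ready": ["b", "c"]},
    {"name": "q3", "deadline": 20, "ready": ["d"]},
]
print(schedule_round(queries, k=1, free_slots=3))  # [('q2', 'b'), ('q2', 'c')]
print(schedule_round(queries, k=2, free_slots=3))  # [('q2', 'b'), ('q2', 'c'), ('q3', 'd')]
```

With k=1 one slot sits idle; with k=2 the spare slot is filled by a task from the next query, which is the resource-efficiency effect the slide describes. A larger k admits more ready tasks from later queries, which is why it needs tuning.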


Agenda

1. The Problem
2. Clarinet
3. Optimizing WAN Queries
4. Results


Running Time Improved


Network Usage Improved


Other Performance Features

Multi Query Optimization

60% of queries run in batches ended up with different plans.

Resource Fragmentation

Network links are fallow less than 3% of the time.

Optimization Time

Approximately 10 seconds


Questions?