Understanding Priority-Based Scheduling of Graph Algorithms on a Shared-Memory Platform

Serif Yesil∗, Azin Heidershenas∗, Adam Morrison+, and Josep Torrellas∗

∗University of Illinois at Urbana-Champaign and +Tel Aviv University

November 20, 2019



Page 2: Understanding Priority-Based Scheduling of Graph Algorithms on …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_sc19.pdf · 2019-12-10 · Understanding Priority-Based Scheduling

Task Based Graph Processing

Many graph processing frameworks use a task based model

Computation is broken down into dynamically created tasks

Easy to program


Unordered Task Execution

Tasks can be processed in any order

Correct result regardless of execution order

Easy to parallelize

Problem: may cause unnecessary work

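1. The vertex at distance 4 is scheduled

2. Its neighbor's distance is updated from ∞ to 7

3. The vertex at distance 1 is scheduled

4. The same neighbor's distance is updated from 7 to 2

The update in step 2 is unnecessary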

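[Figure: SSSP application. An unordered scheduler feeds vertex tasks to a small weighted graph; the target vertex's distance label changes from 7 to 2.]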


Adding Task Priorities and Using a Priority Scheduler

Each task becomes associated with a priority

Tasks are sorted by priorities in the scheduler

For SSSP, priority = distance, and tasks are processed in ascending order

1. The vertex at distance 1 is scheduled first

2. Its neighbor's distance is updated from ∞ directly to 2

[Figure: the same SSSP example with a priority scheduler in place of the unordered scheduler.]
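The difference between the two schedulers can be reproduced with a small sequential sketch. The four-vertex graph below is an assumption read off the slide figure (vertex names `s`, `a`, `b`, `t` are hypothetical), and the function is an illustration only, not the Galois implementation. It runs worklist-based SSSP in either FIFO order or priority order and counts distance updates.

```python
import heapq

# Hypothetical graph inferred from the slide figure:
# source s reaches b with weight 4 and a with weight 1;
# b reaches the target t with weight 3, a reaches t with weight 1.
GRAPH = {
    "s": [("b", 4), ("a", 1)],
    "a": [("t", 1)],
    "b": [("t", 3)],
    "t": [],
}

def sssp(graph, source, use_priority):
    """Worklist SSSP. Returns (distances, number of distance updates)."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    updates = 0
    worklist = [(0, source)]  # tasks are (distance, vertex) pairs
    while worklist:
        if use_priority:
            d, u = heapq.heappop(worklist)  # smallest distance first
        else:
            d, u = worklist.pop(0)          # plain FIFO order
        if d > dist[u]:
            continue  # stale task: a shorter path to u was found meanwhile
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                updates += 1
                if use_priority:
                    heapq.heappush(worklist, (dist[v], v))
                else:
                    worklist.append((dist[v], v))
    return dist, updates

if __name__ == "__main__":
    print(sssp(GRAPH, "s", use_priority=False))
    print(sssp(GRAPH, "s", use_priority=True))
```

On this graph both runs produce the same distances, but the FIFO run performs 4 distance updates and the priority run performs 3. The extra update is the write of 7 to the target vertex, which is later overwritten with 2.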


Fundamental Trade-off in Concurrent Priority Schedulers: Synchronization vs. Useless Work

Scheduling in strict priority order causes synchronization overhead

Solution: Use Relaxed Concurrent Priority Scheduler (CPS)

Highly relaxed priority processing → more useless work


Contributions

Provide an extensive empirical analysis of existing CPS designs

Identify shortcomings of high performance CPS designs

Propose a CPS approach (PMOD) to provide robust performance automatically


Types of Concurrent Priority Schedulers

Combined-Priority Queue CPSs: Each queue holds tasks of multiple priorities

SprayList (SL): Global queue [Alistarh et al.]

MultiQueues (MQ): Distributed queues [Rihani et al.]

Remote Enqueue Local Dequeue (RELD): Distributed queues [Jeffrey et al.]

Per-Priority Queue CPSs: Each priority is kept in a different queue

Ordered-By Integer Metric (obim) [Nguyen et al.]


SprayList

A global queue design

Tasks are stored in a lock-free skip list, sorted in priority order

To minimize synchronization, dequeues retrieve a random high priority task

Done with a short random walk
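As a rough illustration of the relaxed dequeue, the sketch below keeps tasks in a sorted Python list and removes a random element from the first few positions. This is a deliberate simplification: it is sequential, it is not a skip list, and the uniform pick replaces the random walk. The class name `SprayLikeQueue` and the parameter `spray_width` are hypothetical.

```python
import bisect
import random

class SprayLikeQueue:
    """Sequential stand-in for a relaxed priority queue (not lock-free)."""

    def __init__(self, spray_width, seed=0):
        self.items = []                 # kept sorted, lowest priority value first
        self.spray_width = spray_width  # how far from the head a dequeue may land
        self.rng = random.Random(seed)

    def enqueue(self, priority, task):
        bisect.insort(self.items, (priority, task))

    def dequeue(self):
        if not self.items:
            return None
        # Pick uniformly among the first spray_width tasks, so concurrent
        # dequeuers in a real design would rarely collide on the same node.
        window = min(self.spray_width, len(self.items))
        return self.items.pop(self.rng.randrange(window))
```

With `spray_width=1` the queue degenerates to strict priority order. Larger widths trade ordering accuracy for less contention at the head.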


Distributed Queues

MultiQueues

Create k queues per core (total: k × p)

Enqueue to a random queue

Dequeues pick two random queues and retrieve the higher-priority of their two top tasks

[Figure: MultiQueues with four cores (c0 to c3) and eight queues (q0 to q7). Enqueues go to random queues; a dequeue compares two queues.]

Remote Enqueue Local Dequeue

One queue per core (total: p)

Enqueue to a random queue

Dequeues go to the local queue

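[Figure: RELD with four cores (c0 to c3) and one queue per core (q0 to q3). Enqueues go to random queues; core c0 dequeues from its own queue q0.]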
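The two distributed designs can be contrasted in a sequential sketch. Each queue is modeled as a binary heap; the class names and constructor parameters are hypothetical, and real implementations protect each queue with locks or lock-free structures, which are omitted here.

```python
import heapq
import random

class MultiQueue:
    """k queues per core: enqueue at random, dequeue the better of two tops."""

    def __init__(self, cores, k, seed=0):
        self.queues = [[] for _ in range(cores * k)]
        self.rng = random.Random(seed)

    def enqueue(self, priority, task):
        heapq.heappush(self.rng.choice(self.queues), (priority, task))

    def dequeue(self):
        a, b = self.rng.sample(self.queues, 2)
        candidates = [q for q in (a, b) if q]
        if not candidates:
            return None  # failed dequeue: both sampled queues were empty
        best = min(candidates, key=lambda q: q[0])  # compare the two top tasks
        return heapq.heappop(best)

class RELD:
    """One queue per core: enqueue at random, dequeue only from the local queue."""

    def __init__(self, cores, seed=0):
        self.queues = [[] for _ in range(cores)]
        self.rng = random.Random(seed)

    def enqueue(self, priority, task):
        heapq.heappush(self.rng.choice(self.queues), (priority, task))

    def dequeue(self, core):
        local = self.queues[core]
        return heapq.heappop(local) if local else None
```

In this model a MultiQueue dequeue can fail even when tasks remain elsewhere, and a RELD core never sees tasks that landed in another core's queue. Both effects relax the global priority order.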


OBIM Chunking

[Figure: obim structure. Cores c0 to c3 each hold a local map; a global map orders the per-priority queues for priorities 2, 5, and 7.]

Creates a queue for each priority

Uses a global map for ordering

The global map is an access bottleneck

A local map caches a snapshot of the global map in each core

When needed, local maps are refreshed from the global map

To reduce contention, a core enqueues and dequeues a chunk of tasks at a time
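A minimal sequential sketch of per-priority queues with chunking follows. It models only the global map and the chunk buffers; the per-core local map snapshots are left out, and all names (`ObimLikeScheduler`, `flush`, `dequeue_chunk`) are hypothetical rather than taken from the Galois source.

```python
from collections import defaultdict, deque

class ObimLikeScheduler:
    """Per-priority queues of task chunks, ordered through one global map."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.global_map = defaultdict(deque)  # priority -> queue of full chunks
        self.partial = defaultdict(list)      # (core, priority) -> chunk being filled

    def enqueue(self, core, priority, task):
        chunk = self.partial[(core, priority)]
        chunk.append(task)
        if len(chunk) == self.chunk_size:
            # The shared structure is touched once per chunk, not once per task.
            self.global_map[priority].append(chunk)
            del self.partial[(core, priority)]

    def flush(self, core):
        """Publish the core's partially filled chunks."""
        for key in [k for k in self.partial if k[0] == core]:
            self.global_map[key[1]].append(self.partial.pop(key))

    def dequeue_chunk(self):
        """Return (priority, chunk) from the lowest non-empty priority, or None."""
        ready = [p for p, q in self.global_map.items() if q]
        if not ready:
            return None
        p = min(ready)
        return p, self.global_map[p].popleft()
```

Tasks in a partially filled chunk stay invisible to other cores until the chunk is published, which is one way chunking can lead to failed dequeues and out-of-order processing.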


Minimizing Useless Work vs. Minimizing Communication

Minimize Useless Work

Combined-priority queues

SL and MQ relaxation guarantees

Retrieve tasks close to priority order

Invest in synchronization and communication

Higher enqueue and dequeue cost

Minimize Communication

Per-priority queues: obim

Emphasis on reducing enqueue/dequeue cost

Coarse-grain enqueue/dequeue operations

Prone to useless work


CPS Performance Comparison

How well do relaxed CPSs work in practice?

Useless work behavior

Performance of CPS operations


Types of Execution Cycles in an Application

Work Cycles

Good Work (GWork): The work ends up being useful.

Useless Work (UWork): The work later proves useless.

CPS Operation Cycles

Enqueue (Enq)

Dequeue (Deq)

Failed Dequeue (FDeq): Attempting and failing to dequeue

Other

Other: Executing framework code.


Analysis Setup

Galois framework

5 applications

Single source shortest path (SSSP)

Breadth first search (BFS)

PageRank (PR)

Minimum spanning tree (MST)

A* (A*)

5 input graphs

Graph                    # Verts    # Edges
USA roads (rUSA)         24 M       58 M
West USA roads (rW)      6 M        15 M
Twitter40 (tw)           42 M       1469 M
Web-Google (wg)          875 K      5 M
Soc-LiveJournal1 (lj)    5 M        69 M


Speedups: Combined-Priority Queues vs. Per-Priority Queues

[Figure: Speedup vs. number of threads (1 to 40) for obim, RELD, MQ, and SL. Panels: SSSP with tw, BFS with rUSA, PR with wg.]

Per-priority queues perform better than combined-priority queues


Cycles: Combined-Priority Queues vs. Per-Priority Queue CPSs

[Figure: Breakdown of cycles (%) into Other, FDeq, Deq, Enq, GWork, and UWork for obim, RELD, MQ, and SL. Panels: SSSP tw, BFS rUSA, PR wg.]

Small amount of useless work

Combined-priority queues spend up to 90% of cycles on CPS operations


Per-Priority Queues: Effect of Chunking

[Figure: Speedup vs. number of threads (1 to 40) for BFS with rUSA. Series: c0 (no chunks), c16 (chunk size 16), c64 (chunk size 64), c256 (chunk size 256).]

Enqueueing and dequeueing a chunk of tasks at a time is very beneficial

Chunking decreases the Deq time.

No chunks: Higher Deq

Large chunk sizes: Higher FDeq


Per-Priority Queues: Effect of Priority Distribution

Different algorithms create different priority distributions

BFS: Closely spaced priorities

SSSP: Dispersed priorities

Dispersed priorities lower the performance of per-priority queue CPSs

They create many per-priority queues, each with few tasks


Ad-Hoc Solution: Manually Merging Priorities

Different priorities are grouped together in the same per-priority queue

Merging Factor (∆):

Determines the range of priorities that map to the same queue

Best ∆ parameter is input dependent

Used in SSSP application:

Optimal ∆ for SSSP with rUSA is ∆ = 14 ⇒ each queue covers 2^14 priorities, e.g. the range [0, 2^14)

Optimal ∆ for SSSP with tw is ∆ = 0 ⇒ no priority merging
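One way to realize such a merging factor is a right shift of the integer priority, which is consistent with ∆ = 14 covering 2^14 priorities per queue. The sketch below assumes that mapping; the function names are hypothetical and the slides do not show the actual code.

```python
def queue_index(priority, delta):
    """Map an integer priority to its merged per-priority queue.

    Assumption: merging is a right shift by delta, so each queue
    covers 2**delta consecutive priorities.
    """
    return priority >> delta

def priority_range(index, delta):
    """Half-open range [low, high) of priorities that share queue `index`."""
    return index << delta, (index + 1) << delta
```

For example, with `delta=14` priorities 0 through 16383 share queue 0 and priority 16384 starts queue 1, while `delta=0` leaves every priority in its own queue.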


Effect of Merging Factor for SSSP

[Figure: Speedup vs. number of threads (1 to 40) for SSSP with rUSA. Series: ∆10 (merging factor 10), ∆14 (merging factor 14), ∆18 (merging factor 18).]

Priority merging has significant impact

Too-small ∆: Dequeue has high overhead

Too-large ∆: High useless work

Optimal ∆: 20-35% of cycles spent on useless work


Proposed CPS: Priority Merging on Demand (PMOD)

Too many per-priority queues → higher enqueue and dequeue times

Too few per-priority queues → more useless work

Idea: Dynamically detect inefficient execution and tune Merging Factor

Too many per-priority queues → Merge priorities (increase ∆)

Too few per-priority queues → Unmerge priorities (decrease ∆)


Detect and Adjust Merging Factor

Too many per-priority queues → Merge priorities (increase ∆)

1. Trigger: Frequent global/local map synchronizations

2. Check: Density of per-priority queues

3. Increase merging factor

Too few per-priority queues → Unmerge priorities (decrease ∆)

1. Trigger: Too many tasks dequeued from a per-priority queue

2. Check: Number of per-priority queues and priority range

3. Decrease merging factor

PMOD challenge: Priority ordering with different merging factors

Details in the paper...
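The trigger/check/adjust structure above can be pictured as a small feedback rule. The sketch below is a toy only: the statistics, field names, and every threshold are hypothetical placeholders, the priority-range check is omitted, and the actual PMOD mechanism is described in the paper rather than here.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    """Counters gathered over one observation window (all hypothetical)."""
    map_syncs: int                    # global/local map synchronizations
    tasks_per_queue: float            # average density of per-priority queues
    max_dequeued_from_one_queue: int  # tasks drained from the busiest queue
    num_queues: int                   # live per-priority queues

def adjust_delta(delta, stats, *, sync_limit=100, sparse_limit=4.0,
                 drain_limit=10_000, few_queues=2):
    """Return the merging factor to use next; thresholds are made-up defaults."""
    # Too many queues: frequent map synchronizations and sparse queues.
    if stats.map_syncs > sync_limit and stats.tasks_per_queue < sparse_limit:
        return delta + 1
    # Too few queues: one queue absorbs most tasks while few queues exist.
    if (stats.max_dequeued_from_one_queue > drain_limit
            and stats.num_queues <= few_queues and delta > 0):
        return delta - 1
    return delta
```

Changing ∆ by one step per window keeps the rule simple, but it leaves open how tasks enqueued under an old ∆ are ordered against those enqueued under a new one, which is the challenge the slide points to.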


PMOD Performance

[Figure: Speedup vs. number of threads (1 to 40) for obim and PMOD. Panels: SSSP with rUSA, SSSP with tw.]

Same performance as obim, without manual or application-specific tuning

Merging factor converges to 13 for SSSP with rUSA (manually tuned optimum: ∆ = 14)


PMOD Performance

[Figure: Speedup vs. number of threads (1 to 40) for obim and PMOD. Panels: BFS with rUSA, BFS with tw.]

Same performance as obim, without manual or application-specific tuning

Merging factor does not increase when it is not needed, as in BFS


Conclusions

Fundamental trade-off between useless work and synchronization overhead

Current CPSs often incur large enqueue/dequeue overheads

Per-priority CPSs that minimize CPS operation cost give better performance

But performance is application & input dependent

PMOD provides high performance automatically

Same performance as obim manually tuned for each application

In the paper

Performance sensitivity to chunk sizes & priority merging

obim performance sensitivity to priority ranges

More applications and insights
