
Contention - Aware Scheduling (a different approach)

Page 1: Contention - Aware Scheduling (a different approach)

LCN: Design and Implementation of a Contention-Aware Scheduler

Raptis Dimos – Dimitrios

National Technical University of Athens
School of Electrical and Computer Engineering

8th SFHMMY Conference 2015
April 4, 2015

Page 2: Contention - Aware Scheduling (a different approach)

Outline
• Motivation
• Background
• Similar Research
• Scheduler Overview
• Classification Scheme
• Prediction Model
• Scheduling Algorithm
• Comparison with Similar Research
• Conclusion
• Future Work

Page 3: Contention - Aware Scheduling (a different approach)

Motivation
• Memory Wall
• SMPs & CMPs
• Cache Coherency Protocols
• Multithreaded Programming
• Parallel Processing

Page 4: Contention - Aware Scheduling (a different approach)

Motivation

Leveraging Parallelism
• Legacy PC applications (not benefiting)
• Applications benefiting from multithreaded environments
• "Embarrassingly parallel" applications (GPU etc.)

Cache Coherence Problems

Page 5: Contention - Aware Scheduling (a different approach)

Motivation

Problems
• Memory Contention Problem
• Cache Coherence Problem
• No existing infrastructure to detect and restrict contention for system resources

Approaches
• What if allocating resources were not the programmer's responsibility?
• What if the operating system were responsible for judging applications' parallelism?

Contention-Aware Scheduling

Page 6: Contention - Aware Scheduling (a different approach)

Background

Contention-Aware Scheduling
• HPC Monitoring (hardware performance counters)
• Classification (based on locality and degree of contention)
• Scheduling Algorithm

+ Our approach contains an additional component: a prediction model

Page 7: Contention - Aware Scheduling (a different approach)

Similar Research

Various Approaches
• Simple heuristic approaches (LLC misses & memory bandwidth)
• Stack distance profiling approaches
• Dynamic scheduling approaches using supervised learning (linear regression, fuzzy-rule models, k-nearest neighbour)

Differences
• Not covering the whole memory hierarchy
• Using additional hardware not currently available in the OS
• Targeting the same problem from a different view
• Pre-defined allocated resources in applications

Page 8: Contention - Aware Scheduling (a different approach)

Scheduler Overview

Scheduler Main Components
• Classification Scheme
    - 4 categories of applications based on the memory hierarchy
• Prediction Model
    - prediction of contention under varying resource allocations
• Scheduling Algorithm
    - scheduling a workload of applications based on the classification scheme (co-scheduling combinations) and the prediction model (for ideal resource management)

Page 9: Contention - Aware Scheduling (a different approach)

Classification Scheme

4 main categories of applications: L, LC, C, N

Page 10: Contention - Aware Scheduling (a different approach)

Classification Scheme

Co-scheduling interference
• N - * : no interference
• L - L : contention on the same resource, bandwidth "divided"
• L - C : contention in different resources
    - severe performance degradation in C
    - no impact on L
• L - LC : performance degradation for both
    - LC faces bigger degradation than L
• LC - LC : contention in 2 resources (memory link and LLC)
    - both degrade, but at low levels
• LC - C : mediocre contention, mainly in C
• C - C : most difficult to predict, based on data access patterns (MESI protocol)
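
The interference rules above can be captured as a small lookup table. This is only a sketch: the dictionary name `INTERFERENCE` and the function `interference` are hypothetical, and the strings are paraphrases of the slide, not measured values.

```python
# Qualitative co-scheduling interference between application classes,
# paraphrased from the slide. Keys are unordered class pairs, so a
# same-class pair such as L - L collapses to a one-element frozenset.
INTERFERENCE = {
    frozenset(["L"]): "contention on the same resource; bandwidth divided",
    frozenset(["L", "C"]): "severe degradation in C, no impact on L",
    frozenset(["L", "LC"]): "degradation for both, larger for LC",
    frozenset(["LC"]): "contention in memory link and LLC; low degradation for both",
    frozenset(["LC", "C"]): "mediocre contention, mainly in C",
    frozenset(["C"]): "hard to predict; depends on data access patterns",
}


def interference(a: str, b: str) -> str:
    """Return the qualitative note for co-scheduling classes a and b."""
    if "N" in (a, b):
        # N - * : class N does not interfere with any other class.
        return "no interference"
    return INTERFERENCE[frozenset([a, b])]
```

Using unordered keys keeps the table symmetric, so `interference("L", "C")` and `interference("C", "L")` return the same note.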

Page 11: Contention - Aware Scheduling (a different approach)

Classification Scheme

Co-scheduling interference

Analysis from a workload of 16 applications
• 4 applications belonging to each class
• Co-scheduling of all possible combinations
• Average slowdown calculated for each combination

Table: Average slowdown in co-execution
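
One possible way to aggregate such measurements into a per-class-pair table is sketched below. The function name, the input layout (a directional slowdown per ordered application pair) and the choice of an arithmetic mean are assumptions made for illustration; the slides do not specify how the averages were computed.

```python
from collections import defaultdict
from statistics import mean


def average_slowdown_by_class_pair(app_class, slowdown):
    """Average measured slowdowns per ordered pair of classes.

    app_class: {app_name: class label}, e.g. {"a": "L"}          (hypothetical layout)
    slowdown:  {(app_x, app_y): slowdown of app_x when co-run with app_y}
    Returns    {(class_x, class_y): mean slowdown of class_x apps next to class_y apps}
    """
    buckets = defaultdict(list)
    for (x, y), value in slowdown.items():
        buckets[(app_class[x], app_class[y])].append(value)
    return {pair: mean(values) for pair, values in buckets.items()}
```

The result is keyed by ordered class pairs because, per the interference analysis, the two co-runners can be affected differently (for example L - C).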

Page 12: Contention - Aware Scheduling (a different approach)

Classification Scheme

Classification tree

Page 13: Contention - Aware Scheduling (a different approach)

Prediction Model

Linear Regression Model
• Target: prediction of scaling
    - uses HPC values monitored for a 1-core allocation
    - capability to predict scaling for any possible allocation
    - use of a threshold value for defining optimal scaling

Use the suitable counters for each class
• Class L : memory link (bandwidth)
• Class LC : LLC reuse (MESI protocol)
• Class C : L2 and LLC reuse (MESI protocol)
• Class N : private part of the memory hierarchy

Page 14: Contention - Aware Scheduling (a different approach)

Prediction Model

L class

$$R_p = \frac{\mathrm{Mem}_1 \cdot p}{\text{Maximum Memory Bandwidth}}$$

$$p_{optimum} = \max\{p\}, \quad R_p < 1.15$$

LC class

$$\mathrm{Completion(LC)}_p = \begin{cases}
0.01799 \cdot f_{LC} + 0.50119 & (p = 2 \text{ cores}) \\
0.02516 \cdot f_{LC} + 0.34286 & (p = 3 \text{ cores}) \\
0.02846 \cdot f_{LC} + 0.26028 & (p = 4 \text{ cores}) \\
0.03199 \cdot f_{LC} + 0.21584 & (p = 5 \text{ cores}) \\
0.03404 \cdot f_{LC} + 0.18296 & (p = 6 \text{ cores}) \\
0.03621 \cdot f_{LC} + 0.16410 & (p = 7 \text{ cores}) \\
0.03751 \cdot f_{LC} + 0.13969 & (p = 8 \text{ cores})
\end{cases}$$

$$\mathrm{Ideal\_Completion}_p = \frac{1}{p}, \qquad f_{LC} = \frac{\text{L2 RFO Requests}}{\text{L3 reuse} \cdot 10^5}$$

$$R_p = \frac{\mathrm{Ideal\_Completion}_p}{\mathrm{Completion(LC)}_p} \cdot 100$$

$$p_{optimum} = \max\{p\}, \quad R_p > 70$$
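
The L and LC rules can be sketched in a few lines of Python. The coefficients are copied from the slide; the function names, the 8-core upper bound and the fallback to 1 core when no allocation passes the threshold are assumptions added for illustration.

```python
# (slope, intercept) of Completion(LC)_p for each core count p, from the slide.
LC_COEFFS = {
    2: (0.01799, 0.50119),
    3: (0.02516, 0.34286),
    4: (0.02846, 0.26028),
    5: (0.03199, 0.21584),
    6: (0.03404, 0.18296),
    7: (0.03621, 0.16410),
    8: (0.03751, 0.13969),
}


def optimum_cores_l(mem_1, mem_max, max_cores=8, threshold=1.15):
    """Class L: largest p with R_p = mem_1 * p / mem_max below the threshold."""
    fitting = [p for p in range(1, max_cores + 1) if mem_1 * p / mem_max < threshold]
    return max(fitting) if fitting else 1  # fallback to 1 core is an assumption


def scaling_ratio_lc(f_lc, p):
    """Class LC: R_p = Ideal_Completion_p / Completion(LC)_p * 100."""
    slope, intercept = LC_COEFFS[p]
    completion = slope * f_lc + intercept
    return (1.0 / p) / completion * 100.0


def optimum_cores_lc(f_lc, threshold=70.0):
    """Class LC: largest p whose scaling ratio stays above the threshold."""
    fitting = [p for p in LC_COEFFS if scaling_ratio_lc(f_lc, p) > threshold]
    return max(fitting) if fitting else 1  # fallback to 1 core is an assumption
```

For instance, `optimum_cores_l(4.0, 13.5)` evaluates the L rule for an application that uses 4 GB/sec on one core of a machine with 13.5 GB/sec of memory bandwidth.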

Page 15: Contention - Aware Scheduling (a different approach)

Prediction Model

C class

$$\mathrm{Completion(C)}_p = \begin{cases}
0.3447 \cdot f_C + 0.4947 & (p = 2 \text{ cores}) \\
0.46974 \cdot f_C + 0.34415 & (p = 3 \text{ cores}) \\
0.5155 \cdot f_C + 0.2478 & (p = 4 \text{ cores}) \\
0.63609 \cdot f_C + 0.22492 & (p = 5 \text{ cores}) \\
0.61403 \cdot f_C + 0.18127 & (p = 6 \text{ cores}) \\
0.65915 \cdot f_C + 0.15864 & (p = 7 \text{ cores}) \\
0.6095 \cdot f_C + 0.1263 & (p = 8 \text{ cores})
\end{cases}$$

$$\mathrm{Ideal\_Completion}_p = \frac{1}{p}, \qquad f_C = \frac{\text{L2 Shared} \cdot 10^4}{\text{Inst. Retired}}$$

$$R_p = \frac{\mathrm{Ideal\_Completion}_p}{\mathrm{Completion(C)}_p} \cdot 100$$

$$p_{optimum} = \max\{p\}, \quad R_p > 70$$

N class

$$\mathrm{Completion(N)}_p = \mathrm{Ideal\_Completion}_p$$

$$p_{optimum} = \max\{p\}$$
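
A minimal sketch of the C and N rules follows. The coefficients come from the slide; the function names, the 8-core maximum and the 1-core fallback are assumptions, and the counter values in the usage note are made up purely to exercise the code.

```python
# (slope, intercept) of Completion(C)_p for each core count p, from the slide.
C_COEFFS = {
    2: (0.3447, 0.4947),
    3: (0.46974, 0.34415),
    4: (0.5155, 0.2478),
    5: (0.63609, 0.22492),
    6: (0.61403, 0.18127),
    7: (0.65915, 0.15864),
    8: (0.6095, 0.1263),
}


def f_c(l2_shared, inst_retired):
    """f_C = (L2 Shared * 10^4) / Inst. Retired, measured on a 1-core run."""
    return l2_shared * 1e4 / inst_retired


def scaling_ratio_c(f, p):
    """Class C: R_p = Ideal_Completion_p / Completion(C)_p * 100."""
    slope, intercept = C_COEFFS[p]
    return (1.0 / p) / (slope * f + intercept) * 100.0


def optimum_cores_c(f, threshold=70.0):
    """Class C: largest p whose scaling ratio stays above the threshold."""
    fitting = [p for p in C_COEFFS if scaling_ratio_c(f, p) > threshold]
    return max(fitting) if fitting else 1  # fallback to 1 core is an assumption


def optimum_cores_n(max_cores=8):
    """Class N: completion is ideal for every p, so all cores can be used."""
    return max_cores
```

With hypothetical counters of 50 shared L2 lines per 1,000,000 retired instructions, `f_c(50, 1_000_000)` gives 0.5, and `optimum_cores_c(0.5)` then picks the largest allocation that still scales above the threshold.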

Page 16: Contention - Aware Scheduling (a different approach)

Prediction Model

Example

L class

$\mathrm{Mem}_1 = 4$ GB/sec, $\mathrm{Mem}_{max} = 13.5$ GB/sec

$R_1 = 4/13.5 = 0.29$
$R_2 = (4 \cdot 2)/13.5 = 0.59$
$R_3 = (4 \cdot 3)/13.5 = 0.88$
$R_4 = (4 \cdot 4)/13.5 = 1.185$
$R_5 = (4 \cdot 5)/13.5 = 1.48$
$R_6 = (4 \cdot 6)/13.5 = 1.77$
$R_7 = (4 \cdot 7)/13.5 = 2.07$
$R_8 = (4 \cdot 8)/13.5 = 2.37$

$p_{optimum} = 3$ cores

LC class

$\mathrm{RFO}_1 = 319106$ per second, L3 reuse $= 1.51$

$f_{LC} = 319106/(1.51 \cdot 10^5) = 2.10$

$\mathrm{Completion(LC)}_2 = 0.01799 \cdot 2.10 + 0.50119 = 0.53 \rightarrow R_2 = 0.5/0.53 \cdot 100 = 92.7$
$\mathrm{Completion(LC)}_3 = 0.02516 \cdot 2.10 + 0.34286 = 0.39 \rightarrow R_3 = 0.33/0.39 \cdot 100 = 84.2$
$\mathrm{Completion(LC)}_4 = 0.02846 \cdot 2.10 + 0.26028 = 0.32 \rightarrow R_4 = 0.25/0.32 \cdot 100 = 78.0$
$\mathrm{Completion(LC)}_5 = 0.03199 \cdot 2.10 + 0.21584 = 0.28 \rightarrow R_5 = 0.2/0.28 \cdot 100 = 70.6$
$\mathrm{Completion(LC)}_6 = 0.03404 \cdot 2.10 + 0.18296 = 0.25 \rightarrow R_6 = 0.166/0.25 \cdot 100 = 65.4$
$\mathrm{Completion(LC)}_7 = 0.03621 \cdot 2.10 + 0.16410 = 0.24 \rightarrow R_7 = 0.142/0.24 \cdot 100 = 59.0$
$\mathrm{Completion(LC)}_8 = 0.03751 \cdot 2.10 + 0.13969 = 0.21 \rightarrow R_8 = 0.125/0.21 \cdot 100 = 57.1$

(The $R_p$ values are computed from the unrounded Completion values, so they differ slightly from the ratios of the rounded numbers shown.)

$p_{optimum} = 5$ cores

Page 17: Contention - Aware Scheduling (a different approach)

Prediction Model

Evaluation – Verification

Relative Errors in Predictions of C class

Page 18: Contention - Aware Scheduling (a different approach)

Prediction Model

Evaluation – Verification

Relative Errors in Predictions of LC class

Page 19: Contention - Aware Scheduling (a different approach)

Prediction Model

LC - C prediction model improvement
• Integration of the 7 relationships into a single one
• Coefficients follow a logarithmic trendline
• Results after analysis

$$\mathrm{Completion(LC)}_p = [0.0139536 \cdot \log(p) + 0.0090562] \cdot f_{LC} + [-0.252533 \cdot \log(p) + 0.6407058]$$

$$\mathrm{Completion(C)}_p = [0.2151318 \cdot \log(p) + 0.2239032] \cdot f_C + [-0.25468 \cdot \log(p) + 0.6397947]$$

$$\mathrm{Ideal\_Completion}_p = \frac{1}{p}$$

$$R_p = \frac{\mathrm{Ideal\_Completion}_p}{\mathrm{Completion}_p} \cdot 100$$

$$p_{optimum} = \max\{p\}, \quad R_p > 70$$
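
The unified relationships can be evaluated with a short helper. This is a sketch under one explicit assumption: the slide writes only "log", and the natural logarithm is used here because it reproduces the per-core coefficients of the earlier slides more closely than base 10 does. The names `UNIFIED`, `completion` and `optimum_cores` are hypothetical.

```python
import math

# ((slope_a, slope_b), (intercept_a, intercept_b)) so that
# Completion_p = (slope_a*log(p) + slope_b) * f + (intercept_a*log(p) + intercept_b)
UNIFIED = {
    "LC": ((0.0139536, 0.0090562), (-0.252533, 0.6407058)),
    "C": ((0.2151318, 0.2239032), (-0.25468, 0.6397947)),
}


def completion(cls, f, p):
    """Predicted relative completion time of a class LC or C application on p cores."""
    (slope_a, slope_b), (icpt_a, icpt_b) = UNIFIED[cls]
    log_p = math.log(p)  # natural logarithm: an assumption, see the note above
    return (slope_a * log_p + slope_b) * f + (icpt_a * log_p + icpt_b)


def optimum_cores(cls, f, max_cores=8, threshold=70.0):
    """Largest p in 2..max_cores with R_p = (1/p) / Completion_p * 100 above the threshold."""
    fitting = [
        p for p in range(2, max_cores + 1)
        if (1.0 / p) / completion(cls, f, p) * 100.0 > threshold
    ]
    return max(fitting) if fitting else 1  # fallback to 1 core is an assumption
```

Because the trendline only approximates the seven fitted lines, the unified form can select a different core count than the per-core coefficients for the same input.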

Page 20: Contention - Aware Scheduling (a different approach)

Prediction Model

Evaluation – Verification of Refinement

Deviation in C coefficients

Deviation in LC coefficients

Page 21: Contention - Aware Scheduling (a different approach)

Prediction Model

Experimentation Platform
• Cores: 8
• L1 D,I: 32 KB, 8-way
• L2: 256 KB, 8-way
• L3: 16 MB, 16-way
• Cache line: 64 bytes
• Memory: 64 GB DDR3, 1.3 GHz
• OS: Debian 6.06

*(Prediction Model also tested on Nehalem architecture)

Page 22: Contention - Aware Scheduling (a different approach)

Scheduling Algorithm

• Executed after the first 2 steps are finished for each application
    - Step 1 has classified each application
    - Step 2 has predicted the optimum number of cores that should be allocated by the scheduler to each application
• The algorithm tries to co-schedule the applications in pairs so that
    - the sum of cores does not exceed the package cores
    - contention is avoided as much as possible (using conclusions from the Classification step)
• The approach can be extended to co-execution of more than 2 applications
• N applications are allocated half the cores and scheduled twice (their profile implies that they are not affected by this)

Page 23: Contention - Aware Scheduling (a different approach)

Scheduling Algorithm

Lists of applications separated by class: L, LC, C, N

while (N not empty) {
    x = current N application;
    y = popMatchFromTheEnd(C, L, LC, N);
    coschedule(x, y);
}
while (LC not empty) {
    x = current LC application;
    y = popMatchFromTheEnd(C, LC, L);
    coschedule(x, y);
}
while (L not empty) {
    x = current L application;
    y = popMatchFromTheEnd(L);
    coschedule(x, y);
}
while (C not empty) {
    x = current C application;
    y = popMatchFromTheEnd(C);
    coschedule(x, y);
}
scheduleRemainingApplications();
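
A runnable Python sketch of this pairing loop is given below. Several details are assumptions, since the slide does not define them: `pop_match_from_the_end` is read as "scan the candidate lists in the given priority order, each from its end, and take the first application that fits within the package next to x"; an application without a partner is deferred to a remaining list; and `PACKAGE_CORES = 8` matches the 8-core platform. The `App` type is hypothetical, and the "half cores, scheduled twice" handling of N applications is not modelled.

```python
from dataclasses import dataclass

PACKAGE_CORES = 8  # assumption: one 8-core package


@dataclass
class App:
    name: str
    cls: str    # "L", "LC", "C" or "N"
    cores: int  # optimum core count predicted for this application


# Partner classes, in the priority order used by the pseudocode.
PARTNERS = {"N": ("C", "L", "LC", "N"), "LC": ("C", "LC", "L"), "L": ("L",), "C": ("C",)}


def pop_match_from_the_end(x, *queues):
    """Pop the first application, scanning each queue from its end, that fits next to x."""
    for queue in queues:
        for i in range(len(queue) - 1, -1, -1):
            if x.cores + queue[i].cores <= PACKAGE_CORES:
                return queue.pop(i)
    return None


def schedule(apps):
    """Return (pairs to co-schedule, applications left without a partner)."""
    queues = {c: [a for a in apps if a.cls == c] for c in ("L", "LC", "C", "N")}
    pairs, remaining = [], []
    for cls in ("N", "LC", "L", "C"):
        while queues[cls]:
            x = queues[cls].pop(0)  # "current" application of this class
            y = pop_match_from_the_end(x, *(queues[c] for c in PARTNERS[cls]))
            if y is None:
                remaining.append(x)  # handled later, like scheduleRemainingApplications()
            else:
                pairs.append((x, y))
    return pairs, remaining
```

Processing the classes in the order N, LC, L, C mirrors the four loops of the pseudocode, so the least sensitive applications are used first to absorb the most contention-prone partners.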

Page 24: Contention - Aware Scheduling (a different approach)

Comparison with Similar Research

The other state-of-the-art schedulers
• Sorting by heuristic
• Distributing load
• Combining application from the top with application from the bottom

Heuristics
• LLC-MRB : LLC misses
• LBB : memory bandwidth
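
The "sort by heuristic, then combine top with bottom" idea can be sketched generically. The function name and the input layout are hypothetical; the metric passed in would be LLC misses or memory bandwidth, depending on the scheduler being imitated, and this is not a reproduction of either scheduler's actual implementation.

```python
def pair_top_with_bottom(apps, metric):
    """Sort apps by a heuristic metric (highest first) and pair the ends inward.

    apps:   list of application identifiers
    metric: function mapping an application to its heuristic value
    Returns (pairs, leftover), where leftover is the unpaired middle app or None.
    """
    ranked = sorted(apps, key=metric, reverse=True)
    pairs = []
    top, bottom = 0, len(ranked) - 1
    while top < bottom:
        pairs.append((ranked[top], ranked[bottom]))  # heaviest with lightest
        top += 1
        bottom -= 1
    leftover = ranked[top] if top == bottom else None
    return pairs, leftover
```

Pairing the heaviest consumer with the lightest one spreads the load according to a single metric, which is exactly why such schedulers cannot see contention in parts of the memory hierarchy that the metric does not cover.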

Page 25: Contention - Aware Scheduling (a different approach)

Comparison with Similar Research

Experiments – Comparison Process
• Linux CFS, LCN, LLC-MRB, LLB to be compared
• Workload of 17 applications (equally shared among classes)
• Whole workload executed for 1 hour
• Time quanta of 1 second defined in all schedulers
• When an application finishes, it is respawned to re-execute
• Comparison between schedulers with 2 criteria
    - Throughput
        total number of executions of all applications
        number of improved applications
    - Fairness
        standard deviation between the gains of the applications*

*gain compared to the Gang scheduler
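
The two criteria can be computed from per-application execution counts, as in the sketch below. The exact definition of "gain" is not given on the slide; here it is assumed to be the relative change in executions compared with the Gang scheduler, and the population standard deviation is assumed for fairness. The function and key names are hypothetical.

```python
from statistics import pstdev


def compare(executions, gang_executions):
    """Summarise one scheduler's run against the Gang scheduler baseline.

    executions:      {app: completed executions under the scheduler being evaluated}
    gang_executions: {app: completed executions under the Gang scheduler}
    """
    # Assumed gain definition: relative change in completed executions.
    gains = {app: executions[app] / gang_executions[app] - 1.0 for app in executions}
    return {
        "throughput": sum(executions.values()),                 # total executions
        "improved": sum(1 for g in gains.values() if g > 0),    # apps that gained
        "fairness": pstdev(gains.values()),                     # lower = more even gains
    }
```

A low standard deviation only says that gains are evenly spread, not that they are large, which is one way a fairness figure can be misread.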

Page 26: Contention - Aware Scheduling (a different approach)

Comparison with Similar Research

Most Improved Applications
• Linux : 5
• LLC-Balance : 7
• MEM-Balance : 5
• LCN : 8

Page 27: Contention - Aware Scheduling (a different approach)

Comparison with Similar Research

Criteria:
- Throughput: LCN
- Fairness: LLC-Balance *

* fairness can be misinterpreted

Page 28: Contention - Aware Scheduling (a different approach)

Comparison with Similar Research

Major Drawbacks of other schedulers
• Linux Scheduler CFS
    - Cannot locate contention
    - Does not identify threads of the same application → parallelism benefits lost
• MEM-Balance Scheduler
    - Uses an over-generic heuristic
    - Does not take into account all parts of the memory hierarchy
• LLC-Balance Scheduler
    - Cannot differentiate between class N and class C applications, since they both exhibit low LLC misses
    - Results in co-scheduling L with C applications → contention

Page 29: Contention - Aware Scheduling (a different approach)

Conclusion

• Proposed a contention-aware scheduler that
    - does not require additional OS or hardware adjustments
    - is simple and easy to integrate as a component in a modern OS
    - consists of 3 parts
• Compared to other state-of-the-art schedulers and the CFS, it
    - presents the best throughput
    - presents fairness equal to the CFS (and lower than the other contention-aware schedulers)
• Can be integrated into real-life scheduling with 2 approaches:
    - applications are executed for 2-3 quanta when inserted in the queue
    - start scheduling and monitoring simultaneously (dynamic adaptation)

Page 30: Contention - Aware Scheduling (a different approach)

Future Work

Major Improvements
• Improvements in the prediction model
    - Stepwise regression models to add more variables
        decrease error
        caution: limitation in the number of monitored counters
    - Other methods, such as machine learning
• Investigation of added overhead
• Extension of the approach to NUMA architectures
    - Implemented and tested for 1 package only
    - Extensible to multiple packages, using thread migrations
        • Initially try to allocate threads of the same application in the same package
        • Thread migrations executed when a class change is observed, along with memory migrations

Page 31: Contention - Aware Scheduling (a different approach)

THE END

Thank you !!!

Any Questions ??
