CS528
Intro to PARAM ISHAN and Task Scheduling (Part I)
A Sahu
Dept of CSE, IIT Guwahati
Outline
• Scheduling Concepts
• Independent Tasks, Dependent Tasks
PARAM ISHAN
• 250 Teraflops peak computing facility
• Total 162 compute nodes
• 2 master nodes
• 4 login nodes
• Mellanox FDR (56 Gbps) 324-port chassis switch as primary high-speed interconnect
• 300 TB storage with 15 GB/s write throughput, based on the Lustre parallel file system
System Overview
Schematic Diagram
Compute Nodes
• Compute nodes with GPU
– 16 nodes, 384 CPU cores
– 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
– 64 GB of physical memory per node
– GPU accelerator: 2 x NVIDIA Tesla K40 per node
– Compute power of 60 Tflops
• Compute nodes with Xeon Phi
– 16 nodes, 384 CPU cores
– 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
– 64 GB of physical memory per node
– MIC accelerator: 2 x Intel Xeon Phi 7120 per node
– Compute power of 47.36 Tflops
• Compute nodes without any accelerator
– 126 nodes, 3024 cores
– 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
– 64 GB of physical memory per node
– Compute power of 121 Tflops
• High-memory compute nodes without any accelerator
– 4 nodes, 96 cores
– 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
– 512 GB of physical memory per node
– Compute power of 3.8 Tflops
Software Stack
• HPC Programming Tools
– Application Libraries: Ferret / GrADS / ParaView
– Development Tools: Intel Cluster Studio 2016, GNU (GCC 4.4.7 & 5.2)
– Driver/System Libraries: Intel MPSS 3.6.1, CUDA 7.5, Mellanox OFED 2.4-1.0.4
• Middleware Applications and Management
– Resource Management / Job Scheduling: SLURM 15.08.6
– File System: NFS, Local FS (ext3, ext4, XFS), Lustre 2.5
– Provisioning: Bright Cluster Manager 7.2
– Cluster Monitoring: Bright Cluster Manager 7.2
– Remote Power Mgmt: RMM4
– Remote Console Mgmt: RMM4
• Operating System: CentOS 6.6
HPC scheduling: Large Scale
• When
– the number of nodes is 162 and the number of cores in the system is 162*24 = 3888, and
– the number of jobs and users is around 1000,
manual scheduling and Gantt chart depiction are not feasible.
• SLURM (Simple Linux Utility for Resource Management) uses an SQL database to store the Gantt chart and scheduling data.
File-Systems
• Home
– 100 TB Lustre-based storage
– 30 GB default quota
• Scratch
– 10 GB/sec write throughput
– Users are recommended to use this file-system during execution of their jobs
– They must transfer their data back to the home file-system
• Archive
– Policy-based movement of home file-system data to the archive file-system
Access to Cluster
• ssh to param-ishan.iitg.ernet.in
• Users get one of the 4 login nodes in round-robin fashion
• For GPU jobs, ssh to the GPU login node
• For Intel Xeon Phi/MIC jobs, ssh to the MIC login node
• Login nodes: cpu-login1, cpu-login2, gpu-login, mic-login
Google “Scheduling Algorithm Brucker pdf” to get
a PDF copy of the Book
• Find time slots in which activities (or jobs)
should be processed under given constraints.
• Constraints
– Resource constraints
– Precedence constraints between activities.
• A quite general scheduling problem is
– Resource Constrained Project Scheduling Problem
(RCPSP)
• We have
– Activities j = 1, ... , n with processing times pj.
– Resources k = 1, ... , r. A constant amount of Rk units
of resource k is available at any time.
– During processing, activity j occupies rjk units of
resource k for k = 1, ... , r.
– Precedence constraints i → j between some activities
i, j, meaning that activity j cannot start
before i is finished.
• Objective : Determine starting times Sj for all
activities j in such a way that
– at each time t the total demand for resource k is
not greater than the availability Rk for k = 1, ... , r,
– the given precedence constraints are fulfilled, i.e.
Si + pi ≤ Sj if i → j, and
• Some objective function f(C1, ... , Cn) is
minimized where Cj = Sj + pj is the completion
time of activity j.
• The fact that activities j start at time Sj and
finish at time Sj + pj implies that the activities j
are not preempted.
• We may relax this condition by allowing
preemption (activity splitting).
• Consider a project with n = 4 activities and r = 2 resources with capacities R1 = 5 and R2 = 7,
• a precedence relation 2 → 3, and the following data:
i     1  2  3  4
pi    4  3  5  8
ri1   2  1  2  2
ri2   3  5  3  4
A corresponding schedule with minimal makespan:
[Figure: Gantt charts of the usage of resources R1 = 5 and R2 = 7 over time by activities 1 to 4]
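The two RCPSP conditions (resource capacities and precedence) can be checked mechanically for the example data. The following is a minimal sketch, not the method used in the lecture: the start times `S` are an assumption chosen by hand, and the helper names `is_feasible` and `makespan` are hypothetical. Because all data are integers, it is enough to check the resource demand at integer time points.

```python
# Data of the four-activity example above.
p = {1: 4, 2: 3, 3: 5, 4: 8}                         # processing times p_j
r = {1: (2, 3), 2: (1, 5), 3: (2, 3), 4: (2, 4)}     # (r_j1, r_j2)
R = (5, 7)                                           # capacities R_1, R_2
prec = [(2, 3)]                                      # 2 -> 3


def makespan(S):
    """Largest completion time C_j = S_j + p_j."""
    return max(S[j] + p[j] for j in p)


def is_feasible(S):
    """Check precedence and resource constraints for start times S."""
    # Precedence: S_i + p_i <= S_j whenever i -> j.
    for i, j in prec:
        if S[i] + p[i] > S[j]:
            return False
    # Resources: at every time t the demand for resource k is at most R_k.
    for t in range(makespan(S)):
        running = [j for j in p if S[j] <= t < S[j] + p[j]]
        for k, capacity in enumerate(R):
            if sum(r[j][k] for j in running) > capacity:
                return False
    return True


# Assumed start times (hand-picked for illustration).
S = {1: 8, 2: 0, 3: 3, 4: 3}
print(is_feasible(S), makespan(S))   # True 12
```

Activity 2 needs 5 of the 7 units of resource 2, so it cannot overlap with any other activity; starting activities 1 and 2 together, for instance, makes `is_feasible` return `False`.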
• Production scheduling
• Robotic cell scheduling
• Computer Processor scheduling
• Timetabling
• Personnel scheduling
• Railway scheduling
• Air traffic control, etc.
• Most machine scheduling problems are special
cases of the RCPSP.
– Single machine problems,
• Online Problem: FCFS, SJF, SRF, RR…
– Parallel machine problems, and
– Shop scheduling problems, etc.
• We have n jobs j =1, ... , n to be processed on
a single machine. Additionally precedence
constraints between the jobs may be given.
• This problem can be modeled by an RCPSP
with r = 1, R1 = 1, and rj1 = 1 for all jobs j.
• P: We have jobs j as before and m identical
machines M1, ... , Mm .
• The processing time for j is the same on each
machine.
• One has to assign the jobs to the machines
and to schedule them on the assigned
machines.
• This problem corresponds to an RCPSP with r
= 1, R1 = m, and rj1 = 1 for all jobs j.
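As a rough illustration of the two decisions named above (assigning jobs to machines and scheduling them there), the following sketch uses a simple greedy rule: each job, taken in the given order, goes to the machine that becomes free first. This rule is an assumption chosen for illustration; it is not taken from the slides and does not guarantee a minimal makespan. The function name `list_schedule` and the job data are hypothetical.

```python
import heapq


def list_schedule(p, m):
    """Greedy schedule on m identical machines.

    p maps job -> processing time; jobs are taken in dictionary order.
    Returns (start times, assigned machine index, makespan).
    """
    # Heap of (time at which the machine becomes free, machine index).
    machines = [(0, k) for k in range(m)]
    heapq.heapify(machines)
    start, assigned = {}, {}
    for j, pj in p.items():
        free_at, k = heapq.heappop(machines)
        start[j], assigned[j] = free_at, k
        heapq.heappush(machines, (free_at + pj, k))
    cmax = max(start[j] + p[j] for j in p)
    return start, assigned, cmax


# Hypothetical data: four jobs on m = 2 machines.
jobs = {1: 3, 2: 2, 3: 4, 4: 1}
start, assigned, cmax = list_schedule(jobs, 2)
print(start, assigned, cmax)
```

For these data the greedy rule yields a makespan of 6, while pairing jobs {1, 2} and {3, 4} would give 5, which shows that the assignment step matters.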
[Figure: diagram of jobs 1 to 8]
[Figure: Gantt chart of jobs 1 to 8 on machines M1, M2, M3; time axis 0 to 9]
• Q: The machines are called uniform if pjk = pj/rk, where rk is the speed of machine Mk.
• R: For unrelated machines, the processing time pjk depends on the machine Mk on which j is processed.
• MPM: In a problem with multi-purpose machines, a set of machines µj is associated with each job j, indicating that j can be processed only on a machine in µj.
Parallel Machines

P: Identical
Ti   P1  P2  P3  P4
T1   10  10  10  10
T2   12  12  12  12
T3   16  16  16  16
T4   20  20  20  20

Q: Uniform, with speed differences (S1 = 1, S2 = 2/3, S3 = 1/2, S4 = 2/5)
Ti   P1  P2  P3  P4
T1   10  15  20  25
T2   12  18  24  30
T3   16  24  32  40
T4   20  30  40  50

R: Unrelated (heterogeneous)
Ti   P1  P2  P3  P4
T1   10   8  12   2
T2   12  28  25  13
T3   16   4  32  14
T4   20  38  42  22
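The uniform-machine table follows from dividing each base processing time by the machine speed. A minimal sketch that reproduces it, assuming the base times are those of the identical-machine table; the names `base`, `speed` and `uniform_times` are hypothetical:

```python
from fractions import Fraction

# Base processing times p_j (the values from the identical-machine table).
base = {"T1": 10, "T2": 12, "T3": 16, "T4": 20}
# Machine speeds S1..S4 as exact fractions.
speed = [Fraction(1), Fraction(2, 3), Fraction(1, 2), Fraction(2, 5)]


def uniform_times(base, speed):
    """Processing time of task j on machine k is p_j divided by the speed of k."""
    return {task: [pj / s for s in speed] for task, pj in base.items()}


table = uniform_times(base, speed)
for task, row in table.items():
    print(task, [int(x) for x in row])
```

Exact fractions are used so that values such as 10 / (2/3) = 15 come out without floating-point rounding.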
Classes of scheduling problems can be specified
in terms of the three-field classification
α | β | γ
where
• α specifies the machine environment,
• β specifies the job characteristics, and
• γ describes the objective function(s).
If the number of machines is fixed to m we write
Pm, Qm, Rm, MPMm, Jm, Fm, Om.
Symbol Meaning
1 Single Machine
P Parallel Identical Machine
Q Uniform Machine
R Unrelated Machine
MPM Multipurpose Machine
J Job Shop
F Flow Shop
O Open Shop
Symbol Meaning
pmtn preemption
rj release times
dj deadlines
pj = 1 or pj = p or pj ∈ {1,2} restricted processing times
prec arbitrary precedence constraints
intree (outtree) intree (or outtree) precedence
chains chain precedence
series-parallel a series-parallel precedence graph
Two types of objective functions are most
common:
• bottleneck objective functions
max {fj(Cj) | j= 1, ... , n}, and
• sum objective functions Σ fj(Cj) = f1(C1) +
f2(C2) + ... + fn(Cn),
where Cj is the completion time of task j.
• Cmax and Lmax symbolize the bottleneck
objective
– Cmax objective functions with fj(Cj) = Cj (makespan)
– Lmax objective functions fj(Cj) = Cj - dj (maximum
Lateness)
• Common sum objective functions are:
– Σ Cj (mean flow-time)
– Σ ωj Cj (weighted flow-time)
• Σ Uj (number of late jobs) and Σ ωj Uj
(weighted number of late jobs), where Uj = 1 if
Cj > dj and Uj = 0 otherwise.
• Σ Tj (sum of tardiness) and Σ ωj Tj (weighted
sum of tardiness), where the
tardiness of job j is given by
Tj = max { 0, Cj - dj }.
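The objective functions above can all be computed from the completion times of a schedule. A minimal sketch, assuming completion times C, deadlines d and weights w are given as lists indexed by job; the function name `objectives` and the sample data are hypothetical:

```python
def objectives(C, d, w):
    """Common bottleneck and sum objectives for completion times C,
    deadlines d and weights w (lists indexed by job)."""
    jobs = range(len(C))
    lateness = [C[j] - d[j] for j in jobs]              # L_j = C_j - d_j
    tardiness = [max(0, C[j] - d[j]) for j in jobs]     # T_j = max{0, C_j - d_j}
    late = [1 if C[j] > d[j] else 0 for j in jobs]      # U_j
    return {
        "Cmax": max(C),
        "Lmax": max(lateness),
        "sum_C": sum(C),
        "sum_wC": sum(w[j] * C[j] for j in jobs),
        "sum_U": sum(late),
        "sum_wU": sum(w[j] * late[j] for j in jobs),
        "sum_T": sum(tardiness),
        "sum_wT": sum(w[j] * tardiness[j] for j in jobs),
    }


# Hypothetical schedule with three jobs.
values = objectives(C=[3, 5, 9], d=[4, 5, 6], w=[2, 1, 3])
print(values)
```

In this sample only the third job is late (9 > 6), so Σ Uj = 1 and Σ Tj = 3, while Lmax = 3 and Cmax = 9.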
• 1 | prec; pj = 1 | Σ ωj Cj
• P2 | | Cmax
• P | pj = 1; rj | Σ ωj Uj
• R2 | chains; pmtn | Cmax
• R | n = 3 | Cmax
• P | pj = 1; outtree; rj | Σ Cj
• Q | pj = 1 | Σ Tj
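To make the three fields of such a problem name explicit, a string in this notation can be split at the vertical bars, with the job characteristics separated at semicolons. This is only an illustrative sketch; the function name `parse_three_field` and the dictionary keys are hypothetical.

```python
def parse_three_field(name):
    """Split 'alpha | beta | gamma' into its three fields."""
    alpha, beta, gamma = (part.strip() for part in name.split("|"))
    # The beta field may be empty or hold several ';'-separated entries.
    characteristics = [c.strip() for c in beta.split(";") if c.strip()]
    return {"machines": alpha, "jobs": characteristics, "objective": gamma}


print(parse_three_field("R2 | chains; pmtn | Cmax"))
print(parse_three_field("P2 | | Cmax"))
```

An empty β field, as in P2 | | Cmax, means that no special job characteristics are assumed.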
• A problem is called polynomially solvable if it
can be solved by a polynomial algorithm.
Example
1 | | Σ ωjCj can be solved by
scheduling the jobs in order of non-increasing
ωj/pj values.
Complexity: O(n log n)
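The ordering rule above (often called Smith's ratio rule) amounts to a single sort, which is where the O(n log n) bound comes from. A minimal sketch with hypothetical job data; the brute-force search over all permutations is included only to cross-check the result on this small instance:

```python
from itertools import permutations


def weighted_completion(order, p, w):
    """Sum of w_j * C_j when jobs run back to back in the given order."""
    t = total = 0
    for j in order:
        t += p[j]            # completion time C_j of job j
        total += w[j] * t
    return total


def smith_order(p, w):
    """Jobs sorted by non-increasing ratio w_j / p_j."""
    return sorted(p, key=lambda j: w[j] / p[j], reverse=True)


# Hypothetical instance with three jobs.
p = {1: 3, 2: 1, 3: 2}
w = {1: 2, 2: 3, 3: 1}

order = smith_order(p, w)
best = min(weighted_completion(o, p, w) for o in permutations(p))
print(order, weighted_completion(order, p, w), best)   # [2, 1, 3] 17 17
```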
Example
1 | | Σ Cj can be solved by
scheduling the jobs in order of non-increasing
1/pj values, i.e. non-decreasing pj ==> SJF (Shortest Job First)
Cj = Qj + pj: waiting time + processing time
(SJF is optimal)
Complexity: O(n log n)
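The relation between completion time, waiting time and processing time can be traced on a small instance. A minimal sketch of SJF on a single machine, assuming all jobs are available at time 0; the function name `sjf` and the processing times are hypothetical:

```python
def sjf(p):
    """Shortest Job First on one machine; all jobs available at time 0.

    Returns the job order, waiting times Q_j and completion times C_j.
    """
    order = sorted(p, key=p.get)      # non-decreasing processing time
    t = 0
    waiting, completion = {}, {}
    for j in order:
        waiting[j] = t                # Q_j: time spent before j starts
        t += p[j]
        completion[j] = t             # C_j = Q_j + p_j
    return order, waiting, completion


# Hypothetical processing times.
p = {1: 6, 2: 8, 3: 7, 4: 3}
order, Q, C = sjf(p)
print(order, sum(C.values()))   # [4, 1, 3, 2] 52
```

Running the same jobs in index order 1, 2, 3, 4 gives completion times 6, 14, 21, 24, a total of 65 instead of 52.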