New Seaborg Queue Configuration
Results
May 29, 2003
David Turner
NERSC User Services Group
510-486-4027
Introduction
• Review of LoadLeveler Class Structure
— NERSC-3 classes
— Proposed NERSC-3 Extended classes
— Current NERSC-3 Extended classes
— Objectives of current class structure
• Effects of Current Structure
— Connect time: wallclock * nodes * 16
— Wait time: start time - submit time
— Connect time / Wait time
• Conclusions
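The two metrics defined in the outline can be sketched in Python as follows. This is only an illustration: the function names, the sample job, and the choice to express both quantities in hours before taking the ratio are assumptions, not something the slides specify. The factor of 16 is the one in the slide's connect-time formula and matches the 16 processors per node implied by the class tables.

```python
from datetime import datetime

PROCS_PER_NODE = 16  # the factor of 16 in the slide's connect-time formula


def connect_time_hours(wallclock_hours, nodes):
    """Connect time = wallclock * nodes * 16 (in processor-hours here)."""
    return wallclock_hours * nodes * PROCS_PER_NODE


def wait_time_hours(submit_time, start_time):
    """Wait time = start time - submit time, converted to hours."""
    return (start_time - submit_time).total_seconds() / 3600.0


# Hypothetical job: 32 nodes, 2 hours of wallclock, queued for 4.5 hours.
submit = datetime(2003, 5, 1, 8, 0)
start = datetime(2003, 5, 1, 12, 30)

connect = connect_time_hours(2.0, 32)
wait = wait_time_hours(submit, start)
ratio = connect / wait  # assumed units: processor-hours per hour waited

print(connect, wait, round(ratio, 1))
```

A higher ratio means the job received more processor time for each hour it spent waiting in the queue.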
NERSC-3 Class Structure
Class          Nodes (Procs)   Time Limit
interactive    8 (128)         30 min
debug          16 (256)        30 min
premium        128 (2,048)     8 hrs
regular        128 (2,048)     8 hrs
low            128 (2,048)     8 hrs
regular_long   32 (512)        24 hrs
Proposed Class Structure
Class          Nodes (Procs)              Time     Priority
interactive    8 (128)                    30 min   1
debug          24 (384)                   30 min   1
premium        256 (4,096)                12 hrs   2
large          32 – 256 (512 – 4,096)     48 hrs   3
regular        256 (4,096)                12 hrs   4
regular_long   32 (512)                   24 hrs   4
low            256 (4,096)                12 hrs   5
Other Proposed Changes
• Various limit adjustments
— Increase user run limit from 4 to 6
— Eliminate class limit of 7 in regular_long
— Retain 1 running, 1 queued limit in regular_long
• Eliminate aging
— Incompatible with class priorities
• Schedule lowest load average and smallest memory nodes first
• Tune scheduling parameters to maintain responsiveness
Current Class Structure
Submit Class   LLClass       Max Nodes   Max Procs    Max Hours   Relative Priority
interactive    interactive   1–8         1–128        0.5         1
debug          debug         1–24        1–384        0.5         2
premium        pre_1         1–31        1–496        12          7
               pre_32        32–127      497–2032     48          5
               pre_128       128–380     2033–6080    48          3
regular        reg_1         1–31        1–496        12          8
               reg_1l        1–31        1–496        24          8
               reg_32        32–127      497–2032     48          6
               reg_128       128–380     2033–6080    48          4
low            low           1–128       1–2048       12          9
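One way to read the class table above is as a lookup from the class a user submits to, plus the requested node count and wallclock hours, to a LoadLeveler class. The sketch below encodes the table rows and returns the first row that fits. The function name `ll_class_for` and the first-match rule are assumptions made for illustration; the slides do not describe how jobs are actually routed.

```python
# Rows copied from the class table:
# (submit class, LLClass, min nodes, max nodes, max hours)
CLASS_TABLE = [
    ("interactive", "interactive", 1, 8, 0.5),
    ("debug", "debug", 1, 24, 0.5),
    ("premium", "pre_1", 1, 31, 12),
    ("premium", "pre_32", 32, 127, 48),
    ("premium", "pre_128", 128, 380, 48),
    ("regular", "reg_1", 1, 31, 12),
    ("regular", "reg_1l", 1, 31, 24),
    ("regular", "reg_32", 32, 127, 48),
    ("regular", "reg_128", 128, 380, 48),
    ("low", "low", 1, 128, 12),
]


def ll_class_for(submit_class, nodes, hours):
    """Return the first LLClass whose node range and time limit fit the job.

    Hypothetical helper: first-match order is an assumption, not a
    documented scheduler rule. Returns None if no row fits.
    """
    for sub, ll_class, min_nodes, max_nodes, max_hours in CLASS_TABLE:
        if sub == submit_class and min_nodes <= nodes <= max_nodes and hours <= max_hours:
            return ll_class
    return None


print(ll_class_for("regular", 16, 10))   # small, short job
print(ll_class_for("regular", 16, 20))   # small, long job
print(ll_class_for("premium", 256, 24))  # large job
```

Under this reading, a 16-node regular job asking for 20 hours exceeds the 12-hour limit of reg_1 and so falls through to reg_1l, while a 200-node job submitted to low fits no row.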
General Batch Policies
• Each user may have:
— 6 jobs running
— 10 jobs considered for scheduling (idle state)
— 30 jobs submitted
• The class run limit for reg_1l is 15 jobs
• Jobs requesting 8 hours or less will complete before scheduled outages
• Jobs placed on “user hold” (status HU) will be removed after one week
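The per-user limits above can be expressed as a few simple checks. This is a rough sketch, not the scheduler's logic: the function names are hypothetical, and it assumes that the 30-job limit counts all of a user's jobs in any state and that queued jobs beyond the first 10 simply are not considered until earlier ones leave the idle state.

```python
MAX_RUNNING = 6     # jobs a user may have running
MAX_IDLE = 10       # jobs considered for scheduling (idle state)
MAX_SUBMITTED = 30  # jobs a user may have submitted (assumed: all states)


def can_submit(total_jobs):
    """True if the user is still under the 30-job submission limit."""
    return total_jobs < MAX_SUBMITTED


def can_start_another(running_jobs):
    """True if the user is still under the 6-job run limit."""
    return running_jobs < MAX_RUNNING


def split_queued(queued_job_ids):
    """Split queued jobs into those considered for scheduling and the rest.

    Assumption: jobs are considered in the order given.
    """
    return queued_job_ids[:MAX_IDLE], queued_job_ids[MAX_IDLE:]


considered, not_considered = split_queued(list(range(14)))
print(len(considered), len(not_considered))
```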
Objectives of Class Structure
• Allow 4096-way jobs
— Current MPI maximum
• Favor “large” jobs
• Provide longer time limit for “regular” jobs
• Provide more resources to “long” jobs
— Allow greater access
• Provide more resources to interactive and debug jobs
— As needed
All while maintaining system responsiveness
N3 vs. N3E
• N3
— October 1, 2002 – March 2, 2003
— 153 days
• N3E
— March 3, 2003 – May 20, 2003
— 79 days
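The day counts for the two periods follow from the dates if both endpoints are included. A short check using Python's standard `datetime` module (the helper name `days_inclusive` is just for illustration):

```python
from datetime import date


def days_inclusive(first, last):
    """Number of calendar days from first to last, counting both endpoints."""
    return (last - first).days + 1


n3_days = days_inclusive(date(2002, 10, 1), date(2003, 3, 2))
n3e_days = days_inclusive(date(2003, 3, 3), date(2003, 5, 20))

print(n3_days, n3e_days)  # 153 79
```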
Jobs Per Week
Nodes N3 N3E
1 - 15 9208.6 8534.7
16 - 31 143.9 253.3
32 - 63 13.7 88.9
64 - 127 12.4 48.0
128+ 3.6 17.8
Total 9382.2 8942.7
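To make the shift toward larger jobs easier to see, the weekly rates above can be turned into N3E/N3 ratios per size bin. The sketch below uses only the numbers in the table; the variable names and the choice of a simple ratio as the comparison are illustrative assumptions.

```python
# Jobs per week by node count, copied from the table: (N3, N3E)
JOBS_PER_WEEK = {
    "1 - 15": (9208.6, 8534.7),
    "16 - 31": (143.9, 253.3),
    "32 - 63": (13.7, 88.9),
    "64 - 127": (12.4, 48.0),
    "128+": (3.6, 17.8),
}

# Column totals, to compare against the table's Total row.
total_n3 = sum(n3 for n3, _ in JOBS_PER_WEEK.values())
total_n3e = sum(n3e for _, n3e in JOBS_PER_WEEK.values())

# Ratio of N3E to N3 weekly job rate for each size bin.
growth = {size: n3e / n3 for size, (n3, n3e) in JOBS_PER_WEEK.items()}

for size, factor in growth.items():
    print(f"{size:>8}: x{factor:.2f}")
print(f"totals: {total_n3:.1f} {total_n3e:.1f}")
```

The smallest bin is the only one whose weekly rate went down; every bin of 16 nodes or more went up.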
Connect Time vs. Class
[Chart: percentage of connect time (vertical axis, 0 to 90%) by charge class (Low, Regular, Premium), N3 vs. N3E.]
Connect Time vs. Size
[Chart: percentage of connect time (vertical axis, 0 to 80%) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), N3 vs. N3E.]
Wait Time vs. Size
[Chart: wait time (vertical axis, 0:00:00 to 24:00:00) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), N3 vs. N3E.]
Connect Time / Wait Time
[Chart: connect time / wait time (vertical axis, 0 to 900) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), N3 vs. N3E.]
Conclusions
• Users running larger jobs
• Users running longer jobs
• Interactive and debug throughput maintained
Resources I
http://hpcf.nersc.gov/running_jobs/ibm/llsum/summary.php
Resources II
http://hpcf.nersc.gov/running_jobs/ibm/llsum/
End of Talk
This slide intentionally left blank.