Agenda
• Job Scheduling Goals
• Understanding the Needs of the Users
• Configuration Parameterization
• Incentivizing User Behavior
• Topology Awareness
• Weekly Resource Management Discussion
Managing HPC Systems and Centers
Job Scheduling Goals
• Know and prioritize goals to measure against
• Scheduling goals are frequently in conflict: it’s a balancing act
• If there is dissatisfaction with the scheduler
  1. Identify if goals are being met by configuration
  2. Question whether the dissatisfaction is tolerable, or if goals need adjustment
  3. If goals are adjusted, then adjust configuration to match and then monitor
Job Scheduling Goals (on Blue Waters)
• No user or project is favored by policy
• Large jobs have higher priority
• System is highly utilized
• Job turnaround time is minimized
• Debug queue with fastest turnaround time
• Users can attain higher priority with higher charge to allocation
• Scheduler commands are responsive
• Predictable job start times
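Two of the goals above (large jobs have higher priority, and users can attain higher priority with a higher charge) can be read as terms in a weighted priority score. The following is a minimal sketch of that idea only, not the Blue Waters configuration: the `Job` fields, the weights, the system size, and the `priority` function are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int            # nodes requested
    wait_hours: float     # time already spent in the queue
    charge_factor: float  # 1.0 = normal charge; above 1.0 = user pays more


def priority(job: Job, total_nodes: int,
             size_weight: float = 1000.0,
             wait_weight: float = 10.0,
             charge_weight: float = 500.0) -> float:
    """Hypothetical weighted score: larger, older, higher-charge jobs rank first."""
    size_term = size_weight * job.nodes / total_nodes
    wait_term = wait_weight * job.wait_hours
    charge_term = charge_weight * (job.charge_factor - 1.0)
    return size_term + wait_term + charge_term


TOTAL_NODES = 8192  # assumed system size, for illustration only
jobs = [
    Job("small", nodes=16, wait_hours=2.0, charge_factor=1.0),
    Job("large", nodes=4096, wait_hours=2.0, charge_factor=1.0),
    Job("small_high_charge", nodes=16, wait_hours=2.0, charge_factor=2.0),
]

# Highest score first; this is the order a scheduler would consider the jobs.
ranked = sorted(jobs, key=lambda j: priority(j, TOTAL_NODES), reverse=True)
print([j.name for j in ranked])
```

Because the goals conflict, the weights are the knobs to revisit when a goal is not being met: raising `wait_weight`, for example, trades large-job preference for shorter worst-case turnaround.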
Understanding the Needs of the Users
• Evaluate requirements of users
  • Wall clock time
  • Job turnaround time
  • System availability
  • Multitenancy
• Variables Beyond Control
  • Job geometry (requested resources, such as walltime or nodes)
  • Job volume submitted
  • Walltime accuracy
  • Application stability
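Walltime accuracy can be made measurable. One common way to quantify it, assumed here, is the ratio of the runtime a job actually used to the walltime it requested. The sketch below uses made-up job records and a hypothetical `walltime_accuracy` helper; real values would come from the scheduler's accounting logs.

```python
def walltime_accuracy(used_seconds: float, requested_seconds: float) -> float:
    """Fraction of the requested walltime that the job actually used."""
    if requested_seconds <= 0:
        raise ValueError("requested walltime must be positive")
    return used_seconds / requested_seconds


# (used_seconds, requested_seconds) per job; illustrative values only.
records = [(3600, 7200), (600, 86400), (7000, 7200)]

accuracies = [walltime_accuracy(used, requested) for used, requested in records]
for (used, requested), acc in zip(records, accuracies):
    print(f"used={used:>6}s requested={requested:>6}s accuracy={acc:.2f}")
```

Low ratios matter to the center because backfill decisions rely on requested walltime: a job that asks for 24 hours and runs for 10 minutes occupies a much larger slot in the schedule than it needs.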
Configuration Parameterization
• Identify the tools to manipulate scheduling behavior
  • QoS, Queues, Reservations, Fairshare
• Avoid unnecessarily complex configurations
• Queues might be configured for varying:
  • Priority
  • Time
  • Job size
  • Resource type
Incentivizing User Behavior
• Discounts provide user incentives to encourage a submission behavior
  • This can be as easy as changing the charge factor for a specific queue
  • Examples: seasonal submission lull, specific job sizes, preemptible queues, backfillable jobs
• Scheduler product built-ins will vary; custom efforts are sometimes necessary
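The charge-factor discount can be illustrated with a short calculation. This is a sketch under stated assumptions: the charge is taken to be node-hours multiplied by a per-queue factor, and the queue names and factor values below are hypothetical, not taken from any particular site or scheduler product.

```python
# Hypothetical per-queue charge factors; below 1.0 is a discount.
CHARGE_FACTORS = {
    "normal": 1.0,
    "high": 2.0,          # pay more for higher priority
    "preemptible": 0.5,   # discount for accepting preemption
}


def job_charge(nodes: int, hours: float, queue: str) -> float:
    """Node-hours charged to the allocation for one job in the given queue."""
    return nodes * hours * CHARGE_FACTORS[queue]


for queue in CHARGE_FACTORS:
    print(queue, job_charge(nodes=100, hours=10.0, queue=queue))
```

Changing the incentive is then a one-line edit to the factor table, which is the sense in which a discount "can be as easy as changing the charge factor for a specific queue".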
Topology Awareness
• Placing jobs in network locations optimal for tightly coupled communication
• Can be beneficial to some applications by improving performance and runtime consistency
• Represents a constraint and can affect turnaround time
• May reduce utilization, but increase overall throughput through average performance enhancement
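The trade-off between utilization and throughput can be shown with simple arithmetic. The sketch below assumes that throughput is proportional to utilization divided by a runtime factor (runtime with a given placement relative to baseline runtime). The percentages are invented for illustration and are not measurements.

```python
def relative_throughput(utilization: float, runtime_factor: float) -> float:
    """Work completed per unit time, relative to a full system at baseline speed.

    utilization:    fraction of node-hours occupied by jobs (0..1)
    runtime_factor: job runtime relative to baseline (0.9 = 10% faster)
    """
    return utilization / runtime_factor


# Hypothetical numbers: topology-aware placement leaves more nodes idle
# (90% vs 95% utilization) but jobs finish 10% sooner on average.
baseline = relative_throughput(utilization=0.95, runtime_factor=1.00)
topology_aware = relative_throughput(utilization=0.90, runtime_factor=0.90)

print(f"baseline:       {baseline:.3f}")
print(f"topology aware: {topology_aware:.3f}")
```

Under these assumed numbers the topology-aware configuration completes more work despite lower utilization. With a smaller speedup the comparison reverses, so the benefit depends on the application mix.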
Weekly Resource Management Discussion
• Review tickets submitted that are scheduling related
• View storage utilization and usage
• View system utilization
• Look at wait times per queue and per user
• Look at scheduler performance (response time)
• Review for any user behavior that could potentially affect system procedures and policy
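The per-queue wait-time review can be prepared with a small summary script. This is a minimal sketch: the record layout (queue name, submit time, start time, both in seconds) and the `wait_by_queue` helper are assumptions for illustration, and a real review would read these fields from the scheduler's accounting data.

```python
from collections import defaultdict
from statistics import mean, median


def wait_by_queue(records):
    """Summarize wait time (start - submit, in seconds) for each queue."""
    waits = defaultdict(list)
    for queue, submit_time, start_time in records:
        waits[queue].append(start_time - submit_time)
    return {
        queue: {"jobs": len(w), "mean": mean(w), "median": median(w)}
        for queue, w in waits.items()
    }


# (queue, submit_time, start_time); illustrative values only.
records = [
    ("debug", 0, 300),
    ("debug", 100, 200),
    ("normal", 0, 7200),
    ("normal", 50, 3650),
]

summary = wait_by_queue(records)
for queue, stats in summary.items():
    print(queue, stats)
```

Grouping on a user field instead of the queue name gives the per-user view from the same function.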
Filesystem Activity
Filesystem Load and Response Time
Wait Times (Xdmod) and Historical Utilization
Scheduler Statistics and Iteration Time