Merit Allocation Training 2022 Pawsey Supercomputing Research Centre 3 September 2021


Table of Contents

Photo Credit: The Rottnest Island Authority

• Capital Refresh Overview

• Setonix Overview (Hardware and Software available)

• Accounting Model

• Merit Allocation Schemes

• How do I apply?


Capital Refresh Overview


• Capital Refresh Overview

• Computational Capability

• Setonix Availability Timeline


Capital Refresh Overview

• Several initial procurements completed

    • Garrawarla compute cluster

    • Astronomy high speed storage

    • Cloud high throughput computing

• Acacia object storage system installation in progress

• Setonix will be arriving soon

    • Phase 1: CPU-based

    • Phase 2: GPU-based and additional CPUs


Computational Capability

What is the increase in computational capacity?

• Merit allocations not bound to specific partitions, support for large-scale jobs using most of Setonix

• Significant increase in computational capacity for the available schemes on Setonix.

• Double precision floating point capability increases as follows:

    • Magnus: 1.1 petaflops

    • Setonix Phase 1: 2.7 petaflops

    • Setonix Phase 2: 50 petaflops


Setonix Availability Timeline

How does it affect merit applications?

• Setonix’s full capacity will be available to researchers through the 2023 allocation schemes

• In 2022, allocations will transition from the current model to the new model


Setonix Overview

• Setonix Hardware Overview

• Key Hardware Changes

• Storage Overview

• Software Overview & Changes


Setonix Hardware Overview

Phase 1 provides

• CPU compute

• Fast interconnect

• High Memory & Visualisation Nodes

• High-performance filesystems (Lustre)

Phase 2 will add

• Additional CPU compute

• Production-level GPU compute

• Slingshot interconnect upgrade to 200+ Gb/s

Acacia system

• Large-volume storage (object store, Ceph, S3)


Key Hardware Changes

• Moving from 24 core Intel nodes to 128 core AMD nodes

• Moving from 64 GB to 256 GB of memory per node (more memory per node)

• Moving from 2.5 GB per core to 2 GB per core (slightly less memory per core)

• Moving from exclusive node access to shared node access

• Project storage on /group will move to the Acacia object store

• Software installations on /group will move to the /software file system


Storage Overview

Supercomputing File Systems

/home

• Like current Pawsey systems, minimal storage (NFS)

/software

• Lustre storage used for software; replaces some functionality of /group

/scratch

• Fast Lustre storage for active workflows. Data should be staged in from, and moved out to, the object store or offsite storage

Acacia Object Store

• Large-volume project storage, uses S3 interface

IMPORTANT: /group will no longer exist

Software Overview & Changes

Overview

• HPE/Cray provides optimised compilers, libraries and tools

• Pawsey supports software used in many scientific domains

Key Changes

• Moving from Intel architecture to AMD means a move from Intel compilers to AMD & Clang-based compilers

• Move from OpenACC to OpenMP

• Move from the Maali installation tool to Spack


Accounting Model

• What is an accounting model?

• The Previous Accounting Model: Magnus

• The New Accounting Model: Setonix

• Setonix Accounting Model Examples

• The Accounting Model: Setonix vs Gadi


What is an accounting model?

The accounting model determines what a user is charged for, and how much.

• Traditionally, the consumable resource is the hourly usage of CPU cores of a supercomputer.

• Service Unit (SU) is the unit of measure for consumable supercomputing resources.

    • 1 SU is equivalent to 1 hour use of 1 CPU core.

    • Cost of a job (SU): number of CPU cores requested × wall time (h).

Examples:

• 1 SU = 1 hour use of 1 CPU core = ½ hour use of 2 CPU cores.

• 576 SU = 24 hours on 1 Magnus node (24 CPU cores) = 4.5 hours on 1 Setonix node (128 CPU cores)
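The basic rule above (cost in SU = CPU cores requested × wall time in hours) can be checked with a minimal Python sketch. The function name `job_cost_su` is hypothetical and used only for illustration; it is not part of any Pawsey tool.

```python
def job_cost_su(cpu_cores: int, wall_time_hours: float) -> float:
    """Cost in SU under the basic rule: CPU cores requested x wall time (h)."""
    if cpu_cores < 1 or wall_time_hours < 0:
        raise ValueError("need at least 1 core and a non-negative wall time")
    return cpu_cores * wall_time_hours


# The examples from the slide.
print(job_cost_su(1, 1))      # 1 core for 1 hour
print(job_cost_su(2, 0.5))    # 2 cores for half an hour
print(job_cost_su(24, 24))    # 1 Magnus node (24 cores) for 24 hours
print(job_cost_su(128, 4.5))  # 1 Setonix node (128 cores) for 4.5 hours
```

The last two calls give the same cost, which is the point of the 576 SU example: the same budget buys far less wall time on a full 128-core node.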


The Previous Accounting Model: Magnus

• Consumable resource: hourly usage of CPU cores.

• 1 SU = 1 hour use of 1 CPU core

Exclusive node usage:

• Resources are allocated and charged for at a compute node granularity.

• At any time, at most one job has access to cores in a node.

• If a job doesn't use all the cores in a node, the consumable resource is wasted.

• Hence, the job is also charged for the time on idle cores.

• Cost of a job (SU): 24 × nodes × wall time.
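A minimal sketch of exclusive-node charging on Magnus, following the formula 24 × nodes × wall time. The slide takes the node count as given; deriving it by rounding the core count up to whole nodes is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

MAGNUS_CORES_PER_NODE = 24  # from the slide


def magnus_job_cost_su(cores_used: int, wall_time_hours: float) -> float:
    """Charge whole nodes: 24 x nodes x wall time (h).

    Assumption: the number of nodes is the core count rounded up to
    whole nodes, since at most one job has access to a node at a time.
    """
    nodes = math.ceil(cores_used / MAGNUS_CORES_PER_NODE)
    return MAGNUS_CORES_PER_NODE * nodes * wall_time_hours


# A job that uses only 6 cores for 10 hours is still charged for all 24 cores.
charged = magnus_job_cost_su(6, 10)
used = 6 * 10
print(charged, charged - used)  # total SU charged, SU spent on idle cores
```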

[Figure: Exclusive node use on Magnus]


The New Accounting Model: Setonix

• Consumable resource: hourly usage of CPU cores (CPUs).

• 1 SU = 1 hour use of 1 CPU core

Proportional node usage:

• Resources are allocated at a sub-node granularity.

• Multiple jobs can run on the same node.

• A job is charged for the largest fraction of resources used.

• RAM consumption is mapped to CPU consumption.

• RAM consumption by 1 job may affect other jobs on the same node.

• Cost of a job (SU): 128 × largest fraction × nodes × wall time (h).

    • Min: 1 SU per hour

    • Max: 128 SU per hour per node

[Figure: Proportional node use on Setonix Phase 1]
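A minimal sketch of proportional charging on Setonix Phase 1. It assumes the cost is 128 × largest fraction × nodes × wall time, which is the reading consistent with the stated maximum of 128 SU per hour per node. The 256 GB of RAM per node comes from the Key Hardware Changes slide; the memory actually available to jobs may be lower. All names are hypothetical.

```python
SETONIX_CORES_PER_NODE = 128    # from the slides
SETONIX_RAM_PER_NODE_GB = 256   # from the slides; usable RAM may be lower


def setonix_job_cost_su(cores_per_node: int, ram_gb_per_node: float,
                        nodes: int, wall_time_hours: float) -> float:
    """Charge for the largest fraction of a node's resources requested."""
    if not 0 < cores_per_node <= SETONIX_CORES_PER_NODE:
        raise ValueError("cores_per_node must be between 1 and 128")
    if not 0 < ram_gb_per_node <= SETONIX_RAM_PER_NODE_GB:
        raise ValueError("ram_gb_per_node must be between 0 and 256")
    core_fraction = cores_per_node / SETONIX_CORES_PER_NODE
    ram_fraction = ram_gb_per_node / SETONIX_RAM_PER_NODE_GB
    largest_fraction = max(core_fraction, ram_fraction)
    return SETONIX_CORES_PER_NODE * largest_fraction * nodes * wall_time_hours


print(setonix_job_cost_su(1, 2, 1, 1))      # minimum: 1 core with its 2 GB share
print(setonix_job_cost_su(128, 256, 1, 1))  # maximum: a whole node
print(setonix_job_cost_su(1, 128, 1, 1))    # 1 core but half the node's RAM
```

The third call shows how RAM consumption is mapped to CPU consumption: a single-core job that requests half of a node's memory is charged as if it used half of the node's cores.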


Setonix Accounting Model Examples

Examples of Setonix proportional node usage

Example 1: RAM proportion (2/3) is bigger than CPU cores proportion (½).

Example 2: CPU cores proportion (2/3) is bigger than RAM proportion (½).
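Both examples lead to the same hourly charge, because only the larger of the two proportions counts. The sketch below works this through with exact fractions, assuming a 128-core node is charged at 128 × the largest proportion per node hour; the helper name is hypothetical.

```python
from fractions import Fraction

CORES_PER_NODE = 128  # Setonix, from the slides


def su_per_node_hour(core_fraction: Fraction, ram_fraction: Fraction) -> Fraction:
    """SU charged per node per hour: 128 x the larger of the two proportions."""
    return CORES_PER_NODE * max(core_fraction, ram_fraction)


# Example 1: RAM proportion (2/3) is bigger than CPU cores proportion (1/2).
example_1 = su_per_node_hour(Fraction(1, 2), Fraction(2, 3))

# Example 2: CPU cores proportion (2/3) is bigger than RAM proportion (1/2).
example_2 = su_per_node_hour(Fraction(2, 3), Fraction(1, 2))

print(round(float(example_1), 2), round(float(example_2), 2))
```

In each case the job is charged roughly 85.33 SU per node hour, rather than the 64 SU that the smaller proportion alone would suggest.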


The Accounting Model – Setonix vs Gadi

Comparing the accounting models of Setonix and Gadi

When applying for NCMAS, remember that NCI charges 2 service units for 1 hour use of 1 core on Gadi.

Resources          Service Units

                   Setonix (128 cores per node)    Gadi (48 cores per node)

1 CPU core / h     1                               2

1 CPU / h          64                              48

1 Node / h         128                             96
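The comparison can be reproduced from the two charge rates alone. This is a sketch with hypothetical names; the rates and core counts are the ones stated on the slide.

```python
SU_PER_CORE_HOUR = {"Setonix": 1, "Gadi": 2}   # from the slide
CORES_PER_NODE = {"Setonix": 128, "Gadi": 48}  # from the slide


def su_cost(system: str, cores: int, hours: float) -> float:
    """SU charged for a number of cores over a number of hours."""
    return SU_PER_CORE_HOUR[system] * cores * hours


for system in ("Setonix", "Gadi"):
    node = CORES_PER_NODE[system]
    # One core, one CPU (half a node on both systems), one full node.
    print(system, su_cost(system, 1, 1), su_cost(system, node // 2, 1),
          su_cost(system, node, 1))
```

When sizing an NCMAS request, the same number of core hours therefore costs twice as many service units on Gadi as on Setonix.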


Merit Allocation Schemes

• Merit Allocation Schemes on Setonix

• Timeline

• Early Access and GPU Migration


Merit Allocation Schemes on Setonix

Applications for 2022 merit allocations are open for the Setonix CPU partition:

The National Computational Merit Allocation Scheme (NCMAS)

• Annual allocation call open to the whole Australian research community

• Meritorious, computational research projects

The Pawsey Partner Merit Allocation Scheme

• Annual call open to researchers in Pawsey Partner institutions

• Meritorious, computational research projects

• Partner institutions: CSIRO, Curtin University, Edith Cowan University, Murdoch University and The University of Western Australia

NOTE: The Pawsey Energy & Resources Merit Allocation Scheme will be discontinued. From 2022, there will be no more calls for this scheme. Researchers from the Australian energy and resources research community are encouraged to apply through the NCMAS and Pawsey Partner schemes.


Timeline

The National Computational Merit Allocation Scheme (NCMAS)

Dates                        Milestone

18 August 2021               Applications open

5 October 2021               Applications close (5pm AWST)

1-3 December 2021            Allocation Committee meeting

21 December 2021             Allocations announced

1 Jan 2022                   Access to allocations commences

The Pawsey Partner Merit Allocation Scheme

Dates                        Milestone

31 August 2021               Applications open

11 October 2021              Applications close (5pm AoE, Anywhere on Earth time)

2nd half of December 2021    Allocation Committee meeting

22 December 2021             Allocations announced

1 Jan 2022                   Access to allocations commences


Early Access and GPU Migration

Researchers can apply for early access to Setonix resources separately from the Merit Allocation Calls:

• An expression of interest (EOI) for Setonix CPU Early Adopters will be sent to all current Magnus projects in Q4 2021

• The Setonix GPU Early Science Call will be available for 2H 2022

The Topaz GPU cluster will be available for GPU migration purposes:

• Access will be provided to Merit Allocation projects on request

• Programming environment supports AMD GPU porting (with HIP and OpenMP)

• Container environment supports AI/ML workloads


How do I apply?

• Setonix Allocation Requests in 2022

• Examples: 1st and 2nd Requests

• Benchmarking and Scaling

• Magnus vs Setonix Benchmarks Comparison

• Application Portal Information

• Help & Further Assistance

• Questions?


Setonix Allocation Requests in 2022

Scheme                               1st Request (full year)    2nd Request (2H 2022 pro rata)

NCMAS             Total capacity     100 MSU                    150 MSU

                  Minimum request    250 kSU                    1 MSU

Pawsey Partner    Total capacity     110 MSU                    190 MSU

                  Minimum request    100 kSU                    1 MSU

In 2022 researchers applying through NCMAS and Pawsey Partner Schemes will do so separately for Setonix Phase 1 (1st Request) and Setonix Phase 2 (2nd Request) CPU allocations.

NOTE: 1 kSU = 1,000 SU, 1 MSU = 1,000,000 SU

Examples: 1st and 2nd Requests

Example 1: Research Group A

Research Group A was awarded:

• 1st Request: 2 MSU, and

• 2nd Request: 10 MSU

Setonix Phase 2 becomes available for researchers on the first day of Q4 2022.

The real allocation of Research Group A is:

• Setonix Phase 1 available throughout the year: 2 MSU

• Setonix Phase 2 available in Q4 2022: 5 MSU

Example 2: Research Group B

Research Group B was awarded:

• 1st Request: 1 MSU, and

• 2nd Request: 5 MSU

Setonix Phase 2 becomes available for researchers on the first day of 2H 2022.

The real allocation of Research Group B is:

• Setonix Phase 1 available throughout the year: 1 MSU

• Setonix Phase 2 available in 2H 2022: 5 MSU
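Both examples are consistent with a simple pro rata rule: the 2nd Request covers the two quarters of 2H 2022 and is scaled by the share of those quarters in which Setonix Phase 2 is actually available. The sketch below assumes that linear, per-quarter scaling; the slides do not state the exact rule, and the function name is hypothetical.

```python
from fractions import Fraction

QUARTERS_IN_2H = 2  # Q3 and Q4 2022


def effective_second_allocation_msu(awarded_msu: int,
                                    quarters_available: int) -> Fraction:
    """Scale the awarded 2nd Request by the share of 2H 2022 that is available.

    Assumption: linear pro rata by whole quarters.
    """
    if not 0 <= quarters_available <= QUARTERS_IN_2H:
        raise ValueError("quarters_available must be 0, 1 or 2")
    return awarded_msu * Fraction(quarters_available, QUARTERS_IN_2H)


# Research Group A: 10 MSU awarded, Phase 2 available from the start of Q4 2022.
print(effective_second_allocation_msu(10, quarters_available=1))

# Research Group B: 5 MSU awarded, Phase 2 available from the start of 2H 2022.
print(effective_second_allocation_msu(5, quarters_available=2))
```

Under this assumption both groups end up with the same Phase 2 allocation, even though Group A was awarded twice as much.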


Benchmarking and Scaling

• Benchmarking and scaling information is an important part of the application

    • Demonstrates efficient use of a supercomputer

• Include scaling information for a typical job you will run

    • Ideally provide scaling tests of your jobs on Magnus

    • At minimum provide scaling tests of the software on a system

Cores    Nodes    Walltime (hours)    Cost (node hours)

128      8        11.2                90

256      16       4.4                 70

512      32       2.1                 67

1024     64       1.0                 64

2048     128      0.91                116

4096     256      0.75                192

Real-world example using NWChem

• Which configuration is best for you and why?

• 1024 cores is the most efficient configuration: it has the lowest cost (64 node hours) and also gives a good job walltime (1.0 hours)

• The choice should be used in the calculation of how much total allocation you require
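The cost column of the NWChem example is simply nodes × walltime, so the most efficient configuration can be picked programmatically. This is a minimal sketch using the figures from the table; the variable names are illustrative only.

```python
# (cores, nodes, walltime in hours) from the NWChem scaling table.
scaling_runs = [
    (128, 8, 11.2),
    (256, 16, 4.4),
    (512, 32, 2.1),
    (1024, 64, 1.0),
    (2048, 128, 0.91),
    (4096, 256, 0.75),
]

# Cost in node hours for each configuration, keyed by core count.
costs = {cores: nodes * walltime for cores, nodes, walltime in scaling_runs}

# The cheapest configuration is the one that wastes the fewest node hours.
best_cores = min(costs, key=costs.get)

for cores, cost in costs.items():
    marker = "  <- lowest cost" if cores == best_cores else ""
    print(f"{cores:5d} cores: {cost:7.2f} node hours{marker}")
```

Beyond 1024 cores the walltime keeps falling but the cost rises sharply, which indicates that the job no longer scales efficiently.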


Magnus vs Setonix Benchmarks Comparison

• Setonix will not be available for benchmarking of codes and workflows for 2022 allocation requests.

• The SU cost per simulation may differ between Magnus and Setonix.

• For codes benchmarked by Pawsey and HPE Cray we have noted:

    • An average 20% increase in SU cost

    • Shorter time to solution per simulation (up to 3.7x)

We recommend:

• 1st SU request: calculate based on Magnus SU cost per simulation

• 2nd SU request:

    • Ask for additional capacity

    • Add 20% of your first SU request

IMPORTANT: Setonix allows for overuse with reduced priority. Overuse is capped at 50% of the original allocation.
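The recommendations and the overuse cap can be turned into a rough sizing calculation. The sketch below is one possible reading, not an official formula: it assumes the 2nd request is the additional capacity you want plus 20% of the 1st request, and that the overuse cap means total usage of up to 150% of the allocation. All function names and the input figures are hypothetical.

```python
UPLIFT_PERCENT = 20       # average SU cost increase noted on the slide
OVERUSE_CAP_PERCENT = 50  # overuse cap stated on the slide


def first_request_su(simulations: int, magnus_su_per_simulation: int) -> int:
    """1st request: based on the Magnus SU cost per simulation."""
    return simulations * magnus_su_per_simulation


def second_request_su(first_request: int, additional_capacity_su: int) -> int:
    """2nd request (assumed reading): additional capacity plus 20% of the 1st."""
    return additional_capacity_su + first_request * UPLIFT_PERCENT // 100


def max_usage_with_overuse(allocation_su: int) -> int:
    """Upper bound on usage if overuse is capped at 50% of the allocation."""
    return allocation_su + allocation_su * OVERUSE_CAP_PERCENT // 100


# Hypothetical project: 200 simulations at 5,000 SU each on Magnus.
first = first_request_su(simulations=200, magnus_su_per_simulation=5_000)
second = second_request_su(first, additional_capacity_su=2_000_000)
print(first, second, max_usage_with_overuse(first))
```

Integer percentages are used instead of floating point factors so that the SU totals stay exact.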


NCMAS - Application Portal

• Apply for Setonix access online through the NCMAS scheme: http://ncmas.nci.org.au/


Pawsey Partner Scheme - Application Portal

• Apply for Setonix access online via the Pawsey Partner scheme: https://ssl.linklings.net/organizations/pawsey/


Help & Further Assistance

Changes in Supercomputing Services for 2022: https://support.pawsey.org.au/documentation/display/US/Changes+in+Supercomputing+Services+for+2022

Pawsey webpage: https://pawsey.org.au

Pawsey Friends mailing list: https://pawsey.org.au/pawsey-friends/

Pawsey Twitter feed: @PawseyCentre

Pawsey YouTube Channel: https://www.youtube.com/pawseysupercomputingcentre

User Support Portal: https://pawsey.org.au/support/


Questions?
