
Designing and managing scalable HPC Infrastructure to support Public Health in the Genomic Era

Francesco Giannoccaro

A walk-through of the design and implementation phases of a scalable HPC/HTC infrastructure built to support the increasing demand for advanced computing platforms, driven by the speed at which public health science is evolving and the rate at which medical microbiology is modernising.

Summary


● Public Health England (PHE) is an executive agency of the Department of Health in the UK. Its main mission is "to protect and improve the nation's health and well-being, and reduce health inequalities"

● PHE is structured into directorates and corporate programmes, and has a network of specialist microbiology laboratories across England

● PHE services include microbiology services, the regional microbiology network, field epidemiology, and surveillance and control

● Demand for computational power has increased primarily due to the implementation of whole genome sequencing (WGS) as part of PHE's modernisation

● The IT infrastructure is geographically distributed, mainly across two sites: Colindale (north London) and Porton Down (south-west England)

Background


High performance and high throughput computing (HPC/HTC) have been used in PHE mainly by three departments:

● Emergency Response uses HPC to better understand, ahead of time, the (eco-)epidemiological, social and behavioural drivers that exacerbate the risks posed by infectious disease threats, including bioterrorism;

● Statistics, Modelling and Economics uses HPC to provide real-time models that predict expected pandemic disease dynamics, and to produce data contributing to the body of knowledge and scientific publications informing national policy, including national vaccination policy and the control of antimicrobial resistance;

● Infectious Disease Informatics (Bioinformatics) uses HPC to support whole genome sequencing (WGS) analysis for diagnostics and surveillance of infectious diseases, with hundreds of biological samples received per week from patients with unidentified and potentially aggressive pathogens (bacteria and viruses) that need urgent identification.

Overview of pre-existing HPC systems


Pre-existing HPC infrastructure

HPC system used by the Modelling and Economics department (Colindale):
● Linux cluster based on RHEL
● Resource manager: Grid Engine
● Provisioning system: xCAT
● 2 x IBM x3650 management servers: 2 x Intel X5450, 32 GB of RAM, 2 x 10 Gb Ethernet, 6 x 72 GB SAS
● 16 x HP BL460c Gen8 blade compute nodes: 2 x Intel E5-2680, 128 GB of RAM, 2 x 10 Gb Ethernet, 2 x 900 GB 6G SAS 10K
● 10 x IBM Flex System x240 compute nodes: 2 x Intel E5-2650v2, 128 GB of RAM, 1 x CN4022 2-port 10 Gb, 2 x 900 GB 10K SAS HDD
● Lustre filesystem

HPC system used by the Bioinformatics unit (Colindale):
● Linux cluster based on Bull/RHEL
● Resource manager: Slurm
● Provisioning system: Bull
● 2 x Bull R423-E3 management servers: 2 x Intel E5-2620, 32 GB of RAM, 2 x 500 GB SATA3, 2 x InfiniBand ConnectX-2 QDR, 2 x Gb Ethernet
● 72 x Bull B500 compute nodes: 2 x Intel X5660, 48 GB of RAM, 1 x 128 GB SATA2 SSD, InfiniBand adapter
● Lustre filesystem

HPC system used by the Emergency Response department (Porton Down):
● Linux cluster based on Bull/RHEL
● Resource manager: Slurm
● Provisioning system: Bull
● 28 x Bull B510 compute nodes: 2 x Intel E5-2620, 32 GB of RAM, 1 x 256 GB SSD, 2 x 1 Gb Ethernet, 1 x InfiniBand QDR
● 8 x Bull B500 compute nodes: 2 x Intel L5530, 24 GB of RAM, 1 x 256 GB SSD, 2 x 1 Gb Ethernet, 1 x InfiniBand QDR
● 1 x Bull GPU server: 2 x Intel E5620, 20 GB of RAM, 2 x Nvidia K20c GPU cards, 1 x InfiniBand QDR, 2 x 1 Gb Ethernet, 1 TB SAS disk
● Lustre filesystem


Lustre high performance storage (HPS) tier

HPS system: DDN EXAScaler SFA 10K
● Lustre filesystem (v. 2.5.41)
● 2 x DDN SFA 10K controllers
● 3 x DDN SS7000 enclosures (300 TB usable)
● 4 x Lustre Object Storage Servers (OSS)
● 2 x Metadata Servers (MDS), 1 x MDT on DDN EF3015
● Host interfaces: 4 x 10 GbE SFP+ (per controller)
● 2.5 GB/s read and write performance

HPS system: DDN ES7K Lustre
● Lustre version 2.5.42.8 (EXAScaler 2.3.1)
● 40 x 4 TB NL-SAS disks (145 TB raw, 125 TiB usable capacity for data)
● 6 x 300 GB SAS (metadata)
● 2 virtual OSS & 2 virtual MDS
● 3 GB/s read and write performance
● Host interfaces: 4 x InfiniBand FDR or 40 GbE QSFP
● Dual LNETs configured:
  ● LNET1 on InfiniBand FDR, with IPoIB configured to use datagram mode
  ● LNET2 on Ethernet/QSFP 40 Gb/s, with jumbo frames enabled
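As a quick sanity check of a dual-LNET setup like this, the NIDs configured on a Lustre client or server can be listed with the lctl utility that ships with Lustre; a minimal sketch in Python (the network names mentioned in the comment are illustrative assumptions, not values from the deck):

    import subprocess

    # List the LNET NIDs configured on this node; with two LNETs we would expect
    # one entry per network, e.g. <addr>@o2ib (InfiniBand) and <addr>@tcp (40 GbE).
    result = subprocess.run(["lctl", "list_nids"],
                            capture_output=True, text=True, check=True)
    for nid in result.stdout.split():
        print(nid)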


iRODS archive storage tier

Architecture (diagram): three sites connected over the PHE WAN, each running iRODS in front of a DDN WOS object storage system:
● PHE/Colindale: sequencing machines, HPC cluster, DDN EXAScaler / Lustre high performance storage, iRODS servers (with SSL SAN certificate), DDN WOS object storage system
● PHE/Porton Down: sequencing machines, computing and storage system for simple analysis, iRODS server, DDN WOS object storage system
● PHE/Birmingham: iRODS server, DDN WOS object storage system
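Sequencing output is typically registered into an archive tier like this with descriptive metadata attached. Below is a minimal sketch using the python-irodsclient library; the host, zone, paths, credentials and metadata keys are assumptions for illustration, not values from the deck:

    from irods.session import iRODSSession

    # Hypothetical connection details for a PHE iRODS zone.
    with iRODSSession(host="irods.colindale.example", port=1247,
                      user="svc_archive", password="********",
                      zone="pheZone") as session:
        logical_path = "/pheZone/archive/run_20170101/sample_001.fastq.gz"

        # Upload the sequencing output into the archive tier...
        session.data_objects.put("sample_001.fastq.gz", logical_path)

        # ...and attach searchable metadata (the kind of attributes a WebUI
        # such as Metalnx can browse and manage).
        obj = session.data_objects.get(logical_path)
        obj.metadata.add("sequencing_run", "run_20170101")
        obj.metadata.add("organism", "unknown")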


Metalnx – iRODS Administrative and Metadata Management WebUI

PHE Cloud Platform goals & objectives


Applying a holistic approach to increase both the capacity and the capability of the IT infrastructure, implementing cloud technologies able to:

● provide HPC-on-demand services, offering the ability to expand any of the existing clusters by deploying additional compute resources when needed, and to deploy virtual HPC clusters (e.g. ElastiCluster, Senlin);

● improve orchestration and automation of the existing HPC environments and of the new software-defined infrastructure by implementing an end-to-end API solution to support centralised provisioning, configuration and management operations (see the sketch after this list);

● provide IaaS capability to host big data analytics platforms, to leverage the value of a number of existing PHE datasets (e.g. Cassandra, Hadoop, Apache Spark);

● increase resilience and disaster recovery capability by implementing geographically distributed cloud storage tiers and an archive storage tier across multiple sites/regions;

● reduce and limit vendor lock-in constraints by using open-source, enterprise-class technologies;

● allow PHE to scale up its computational capacity, if required, above and beyond the on-premises computational resources by leveraging commercial clouds.
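The end-to-end API goal can be illustrated with the OpenStack SDK: a short, hedged sketch of programmatically adding a compute resource to a region. The cloud, image, flavor, network and key names are assumptions, not values from the deck:

    import openstack

    # Connect using a "phe-colindale" entry assumed to exist in clouds.yaml.
    conn = openstack.connect(cloud="phe-colindale")

    image = conn.compute.find_image("centos7-hpc")      # assumed image name
    flavor = conn.compute.find_flavor("m1.xlarge")      # assumed flavor name
    network = conn.network.find_network("tenant-net")   # assumed tenant network

    # Boot an extra worker that an existing cluster could enrol as a compute node.
    server = conn.compute.create_server(
        name="hpc-worker-001",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
        key_name="hpc-admin",
    )
    server = conn.compute.wait_for_server(server)
    print(server.name, server.status)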

Infrastructure capacity       Cores   RAM       HPS      Archive Storage
Initial capacity              1.5k    6.4 TB    390 TB   500 TB
Post-deployment capacity      2.9k    16.4 TB   500 TB   500 TB

Overview of the new architecture


OpenStack deployment at each cloud region


OpenStack undercloud provisioning server - Lenovo x3550 M5:
● 1 x Intel E5-2620 v3, 64 GB of RAM
● 2 x 1 TB 10K 6 Gbps SAS HDD
● 1 x ConnectX-3 Pro 2x40GbE/FDR VPI adapter

3 x Controller nodes - Lenovo x3550 M5:
● 2 x Intel E5-2620 v3, 128 GB of RAM
● 2 x 240 GB SATA SSD, 2 x 480 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2x40GbE/FDR VPI adapter

31 x Compute nodes (*) - Lenovo nx360 M5:
● 2 x Intel E5-2640 v3 (16 cores), 128 GB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2x40GbE/FDR VPI adapter

1 x Compute node for large cloud instances - Lenovo x3950 X6 (8U):
● 8 x Intel E7-8860 v3 (128 cores), 1 TB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2x40GbE/FDR VPI adapter

1 x GPU node - Lenovo nx360 M5:
● 2 x Intel E5-2640 v3, 128 GB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2x40GbE/FDR VPI adapter
● 1 x Nvidia Tesla K80

The infrastructure is designed to be easily expanded.

(*) Including additional nodes that will be installed by end of March 2017.
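One way to check that the deployed nodes add up to the capacity figures listed earlier is to query the Nova hypervisor inventory through the OpenStack SDK; a minimal sketch (the cloud name is an assumption):

    import openstack

    conn = openstack.connect(cloud="phe-colindale")  # assumed clouds.yaml entry

    # Sum vCPUs and RAM reported by every hypervisor in the region.
    total_vcpus = 0
    total_ram_mb = 0
    for hv in conn.compute.hypervisors(details=True):
        total_vcpus += hv.vcpus
        total_ram_mb += hv.memory_size  # reported in MB
    print(f"{total_vcpus} vCPUs, {total_ram_mb / 1024:.0f} GB RAM across all hypervisors")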

Network topology at each cloud region


2 x Mellanox SX1710 spine switches

Mellanox switches and network cards installed on all OpenStack servers are configured with support for SR-IOV

3 x Mellanox SX1710 leaf switches

The Lustre HPS network is presented as an external network to Neutron, with a range of floating IP addresses that can be used by tenants.

Note that from the Liberty release Neutron supports RBAC, solving the inability to share certain network resources with only a subset of projects/tenants. Neutron has supported shared resources in the past, but it was all-or-nothing: if a network is marked as shared, it is shared with all tenants.

Access can now be tuned through RBAC on the basis of these features:
● regular port creation permissions on networks (since Liberty)
● binding QoS policies permissions to networks or ports (since Mitaka)
● attaching router gateways to networks (since Mitaka)
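For example, sharing the Lustre HPS external network with a single project rather than with all tenants might look like the following OpenStack SDK sketch; the cloud, network and project names are assumptions:

    import openstack

    conn = openstack.connect(cloud="phe-colindale")  # assumed clouds.yaml entry

    # Hypothetical names for the HPS external network and a target project.
    hps_net = conn.network.find_network("lustre-hps-external")
    project = conn.identity.find_project("bioinformatics")

    # Grant only this project access to the network instead of marking it shared.
    conn.network.create_rbac_policy(
        object_type="network",
        object_id=hps_net.id,
        action="access_as_shared",
        target_project_id=project.id,
    )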

CEPH cloud storage deployment at each region


3 x CEPH servers - Lenovo x3550 M5:
● 2 x Intel E5-2630 v3
● 128 GB of RAM
● 9 x 6 TB SATA SSD
● 2 x 480 GB SATA SSD
● 1 x ConnectX-3 Pro 2x40GbE/FDR VPI adapter

Networks:
● 40/56 Gb Ethernet network for CEPH data
● 1 Gb Ethernet network for management
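As a brief illustration of how an application can read and write objects on a Ceph cluster like this, here is a minimal sketch using the python-rados bindings that ship with Ceph; the pool name, object name and config path are assumptions:

    import rados  # python-rados bindings, packaged with Ceph

    # Connect using a hypothetical client config and keyring on this host.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("phe-archive")  # assumed pool name
        try:
            ioctx.write_full("example-object", b"hello from the cloud storage tier")
            print(ioctx.read("example-object"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()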

Insights of the adopted solution


● Using OpenStack Mitaka (RHOSP 9), which provides several improvements, including better Heat support for resource chains, support for clean-up actions (filesystem sync), thread-aware CPU pinning, Ceilometer integration improvements, and autoscaling of compute based on Heat/Ceilometer.

● Deployment through Director (based on TripleO and Ironic): two cloud regions deployed using CEPH cloud storage tiers with smart replication and synchronisation. Supported upgrade path.

● Fernet tokens for the Keystone authentication and authorisation system (backed by FreeIPA) across multiple regions, allowing users to log in at either site using the same credentials (see the sketch after this list).

● Spine/leaf network architecture with cloud nodes connected at 56 Gb/s. SR-IOV (Ethernet/IB) and Mellanox configuration for low-latency MPI workloads.

● OpenStack sub-projects that will be used in addition to the core ones:
  ● Sahara: provides a simple means to provision a data-intensive application cluster (Hadoop or Spark)
  ● Magnum: provisions container orchestration engines, used to deploy Kubernetes clusters, pods, and container applications
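The shared-credentials point can be illustrated with keystoneauth and the OpenStack SDK: one authenticated session is reused against both regions, and only the region name changes. The endpoint, credentials, domain and region names below are assumptions, not values from the deck:

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from openstack import connection

    # Hypothetical Keystone endpoint and FreeIPA-backed credentials.
    auth = v3.Password(
        auth_url="https://keystone.phe.example:5000/v3",
        username="jdoe",
        password="********",
        project_name="bioinformatics",
        user_domain_name="phe.example",
        project_domain_name="phe.example",
    )
    sess = session.Session(auth=auth)

    # The same session works against either region; only region_name differs.
    for region in ("Colindale", "PortonDown"):
        conn = connection.Connection(session=sess, region_name=region)
        print(region, [server.name for server in conn.compute.servers()])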

Acknowledgments

Team members and key contributions: Francesco Giannoccaro, Tim Cairnes, Thomas Stewart, Anna Rance

Technical partners and key contributions: Andrew Dean, Christopher Brown, Richard Mansfield


Thanks and keep in touch: [email protected] | www.linkedin.com/in/giannoccaro