Clusters in the Cloud
Dr. Paul Coddington, Deputy Director
Dr. Shunde Zhang, Computing Specialist
eResearch SA
October 2014
Use Cases
• Make the cloud easier to use for compute jobs
– Particularly for users familiar with HPC clusters
• Personal, on-demand cluster in the cloud
• Cluster in the cloud
– A private cluster only available to a research group
– A shared Node-managed cluster
• Preferably dynamic/elastic
– "Cloudbursting" for HPC
• Dynamically (and transparently) add extra compute nodes from cloud to an existing HPC cluster
Cluster infrastructure
[Diagram: the layers that make up a cluster]
• Hardware, Network
• Software layer:
– Local Resource Management System / Queueing system
– Monitoring
– Shared File System
– Configuration Management System
– Application Distribution
Traditional Static Cluster
• Hardware/Network
– Dedicated hardware
– Long process to get new hardware
– Static, not elastic
• Software
– Assumes a fairly static environment (IPs, etc.)
– Not cloud-friendly
– Some systems need a restart if the cluster is changed
– Not adaptable to changes
Cluster in the cloud
• Hardware/Network
– Provisioned by the cloud (OpenStack)
– Get new resources in minutes
– Remove resources in minutes
– Elastic/scalable on demand
• Software
– Dynamic
– Can easily add/remove nodes as needed
Possible solutions
• Condor for high-throughput computing
– Cloud Scheduler working for a CERN LCG node
– Recent versions of Condor support cloud execution
• Torque/PBS static cluster in the cloud
– Works, but painful to set up and maintain
• Dynamic Torque/PBS cluster
– No existing dynamic/elastic solution
• StarCluster for personal clusters
– Automates setup of VMs in the cloud, including the cluster
– Can add/subtract worker nodes manually
– Only supports Amazon, SGE and Condor, not PBS/Torque
Our work
• Condor for high-throughput computing
– Cloud Scheduler for the Australian CERN LCG node
• Torque/PBS static cluster in the cloud
– Set up a large cluster in the cloud for CoEPP
– Scripts to automate setup and monitoring
• Dynamic Torque/PBS cluster
– Created the Dynamic Torque system for OpenStack
• StarCluster for personal clusters
– Ported to OpenStack and added a Torque plugin
– Add-ons to make it easier to use for eRSA users
Application software
• Want familiar HPC applications to be available to cloud VMs
– And we don't want to install and maintain software twice, on the HPC cluster and in the cloud
• But there is a limit on the size of VM images in the cloud
• Want to avoid making lots of custom images
• We use CVMFS (example client configuration below)
– Read-only distributed file system, HTTP-based
– Used by the CERN LHC Grid for distributing software
– One VM image, with the CVMFS client
– Downloads and caches software from the HPC cluster
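For illustration, the client side needs little more than a repository name, an HTTP proxy and a cache limit in its local CVMFS configuration; the repository and proxy names below are placeholders, not eRSA's real ones:

# /etc/cvmfs/default.local  (illustrative values only)
CVMFS_REPOSITORIES=apps.ersa.example.org
CVMFS_HTTP_PROXY="http://cvmfs-proxy.example.org:3128"
CVMFS_QUOTA_LIMIT=10000   # local cache size in MB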
HTC Cluster in the Cloud
• NeCTAR eResearch Tools project for high‐throughput computing in the cloud
• ARC Centre of Excellence in Experimental Particle Physics (CoEPP)
• Needed a large cluster for CERN ATLAS data analysis and simulation
• Tier 2 (global) and Tier 3 (local) jobs
• Augment existing small physical clusters at multiple sites
– Running Torque
CERN ATLAS experiment
Static Cluster in the Cloud
• Built a large Torque cluster using cloud VMs
• A challenging exercise!
• Reliability issues; needed a lot of scripts to automate setup, monitoring, recovery, etc.
• Some types of usage are bursty but cluster resources were static
• Didn't take advantage of the elasticity of the cloud
Dynamic Torque
• Static/dynamic worker nodes
– Static: stays up all the time
– Dynamic: comes up and goes down according to workload
• Independent of Torque/MAUI
– Runs as a separate process
• Only adds/removes worker nodes (see the sketch below)
• Queries Torque and the MAUI scheduler periodically
• Still up to the MAUI scheduler to decide where to run a job
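The core idea can be sketched as a small polling loop. The sketch below is purely illustrative; the helper functions and thresholds are invented stand-ins for the real Torque/MAUI queries and OpenStack API calls that Dynamic Torque makes:

# Illustrative sketch of the Dynamic Torque idea, not the real code.
import time

POLL_INTERVAL = 60   # seconds between checks
MIN_NODES = 2        # static nodes that are never removed

def queued_jobs():
    """Placeholder: in reality, parse qstat/showq output or query MAUI."""
    return []

def worker_nodes():
    """Placeholder: in reality, parse `pbsnodes -a`; one dict per node."""
    return []

def boot_worker():
    """Placeholder: launch a VM via the OpenStack API and register it with Torque."""
    pass

def remove_worker(node):
    """Placeholder: drain the node, remove it from Torque, delete the VM."""
    pass

def reconcile():
    jobs = queued_jobs()
    nodes = worker_nodes()
    idle = [n for n in nodes if n.get("state") == "free"]
    if jobs and not idle:
        boot_worker()                  # jobs waiting, no free nodes: grow
    elif not jobs and idle and len(nodes) > MIN_NODES:
        remove_worker(idle[0])         # nothing queued, spare capacity: shrink

if __name__ == "__main__":
    while True:                        # runs as its own process, separate from Torque/MAUI
        reconcile()
        time.sleep(POLL_INTERVAL)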
Dynamic Torque
[Architecture diagram]
Dynamic Torque for CoEPP
[Diagram: Torque/MAUI and Dynamic Torque on the head node, supported by LDAP, NFS, Puppet, Ganglia, Nagios and CVMFS; worker nodes in SA, Melbourne and Monash; interactive nodes in Melbourne]
Dynamic Torque for CoEPP
[Diagram]
CoEPP Outcomes
• Three large clusters in use for over a year
– Hundreds of cores in each
• Condor and Cloud Scheduler for ATLAS Tier 2
• Dynamic Torque for ATLAS Tier 3 and Belle
• LHC ATLAS experiment at CERN
– 530,000 Tier 2 jobs
– 325,000 CPU hours for Tier 3 jobs
• Belle experiment in Japan
– 150,000 jobs
Private Clusters
• Good for building a shared cluster for a large research group with good IT support who can set up and manage a Torque cluster
• What about the many individual researchers or small groups who also want a private cluster using their cloud allocation?
• But have no dedicated IT staff and very basic Unix skills?
• Is there a simple DIY solution?
StarCluster
• Generic setup
– Create a security group for the cluster
– Launch VMs (master, node01, node02 …)
– Set up a public key for password-less SSH
– Install NFS on the master and share scratch space to all nodeXX
– Can use EBS (Cinder) volumes as scratch space
• Queueing system setup (plugins)
– Condor, SGE, Hadoop … and your own plugin! (skeleton sketched below)
StarCluster for OpenStack
[Diagram: StarCluster drives OpenStack through its EC2 API to build a head node (NFS, Torque server, MAUI), worker nodes (Torque MOM), a CVMFS proxy and an attached volume; application software is pulled from the eRSA app repository (CVMFS server)]
StarCluster ‐ configuration
• Availability zone
• Image
• (optional) Image for master
• Flavor
• (optional) Flavor for master
• Number of nodes
• Volume
• Username
• User ID
• Group ID
• User shell
• Plugins
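In practice these options live in a cluster template in the StarCluster config file. The sketch below uses the stock StarCluster key names with made-up values; the OpenStack port and the eRSA add-ons may add or rename options (for example for the user and group IDs):

[plugin torque]
# points at a plugin class such as the skeleton sketched earlier
setup_class = torque_plugin.TorqueSetup

[cluster mycluster]
KEYNAME = mykey
# number of nodes
CLUSTER_SIZE = 4
# username and shell on the cluster
CLUSTER_USER = alice
CLUSTER_SHELL = bash
# image and flavor for the nodes (optional MASTER_* variants exist for the master)
NODE_IMAGE_ID = ami-00000042
NODE_INSTANCE_TYPE = m1.large
AVAILABILITY_ZONE = sa
# volume to attach as shared scratch (defined in its own [volume] section)
VOLUMES = scratch
PLUGINS = torque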
Start a cluster with StarCluster
# fire up a new cluster (from your desktop)
$ starcluster start mycluster
# log in to the head node (master) to submit jobs
$ starcluster sshmaster mycluster
# copy files to/from the cluster
$ starcluster put mycluster /path/to/local/file/or/dir /remote/path/
$ starcluster get mycluster /path/to/remote/file/or/dir /local/path/
# add two compute nodes to the cluster
$ starcluster addnode -n 2 mycluster
# terminate the cluster after use
$ starcluster terminate mycluster
Other options for Personal Cluster
• Elasticluster
– Python code to provision VMs
– Ansible to configure them
– Ansible playbooks for Torque/SGE/…, NFS/PVFS/…
• Heat
– Everything in a HOT template
– Earlier versions had limitations that made it hard to implement everything
– May revisit in future
Private Cluster in the Cloud
• Can use your personal or project cloud allocation to start up your own personal cluster in the cloud
– No need to share! Except among your group.
• Can use the standard PBS/Torque queueing system to submit jobs (or not)
– Only your jobs in the queue
• But you have to set up and manage the cluster
– Straightforward if you have good Unix skills (unless things go wrong…)
• Several groups are now using this
– But eRSA does the support when things go wrong…
Emu Cluster in the Cloud
• Emu is an eRSA cluster that runs in the cloud
• Aimed to be like an old cluster (Corvus)
– 8-core compute nodes
• But a bit different
– Dynamically created VMs in the cloud
– Can have private compute nodes
– Different size compute nodes if you want
Emu
• eRSA-managed dynamic cluster in the cloud
• Shared by multiple cloud tenants and eRSA users
• All nodes in the SA zone
• eRSA cloud allocation contributes 128 cores
• Users can bring in their own cloud allocation to launch their worker nodes in Emu
• Users don’t need to build and look after their own personal cluster
• It can also mount users’ Cinder volume storage to their own worker nodes via NFS
• Set up so researchers use their eRSA accounts
Using your own cloud allocation
• Users add our sysadmins to their tenant
– So we can launch VMs on their behalf
– Will look at Trusts in Icehouse
• Add some configs to Dynamic Torque
– Number of static/dynamic nodes, size of nodes, etc.
• Add a group of user accounts allowed to use it
• Create a reservation for users' worker nodes in MAUI
• A special 'account string' needs to be put in the job
– To match users' jobs to their group's reserved nodes
• A qsub filter checks whether the 'account string' is valid (sketched below)
– You can't submit a job using another group's allocation
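A Torque submit filter is just an executable (configured via torque.cfg) that receives the job script on stdin, echoes it to stdout, and rejects the job by exiting non-zero. The sketch below is purely illustrative; the set of valid accounts and the "account string equals Unix group" rule are invented, and Emu's real filter differs:

#!/usr/bin/env python
# Illustrative qsub submit filter: reject jobs whose "#PBS -A <account>" line
# names an account string the submitting user is not entitled to use.
import grp
import os
import pwd
import re
import sys

VALID_ACCOUNTS = {"tenant1", "tenant2", "ersa"}   # groups with reserved nodes (made up)

def user_groups():
    user = pwd.getpwuid(os.getuid())
    groups = {g.gr_name for g in grp.getgrall() if user.pw_name in g.gr_mem}
    groups.add(grp.getgrgid(user.pw_gid).gr_name)  # include the primary group
    return groups

account = None
for line in sys.stdin:
    match = re.match(r"#PBS\s+-A\s+(\S+)", line)
    if match:
        account = match.group(1)
    sys.stdout.write(line)            # pass the job script through unchanged

if account not in VALID_ACCOUNTS or account not in user_groups():
    sys.stderr.write("Invalid or unauthorised account string: %s\n" % account)
    sys.exit(1)                       # non-zero exit makes qsub reject the job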
Emu
[Diagram: Torque/MAUI and Dynamic Torque on the head node, with LDAP, NFS, Salt, Sensu and CVMFS; shared worker nodes (eRSA donated) alongside worker nodes of Tenant1 and Tenant2, with tenant volumes mounted via NFS]
Static cluster vs dynamic cluster
                  Static Cluster      Dynamic Cluster
Hardware          Physical Machines   Virtual Machines
LRMS              Torque              Torque with Dynamic Torque
CMS               Puppet              Salt Stack
Monitoring        Nagios, Ganglia     Sensu, Graphite, Logstash
App Distribution  NFS Mount           CVMFS
Shared FS         NFS                 NFS
Future Work
• Better reporting and usage graphs
• More monitoring checks
• Queueing system
– Multi-node jobs don't work because a new node is not trusted by existing nodes
– Trust list is only updated when the PBS server is started
– Could hack the Torque source code
– Or maybe use SLURM or SGE
• Better way to share user credentials
– Trusts in Icehouse?
• National and/or other regional services
Future Work
• Spot instance queue
• Distributed file system
– NFS is the static component
• Cannot add storage to NFS without stopping it
• Cannot add new nodes to the allow list dynamically
• Need to use iptables and update iptables instead
– Investigate a dynamic and distributed FS
• One FS for all tenants
• Alternatives to StarCluster
– Heat or Elasticluster
Resources
• Cloud Scheduler
– http://cloudscheduler.org/
• StarCluster
– http://star.mit.edu/cluster
– OpenStack version:
• https://github.com/shundezhang/StarCluster/
• Dynamic Torque
– https://github.com/shundezhang/dynamictorque
Nagios vs Sensu
• Nagios
– First designed last century for a static environment
– Needs to update local configuration and restart the service if a remote server is added or removed
– The server performs all checks, which is not scalable
• Sensu
– Modern design with AMQP as the communication layer
– Local agent runs checks
– Weak coupling between clients and server, which makes it scalable
Imagefactory
• Template in XML
– Packages
– Commands
– Files
• Can back up an 'image' in GitHub
• Automatic, no user interaction required
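As a rough illustration, an imagefactory TDL template groups exactly those pieces; every name, URL, package and command below is a placeholder, not one of the actual eRSA templates:

<template>
  <name>emu-worker</name>
  <os>
    <name>CentOS-6</name>
    <version>6</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://mirror.example.org/centos/6/os/x86_64/</url>
    </install>
  </os>
  <description>Illustrative worker-node image template</description>
  <packages>
    <package name='torque-mom'/>
    <package name='cvmfs'/>
  </packages>
  <files>
    <file name='/etc/motd'>Emu worker node (example file)</file>
  </files>
  <commands>
    <command name='enable-mom'>chkconfig pbs_mom on</command>
  </commands>
</template>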
EMU Monitoring
• Sensu
– Runs health checks
• Logstash
– Collects PBS server/MOM/accounting logs
• Collectd
– Collects metrics on CPU, memory, disk, network, etc.
Salt Stack vs Puppet
                  Salt Stack          Puppet
Architecture      Server-Client       Server-Client
Working Model     Push                Pull
Communication     ZeroMQ + msgpack    HTTP + text
Language          Python              Ruby
Remote execution  Yes                 No