Clusters in the Cloud
Dr. Paul Coddington, Deputy Director
Dr. Shunde Zhang, Computing Specialist
eResearch SA
October 2014
Use Cases
• Make the cloud easier to use for compute jobs
– Particularly for users familiar with HPC clusters
• Personal, on-demand cluster in the cloud
• Cluster in the cloud
– A private cluster only available to a research group
– A shared Node-managed cluster
• Preferably dynamic/elastic
– "Cloudbursting" for HPC
• Dynamically (and transparently) add extra compute nodes from cloud to an existing HPC cluster
Cluster infrastructure
[Diagram: the layers that make up a cluster]
• Hardware, Network
• Software layer:
– Local Resource Management System / Queueing system
– Monitoring
– Shared File System
– Configuration Management System
– Application Distribution
Traditional Static Cluster
• Hardware/Network
– Dedicated hardware
– Long process to get new hardware
– Static, not elastic
• Software
– Assumes a fairly static environment (IPs, etc.)
– Not cloud-friendly
– Some systems need a restart if the cluster is changed
– Not adaptable to changes
Cluster in the cloud
• Hardware/Network
– Provisioned by the cloud (OpenStack)
– Get new resources in minutes
– Remove resources in minutes
– Elastic/scalable on demand
• Software
– Dynamic
– Can easily add/remove nodes as needed
Possible solutions
• Condor for high-throughput computing
– Cloud Scheduler working for a CERN LCG node
– Recent versions of Condor support cloud execution
• Torque/PBS static cluster in the cloud
– Works, but painful to set up and maintain
• Dynamic Torque/PBS cluster
– No existing dynamic/elastic solution
• StarCluster for personal clusters
– Automates setup of VMs in the cloud, including the cluster
– Can add/subtract worker nodes manually
– Only supports Amazon, SGE and Condor, not PBS/Torque
Our work
• Condor for high-throughput computing
– Cloud Scheduler for the Australian CERN LCG node
• Torque/PBS static cluster in the cloud
– Set up a large cluster in the cloud for CoEPP
– Scripts to automate setup and monitoring
• Dynamic Torque/PBS cluster
– Created the Dynamic Torque system for OpenStack
• StarCluster for personal clusters
– Ported to OpenStack and added a Torque plugin
– Add-ons to make it easier to use for eRSA users
Application software
• Want familiar HPC applications to be available to cloud VMs
– And we don't want to install and maintain software twice, on the HPC cluster and in the cloud
• But there is a limit on the size of VM images in the cloud
• Want to avoid making lots of custom images
• We use CVMFS (example client configuration below)
– Read-only distributed file system, HTTP-based
– Used by the CERN LHC Grid for distributing software
– One VM image, with the CVMFS client
– Downloads and caches software from the HPC cluster
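For illustration, the client side needs little more than a repository name, an HTTP proxy and a cache limit in its local CVMFS configuration; the repository and proxy names below are placeholders, not eRSA's real ones:

# /etc/cvmfs/default.local  (illustrative values only)
CVMFS_REPOSITORIES=apps.ersa.example.org
CVMFS_HTTP_PROXY="http://cvmfs-proxy.example.org:3128"
CVMFS_QUOTA_LIMIT=10000   # local cache size in MB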
HTC Cluster in the Cloud
• NeCTAR eResearch Tools project for high‐throughput computing in the cloud
• ARC Centre of Excellence in Experimental Particle Physics (CoEPP)
• Needed a large cluster for CERN ATLAS data analysis and simulation
• Tier 2 (global) and Tier 3 (local) jobs
• Augment existing small physical clusters at multiple sites
– Running Torque
CERN ATLAS experiment
Static Cluster in the Cloud
• Built a large Torque cluster using cloud VMs
• A challenging exercise!
• Reliability issues; needed a lot of scripts to automate setup, monitoring, recovery, etc.
• Some types of usage are bursty but cluster resources were static
• Didn't take advantage of the elasticity of the cloud
Dynamic Torque
• Static/dynamic worker nodes
– Static: stays up all the time
– Dynamic: comes up and goes down according to workload
• Independent of Torque/MAUI
– Runs as a separate process
• Only adds/removes worker nodes (see the sketch below)
• Queries Torque and the MAUI scheduler periodically
• Still up to the MAUI scheduler to decide where to run a job
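The core idea can be sketched as a small polling loop. The sketch below is purely illustrative; the helper functions and thresholds are invented stand-ins for the real Torque/MAUI queries and OpenStack API calls that Dynamic Torque makes:

# Illustrative sketch of the Dynamic Torque idea, not the real code.
import time

POLL_INTERVAL = 60   # seconds between checks
MIN_NODES = 2        # static nodes that are never removed

def queued_jobs():
    """Placeholder: in reality, parse qstat/showq output or query MAUI."""
    return []

def worker_nodes():
    """Placeholder: in reality, parse `pbsnodes -a`; one dict per node."""
    return []

def boot_worker():
    """Placeholder: launch a VM via the OpenStack API and register it with Torque."""
    pass

def remove_worker(node):
    """Placeholder: drain the node, remove it from Torque, delete the VM."""
    pass

def reconcile():
    jobs = queued_jobs()
    nodes = worker_nodes()
    idle = [n for n in nodes if n.get("state") == "free"]
    if jobs and not idle:
        boot_worker()                  # jobs waiting, no free nodes: grow
    elif not jobs and idle and len(nodes) > MIN_NODES:
        remove_worker(idle[0])         # nothing queued, spare capacity: shrink

if __name__ == "__main__":
    while True:                        # runs as its own process, separate from Torque/MAUI
        reconcile()
        time.sleep(POLL_INTERVAL)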
Dynamic Torque
[Architecture diagram]
Dynamic Torque for CoEPP
[Diagram: Torque/MAUI and Dynamic Torque on the head node, supported by LDAP, NFS, Puppet, Ganglia, Nagios and CVMFS; worker nodes in SA, Melbourne and Monash; interactive nodes in Melbourne]
Dynamic Torque for CoEPP
[Diagram]
CoEPP Outcomes
• Three large clusters in use for over a year
– Hundreds of cores in each
• Condor and Cloud Scheduler for ATLAS Tier 2
• Dynamic Torque for ATLAS Tier 3 and Belle
• LHC ATLAS experiment at CERN
– 530,000 Tier 2 jobs
– 325,000 CPU hours for Tier 3 jobs
• Belle experiment in Japan
– 150,000 jobs
Private Clusters
• Good for building a shared cluster for a large research group with good IT support who can set up and manage a Torque cluster
• What about the many individual researchers or small groups who also want a private cluster using their cloud allocation?
• But have no dedicated IT staff and very basic Unix skills?
• Is there a simple DIY solution?
StarCluster
• Generic setup
– Create a security group for the cluster
– Launch VMs (master, node01, node02 …)
– Set up a public key for password-less SSH
– Install NFS on the master and share scratch space to all nodeXX
– Can use EBS (Cinder) volumes as scratch space
• Queueing system setup (plugins)
– Condor, SGE, Hadoop … and your own plugin! (skeleton sketched below)
StarCluster for OpenStack
[Diagram: StarCluster drives OpenStack through its EC2 API to build a head node (NFS, Torque server, MAUI), worker nodes (Torque MOM), a CVMFS proxy and an attached volume; application software is pulled from the eRSA app repository (CVMFS server)]
StarCluster ‐ configuration
• Availability zone
• Image
• (optional) Image for master
• Flavor
• (optional) Flavor for master
• Number of nodes
• Volume
• Username
• User ID
• Group ID
• User shell
• Plugins
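In practice these options live in a cluster template in the StarCluster config file. The sketch below uses the stock StarCluster key names with made-up values; the OpenStack port and the eRSA add-ons may add or rename options (for example for the user and group IDs):

[plugin torque]
# points at a plugin class such as the skeleton sketched earlier
setup_class = torque_plugin.TorqueSetup

[cluster mycluster]
KEYNAME = mykey
# number of nodes
CLUSTER_SIZE = 4
# username and shell on the cluster
CLUSTER_USER = alice
CLUSTER_SHELL = bash
# image and flavor for the nodes (optional MASTER_* variants exist for the master)
NODE_IMAGE_ID = ami-00000042
NODE_INSTANCE_TYPE = m1.large
AVAILABILITY_ZONE = sa
# volume to attach as shared scratch (defined in its own [volume] section)
VOLUMES = scratch
PLUGINS = torque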
Start a cluster with StarCluster
# fire up a new cluster (from your desktop)
$ starcluster start mycluster
# log in to the head node (master) to submit jobs
$ starcluster sshmaster mycluster
# copy files to/from the cluster
$ starcluster put mycluster /path/to/local/file/or/dir /remote/path/
$ starcluster get mycluster /path/to/remote/file/or/dir /local/path/
# add two compute nodes to the cluster
$ starcluster addnode -n 2 mycluster
# terminate the cluster after use
$ starcluster terminate mycluster
Other options for Personal Cluster
• Elasticluster
– Python code to provision VMs
– Ansible to configure them
– Ansible playbooks for Torque/SGE/…, NFS/PVFS/…
• Heat
– Everything in a HOT template
– Earlier versions had limitations that made it hard to implement everything
– May revisit in future
Private Cluster in the Cloud
• Can use your personal or project cloud allocation to start up your own personal cluster in the cloud
– No need to share! Except among your group.
• Can use the standard PBS/Torque queueing system to submit jobs (or not)
– Only your jobs in the queue
• But you have to set up and manage the cluster
– Straightforward if you have good Unix skills (unless things go wrong…)
• Several groups are now using this
– But eRSA does the support when things go wrong…
Emu Cluster in the Cloud
• Emu is an eRSA cluster that runs in the cloud
• Aimed to be like an old cluster (Corvus)
– 8-core compute nodes
• But a bit different
– Dynamically created VMs in the cloud
– Can have private compute nodes
– Different size compute nodes if you want
Emu
• eRSA-managed dynamic cluster in the cloud
• Shared by multiple cloud tenants and eRSA users
• All nodes in the SA zone
• eRSA cloud allocation contributes 128 cores
• Users can bring in their own cloud allocation to launch their worker nodes in Emu
• Users don’t need to build and look after their own personal cluster
• It can also mount users’ Cinder volume storage to their own worker nodes via NFS
• Set up so researchers use their eRSA accounts
Using your own cloud allocation
• Users add our sysadmins to their tenant
– So we can launch VMs on their behalf
– Will look at Trusts in Icehouse
• Add some configs to Dynamic Torque
– Number of static/dynamic nodes, size of nodes, etc.
• Add a group of user accounts allowed to use it
• Create a reservation for users' worker nodes in MAUI
• A special 'account string' needs to be put in the job
– To match users' jobs to their group's reserved nodes
• A qsub filter checks whether the 'account string' is valid (sketched below)
– You can't submit a job using another group's allocation
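A Torque submit filter is just an executable (configured via torque.cfg) that receives the job script on stdin, echoes it to stdout, and rejects the job by exiting non-zero. The sketch below is purely illustrative; the set of valid accounts and the "account string equals Unix group" rule are invented, and Emu's real filter differs:

#!/usr/bin/env python
# Illustrative qsub submit filter: reject jobs whose "#PBS -A <account>" line
# names an account string the submitting user is not entitled to use.
import grp
import os
import pwd
import re
import sys

VALID_ACCOUNTS = {"tenant1", "tenant2", "ersa"}   # groups with reserved nodes (made up)

def user_groups():
    user = pwd.getpwuid(os.getuid())
    groups = {g.gr_name for g in grp.getgrall() if user.pw_name in g.gr_mem}
    groups.add(grp.getgrgid(user.pw_gid).gr_name)  # include the primary group
    return groups

account = None
for line in sys.stdin:
    match = re.match(r"#PBS\s+-A\s+(\S+)", line)
    if match:
        account = match.group(1)
    sys.stdout.write(line)            # pass the job script through unchanged

if account not in VALID_ACCOUNTS or account not in user_groups():
    sys.stderr.write("Invalid or unauthorised account string: %s\n" % account)
    sys.exit(1)                       # non-zero exit makes qsub reject the job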
Emu
[Diagram: Torque/MAUI and Dynamic Torque on the head node, with LDAP, NFS, Salt, Sensu and CVMFS; shared worker nodes (eRSA donated) alongside worker nodes of Tenant1 and Tenant2, with tenant volumes mounted via NFS]
Static cluster vs dynamic cluster
                  Static Cluster      Dynamic Cluster
Hardware          Physical Machines   Virtual Machines
LRMS              Torque              Torque with Dynamic Torque
CMS               Puppet              Salt Stack
Monitoring        Nagios, Ganglia     Sensu, Graphite, Logstash
App Distribution  NFS Mount           CVMFS
Shared FS         NFS                 NFS
Future Work
• Better reporting and usage graphs
• More monitoring checks
• Queueing system
– Multi-node jobs don't work because a new node is not trusted by existing nodes
– Trust list is only updated when the PBS server is started
– Could hack the Torque source code
– Or maybe use SLURM or SGE
• Better way to share user credentials
– Trusts in Icehouse?
• National and/or other regional services
Future Work
• Spot instance queue
• Distributed file system
– NFS is the static component
• Cannot add storage to NFS without stopping it
• Cannot add new nodes to the allow list dynamically
• Need to use iptables and update iptables instead
– Investigate a dynamic and distributed FS
• One FS for all tenants
• Alternatives to StarCluster
– Heat or Elasticluster
Resources
• Cloud Scheduler
– http://cloudscheduler.org/
• StarCluster
– http://star.mit.edu/cluster
– OpenStack version:
• https://github.com/shundezhang/StarCluster/
• Dynamic Torque
– https://github.com/shundezhang/dynamictorque
Nagios vs Sensu
• Nagios
– First designed last century for a static environment
– Needs to update local configuration and restart the service if a remote server is added or removed
– The server performs all checks, which is not scalable
• Sensu
– Modern design with AMQP as the communication layer
– Local agent runs checks
– Weak coupling between clients and server, which makes it scalable
Imagefactory
• Template in XML
– Packages
– Commands
– Files
• Can back up an 'image' in GitHub
• Automatic, no user interaction required
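As a rough illustration, an imagefactory TDL template groups exactly those pieces; every name, URL, package and command below is a placeholder, not one of the actual eRSA templates:

<template>
  <name>emu-worker</name>
  <os>
    <name>CentOS-6</name>
    <version>6</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://mirror.example.org/centos/6/os/x86_64/</url>
    </install>
  </os>
  <description>Illustrative worker-node image template</description>
  <packages>
    <package name='torque-mom'/>
    <package name='cvmfs'/>
  </packages>
  <files>
    <file name='/etc/motd'>Emu worker node (example file)</file>
  </files>
  <commands>
    <command name='enable-mom'>chkconfig pbs_mom on</command>
  </commands>
</template>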
EMU Monitoring
• Sensu
– Runs health checks
• Logstash
– Collects PBS server/MOM/accounting logs
• Collectd
– Collects metrics on CPU, memory, disk, network, etc.
Salt Stack vs Puppet
                  Salt Stack          Puppet
Architecture      Server-Client       Server-Client
Working Model     Push                Pull
Communication     ZeroMQ + msgpack    HTTP + text
Language          Python              Ruby
Remote execution  Yes                 No