
  • Galaxy Australia

    A public bioinformatics compute resource, distributed nationally.

    Simon Gladman (presented by Andrew Lonie)

    Supported by

    Development partners

  • Outline

    ● A bit about Galaxy Australia

    ● Nationally distributed compute & Pulsar

    ● Dynamic job handling & DTD

    ● Reference data management & CVMFS

    ● Monitoring and stats collection

    ● Next?

  • What is Galaxy?

    ● Web-based bioinformatics platform
    ● Retains “histories” of analyses
    ● Has a large number of cutting-edge tools
    ● Conda environments/Singularity containers for dependency management
    ● Has an app store
    ● Has a large support community

  • Galaxy Australia Infrastructure Locations (2015–2017)

    Galaxy-QLD: 96 cores, 60 TB disk

    Galaxy-Mel: 64 cores, 50 TB disk

  • Galaxy Australia Infrastructure Locations (2018–now)

    Galaxy Australia main queue: 150 cores, 60 TB disk

    Pulsar-Mel: ~90 cores

    Pulsar-NCI: ~60 cores

  • Galaxy Australia - Usage

    • Public and free Galaxy server in Australia

    • Currently running > 50,000 jobs per month.

  • Galaxy-Aus Architecture (Brisbane)

    Main site:

    ● Head Node (m2.xlarge, 16 vCPU | 64 GB RAM | 500 GB /mnt): 2 web handlers (6 threads each), 2 job mules in 1 farm, some CloudMan changes
    ● DB Server (m1.large, 4 vCPU | 16 GB RAM): PostgreSQL on a 100 GB volume
    ● CEPH filesystem; NFS server for job working directories and TMP
    ● Worker Nodes (m1.xxlarge, Slurm queue: main)
    ● Kubernetes for Interactive Environments

    Melbourne Pulsar (training jobs):

    ● Head Node (m2.xlarge); Worker Nodes (m1.xlarge); NFS & CVMFS reference data

    Canberra Pulsar (large/long jobs):

    ● Head Node (m2.xlarge); Worker Nodes (m1.xxlarge); NFS & CVMFS reference data

    Jobs are routed to the local Slurm queue or a Pulsar queue depending on DB/disk activity.

    Workers are a minimal GVL image, version controlled and updated, with Telegraf. They can be swapped into the cluster by draining and re-deploying.

  • Galaxy-Aus Architecture (Brisbane), updated

    The same layout with updates: 4 job mules in 1 farm on the head node, GPFS storage, and a separate services server running the Slurm controller and NFS server.

  • What is Pulsar?

    ● Python server application
    ● Allows a Galaxy server to run jobs on a remote system
    ● No shared file system required
    ● Configurable
    ● Securable
    ● Can submit jobs to an HPC queueing system
    ● Automatically handles dependency management

  • How Pulsar works

    1. User clicks “Execute”
    2. Galaxy packs up and sends:
       ○ Data
       ○ Config files
       ○ Tool name & version
       ○ Parameters and other job metadata
    3. Pulsar accepts the job
    4. Pulsar checks whether the tool is installed locally
       ○ If not, it installs the tool with Conda or Docker
    5. Pulsar submits the job to the local queue
    6. Pulsar waits until the job completes
    7. Pulsar packs up the result and sends it back to Galaxy
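    A minimal sketch of that round trip, assuming hypothetical helper and environment names (the real pulsar-client API differs); the point is that inputs travel with the job, so no shared filesystem is needed:

    # Illustrative sketch of the Pulsar round trip; all names are
    # hypothetical and the real pulsar-client API differs.
    import shutil
    import subprocess
    import tempfile
    from pathlib import Path

    def run_remote_job(tool, version, params, input_files):
        # 2. "Pack up": stage inputs into a fresh working directory,
        #    standing in for the HTTP(S) transfer Galaxy performs.
        workdir = Path(tempfile.mkdtemp(prefix="pulsar_job_"))
        for f in input_files:
            shutil.copy(f, workdir)

        # 4. Make sure the tool exists locally, e.g. in a Conda env
        #    (hypothetical env naming; harmless if it already exists).
        env = f"{tool}-{version}"
        subprocess.run(
            ["conda", "create", "-y", "-n", env, f"{tool}={version}"],
            check=False,
        )

        # 5./6. Submit to the local Slurm queue and block until done
        #    (sbatch --wait returns when the job finishes).
        subprocess.run(
            ["sbatch", "--wait", "--chdir", str(workdir),
             "--wrap", f"conda run -n {env} {tool} {params}"],
            check=True,
        )

        # 7. "Send back": Galaxy-side code would fetch results from here.
        return workdir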

  • What we use it for

    Pulsar-Mel

    ● Training jobs / workshops
    ● Runs lots of small jobs, usually 2–4 cores
    ● Allows a tutorial or workshop to run without interfering with the main queue

    Pulsar-NCI

    ● Large, long-running jobs: jobs that could run for days
    ● Usually 16+ core jobs, with lots of disk I/O
    ● Genome assemblies / transcriptome assemblies

  • Dynamic Tool Destinations

    ● Rules mapping tools to destinations
    ● Written in YAML
    ● Rules can match on:
      ○ File size
      ○ User
      ○ Tool parameters
    ● More complex rules can be written as Python functions (see the sketch below)
    ● Can be altered without a Galaxy restart
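    As a flavour of the Python-function side, a minimal dynamic rule might look like the following; the destination ids and the 10 GiB threshold are assumptions, and the exact job API should be checked against the Galaxy version in use. Galaxy matches the function's argument names against what it can supply and uses the returned string as the destination id.

    # Hypothetical dynamic destination rule; Galaxy loads functions like
    # this from its job rules directory. Ids and threshold are illustrative.
    def route_by_input_size(job):
        # Total size of all input datasets for this job, in bytes.
        total = sum(
            inp.dataset.get_size()
            for inp in job.input_datasets
            if inp.dataset is not None
        )
        # Big inputs go to the large/long-job Pulsar at NCI;
        # everything else stays on the local Slurm "main" queue.
        if total > 10 * 1024 ** 3:  # > 10 GiB
            return "pulsar-nci"
        return "slurm-main"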

  • The problem of reference data: there’s a lot, and it’s hard to deal with

    ● 1000s of genomes available, from multiple sources
    ● > 200 available on Galaxy Australia, and growing
    ● Multiple tool indices for each genome
    ● User requests for more
    ● Large compute burden to create the indices; admin burden to manage them

  • So we started talking with other large Galaxies!

    Share the burden:

    ● Admin of references
    ● Compute requirements

    We decided to share it all via CernVM-FS:

    ● Currently controlled by the Galaxy Project
    ● Moving to a community model soon

  • CVMFS global structure

    ● Stratum 0 (data.galaxyproject.org, Penn State): the canonical source; transactional updates
    ● Stratum 1 (Melbourne, Germany, elsewhere): multiple servers mirroring the Stratum 0; continuous updates
    ● User servers (Galaxy AU, Galaxy EU, Galaxy Main, GVL instances, other compute servers): mount the repo from a Stratum 1 chosen via the GEO API, falling back to the other Stratum 1s (sketched below)
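    The primary/fallback choice, as an illustration only: the real CVMFS client handles this natively (ordered by its GEO API), and the mirror hostnames here are made up.

    # Sketch of a client picking the first reachable Stratum 1 mirror.
    # Hostnames are hypothetical; .cvmfspublished is the small manifest
    # a CVMFS server exposes at the repository root.
    import urllib.request

    MIRRORS = [
        "http://cvmfs1.melbourne.example.org",   # geographically closest first
        "http://cvmfs1.germany.example.org",
        "http://cvmfs1.elsewhere.example.org",
    ]
    REPO = "data.galaxyproject.org"

    def pick_stratum1():
        for base in MIRRORS:
            url = f"{base}/cvmfs/{REPO}/.cvmfspublished"
            try:
                with urllib.request.urlopen(url, timeout=5):
                    return base              # primary mount
            except OSError:
                continue                     # unreachable: try the next one
        raise RuntimeError("no Stratum 1 reachable")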

  • stats.genome.edu.au

  • Structure

    ● Telegraf runs as a daemon on every machine: the Galaxy Australia head node and worker nodes, the Pulsar nodes, and other servers
    ● InfluxDB collects the metrics (Galaxy-Aus, Pulsar-Mel, Pulsar-NCI)
    ● Grafana is the app & web server behind stats.genome.edu.au (queried as sketched below)
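    For a sense of how those metrics are consumed, a sketch using the influxdb Python client; the database name is an assumption, while the cpu measurement and usage_idle field follow Telegraf's standard cpu input plugin.

    # Query mean CPU busy-time per host over the last hour from the
    # metrics store (pip install influxdb). Database name is assumed.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="stats.genome.edu.au", port=8086,
                            database="telegraf")

    result = client.query(
        "SELECT 100 - mean(usage_idle) AS busy "
        "FROM cpu WHERE time > now() - 1h GROUP BY host"
    )

    # ResultSet keys are (measurement, tags); values are point dicts.
    for (measurement, tags), points in result.items():
        for point in points:
            print(tags["host"], round(point["busy"], 1), "% busy")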

  • All very nice tech, but is it used??

  • What’s next?

    ● More compute nodes around the country

    ● Cloudstor?

    ● BYO resources

    ● More internationalisation of resources

  • https://twitter.com/galaxyaustralia

    https://www.embl-abr.org.au/galaxyaustralia/

    https://usegalaxy.org.au/