Galaxy Australia: a public bioinformatics compute resource, distributed nationally
Simon Gladman (presented by Andrew Lonie)
Supported by
Development partners
Outline
● A bit about Galaxy Australia
● Nationally distributed compute & Pulsar
● Dynamic job handling & DTD
● Reference data management & CVMFS
● Monitoring and stats collection
● Next?
What is Galaxy?
● A web-based bioinformatics platform
● Retains "histories" of analyses
● Has a large number of cutting-edge tools
● Conda environments/Singularity containers for dependency management
● Has an app store
● Has a large support community
Galaxy Australia Infrastructure Locations

2015-2017:
● Galaxy-QLD: 96 cores, 60 TB disk
● Galaxy-Mel: 64 cores, 50 TB disk

2018-now:
● Galaxy Australia main queue: 150 cores, 60 TB disk
● Pulsar-Mel: ~90 cores
● Pulsar-NCI: ~60 cores
Galaxy Australia - Usage
● The public and free Galaxy server in Australia
● Currently running > 50,000 jobs per month
Galaxy-Aus Architecture (diagram)

Brisbane (Galaxy Australia main):
● Head Node (m2.xlarge: 16 vCPU, 64 GB RAM, 500 GB /mnt) running 2 web handlers (6 threads each) and 2 job mules in 1 farm, with some CloudMan changes
● DB Server (m1.large: 4 vCPU, 16 GB RAM) running PostgreSQL on a 100 GB volume
● Worker Nodes (m1.xxlarge) on Slurm queue "main"
● CEPH filesystem; NFS server for job working directories and TMP

Melbourne Pulsar:
● Head Node (m2.xlarge) and Worker Nodes (m1.xlarge), with NFS & CVMFS references
● Handles training jobs; Kubernetes for Integrated Environments

Canberra Pulsar:
● Head Node (m2.xlarge) and Worker Nodes (m1.xxlarge), with NFS & CVMFS references
● Handles large/long jobs

Jobs are routed to the local Slurm queue or the Pulsar queues based on job type and DB/disk activity.

Workers are a minimal GVL image, version controlled and updated, with telegraf. They can be swapped into the cluster by draining and re-deploying.
Galaxy-Aus Architecture (second diagram)
Largely the same layout as above, with a few differences:
● The Brisbane head node runs 2 web handlers and 4 job mules (1 farm)
● Storage is GPFS
● A separate services server hosts the Slurm controller and the NFS server
What is Pulsar?
● A Python server application
● Allows a Galaxy server to run jobs on a remote system
● No shared file system required
● Configurable
● Securable
● Can submit jobs to HPC queueing systems
● Automatically handles dependency management
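For a sense of what this looks like in practice, here is a minimal sketch of a Pulsar server's app.yml; the paths and token are placeholders, not Galaxy Australia's actual settings:

    # app.yml -- minimal Pulsar server config (values are placeholders)
    staging_directory: /mnt/pulsar/staging   # where job inputs/outputs are staged
    private_token: changeme-secret-token     # shared secret Galaxy must present
    tool_dependency_dir: /mnt/pulsar/deps    # Conda-managed tool dependencies
    conda_auto_init: true                    # bootstrap Conda if it is missing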
How Pulsar works
1. User clicks "Execute"
2. Galaxy packs up and sends:
   ○ Data
   ○ Config files
   ○ Tool name & version
   ○ Parameters and other job metadata
3. Pulsar accepts the job
4. Pulsar checks if the tool is installed locally
   ○ If not, it installs the tool with Conda or Docker
5. Pulsar submits the job to the local queue
6. Pulsar waits until the job is complete
7. Pulsar packs up the result and sends it back to Galaxy
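On the Galaxy side, a remote Pulsar endpoint is just another job destination. A minimal job_conf.xml fragment might look like this (the URL and token are placeholders):

    <plugin id="pulsar" type="runner"
            load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>

    <destination id="pulsar_mel" runner="pulsar">
        <param id="url">https://pulsar.example.org:8913/</param>
        <param id="private_token">changeme-secret-token</param>
    </destination>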
What we use them for

Pulsar-Mel:
● Training jobs / workshops
● Runs lots of small jobs, usually 2-4 cores
● Allows a tutorial or workshop to run without interfering with the main queue

Pulsar-NCI:
● Large, long-running jobs; jobs that could run for days
● Usually 16+ core jobs
● Lots of disk I/O
● Genome assemblies / transcriptome assemblies
Dynamic Tool Destinations
● Map tools to destinations via rules
● Written in YAML
● Can have rules for:
   ○ File size
   ○ User
   ○ Tool parameters
● More complex rules can be written as Python functions
● Can be altered without a Galaxy restart (see the sketches below)
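A minimal tool_destinations.yml sketch; the tool and destination ids here (spades, pulsar_nci, slurm_main) are illustrative, not Galaxy Australia's actual configuration:

    tools:
      spades:
        rules:
          - rule_type: file_size
            nice_value: 0
            lower_bound: 10 GB
            upper_bound: Infinity
            destination: pulsar_nci    # inputs over 10 GB go to the NCI Pulsar
        default_destination: slurm_main
    default_destination: slurm_main
    verbose: True

For logic YAML cannot express, Galaxy also supports dynamic destinations written as Python functions. A hypothetical rule routing by total input size:

    # Sketch of a Galaxy dynamic-destination rule; the function name and
    # destination ids are hypothetical.
    def route_by_input_size(job):
        # total size of all input datasets, in bytes
        total = sum(ida.dataset.get_size() for ida in job.input_datasets)
        if total > 10 * 1024**3:    # more than 10 GB of input data
            return "pulsar_nci"     # destination ids come from job_conf
        return "slurm_main"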
The problem of reference data: there's a lot and it's hard to deal with
● 1000s of genomes available, from multiple sources
● > 200 available on Galaxy Australia, and growing
● Multiple tool indices for each
● User requests for more
● A large compute burden to create the indices, and an admin burden to manage them
So we started talking with other large Galaxies!

Share the burden:
● Admin of references
● Compute requirements

We decided to share it all via CernVM-FS (CVMFS):
● Currently controlled by the Galaxy Project
● Moving to a community model soon
CVMFS global structure (diagram)
● Stratum 0 server (data.galaxyproject.org, Penn State): the canonical source; transactional updates
● Stratum 1 servers (Melbourne, Germany, elsewhere): mirror the Stratum 0 server; continuous updates
● User servers (Galaxy Main, Galaxy EU, Galaxy AU, GVL instances, other compute servers): mount the repo from a primary Stratum 1 chosen via the GEO API, with fallback mounts to the other Stratum 1s
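On a client, mounting the repository takes only a few lines of CVMFS configuration. A minimal sketch (proxy settings will differ per site):

    # /etc/cvmfs/default.local -- minimal client sketch
    CVMFS_REPOSITORIES=data.galaxyproject.org
    CVMFS_HTTP_PROXY=DIRECT      # or a local squid proxy
    CVMFS_USE_GEOAPI=yes         # pick the nearest Stratum 1 automatically

The list of Stratum 1 URLs (CVMFS_SERVER_URL) comes from the galaxyproject.org domain configuration; clients fall back down that list when the nearest mirror is unavailable.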
stats.genome.edu.au

Structure (diagram): the Galaxy Australia head node, the Galaxy Australia worker nodes, the Pulsar nodes, and various other servers all report metrics to stats.genome.edu.au.
● telegraf: runs on each machine as a daemon
● InfluxDB: collects the metrics from Galaxy-Aus, Pulsar-Mel and Pulsar-NCI
● Grafana: the app & web server
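Each node's telegraf daemon ships its metrics to the central InfluxDB. A minimal telegraf.conf sketch (the endpoint URL and database name are placeholders):

    # /etc/telegraf/telegraf.conf -- minimal sketch
    [[outputs.influxdb]]
      urls = ["https://stats.example.org:8086"]   # placeholder endpoint
      database = "galaxy_stats"                   # hypothetical database name

    [[inputs.cpu]]
    [[inputs.mem]]
    [[inputs.disk]]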
All very nice tech, but is it used??
What’s next?
● More compute nodes around the country
● Cloudstor?
● BYO resources
● More internationalisation of resources
https://twitter.com/galaxyaustralia
https://www.embl-abr.org.au/galaxyaustralia/
https://usegalaxy.org.au/