Sanger and our upcoming flexible compute platform
Peter Clapham - Jan 2017
Why a private cloud?
Collaboration is hard enough already
HPC is a weak security model, and Category 4 data is the elephant in the room
We're reaching the limits of POSIX scalability
Increasing demand for more flexibility regarding operating systems and supplied libraries
Running services at scale should be able to burst to meet demand and collapse when no longer required
We should be able to more readily take advantage of developing technology
Linking up with common standards across the broader community.
OpenStack at Sanger
July 2015 - Development Juno system.
September 2015 - Limited-access POC Kilo system (using TripleO).
January 2016 - Hybrid cloud for commercial entities.
June 2016 - Wider-access POC Kilo system (TripleO).
Sep-Dec 2016 - First production Liberty system (TripleO).
Production OpenStack (I)
• 107 compute nodes (Supermicro), each with:
  • 512 GB of RAM, 2 × 25 Gb/s network interfaces,
  • 1 × 960 GB local SSD, 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• 6 control nodes (Supermicro) allow 2 OpenStack instances:
  • 256 GB of RAM, 2 × 100 Gb/s network interfaces,
  • 1 × 120 GB local SSD, 1 × Intel P3600 NVMe (/var),
  • 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• Total of 53 TB of RAM and 2996 cores (5992 with hyperthreading)
• Red Hat Liberty deployed with TripleO
Production OpenStack (II)
• 9 storage nodes (Supermicro), each with:
  • 512 GB of RAM,
  • 2 × 100 Gb/s network interfaces,
  • 60 × 6 TB SAS discs, 2 system SSDs,
  • 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz),
  • 4 TB of Intel P3600 NVMe used for the journal
• Ubuntu Xenial
• 3 PB of disc space, 1 PB usable
• Single instance: 1.3 GB/s write, 200 MB/s read
• Ceph benchmarks imply 7 GB/s aggregate
Production OpenStack (III)
• 3 racks of equipment, 24 kW load per rack
• 10 Arista 7060CX-32S switches:
  • 1U, 32 × 100 Gb/s -> 128 × 25 Gb/s
  • Hardware VXLAN support integrated with OpenStack*
  • Layer-2 traffic limited to the rack; VXLAN used inter-rack
  • Layer 3 between racks and for the interconnect to legacy systems
  • All network switch software can be upgraded without disruption
  • True Linux systems
  • 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems
(* VXLAN in the ML2 plugin is not used in the first iteration)
But what are we providing?
• CloudForms: service-driven access, with direct HTTPS access from anywhere
• OpenStack Horizon: granular control over instances, accessible only from within Sanger
• Direct API access
• Ceph object storage (used to provide volume and image storage)
• S3 object storage layer
How Does this Fit with Existing Services?
The OpenStack “bubble” (compute plus Ceph) sits on a 100 Gb/s SDN network infrastructure, with 80 Gb/s connectivity to Sanger internal systems.
• Access to secured services, i.e. iRODS, databases, CIFS (Windows shares)
• S3 API access; OpenStack API and GUI access (CloudForms and Horizon interfaces)
• No access to NFS or Lustre
Efficient Resource Management
OpenStack resources are managed at a tenant-group level.
• Each “tenant” group has an assigned quota for:
  • Disk
  • CPU
  • Memory
Once limits are reached, tenant members will either have to wait for resources to become available or shut down or terminate a running instance.
Initial quotas are agreed with the IC before creation.
Quotas: they are not all the same.
Some groups require an absolute number of slots to be available for essential services.
Other groups would like to burst to meet demand as required.
These requirements do not fit well with each other.
The Proposed Workaround
For those projects which require guaranteed access:
• We create a dedicated tenant group that has specific access to a set quota allocation of vCPU, disk and memory.
• This is tied directly to reserved hardware.
This guarantees requested resource will be available when required, whilst providing security, operating system flexibility and instance management.
BUT there is no ability to use more than the requested allocation.
Dynamic Workflows
Dynamic workflows can expand to meet demand and collapse when not required, so a quota that matches the initial resource request would mean constantly under-quota'ing the system.
For the initial release we will start by:
• Overcommitting CPU by 1.5:1 (available total vCPU ~9000)
• Overallocating quotas so that 115% of the overcommitted vCPU is available to tenants
This gives some initial ability to use more of the system than may be available.
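As a rough sketch, the overcommit and quota figures above work out as follows (the core counts come from the earlier hardware slides; the truncation to whole vCPUs is our own assumption):

```python
# Sketch of the initial quota arithmetic (core counts from the hardware
# slides; rounding down to whole vCPUs is assumed, not stated).
physical_cores = 2996                      # total production compute cores
ht_vcpus = physical_cores * 2              # 5992 vCPUs with hyperthreading

available_vcpus = int(ht_vcpus * 1.5)      # 1.5:1 overcommit -> 8988, the "~9000"
total_quota = int(available_vcpus * 1.15)  # 115% overallocation of the overcommitted total

print(available_vcpus, total_quota)        # 8988 10336
```

So tenants can collectively be granted roughly 10,300 vCPUs of quota against ~9,000 schedulable vCPUs, giving the modest headroom described above.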
For More Details, see
https://docs.google.com/a/sanger.ac.uk/document/d/17z9urhh3bTLRhQo9b8CcsZW_3O7cxlGY9uiwpAS_GqQ/edit?usp=sharing
Or
http://tinyurl.com/zzurp5s
We are adding monitoring and metrics gathering to the system. This will provide a feedback loop for quota and project management.
New Opportunities for Application Development
Cloud application development aims to scale out compute and provide:
• Auto-scaling of key services
• Cost-effective pipelines on commercial platform providers
• Self-healing of service components that fail
• Resilient services with reduced impact when service components fail
• Freedom from being tied to any one specific environment
• Sharing of code, images and services with collaborators, which can dramatically reduce the need to copy large data sets around the world and permits running complex pipelines where the data resides
How do we see Migration?
Initial Early Adopters.
We have some early adopters!
1. Mutational signatures
2. Imputation service
3. BLAST service
4. Pan-prostate
We look forward to hearing more from these groups soon!
Mostly Share a Common Approach
Web interface -> data upload -> invoke analysis -> run analysis -> update job status database -> present data -> retain a copy
Adaptation to cloud-based tools

Stage | Current approach | Cloud approach
User details | Local databases, directory services, OAuth | OAuth, directory services
Data downloads | Globus or HTTPS | S3, Globus
Job status | RDBMS: MySQL, Oracle or PostgreSQL | NoSQL: MongoDB, Cassandra or Redis
Invoke job analysis | Hand-crafted request to LSF | AMQP
Run analysis | LSF job submission | AMQP, Heat orchestration or API call to OpenStack
Present data | Make available via SFTP, Globus or HTTPS web upload | S3 automatically generated URLs
Keep data | No consistent approach | S3, archive as required
Service failure | Await systems | Use IFTTT or add code to the instance to raise or restart an instance as required
Autoscale options | Await systems | Use IFTTT or add code to the instance to raise or restart an instance as required
Service discovery | Manual | cloud-init, Heat templates, dynamic DNS
New service, New Image ?
Cloud software stacks are based around services (micro-services) and are an exemplar of service-orientated architectures.
Instances are mostly started from pre-created images, and these form the building blocks for a given service.
Starting with:
• Ubuntu with Docker support
• RStudio
• An NFS server
• OpenLava cluster
But what if you need something different? You could ask, or you could use the tools provided to create your own. Think /software+++
Developing machine images
• Start simple and add complexity later.
• We understand that biologists are not often software engineers.
• We believe that the process of image creation should be codified and software-development best practices followed.
• OpenStack images are based on images from a vendor.
• It is possible to import other virtualisation systems' images into OpenStack (these images could be made with automated tools).
• Virtualisation allows the possibility of software reproducibility.
Software development
• Source control (git), gitflow
• Infrastructure as code (Packer)
• Continuous integration (GitLab CI)
• Test-driven development (Test Kitchen)
Git branches (gitflow)
• Gitflow: http://nvie.com/posts/a-successful-git-branching-model/
• We follow the principle but do not use the software.
• The master branch is always usable.
• New features are integrated on the development branch.
• Develop on a feature branch created from the development branch.
• When a feature is complete, merge the feature branch into development.
• When a set of features is ready, merge development into master and tag a release.
• Develop bug fixes on a branch off development, and cherry-pick to a bug-release branch created from the release tag.
Semantic versioning
MAJOR.MINOR.PATCH (http://semver.org/spec/v2.0.0.html)
• MAJOR version when you make incompatible API changes,
• MINOR version when you add functionality in a backwards-compatible manner, and
• PATCH version when you make backwards-compatible bug fixes.
We treat changes in environment variables as a change to the “api”.
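A minimal sketch of why the dotted form is convenient: split on dots and compare numerically (the parse() helper here is illustrative, not part of any Sanger tooling, and ignores pre-release tags):

```python
# Illustrative helper: compare MAJOR.MINOR.PATCH strings numerically.
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

# Tuples compare element by element, which matches semver precedence
# for plain releases (no pre-release or build metadata).
assert parse("7.0.0") > parse("6.2.9")   # MAJOR outranks MINOR and PATCH
assert parse("1.10.0") > parse("1.9.9")  # numeric, not lexicographic
```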
Packer
• https://packer.io/
• Machine-image configuration as code.
• In use by systems at Sanger since 2014 (used to build Lustre clients).
• Supports multiple virtualisation platforms.
• Supports both Linux and Windows.
• A simple example that can be used without CI: https://github.com/wtsi-ssg/image-creation
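To illustrate the idea, a minimal Packer template for the OpenStack builder looks something like this (the source-image UUID, flavor, image name and package list are placeholders, not our actual configuration; credentials come from the usual OS_* environment variables):

```json
{
  "builders": [{
    "type": "openstack",
    "image_name": "base-trusty-{{timestamp}}",
    "source_image": "SOURCE_IMAGE_UUID",
    "flavor": "m1.small",
    "ssh_username": "ubuntu"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo apt-get update",
      "sudo apt-get -y install docker.io"
    ]
  }]
}
```

Running `packer build template.json` boots an instance from the source image, applies the provisioners, and snapshots the result as a new image.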
Packer, Provisioners
• Provisioners change the state of the machine.
• Provisioners are bits of code written in various languages.
• Multiple provisioners are allowed, run in order.
• They can be restricted to specific builds.
Examples:
• Shell: simple shell scripts
• File uploads
• Ansible
• Chef, Puppet, Salt
• PowerShell, Windows shell
Packer, Builders
• Builders are responsible for creating machines and generating images from them for various platforms:
  • Amazon: takes an image and applies changes.
  • OpenStack: takes an image and applies changes.
  • VMware: installs from an ISO, then applies changes.
  • Docker: takes a container and applies changes.
  • VirtualBox: installs from an ISO, then applies changes.
  • Others...
Gitlab CI
• Allows processes to be run in response to a push to a repository.
• Configured by a YAML file ( .gitlab-ci.yml ).
• A build consists of multiple stages.
• Stages run sequentially.
• Tasks within a stage execute in parallel.
• State needs to be stored in separate files/directories ( $CI_BUILD_ID ).
• Tags control which runners execute each job.
• https://about.gitlab.com/gitlab-ci/
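A minimal .gitlab-ci.yml sketch of such a pipeline (the job names, runner tag, template filename and commands are illustrative, not our actual configuration):

```yaml
# Two sequential stages; jobs within a stage run in parallel.
stages:
  - build
  - test

build_image:
  stage: build
  tags:
    - packer            # only runners tagged "packer" pick up this job
  script:
    - packer build base-image.json

verify_image:
  stage: test
  tags:
    - packer
  script:
    - bundle exec kitchen test
```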
Test Kitchen
• http://kitchen.ci/
• Creates new instances to run tests on.
• Drivers for various systems, e.g.:
  • Amazon
  • OpenStack
  • Docker
  • Windows
• Configured with a single file ( .kitchen.yml ), which is an ERB template.
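A sketch of a .kitchen.yml using the kitchen-openstack driver (the image, flavor and platform names are placeholders; the ERB tags pull credentials from the environment, as described on the next slide):

```yaml
driver:
  name: openstack
  openstack_username: <%= ENV['OS_USERNAME'] %>
  openstack_api_key: <%= ENV['OS_PASSWORD'] %>
  openstack_auth_url: <%= ENV['OS_AUTH_URL'] %>
  image_ref: ubuntu-trusty      # placeholder image name
  flavor_ref: m1.small          # placeholder flavor

provisioner:
  name: shell

platforms:
  - name: ubuntu-14.04

suites:
  - name: default               # ServerSpec tests live under test/integration/default
```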
Test Kitchen
• Each group should have an OpenStack tenant for CI.
• Credentials are stored in GitLab's variables section.
• The tenant needs to have an SSH security group.
• The tenant needs a single network.
• Configuration is shared in environment variables.
• Supports multiple test frameworks:
  • ServerSpec
  • Bats
Testing orchestration
Test Kitchen can have multiple servers running at one time, and each test runs from a separate directory; this allows us to test client-server systems:
• In a server directory, start a machine and run the server tests.
• Extract the internal IP address from the master.
• In a client directory, start a machine and inject the master's location.
• Run the client tests.
• Stop the client, then stop the master.
ServerSpec
• RSpec is a behaviour-driven development framework for unit tests.
• ServerSpec allows RSpec tests to check server status, e.g.:

require 'serverspec'

# Required by serverspec
set :backend, :exec

describe "file system checks" do
  describe file('/data1') do
    it { should be_mounted }
  end
end
A CI workflow: image creation
• Our base image.
• Used to make changes that will affect all the images.
• https://github.com/wtsi-ssg/image-creation-ci
• Multiple tags; each tag is a release, e.g.:
  • v5.0.0: migration from OpenStack beta to OpenStack gamma
  • v6.0.0: adding Ansible as a system for configuration
  • v7.0.0: adding support for Xenial and CentOS 7.2 as well as Trusty
ISG repository
• https://github.com/wtsi-ssg/simple-image-builder
• Continuous-integration and test-infrastructure framework already available; additional tests will need writing.
• The chain of software reproducibility relies on:
  • Trust that the vendor built an image consistently.
  • Noting that operating-system packages are pulled in at the time of creation.
  • Critical components being pulled in from a fixed source.
  • Tests written to validate the system.
Batch scheduling is a bit old...
Openlava image
• A single image which is used for both the master/head node and compute nodes.
• Includes an NFS server for the home directory.
• Currently based on Trusty ( Ubuntu 14.04 ).
• Development branch for Xenial ( Ubuntu 16.04 ).
• Development branch for CentOS 7.2.
• ServerSpec tests using multiple servers.
New tools and images are already being created internally
WR from Sendu: https://github.com/VertebrateResequencing/wr
NPG are producing an AMQP service image, to appear on https://gitlab.internal.sanger.ac.uk when completed.
And there's more!
We are listening to your requests for features and supporting infrastructure:
https://docs.google.com/spreadsheets/d/1_oeBz27beLLj_4xe3yoyZYjpaYhDFTE__F_L6pVNcTE/edit
Or
http://tinyurl.com/z5bh5q5
But also online tutorials, videos and documentation
Some, hopefully, useful examples have been collated here:
https://ssg-confluence.internal.sanger.ac.uk/display/OPENSTACK/Distributed+applications%3A+links+and+resources
Or
http://tinyurl.com/gwbrtfl
And still more
10th March: OpenStack event here at Sanger
Tim Bell from CERN:
• Head of the CERN OpenStack team
• 200,000+ vCPUs
• Many, many PB of Ceph
Final schedule TBA
Almost done
Release date: On time for March 1st.
Watch out for the upcoming flyers
Acknowledgements
Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon.
Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz.
All our early testers and those who have provided constructive feedback!
P.S.
11 more days to migrate from Lustre 108, 109, 110 and 111 before the system is made read-only.
And only 1 month (1st March) until the old Lustre systems are securely wiped and ready for removal from campus.
Remember: Lustre is not backed up.