Sanger and our upcoming flexible compute platform
Peter Clapham - Jan 2017
Why a private cloud?
Collaboration is hard enough already
HPC is a weak security model, and Category 4 data is the elephant in the room
We're reaching the limits of POSIX scalability
Increasing demand for more flexibility regarding operating systems and supplied libraries
Running services at scale should be able to burst to meet demand and collapse when no longer required
We should be able to more readily take advantage of developing technology
Linking up with common standards across the broader community.
OpenStack at Sanger
July 2015 - Development Juno system.
September 2015 - Limited-access POC Kilo system (using TripleO).
January 2016 - Hybrid cloud for commercial entities.
June 2016 - Wider-access POC Kilo system (TripleO).
Sep-Dec 2016 - First production Liberty system (TripleO).
Production OpenStack (I)
• 107 compute nodes (Supermicro), each with:
  • 512 GB of RAM, 2 × 25 Gb/s network interfaces,
  • 1 × 960 GB local SSD, 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• 6 control nodes (Supermicro) allow 2 OpenStack instances:
  • 256 GB of RAM, 2 × 100 Gb/s network interfaces,
  • 1 × 120 GB local SSD, 1 × Intel P3600 NVMe (/var),
  • 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• Total of 53 TB of RAM and 2996 cores (5992 with hyperthreading)
• Red Hat Liberty deployed with TripleO
Production OpenStack (II)
• 9 storage nodes (Supermicro), each with:
  • 512 GB of RAM,
  • 2 × 100 Gb/s network interfaces,
  • 60 × 6 TB SAS discs, 2 system SSDs,
  • 2 × Intel E5-2690 v4 (14 cores @ 2.6 GHz),
  • 4 TB of Intel P3600 NVMe used for the journal
• Ubuntu Xenial
• 3 PB of disc space, 1 PB usable
• Single instance: 1.3 GB/s write, 200 MB/s read
• Ceph benchmarks imply 7 GB/s aggregate
Production OpenStack (III)
• 3 racks of equipment, 24 kW load per rack
• 10 Arista 7060CX-32S switches:
  • 1U, 32 × 100 Gb/s -> 128 × 25 Gb/s
  • Hardware VXLAN support integrated with OpenStack*
  • Layer-2 traffic limited to the rack; VXLAN used inter-rack
  • Layer 3 between racks and for the interconnect to legacy systems
  • All network switch software can be upgraded without disruption
  • True Linux systems
  • 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems
(* VXLAN in the ML2 plugin is not used in the first iteration)
But what are we providing?
• CloudForms: service-driven access, with direct HTTPS access from anywhere
• OpenStack Horizon: granular control over instances, accessible only from within Sanger
• Direct API access
• Ceph object storage (used to provide volume and image storage)
• S3 object storage layer
How Does this Fit with Existing Services?
The OpenStack “bubble” (compute plus Ceph) sits on a 100 Gb/s SDN network infrastructure, with 80 Gb/s connectivity to Sanger internal systems.
• Access to secured services, i.e. iRODS, databases, CIFS (Windows shares)
• S3 API access; OpenStack API and GUI access (CloudForms and Horizon interfaces)
• No access to NFS or Lustre
Efficient Resource Management
OpenStack resources are managed at a tenant-group level.
• Each “tenant” group has an assigned quota for:
  • Disk
  • CPU
  • Memory
Once limits are reached, tenant members will either have to wait for resources to become available or shut down or terminate a running instance.
Initial quotas are agreed with the IC before creation.
Quotas: they are not all the same.
Some groups require an absolute number of slots to be available for essential services.
Other groups would like to burst to meet demand as required.
These requirements do not fit well with each other.
The Proposed Workaround
For those projects which require guaranteed access:
• We create a dedicated tenant group that has specific access to a set quota allocation of vCPU, disk and memory.
• This is tied directly to reserved hardware.
This guarantees requested resource will be available when required, whilst providing security, operating system flexibility and instance management.
BUT there is no ability to use more than the requested allocation.
Dynamic Workflows
Dynamic workflows can expand to meet demand and collapse when not required, so a quota that matches the initial resource request would mean constantly under-quota'ing the system.
For the initial release we will start by:
• Overcommitting CPU by 1.5:1 (available total vCPU ~9000)
• Overallocating quotas so that 115% of the overcommitted vCPU is available to tenants
This gives some initial ability to use more of the system than may be available.
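As a rough sketch, the overcommit and quota figures above work out as follows (the core counts come from the earlier hardware slides; the truncation to whole vCPUs is our own assumption):

```python
# Sketch of the initial quota arithmetic (core counts from the hardware
# slides; rounding down to whole vCPUs is assumed, not stated).
physical_cores = 2996                      # total production compute cores
ht_vcpus = physical_cores * 2              # 5992 vCPUs with hyperthreading

available_vcpus = int(ht_vcpus * 1.5)      # 1.5:1 overcommit -> 8988, the "~9000"
total_quota = int(available_vcpus * 1.15)  # 115% overallocation of the overcommitted total

print(available_vcpus, total_quota)        # 8988 10336
```

So tenants can collectively be granted roughly 10,300 vCPUs of quota against ~9,000 schedulable vCPUs, giving the modest headroom described above.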
For More Details, see
https://docs.google.com/a/sanger.ac.uk/document/d/17z9urhh3bTLRhQo9b8CcsZW_3O7cxlGY9uiwpAS_GqQ/edit?usp=sharing
Or
http://tinyurl.com/zzurp5s
We are adding monitoring and metrics gathering to the system. This will provide a feedback loop for quota and project management.
New Opportunities for Application Development
Cloud application development aims to scale out compute and provide:
• Auto-scaling of key services
• Cost-effective pipelines on commercial platform providers
• Self-healing of service components that fail
• Resilient services with reduced impact when service components fail
• Freedom from being tied to any one specific environment
• Sharing of code, images and services with collaborators, which can dramatically reduce the need to copy large data sets around the world and permits running complex pipelines where the data resides
How do we see Migration?
Initial Early Adopters.
We have some early adopters!
1. Mutational signatures
2. Imputation service
3. BLAST service
4. Pan-prostate
We look forward to hearing more from these groups soon!
Mostly Share a Common Approach
Web interface -> data upload -> invoke analysis -> run analysis -> update job status database -> present data -> retain a copy
Adaptation to cloud-based tools

Stage | Current approach | Cloud approach
User details | Local databases, directory services, OAuth | OAuth, directory services
Data downloads | Globus or HTTPS | S3, Globus
Job status | RDBMS: MySQL, Oracle or PostgreSQL | NoSQL: MongoDB, Cassandra or Redis
Invoke job analysis | Hand-crafted request to LSF | AMQP
Run analysis | LSF job submission | AMQP, Heat orchestration or API call to OpenStack
Present data | Make available via SFTP, Globus or HTTPS web upload | S3 automatically generated URLs
Keep data | No consistent approach | S3, archive as required
Service failure | Await systems | Use IFTTT or add code to the instance to raise or restart an instance as required
Autoscale options | Await systems | Use IFTTT or add code to the instance to raise or restart an instance as required
Service discovery | Manual | cloud-init, Heat templates, dynamic DNS
New service, New Image ?
Cloud software stacks are based around services (micro-services) and are an exemplar of service-orientated architectures.
Instances are mostly started from pre-created images, and these form the building blocks for a given service.
Starting with:
• Ubuntu with Docker support
• RStudio
• An NFS server
• OpenLava cluster
But what if you need something different? You could ask, or you could use the tools provided to create your own. Think /software+++
Developing machine images
• Start simple and add complexity later.
• We understand that biologists are not often software engineers.
• We believe that the process of image creation should be codified and software-development best practices followed.
• OpenStack images are based on images from a vendor.
• It is possible to import other virtualisation systems' images into OpenStack (these images could be made with automated tools).
• Virtualisation allows the possibility of software reproducibility.
Software development
• Source control (git), gitflow
• Infrastructure as code (Packer)
• Continuous integration (GitLab CI)
• Test-driven development (Test Kitchen)
Git branches (gitflow)
• Gitflow: http://nvie.com/posts/a-successful-git-branching-model/
• We follow the principle but do not use the software.
• The master branch is always usable.
• New features are integrated on the development branch.
• Develop on a feature branch created from the development branch.
• When a feature is complete, merge the feature branch into development.
• When a set of features is ready, merge development into master and tag a release.
• Develop bug fixes on a branch off development, and cherry-pick to a bug-release branch created from the release tag.
Semantic versioning
MAJOR.MINOR.PATCH (http://semver.org/spec/v2.0.0.html)
• MAJOR version when you make incompatible API changes,
• MINOR version when you add functionality in a backwards-compatible manner, and
• PATCH version when you make backwards-compatible bug fixes.
We treat changes in environment variables as a change to the “api”.
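A minimal sketch of why the dotted form is convenient: split on dots and compare numerically (the parse() helper here is illustrative, not part of any Sanger tooling, and ignores pre-release tags):

```python
# Illustrative helper: compare MAJOR.MINOR.PATCH strings numerically.
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

# Tuples compare element by element, which matches semver precedence
# for plain releases (no pre-release or build metadata).
assert parse("7.0.0") > parse("6.2.9")   # MAJOR outranks MINOR and PATCH
assert parse("1.10.0") > parse("1.9.9")  # numeric, not lexicographic
```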
Packer
• https://packer.io/
• Machine-image configuration as code.
• In use by systems at Sanger since 2014 (used to build Lustre clients).
• Supports multiple virtualisation platforms.
• Supports both Linux and Windows.
• A simple example that can be used without CI: https://github.com/wtsi-ssg/image-creation
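To illustrate the idea, a minimal Packer template for the OpenStack builder looks something like this (the source-image UUID, flavor, image name and package list are placeholders, not our actual configuration; credentials come from the usual OS_* environment variables):

```json
{
  "builders": [{
    "type": "openstack",
    "image_name": "base-trusty-{{timestamp}}",
    "source_image": "SOURCE_IMAGE_UUID",
    "flavor": "m1.small",
    "ssh_username": "ubuntu"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo apt-get update",
      "sudo apt-get -y install docker.io"
    ]
  }]
}
```

Running `packer build template.json` boots an instance from the source image, applies the provisioners, and snapshots the result as a new image.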
Packer, Provisioners
• Provisioners change the state of the machine.
• Provisioners are bits of code written in various languages.
• Multiple provisioners are allowed, run in order.
• They can be restricted to specific builds.
Examples:
• Shell: simple shell scripts
• File uploads
• Ansible
• Chef, Puppet, Salt
• PowerShell, Windows shell
Packer, Builders
• Builders are responsible for creating machines and generating images from them for various platforms:
  • Amazon: takes an image and applies changes.
  • OpenStack: takes an image and applies changes.
  • VMware: installs from an ISO, then applies changes.
  • Docker: takes a container and applies changes.
  • VirtualBox: installs from an ISO, then applies changes.
  • Others...
Gitlab CI
• Allows processes to be run in response to a push to a repository.
• Configured by a YAML file ( .gitlab-ci.yml ).
• A build consists of multiple stages.
• Stages run sequentially.
• Tasks within a stage execute in parallel.
• State needs to be stored in separate files/directories ( $CI_BUILD_ID ).
• Tags control which runners execute each job.
• https://about.gitlab.com/gitlab-ci/
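A minimal .gitlab-ci.yml sketch of such a pipeline (the job names, runner tag, template filename and commands are illustrative, not our actual configuration):

```yaml
# Two sequential stages; jobs within a stage run in parallel.
stages:
  - build
  - test

build_image:
  stage: build
  tags:
    - packer            # only runners tagged "packer" pick up this job
  script:
    - packer build base-image.json

verify_image:
  stage: test
  tags:
    - packer
  script:
    - bundle exec kitchen test
```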
Test Kitchen
• http://kitchen.ci/
• Creates new instances to run tests on.
• Drivers for various systems, e.g.:
  • Amazon
  • OpenStack
  • Docker
  • Windows
• Configured with a single file ( .kitchen.yml ), which is an ERB template.
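A sketch of a .kitchen.yml using the kitchen-openstack driver (the image, flavor and platform names are placeholders; the ERB tags pull credentials from the environment, as described on the next slide):

```yaml
driver:
  name: openstack
  openstack_username: <%= ENV['OS_USERNAME'] %>
  openstack_api_key: <%= ENV['OS_PASSWORD'] %>
  openstack_auth_url: <%= ENV['OS_AUTH_URL'] %>
  image_ref: ubuntu-trusty      # placeholder image name
  flavor_ref: m1.small          # placeholder flavor

provisioner:
  name: shell

platforms:
  - name: ubuntu-14.04

suites:
  - name: default               # ServerSpec tests live under test/integration/default
```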
Test Kitchen
• Each group should have an OpenStack tenant for CI.
• Credentials are stored in GitLab's variables section.
• The tenant needs to have an SSH security group.
• The tenant needs a single network.
• Configuration is shared in environment variables.
• Supports multiple test frameworks:
  • ServerSpec
  • Bats
Testing orchestration
Test Kitchen can have multiple servers running at one time, and each test runs from a separate directory; this allows us to test client-server systems:
• In a server directory, start a machine and run the server tests.
• Extract the internal IP address from the master.
• In a client directory, start a machine and inject the master's location.
• Run the client tests.
• Stop the client, then stop the master.
ServerSpec
• RSpec is a behaviour-driven development framework for unit tests.
• ServerSpec allows RSpec tests to check server status, e.g.:

require 'serverspec'

# Required by serverspec
set :backend, :exec

describe "file system checks" do
  describe file('/data1') do
    it { should be_mounted }
  end
end
A CI workflow: image creation
• Our base image.
• Used to make changes that will affect all the images.
• https://github.com/wtsi-ssg/image-creation-ci
• Multiple tags; each tag is a release, e.g.:
  • v5.0.0: migration from OpenStack beta to OpenStack gamma
  • v6.0.0: adding Ansible as a system for configuration
  • v7.0.0: adding support for Xenial and CentOS 7.2 as well as Trusty
ISG repository
• https://github.com/wtsi-ssg/simple-image-builder
• Continuous-integration and test-infrastructure framework already available; additional tests will need writing.
• The chain of software reproducibility relies on:
  • Trust that the vendor built an image consistently.
  • Noting that operating-system packages are pulled in at the time of creation.
  • Critical components being pulled in from a fixed source.
  • Tests written to validate the system.
Batch scheduling is a bit old...
Openlava image
• A single image which is used for both the master/head node and compute nodes.
• Includes an NFS server for the home directory.
• Currently based on Trusty ( Ubuntu 14.04 ).
• Development branch for Xenial ( Ubuntu 16.04 ).
• Development branch for CentOS 7.2.
• ServerSpec tests using multiple servers.
New tools and images are already being created internally
WR from Sendu: https://github.com/VertebrateResequencing/wr
NPG are producing an AMQP service image, to appear on https://gitlab.internal.sanger.ac.uk when completed.
And there's more!
We are listening to your requests for features and supporting infrastructure:
https://docs.google.com/spreadsheets/d/1_oeBz27beLLj_4xe3yoyZYjpaYhDFTE__F_L6pVNcTE/edit
Or
http://tinyurl.com/z5bh5q5
But also online tutorials, videos and documentation
Some, hopefully, useful examples have been collated here:
https://ssg-confluence.internal.sanger.ac.uk/display/OPENSTACK/Distributed+applications%3A+links+and+resources
Or
http://tinyurl.com/gwbrtfl
And still more
10th March: OpenStack event here at Sanger
Tim Bell from CERN:
• Head of the CERN OpenStack team
• 200,000+ vCPUs
• Many, many PB of Ceph
Final schedule TBA
Almost done
Release date: On time for March 1st.
Watch out for the upcoming flyers
Acknowledgements
Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon.
Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz.
All our early testers and those who have provided constructive feedback!
P.S.
11 more days to migrate from Lustre 108, 109, 110 and 111 before the system is made read-only.
And only 1 month (1st March) until the old Lustre systems are securely wiped and ready for removal from campus.
Remember: Lustre is not backed up.