The CRI compute cluster CRUK Cambridge Research Institute

The CRUK Cambridge Research Institute

Founded to enable translational research:
- Basic biology
- Early phase clinical trials
- Late phase translational studies

Leveraging the specialist experience and facilities provided by:
- The University of Cambridge
- Addenbrooke’s Hospital

A CRUK facility, hosting:
- CRUK core services (including Information Systems)
- CRUK Groups and Group Leaders
- Cambridge University Groups and Group Leaders

Research objectives with significant Information Systems demands:
- Genomics: clonal sequencing (Solexa), generating ~32 TB per annum per sequencer and using 8-16 CPU cores full time
- Histopathology: scanners generating 16 TB per annum
- Microscopy: generating 8+ TB per annum; processing time-series sequences
- In vivo imaging: MRI, PET-CT
- Systems Biology: 20+ systems biology researchers working on expression data, network models etc.
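The volumes above can be turned into a rough annual storage budget. The sketch below is a minimal Python estimate, not a CRI planning tool: the number of sequencers is a hypothetical parameter (the slide gives only the per-sequencer rate), "8+ TB" is treated as exactly 8 TB, and in vivo imaging and systems biology are left out because no volumes are given for them.

```python
# Annual raw-data volumes quoted on the slide, in terabytes per year.
# num_sequencers is a hypothetical parameter: the slide gives the rate per
# sequencer but not how many sequencers are in use.
TB_PER_SEQUENCER_PER_YEAR = 32
HISTOPATHOLOGY_TB_PER_YEAR = 16
MICROSCOPY_TB_PER_YEAR = 8  # slide says "8+", so this is a lower bound


def annual_storage_tb(num_sequencers: int) -> int:
    """Lower-bound estimate of new data per year from the three listed sources."""
    return (num_sequencers * TB_PER_SEQUENCER_PER_YEAR
            + HISTOPATHOLOGY_TB_PER_YEAR
            + MICROSCOPY_TB_PER_YEAR)


def daily_rate_gb(tb_per_year: float) -> float:
    """Average ingest rate in GB/day, using decimal units (1 TB = 1000 GB)."""
    return tb_per_year * 1000 / 365


if __name__ == "__main__":
    for n in (1, 2):
        total = annual_storage_tb(n)
        print(f"{n} sequencer(s): >= {total} TB/year, ~{daily_rate_gb(total):.0f} GB/day")
```

With one sequencer this gives at least 56 TB per year, or roughly 150 GB of new data per day on average, before any imaging or systems biology data is counted.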

Multiple groups, similar requirements

[Diagram: the MRI imaging, Genomics, Bioinformatics, Tavaré Group and Institute groups each need high-performance compute and long-term storage]

2007/2008 Architectural consolidation

[Diagram: the MRI imaging, Genomics and Bioinformatics groups consolidated onto a shared HP Blade Cluster with HP Lustre SFS storage, long-term storage, and a MacOS X SAN for I/O storage]

“Virtual” group infrastructure using LSF

[Diagram: the HP Blade Cluster, HP Lustre SFS storage, long-term storage and MacOS X SAN (I/O storage) shared by the MRI imaging, Genomics, Bioinformatics, Tavaré and Institute groups, with per-group storage policies and the Platform LSF job scheduler]

2008/2009 Storage consolidation

[Diagram: the HP Blade Cluster, HP Lustre SFS storage, long-term storage and an EMC SAN (I/O storage) shared by the Institute, Tavaré, Genomics, MRI imaging and Bioinformatics groups, with per-group storage policies and the Platform LSF job scheduler]

The CRUK CRI cluster

[Diagram: cluster components: blades, head node and I/O node; SFS storage, Solexa storage, Aperio storage, Ariol storage and I/O storage; networking]

[Diagram: the cluster components (blades, head node, I/O node, storage and networking), annotated with how data and jobs flow from a desktop client]
- Desktop client: sends input files and receives output files; submits jobs to LSF
- Linux home directories and shared binaries for the blades
- /data for input and for output to the network
- /usr/local/bin for shared binaries
- /lustre for high-performance storage
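The desktop-to-cluster flow above can be sketched as two shell commands: stage the input from /data onto /lustre, then submit the job with LSF's `bsub`. This is a minimal illustration, assuming a per-user directory layout (`/data/<user>`, `/lustre/<user>`) and a hypothetical program name (`align`); the `bsub` options used (`-q`, `-n`, `-J`, and `-o` with `%J`) are standard LSF flags, but the CRI's own submission conventions are not described here. The helper only builds the command strings; it does not run them.

```python
import shlex


def stage_and_submit(input_file: str, program: str, user: str,
                     queue: str = "cluster", slots: int = 8) -> list[str]:
    """Return the shell commands to stage an input file and submit an LSF job.

    The /data/<user> and /lustre/<user> layout is a hypothetical assumption.
    """
    data_path = f"/data/{user}/{input_file}"   # delivery area, not for processing
    work_dir = f"/lustre/{user}"               # high-performance processing area
    work_path = f"{work_dir}/{input_file}"
    job_name = f"{program}_{input_file}"
    submit = [
        "bsub",
        "-q", queue,                            # LSF queue, e.g. "cluster" for other groups
        "-n", str(slots),                       # number of job slots (cores) requested
        "-J", job_name,                         # job name, as shown by bjobs
        "-o", f"{work_dir}/{job_name}.%J.out",  # LSF replaces %J with the job ID
        program, work_path,
    ]
    return [
        shlex.join(["cp", data_path, work_path]),  # stage input onto /lustre
        shlex.join(submit),
    ]


if __name__ == "__main__":
    for cmd in stage_and_submit("reads.fastq", "align", "jsmith"):
        print(cmd)
```

Copying results back to /data once the job has finished is not shown.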

Seeing the cluster from the desktop

The I/O storage and Linux home directories are visible from the CRI network.

Filesystems
- /home (100 GB): Linux home directories, visible from all the cluster nodes. Use for local code, scripts etc. Backed up.
- /data (2.7 TB): use for delivering data to and from the cluster. Lower performance to the blades, so not used for processing. Not backed up; files over 2 weeks old may be deleted without warning.
- /lustre (16 TB): high performance; use for processing. Not backed up; files over 1 month old may be deleted without warning.
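The deletion rules for the two scratch filesystems can be expressed as a simple age check. This is a sketch under stated assumptions: age is measured from a file's modification time, "1 month" is approximated as 30 days, and mount points are matched by path prefix; the CRI's actual clean-up mechanism is not described.

```python
from datetime import datetime, timedelta

# Retention windows from the filesystem list. Using mtime and 30 days for
# "1 month" are assumptions, not documented CRI behaviour.
RETENTION = {
    "/data": timedelta(weeks=2),
    "/lustre": timedelta(days=30),
}


def may_be_deleted(path: str, mtime: datetime, now: datetime) -> bool:
    """True if a file on a scratch filesystem is older than its retention window.

    /home is backed up and has no age-based deletion, so it never matches.
    """
    for mount, window in RETENTION.items():
        # Match the mount point itself or anything beneath it
        # (so "/database" is not mistaken for "/data").
        if path == mount or path.startswith(mount + "/"):
            return now - mtime > window
    return False
```

For a real file, `mtime` could come from `datetime.fromtimestamp(os.stat(path).st_mtime)`.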

Platform LSF: queue structure

Ownership of blades (owner: queue):

Core facilities
- Genomics: Genomics (6x8 cores)
- Imaging: Imaging (3x8 cores)
- Bioinformatics: bioinformatics
- Information Systems: information_systems

Groups
- Tavaré Lab: stlab (18x8 cores), high_memory (2x8 cores)
- Other Groups: cluster (4x8 cores)

…but ownership doesn’t necessarily match daily usage patterns.
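Multiplying out the blade allocations shows how the owned capacity is split. The Python below uses only the queues whose sizes are listed (bioinformatics and information_systems are omitted), with queue names copied as written on the slide. On these figures the Tavaré Lab owns 160 of the 264 listed cores (about 61%, stlab plus high_memory), while its fairshare is 20 of 38 shares (about 53%).

```python
# Blade ownership from the queue-structure slide: queue -> (blades, cores per blade).
# bioinformatics and information_systems are omitted: their sizes are not given.
QUEUE_BLADES = {
    "Genomics": (6, 8),
    "Imaging": (3, 8),
    "stlab": (18, 8),
    "high_memory": (2, 8),
    "cluster": (4, 8),
}


def cores_by_queue(blades: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Total cores per queue: number of blades times cores per blade."""
    return {queue: n * per_blade for queue, (n, per_blade) in blades.items()}


if __name__ == "__main__":
    cores = cores_by_queue(QUEUE_BLADES)
    total = sum(cores.values())
    for queue, count in cores.items():
        print(f"{queue:12s} {count:4d} cores ({100 * count / total:.0f}%)")
    print(f"{'total':12s} {total:4d} cores")
```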

Balanced Scheduling – Fairshare Policy

Group                 Share
Simon Tavaré Group    20
Genomics              5
Bioinformatics        6
Imaging               3
General               4

$$
\text{dynamic priority} = \frac{\text{number\_of\_shares}}{\text{cpu\_time} \times \text{CPU\_TIME\_FACTOR} + \text{run\_time} \times \text{RUN\_TIME\_FACTOR} + (1 + \text{job\_slots}) \times \text{RUN\_JOB\_FACTOR}}
$$
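The fairshare formula can be evaluated directly to see how heavy usage lowers a group's priority. The sketch below plugs the share table into the formula as written; the three weighting factors and the usage snapshot are made-up values for illustration, not the CRI's LSF configuration.

```python
# Shares from the CRI fairshare table.
SHARES = {
    "Simon Tavaré Group": 20,
    "Genomics": 5,
    "Bioinformatics": 6,
    "Imaging": 3,
    "General": 4,
}

# Weighting factors: assumed values for illustration only.
CPU_TIME_FACTOR = 0.7
RUN_TIME_FACTOR = 0.7
RUN_JOB_FACTOR = 3.0


def dynamic_priority(shares: float, cpu_time: float, run_time: float,
                     job_slots: int) -> float:
    """Shares divided by weighted resource usage, as in the formula above."""
    usage = (cpu_time * CPU_TIME_FACTOR
             + run_time * RUN_TIME_FACTOR
             + (1 + job_slots) * RUN_JOB_FACTOR)
    return shares / usage


if __name__ == "__main__":
    # Hypothetical snapshot: (cpu_time, run_time, job_slots in use) per group.
    usage = {
        "Simon Tavaré Group": (500.0, 600.0, 100),
        "Genomics": (0.0, 0.0, 0),
        "Bioinformatics": (50.0, 60.0, 8),
        "Imaging": (0.0, 0.0, 0),
        "General": (100.0, 120.0, 16),
    }
    ranked = sorted(SHARES, key=lambda g: dynamic_priority(SHARES[g], *usage[g]),
                    reverse=True)
    for group in ranked:
        print(f"{group:20s} {dynamic_priority(SHARES[group], *usage[group]):.4f}")
```

With these made-up numbers, Genomics and Imaging (no current usage) rank highest, while the Tavaré group, despite holding the most shares, ranks last because it is using 100 job slots. This is how fairshare balances access when blade ownership doesn’t match daily usage.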

Information Systems Processes

User accounts
- Managed via the central Service Desk
- Linux accounts bound to AD (Windows/Mac) accounts

Troubleshooting
- Linux support in London and Cambridge
- Accessed via the Service Desk

Software installation
- All blades share binaries
- Users can put local code in their home directory
- The IS department will install common code in /usr/local/bin

Summary: The CRUK Cambridge Research Institute is delivering a shared computational science infrastructure

Principle
• “Virtualisation” to make scalable, easy-to-administer systems
• Common architecture to deliver cost and service benefits

Practice
• Blade architecture suitable for most computing needs
• Networking and storage need careful design

Benefits
• Optimal use of resources
• Low wastage
• Excess capacity “buffers” new experimental and development techniques
• …to date, provision of compute power hasn’t limited science at the CRI
