Details of our first attempts to deal with our next-generation sequencing machines. Talk given at International Supercomputing, 2008.
1. Cluster Filesystems and the next 1000 Human Genomes
Guy Coates, Wellcome Trust Sanger Institute
2. Introduction

3. About the Institute
- Funded by the Wellcome Trust.
- 2nd largest research charity in the world.
- ~700 employees.
- Large-scale genomic research.
- Sequenced 1/3 of the human genome (largest single contributor).
- Active cancer, malaria, pathogen and genomic variation studies.
- All data is made publicly available.
- Websites, FTP, direct database access, programmatic APIs.
6. New technology sequencing

7. Sequencing projects at the Sanger
The Human Genome Project:
- Worldwide collaboration: 6 countries, 5 major centres, many smaller labs.
- 13 years.
1000 Genomes Project:
- Study variation in human populations.
- 1000 genomes over 3 years by 5 centres.
- We have agreed to do 200 genomes.
And the rest:
- Cancer, malaria, pathogen, worm, human variation (WTCCC2), etc.
11. How is this achievable? Moore's Law of Sequencing
- Cost of sequencing halves every 2 years.
- Driven by multiple factors.
Economies of scale:
- Human Genome Project: 13 years, 23 labs, $500 million.
- Cost today: $10 million, several months in a single large genome centre.
New sequencing technologies:
- Illumina/Solexa machines.
- $100,000 for a human genome.
- Single machine, 3 days.
16. New sequencing technologies
Capillary sequencing:
- 96 sequencing reactions carried out per run.
- 0.5-1 hour run time.
Illumina sequencing:
- 52 million reactions per run.
- 3 day run time.
Machines are cheap(ish) and small.
19. Data centre
- 4 x 250 m² data centres.
- 2-4 kW/m² cooling.
- 3.4 MW power draw.
- Overhead aircon, power and networking.
- Allows counter-current cooling; more efficient.
Technology refresh:
- 1 data centre is an empty shell.
- Rotate into the empty room every 4 years.
- Refurb one of the in-use rooms with the current state of the art.
- "Fallow field" principle.
23. Highly disruptive
The sequencing centre runs 24x7.
Peak capacity of capillary sequencing:
Current Illumina sequencing:
- 262 Gbases/month in April.
- 1 Tbase/month predicted for September.
- Total sequence deposited in GenBank, for all time.
- 75x increase in sequencing output.

25. Gigabase != Gigabyte
We store ~8 bytes of data per base:
- Quality, error, and experimental information.
- ~10 TB/month of permanent archival storage.
Raw data from the machines is much larger:
- June 2007: 15 machines, 1 TB every 6 days.
- Sept 2007: 30 machines, 1 TB every 3 days.
- Jan 2008: 30 machines, 2 TB every 3 days.
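As a rough sanity check on the figures above (a sketch; reading "2 TB every 3 days" as a per-machine rate is my assumption, since a fleet-wide reading would not reach the ~120 TB/week quoted below):

```python
# Back-of-envelope check on the data-volume figures from the talk.
# Assumption: the "2 TB every 3 days" figure is per machine (Jan 2008, 30 machines).

def raw_tb_per_week(machines, tb_per_interval, interval_days):
    """Raw data produced per week by the whole fleet."""
    return machines * tb_per_interval / interval_days * 7

def archival_tb_per_month(bases_per_month, bytes_per_base=8):
    """Permanent archival storage at ~8 bytes stored per base."""
    return bases_per_month * bytes_per_base / 1e12  # bytes -> TB

print(raw_tb_per_week(30, 2, 3))    # 140.0 -- same order as the ~120 TB/week quoted
print(archival_tb_per_month(1e12))  # 8.0 -- close to the ~10 TB/month quoted
```

Both numbers land in the right ballpark, which supports the per-machine reading of the raw-data rates.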
The compute pipeline crunches 2 TB of raw data into 30 GB of sequence data. We need to capture ~120 TB of data per week before we can analyse it to produce the final sequence.

29. IT for new technology sequencing

30. Compute infrastructure
Problem 1:
- How do we capture the data coming off the sequencing machines?
Problem 2:
- How do we analyse the data coming off the sequencing machines?
Problem 3:
- How do we do this, from scratch, in 8 weeks?
31. Problem 1: Build a big file-system
3 x 100 TB file-systems to dump data to:
- Multiple file-systems in order to protect against catastrophic hardware failures.
Hold data for 2 weeks only:
- This should give us enough space to store ~2 weeks' worth of raw data.
- Once a run has passed QC, its raw data can be deleted.
Use Lustre (HP SFS, based on Lustre 1.4):
- Sustained write rate 1.6 Gbit/s (not huge).
- Reads will have to be much faster, so we can analyse data on our compute cluster faster than we capture it.
- We used it already; low risk for us.
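The two-week target follows from the same back-of-envelope approach (a sketch; the ~120 TB/week ingest figure is taken from earlier in the talk):

```python
# How long does a given amount of scratch space buffer incoming data?

def retention_weeks(capacity_tb, ingest_tb_per_week):
    """Weeks of data the file-systems can hold before filling up."""
    return capacity_tb / ingest_tb_per_week

# Three 100 TB file-systems, ~120 TB/week coming off the sequencers:
print(retention_weeks(3 * 100, 120))  # 2.5 -- roughly the "2 weeks" target
```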
36. Problem 2: Build a compute cluster
Compute was the easy part:
- The analysis pipeline is an embarrassingly parallel workload.
- Scales well on commodity clusters (after the bugs had been fixed).
8 chassis of HP BL460c blades:
- 128 nodes / 640 cores.
- We use blade systems already; excellent manageability.
- Fit into our existing machine-management structure.
- Once physically installed, we can deploy the cluster in a few hours.
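The talk does not show the pipeline code; as a generic illustration of why an embarrassingly parallel workload scales well on commodity clusters, each sequencing run can be processed with no communication between jobs, so throughput grows with core count. A minimal sketch (the `analyse` function is a hypothetical stand-in for one analysis job):

```python
from multiprocessing import Pool

def analyse(run_id):
    # Hypothetical stand-in for one independent analysis job: no shared
    # state, no communication between jobs, so adding cores adds throughput.
    return run_id, sum(i * i for i in range(1000))

if __name__ == "__main__":
    runs = list(range(16))
    with Pool(processes=4) as pool:        # one worker per core
        results = pool.map(analyse, runs)  # jobs dispatched independently
    print(len(results))  # 16
```

On a cluster the same shape holds, with a batch scheduler (LSF in this deck) playing the role of the pool.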
42. Add lots of networking
Black Diamond 8810 chassis.
Trunked GigE links:
- 8x per blade chassis (16 machines) for the Lustre network.
- 8x links to the sequencing centre.
44. Data pull
LSF reconfiguration allows processing capacity to be interchanged between real-time and offline analysis.
[Diagram: sequencers 1-30, data-pull "suckers", staging area (320 TB Lustre, EVA), real-time 1° analysis, scratch area (25 TB Lustre, SFS20), offline 2° analysis, final repository (100 TB/yr).]

45. Problem 3: How do we do it quickly?
Plan for more than you actually need:
- Make an estimate and add 1/3.
- Still was not enough in our case.
Go with technologies you know:
- Nothing works out of the box at this scale.
- There will inevitably be problems even with kit you do know: firmware/hardware skews, delivery.
- Other technologies might have been better on paper (e.g. Lustre 1.6, a big NAS box?), but might not have worked.
Good automated systems-management infrastructure:
- Machine and software configs all held in cfengine.
- Easy to add new hardware and make it identical to the rest.
49. Problems
A Lustre file-system is striped across a number of OSS servers:
- An OSS is a box with some disk attached.
- The original plan was for 6 EVA arrays (50 TB each) and 12 OSS servers.
A limit in the SFS failover code means that 1 OSS can only serve 8 LUNs:
- We were looking at 13 LUNs per server (26 in the case of failover).
- Required us to increase the number of OSSs from 6 to 28.
- Plus increased SAN / networking infrastructure.
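The failover constraint generalises: each OSS in a failover pair must be able to absorb its partner's LUNs, so the effective per-server limit is halved. A minimal sketch of the sizing rule (the inputs below are illustrative, not the exact SFS configuration):

```python
import math

def min_oss_servers(total_luns, max_luns_per_oss, failover_pairs=True):
    """Minimum OSS count given a hard per-server LUN limit.

    With failover pairs, a surviving server must also carry its partner's
    LUNs, so each server can normally be given only half the limit.
    """
    per_server = max_luns_per_oss // 2 if failover_pairs else max_luns_per_oss
    return math.ceil(total_luns / per_server)

print(min_oss_servers(24, 8, failover_pairs=False))  # 3
print(min_oss_servers(24, 8, failover_pairs=True))   # 6 -- failover doubles the server count
```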
50. More problems
Golden rule of storage systems:
- All disks go to 96% full and stay there.
The increase in data-production rates reduced the time we could buffer data for:
- Keep data for 2 weeks rather than 3.
We need to add another 100 TB / 6 OSSs:
- Expansion is currently ongoing.

df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
XXX           97T   93T     4T   96%  /lustre/sf1
XXX          116T  111T     5T   96%  /lustre/sf2
XXX           97T   93T     4T   96%  /lustre/sf3
51. Even more problems
Out of memory...
- Changes in the analysis code + new sequencing machines meant that we were filling up our compute farm.
- Code requirement jumped from 1 GB/core to 2 GB/core.
- Under-commit machines with jobs to prevent memory exhaustion.
- Reduced overall capacity.
- Retro-fit underway to increase memory.
Out of machines...
- Changes in downstream analysis mean that we need twice as much CPU as we had.
- Installed 560 cores of IBM HS21 blades.
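Under-committing just means capping job slots by memory rather than cores; in LSF this cap would typically be applied via the per-host job-slot limit (MXJ in lsb.hosts). A sketch of the arithmetic (the node specs are hypothetical):

```python
def slots_per_node(cores, mem_gb, gb_per_job):
    """Job slots a node can safely run: limited by cores or by memory,
    whichever runs out first. Use this value as the scheduler's per-host
    slot limit to avoid memory exhaustion."""
    return min(cores, mem_gb // gb_per_job)

# Hypothetical 8-core node with 8 GB of RAM:
print(slots_per_node(8, 8, 1))  # 8 -- fully committed with 1 GB/core jobs
print(slots_per_node(8, 8, 2))  # 4 -- under-committed once jobs need 2 GB
```

This is why the requirement jump from 1 GB to 2 GB per core halves usable capacity until memory is retro-fitted.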
55. Can we do it better?
Rapidly changing environment:
- The sequencing machines are already here.
- Have to do something, and quickly.
Agile software development:
- The sequencing software team use agile development to cope with change in sequencing science and process.
- Very fast, incremental development (weekly releases).
- Get usable code into production very quickly, even if it is not feature complete.
Can we do agile systems?
- Software is ephemeral; hardware is not.
- You cannot magic 320 TB of disk and 1000 cores out of thin air...
- Or can you?
59. Possible future directions
Virtualisation:
- Should help us if we have excess capacity in our data-centre.
- We are not talking single machines with a bit of local disk: cluster file-systems, non-trivial networking.
- Requires over-provisioning of network and storage infrastructure.
- Is this the price of agility?
Grid / cloud / elastic computing:
- Can we use someone else's capacity instead?
- Can we find a sucker / valued partner to take a wedge of our data?
- Can we get data in and out of the grid quickly enough?
- Do the machines inside the cloud have fast data paths between them and the storage?
- Supercomputer, not web services.
We are starting work to look at all of these.

66. There is no end in sight!
We already have exponential growth in storage and compute:
- Storage doubles every 12 months.
- We crossed the 2 PB barrier last week.
Sequencing technologies are constantly evolving.
Known unknowns:
- Higher data output from our current machines.
- More machines.
Unknown unknowns:
- New big-science projects are just a good idea away...
- Gen 3 sequencing technologies.
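Doubling every 12 months compounds quickly; a quick projection from the 2 PB figure above (a sketch, not a capacity plan):

```python
def projected_pb(start_pb, years, doubling_months=12):
    """Storage after `years`, doubling every `doubling_months`."""
    return start_pb * 2 ** (years * 12 / doubling_months)

print(projected_pb(2, 3))  # 16.0 -- 2 PB today becomes 16 PB in three years
```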
70. Acknowledgements
Sanger:
- Systems, network, SAN and storage teams.
- Sequencing pipeline development team: code performance, testing.
Cambridge Online.
HP Galway:
- Eamonn O'Toole, Gavin Brebner.