DiRAC-2: a petascale facility at Durham
Carlos S. Frenk
Institute for Computational Cosmology, University of Durham
UKMHD, Virgo, Exeter +?
DiRAC-2: Petascale facility
1. Science case
• UKMHD – Sam Falle
• Exeter – Matthew Bate
• Virgo – CSF
2. Requirements
3. Hardware options
4. Concluding remarks
Understanding the origin of stellar properties
• Initial mass function
• Observed to be relatively independent of initial conditions, at least in our Galaxy
• Star formation rate and efficiency
• Observed to be 3-6% of gas mass per free-fall time (Evans et al. 2009); see the definition after this list
• Multiplicity
• Observed to be an increasing function of primary mass
• Separations, mass ratios, eccentricities
• High order systems (triples, quadruples)
• Protoplanetary discs
• Masses, sizes, density distributions
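For reference, a standard definition (not taken from the slides) of the "per free-fall time" efficiency quoted above:

```latex
% Star formation efficiency per free-fall time: the fraction of the gas mass
% converted into stars per free-fall time; the observed 3-6% corresponds to
% \epsilon_{ff} ~ 0.03-0.06. t_ff is the free-fall time of gas at density rho.
\epsilon_{\rm ff} \;\equiv\; \frac{\dot{M}_{\star}\, t_{\rm ff}}{M_{\rm gas}},
\qquad
t_{\rm ff} \;=\; \sqrt{\frac{3\pi}{32\, G\, \rho}}
```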
Simulations of stellar cluster formation
• Published more than two dozen papers over the past decade
• e.g. Bonnell et al. 2001a,b; Bate et al. 2002a,b; Bate et al. 2003; Bonnell et al. 2003; Bate & Bonnell 2005; Bonnell et al. 2006; Price & Bate 2008, 2009; Bate 2009a,b,c, Bate 2012
• Include papers that rank in the top 1% most cited refereed astronomical papers in years 2002, 2003, 2009 (ADS)
• Bate, Bonnell & Bromm (2003)
• In the top 10 MNRAS papers in 2003
• 3rd most cited in Galactic star formation
• Bate (2009a)
• In the top 10 MNRAS papers in 2009
• 4th most cited in Galactic star formation
• At least one paper in the top 4% of most cited papers every year from 2000-2010
Most recent large calculation
• Bate (2012) includes self-gravity, hydrodynamics, radiation transport
• Resolves
• The opacity limit for fragmentation (i.e. even low-mass brown dwarfs)
• Protostellar discs (down to ~1 AU in size)
• All binaries and multiple systems (using sink particles)
• Produces >180 stars and brown dwarfs
• First star cluster formation simulation to give realistic stellar properties
• Mass function indistinguishable from the observed IMF
• Binary frequency
• Trends for binary and multiple star separations, mass ratios, etc
• Performed on Exeter’s DiRAC facility
[Figures: initial mass function; cumulative initial mass function; fraction of binary & multiple stellar systems vs primary mass]
Aim: to model more massive clusters
• Recent calculations form dozens to ~200 stars
• Need to model larger clusters to
• Produce massive stars (>10 solar masses)
• Study the effects of cluster size and mass on stellar properties
• Produce better statistics to better compare with observational surveys
• Requirements
• To model an Orion Nebula sized cluster
• ~20 million core-hours (e.g. 4096 cores for ~7 months)
• ~6 TB memory
• ~120 TB of disk space for data and post-run analysis
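A minimal sketch (Python) of the arithmetic behind these figures; the ~730 wall-clock hours per month is an assumption used only to reproduce the quoted total, everything else comes from the slide:

```python
# Back-of-envelope check of the Orion-sized cluster requirements quoted above.
cores = 4096
months = 7
hours_per_month = 730            # assumption: ~24 h/day * ~30.4 days

core_hours = cores * months * hours_per_month
print(f"core-hours: {core_hours / 1e6:.1f} M")   # ~20.9M, i.e. the ~20 million quoted

memory_tb = 6                                    # ~6 TB total memory (from the slide)
print(f"memory per core: {memory_tb * 1024 / cores:.1f} GB")   # ~1.5 GB per core
```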
Core members:
Carlos Frenk – ICC, Durham (P.I.)
Adrian Jenkins – ICC, Durham
Tom Theuns – ICC, Durham
Liang Gao – ICC, Durham & Beijing Obs
Simon White – Max Planck Inst für Astrophysik (co-P.I.)
Volker Springel – Max Planck Inst für Astrophysik
Frazer Pearce – Nottingham
Naoki Yoshida – Nagoya
Peter Thomas – Sussex
Hugh Couchman – McMaster
John Peacock – Edinburgh
George Efstathiou – Cambridge
Scott Kay – Manchester
Rob Thacker – McGill
Julio Navarro – Victoria
Joop Schaye – Leiden
Simulation data, movies, etc available at: www.durham.ac.uk/virgo http://www.mpa-garching.mpg.de/Virgo
Virgo consortium for supercomputer simulations
• ~70 scientists in 6 countries (~35 in the UK)
• ~25 Associate members
• ~30 PhD students
Virgo consortium for supercomputer simulations
Science goal:
→ Model the formation and evolution of cosmic structures from the Big Bang to the present
→ Link early universe theory to observations
Need top-end supercomputers
World standing and impact
• Of the ~20,000 papers published in astronomy every year, Virgo has at least one paper in the top 10 most cited in each of: 1996, 1997, 1999, 2001, 2005, 2006 (2 papers), 2008, 2010
• The most cited paper in Nature in the decade of the 2000s is Virgo’s 2005 Millennium Simulation paper
• Virgo has 2 papers with >2500 citations, 4 with >1000, and 8 with >500
• Some of these are amongst the most cited in Physics
www.durham.ac.uk/virgo
www.mpa-garching.mpg.de/Virgo
[Figure: the Millennium Simulation – Springel et al. 2005, Nature, published 2 June 2005 (1236 citations)]
As of yesterday, 445 refereed papers had been published by astronomers all over the world using the Millennium simulation data.
The cosmic power spectrum: from the CMB to the 2dFGRS
[Figure: power spectrum P(k) (h⁻¹ Mpc)³ vs wavenumber k (comoving h⁻¹ Mpc)⁻¹, combining WMAP CMB data (z~1000) with the 2dFGRS galaxy survey (z=0); Sanchez et al. 2006]
⇒ ΛCDM provides an excellent description of the mass power spectrum from 10 to 1000 Mpc
The content of our universe
The Aquarius simulations of galactic halos
Structure of subhalos seems inconsistent with kinematical data for Milky Way satellites
The identity of the dark matter
[Panels: cold dark matter vs warm dark matter]
• Lovell, Eke, Frenk, Gao, Jenkins, White, Theuns et al. 2011
The content of our universe
The Millennium XXL
What is the dark energy?
First parallel modified gravity simulation code
[Plot: ΛCDM model; ΛCDM convolved with window]
Euclid: ESA-approved mission, 2018
Simulations with different forms of DE essential to interpret data
Galaxies-Intergalactic Medium Interaction Calculation!
[Zoom sequence: Millennium Simulation, L = 500 Mpc/h → GIMIC region (1 of 5), r ~ 20 Mpc/h → GIMIC galaxy, ε = 500 pc/h]
"Resimulation" of 5 regions from the Millennium simulation, including baryons
• New projects:
- Eagle: Millennium-2 (100 Mpc)³ volume with full baryonic physics
- Aquila: single Milky Way galaxy with Arepo
The Virgo programme: requirements 2012-2015
1. The identity of the dark matter: 44M core-hours
→ WDM large volume simulations (14M)
→ 12 WDM galactic simulations (24M)
→ Other DM candidates (6M)
2. The cosmic large-scale structure: 44M core-hours
→ 6 Modified Gravity Millenniums + 1 MXXL (20M)
→ 12 MXXL WDM large volume simulations (24M)
3. Galaxy formation: 74M core-hours
→ Eagle: (100 Mpc)³ volume, full baryon physics (20M)
→ Single MW galaxy with Arepo (54M)
The Virgo programme: outreach
• TV (Newsnight, Horizon, Cosmos (BBC2), Discovery Channel, National Geographic, etc)
• National and international press
• Royal Society Summer Science Exhibition
Computational requirements: Virgo + UKMHD + Exeter, 2012-2015
• Virgo: Identity of dark matter: 44M core-hours
  Gas physics and galaxy formation: 74M core-hours
  The large-scale structure of the Universe: 44M core-hours
  Memory requirement: 36 TB minimum (4 GB per core).
• UKMHD: Small number of high-quality calculations to establish correct scaling laws: 100M core-hours.
• Exeter: 'Orion' nebula cluster simulation: 20M core-hours.
Almost 300M core-hours in total, requiring ~10,000-16,000 cores.
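A minimal sketch (Python) of how the quoted core count follows from the core-hour total; the 3-year window, 8766 hours per year and the 70-100% utilisation range are assumptions, the core-hour figures are from the slide:

```python
# Rough translation of the ~300M core-hour total into a core count.
total_core_hours = 44e6 + 74e6 + 44e6 + 100e6 + 20e6   # Virgo + UKMHD + Exeter ~ 282M
wall_clock_hours = 3 * 8766                             # assumed 3-year window

for utilisation in (1.0, 0.7):
    cores = total_core_hours / (wall_clock_hours * utilisation)
    print(f"utilisation {utilisation:.0%}: ~{cores:,.0f} cores")
# ~10,700 cores at 100% and ~15,300 cores at 70%, i.e. the ~10,000-16,000 quoted above
```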
Hardware: cluster only
• £3.5M cluster: 600 nodes/motherboards, 200 Tflop/s peak, 9600 cores, 38 TB RAM, 2 PB storage, HSM, tape archive.
• £5M cluster: 900 nodes/motherboards, 300 Tflop/s peak, 14400 cores, 38 TB RAM, 3 PB storage, HSM, tape archive.
Hardware: £7.5M
• Option 1: cluster: 1000 nodes/motherboards, 330 Tflop/s peak, 16000 cores, 64 TB RAM, 3 PB storage, HSM, tape archive
  + shared-memory machine: 2048 cores, 16 TB RAM
• Option 2: cluster: 600 nodes/motherboards, 200 Tflop/s peak, 9600 cores, 38 TB RAM, 3 PB storage, HSM, tape archive
  + shared-memory machine: 2048 cores, 16 TB RAM
£1.75M upgrade in 2013: Intel MIC?
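A small sketch (Python) of the per-core ratios implied by these configurations; all inputs are the core, RAM and peak figures quoted above:

```python
# Per-core RAM and peak performance implied by the options above.
options = {
    "£3.5M cluster":          dict(cores=9600,  ram_tb=38, peak_tflops=200),
    "£5M cluster":            dict(cores=14400, ram_tb=38, peak_tflops=300),
    "£7.5M option 1 cluster": dict(cores=16000, ram_tb=64, peak_tflops=330),
    "£7.5M option 2 cluster": dict(cores=9600,  ram_tb=38, peak_tflops=200),
}
for name, o in options.items():
    gb_per_core = o["ram_tb"] * 1024 / o["cores"]
    gflops_per_core = o["peak_tflops"] * 1000 / o["cores"]
    print(f"{name}: {gb_per_core:.1f} GB/core, {gflops_per_core:.1f} Gflop/s per core")
# The £3.5M and option-1 clusters give ~4 GB per core, matching Virgo's stated
# minimum of 4 GB per core; the £5M cluster drops to ~2.7 GB per core.
```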
Availability of space in the Arthur Holmes (AH) machine room
[Floor plan: ITS racks, air-conditioning units (AC02, AC03, AC05, AC07), COSMA-IV, Hamilton with in-row cooling, UPS, services and University backup services, free rack spaces (31 standard computer racks), two doors]
• The second door is 1.40 m wide and 1.50 m high.
• The room has overhead cable trays, and all other services are in the 50 cm high under-floor service area. There are a total of 50 connection points into the water-cooling system, of which only 12 have been used. Cooling capacity per point: up to 30 kW.
Infrastructure costs
0. UPS upgrade to room will be provided by Durham University: £200K
1. Enhance UPS (from 120 to 320 kW) + electrical work: £130K
2. Remove the UPS from the machine room, freeing more space: £50K
3. Increase the power to the facility by 360 kW: £20K
4. Increase the power to the facility by 550 kW: £100K
5. Add a fourth chiller: £120K
6. Add a fifth chiller: £120K

£3.5M cluster: choices 1, 3, 5. Total: £270K
£5M cluster: choices 1, 4, 5, 6. Total: £470K
£7.5M cluster + shmem (options 1 & 2): choices 1, 2, 4, 5, 6. Total: £520K

Other options are possible but need more investigation, e.g. renting a containerised UPS/generator.
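A quick arithmetic check (Python) that the per-option totals follow from the itemised costs above; all figures are from the list:

```python
# Itemised data-centre enhancement costs (in £K) from the list above.
items = {1: 130, 2: 50, 3: 20, 4: 100, 5: 120, 6: 120}

options = {
    "£3.5M cluster":         [1, 3, 5],        # expected £270K
    "£5M cluster":           [1, 4, 5, 6],     # expected £470K
    "£7.5M cluster + shmem": [1, 2, 4, 5, 6],  # expected £520K
}
for name, choices in options.items():
    print(f"{name}: £{sum(items[i] for i in choices)}K")
```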
Staffing: £3.5-5M cluster
• Senior system manager: 0.5 FTE (Durham)
• Junior system manager: 0.5 FTE (Durham)
• User support manager: 0.5 FTE (HEIs excluding Durham)
• Scientific Director: 0.3 FTE (Durham)
• Programmer: 1 FTE (Durham)
Total: 2.8 FTE, £400K over 3 yrs
Staffing: £7.5M cluster
• Senior system manager: 0.5 FTE (Durham)
• Junior system manager: 1 FTE (Durham)
• User support manager: 1 FTE (HEIs excluding Durham)
• Scientific Director: 0.5 FTE (Durham)
• Programmer: 2 FTE (1 Durham, 1 other)
Total: 5 FTE, £680K over 3 yrs
£3.5M cluster
Item | Cost | Section(s)
Hardware | £3.5M | 5.1
Depreciation of data centre over 3 years | £75K | 5.2.3
Capital: computer and depreciation costs, 3 yrs | £3.6M |
Data centre enhancement | £270K | 5.2.2
Electricity over 3 years | £573K | 5.1, 5.2.4
System manager, 3 yrs (0.5 FTE, grade 9) | £100K | 5.3
System administrator, 3 yrs (0.5 FTE, grade 7) | £60K | 5.3
User manager, 3 yrs (0.5 FTE, grade 7) | £60K | 5.3
Science director, 3 yrs (0.3 FTE, grade 9) | £60K | 5.3
Programmer, 3 yrs (1 FTE, grade 7) | £120K | 5.3
Recurrent costs over 3 yrs | £973K | 5.3
We propose a lean staffing plan consisting of 2.8 FTEs. This includes a System manager, Scientific director, System administrator, User administrator and Programmer at (0.5, 0.3, 0.5, 0.5, 1) FTEs respectively. Of these, all but the user administrator would be located at Durham.
The procurement timetable is outlined in Section 7. DU is providing £300K for ICC/Virgo to be spent by August 2012. It may be advantageous to combine the two procurements.
In Section 9 we consider by how much the cluster can be scaled and still fit in the machine room. A £5M system rated at 300 Tflop/s would be close to the limits.
We note that Durham University has contributed over £1M to supercomputing infrastructure at Durham in recent years, and there is every reason to expect that the University will actively continue to support this area.
9 Scaling the size of the cluster to £5M
• The system proposed in Section 5.1 can easily be accommodated in the AH machine room once the electrical work and additional cooling capacity are installed (Section 5.2.2).
• The system can be scaled by up to a factor of 1.5 and still fit within the space, power and cooling limits, although an additional chiller would probably be required (Section 5.2.2), which we estimate would add another £120K to machine room improvements. This option would be best realised if Durham University implements the plan to install a gas generator/kinetic UPS system to ensure there is sufficient power for the machine room and the chillers.
• Such a machine, costing about £5M, would have a peak performance of 300 Tflop/s, 57 TB of RAM and 3 PB of storage, and at a utilisation of 70% would provide 265M core-hours.
• Scaling the system by 50% would allow a 300 Tflop/s system to be installed, still within the power limits of the machine room and the capability of the chillers to cool it. The peak power for the machine room would then be 518 kW, which is 94% of the available power in the machine room.
• The table below summarises the estimated costings:
£5M cluster
Item | Cost | Section(s)
Hardware | £4.925M | 5.1
Depreciation of data centre over 3 years | £75K | 5.2.3
Capital: computer and depreciation costs, 3 yrs | £5M |
Data centre enhancement | £470K | 5.2.2
Electricity over 3 years | £860K | 5.1, 5.2.4
System manager, 3 yrs (0.5 FTE, grade 9) | £100K | 5.3
System administrator, 3 yrs (0.5 FTE, grade 7) | £60K | 5.3
User manager, 3 yrs (0.5 FTE, grade 7) | £60K | 5.3
Science director, 3 yrs (0.3 FTE, grade 9) | £60K | 5.3
Programmer, 3 yrs (1 FTE, grade 7) | £120K | 5.3
Recurrent costs over 3 yrs | £1260K |
10 Financial Summary: £7.5M
A breakdown of the costings for option 1 is given in the table below. The costings are discussed in detail in the sections listed in the table.
Option 1
Item | Cost | Section(s)
Hardware | £6.98M | 6.1
Depreciation of data centre over 3 years | £75K | 5.2.3
Capital: computer and depreciation costs, 3 yrs | £6.5M |
Data centre enhancement | £520K | 5.2.2
Electricity over 3 years | £1.2M | 6.1, 5.2.4
System manager, 3 yrs (0.5 FTE, grade 9) | £100K | 6.2
System administrator, 3 yrs (1 FTE, grade 7) | £120K | 6.2
User manager, 3 yrs (1 FTE, grade 7) | £120K | 6.2
Science director, 3 yrs (0.5 FTE, grade 9) | £100K | 6.2
Programmer, 3 yrs (2 FTE, grade 7) | £240K | 6.2
Recurrent costs over 3 yrs | £1.88M |
A breakdown of the costings for option 2 is given in the table below. The costings are discussed in detail in the sections listed in the table.
Option 2
Item | Cost | Section(s)
Hardware | £7.055M | 6.1
Depreciation of data centre over 3 years | £75K | 5.2.3
Capital: computer and depreciation costs, 3 yrs | £7.13M |
Data centre enhancement | £520K | 5.2.2
Electricity over 3 years | £825K | 6.1, 5.2.4
System manager, 3 yrs (0.5 FTE, grade 9) | £100K | 6.2
System administrator, 3 yrs (1 FTE, grade 7) | £120K | 6.2
User manager, 3 yrs (1 FTE, grade 7) | £120K | 6.2
Science director, 3 yrs (0.5 FTE, grade 9) | £100K | 6.2
Programmer, 3 yrs (2 FTE, grade 7) | £240K | 6.2
Recurrent costs over 3 yrs | £1.5M |
We propose a staffing plan consisting of 5 FTEs. This includes a System manager, Scientific director, System administrator, User administrator and Programmers at (0.5, 0.5, 1, 1, 2) FTEs respectively. Of these, all but the user administrator and one programmer would be located at Durham.
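A brief cross-check (Python) of how the "recurrent costs over 3 yrs" rows in the tables above decompose into electricity plus staff costs; all figures are taken from the tables (in £K):

```python
# Recurrent costs over 3 years = electricity + staff (data-centre enhancement is capital).
def recurrent(electricity, staff):
    return electricity + sum(staff)

print(recurrent(573,  [100, 60, 60, 60, 120]))     # £3.5M cluster   -> 973  (£973K)
print(recurrent(860,  [100, 60, 60, 60, 120]))     # £5M cluster     -> 1260 (£1260K)
print(recurrent(1200, [100, 120, 120, 100, 240]))  # £7.5M, option 1 -> 1880 (~£1.88M)
print(recurrent(825,  [100, 120, 120, 100, 240]))  # £7.5M, option 2 -> 1505 (~£1.5M)
```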
Concluding remarks: a National Supercomputing Centre for Astrophysics Research
Why a national facility?
A single large machine:
• Allows a wider range of internationally competitive science (n.b. the proposed Leicester and Durham clusters are very similar)
• Science topics (gal form) and codes (Gadget) in common
• Allows access (10%) to industry/academics w. allied techniques/aims
• Is more readily sustainable in the long term
• Is what BIS are looking for!
A unique opportunity to do something big and long-lasting!
Concluding remarks: a National Supercomputing Centre for Astrophysics Research
Why at Durham?
• 10 years' experience of running a large machine for world-class science on a very slim financial model
• Basic infrastructure already in place
• Most economical alternative
- Infrastructure: 54% of Leicester's for 2× the machine
- Running costs: 90% of Leicester's for 2× the machine
- 1.75× more than Cambridge for 4× the number of cores
- Staff: same as Leicester for 2× the machine
- 90% of Cambridge for 4× the number of cores