State of HCC 2012
Dr. David R. Swanson
Director, Holland Computing Center
Nature Communications, July 17, 2012
Nebraska Supercomputing Symposium 2012
HCC CPU Hour Usage 2012
[Pie chart of 2012 CPU-hour usage by research group:]
• Zeng (Quant Chem): 4.5M
• Starace (AMO Phys): 2.7M
• Rowe (Climate): 2.0M
• NanoScience: 6.4M
• Comp Bio: 3.0M
• Comp Sci: 1.7M
• Physics: 0.7M
• Mech E: 0.4M
High Performance Computing
• http://t2.unl.edu/status/hcc-status
• Xiao Zeng, Chemistry, UNL (prior slide)
• DFT and Car-Parrinello MD
• HPC: tightly coupled codes (see the sketch below)
• Requires expensive low-latency local network (InfiniBand)
• Requires high-performance storage (Panasas, Lustre)
• Requires highly reliable hardware
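A minimal sketch of what "tightly coupled" means in practice, using the mpi4py bindings (an assumed tool here, not one named on the slide; the script name and rank count are illustrative). Every rank blocks in the collective call, so interconnect latency gates the whole job:

```python
# Sketch only: assumes mpi4py is installed on the cluster.
# Launch with, e.g.: mpirun -np 32 python allreduce_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank computes a partial result ...
local = float(rank) ** 2

# ... then all ranks synchronize in a collective sum. A real solver does
# this every iteration, which is why low-latency InfiniBand matters for
# codes like DFT or Car-Parrinello MD.
total = comm.allreduce(local, op=MPI.SUM)

if rank == 0:
    print("sum of squares over", comm.Get_size(), "ranks:", total)
```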
Eureka! A Higgs! (or at least something currently indistinguishable)
• "I think we have it. We have discovered a particle that is consistent with a Higgs boson." – CERN Director-General Rolf Heuer
US CMS Tier2 Computing
Compact Muon Solenoid (CMS)
[Aerial image: the Large Hadron Collider ring, 5.5 mi across]
CMS Grid Computing Model
Eureka! A Higgs! (or at least something currently indistinguishable)
• Ca. 50 PB of CMS data in its entirety
• Over 1 PB currently at HCC's "Tier2"; 3,500 cores
• Collaboration at many scales
  – HCC and Physics Department
  – Over 2,700 scientists worldwide
  – International Grid Computing Infrastructure
  – Data grid as well
  – UNL closely linked to KU, KSU physicists via a jointly hosted "Tier3"
Data Intensive HTC
• Huge database
• Requires expensive high-bandwidth wide-area network (DWDM fiber)
• Requires high-capacity storage (HDFS, dCache)
• HTC: loosely coupled codes
• Requires hardware
Outline
• HCC Overview
• New User report
• HCC-Go
• Moving Forward (after break)
  – Next purchase
  – It's the Data, stupid…
  – Other Issues
Outline
• New User report
• HCC-Go
• Moving Forward (next section)
  – Next purchase (motivation)
  – New Communities
  – PIVOT
  – It's the Data, stupid…
HOLLAND COMPUTING CENTER OVERVIEW
HCC @ NU
• Holland Computing Center has a University-wide mission to
  – Facilitate and perform computational and data-intensive research
  – Engage and train NU researchers, students, and other state communities
• This includes you!
• HCC would be delighted to collaborate
Computational Science – 3rd Pillar
[Diagram: the three pillars of science: Experiment, Theory, Computation/Data]
Lincoln Resources
• 10 staff
• Red
• Sandhills
• 5,000 compute cores
• 3 PetaBytes storage in HDFS
Sandhills “Condominium Cluster”
• 44 nodes × 32-core, 128 GB, IB
• Lustre (175 TB)
• Priority Access
  – $HW + $50/month
  – 4 groups currently
• SLURM (submission sketch below)
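A minimal SLURM submission sketch for a Sandhills-sized node; the partition name is a placeholder, and sbatch reads the #SBATCH comment lines regardless of the script's interpreter:

```python
#!/usr/bin/env python
#SBATCH --ntasks=32         # one 32-core node's worth of tasks
#SBATCH --mem=120G          # stay under the 128 GB per node
#SBATCH --time=02:00:00
#SBATCH --partition=batch   # placeholder partition name; check sinfo

# Submit with: sbatch job.py
import os
import socket

print("running on", socket.gethostname(),
      "with", os.environ.get("SLURM_NTASKS"), "tasks")
```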
Omaha Resources
• 3 staff
• Firefly
• Tusker
• 10,000 compute cores
• 500 TB storage
• New offices soon: 158J PKI
Tusker
• 106 × 64 = 6,784 cores
• 256 GB/node
• 2 nodes w/ 512 GB
• 360 TB Lustre
  – 100 TB more en route
• QDR IB
• 43 TFlops
Tusker
• ¼ the footprint of Firefly
• ¼ the power
• 2× the TFLOPS
• 2× the storage
• Fully utilized
• Maui/Torque
In between …
• HCC (UNL) to Internet2: 10 Gbps
• HCC (Schorr) to HCC (PKI): 20 Gbps
• Allows us to do some interesting things
  – "Overflow" jobs to/from Red
  – DYNES project
  – Xrootd mechanism
HCC Staff
• HPC Applications Specialists
  – Dr. Adam Caprez
  – Dr. Ashu Guru
  – Dr. Jun Wang
  – Dr. Nicholas Palermo
• System Administrators
  – Dr. Carl Lundstedt
  – Garhan Attebury
  – Tom Harvill
  – John Thiltges
  – Josh Samuelson
  – Dr. Brad Hurst
HCC Staff
• Other Staff
  – Dr. Brian Bockelman
  – Joyce Young
• GRAs
  – Derek Weitzel
  – Chen He
  – Kartik Vedalaveni
  – Zhe Zhang
• Undergraduates
  – Carson Crawford
  – Kirk Miller
  – Avi Knecht
  – Phil Brown
  – Slav Ketsman
  – Nicholas Nachtigal
  – Charles Cihacek
HCC Campus Grid
• Holland Computing Center resources are combined into an HTC campus grid
  – 10,000 cores, 500 TB in Omaha
  – 5,000 cores, 3 PB in Lincoln
  – All tied together via a single submission protocol using the OSG software stack (sketch below)
  – Straightforward to expand to OSG sites across the country, as well as to EC2 (cloud)
  – HPC jobs get priority; HTC ensures high utilization
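A hedged sketch of what that single submission protocol can look like, using the HTCondor Python bindings (assumed available alongside the OSG software stack; the payload analyze.sh and the job count are hypothetical):

```python
# Sketch only: assumes the htcondor Python bindings and a reachable schedd.
import htcondor

sub = htcondor.Submit({
    "executable": "analyze.sh",   # hypothetical user payload
    "arguments": "$(Process)",
    "output": "out.$(Process)",
    "error": "err.$(Process)",
    "log": "jobs.log",
    "request_memory": "2GB",
})

schedd = htcondor.Schedd()
with schedd.transaction() as txn:  # classic bindings API
    sub.queue(txn, count=10)       # ten loosely coupled jobs
```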
HCC Model for a Campus Grid
[Diagram: concentric circles of resources: Local → Campus → Grid ("me, my friends, and everyone else")]
HCC & Open Science Grid
• National, distributed computing partnership for data-intensive research
  – Opportunistic computing
  – Over 100,000 cores
  – Supports the LHC experiments and other science
  – Funded for 5 more years
  – Over 100 sites in the Americas
  – Ongoing support for 2.5 (+3) FTE at HCC
It Works!
HCC Network Monitoring
OSG Resources
Working philosophy
• Use what we buy
  – These pieces of infrastructure are linked, but improve asynchronously
  – Depreciation is immediate
  – Leasing is still more expensive (for now)
  – Buying at fixed intervals mitigates risk, increases ROI
  – Space, power, and cooling have a longer life span
• Share what we aren't using
  – Share opportunistically; retain local ownership
  – Consume opportunistically; there is more to gain!
  – Collaborators, not just consumers
  – Greater good vs. squandered opportunity
Working philosophy
• A data deluge is upon us
• Support is essential
  – If you only build it, they still may not come
  – Build incrementally and buy time for user training
  – Support can grow more gradually than hardware
• Links to national and regional infrastructure are critical
  – Open Source Community
  – GPN access to Internet2
  – Access to OSG, XSEDE resources
  – Collaborations with fellow OSG experts
  – LHC
HCC New Users
FY    UNL-City  UNL-East  UNO       UNMC     Outside NU system
2011  424 (74)  33 (10)   75 (19)   30 (17)  112 (26)
2012  519 (95)  50 (17)   105 (30)  35 (5)   130 (18)
New User Communities
• Theatre, Fine Arts/Digital Media, Architecture
• Psychology, Finance
• UNMC
• Puerto Rico
• PIVOT collaborators
HCC NEW USER REPORT: HEATH ROEHR
HCC-GO: DR. ASHU GURU
MOVING FORWARD
NEW PURCHASE
$2M for …
• More computing
  – Need ca. 100 TF to hit the Top500 for June 2013
  – Likely use all of the funds to hit that amount
• More storage
  – Near-line archive (9 PB)
  – HDFS
• Specialty hardware
  – GPGPU/Viz
  – MIC hardware
More computing
• How much RAM/core?
• Currently almost always oversubscribed
• Large-scale jobs almost impossible (> 2,000 cores)
• Safest investment; will be used right away
• Firefly due to be retired soon (EOL)
More storage
• Most rapidly growing demand
• Growing contention; can't just queue up
• Largest unmet need (?)
Storage for $2M
• $2M HDFS cluster (per-node numbers sketched below)
  – 250 nodes
  – 4,000 cores (Intel)
  – 9.0 PB (raw)
  – 128 GB/node
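A quick back-of-envelope check of the implied per-node figures, assuming the budget is spread evenly across the nodes:

```python
# Implied per-node numbers for the proposed $2M HDFS cluster.
budget_usd = 2000000
nodes = 250
raw_pb = 9.0

print("cost per node: $%.0f" % (budget_usd / float(nodes)))  # $8000
print("raw TB per node: %.0f" % (raw_pb * 1000 / nodes))     # 36 TB
```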
Other options
• GPGPUs: the most green option for computing
• Highest upside for raw power (Top500)
• MIC is even compatible with x86 codes
• SMP uniquely meets some needs; easiest to use/program
• BlueGene, tape silo, …
HCC personnel timeline
[Chart: HCC personnel headcount, roughly 7X growth]

Year       1999  2002  2005  2009  2012
Personnel  2     3     5     9     13
HCC networking timeline
[Chart: HCC WAN bandwidth, roughly 200X growth]

Year            1999   2002   2005   2009  2012
WAN B/W (Gb/s)  0.155  0.155  0.622  10    30
HCC cpu timeline
[Chart: HCC CPU cores, roughly 900X growth]

Year       1999  2002  2005  2009   2012
CPU cores  16    256   656   6,956  14,492
HCC storage timeline
[Chart: HCC raw storage capacity, roughly 30,000X growth]

Year           1999   2002  2005  2009   2012
Capacity (TB)  0.108  1.2   31.2  1,200  3,250
Composite Timeline
• Data increase / CPU cores = 33
• Data increase / WAN bandwidth = 150
• It takes a month to move 3 PB at 10 Gb/sec (checked below)
• Power < 100X increase; largely constant over the last 3 years
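The month-to-move-3-PB claim holds up as back-of-envelope arithmetic, assuming an ideal, fully saturated link:

```python
# Idealized transfer time: 3 PB over a 10 Gb/s WAN link.
bits = 3 * 10**15 * 8       # 3 PB (decimal) in bits
rate_bps = 10e9             # 10 Gb/s
days = bits / rate_bps / 86400
print("%.1f days" % days)   # ~27.8 days, i.e., about a month
```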
Storage at HCC
• Affordable, Reliable, High Performance, High Capacity
  – Pick 2
  – So, multiple options
• /home
• /work
• /share
• Currently, no /archive
/home
• Reliable
• Low performance
  – No writes (W) from workers
• ZFS
• Rsync'ed pair: one in Omaha, one in Lincoln
• Backed up incrementally; requires severe quotas
/work
• High performance
• High(er) capacity
• Not permanent storage
• Lenient quotas
• More robust, more reliable "scratch space"
• Subject to purge as needed
/share
• Purchased by a given group
• Exported to both Lincoln and Omaha machines
• Usually for capacity; striped for some reliability
Storage Strategy
• Maintain /home for precious files
  – Could be global
• Maintain /work for runtime needs
  – Remain local to cluster
• Create /share for near-line archive
  – 3-5 year time frame (or less)
  – Use for accumulating intermediate data, then purge
  – Global access
Storage strategy
• Permanent archival has 3 options
  – 1) Library
  – 2) Amazon Glacier: currently $120/TB/year (cost sketch below)
  – 3) Tape system
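At the quoted rate, a hedged cost sketch (it ignores Glacier's retrieval and transfer fees) for parking HCC's full raw capacity:

```python
# Glacier cost sketch at the quoted $120/TB/year rate.
rate_per_tb_year = 120
capacity_tb = 3250   # HCC raw capacity from the storage timeline
annual = rate_per_tb_year * capacity_tb
print("annual cost to archive everything: $%d" % annual)  # $390,000/year
```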
HCC Data Visualizations
• Fish!
• HadoopViz
• OSG Google Earth
• Web-based monitoring
  – http://t2.unl.edu/status/hcc-status/
  – http://hcc.unl.edu/gratia/index.php
Other discussion topics
• Maui vs. SLURM
• Queue length policy
• Education approaches
  – This (!)
  – Tutorials (next!)
  – Afternoon workshops
  – Semester courses
  – Individual presentations/meetings
  – Online materials
©2007 The Board of Regents of the University of Nebraska
NU Administration (UNL, NRI)
NSF, DOE, EPSCoR, OSG
Holland Foundation
CMS: Ken Bloom, Aaron Dominguez
HCC: Drs. Brian Bockelman, Adam Caprez, Ashu Guru, Brad Hurst, Carl Lundstedt, Nick Palermo, Jun Wang;
Garhan Attebury, Tom Harvill, Josh Samuelson, John Thiltges;
Chen He, Derek Weitzel