NERSC Status Update for NERSC User Group Meeting
June 2006
William T.C. Kramer, [email protected]
510-486-7577
Ernest Orlando Lawrence Berkeley National Laboratory
Outline
Thanks for 10 Years of Help
• This is the 20th NUG meeting I have had the privilege of attending
• Throughout the past 10 years you all have provided NERSC invaluable help and guidance
• NUG is unique within the HPC community
• NERSC and I are grateful for your help in making NERSC successful
Science-Driven Computing Strategy 2006–2010
NERSC Must Address Three Trends
• The widening gap between application performance and peak performance of high-end computing systems
• The recent emergence of large, multidisciplinary computational science teams in the DOE research community
• The flood of scientific data from both simulations and experiments, and the convergence of computational simulation with experimental data collection and analysis in complex workflows
Science-Driven Systems
• Balanced and timely introduction of best new technology for complete computational systems (computing, storage, networking, analytics)
• Engage and work directly with vendors in addressing the SC requirements in their roadmaps
• Collaborate with DOE labs and other sites in technology evaluation and introduction
Science-Driven Services
• Provide the entire range of services from high-quality operations to direct scientific support
• Enable a broad range of scientists to effectively use NERSC in their research
• Concentrate on resources for scaling to large numbers of processors, and for supporting multidisciplinary computational science teams
Science-Driven Analytics
• Provide architectural and systems enhancements and services to more closely integrate computational and storage resources
• Provide scientists with new tools to effectively manipulate, visualize and analyze the huge data sets from both simulations and experiments
National Energy Research Scientific Computing (NERSC) Center Division

NERSC CENTER DIVISION DIRECTOR: Horst Simon
DIVISION DEPUTY: William Kramer

NERSC CENTER GENERAL MANAGER & HIGH PERFORMANCE COMPUTING DEPARTMENT HEAD: William Kramer

COMPUTATIONAL SYSTEMS: James Craw, Group Leader
SCIENCE DRIVEN SYSTEM ARCHITECTURE: John Shalf, Team Leader
COMPUTER OPERATIONS & ESnet SUPPORT: Steve Lowe, Group Leader
SCIENCE DRIVEN SERVICES: Francesca Verdier, Associate General Manager
USER SERVICES: Jonathan Carter, Group Leader
SCIENCE DRIVEN SYSTEMS: Howard Walter, Associate General Manager
HENP COMPUTING: Craig Tull, Group Leader
MASS STORAGE: Jason Hick, Group Leader
NETWORK, SECURITY & SERVERS: Brent Draney, Group Leader
ANALYTICS: Wes Bethel, Team Leader (Matrixed - CRD)
OPEN SOFTWARE & PROGRAMMING: David Skinner, Group Leader
ACCOUNTS & ALLOCATION TEAM: Clayton Bagwell, Team Leader
NERSC Center

NERSC CENTER GENERAL MANAGER: William Kramer

SCIENCE DRIVEN SYSTEMS: Howard Walter, Associate General Manager

COMPUTATIONAL SYSTEMS: James Craw, Group Leader
Matthew Andrews (.5), William Baird, Nick Balthaser, Scott Burrow (V), Greg Butler, Tina Butler, Nicholas Cardo, Thomas Langley, Rei Lee, David Paul, Iwona Sakrejda, Jay Srinivasan, Cary Whitney (HEP/NP), Open Positions (2)

COMPUTER OPERATIONS & ESnet SUPPORT: Steve Lowe, Group Leader
Richard Beard, Del Black, Aaron Garrett, Russell Huie (ES), Yulok Lam, Robert Neylan, Tony Quan (ES), Alex Ubungen

NETWORKING, SECURITY, SERVERS & WORKSTATIONS: Brent Draney, Group Leader
Elizabeth Bautista (DB), Scott Campbell, Steve Chan, Jed Donnelley, Craig Lant, Raymond Spence, Tavia Stone, Open Position (DB)

SCIENCE DRIVEN SYSTEM ARCHITECTURE TEAM: John Shalf, Team Leader
Andrew Canning (.25 - CRD), Chris Ding (.2 - CRD), Esmond Ng (.25 - CRD), Lenny Oliker (.25 - CRD), Hongzhang Shan (.5 - CRD), David Skinner (.5), E. Strohmaier (.25 - CRD), Lin-Wang Wang (.5 - CRD), Harvey Wasserman, Mike Welcome (.15 - CRD), Katherine Yelick (.05 - CRD)

MASS STORAGE: Jason Hick, Group Leader
Matthew Andrews (.5), Shreyas Cholia, Damian Hazen, Wayne Hurlbert, Open Position (1)

SCIENCE DRIVEN SERVICES: Francesca Verdier, Associate General Manager

USER SERVICES: Jonathan Carter, Group Leader
Harsh Anand, Andrew Canning (.25 - CRD), Richard Gerber, Frank Hale, Helen He, Peter Nugent (.25 - CRD), David Skinner (.5), Mike Stewart, David Turner (.75)

OPEN SOFTWARE & PROGRAMMING: David Skinner, Group Leader
Mikhail Avrekh, Tom Davis, RK Owen, Open Position (1) - Grid

ANALYTICS: Wes Bethel, Team Leader (.5 - CRD)
Cecilia Aragon (.2 - CRD), Julian Borrill (.5 - CRD), Chris Ding (.3 - CRD), Peter Nugent (.25 - CRD), Christina Siegrist (CRD), Dave Turner (.25), Open Positions (1.5)

ACCOUNTS & ALLOCATIONS: Clayton Bagwell, Team Leader
Mark Heer, Karen Zukor (.5)

Key: V = vendor staff; CRD = matrixed staff from CRD; ES = funded by ESnet; HEP/NP = funded by LBNL HEP and NP Divisions; DB = Division Burden
2005-2006 Accomplishments
Large-Scale Capability Computing Is Addressing New Frontiers
INCITE Program at NERSC in 2005:
• Turbulent Angular Momentum Transport; Fausto Cattaneo, University of Chicago
– Order of magnitude improvement in simulation of accretion in stars and in the lab
• Direct Numerical Simulation of Turbulent Non-premixed Combustion; Jackie Chen, Sandia Labs
– The first 3D Direct Numerical Simulation of a turbulent H2/CO/N2-air flame with detailed chemistry. Found new flame phenomena unseen in 2D.
• Molecular Dynameomics; Valerie Daggett, University of Washington
– Simulated folds for 38% of all known proteins
– Created a 2 TB protein fold database
Comprehensive scientific support:
• 20–45% code performance improvements: 2M extra hours
• All projects relied heavily on NERSC visualization services
DOE Joule metric
The Good
• Deployed Bassi – January 2006
– One of the fastest installations and acceptances
– Bassi providing exceptional service
• Deployed NERSC Global File System – September 2005
– Upgraded – January 2006
– Excellent feedback from users
• Stabilized Jacquard – October 2005 to April 2006
– Resolved MCE errors
– Installed 40 more nodes
The Good
• Improved PDSF
– Added processing and storage
– Converted hundreds of NFS file systems to a few GPFS file systems
– Access to NGF
• Increased archive storage function and performance
– Upgraded to HPSS 5.1 – April 2006
– More tape drives
– More cache disk
– 10 GE servers
• NERSC 5 procurement – on schedule and below cost (for conducting the procurement)
• Continued network tuning
The Good
• Continued network tuning
• Security
– Continued to avoid major incidents
– Good results from the "Site Assistance Visit" at LBNL
• LBNL and NERSC rated "outstanding"
• Still a lot of work to do – and some changes – before they return in a year
• Over-allocation issues (AY 05) solved
– Better queue responsiveness
– Stable time allocations
The Good
• Other
– Thanks to ASCR, the NERSC budget appears stabilized
– Worked with others to help define HPC business practices
– Continued progress in influencing advanced HPC concepts
• Cell, Power, interconnects, software roadmaps, evaluation methods, working methods, …
The Not So Good
• Took a long time to stabilize Jacquard
– Learned some lessons about lightweight requirements
• Upgrades on systems have not gone as well as we would have liked
– Extremely complex, and much is not controlled by NERSC
• Security attacks continue and increase in sophistication
– Can expect continued evolution
• User and NERSC database usage will be a point of focus
The Jury Is Still Out
• Analytics ramp-up taking longer than we desired
– NGF is a major step
– Some success stories, but we don't have breadth
• Scalability of codes
– DOE expects a significant share (>50%?) of time to go to jobs larger than 2,048-way in the first full year of NERSC-5
– Many of the most scalable applications are migrating to the LCFs, so some of the low-hanging fruit is already harvested
– Should be a continuing focus for NERSC and NUG
2005-2006 Progress On Goals
FY 04-06 Overall Goals
1. (Support for DOE Office of Science) Support and assist the DOE Office of Science in meeting its goals and obligations through the research, development, deployment, and support of high performance computing and storage resources and advanced mathematical and computer systems software.
2. (Systems and Services) Provide leading edge, open High Performance Computing (HPC) systems and services to enable scientific discovery. NERSC will use its expertise and leadership in HPC to provide reliable, timely, and excellent services to its users.
3. (Innovative assistance) Provide innovative scientific and technical assistance to NERSC's users. NERSC will work closely with the user community and together produce significant scientific results while making the best use of NERSC facilities.
4. (Respond to Scientific Needs) Be an advocate for NERSC users within the HPC community. Respond to science-driven needs with new and innovative services and systems.
5. (Balanced integration of new products and ideas) Judiciously integrate new products, technology, procedures, and practices into the NERSC production environment in order to enhance NERSC's ability to support scientific discovery.
6. (Advance technology) Develop future cutting-edge strategies and technologies that will advance high performance scientific computing capabilities and effectiveness, allowing scientists to solve new and larger problems and making HPC systems easier to use and manage.
7. (Export NERSC knowledge) Export knowledge, experience, and technology developed at NERSC to benefit computer science and the high performance scientific computing community.
8. (Culture) Provide a facility that enables and stimulates scientific discovery by continually improving our systems, services, and processes. Cultivate a can-do approach to solving problems and making systems work, while maintaining high standards of ethics and integrity.
2005-2006 Progress: 5 Year Plan Milestones
• 2005
– NCS enters full service. – Completed
• Focus is on modestly parallel and capacity computing
• >15–20% of Seaborg
– WAN upgrade to 10 Gb/s. – Completed
– Upgrade HPSS to 16 PB; storage upgrade to support 10 GB/s for higher density and increased bandwidth. – Completed
– Quadruple the size of the visualization/post-processing server. – Completed
• 2006
– NCSb enters full service. – Completed
• Focus is on modestly parallel and capacity computing
• >30–40% of Seaborg – Completed; actually >85% of Seaborg SSP
5 Year Plan Milestones
• 2006
– NERSC-5: initial delivery, possibly phased. – Expected, but most will be in FY 07
• 3 to 4 times Seaborg in delivered performance – Overachieved; more later
• Used for the entire workload and has to be balanced
– Replace the security infrastructure for HPSS and add native Grid capability to HPSS. – Completed and underway
– Storage and Facility-Wide File System upgrade. – Completed and underway
• 2007
– NERSC-5 enters full service. – Expected
– Storage and Facility-Wide File System upgrade. – Expected
– Double the size of the visualization/post-processing server. – If usage dictates
Summary
• It is a good time to be in HPC
• NERSC has far more success stories than issues
• NERSC users are doing an outstanding job producing leading-edge science for the Nation
– More than 1,200 peer-reviewed papers for AY 05
• DOE is extremely supportive of NERSC and its users