Research Computing
University of South Florida
Providing Advanced Computing Resources for Research and Instruction through Collaboration
Mission
• Provide advanced computing resources required by a major research university
  o Software
  o Hardware
  o Training
  o Support
User Base
• 40 research groups
• 6 colleges
• 100 faculty
• 300 students
Hardware
• System was built on the condominium model and consists of 300 nodes (2,400 processors)
  o University provides infrastructure and some computational resources
  o Faculty funding provides the bulk of computational resources
Software
• Over 50 scientific codes
  o Installation
  o Integration
  o Upgrades
  o Licensing
Support Personnel
• Provide all systems administration
• Software support
• One-on-one consulting
• System efficiency improvements
• Users are no longer just the traditional “number crunchers”
Current Projects
• Consolidating the last standalone cluster (of appreciable size)
• Advanced Visualization Center
  o Group of 19 faculty applied for funding
  o Personnel
  o Training
  o Large-resolution 3D display
Current Projects
• New computational resources
  o Approximately 100 nodes
  o GPU resources
  o Upgrade parallel file system
• Virtual clusters
  o HPC for the other 90%
• FACC
Florida State University's Shared HPC
Building and Maintaining Sustainable Research Computing at FSU
Shared-FSU HPC Mission
• Support multidisciplinary research
• Provide a general-access computing platform
• Encourage cost sharing by departments with dedicated computing needs
• Provide a broad base of support and training opportunities
Turn-key Research Solution
Participation is Voluntary
• University provides staffing
• University provides general infrastructure
  o Network fabrics
  o Racks
  o Power/cooling
• Additional buy-in incentives
  o Leverage better pricing as a group
  o Matching funds
• Offer highly flexible buy-in options
  o Hardware purchase only
  o Short-term Service Level Agreements
  o Long-term Service Level Agreements
• Aim for 50% of hardware costs covered by buy-in
Research Support @ FSU
• 500-plus users
• 33 academic units
• 5 colleges
HPC Owner Groups
• 2007
  o Department of Scientific Computing
  o Center for Ocean-Atmosphere Prediction Studies
  o Department of Meteorology
• 2008
  o Gunzburger Group (Applied Mathematics)
  o Taylor Group (Structural Biology)
  o Department of Scientific Computing
  o Kostov Group (Chemical & Biomedical Engineering)
• 2009
  o Department of Physics (HEP, Nuclear, etc.)
  o Institute of Molecular Biophysics
  o Bruschweiler Group (National High Magnetic Field Laboratory)
  o Center for Ocean-Atmosphere Prediction Studies (with the Department of Oceanography)
  o Torrey Pines Institute of Molecular Studies
• 2010
  o Chella Group (Chemical Engineering)
  o Torrey Pines Institute of Molecular Studies
  o Yang Group (Institute of Molecular Biophysics)
  o Meteorology Department
  o Bruschweiler Group
  o Fajer Group (Institute of Molecular Biophysics)
  o Bass Group (Biology)
Research Support @ FSU
• Publications
  o Macromolecules
  o Bioinformatics
  o Systematic Biology
  o Journal of Biogeography
  o Journal of Applied Remote Sensing
  o Journal of Chemical Theory and Computation
  o Physical Review Letters
  o Journal of Physical Chemistry
  o Proceedings of the National Academy of Sciences
  o Biophysical Journal
  o PLoS Pathogens
  o Journal of Virology
  o Journal of the American Chemical Society
  o The Journal of Chemical Physics
  o PLoS Biology
  o Ocean Modelling
  o Journal of Computer-Aided Molecular Design
[Diagram: Sliger Data Center hosting the Shared-HPC parallel file system]
FSU’s Shared-HPC, Stage 1: InfiniBand-Connected Cluster
Single and Multiprocessor Usage, Year 1
[Diagram: DSL Building and Sliger Data Center; Shared-HPC parallel file system plus Condor]
FSU’s Shared-HPC, Stage 2: Alternative Backfilling
Backfilling Single-Processor Jobs on Non-HPC Resources Using Condor
Condor Usage
• ~1,000 processor cores available for single-processor computations
• 2,573,490 processor hours used since Condor was made available to all HPC users in September
• Seven users have been using Condor from HPC
• Dominant users are Evolutionary Biology, Molecular Dynamics, and Statistics (the same users that were submitting numerous single-processor jobs)
• Two workshops introduced it to HPC users (a minimal submit sketch follows below)
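To make the backfilling workflow concrete, here is a minimal sketch of how a batch of independent single-processor runs might be packaged for Condor's vanilla universe and handed to condor_submit. The executable name, arguments, file paths, and resource requests are illustrative assumptions, not FSU's actual configuration.

#!/usr/bin/env python
# Minimal sketch: package N independent single-processor runs as one
# HTCondor vanilla-universe submission. The executable, arguments, paths,
# and resource requests are hypothetical placeholders.
import subprocess

N_RUNS = 100  # number of independent serial jobs to backfill

submit_description = """\
universe   = vanilla
# hypothetical serial binary and per-job input file
executable = analyze_sample
arguments  = --input sample_$(Process).dat
output     = logs/run_$(Process).out
error      = logs/run_$(Process).err
log        = logs/condor.log
request_cpus   = 1
request_memory = 2GB
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue {n}
""".format(n=N_RUNS)

with open("backfill.sub", "w") as handle:
    handle.write(submit_description)

# Condor matches each queued job to an idle (non-HPC) core and transfers
# input/output files to and from the execute machine.
subprocess.run(["condor_submit", "backfill.sub"], check=True)

Because each job is independent and file transfer is handled by Condor, idle desktop and lab cores can absorb this work without touching the InfiniBand-connected HPC nodes.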
Single vs. Multi-processor Jobs, Year 2
Single vs. Multi-processor Jobs, Year 3
[Diagram: DSL Building and Sliger Data Center; Shared-HPC parallel file system, Condor, and SMP]
FSU’s Shared-HPC, Stage 3: Scalable SMP
• One Moab queue for SMP or very large memory jobs
• Three “nodes”
  o M905 blade with 16 cores and 64 GB memory
  o M905 blade with 24 cores and 64 GB memory
  o 3Leaf system with up to 132 cores and 528 GB memory
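A minimal sketch of how a large-memory job might be directed at this queue, assuming a Moab/TORQUE-style submission; the queue name ("smp"), core count, memory request, walltime, and application are assumptions rather than the production configuration.

#!/usr/bin/env python
# Minimal sketch: submit a shared-memory, large-memory job to an assumed
# Moab/TORQUE queue named "smp". All resource values and the application
# invoked are illustrative placeholders.
import subprocess

job_script = """#!/bin/bash
#PBS -N large_memory_job
#PBS -q smp
#PBS -l nodes=1:ppn=24
#PBS -l mem=60gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
# hypothetical threaded application using all cores of one SMP "node"
./run_analysis --threads 24
"""

with open("smp_job.pbs", "w") as handle:
    handle.write(job_script)

# msub is Moab's job submission command; qsub is the TORQUE equivalent.
subprocess.run(["msub", "smp_job.pbs"], check=True)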
[Diagram: DSL Building, DSL Data Center, and Sliger Data Center; Shared-HPC parallel file system, Condor, SMP, secondary file system (2°fs), and visualization]
Interactive Cluster: Functions
• Facilitates data exploration
• Provides a venue for software not well suited to a batch-scheduled environment (e.g., some MATLAB, VMD, R, Python, etc.)
• Provides access to hardware not typically found on standard desktops/laptops/mobile devices (e.g., large memory, high-end GPUs)
• Provides licensing and configuration support for software applications and libraries
Interactive Cluster: Hardware Layout
• 8 high-end CPU-based host nodes
  o Multi-core Intel or AMD processors
  o 4 to 8 GB of memory per core
  o 16x PCIe connectivity
  o QDR InfiniBand connectivity to Lustre storage
  o IP (read-only) connectivity to Panasas
  o 10 Gbps connectivity to campus network backbone
• One C410x external PCIe chassis
  o Compact
  o IPMI management
  o Supports up to 16 NVIDIA Tesla M2050 GPUs
  o Up to 16.48 teraflops
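The aggregate figure quoted above follows from the Tesla M2050's single-precision peak of roughly 1.03 teraflops per card:

\[ 16 \times 1.03\ \text{TFLOPS} = 16.48\ \text{TFLOPS} \]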
[Diagram: DSL Building, DSL Data Center, and Sliger Data Center; Shared-HPC parallel file system, Condor, SMP, secondary file system (2°fs), visualization, and database/web]
Web/Database Hardware: Function
• Facilitates creation of data-analysis pipelines/workflows (see the sketch after this list)
• Favored by external funding agencies
  o Demonstrates a cohesive cyberinfrastructure
  o Fits well into required Data Management Plans (NSF)
• Intended to facilitate access to data on secondary storage or cycles on an owner's share of HPC
• Basic software install, no development support
• Bare metal or VM
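As a hedged illustration of the kind of lightweight data-access service such a node could host, the sketch below serves analysis results stored on secondary storage through a small web endpoint. Flask, the storage path, and the route name are illustrative assumptions, not software FSU necessarily runs.

#!/usr/bin/env python
# Minimal sketch of a data-access endpoint a web/database node could host:
# serve per-sample analysis results from secondary storage as JSON.
# Flask, the storage path, and the route are illustrative assumptions.
import json
import os

from flask import Flask, abort, jsonify

app = Flask(__name__)
RESULTS_DIR = "/secondary_storage/project_results"  # hypothetical mount point

@app.route("/results/<sample_id>")
def get_result(sample_id):
    """Return the stored analysis record for one sample, if present."""
    if not sample_id.replace("_", "").isalnum():  # reject path tricks like ".."
        abort(400)
    path = os.path.join(RESULTS_DIR, sample_id + ".json")
    if not os.path.isfile(path):
        abort(404)
    with open(path) as handle:
        return jsonify(json.load(handle))

if __name__ == "__main__":
    # In a real deployment this would sit behind the campus web front end.
    app.run(host="0.0.0.0", port=8080)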
Web/Database Hardware: Examples
FSU Research CI
• HPC
• HTC
• SMP
• 1° storage
• 2° storage
• Vis and interactive
• DB and Web
Florida State University's Shared HPC
• Universities are by design multifaceted and lack a singular focus of support
• Local HPC resources should also be multifaceted and have a broad base of support
HPC Summit
University of Florida
HPC Center
HPC Summit
Short history
• Started in 2003
• 2004, Phase I: CLAS – Avery – OIT
• 2005, Phase IIa: COE – 9 investors
• 2007, Phase IIb: COE – 3 investors
• 2009, Phase III: DSR – 17 investors – ICBR – IFAS
• 2011, Phase IV: 22 investors
HPC Summit
Budget
• Total budget
  o 2003-2004: $0.7 M
  o 2004-2005: $1.8 M
  o 2005-2006: $0.3 M
  o 2006-2007: $1.2 M
  o 2007-2008: $1.6 M
  o 2008-2009: $0.4 M
  o 2009-2010: $0.9 M
HPC Summit
Hardware
• 4,500 cores
• 500 TB storage
• InfiniBand connected
• In three machine rooms
  o Connected by 20 Gbit/s Campus Research Network
HPC Summit
System software
• Red Hat Enterprise Linux
  o through the free CentOS distribution
  o upgraded once per year
• Lustre file system
  o mounted on all nodes
  o scratch only
  o backup provided through CNS service (requires a separate agreement between researcher and CNS)
HPC Summit
Other software
• Moab scheduler (commercial license)
• Intel compilers (commercial license)
• Numerous applications
  o Open-source and commercial
HPC Summit
Operation
• Shared cluster
• Some hosted systems
• 300 users
• 90% - 95% utilization
HPC Summit
Investor Model
• Normalized Computing Unit (NCU)
  o $400 per NCU
  o One core
  o In a fully functional system (RAM, disk, shared file system)
  o For 5 years
HPC Summit
Investor Model
• Optional Storage Unit (OSU)
  o $140 per OSU
  o 1 TB of file storage (RAID) on one of a few global parallel file systems (Lustre)
  o For 1 year
HPC Summit
Other options
• Hosted system
  o Buy all hardware, we operate
  o No sharing
• Pay as you go
  o Agree to pay a monthly bill
  o Equivalent (almost) to the $400 NCU prorated on a monthly basis
• Our rates work out to roughly $0.009 per core-hour
  o Cheaper than Amazon Elastic Compute Cloud
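The hourly figure follows directly from the NCU terms of $400 per core for five years:

\[ \frac{\$400}{5 \times 8760\ \text{hours}} \approx \$0.0091 \text{ per core-hour} \]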
www.ccs.miami.edu
Mission Statement
• UM CCS is establishing nationally and internationally recognized research programs, focusing on those of an interdisciplinary nature, and actively engaging in computational research to solve the complex technological problems of modern society. We provide a framework for promoting collaborative and multidisciplinary activities across the University and beyond.
CCS overview
• Started in June 2007
• Faculty Senate approval in 2008
• Four founding schools: A&S, CoE, RSMAS, Medical
• Offices on all campuses
• ~30 FTEs
• Data Center at the NAP of the Americas
UM CCS Research Programs and Cores
• Physical Science & Engineering
• Computational Biology & Bioinformatics
• Data Mining
• Visualization
• Computational Chemistry
• Software Engineering
• High Performance Computing
• Social Systems Informatics
Quick Facts
• Over 1,000 UM users
• 5,200 cores of Linux-based cluster
• 1,500 cores of Power-based cluster
• ~2.0 PB of storage
• 4.0 PB of backup
• More at:
  o http://www.youtube.com/watch?v=JgUNBRJHrC4
  o www.ccs.miami.edu
High Performance Computing
• UM-wide resource provides the academic community and research partners with comprehensive HPC resources:
  o Hardware and scientific software infrastructure
  o Expertise in designing and implementing HPC solutions
  o Designing and porting algorithms and programs to parallel computing models
• Open access to compute processing (first come, first served)
  o Peer review for large projects – Allocation Committee
  o Cost center for priority access
• HPC services
  o Storage Cloud
  o Visualization and Data Analysis Cloud
  o Processing Cloud