HP IT-Symposium 2006
www.decus.de 1
© 2004 Hewlett-Packard Development Company, L.P. Subject to change without notice. 5982-7693DEE. August 2004
HPC Cluster - High Performance Computing Cluster
Status Update
18 May 2006
Dr. Werner Höhn, Senior Consultant, Presales HPC, Bad Homburg
Talk 3F01, IT-Symposium 2006
Agenda
1. Architectures and HPC cluster building blocks
2. Cluster management software
3. HP's cluster file system: SFS
HP – Decus IT-Symposium 2006 – www.decus.de 4
HP Cluster Platforms - choice, performance, manageability: CP3000, CP4000 & CP4000BL (new), and CP6000, now at Version 2.
Building blocks:
• HP Integrity servers: rx1620, rx2620, rx4640, rx7620, rx8620, Superdome
• HP ProLiant servers: DL140 G2, DL145 G2, DL360 G4p, DL380 G4, DL385, DL585
• HP workstations: xw8200, nw8240, xw9300, c8000
• HP BladeSystems: BL20p G3, BL25p, BL30p, BL35p, BL40p, BL45p, BL60p (new)
HP Choice of Standard Processors: Leading the Dual-Core Curve
Opteron, Xeon and IA64 in 2006:
• 2Q05: ProLiant DP/MP dual core (AMD Opteron)
• 4Q05: ProLiant MP/DP dual core (Paxville)
• 4Q05: ProLiant UP dual core
• 2Q06: ProLiant DP dual core, 1066 MHz FSB (Dempsey)
• 3Q06: Itanium 2 dual core (Montecito)
New dual-core Intel Xeon processors for ProLiant platforms arrived in 4Q05.
Architecture Comparison: AMD Opteron vs. Intel Xeon DP

AMD Opteron:
• Bus bottlenecks reduced or eliminated
• Adding CPUs adds memory and I/O bandwidth
• 5.3 GB/s dedicated CPU memory bandwidth
• CPU-to-CPU cHT links offer 3.2 GB/s bandwidth in each direction (HT1, 800 MHz)
• Each PCI-X bus has 3.2 GB/s bandwidth
• I/O is independent of memory access

Intel Xeon DP:
• Under full load, each CPU gets at most half of the maximum bus bandwidth
• Memory and I/O must share the same bus
• However, the FSB clock rate will increase significantly
• A single bus is not highly scalable past 2-way; multiple buses are coming
• More functional units
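The contrast above reduces to simple arithmetic. The Python sketch below models the two designs; the 5.3 GB/s per-socket figure is taken from the slide, while the 6.4 GB/s shared-bus total (800 MT/s x 8 bytes) is an assumed, era-typical value, not a number from the deck:

```python
# Toy model of the two memory architectures: a shared front-side bus
# divides its bandwidth among the CPUs, while an on-die memory
# controller per socket lets aggregate bandwidth grow with sockets.

def shared_bus_per_cpu(total_bus_gbs, cpus):
    """Under full load, each CPU on a shared bus gets an equal share."""
    return total_bus_gbs / cpus

def numa_aggregate(per_socket_gbs, sockets):
    """With a memory controller per socket, bandwidth adds up."""
    return per_socket_gbs * sockets

print(shared_bus_per_cpu(6.4, 2))  # 3.2 - each Xeon sees <= 1/2 the bus
print(numa_aggregate(5.3, 4))      # 4-socket Opteron aggregate
```

This ignores coherence traffic and NUMA placement effects, but it captures why adding Opteron sockets adds bandwidth while adding Xeon sockets divides it.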
Streams benchmark on DL585

Previous unofficial / official (1 Aug 2005) numbers (MB/s), Kernel 2.6.6:

Test    DL585-1P   DL585-2P   DL585-4P (unofficial/official)
copy    2994.53    5915.54    11677.71 / 13893
scale   2984.81    5904.61    11667.05 / 13894
add     3047.45    6044.21    11926.22 / 14599
triad   3014.73    5974.97    11800.23 / 14562

http://www.cs.virginia.edu/stream/stream_mail/2005/0005.html
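For reference, the triad kernel reported in the table is just a[i] = b[i] + q*c[i] swept over large arrays. A pure-Python illustration of what is being measured (the real benchmark is C/Fortran over arrays far larger than cache, so the MB/s printed here is only a curiosity):

```python
# STREAM "triad" kernel, pure-Python rendition for illustration only.
import time

N = 1_000_000
q = 3.0
b = [1.0] * N
c = [2.0] * N

t0 = time.perf_counter()
a = [b[i] + q * c[i] for i in range(N)]   # a[i] = b[i] + q * c[i]
elapsed = time.perf_counter() - t0

# triad touches three 8-byte doubles per iteration (read b, read c, write a)
bytes_moved = 3 * N * 8
print(f"triad: {bytes_moved / elapsed / 1e6:.0f} MB/s (interpreter overhead dominates)")
```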
K8L - internet news excerpts (largely from Chuck Moore, 16 May 2006):
• Quad-core chips in 2007, 65 nm (one die)
• HyperTransport up to 5.2 GT/s, presumably at 8 bytes per clock
• Improvements in cache coherency (multi-socket systems?)
• L2 cache per core on-chip, shared L3 cache
• Separate power supply for memory controller and cores
• 48-bit virtual/physical addressing (up to 256 TB) and 1 GB pages
• DDR2 starting with the FX2 (AM2) socket, DDR3 later (with a new socket?)
• RAS features for memory access and HT
• Out-of-order special loads, similar to memory disambiguation/speculative loads
• 2 x 128-bit SSE units, also used for FP ops
DP Dempsey Overview

DP Dempsey feature summary:
• Process technology: 65 nm
• Socket: LGA 771
• Availability target: Q1'06
• Intel EM64T: yes
• Hyper-Threading Technology: yes
• Independent L2 cache: 2 x 2 MB
• Intel I/OAT: yes
• Demand Based Switching (DBS): yes**

Two execution cores, each with its own 2 MB L2 cache, share the FSB to the MCH.

** DBS will not be supported on all SKUs
Intel x86 processor roadmap: notes
• Dual-Core Xeon (Paxville), October 2005: Paxville DP started at a 3.2 GHz clock rate, 2 x 2 MB on-chip L2 cache, 800 MT/s FSB.
• Dempsey: similar to Paxville, but 65 nm DP; unlike Intel's "Presler", it has SMP support. 2.5-3.73 GHz, 667 to 1066 MHz FSB, 2 x 2 MB on-chip L2 cache.
• Woodcrest (80 W): dual-core processor primarily for 2-socket systems (based on Intel's Merom and Conroe cores, NGA). 1333 MT/s FSB, dual independent buses, 4 MB shared L2 cache, 1.6-3.0 GHz, more functional units.
• Clovertown: quad-core version of Woodcrest, built from two Woodcrest dies on one MCM. Presumably a 1066 MT/s FSB clock rate.
Agenda
1. Architectures and HPC cluster building blocks
2. Cluster management software
3. HP's cluster file system: SFS
HP ProLiant servers for HPTC: power and choice for scale-out solutions
• DL500 series (DL585): the industry's most capable Opteron-based 4-way server, with 64 GB memory capacity and best-in-class management and uptime features
• DL300 series (DL360 / DL380 / DL385): maximum compute power with commercial robustness
• DL100 series (DL140 / DL145): high-performance, low-cost 2P/1U compute nodes optimized for HPC environments
• BL series (BL2xp / BL3xp / BL4xp): performance 2P and 4P blades designed for density, with cluster manageability and connectivity
HP Cluster Platform components: specific nodes
• ProLiant DL145, two processors, 1U: Opteron 2.8 GHz SC or 2.6 GHz DC; two PCI-X slots; up to 16 GB memory (PC3200 DDR)
• ProLiant DL585, four processors, 4U: Opteron 2.8 GHz SC or 2.6 GHz DC; six PCI-X buses, 8 slots; up to 64 GB memory (PC2700 DDR) or 32 GB (PC3200 DDR)
• Xeon-64 DL360, two sockets, 1U: 3.4 GHz; two PCI Express slots; up to 8 GB memory (DDR333)
• Integrity servers: next generation of Itanium 2 processors; rx1620, rx2620 and rx4640
ProLiant DL145 G2 highlights (speed bump March 2006)

A low-cost 1U server with two AMD Opteron 200-series processors, delivering outstanding performance and price for both High Performance Computing (HPC) environments and cost-conscious server deployments.

Performance:
• AMD Opteron processors with 1 GHz HyperTransport: Model 285 (2.6 GHz/1 MB) dual-core, Model 254 (2.8 GHz/1 MB) single-core; integrated memory controller running at processor frequency; support for AMD PowerNow!
• 8 DIMM slots supporting up to 16 GB of 400 MHz DDR1 memory (DDR3200)
• 2 PCI-X 64-bit/133 MHz slots (one full-length, one low-profile)
• Optional PCI Express x16 support (full-length)
• NHP SATA and SCSI hard disk drive support
• SATA RAID 0 & 1 support (optional SAS HBA)

Design & connectivity:
• New bezel design with UID for easy identification in large-scale rack deployments
• Simplified rack rail design and common 1U rail kit
• 4 USB ports: 2 front & 2 rear

Management:
• HP ProLiant Lights-Out 100i remote management
• IPMI 1.5 / IPMI 2.0
ProLiant DL145 G2 Overview

Ideal for large clustered High Performance Computing (HPC) environments and general-purpose compute requirements for corporate datacenters and cost-conscious small and medium businesses.

Key benefits:
• Maximum-performance 2P/1U compute node at an affordable price
• Reliable and flexible infrastructure compute engine for businesses of all sizes
• Complete tools for essential system management
• Tier-1 vendor service and support
DL145 G2 OS support: full support for
• Microsoft Windows: 2003 Server (Enterprise, Standard, and Web); 2003 Server for 64-bit Extended Systems (Standard & Enterprise); 2000 Server and Advanced Server
• Red Hat: Enterprise Linux AS 3 (32-bit & 64-bit), WS 3 (32-bit & 64-bit), ES 3 (32-bit & 64-bit), AS 4 (32-bit & 64-bit), WS 4 (32-bit & 64-bit), ES 4 (32-bit & 64-bit)
• SuSE Linux ES 8 and ES 9 (32-bit & 64-bit)
• Solaris 10 (64-bit)
DL145 G2 Layout

[Diagram: (2) 133 MHz PCI-X slots, (1) full-length and (1) low-profile, with optional PCI-E x16 in place of the full-length slot; 2 non-hot-plug SATA or SCSI HDDs; 500 W power supply; 6 non-hot-plug fans; processor and memory modules]
Product Overview: 4U 4P - ProLiant DL585

Maximum performance:
• Up to four 852-series 2.6 GHz AMD Opteron processors with 1 MB L2 cache and an on-board 2.6 GHz memory controller for outstanding performance and scalability
• HyperTransport technology delivering 8 GB/s CPU-to-CPU throughput for maximum performance and scalability
• Up to 64 GB 2-way interleaved DDR; PC2700: 64 GB at 266 MHz or 48 GB at 333 MHz; PC3200: 32 GB at 400 MHz
• 8 expansion slots: 6 x 64-bit/100 MHz and 2 x 64-bit/133 MHz PCI-X
• Dual-port Gbit NIC and Smart Array 5i Plus controller with battery-backed write cache enabler

ProLiant management:
• Powerful Integrated Lights-Out (iLO) technology embedded
• Support for SmartStart and Systems Insight Manager

Outstanding uptime:
• Advanced ECC memory protection
• Hot-plug redundant power supplies and fans
• Redundant ROMs
Product Overview - ProLiant DL585

The industry's top-performing x86 4-way rack server, combining AMD's new Opteron dual-core processor technology, best-in-class management and high-uptime features in a system ideal for large data center deployments.

Performance:
• Support for 4 AMD Opteron 800-series single- and dual-core processors with an on-board full-speed memory controller for outstanding performance and scalability
• HyperTransport technology delivering 8.0 GB/s CPU-to-CPU throughput
• Up to 128 GB dual-channel DDR1 at 266 MHz (PC2700), 48 GB at 333 MHz (PC2700) or 32 GB at 400 MHz (PC3200)
• 6 x 64-bit/100 MHz PCI-X expansion slots and 2 x 64-bit/133 MHz PCI-X slots
• Dual-port Gbit NIC and Smart Array 5i Plus controller with battery-backed write cache enabler

Management & deployment:
• Powerful Integrated Lights-Out (iLO) technology embedded
• Support for SmartStart and Systems Insight Manager

Uptime:
• Advanced ECC memory protection
• Hot-plug redundant power supplies and fans
• Redundant ROMs
AMD Opteron™ Dual Core Overview
• The AMD Opteron processor was designed from the start to accept a second core: a port already existed on the crossbar/SRI; one die carries 2 CPU cores, each with its own 1 MB L2 cache
• Drops into existing AMD Opteron 940-pin sockets that are compatible with 90 nm single-core processors
• A BIOS update is all that is necessary to bring a 2-processor/2-core server up as a 2-processor/4-core server
• The 2 CPU cores share the same memory and HyperTransport™ technology resources found in single-core AMD Opteron processors: the integrated memory controller and HyperTransport links route out the same as in today's implementation

[Diagram: existing AMD Opteron design, with CPU0 and CPU1 (each with a 1 MB L2 cache) attached to the System Request Interface and crossbar switch, which connects to the memory controller and the HT0/HT1/HT2 links]
Opteron Price-Performance: Single vs. Dual Core

[Chart: SPECint_rate + SPECfp_rate (base) versus list price ($5,000-$40,000) for DL585 2.4 SC, DL585 2.6 SC, DL585 2.2 DC, Sun V40z 2.2 DC, DL145 G2 2.6 SC and DL145 G2 2.2 DC, with lines of constant price-performance for reference; higher is better. Source: www.spec.org]
HP extends its x86 blade offerings: more performance & choice with 100% compatibility
Greater 32-bit performance, ProLiant design consistency, transparent 32/64-bit capabilities, dual-core.
• BL25p: performance 2P blade server using AMD Opteron technology, providing the industry's best blade performance, ideal for high-performance blade deployments. 16 GB RAM max; up to 2 HP SCSI hard drives.
• BL35p: double-dense 2P blade server using AMD Opteron technology, optimized for compute density and external storage solutions. 8 GB RAM max; 2 SFF ATA or 1 SFF SAS drive with pre-failure alerting.
• BL45p: 4P blade for superior scalability and infrastructure apps. Occupies 2 server blade bays; 32 GB RAM max; up to 2 HP SCSI hard drives.
HP ProLiant DL360 G4p
• 1-2 Intel Xeon DP (3.4 and 3.6 GHz)
• 2 MB L2 cache
• Hyper-Threading technology
• EM64T 64-bit extension
• 800 MHz system bus
• 1-12 GB DDR main memory (PC3200)
• Online spare memory
• 2 hot-plug SCSI disks (max.) or 2 SATA disks (no DVD then)
• Smart Array 6i Plus RAID controller
• 2 64-bit (133 MHz) PCI-X slots
• Optional PCI Express
• iLO integrated management processor
• 2 integrated Gigabit NICs
• Optional redundant power supplies
• 1 rack unit (1U)
DL360 G4p with Dual Core Processors (dual-core Xeon available since February '06)

Concentrated 1U compute power: a flexible, enterprise-class 1U server with integrated Lights-Out management and essential fault tolerance.

• DL360 G4p SCSI: up to 3.8 GHz Intel Xeon processors with 2 MB L2 cache, 1 GB DDR2 memory standard; 2 PCI-X slots, optional PCI Express; 2 Ultra320 SCSI drive bays; Smart Array 6i, optional 128 MB BBWC, RAID 0/1.
• DL360 G4p SAS: dual-core 2.8 GHz Intel Xeon, or up to 3.8 GHz Intel Xeon processors, 2 MB L2 cache per core, up to 12 GB DDR2 memory; 1 available PCI-X slot, optional PCI Express; 4 SFF SAS drive bays (maximum internal drives); Smart Array P600 with 256 MB BBWC included, RAID 0/1/5/6 in a slot.
HP Support Advantages
• Access to 88,000 service professionals
• 70 worldwide help desks for around-the-clock support
• 24 x 365 business-critical support in 160 countries
• Full range of startup, installation, extended warranty, network planning, software updates, system health checks, recovery services, and IT outsourcing
• Instant Support Enterprise Edition (ISEE): a single common support solution to manage the entire IT network; filters events to identify all actionable service events; proactively reduces downtime risks and provides quick recovery; simplifies multi-vendor service management; supports all ProLiant hardware and OS platforms
Why we win: the HP difference - the ProLiant advantage

More flexibility, more choice. HP does not force customers into one business model. We provide a variety of solutions, allowing customers to purchase products where they feel most comfortable, such as from a trusted reseller. HP also provides complete CTO capabilities with its Factory Express solution.

Factory Express:
• A pre-priced, pre-packaged, comprehensive and flexible portfolio of configured, customized and integrated factory solutions and deployment services
• Customers choose how their solution is built, integrated, tested, shipped and deployed
Agenda
1. Architectures and HPC cluster building blocks
2. Cluster management software
3. HP's cluster file system: SFS
HP Cluster Platforms
• Factory-integrated hardware solution with optional software installation; includes nodes, interconnects, network, racks, etc., integrated and tested
• Configure-to-order from 5 to 1024 nodes (more by request); uniform worldwide specification and product menus; fully integrated, with HP warranty and support

Platform / compute nodes / interconnects:
• HP Cluster Platform 3000: ProLiant DL360 G4 server, ProLiant DL140 G2 - GigE, IB, Myrinet
• HP Cluster Platform 4000 and 4000BL: ProLiant DL145 G2, ProLiant DL585 (dual core) - GigE, IB, Myrinet, Quadrics; ProLiant BL35p/BL45p - GigE
• HP Cluster Platform 6000: Integrity rx1620, Integrity rx2620 - GigE, IB, Quadrics
Basic elements in HP Cluster Platforms
• Nodes: control nodes (aka head node, mgmt node); compute nodes; utility nodes (aka service nodes), i.e. added nodes for special admin tasks (e.g., login, file); visualization nodes
• Networks/switches: admin and cluster network (GigE); console network (leverages IPMI/iLO functionality in the nodes); optional high-performance cluster interconnect
• Rack infrastructure: PDUs, monitor
• Software options
• Integration
Interconnect technology directions
• Ethernet: integrated 1 GbE; optional 10 GbE with offload; optional multifunction 1 GbE (e.g., trunking)
• InfiniBand: 4X DDR HCAs and switches (e.g., for PCI-E in the DL145)
• Myrinet: HP-MPI support for MX; Myri-10G
• QsNet: Elan4; QsNet III (Elan5)
• HTX options (?)

[Diagram: federated fat tree with top-level switches (288 ports) above node-level switches (24 ports), each node-level switch connecting to 12 nodes]
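The fat-tree figure explains why "federation of switches" scales. A back-of-envelope calculation, assuming the slide's port counts and a 50/50 split of leaf ports between nodes and uplinks (full bisection):

```python
# Two-level fat-tree capacity: 24-port node-level switches with half
# their ports facing nodes (12 each) and half facing 288-port
# top-level switches. The 50/50 split is an assumption for
# illustration, not a product specification.

def max_nodes(leaf_ports=24, top_ports=288):
    down = leaf_ports // 2   # 12 ports per leaf switch connect to nodes
    # With one uplink from every leaf into each top-level switch plane,
    # the number of leaves is capped by a top switch's port count.
    leaves = top_ports
    return leaves * down

print(max_nodes())  # 288 leaves x 12 nodes = 3456 nodes
```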
High Performance Interconnects
• InfiniBand: emerging industry standard; IB 4x speeds of 1.8 GB/s, 3.5 µs MPI latency; 24-, 96- and 288-port switches; scalable topologies with federation of switches
• Myrinet: Rev F speeds of 489 MB/s, 2.6 µs MPI latency; Rev E speeds of 800 MB/s, 2.7 µs MPI latency; 16-, 128- and 256-port switches; scalable topologies with federation of switches
• Quadrics: Elan4 at 800 MB/s, <1.3 µs MPI latency; 8-, 32-, 64- and 128-port switches; scalable topologies with federation of switches
• GigE: 60-80 MB/s, >40 µs MPI latency

[Diagrams: federated topologies over PCIe - 288-port top-level switches above 24-port node-level switches (12 nodes each), and 264-port top-level switches above 128-port node-level switches (64 nodes each)]
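The figures above can be combined into the usual first-order message-time model, t(n) = latency + n / bandwidth. This Python sketch uses the slide's numbers and ignores protocol switches, topology and contention, so treat it as a back-of-envelope comparison only:

```python
# First-order point-to-point message time per interconnect.
INTERCONNECTS = {            # (MPI latency in s, bandwidth in bytes/s)
    "InfiniBand 4x": (3.5e-6, 1.8e9),
    "Myrinet Rev E": (2.7e-6, 800e6),
    "Quadrics Elan4": (1.3e-6, 800e6),
    "GigE": (40e-6, 80e6),
}

def message_time(name, nbytes):
    lat, bw = INTERCONNECTS[name]
    return lat + nbytes / bw

for name in INTERCONNECTS:
    # an 8-byte message is latency-bound; a 1 MB message is bandwidth-bound
    print(f"{name}: 8 B in {message_time(name, 8) * 1e6:.1f} us, "
          f"1 MB in {message_time(name, 1_000_000) * 1e6:.0f} us")
```

The model makes the slide's point explicit: for small messages Quadrics' sub-microsecond latency wins, while for large transfers InfiniBand's 1.8 GB/s dominates, and GigE trails on both axes.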
HP Cluster Platform Examples
[Photos: a 32-compute-node configuration and a 128-node configuration, designed for expansion]
New additions to HP Cluster Platforms
• CP3000 with dual-core Xeon processors
• Visualization building blocks
• New CP4000BL based on BL35p and BL45p blade servers, providing simplified management, performance and scalability, reduced interconnect and network complexity, high density, and centralized power management
• Cluster Platform Express: a faster, easier way of configuring and ordering small clusters; single-rack (up to 32 nodes) CP3000 and CP4000, with GigE or InfiniBand; configuration tool at www.hp.com/go/hptc
XC Cluster: HP's Linux-Based Production Cluster for HPC
• A production computing environment for HPC built on Linux/industry-standard clusters: industrial-hardened, scalable, supported; integrates leading technologies from open source and partners
• A simple and complete product for real-world usage: turn-key, with single-system ease of deployment and management; smart provisioning takes the guesswork and experimentation out of deployment; lets customers focus on computation rather than infrastructure
• General-purpose technical computing flexibility: supports throughput 'farms' as well as MPI parallel jobs; plugs into 'grid' environments
HP Delivers a Complete Solution
• HP Cluster Platform: nodes, networks, storage
• XC System Software: Linux OS, cluster manager, job scheduler (LSF), HP-MPI, SFS client
• Compilers and development tools: a validated selection of compilers, math libraries, debuggers and profiling tools
• Applications: extensive portfolio of tested applications
• HP Services: support and consulting, training, on-site staffing
XC System Architecture
Installation and configuration
• Head node configuration: includes the code-replication environment (SystemImager) with a "golden client" and an image server
• Other node configuration: the image is installed via SystemImager in a two-phase process. Phase 1: the node is generically imaged using Flamethrower (multicast). Phase 2: the per-node personality is applied using configuration data. At the end of this process, all nodes have been rebooted and configured with their respective personality.
• Smart provisioning: recommends and assigns roles based on user preferences; automated discovery of the network topology; distributed service roles; sets up the firewall

[Diagram: the head node is kickstarted from the XC distribution; system files and the XC database (XCDB) feed SystemImager propagation from the golden client to the other nodes over the admin network]
Adding nodes
• A simple set of commands discovers additional nodes and identifies the associated switches
• Define roles via cluster config commands
• Image the nodes
• The cluster remains online and available throughout
• Adding switches: a discover-switch command
Monitoring
• Nagios, an open-source host, service and network monitoring program: monitors network services such as SMTP; monitors host resources such as processor load; a simple plugin design lets administrators easily develop their own service checks and define event handlers; parallelized service checks; contact notifications when service or host problems occur and when they are resolved
• Info is collected via Supermon
XC Resource Management
• SLURM and LSF: scalability; handling of stdio, signals, etc.; a common underpinning that allows for PBS and/or other batch schedulers
• LSF manages the user workload and creates demand for resources
• SLURM manages the cluster resources and provisions them to workload queues
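The LSF/SLURM split can be pictured as a demand side and a provisioning side. The toy sketch below is illustrative only; the job names, node names and first-fit logic are invented and are not the real LSF or SLURM interfaces:

```python
# Toy demand/provision split: a workload manager turns jobs into
# demand for node counts; a resource manager allocates free nodes.
from collections import deque

free_nodes = {f"n{i:03d}" for i in range(8)}                 # 8 idle nodes
queue = deque([("job1", 2), ("job2", 4), ("job3", 4)])       # (name, nodes wanted)

allocations = {}
while queue and len(free_nodes) >= queue[0][1]:
    name, want = queue.popleft()
    allocations[name] = {free_nodes.pop() for _ in range(want)}  # provision

print({k: len(v) for k, v in allocations.items()})  # job1 and job2 run
print(len(free_nodes))                              # job3 waits: only 2 nodes left
```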
Manageability
• Comprehensive and integrated system: hardware and software, supported by HP, including patches and updates
• Automated discovery and smart provisioning
• Central point of control and administration
• Integrated job management, including job-accounting functions from LSF
• Upgrade utilities
XC System Software V3
• Based on a Red Hat EL 4.0 AS-compatible distribution
• Open-source utilities: SLURM (resource management); SystemImager with Flamethrower (cluster install and update); Nagios and Supermon (monitoring); syslog-ng (logging)
• Multiple LSF implementation choices: integrated with SLURM; LSF standalone, bypassing SLURM; or bypass the LSF install and use a third-party scheduler
• SFS support, with the option to use it as a high-performance global file system in the cluster
• HP-MPI integrated
• Full support and service worldwide; extensive testing
• Available on HP Cluster Platforms 3000, 4000, and 6000
ISVs and XC Clusters: key applications tested and supported
• Life and material sciences: Accelrys Materials Studio, Lion Biosciences, SCM ADF, Tripos, OpenEye, plus lots of open-source code, and more coming
• CAE: Abaqus, ACUSIM Acusolve, ADINA, ANSYS, AVL Fire, CD-Adapco Star-CD, ESI CFD/ACE, PamCrash/PamFlow, Exa PowerFlow, Fluent, LSTC LS-Dyna, MSC Software Marc and Nastran, Mecalog Radioss, UGS Nastran, ...
• Grid and resource management: Altair PBSPro, Globus Toolkit, Platform LSF and LSF Multicluster, United Devices MP Synergy

Go to the XC web site for the latest update on ISV support:
http://www.hp.com/techservers/clusters/xc_clusters.html
XC also available with large SMP nodes
• HP Cluster Platforms are built with rack-optimized nodes, primarily 2P, along with the 4P rx4640
• Some customers want a mix within the cluster to support large-memory jobs as well as typical distributed cluster applications
• XC clusters have been deployed at multiple sites with modified HP Cluster Platform designs incorporating HP Integrity rx8620 servers with 16-way SMP
HP Services available for XC clusters
• XC clusters are rack-integrated at the factory, including staging and testing
• On-site customer installation by HP technicians
• Required set-up and startup service from C&I
• Standard product support and warranties: software carries a 90-day warranty; the hardware warranty matches the underlying server nodes; standard support offerings are available worldwide
• Optional services include cluster integration management, cluster system quickstart, cluster applications quickstart, and training
Performance Benchmarks
• XC ranks at the top of the current TAP (Top Application Performance) list maintained by Purdue: http://www.purdue.edu/TAPlist/
Linux Cluster Programming Environment
• Compilers: Intel Visual Fortran and Visual C++; Portland Group; PathScale; GNU
• Debuggers and profilers: Etnus TotalView; Intel VTune; AMD OProfile
• Libraries: MPI (HP-MPI, MPICH), Linda, OpenMP; math libraries: HP MLIB, Intel MKL, AMD ACML
New HP-MPI V2.1: the universal MPI for Linux, HP-UX, XC and Tru64
• Broadens the portfolio of applications: transparent support for multiple interconnects (TCP/IP, Quadrics, InfiniBand, Myrinet); enables a single executable for each OS (HP-UX, Linux, Tru64 UNIX); endorsed by major ISVs
• New functionality and performance enhancements: MPI-2 support; MPICH compatibility; profiling tools
• Available on non-HP platforms through our ISV partners supporting HP-MPI
HP CMU (Cluster Management Utility) has 3 main features:
• Management: helps with day-to-day administration; you can halt, boot, reboot or broadcast commands to a set of nodes
• Cloning: rapidly deploys a golden image to all the nodes of a large cluster
• Monitoring: shows the state of your cluster at a glance; HP CMU can warn you whenever the state of a node changes
HP CMU V2.0 Graphical User Interface
• The root window lists the nodes of your cluster and provides a visual status of node activity
• You can select any number of nodes and apply a command to the entire group
• From this single window you can supervise the status of more than 1024 nodes
• A single click on a node cell opens a telnet or console session on that node
HP CMU Management feature
• Simply select the desired nodes and HP CMU will execute the command on all of them
• HP CMU uses the functionality of the ECI and RILOE/iLO management cards
HP CMU Console Broadcasting: typing can be broadcast to all selected node consoles, or a single console can be accessed directly.
HP CMU Event Handling
Node status is probed regularly over the network. When a node's status changes (up to down, or down to up), HP CMU can optionally:
• send mail to configured user(s)
• display a pop-up window
• execute a script with the node name and status as arguments
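A handler for the last option might look like the sketch below. The argument convention (node name, then status) follows the slide's description, but the script itself is hypothetical, not taken from CMU documentation:

```python
#!/usr/bin/env python3
# Hypothetical CMU event-handler script: invoked with the node name
# and its new status, logs the transition and reacts to "down".
import datetime
import sys

def handle_event(node, status):
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} node {node} changed state to {status}"
    if status == "down":
        line += " -- paging the on-call admin (placeholder action)"
    return line

if __name__ == "__main__" and len(sys.argv) >= 3:
    print(handle_event(sys.argv[1], sys.argv[2]))
```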
HP CMU Cluster Configuration tools
• The scan-node tool automatically registers the nodes, with their network parameters, in the HP CMU database
• The whole cluster configuration is held in a single file and can be exported and imported for backup or replication
CMU V3.0 backup
HP CMU cloning Mechanism (phase 1)
HP CMU cloning Mechanism (phase 2)
HP CMU Monitoring interface: Cluster View
[Screenshot: the cluster view shows node state, CPU usage, a group summary, and raised alerts]
HP CMU monitoring design goals
• Highly scalable: the nodes are divided into network entities (nodes on the same switch) and report their monitoring results to a secondary server within their network entity; each secondary server consolidates the data and sends it to the management node
• Highly customizable: the parameters to monitor are stored in a text file, and an independent script is associated with each parameter; users can easily choose the parameters they want and also define their own actions (e.g., CPU consumption of their own application)
• Highly adaptable: the monitoring daemons are totally independent of the GUI, so the cluster monitoring results can be used by any other application if the appropriate plug-in is written
• Highly reliable: whenever a monitoring daemon stops sending data, it is automatically respawned by its master
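The two-level aggregation behind the scalability goal can be sketched as follows. The data shapes, node names and summary fields here are invented for illustration; CMU's actual wire format is not documented in this deck:

```python
# Nodes report to a secondary server per "network entity" (one per
# switch); each secondary server consolidates its samples and forwards
# a single summary record to the management node.

# raw samples as each secondary server sees them: {node: cpu_load_%}
entities = {
    "switch-A": {"n001": 12.0, "n002": 95.0, "n003": 40.0},
    "switch-B": {"n101": 5.0, "n102": 7.0},
}

def consolidate(samples):
    """One summary per entity: all the management node needs to see."""
    loads = list(samples.values())
    return {"nodes": len(loads), "avg": sum(loads) / len(loads), "max": max(loads)}

management_view = {name: consolidate(s) for name, s in entities.items()}
print(management_view["switch-A"]["max"])  # a high value worth alerting on
```

The point of the hierarchy is that the management node handles one record per switch, not one per node, which is what keeps the design scalable past 1024 nodes.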
Agenda
1. Architectures and HPC cluster building blocks
2. Cluster management software
3. HP's cluster file system: SFS
HP HPC solution = cluster of clusters ...

[Diagram: several compute clusters (compute, admin and login nodes) and an HP SFS cluster (MDS and multiple OSS nodes running Lustre) joined by a high-speed interconnect (GbE, InfiniBand, Myrinet or Quadrics) with connectivity to all nodes; GigE Ethernet carries boot and system-control traffic; 10/100 Ethernet provides out-of-band management (power on/off, etc.); the whole system attaches to the campus network]
What is HP StorageWorks Scalable File Share (HP SFS)?
• Lustre™ file system
• NFS
• 1000+ Linux clients
• 64 servers
• 512 TB - 1 PB of data
• Best cost-performance in the industry
Lustre: a scalable, high-performance file system
• New architecture, with the benefit of hindsight
• Open-source technology
• Developed by Cluster File Systems; HP has continuing involvement through the DoD PathForward program (Hendrix): CMD, security
• Separate and scalable metadata (MDT)
• Object-based storage (OST)
• Highly efficient, network-independent layer
• Near-linear scaling for clients and servers
• POSIX compliant
Lustre Components

[Diagram: clients perform system and parallel file I/O directly against the object storage targets (OSTs); the metadata servers (MDS) handle directory, metadata and concurrency operations, file creation, file status, file locking and recovery; an LDAP server holds configuration information, network connection details and security management]
Lustre Key Features
• Scalable performance (100s of OSTs, 1000s of clients)
• Separation of metadata handling from I/O processing: clients open(), close() and lock files via the MDT, but read() and write() directly to the OSTs; failover is supported
• Striping of file data across multiple OSTs: a 12 MB file striped across 4 OSTs has its blocks laid out round-robin, so OST1 holds blocks 0, 4, 8; OST2 holds 1, 5, 9; OST3 holds 2, 6, 10; OST4 holds 3, 7, 11
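The round-robin layout in the 12 MB / 4 OST example maps a file offset to an OST with simple modular arithmetic. The sketch below assumes a 1 MiB stripe size for the example; in Lustre, stripe size and count are tunable per file or directory:

```python
# Round-robin stripe mapping, as in the 12 MB / 4 OST example.
STRIPE_SIZE = 1 << 20   # 1 MiB, assumed for this illustration
STRIPE_COUNT = 4        # number of OSTs the file is striped over

def locate(offset):
    """Map a file offset to (ost_index, offset_within_ost_object)."""
    block = offset // STRIPE_SIZE              # which stripe block of the file
    ost = block % STRIPE_COUNT                 # round-robin over the OSTs
    obj_offset = (block // STRIPE_COUNT) * STRIPE_SIZE + offset % STRIPE_SIZE
    return ost, obj_offset

# block 5 of the file lands on the slide's OST2, i.e. index 1 here
print(locate(5 * STRIPE_SIZE))  # -> (1, 1048576)
```

Because consecutive blocks land on different OSTs, a large sequential read or write is spread across all four servers at once, which is where the near-linear scaling comes from.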
HP SFS: converting technology to product
• HP SFS = "Scalable File Share": a product offering that provides a reliable global file system for Linux clusters; Lustre open-source technology + HP management tools + hardware; a balanced system with optimised performance
• Integration, qualification and support: a one-stop shop for a complete Lustre solution; global support by HP Services
• System management: system administration, plus additional features to deliver a "whole product"

[Diagram: HP SFS combines HP qualification, HP storage, HP servers, HP support, HP sysman tools and the interconnect]
HP SFS Hardware Architecture
• HP SFS system: servers grouped into cells; 1-32 cells per system; each cell can perform a number of roles
• HP ProLiant DL380 servers: high-bandwidth I/O; paired servers provide redundancy
• HP SFS20 storage enclosure: developed for HP SFS; dual-attached for redundancy; 12 SATA disks (250 GB) for 2 TB of storage per enclosure; 2-8 enclosures per cell
• Linux clients attach to the SFS cells over the interconnect
X-Large HP SFS/SFS20 Configuration: 240 TB usable, 11 GB/s read, 9.2 GB/s write
• Base cabinet: 2 MDSes, 2 OSSes, switches, console; 16 TB usable, 0.7 GB/s read, 0.6 GB/s write (on Elan4)
• Expansion cabinets: 4 OSSes with 4 SFS20s per OSS; per cabinet: 32 TB usable, 1.5 GB/s read, 1.2 GB/s write (Elan4)
• Total depicted: 2 MDSes and 30 OSSes (on Elan4)
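The headline figures follow from the per-cabinet numbers: 30 OSSes means the base cabinet's 2 OSSes plus 7 expansion cabinets of 4 OSSes each. A quick sanity check (the small gaps against the 11 / 9.2 GB/s headline are presumably rounding in the slide):

```python
# Aggregate the X-Large configuration from the per-cabinet figures.
base = {"tb": 16, "read": 0.7, "write": 0.6}          # 2 OSSes
expansion = {"tb": 32, "read": 1.5, "write": 1.2}     # 4 OSSes per cabinet
n_expansion = (30 - 2) // 4                           # 7 expansion cabinets

total_tb = base["tb"] + n_expansion * expansion["tb"]
total_read = base["read"] + n_expansion * expansion["read"]
total_write = base["write"] + n_expansion * expansion["write"]

print(total_tb, round(total_read, 1), round(total_write, 1))  # 240 11.2 9.0
```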
Storage: performance and availability
• SFS20: highly competitive cost/performance; 2 TB per active SFS20, 8 SFS20s per OSS pair; dual host-connected for failover; RAID 5, or RAID 6 (ADG), optionally mirrored
• EVA3000 (EVA4000): enterprise-class storage; dual controller; dual fabric; virtual RAID 5
SFS20 Product Overview
StorageWorks Modular Smart Array 20 (MSA20): low-cost, high-capacity external storage ideal for low-I/O workloads such as reference data, archival, and disk-to-disk backup. A 2U, Serial ATA to U320 SCSI external storage array.
HP SFS V2.1: current, since January 2006
• Dual GigE support (double the bandwidth)
• SFS20: 500 GB disk support
• DL360 G4p server support
• Misc. updates: interconnect versions; clients
• Failover on interconnect failure
• Improved monitoring
HP SFS V.south: Q2 2006
• EVA4000: approximately double the performance compared with the EVA3000; dual-path FC
• Lustre 1.4.6+: quotas; ACLs; improved networking configuration
• Enhanced systems management: Insight Manager integration
• Server software: RHEL4 / 2.6 kernel
HP SFS performance

[Chart: aggregate read and write bandwidth (MB/s, 0-1200) versus number of clients (1-18), with four curves: aggregate read with 2 OSSes, aggregate read with 1 OSS, aggregate write with 2 OSSes, aggregate write with 1 OSS; each OSS has 4 SFS20s]
Elapsed/CPU times for the XXCMD benchmark:
http://www.mscsoftware.com/support/prod_support/nastran/performance/v0109_sngl.cfm
MSC.Nastran V2001.0.9 Serial Test Results

From MSC's web page:

Name    Ndof       Description        SOL  MEM      SCR disk  Total I/O  Comments
XXCMDA  1,584,622  Car body           103  400 MB   31 GB     730 GB     XXCMD with ACMS
XXCMD   1,584,622  Car body           103  800 MB   43 GB     2400 GB    1073 roots
XLTDF   529,027    Car body           108  450 MB   5 GB      209 GB     32 frequency increments
XXAFST  2,490,516  Propeller housing  101  400 MB   10 GB     77 GB      -
XLOOP   486,573    Car body           200  1700 MB  26 GB     1500 GB    3 design cycles, 500+ roots
XLEMF   654,560    Car body           111  400 MB   11 GB     328 GB     Acoustics, 448 + 34 roots
LGQDF   31,125     Cube w/ interior   108  100 MB   0.6 GB    700 GB     76 frequency increments
Preliminary application I/O measurements for SFS: MSC.Nastran XXCMD frequency-response benchmark (medium I/O), SFS versus DAS

[Chart: elapsed time (sec, 0-25,000) versus number of hosts (1, 2, 4, 8, 16, 32) for four cases: SFS with 1 job per host, SFS with 2 jobs per host, MSA (direct-attached) with 1 job per host, MSA with 2 jobs per host]

Data from: Mark Kelly, HP Richardson