HPC SOLUTIONS AT HP
Andrei Balcu, Technical Consultant, HP Romania
HPC is built on a converged infrastructure
• HPC Fabrics
• Datacenter Power & Cooling
• HPC Software Infrastructure
• Purpose Built HPC Servers
• Purpose Built HPC Storage
SERVERS IN HPC
CPUs: what’s new?
Roadmap 2010 to 2012

AMD Opteron
• Opteron 6100 series (G6/G7): Magny-Cours, 12 cores, 12 MB L3, 4 memory channels
• Opteron 4000 series (G6/G7): Lisbon, 6 cores, 6 MB L3, 2 memory channels
• Opteron 6200 series (G7 and Gen8): Interlagos, Bulldozer core, 16 cores, 4 memory channels
• Opteron 4000 series (G7 and Gen8): Valencia, Bulldozer core, 8 cores, 2 memory channels

Intel Xeon
• Xeon 5600 series (G6/G7): Westmere-EP, 6 cores, 12 MB L3, 130/95/80/60/40 W, 1333 MHz DDR3, 3 memory channels
• Xeon 6500 & 7500 series: Nehalem-EX, 8 cores, 24 MB L3, up to 130 W
• Westmere-EX (G7): 10 cores, 30 MB L3, up to 130 W
• Xeon E5-2600 series (Gen8): Sandy Bridge, 8 cores, 20 MB L3, 1600+ DDR3, Sockets R and B2, 4 memory channels
Next generation servers
Intel® Westmere-EP vs. Intel® Sandy Bridge-EP/EN
Feature | Westmere-EP | E5-2600 (EP) | E5-2400 (EN)
Cores | Up to 6 cores / 12 threads | Up to 8 cores / 16 threads | Up to 8 cores / 16 threads
Cache size | 12 MB | Up to 20 MB | Up to 20 MB
Max memory channels per socket | 3 | 4 | 3
Max memory speed | 1333 MHz | 1600 MHz | 1600 MHz
New instructions | AES-NI | Adds AVX | Adds AVX
QPI frequency | 6.4 GT/s | Up to 8.0 GT/s | Up to 8.0 GT/s
Inter-socket QPI links | 1 | 2 | 1
PCI Express | 36 lanes PCIe2* on chipset | 40 lanes/socket, integrated PCIe3 | 24 lanes/socket, integrated PCIe3
TDP | Server/Workstation: 130 W, 95 W, 80 W, LV (low power) | 150 W (workstation only), 130, 115, 95, 80, 70, 60 W (low power) | 95, 80, 70, 60, 50 W (low power)
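The channel counts and memory speeds in the comparison above can be turned into a rough peak bandwidth figure per socket. The sketch below assumes a 64-bit (8-byte) DDR3 channel and assumes the 1600 MHz figure applies to both E5 families; the function name is illustrative, and real sustained bandwidth will be lower than this theoretical peak.

```python
# Peak theoretical DDR3 bandwidth per socket (a sketch, not a benchmark).
# Assumption: each DDR3 channel is 64 bits wide, i.e. 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def peak_memory_bandwidth_gbs(channels: int, speed_mts: int) -> float:
    """Return channels * transfers/s * 8 bytes, in GB/s (1 GB = 10**9 bytes)."""
    return channels * speed_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


# (memory channels per socket, max memory speed in MT/s) from the table above.
platforms = {
    "Westmere-EP": (3, 1333),
    "E5-2600 (EP)": (4, 1600),
    "E5-2400 (EN)": (3, 1600),  # assumes 1600 MHz applies to EN as well
}

for name, (channels, speed) in platforms.items():
    bandwidth = peak_memory_bandwidth_gbs(channels, speed)
    print(f"{name}: {bandwidth:.1f} GB/s per socket")
```

Under these assumptions the extra channel and the higher memory speed together give the E5-2600 roughly 60% more peak memory bandwidth per socket than Westmere-EP.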
HP ProLiant Gen8 Marquee Features
Innovation beyond industry standards
• HP FlexNet Adapters
• HP Smart Storage
• Insight Online
• iLO Management Engine
• ProLiant System Architecture
• Sea of Sensors 3D
iLO Management Engine
Cloud-enabled embedded management throughout all ProLiant Gen8 platforms
Core lifecycle management functions built in for instant availability
• Intelligent Provisioning: ready to deploy and update without the need for HP discs or downloads
• Agentless Management: base hardware health monitoring and alerting without OS agents
• Active Health System: continuously running diagnostics to minimize downtime
• Remote Support: built-in phone-home function to ease setup and configuration
HP FlexLOM - Grow Your Environment Without Complexity
Change ready for future proofing and adaptable infrastructure
Provides choice
• Upgrade options of 1Gb and 10Gb
Choose your fabric
• Ethernet, FlexFabric, Flex-10, InfiniBand
Universal
• Available on all BL, SL and select DL servers
Flexible
• Supports a shared iLO port like the traditional LOM (1)
(1) LOM is short for LAN on motherboard. The term refers to a chip or chipset capable of network connections that has been embedded directly on the motherboard of a server.
Gen8 Smart Array Innovations
Increased performance, data availability and storage capacity
Faster access to data
• Up to 2X performance improvement*
• 2X cache (up to 2 GB)
Address explosive data growth
• 2X drives supported (up to 227)
Minimize data loss
• Long-term data retention with Flash Backed Write Cache as standard
Reduce initial setup time
• 95% reduction in parity initialization, from several days to 5 hours**
* 256 KiB sequential write, RAID 5 with 15K SAS drives; performance will vary based on configuration
** HP R&D; validation information TBD
Over 5 million SAS Smart Array controllers sold! Continuing the legacy of innovation with Gen8
Lower power, faster and more reliable
HP SmartMemory
• 15 - 20% less power than 3rd party memory at 3DPC for DDR3-1333 1.35V RDIMM and DDR3-1333 LRDIMM
• 25% greater throughput at either 1DPC or 2DPC versus 3rd party memory for DDR3-1333 UDIMM
• Genuine HP Qualified memory reliability assured by unique electronic signature
Workload optimized, engineered for any demand
Industry’s most complete portfolio for HPC
ProLiant DL Family
Versatile, rack-optimized servers with a balance of efficiency, performance and management
ProLiant BL Family
Cloud-ready converged infrastructure engineered to maximize every hour, watt and dollar
ProLiant SL Family
Purpose built for the world’s most extreme data centers
HP ProLiant BL400c Series Positioning
• ProLiant BL460c Gen8: the world's leading server blade
• ProLiant BL465c Gen8: the first server blade to deliver over 2,000 cores per rack
• ProLiant BL420c Gen8: breakthrough server blade economics for essential enterprise workloads
HP ProLiant BL460c Gen8 Overview
• As the world's leading server blade, the ProLiant BL460c Gen8 offers the ideal balance of performance, scalability, and expandability.
• This makes it ideal for:
  • Heterogeneous datacenters and a wide variety of mainstream businesses
  • HPC scale-out applications for small, medium, and enterprise data centers
• Key workloads include:
  • Virtualization/consolidation
  • IT infrastructure (file & print, networking, security, systems management, etc.)
  • Web infrastructure (web serving, streaming media, etc.)
  • Collaborative (e-mail, workgroup, etc.)
HP ProLiant BL420c Gen8 Overview
• The BL420c Gen8 delivers breakthrough server blade economics for essential enterprise workloads. It provides the perfect balance of price, performance, and high availability in the enterprise space.
• This makes it ideal for:
  • Mid-market and cost-sensitive enterprise customers
  • Service providers who prefer the manageability of blades
  • Scale-out
• Key workloads include:
  • Web hosting/services in the enterprise space
  • Single application on a single server
  • IT infrastructure (file & print, networking, security & systems management)
Feature | BL420c Gen8 | BL460c Gen8
Processor | Intel® Xeon® E5-2400 Series | Intel® Xeon® E5-2600 Series
Memory | (12) DDR3, RDIMM/UDIMM, up to 1333 MHz | (16) DDR3, RDIMM/UDIMM/LRDIMM/LVDIMM
Max memory | 384 GB (12 DIMMs x 32 GB) | 512 GB (16 DIMMs x 32 GB)
Internal storage | 2 SFF HP HDD (SAS, SATA, SSD); Dynamic Smart Array B320i RAID controller | 2 SFF HP HDD (SAS, SATA, SSD); Smart Array P220i controller
Common to both models:
• Chipset: Intel® C600
• Max internal storage: 2 TB SAS; 2 TB SATA; 1.6 TB SSD
• Networking: (1) dual port networking daughter card: 1GbE, 10GbE, Flex-10, or FlexFabric
• I/O slots: (2) PCIe Gen3: (1) x8 Type A mezzanine; (1) x16 Type B mezzanine
• Integrated management: HP iLO Management Engine, SIM, IRS. Optional: HP Insight Control, iLO Advanced
• Form factor: half-height c-Class server blade; 16 blades per c7000 (10U) enclosure; 8 blades per c3000 (6U) enclosure
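The enclosure figures above make it possible to sanity-check the "over 2,000 cores per rack" positioning claim. The sketch below is an illustration under stated assumptions that do not come from the slides: a 42U rack filled only with c7000 enclosures, two sockets per blade, and 16-core processors for the BL465c Gen8 (8-core E5-2600 for the BL460c Gen8). The function name is hypothetical.

```python
def cores_per_rack(rack_units: int, enclosure_units: int,
                   blades_per_enclosure: int, sockets_per_blade: int,
                   cores_per_socket: int) -> int:
    """Count cores in a rack filled with whole enclosures only."""
    enclosures = rack_units // enclosure_units  # partial enclosures do not fit
    return (enclosures * blades_per_enclosure
            * sockets_per_blade * cores_per_socket)


# Assumptions (not stated in the slides): 42U rack, 2 sockets per blade.
RACK_UNITS = 42
C7000_UNITS = 10        # from the form factor row: c7000 is 10U
BLADES_PER_C7000 = 16   # from the form factor row: 16 half-height blades

# BL460c Gen8 with 8-core E5-2600 processors.
bl460c = cores_per_rack(RACK_UNITS, C7000_UNITS, BLADES_PER_C7000, 2, 8)
# BL465c Gen8, assuming 16-core processors.
bl465c = cores_per_rack(RACK_UNITS, C7000_UNITS, BLADES_PER_C7000, 2, 16)

print(f"BL460c Gen8: {bl460c} cores per rack")  # 4 * 16 * 2 * 8  = 1024
print(f"BL465c Gen8: {bl465c} cores per rack")  # 4 * 16 * 2 * 16 = 2048
```

With four enclosures per rack, the 16-core assumption lands just above 2,000 cores, which is consistent with the BL465c positioning statement.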
Integrated accelerator solutions for the SL200s family
• Next generation NVIDIA Tesla performance
  • Up to 30% higher performance with M2090, combined computation and visualization with M2070Q
• Optional HP PCIe IO Accelerator
  • Integrated solid state storage device to accelerate I/O bound applications
• Future: Intel® Many Integrated Core (MIC)
  • Accelerate highly parallel applications, using the standard IA instruction set
Driving new levels of performance/$/watt/ft2
• Shared power & fans for reduced component quantity and increased power efficiency
• Ability to mix and match SL half-width nodes
• Front cabling for increased rear air-flow and ease of serviceability
• Individually serviceable nodes
* Requires 1200 mm deep racks
• SL230 – Socket-R, ultra-dense server for virtualization and HPC applications (1U)
• SL250 – Socket-R, hybrid-compute node for GPU computing and database applications in HPC (2U)
• SL270 – Socket-R, high-performance GPU solution, optimized for extreme GPU density (4U)
• SL140 – Socket-B, cost-effective, power-efficient and ultra-dense solution (1U)
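SL140s Gen8, SL230s Gen8, SL250s Gen8, SL270s Gen8
Processor
• SL140s Gen8: E5-2400, 4/6/8 cores
• SL230s, SL250s and SL270s Gen8: E5-2600, 4/6/8 cores
Chipset
• All models: Intel® C600
Memory
• SL140s Gen8: 12x DDR3, RDIMM/UDIMM, up to 1333 MHz, ECC
• SL230s, SL250s and SL270s Gen8: 16x DDR3, RDIMM/UDIMM, up to 1600 MHz, ECC
Max memory
• SL140s Gen8: 256 GB
• SL230s, SL250s and SL270s Gen8: 512 GB
Internal storage
• SL140s Gen8: 2 LFF NHP; 4 SFF NHP; optional: 2 SFF HP
• SL230s Gen8: 2 LFF NHP; 4 SFF NHP; optional: 2 SFF HP
• SL250s Gen8: 4 SFF HP; 2 LFF NHP
• SL270s Gen8: 8 SFF HP
Max internal storage
• SL140s Gen8: 4 TB 3.5” SAS; 1.2 TB 2.5” SAS; 6 TB SATA; 480 GB 2.5” SSD
• SL230s Gen8: 4 TB 3.5” SAS; 1.2 TB 2.5” SAS; 6 TB 3.5” SATA; 2 TB 2.5” SATA; 480 GB 2.5” SSD
• SL250s Gen8: 2 TB 2.5” hot plug SAS; 1.2 TB 2.5” non-hot plug SAS; 2 TB 2.5” hot plug SATA; 2 TB 2.5” SATA; 480 GB 2.5” SSD
• SL270s Gen8: 4 TB SAS; 4 TB SATA; 960 GB SSD
Networking
• SL140s Gen8: 1x integrated NC366i Dual Port Gigabit Server Adapter
• SL230s, SL250s and SL270s Gen8: 1x integrated NC366i dual port GbE; 1x dual port networking daughter card: QDR IB, 10GbE
I/O slots
• SL140s Gen8: 1x PCIe Gen3: 1 x16 HL/LP
• SL230s Gen8: 1x PCIe Gen3: 1 x16 HL/LP
• SL250s Gen8: 4x PCIe Gen3: 1 x8 HL/LP; 3 x16 HL/LP
• SL270s Gen8: 9x PCIe Gen3: 1 x8 HL/LP; 8 x16 HL/LP
Integrated management
• All models: HP iLO Management Engine, SIM, IRS. Optional: HP Insight Control, iLO Advanced
Form factor
• SL140s Gen8: 1U half width, 8 trays per s6500 (4U)
• SL230s Gen8: 1U half width, 8 trays per s6500 (4U)
• SL250s Gen8: 2U half width, 4 trays per s6500 (4U)
• SL270s Gen8: 4U half width, 2 trays per s6500 (4U)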
HP ProLiant SL250s Gen8 2U Half Width Tray
• 16 DIMM slots (below GPU tray)
• 2 Socket-R CPUs (below GPU tray)
• PCIe expansion slot
• Flex Fabric slot
• Management port: iLO4
• 2 x 1GbE ports
• Rear GPU or NHP HDDs
• 2 GPU tray
• 4 HP SFF
Per 4U chassis: 4 nodes, 8 CPUs, 12 GPUs
INTERCONNECTS IN HPC
HPC Interconnects
• Bandwidth (large data exchanges)
• Latency (microseconds)
• Scalability: stay efficient even for a high number of links
• Can also accommodate I/O traffic
• Two HPC interconnects:
  • Ethernet (1 GigE, 10 GigE, 40 GigE)
  • InfiniBand (IBTA specification)
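Why both bandwidth and latency matter can be illustrated with the common first-order model: transfer time = latency + message size / bandwidth. The sketch below uses that model only; the two link profiles are hypothetical round numbers chosen for illustration, not measurements of any Ethernet or InfiniBand product.

```python
def transfer_time_us(message_bytes: int, latency_us: float,
                     bandwidth_gbit_s: float) -> float:
    """First-order model: fixed latency plus serialization time, in microseconds."""
    # bits / (Gbit/s) gives nanoseconds; dividing by 1e3 more gives microseconds.
    serialization_us = message_bytes * 8 / (bandwidth_gbit_s * 1e3)
    return latency_us + serialization_us


# Hypothetical link profiles (illustrative values only).
links = {
    "high-latency 10 Gbit/s link": {"latency_us": 20.0, "bandwidth_gbit_s": 10.0},
    "low-latency 40 Gbit/s link": {"latency_us": 2.0, "bandwidth_gbit_s": 40.0},
}

for size in (64, 4096, 1_000_000):  # small, medium and large messages, in bytes
    for name, link in links.items():
        t = transfer_time_us(size, **link)
        print(f"{size:>9} B over {name}: {t:.2f} us")
```

In this model small messages are dominated by the latency term and large messages by the bandwidth term, which is why an HPC interconnect is judged on both.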
HP InfiniBand strategy
• Focus on partnership
  • Work with technology providers
• Focus on qualification, integration, efficient supply chain
  • Rigorous quality testing and control
  • Efficient supply chain management
• IB products have one basic element: the ASIC. There are 2 providers: Mellanox and QLogic.
• HP integrates IB switches from 2 providers: Mellanox and QLogic (there used to be 3, with Voltaire).
• We run benchmarks and tests on all HCAs and components.
• We qualify HCAs, switches and cables on our platforms.
• We verify the interoperability of Mellanox and QLogic products.
HP 56Gbps FDR InfiniBand Portfolio
• ConnectX-3 HCAs in servers
• QSFP FDR cables
• FDR 36-port edge switch
• In 2012: FDR chassis aggregation switches
• Installed-base QDR switches, e.g. 4036E
• Unified Fabric Manager (UFM)
• Acceleration software
• HP Systems Integration
STORAGE IN HPC
Mix HP Storage in HPC cluster
• HP X9000 Network Storage System
  • Small files
  • High metadata operation rates
  • Wide access
  • /home typically
• Lustre file system with optimized HPC-focused hardware
  • Extreme sequential bandwidth
  • “True” parallel I/O: several writers to the same file
  • Or high single-stream throughput
  • /scratch, /work typically
HP Storage (X9000, P4000, MDS600) | Lustre / DDN SFA10K
Many applications, or instances of the same application | “One” parallelized application
Each one of many servers runs a single application instance, up to one instance per core or VM | Parallelized applications are spread across multiple servers and may use MPI to communicate
Each instance has its own file/data set | Reading and writing to a single file
Many metadata operations (IOPS) | Few metadata operations (IOPS)
Metadata is distributed across multiple servers | A single server for metadata is enough
Datasets are distributed across multiple servers to balance performance | Dataset is striped across multiple storage servers, for maximum read/write bandwidth
Typical applications: HLS (next generation sequencing, biosciences/genomics NGS), media (animation render farms), public sector (content depots), financial services | Typical applications: computer-aided engineering, molecular modeling, high-energy physics
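Striping a dataset across storage servers can be pictured as a round-robin mapping from file offsets to servers. The sketch below shows only that general idea; it is not the Lustre API or its actual layout logic, and the function names, stripe size and server count are hypothetical.

```python
def locate(offset: int, stripe_size: int, num_servers: int) -> tuple[int, int, int]:
    """Map a byte offset to (server index, stripe index, offset inside the stripe)."""
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers  # round-robin placement of stripes
    return server, stripe_index, offset % stripe_size


def bytes_per_server(file_size: int, stripe_size: int, num_servers: int) -> list[int]:
    """Return how many bytes of a file each server holds under round-robin striping."""
    totals = [0] * num_servers
    offset = 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)  # last stripe may be short
        totals[(offset // stripe_size) % num_servers] += chunk
        offset += chunk
    return totals


# Hypothetical layout: 1 MiB stripes over 4 storage servers, 10 MiB file.
MIB = 1024 * 1024
print(locate(5 * MIB + 100, MIB, 4))      # which server holds this offset
print(bytes_per_server(10 * MIB, MIB, 4))  # load per server, in bytes
```

Because consecutive stripes land on different servers, a large sequential read or write is served by all of them at once, which is the source of the aggregate bandwidth described in the right-hand column.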
MANAGEMENT SW IN HPC
CMU : Cluster Management Utility
Insight CMU v7.0 (February 2012)
Hyperscale cluster lifecycle management software
• Proven: 10+ years in deployment, including Top500 sites with thousands of nodes
• Built for Linux, with support for multiple Linux distributions
  • Including hybrid support with Windows
HP Insight CMU
Provision
• Simplified discovery, firmware audits
• Fast and scalable cloning
Monitor
• ‘At a glance’ view of entire system; zoom to component
• Customizable
• Lightweight
Control
• GUI and CLI options
• Easy, frictionless control of remote servers
April 2009
Worldwide CMU Deployments
HP ships 2 CMU clusters per week worldwide
UNIVERSITIES
GOVERNMENT and RESEARCH LABS
ENGINEERING
ENERGY
CMU main functionalities
Deployment
• Imaging (cloning)
• Autoinstall (kickstart | autoyast | preseed)
• Diskless
Scalable live monitoring
• Scalable non-intrusive monitoring engine (+ collectl)
• Monitoring GUI / monitoring API
Day-to-day administration
• Interactive CLI (+ cmu_* Linux commands)
• cmudiff, command broadcast
• Multiple window broadcast (one window per host)
• Single window PDSH, one command on all the hosts
• GUI (Java-based, for the desktop)
Time View
CMU Backup / Cloning Feature
Needs:
• Setting up a cluster is painful.
• System management of HPC clusters is difficult due to the large number of nodes.
Cloning goals:
• Avoid ‘one by one’ system installation on compute nodes
• Fast cluster installation with an optimised cloning mechanism
How:
• Install one compute node
• Back up that compute node (golden image)
• Duplicate that golden image to all compute nodes
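To see why the duplication step benefits from an optimised mechanism, it helps to count copy rounds. The slides do not describe how CMU distributes the image, so the sketch below compares two hypothetical schemes: a single image server copying to one node per round, and a scheme where every machine that already holds the image forwards it to one more node per round.

```python
def sequential_rounds(num_nodes: int) -> int:
    """One image server copies the golden image to one node per round."""
    return num_nodes


def doubling_rounds(num_nodes: int) -> int:
    """Every holder of the image (server included) clones one new node per round."""
    holders = 1  # the image server holds the golden image at the start
    rounds = 0
    while holders - 1 < num_nodes:  # holders - 1 = compute nodes cloned so far
        holders *= 2
        rounds += 1
    return rounds


# Hypothetical cluster sizes, for illustration only.
for nodes in (16, 256, 1000, 4096):
    print(f"{nodes:>5} nodes: {sequential_rounds(nodes):>5} sequential rounds, "
          f"{doubling_rounds(nodes):>2} doubling rounds")
```

Under the doubling assumption the number of rounds grows with the logarithm of the node count, so a thousand-node cluster needs about ten rounds instead of a thousand.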
Diskless Installation
• Large-scale diskless support
  • When diskless nodes are installed, the compute nodes' file system runs entirely over NFS, while the OS is loaded in RAM.
  • Existing NFS-root based diskless support has been expanded to allow for multiple NFS servers
  • Up to 4k diskless compute nodes
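Spreading diskless nodes over several NFS servers keeps the load on each server bounded. The sketch below shows one simple way to do that, a round-robin assignment; it is not CMU's actual placement logic, and the node names, server names and server count are hypothetical.

```python
def assign_nfs_servers(nodes: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Assign each diskless node to an NFS server in round-robin order."""
    if not servers:
        raise ValueError("at least one NFS server is required")
    assignment: dict[str, list[str]] = {server: [] for server in servers}
    for index, node in enumerate(nodes):
        assignment[servers[index % len(servers)]].append(node)
    return assignment


# Hypothetical cluster: 4096 diskless nodes served by 8 NFS servers.
nodes = [f"node{n:04d}" for n in range(4096)]
servers = [f"nfs{n}" for n in range(8)]

assignment = assign_nfs_servers(nodes, servers)
for server, clients in assignment.items():
    print(f"{server}: {len(clients)} diskless clients")
```

With a single NFS server all 4096 nodes would mount their root file system from one machine; with eight servers in this example each one serves 512 clients.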
CMU GPGPU Support
• CMU provides a new binary for extracting GPU metric data from the GPU driver
  • /opt/cmu/tools/cmu_get_nvidia_gpu
• New command cmu_config_nvidia to configure GPU monitoring
  • Configures load, mem_util, mem_alloc, power_state, and ECC_double_bit alerts by default
  • Power_usage, various clock speeds, fan speeds, and temperature are also configured but commented out by default
THANK YOU