
DELL

Understanding the Performance Impact of Non-blocking vs. Blocking InfiniBand Cluster Configurations on Dell™ M-Series Blades

END-TO-END COMPUTING

By Munira Hussain, Vishvesh Sahasrabudhe, Bhavesh Patel (Dell) and Gilad Shainer (Mellanox Technologies)

Dell │ Enterprise Solutions Engineering
www.dell.com/solutions


THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Dell, the Dell logo, and PowerEdge are trademarks of Dell Inc.; Intel and Xeon are registered trademarks and Core is a trademark of Intel Corporation in the U.S. and other countries; ATI is a trademark of AMD; Microsoft and Windows are either trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries; Red Hat and Red Hat Enterprise Linux are registered trademarks of Red Hat, Inc.; SUSE is a registered trademark of Novell, Inc. in the United States and other countries.

Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.

©Copyright 2008 Dell Inc. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.

Information in this document is subject to change without notice.


Table of Contents

Introduction
An Overview of InfiniBand
  The Basics of the InfiniBand Fabric
  The InfiniBand Power Advantage
  The InfiniBand Performance Advantage
  InfiniBand Software Solutions
  InfiniBand's Growing Role in the Data Center
PowerEdge M1000e Architecture
  Midplane Fabric Connections
  I/O Communication Paths in Dell PowerEdge M1000e
  Server Blades with InfiniBand ConnectX Mezzanine Cards
  InfiniBand Configuration
Performance Study and Analysis
  Cluster Test Bed Configurations
    Hardware Configurations
    Software Configurations
  InfiniBand Blocking Configurations
    Fully Non-blocking Configuration
    50% and 75% Blocking
Benchmarking and Analysis: NAS Parallel Benchmark
Summary and Conclusion
References


Introduction

With the launch of the Dell™ PowerEdge™ M-Series modular enclosure, Dell released a new set of switches designed to deliver more value and flexibility than any before them. An increasingly important part of Dell's modular switch lineup, Dell's InfiniBand switch provides a low-latency, high-throughput option for many data centers and high performance computing clusters.

As InfiniBand increases its market presence, the need for InfiniBand module flexibility increases. On the earlier Dell PowerEdge 1955 enclosure, Dell provided an InfiniBand pass-through module, which gave one-to-one throughput from each blade to the external InfiniBand infrastructure. To provide more flexibility and support different types of InfiniBand environments, the Dell M-Series supports an internal InfiniBand switch that provides one external port for every two servers per module.

This whitepaper demonstrates how a one-to-one, non-blocking InfiniBand Architecture (IBA) fabric can be built using Dell's new InfiniBand switch.

An Overview of InfiniBand

As the I/O technology with the largest installed base of 10, 20 and 40 Gb/s ports in the market (over 3 million ports by the end of 2007), InfiniBand has clearly delivered the real-world benefits defined and envisioned by the InfiniBand Trade Association (www.InfiniBandta.org), an industry consortium formed in 1999. Several factors have enabled InfiniBand adoption in data centers and technical compute clusters to ramp quickly, and they explain why it will continue to be the performance computing and storage fabric of choice.

The Basics of the InfiniBand Fabric

InfiniBand fabrics are created with Host Channel Adapters (HCAs) and Target Channel Adapters (TCAs) that fit into servers and storage nodes and are interconnected by switches that tie all nodes together over a high-performance network fabric.

The InfiniBand Architecture is a fabric designed to meet the following needs:

• High-bandwidth, low-latency computing, storage, and management over a single fabric
• Cost-effective silicon and system implementations with an architecture that easily scales from generation to generation
• High reliability, availability, and scalability to tens of thousands of nodes
• Exceptionally efficient utilization of compute processing resources
• An industry-standard ecosystem of cost-effective hardware and software solutions


Figure 1: Typical InfiniBand Architecture

With a true cut-through forwarding architecture and a well defined end-to-end congestion management protocol, InfiniBand defines cost-effective and scalable I/O solutions. Switch silicon devices support from twenty-four 20 Gb/s ports to thirty-six 40 Gb/s InfiniBand ports, which equates to nearly three terabits per second of aggregate switching bandwidth.
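The aggregate switching bandwidth quoted above follows directly from the per-port rates, counting both directions of each full-duplex port. The short Python sketch below (an illustrative helper, not part of the original paper) reproduces the arithmetic.

    def aggregate_switching_bandwidth_gbps(ports: int, port_rate_gbps: float) -> float:
        """Aggregate switching bandwidth, counting both directions of every full-duplex port."""
        return ports * port_rate_gbps * 2

    print(aggregate_switching_bandwidth_gbps(24, 20))  # 960 Gb/s for 24-port 20 Gb/s (4x DDR) silicon
    print(aggregate_switching_bandwidth_gbps(36, 40))  # 2880 Gb/s, i.e. nearly 3 Tb/s, for 36-port 40 Gb/s silicon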

Switches and adapters support up to 16 virtual lanes per link to enable granular segregation and prioritization of traffic classes for delivering Quality of Service (QoS).

InfiniBand also defines an industry-standard implementation of Remote Direct Memory Access (RDMA) protocols and kernel bypass to minimize CPU overhead, allowing computing resources to be used fully for application processing rather than network communication.

InfiniBand is clearly driving the most aggressive performance roadmap of any I/O fabric, while remaining affordable and robust for mass industry adoption.

The InfiniBand Power Advantage

InfiniBand technology not only provides a cost-effective, high-performance interconnect solution, but also a very low power one, at 5W or less per 20Gb/s or 40Gb/s InfiniBand port. Coupled with high performance and the ability to consolidate clustering, networking, and storage, a single InfiniBand adapter can replace multiple legacy clustering, Ethernet, and Fibre Channel adapters to provide significant power savings to the data center. These advantages are making InfiniBand a vital interconnect for server blades.


The InfiniBand Performance Advantage

One of the key reasons that data centers are deploying industry-standard InfiniBand is the total application-level performance the fabric enables. First, InfiniBand is the only shipping solution that supports 20Gb/s and 40Gb/s host connectivity and 60Gb/s and 120Gb/s switch-to-switch links. Second, InfiniBand has world-class application latency, with measured delays of 1μs end to end. Third, InfiniBand enables efficient use of all of the processors and memory in the network by offloading the data transport mechanisms to the adapter card and reducing memory copies. These three metrics combine to make InfiniBand one of the industry's most powerful interconnects.

The performance benefits are echoed in the trends of the Top500.org list that tracks the world's most powerful supercomputers. Published twice a year, this list is increasingly used as an indication of which technologies are emerging in the clustered and supercomputing arena.

InfiniBand Software Solutions

Open-source and community-wide development of interoperable, standards-based Linux and Microsoft® Windows® stacks is managed through the OpenFabrics Alliance. This alliance, consisting of solution providers, end users, and programmers interested in furthering development of the Linux or Windows stacks, has successfully driven InfiniBand support into the Linux kernel and gained WHQL qualification for Microsoft Windows Server®. The inclusion of InfiniBand drivers and upper-layer protocols in the Linux kernel ensures interoperability between different vendor solutions and eases the deployment of InfiniBand fabrics in heterogeneous environments.

From an application point of view, InfiniBand supports a plethora of applications in both enterprise and high-performance computing environments. In the enterprise environment, InfiniBand is being used for grid computing and clustered database applications driven by market leaders. In the commercial high-performance computing field, InfiniBand provides the fabric connecting servers and storage to address a wide range of applications, including oil and gas exploration, automotive crash simulation, digital media creation, fluid dynamics, drug research, weather forecasting, and molecular modeling, to name a few.

InfiniBand's Growing Role in the Data Center

Data centers simultaneously run multiple applications and need to dynamically reallocate compute resources between applications depending on end-user workload. To meet these needs, the network fabric must seamlessly support compute, storage, inter-process communication, and management traffic.

The emergence of virtual and grid computing solutions, in addition to robust software solutions, has set the stage for mass deployment of InfiniBand in business and utility computing environments. Industry-standard InfiniBand has the performance, proven reliability, manageability, and widely available software solutions that make it ready for prime time.


PowerEdge M1000e Architecture

The Dell PowerEdge M1000e Modular Server Enclosure is a breakthrough in enterprise server architecture. The enclosure and its components spring from a revolutionary, ground-up design incorporating the latest advances in power, cooling, I/O, and management technologies. These technologies are packed into a highly available, rack-dense package that integrates into standard Dell and third-party 19” racks.

The PowerEdge M1000e enclosure is 10U high and provides the following features:

• Up to 16 server modules.
• Up to 6 network and storage I/O interconnect modules.
• A high-speed passive midplane that connects the server modules in the front to the power, I/O, and management infrastructure in the rear of the enclosure.
• Comprehensive I/O options that support dual links of 20 gigabits per second today (with 4x DDR InfiniBand), with future support for even higher-bandwidth I/O devices when those technologies become available. This provides high-speed server module connectivity to the network and storage today and well into the future.
• Thorough power management capabilities, including shared power delivery to ensure the full capacity of the power supplies is available to all server modules.
• Robust management capabilities, including private Ethernet, serial, USB, and low-level management connectivity between the Chassis Management Controller (CMC), the keyboard/video/mouse switch, and the server modules.
• Up to two Chassis Management Controllers (one CMC is standard; a second provides optional redundancy) and one optional integrated Keyboard/Video/Mouse (iKVM) switch.
• Up to 6 hot-pluggable, redundant power supplies and 9 hot-pluggable, N+1 redundant fan modules.
• A front control panel featuring an LCD display, two keyboard/mouse USB connections, and one video “crash cart” connection.


Figure 2: Dell PowerEdge M1000e Front View

Midplane Fabric Connections

Dell M1000e blades support three fabrics: Fabric A, B, and C. Fabric A consists of the dual integrated 1Gb Ethernet controllers connected directly from the blade to I/O modules A1 and A2 in the rear of the enclosure. Fabric A is always an Ethernet fabric and is not discussed further in this document.

Fabric B and Fabric C are supported through optional mezzanine cards on separate x8 PCI Express connections. Fabric B and C each support 2 ports, and each port has 4 lanes (1 lane consists of both transmit and receive differential signals) connected from the mezzanine connector to the I/O module, as shown in Figure 3 and Figure 4. The InfiniBand mezzanine card can be installed in either Fabric B or C on the blades.

The Fabric B and C I/O modules receive 16 sets of signals, one set from each blade. The fabric TX and RX differential pairs are the high-speed routing lanes, supporting 1.25 Gb/s to 10.3125 Gb/s per lane.

The fabrics internally support a Bit Error Rate (BER) of 10^-12 or better.
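To connect the per-lane rates above to the port bandwidths quoted elsewhere in this paper, the short sketch below (illustrative helper functions, assuming the 8b/10b line encoding used by SDR/DDR/QDR InfiniBand) computes the 4x DDR signaling and data rates.

    def port_signaling_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
        """Raw signaling rate of an InfiniBand port: lanes x per-lane rate."""
        return lanes * lane_rate_gbps

    def port_data_rate_gbps(signaling_gbps: float, encoding_efficiency: float = 0.8) -> float:
        """Usable data rate after 8b/10b line encoding (80% efficiency for SDR/DDR/QDR)."""
        return signaling_gbps * encoding_efficiency

    ddr_signaling = port_signaling_rate_gbps(lanes=4, lane_rate_gbps=5.0)
    print(ddr_signaling)                       # 20.0 Gb/s -- the 4x DDR signaling rate quoted in this paper
    print(port_data_rate_gbps(ddr_signaling))  # 16.0 Gb/s of usable data bandwidth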


Figure 3: Fabric B and C Midplane Connections (each blade's Fabric B and Fabric C mezzanine connects through the midplane to I/O modules B1/B2 and C1/C2 over 4-lane transmit/receive differential pairs, 16 lines total per module)

I/O Communication Paths in Dell PowerEdge M1000e

The InfiniBand mezzanine cards supported in the blades utilize the dual port Mellanox® ConnectX® chipset. As shown in Figure 4, Port 1 of an InfiniBand mezzanine card in Fabric B communicates with the InfiniBand switch inserted in the B1 I/O slot in the rear of the chassis, and Port 2 communicates with the InfiniBand switch in the B2 I/O slot. A second InfiniBand switch is not required unless it is needed for additional performance or redundancy, as described in this paper.

An InfiniBand mezzanine card in Fabric C has a similar communication path.


Figure 4: Fabric B and C Midplane Connections (each half-height modular server, 1 through 16, carries a Fabric A LOM and Fabric B and C mezzanines on 8-lane PCIe from the MCH/IOH, connecting through the M1000e midplane to the A1/A2 Ethernet I/O modules and the B1/B2 and C1/C2 I/O modules with their external fabric connections)

Server Blades with InfiniBand ConnectX Mezzanine Cards

Dell PowerEdge M600 server blades were used for the performance testing described in this paper. The PowerEdge M600 is based on Intel® processors and chipsets; its specifications are shown in Table 1.

Table 1: PowerEdge M600 Specifications

Server Module: PowerEdge M600
Processor: Intel® Xeon® Dual and Quad Core™; 40W, 65W, 80W, and 120W options
Chipset: Intel 5000P
Memory Slots: 8 fully buffered DIMMs (667 MHz)
Memory Capacity: 64 GB (8GB x 8)
Integrated Ethernet Controllers (Fabric A): 2 x Broadcom 5708S GbE with hardware TCP/IP offload engine and iSCSI firmware boot; optional upgrade to full iSCSI offload available with a license key
Fabric Expansion: Support for up to 2 x 8-lane PCIe mezzanine cards (Fabric B and C in Figure 3 and Figure 4): dual port 4x DDR InfiniBand; dual port 4Gb and 8Gb Fibre Channel; dual port 10GbE (to be available); dual port 1GbE
Baseboard Management: iDRAC with IPMI 2.0 + vMedia + vKVM
Local Storage Controller Options: Serial Advanced Technology Attachment (SATA), chipset-based with no Redundant Array of Independent Disks (RAID) or hot-plug capability; Serial Attached SCSI (SAS) 6/iR (RAID 0/1); Cost Effective RAID Controller (CERC) 6/i (RAID 0/1 with cache)
Local Storage Hard Disk Drives (HDD): 2 x 2.5-inch hot-pluggable SAS or SATA
Video: ATI™ RN50
USB: 2 x USB 2.0 bootable ports on front panel for floppy/CD/DVD/memory
Console: Virtual KVM through iDRAC; IPMI Serial over LAN (SoL) through iDRAC; rear-mounted iKVM switch ports (tierable); front KVM ports on the modular enclosure control panel
Operating Systems: Red Hat® Enterprise Linux® 4/5, SUSE® Linux Enterprise Server 9/10

InfiniBand Configuration

Mellanox ConnectX IB MDI InfiniBand Host Channel Adapter (HCA) mezzanine cards are designed to deliver the low-latency, high-bandwidth performance required by server and storage clustering applications in Enterprise Data Center and High-Performance Computing environments.

Figure 5: Mellanox ConnectX HCA

The M2401G InfiniScale® III InfiniBand switch for the Dell M1000e is used to create reliable, scalable, and easy-to-manage interconnect fabrics for compute, communication, storage, and embedded applications. The switch has 24 ports: 16 internal 4x DDR downlinks and 8 external 4x DDR ports. The M2401G InfiniScale III supports 20Gb/s per 4X port and 60Gb/s per 12X port, delivering 960Gb/s of aggregate bandwidth.

Figure 6: InfiniBand Switch


Performance Study and Analysis

In this section we study the performance impact of the InfiniBand blocking factor on a High Performance Computing cluster of 32 M600 blade nodes. The study consisted of running a synthetic cluster benchmark suite, the NAS Parallel Benchmark (NPB), while varying the InfiniBand configuration from 0% (fully non-blocking) to 75% blocking, as described below. Unless otherwise stated, the results are shown normalized to the 50% blocking configuration, because the 50% configuration is the most natural one that can be created using one switch per chassis with all 8 external 4x DDR ports connected.
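As a minimal sketch of how such normalization works (the helper function and the placeholder numbers below are illustrative only, not the paper's tooling or measured results), each configuration's result is simply divided by the 50% blocking baseline.

    def normalize_to_baseline(results: dict, baseline: str = "50% blocking") -> dict:
        """Express each configuration's result relative to the chosen baseline configuration."""
        base = results[baseline]
        return {config: value / base for config, value in results.items()}

    # Placeholder numbers for illustration only -- not measured results from this study.
    example = {"non-blocking": 1.08, "50% blocking": 1.00, "75% blocking": 0.91}
    print(normalize_to_baseline(example))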

Cluster Test Bed Configurations

Hardware Configurations

The cluster consisted of 32 InfiniBand-connected nodes: two fully populated M1000e enclosures with sixteen blades each. Each blade was configured with quad-core Intel® Xeon® E5450 processors running at 3.00 GHz and 16GB of 667MHz SDRAM (2GB of memory per core within a node). The nodes were configured with the latest available BIOS, version A2.0.2.

Each M600 blade has two PCI Express x8 mezzanine card slots (slot B and slot C). One of these slots was populated with a dual port Mellanox ConnectX mezzanine card operating at Double Data Rate (DDR) speed with a 20Gbps signaling rate. The mezzanine cards ran firmware version 2.3.0.

The HCA ports on each blade connect to the internal InfiniBand switch through the chassis midplane. The InfiniBand switch provides 16 internal links through which the blades connect and 8 external links for outside connection. Hence, when using a single switch within a chassis, InfiniBand traffic to a fabric outside the chassis is 50% blocking, while traffic within the chassis remains non-blocking.
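The 50% figure follows directly from the ratio of external uplinks to internal downlinks on the switch module. The short sketch below (an illustrative helper, not tooling from this study) makes the arithmetic explicit, assuming all ports run at the same rate.

    def blocking_percentage(downlinks: int, uplinks: int) -> float:
        """Share of aggregate downlink bandwidth that cannot leave the switch.

        0% means fully non-blocking (uplink capacity >= downlink capacity),
        assuming all ports run at the same rate.
        """
        return max(0.0, 100.0 * (downlinks - uplinks) / downlinks)

    # A single M2401G per chassis: 16 internal downlinks, 8 external uplinks
    print(blocking_percentage(downlinks=16, uplinks=8))  # 50.0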

Software Configurations

The cluster was deployed with Red Hat Enterprise Linux 4 Update 5, errata kernel 2.6.9-55.0.12.ELsmp. The driver stack used for this study was the Mellanox OpenFabrics Enterprise Distribution (OFED) version 1.3.

The studies were conducted using the NAS Parallel Benchmark, a synthetic cluster benchmark suite. The benchmarks were run with Open MPI 1.2.5, which comes pre-compiled and packaged with the Mellanox OFED 1.3 stack.

InfiniBand Blocking Configurations

Fully Non-blocking Configuration

The first test was to create a fully non-blocking configuration using two InfiniBand switches in each chassis, one in I/O slot B1 and one in I/O slot C1 in the rear of the chassis. Two external 24-port switches were used to create a non-blocking network between the four InfiniBand switch modules in the two chassis. To ensure a non-blocking configuration from the blades to the I/O modules, 8 of the blades in each chassis had their ConnectX HCAs in mezzanine slot B and the other 8 had ConnectX HCAs in mezzanine slot C. This configuration is illustrated in Figure 7.

Of the eight external connections on each switch, four go to each external 24-port switch. This helps to avoid network congestion caused by multiple hops or credit loop scenarios. A non-blocking configuration can also be created by replacing the two 24-port switches with a single InfiniBand Large Port Count (LPC) switch that supports 36 or more ports.

Figure 7: Configuration of a Fully Non-blocking InfiniBand Cluster

50% and 75% Blocking

The second test case was configured as a 50% blocking fabric. It was created by populating mezzanine slot B on all blades, using a single InfiniBand switch in I/O module slot B1 in each chassis, and using one external 24-port switch. All 8 external ports on each internal switch were then connected to the external 24-port switch. InfiniBand traffic is therefore restricted to half the bandwidth when communicating outside the chassis.

The 75% blocking configuration resembles the 50% blocking configuration, except that fewer cables connect to the 24-port switch: only 4 uplinks were used between each internal switch and the external switch. These configurations are shown in Figure 8 and Figure 9.


Figure 8: Configuration of a 50% Blocking InfiniBand Cluster

Figure 9: Configuration of a 75% Blocking InfiniBand Cluster (4 uplink cables from the slot B switch in each chassis connect to the external 24-port SFS 7000D switch)
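As a quick check on how these three fabrics compare, the sketch below applies the same port-counting logic shown under Hardware Configurations (an illustrative helper, assuming all links run at 4x DDR).

    def blocking_percentage(downlinks: int, uplinks: int) -> float:
        """Share of aggregate downlink bandwidth that cannot leave the switch, assuming equal port rates."""
        return max(0.0, 100.0 * (downlinks - uplinks) / downlinks)

    # Fully non-blocking: two switches per chassis, each serving 8 blades over 8 uplinks
    print(blocking_percentage(downlinks=8, uplinks=8))   # 0.0
    # 50% blocking: one switch per chassis, 16 blades, all 8 external ports cabled
    print(blocking_percentage(downlinks=16, uplinks=8))  # 50.0
    # 75% blocking: one switch per chassis, 16 blades, only 4 uplinks cabled
    print(blocking_percentage(downlinks=16, uplinks=4))  # 75.0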

Benchmarking and Analysis: NAS Parallel Benchmark

The NAS Parallel Benchmark (NPB), developed at the NASA Ames Research Center, has been widely used to measure, compare, and understand the characteristics of HPC clusters from both a computational and a communication angle. It is a collection of programs derived from Computational Fluid Dynamics (CFD) codes. Detailed descriptions of the benchmarks that make up NPB can be found at http://www.nas.nasa.gov/Resources/Software/npb.html. For this study, NPB version 3.2 was used, and the Conjugate Gradient (CG), Fourier Transform (FT), Integer Sort (IS), and Multi Grid (MG) benchmarks were studied.



According to Ahmad Faraj and Xin Yuan [1], both CG and FT have a large volume of inter-node communication, IS has a medium volume, and MG has a relatively low volume of inter-node communication. As Figure 10 shows, the performance of the FT benchmark is significantly affected by variations in the blocking factor. The FT benchmark sends a large number of messages in a collective communication pattern, and these messages are greatly affected by the reduced bandwidth across the bottleneck, as seen in the graph.

The CG benchmark also involves a large volume of data communication, but it mainly calls point-to-point routines. The graph suggests that for benchmarks that are both computation and communication intensive, the performance penalty is tolerable between the 0% and 50% blocking configurations.

The IS benchmark shows some change when going from 50% blocking to 75% blocking. This benchmark has significant collective communication and is therefore more sensitive to the blocking factor than the MG benchmark, which has mainly point-to-point communication. The MG benchmark shows no significant impact or degradation in any configuration.

Figure 10: Effect of IB Blocking Factor on Various NAS Benchmarks

Summary and Conclusion

This study examined the impact of the InfiniBand blocking factor on the performance of various applications run on the Dell PowerEdge blade chassis. The results show a significant impact on the performance of applications that are highly communication intensive. The bandwidth-intensive NAS benchmarks with collective communication patterns are significantly affected by the cluster bandwidth restriction caused by the blocking factor. For benchmarks that are both communication and computation intensive, that exchange a small volume of data across nodes, or whose communication pattern is mainly point to point, the blocking factor may be of little importance.

However, it is possible that real-world commercial applications see much less impact from the blocking factor on performance. The application's communication characteristics, as well as the distribution of data between the nodes, govern the performance impact.

It is therefore recommended that application characteristics guide the design of the appropriate InfiniBand fabric. For certain bandwidth- and latency-sensitive applications it is imperative to use a completely non-blocking configuration, as described in the “InfiniBand Blocking Configurations” section. However, based on the results above, a 50% blocking configuration might provide the best price/performance for commercial clusters or clusters with a mix of communication and computation. This configuration also benefits from ease of design and management, since fewer modules, external switches, and cables are used.

References

1. Ahmad Faraj and Xin Yuan, “Communication Characteristics in the NAS Parallel Benchmarks,” ACTA Press (A Scientific and Technical Publishing Company), 2002.

2. Jiuxing Liu, Balasubramanian Chandrasekaran, Jiesheng Wu, Weihang Jiang, Sushmitha Kini, Weikuan Yu, Darius Buntinas, Peter Wyckoff, and D. K. Panda, “Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics,” Proceedings of the 2003 ACM/IEEE Conference on Supercomputing.