AVERAGES, DISTRIBUTIONS AND SCALABILITY OF MPI COMMUNICATION TIMES FOR ETHERNET AND MYRINET NETWORKS

Nor Asilah Wati Abdul Hamid and Paul Coddington

Presented by:

Ibrahim Saidu GS22854

Kumane Saed GS24433

Cheng Kian Yong GS24460

Luay GS 21605

Lecturer: Dr. Nor Asilah Wati Abdul Hamid

INTRODUCTION

In the past few years, commodity clusters have become the dominant architecture for high performance computing.

Most parallel programs that run on clusters use the Message Passing Interface (MPI) for communicating data between nodes of the cluster.

It is well known that Myrinet with GM has significant advantages over Fast Ethernet with TCP.

In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur.

PROBLEM STATEMENT

• Most modern parallel computers are clusters using Myrinet or Ethernet communication networks.

• Several studies have been published comparing the performance of these two networks for parallel computing. However, these focus on average performance and do not address the distributions of communication times, which can have long tails due to contention effects.

• In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur.

OBJECTIVES

To investigate the effect of retransmit timeouts (RTOs) on Ethernet performance and how much could be gained from reducing the effects of RTOs.

We have analyzed the distributions of communication times for standard MPI routines on Ethernet with TCP and Myrinet with GM communication networks on the same cluster.

We also studied the scalability of the distributions as the number of communicating processes increases.

RELATED WORK

• [4,5,6,7] measure only the average times for point-to-point (ping-pong) communications between two nodes.

• [3] studied the effects of TCP Retransmit Timeouts (RTOs) on MPI communications over Ethernet networks, including collective communications.

• [3,4,5,6] compare network performance using application benchmarks such as the NAS Parallel Benchmarks.

• [3,4] analyzed the effects of tuning Ethernet drivers or TCP configuration to improve MPI performance on Ethernet networks.

RELATED WORK

• [8] used MPIBench to compare the MPI performance (including distributions of communication times) of Ethernet and Myrinet networks, but these were not direct comparisons.

• [9] compared the performance of different Ethernet network topologies in commodity clusters and showed that there were significant problems with the performance of collective communications in MPICH version 1.2.0 on Fast Ethernet networks.

• [11] used a later version of MPICH for collective communication routines, which gives much better performance on Ethernet networks and perhaps reduces the number of RTOs.

METHODOLOGY

IBM eServer 1350 Linux Cluster

IBM eServer 1350 Linux Cluster: Fast Ethernet Architecture

METHODOLOGY: Benchmark

Measurements of MPI communication times were obtained using MPIBench [1,2,8]. All measurements were run with dedicated access to the cluster, so there were no other processes affecting the results.
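
As a rough illustration of why the study looks at distributions and not only averages, the Python sketch below summarizes a list of per-message times. The sample values are synthetic and are not measurements from this study; MPIBench is not used here, and `summarize` is a hypothetical helper introduced only for this example.

```python
import statistics

def summarize(times_ms):
    """Return mean, median and maximum of a list of communication times (ms)."""
    return {
        "mean": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }

# Synthetic sample (NOT measured data): 99 fast messages and one slow outlier.
sample = [1.0] * 99 + [201.0]
stats = summarize(sample)
print(stats)
```

In this made-up sample a single slow message triples the mean while the median stays at the typical value, which is the kind of long-tail effect an average alone would hide.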

Results 1. Send/Receive

Send/Receive (Cont..)

Communication times for Fast Ethernet are about 10 times higher than for Myrinet.

For larger message sizes the difference is primarily due to the difference in bandwidth between the two networks.

For Ethernet there is a jump between 64 and 128 CPUs (32 to 64 nodes), which is due to the communication no longer being between processors connected by a single switch.
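
The role of bandwidth for large messages can be sketched with a simple linear cost model (fixed latency plus message size divided by bandwidth). This model is an assumption made for illustration, not the analysis used in the study, and the latency and bandwidth values below are hypothetical placeholders rather than measured parameters of the cluster.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_mbit_s):
    """Simple linear cost model: fixed latency plus size divided by bandwidth."""
    bits = size_bytes * 8
    # 1 Mbit/s equals 1 bit per microsecond, so bits / (Mbit/s) gives microseconds.
    return latency_us + bits / bandwidth_mbit_s

# Hypothetical parameters (assumptions, not values from the slides).
ETH = {"latency_us": 100.0, "bandwidth_mbit_s": 100.0}
MYR = {"latency_us": 10.0, "bandwidth_mbit_s": 2000.0}

def ratio(size_bytes):
    """Ratio of modelled Ethernet time to modelled Myrinet time."""
    return transfer_time_us(size_bytes, **ETH) / transfer_time_us(size_bytes, **MYR)

ratio_small = ratio(8)          # latency-dominated regime
ratio_large = ratio(1 << 20)    # bandwidth-dominated regime
print(ratio_small, ratio_large)
```

Under these assumed values the ratio for small messages is set by the latencies, while for large messages it approaches the ratio of the two bandwidths.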

Send/Receive (Cont..)

The TCP Retransmit-Timeout (RTO) should, according to the TCP specifications, be given by RTO = SRTT + 4 * RTTVAR.

Here this is the average communication time without RTO (SRTT = 25 ms) plus the 200 ms minimum value for 4 * RTTVAR set by the Linux kernel.

Longer times are presumably caused by communications that suffer 2 or 3 RTOs before finally being completed.
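
The RTO arithmetic above can be worked through in a few lines of Python. This is a minimal sketch: `rto_ms` is a hypothetical helper, the 200 ms floor on the 4 * RTTVAR term is taken from the slide, and the RTTVAR value of 1 ms is an assumed placeholder (any value with 4 * RTTVAR below 200 ms gives the same result).

```python
def rto_ms(srtt_ms, rttvar_ms, min_var_term_ms=200.0):
    """RTO = SRTT + 4 * RTTVAR, with the 4 * RTTVAR term floored at a minimum."""
    return srtt_ms + max(4.0 * rttvar_ms, min_var_term_ms)

# SRTT = 25 ms is from the slide; rttvar_ms = 1.0 is an assumed placeholder.
single_rto = rto_ms(srtt_ms=25.0, rttvar_ms=1.0)
print(single_rto)  # 225.0
```

With these inputs a single retransmit timeout corresponds to 25 ms + 200 ms = 225 ms.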

2. Combined Send and Receive

Combined Send/Receive (Cont..)

Results are approximately a factor of 2 larger than the MPI_Send/MPI_Recv results.

The results indicate that the duplex capability of these networks is not being utilized.

3. Barrier

Barrier (Cont…)

The big jump in the Ethernet result is probably due to a different algorithm being used in the MPICH 1.2.6 code.

Ethernet is approximately 4-5 times slower than Myrinet.

4. Broadcast

Broadcast (Cont…)

When communication goes through a single Ethernet switch, rather than between switches, there are no RTOs for broadcast.

Myrinet distributions have quite long tails, which are caused by a small number of repetitions of the benchmark.

5. Alltoall

Alltoall (Cont…)

The average completion time for Myrinet increases gradually with message size and number of processes.

Ethernet performance for more than 32 CPUs shows the effect of Retransmit-Timeouts.

6. Conclusions

As expected, the Myrinet network performs significantly better than Fast Ethernet.

The TCP RTO on the Ethernet network does affect communications performance, but only for large message sizes and large numbers of processors, where the network becomes saturated.

The effects are much less serious than in previous measurements.

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

Thank you