RFD tNavigator Performance Benchmarking and Profiling May 2016


RFD tNavigator

Performance Benchmarking and Profiling

May 2016


Note

• The following research was performed under the HPC Advisory Council activities

– Participating vendors: Intel, Dell, Mellanox

– Compute resource - HPC Advisory Council Cluster Center

• The following was done to provide best practices

– tNavigator performance overview

– Understanding tNavigator communication patterns

– Ways to increase tNavigator productivity

– MPI libraries comparisons

• For more info please refer to

– http://www.dell.com

– http://www.intel.com

– http://www.mellanox.com

– http://www.rfdyn.com/technology/


RFD tNavigator

• tNavigator

– Developed by the research and product development teams of Rock Flow Dynamics

– Designed for running dynamic reservoir simulations on engineers’ laptops, servers, and HPC clusters

– Written in C++ and designed from the ground up to run parallel acceleration algorithms on multicore and manycore shared- and distributed-memory computing systems

– Employs Qt graphical libraries, which makes the system truly multiplatform

– By taking advantage of the latest computing technologies such as NUMA, Hyper-Threading, and MPI/SMP hybrids, tNavigator performance far exceeds that of industry-standard dynamic simulation tools

– License pricing does not depend on the number of cores used in shared-memory computing systems

• One distinctive feature is interactive user control of the simulation run

– Users can monitor every step of the reservoir simulation at runtime, and can also interrupt the run and change its configuration with a mouse click


Objectives

• The presented research was done to provide best practices

– tNavigator performance benchmarking

• CPU performance comparison

• MPI library performance comparison

• Interconnect performance comparison

• System generations comparison

• The presented results will demonstrate

– The scalability of the compute environment/application

– Considerations for higher productivity and efficiency


Test Cluster Configuration

• Dell PowerEdge R730 32-node (1024-core) “Thor” cluster

– Dual-Socket 16-Core Intel E5-2697A v4 @ 2.60 GHz CPUs (Power Management in BIOS set to Maximum Performance)

– Memory: 64GB DDR4 2133 MHz; Memory Snoop Mode in BIOS set to Home Snoop; Turbo enabled

– OS: RHEL 6.5, MLNX_OFED_LINUX-3.0-1.0.1 InfiniBand SW stack

– Hard Drives: 2x 1TB 7.2K RPM SATA 2.5” on RAID 1

• Mellanox ConnectX-4 EDR 100Gb/s InfiniBand Adapters

• Mellanox Switch-IB SB7700 36-port 100Gb/s EDR InfiniBand Switch

• Mellanox ConnectX-3 FDR InfiniBand, 10/40GbE Ethernet VPI Adapters

• Mellanox SwitchX-2 SX6036 36-port 56Gb/s FDR InfiniBand / VPI Ethernet Switch

• MPI: Intel MPI 5.1.3

• Application: RFD tNavigator v4.2.3-1177-g638ceab

• Benchmark datasets: SpeedTestModel


PowerEdge R730

Massive flexibility for data intensive operations

• Performance and efficiency

– Intelligent hardware-driven systems management with extensive power management features

– Innovative tools including automation for parts replacement and lifecycle manageability

– Broad choice of networking technologies from GigE to IB

– Built in redundancy with hot plug and swappable PSU, HDDs and fans

• Benefits

– Designed for performance workloads

• From big data analytics, distributed storage, or distributed computing where local storage is key, to classic HPC and large-scale hosting environments

• High performance scale-out compute and low cost dense storage in one package

• Hardware Capabilities

– Flexible compute platform with dense storage capacity

• 2S/2U server, 6 PCIe slots

– Large memory footprint (Up to 768GB / 24 DIMMs)

– High I/O performance and optional storage configurations

• HDD options: 12 x 3.5” - or - 24 x 2.5” + 2 x 2.5” HDDs in rear of server

• Up to 26 HDDs with 2 hot plug drives in rear of server for boot or scratch


RFD tNavigator Performance – Ethernet vs InfiniBand

• InfiniBand delivers better scalability than Ethernet

– EDR InfiniBand provides higher performance and scales better than Ethernet

– EDR InfiniBand delivers 35% to 72% higher performance than 10/40 GbE

– InfiniBand continues to scale at higher node and process counts

(Chart: 32 MPI processes per node; higher is better. Callouts: 63%, 72%, 35%)
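
The percentage callouts on this slide are relative gains over a baseline. A minimal sketch of that arithmetic follows, assuming performance is a higher-is-better rate. The function names and the sample values are hypothetical illustrations, not measurements from this study.

```python
def percent_gain(baseline: float, candidate: float) -> float:
    """Relative gain of candidate over baseline, in percent (higher-is-better metric)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate / baseline - 1.0) * 100.0


def speedup_from_gain(gain_pct: float) -> float:
    """Convert a percent gain back into a plain ratio (for example 72% -> 1.72x)."""
    return 1.0 + gain_pct / 100.0


# Hypothetical throughput numbers, chosen only to illustrate the formula.
ethernet_perf = 100.0
infiniband_perf = 172.0
print(f"gain: {percent_gain(ethernet_perf, infiniband_perf):.0f}%")
print(f"ratio for a 72% gain: {speedup_from_gain(72):.2f}x")
```

Read this way, a gain above 100% means the candidate is more than twice as fast as the baseline.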


RFD tNavigator Performance – Processes Per Node

• Each tNavigator process spawns multiple OpenMP worker threads across CPU cores

– Typical case: launch 1 process per node (PPN=1), then spawn threads to utilize all cores

– This is compared with 2 PPN, where each process spawns threads on its own CPU socket

– Up to a 20% performance gain is seen with 2 PPN

(Chart: higher is better. Callouts: 13%, 20%)
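
The two layouts can be sketched as rank and thread counts for the test cluster (32 nodes, 2 sockets, 16 cores per socket). This is an illustrative sketch rather than the launch procedure used in the study. The helper names, the `./simulator` binary, and `model.data` are hypothetical. The `-n`, `-ppn`, and `-genv` options and the `I_MPI_PIN_DOMAIN` and `OMP_NUM_THREADS` variables are standard Intel MPI and OpenMP settings, but whether tNavigator reads its thread count from `OMP_NUM_THREADS` is an assumption.

```python
import shlex


def hybrid_layout(nodes, sockets_per_node, cores_per_socket, ppn):
    """Return (total MPI ranks, OpenMP threads per rank) for a hybrid run."""
    cores_per_node = sockets_per_node * cores_per_socket
    if cores_per_node % ppn != 0:
        raise ValueError("ppn must divide the cores per node evenly")
    return nodes * ppn, cores_per_node // ppn


def intel_mpi_command(total_ranks, ppn, threads, app_args, pin_to_socket=False):
    """Build an Intel MPI launch line; app_args is a hypothetical application command."""
    cmd = ["mpirun", "-n", str(total_ranks), "-ppn", str(ppn),
           "-genv", "OMP_NUM_THREADS", str(threads)]
    if pin_to_socket:
        # Keep each rank and its threads on one socket (one pinning domain per socket).
        cmd += ["-genv", "I_MPI_PIN_DOMAIN", "socket"]
    return shlex.join(cmd + list(app_args))


# Hypothetical application command line, for illustration only.
app = ["./simulator", "model.data"]

for ppn in (1, 2):
    ranks, threads = hybrid_layout(nodes=32, sockets_per_node=2,
                                   cores_per_socket=16, ppn=ppn)
    print(intel_mpi_command(ranks, ppn, threads, app, pin_to_socket=(ppn == 2)))
```

With 2 PPN each rank drives 16 threads on one socket, so a rank's threads no longer span both sockets. That is one plausible reason for the gain reported on this slide.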


RFD tNavigator Performance – CPU Processor

• The “Broadwell” CPU provides more cores per socket than the “Haswell” family CPU

– The additional 14% of CPU cores translates into a 14% increase in performance

– Haswell: E5-2697 v3 has 14 cores per CPU and typically runs at 2.6 GHz

– Broadwell: E5-2697A v4 has 16 cores per CPU and typically runs at 2.6 GHz

(Chart: higher is better. Callout: 14%)
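
The 14% figure follows directly from the core counts quoted on this slide. A short check of the arithmetic, using only the numbers stated in this deck (the variable names are illustrative):

```python
# Core counts per CPU as stated on the slide.
haswell_cores = 14     # E5-2697 v3
broadwell_cores = 16   # E5-2697A v4

# Relative increase in cores per socket, in percent.
core_increase_pct = (broadwell_cores / haswell_cores - 1) * 100
print(f"core increase: {core_increase_pct:.1f}%")

# Total cores in the 32-node, dual-socket Broadwell test cluster.
cluster_cores = 32 * 2 * broadwell_cores
print(f"cluster cores: {cluster_cores}")
```

Because both CPUs run at the same typical clock, a performance gain that matches the core-count increase suggests near-linear scaling with cores for this benchmark.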


RFD tNavigator Performance – File system

• tNavigator demonstrates the need for a capable parallel file system

– A parallel file system such as Lustre, which supports RDMA transport, can be a good alternative to NFS

– NFS causes performance degradation at scale

– Degradation appears at 4+ nodes and limits scalability at around 8+ nodes

– The RamFS option is not recommended; it is shown only to demonstrate the ideal case in which I/O is not a bottleneck

(Chart: 16 MPI processes per socket; higher is better. Callouts: 29%, 195%)


RFD tNavigator Performance – I/O

• Writing results can affect performance

– Both cases have the results written to shared memory locally on each node

– An 8% performance difference is seen

(Chart: 16 MPI processes per socket; higher is better. Callout: 8%)


RFD tNavigator Profiling – % MPI Communications

• The majority of MPI time is spent in MPI_Allreduce

– Imbalance is seen across some MPI processes in MPI_Allreduce

– MPI_Allreduce accounts for 70% of MPI time (14% of wall time)


RFD tNavigator Profiling – % MPI Communications

• The majority of data transfer messages are medium-sized, except for:

– MPI_Allreduce, which is heavily concentrated (70% MPI, 16% wall) in small message sizes (8, 4, and 128 bytes)

– MPI_Bcast, which is concentrated at 4 bytes (13% MPI, 3% wall)

– MPI_Waitall, whose calls are 0-byte calls (8% MPI, 2% wall)

Higher is better
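
Each pair of percentages above implies an overall share of wall time spent in MPI: if a call takes p% of MPI time and w% of wall time, MPI as a whole takes roughly w / p of wall time. A small sketch of that estimate, using only the percentages quoted on the profiling slides; the helper name is illustrative, and the results are rough because the inputs are rounded:

```python
def implied_mpi_share_of_wall(pct_of_mpi: float, pct_of_wall: float) -> float:
    """Estimate total MPI time as a percent of wall time from one call's two shares."""
    if pct_of_mpi <= 0:
        raise ValueError("pct_of_mpi must be positive")
    return pct_of_wall / pct_of_mpi * 100.0


# (percent of MPI time, percent of wall time) as quoted on the slides.
quoted = {
    "MPI_Allreduce (previous slide)": (70, 14),
    "MPI_Allreduce (this slide)": (70, 16),
    "MPI_Bcast": (13, 3),
    "MPI_Waitall": (8, 2),
}

for name, (mpi_pct, wall_pct) in quoted.items():
    estimate = implied_mpi_share_of_wall(mpi_pct, wall_pct)
    print(f"{name}: MPI is about {estimate:.0f}% of wall time")
```

The estimates fall in a band of roughly 20% to 25% of wall time, so the two different wall-time figures quoted for MPI_Allreduce (14% and 16%) are within what rounding and separate profiling runs could explain.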


RFD tNavigator Summary

• tNavigator integrates the latest technologies, which enable higher performance

– tNavigator uses NUMA, Hyper-Threading, and MPI/SMP hybrids to achieve higher scaling

• tNavigator performs well with the right set of hardware components

– Network: InfiniBand delivers up to 72% higher performance than Ethernet

– CPU: a 14% increase in CPU cores translates directly into a 14% increase in performance

– Running an additional MPI process per node can improve performance by up to 20%

– File system: tNavigator demonstrates the need for a capable parallel file system

• A parallel file system such as Lustre, which supports RDMA transport, can be a good alternative to NFS

• NFS causes performance degradation at scale

• Writing results can affect performance


Thank You

HPC Advisory Council

All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein. HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.