AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012: AMBER 12 GPU Support Revision 12.1, 12/21/2012

AMBER Molecular Dynamics on GPU

DESCRIPTION

Benchmarks showing benefits of running AMBER Molecular Dynamics Application on GPUs

  • 1. AMBER 12 GPU Support Revision 12.1, 12/21/2012

2. Benefits of GPU AMBER Accelerated Computing
   - Faster than CPU-only systems in all tests
   - Most major compute-intensive aspects of classical MD ported
   - Large performance boost with marginal price increase
   - Energy usage cut by more than half
   - GPUs scale well within a node and over multiple nodes
   - K20 GPU is our fastest and lowest power high performance GPU yet
   - Try GPU accelerated AMBER for free: www.nvidia.com/GPUTestDrive

3. Protein Folding Simulation With AMBER Accelerated By GPUs
   4.7x faster
   - 4 core CPU: 80 ns/day
   - 4 core CPU + Tesla M2070 GPU: 367 ns/day
   Data courtesy of AMBER.org

4. Kepler - Our Fastest Family of GPUs Yet
   Factor IX, running AMBER 12 GPU Support Revision 12.1.
   The blue node contains dual E5-2687W CPUs (8 cores per CPU).
   The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and either 1x NVIDIA M2090, 1x K10 or 1x K20 for the GPU.
   [Bar chart: Nanoseconds / Day, Factor IX]
   - 1 CPU Node:          3.42
   - 1 CPU Node + M2090: 11.85 (3.5x)
   - 1 CPU Node + K10:   18.90 (5.6x)
   - 1 CPU Node + K20:   22.44 (6.6x)
   - 1 CPU Node + K20X:  25.39 (7.4x)
   GPU speedup/throughput increased from 3.5x (with M2090) to 7.4x (with K20X) when compared to a CPU-only node.

5. K10 Accelerates Simulations of All Sizes
   Running AMBER 12 GPU Support Revision 12.1.
   The blue node contains dual E5-2687W CPUs (8 cores per CPU).
   The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and 1x NVIDIA K10 GPU.
   [Bar chart: Speedup Compared to CPU Only]
   Benchmarks: CPU (All Molecules), TRPcage (GB), JAC NVE (PME), Factor IX NVE (PME), Cellulose NVE (PME), Myoglobin (GB), Nucleosome (GB)
   Bar values as extracted: 24.00, 19.98, 5.50, 5.53, 5.04, 2.00
   Gain 24x performance (Nucleosome) by adding just 1 GPU when compared to dual CPU performance.

6. Run AMBER 28x Faster With Tesla K20 GPUs
   Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off.
   The blue node contains 2x Intel E5-2687W CPUs (8 cores per CPU).
   Each green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 1x NVIDIA K20 GPU.
   [Bar chart: Speedup Compared to CPU Only]
   Benchmarks: CPU (All Molecules), TRPcage (GB), JAC NVE (PME), Factor IX NVE (PME), Cellulose NVE (PME), Myoglobin (GB), Nucleosome (GB)
   Bar values as extracted: 28.00, 25.56, 7.28, 6.50, 6.56, 2.66, 1.00
   Gain 28x throughput/performance (Nucleosome) by adding just one K20 GPU when compared to dual CPU performance.

7. K20X Accelerates Simulations of All Sizes
   Running AMBER 12 GPU Support Revision 12.1.
   The blue node contains dual E5-2687W CPUs (8 cores per CPU).
   The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and 1x NVIDIA K20X GPU.
   [Bar chart: Speedup Compared to CPU Only]
   Benchmarks: CPU (All Molecules), TRPcage (GB), JAC NVE (PME), Factor IX NVE (PME), Cellulose NVE (PME), Myoglobin (GB), Nucleosome (GB)
   Bar values as extracted: 31.30, 28.59, 8.30, 7.15, 7.43, 2.79
   Gain 31x performance (Nucleosome) by adding just one K20X GPU when compared to dual CPU performance.

8. K10 Strong Scaling over Nodes
   Cellulose 408K Atoms (NPT), running AMBER 12 with CUDA 4.2, ECC off.
   The blue nodes contain 2x Intel X5670 CPUs (6 cores per CPU).
   The green nodes contain 2x Intel X5670 CPUs (6 cores per CPU) plus 2x NVIDIA K10 GPUs.
   [Chart: Nanoseconds / Day versus Number of Nodes (1, 2, 4); series: CPU Only, With GPU; speedup labels as extracted: 2.4x, 3.6x, 5.1x]
   GPUs significantly outperform CPUs while scaling over multiple nodes.

9. Kepler Universally Faster
   Running AMBER 12 GPU Support Revision 12.1.
   The CPU-only node contains dual E5-2687W CPUs (8 cores per CPU).
   The Kepler nodes contain dual E5-2687W CPUs (8 cores per CPU) and 1x NVIDIA K10, K20, or K20X GPU.
   [Bar chart: Speedups Compared to CPU Only; series: JAC, Factor IX, Cellulose; groups: CPU Only, CPU + K10, CPU + K20, CPU + K20X]
   The Kepler GPUs accelerated all simulations, up to 8x.

10. K10 Extreme Performance
   JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1.
   The blue node contains dual E5-2687W CPUs (8 cores per CPU).
   The green node contains dual E5-2687W CPUs (8 cores per CPU) and 2x NVIDIA K10 GPUs.
   [Bar chart: Nanoseconds / Day, DHFR]
   - 1 node, CPU only:    12.47
   - 1 node with 2x K10:  97.99
   Gain 7.8x performance by adding just 2 GPUs when compared to dual CPU performance.

11. K20 Extreme Performance
   DHFR JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off.
   The blue node contains 2x Intel E5-2687W CPUs (8 cores per CPU).
   Each green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 2x NVIDIA K20 GPUs.
   [Bar chart: Nanoseconds / Day, DHFR]
   - 1 node, CPU only:    12.47
   - 1 node with 2x K20:  95.59
   Gain > 7.5x throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance.

12. Replace 8 Nodes with 1 K20 GPU
   DHFR, running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off.
   The eight (8) blue nodes each contain 2x Intel E5-2687W CPUs (8 cores, $2000 per CPU).
   Each green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 1x NVIDIA K20 GPU ($2500 per GPU).
   [Bar chart: Nanoseconds/Day and Cost]
   - 8 CPU-only nodes:  65.00 ns/day, $32,000.00
   - 1 node with K20:   81.09 ns/day, $6,500.00
   Cut simulation costs from $32,000 to $6,500 and gain higher performance.

13. Replace 7 Nodes with 1 K10 GPU
   Performance on JAC NVE and cost, running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off.
   The eight (8) blue nodes each contain 2x Intel E5-2687W CPUs (8 cores, $2000 per CPU).
   The green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 1x NVIDIA K10 GPU ($3000 per GPU).
   [Bar charts: Nanoseconds / Day and Cost, DHFR; bars: CPU Only, GPU Enabled]
   Cut simulation costs and increase performance by 70%.

14. Extra CPUs Decrease Performance
   Cellulose NVE, running AMBER 12 GPU Support Revision 12.1.
   The orange bars represent one E5-2687W CPU (8 cores per CPU).
   The blue bars represent dual E5-2687W CPUs (8 cores per CPU).
   [Bar chart: Nanoseconds / Day, Cellulose; series: 1 E5-2687W, 2 E5-2687W; groups: CPU Only, CPU with dual K20s]
   When used with GPUs, dual CPU sockets perform worse than single CPU sockets.

15. Kepler - Greener Science
   Energy used in simulating 1 ns of DHFR JAC, running AMBER 12 GPU Support Revision 12.1.
   The blue node contains dual E5-2687W CPUs (150 W each, 8 cores per CPU).
   The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and 1x NVIDIA K10, K20, or K20X GPU (235 W each).
   Energy Expended = Power x Time
   [Bar chart: Energy Expended (kJ), lower is better; bars: CPU Only, CPU + K10, CPU + K20, CPU + K20X]
   The GPU accelerated systems use 65-75% less energy.

16. Recommended GPU Node Configuration for AMBER Computational Chemistry
   Workstation or Single Node Configuration
   # of CPU sockets                  2
   Cores per CPU socket              4+ (1 CPU core drives 1 GPU)
   CPU speed (GHz)                   2.66+
   System memory per node (GB)       16
   GPUs                              Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
   # of GPUs per CPU socket          1-2 (4 GPUs on 1 socket is good to do 4 fast serial GPU runs)
   GPU memory preference (GB)        6
   GPU to CPU connection             PCIe 2.0 16x or higher
   Server storage                    2 TB
   Network configuration             Infiniband QDR or better
   Scale to multiple nodes with the same single node configuration.

17. Benefits of GPU AMBER Accelerated Computing
   Closing slide; repeats the points of slide 2, including the free trial at www.nvidia.com/GPUTestDrive.
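
The cost and energy comparisons in the deck (slides 12 and 15) rest on simple arithmetic: node cost is the sum of component prices, and Energy Expended = Power x Time, where the time to simulate 1 ns follows from the ns/day throughput. The sketch below reworks those numbers in Python using only figures quoted on the slides. Two points are assumptions rather than statements from the deck: node power is approximated as the sum of the quoted CPU and GPU wattages (150 W per CPU, 235 W per GPU), and the energy chart is assumed to use the same DHFR JAC throughputs reported elsewhere in the deck (12.47 ns/day CPU only, 81.09 ns/day with one K20). The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def speedup(accelerated_ns_per_day, baseline_ns_per_day):
    """Throughput ratio of an accelerated node over a baseline node."""
    return accelerated_ns_per_day / baseline_ns_per_day


def energy_per_ns_kj(power_watts, ns_per_day):
    """Energy (kJ) to simulate 1 ns: Energy = Power x Time."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_watts * seconds_per_ns / 1000.0


def node_cost(cpus, cpu_price, gpus=0, gpu_price=0):
    """Component cost of one node (CPUs and GPUs only, as on the slides)."""
    return cpus * cpu_price + gpus * gpu_price


# Figures quoted on the slides.
CPU_WATTS = 150          # per E5-2687W CPU (slide 15)
GPU_WATTS = 235          # per Kepler GPU (slide 15)
CPU_NS_PER_DAY = 12.47   # DHFR JAC, dual-CPU node (slides 10 and 11)
K20_NS_PER_DAY = 81.09   # DHFR, dual-CPU node + 1x K20 (slide 12)

# Assumption: node power is the sum of the quoted component wattages.
cpu_energy = energy_per_ns_kj(2 * CPU_WATTS, CPU_NS_PER_DAY)
k20_energy = energy_per_ns_kj(2 * CPU_WATTS + GPU_WATTS, K20_NS_PER_DAY)
energy_saving = 1.0 - k20_energy / cpu_energy

# Cost comparison from slide 12: eight CPU-only nodes versus one K20 node.
cpu_cluster_cost = 8 * node_cost(cpus=2, cpu_price=2000)
k20_node_cost = node_cost(cpus=2, cpu_price=2000, gpus=1, gpu_price=2500)

if __name__ == "__main__":
    print(f"K20 speedup on DHFR: {speedup(K20_NS_PER_DAY, CPU_NS_PER_DAY):.2f}x")
    print(f"CPU-only energy per ns: {cpu_energy:.0f} kJ")
    print(f"CPU + K20 energy per ns: {k20_energy:.0f} kJ")
    print(f"Energy saving: {energy_saving:.0%}")
    print(f"Cost: ${cpu_cluster_cost} (8 CPU nodes) vs ${k20_node_cost} (1 K20 node)")
```

Under these assumptions the K20 node comes out at roughly 570 kJ per simulated nanosecond against roughly 2,080 kJ for the CPU-only node, a saving of about 73%, which falls inside the 65-75% range the deck reports. The cost figures reproduce the $32,000 and $6,500 shown on slide 12.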