
BENCHMARKING NEURAL NETWORKS ON ORACLE CLOUD INFRASTRUCTURE WITH MAPR

EXECUTIVE SUMMARY

Enterprises today depend on high-quality and high-speed data access to implement their data-driven use cases. Data assets for enterprises reside in a variety of locations — in the cloud, on-premises, and at the edge. Building data-driven use cases usually entails gathering data assets, building complex machine-learning (ML) and deep learning (DL) models, training them using the data assets, and finally deploying them in production. According to Gartner[1], data scientists and IT operations personnel spend 30-65% of their time obtaining the data and making it available for models. In order to develop data-driven use cases, data must be available almost instantly so the overall implementation time can be drastically reduced. This has quickly become the most pressing challenge standing in the way of building enterprise-wide AI competency.

The MapR® Data Platform provides industry-leading high-speed data access performance. Coupled with the flexibility of Oracle Cloud Infrastructure and its use of NVIDIA Tesla GPUs, MapR delivers the industry’s best performing solutions for model training for enterprises.

This paper presents benchmarking conducted by MapR with help from Oracle, along with the resulting performance numbers for ResNet-50 and ResNet-152 models on Oracle Cloud Infrastructure. The test results show that an architecture combining the above technologies from MapR, Oracle, and NVIDIA achieves nearly 100% utilization of the GPU processing power and scales linearly from one GPU to eight. This solution is ready today for training ML and DL workloads across the industry.

[1] Preparing and Architecting for Machine Learning

WHITE PAPER


TRAINING DATA BENCHMARKING WITH ORACLE CLOUD AND MAPR

ORACLE CLOUD INFRASTRUCTURE

Oracle Cloud Infrastructure offers the most powerful bare metal compute instances with local flash storage in the industry. Only Oracle offers this local storage based on advanced NVMe SSD technology and backed by a storage performance SLA. Oracle Cloud Infrastructure enables you to deploy the optimal amount of infrastructure to meet your demands. Oracle offers the following advantages:

• Thirty percent cheaper GPUs with NVIDIA bare metal GPU instances

• Up to 40 percent better performance for big data workloads on Oracle bare metal compute instances

• Up to 1 PB of high-performance solid-state block storage per node with guaranteed performance SLAs

• Up to 50 TB of blisteringly fast NVMe SSD storage per node, for up to 50 percent better HDFS performance versus other public cloud storage

• Guaranteed network performance with 25 Gbps bandwidth between any two nodes, guaranteed by the only network performance SLA in the industry

• Ninety percent lower costs for data lakes

• Additional discounts available for key partners such as MapR

More about Big Data on Oracle Cloud Infrastructure.

MAPR DATA PLATFORM

A typical DL lifecycle includes data collection (often with some AI processing at the edge), centralized data processing and transformation, model training using the processed data, and deployment of the trained models into production systems and back out to the edge for processing during data collection. MapR’s global namespace, incremental mirrors, and multi-master streaming and table replication capabilities manage all the data logistics for every stage of this DL lifecycle. Developers can take advantage of MapR’s consistent snapshots across files, streams, and tables to ensure consistent models and point-in-time association of data and models as they train and test models against incrementally modified data sets. MapR’s distributed data platform handles all data management and movement without requiring any user-initiated transfers of data between the edge, the central data store, and the GPU compute servers. When frequent access to older data is no longer needed, MapR’s policy-based data tiering moves data to lower-cost erasure-coded or object storage and automatically brings it back when needed with no user or administrator intervention.
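For illustration, a snapshot of the training-data volume can be taken immediately before a run so that the resulting model stays tied to the exact data it was trained on. The sketch below is hypothetical: the volume and snapshot names are invented, and it simply wraps the standard maprcli snapshot command from Python.

# Hypothetical example: snapshot the training-data volume before a run so the
# model can later be associated with the exact data it was trained on.
# "imagenet_train" and the snapshot name are invented for illustration.
import subprocess

subprocess.run(
    ["maprcli", "volume", "snapshot", "create",
     "-volume", "imagenet_train",
     "-snapshotname", "resnet50_train_2018_10"],
    check=True,
)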

SOLUTION ARCHITECTURE

The Oracle Cloud deep learning architecture presented here meets the functionality and performance requirements of data logistics for deep learning lifecycles.

The basic architecture pairs a single GPU server with a five-node MapR cluster. The Oracle Volta Bare Metal GPU server has eight NVIDIA Tesla V100 GPUs, two Intel® Xeon® 8167M processors, and 768 GB of memory. The MapR cluster consists of five Oracle Bare Metal Dense I/O servers, each with two Intel® Xeon® E5-2699 processors, 512 GB of memory, and eight Samsung 172X 3.2 TB NVMe SSD drives.


GPU SERVER

Hardware

Oracle Volta Bare Metal BM.GPU3.8 server

• Eight GPUs: NVIDIA Tesla V100 SXM2 16 GB

• Two CPUs: Intel® Xeon® Platinum 8167M 2.0 GHz

• OCPU: 52

• Memory: 768 GB

• Network: 1x 25 Gbps (dedicated)

Software

• Ubuntu 16.04

• Docker CE 18.06.1

• NVIDIA Container Runtime 2.0.0+docker18.06.1-1

• NVIDIA Driver 396.44

• MapR PACC 6.0.1 based on nvcr.io/nvidia/tensorflow:18.06-py3

• MapR POSIX Platinum 6.0.1 Client

[Figure: Solution architecture, showing the GPU server (Volta Bare Metal BM.GPU3.8) connected to a five-node MapR cluster of Dense I/O Bare Metal BM.DenseIO1.36 servers]


MAPR CLUSTER

Hardware

Five Oracle Dense I/O Bare Metal BM.DenseIO1.36

• Two CPUs: Intel® Xeon® E5-2699 v3 2.3 GHz

• OCPU: 36

• Memory: 512 GB

• Network: 1x 10 Gbps (dedicated)

• Eight Local Disks: 3.2 TB Samsung 172X NVMe SSD

Software

• CentOS 7.5

• MapR 6.0.1

BENCHMARK

The TensorFlow Convolutional Neural Network benchmark for TensorFlow 1.8 (tf_cnn_benchmarks.py from branch cnn_tf_v1.8_compatible at https://github.com/tensorflow/benchmarks) was used to train two different network models, ResNet-50 and ResNet-152. We trained each model on 1, 2, 4, and 8 GPUs. The models were trained with the 143 GB ImageNet dataset. Prior to model training, labeled images were packaged into TensorFlow record (TFRecord) files of around 140 MB each.
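The conversion step itself is not reproduced in this paper; the sketch below is a minimal illustration of how labeled JPEGs can be packaged into TFRecord shards. The feature keys follow the common ImageNet TFRecord convention, and the file paths, labels, and sharding policy are placeholders rather than the exact pipeline used for these tests.

# Minimal TFRecord packaging sketch (illustrative; not the exact pipeline used here).
import tensorflow as tf  # TF 1.x API, matching the TF 1.8 benchmark

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_shard(image_paths, labels, shard_path):
    """Serialize (JPEG bytes, integer label) pairs into one TFRecord shard."""
    with tf.python_io.TFRecordWriter(shard_path) as writer:
        for path, label in zip(image_paths, labels):
            with open(path, "rb") as f:
                encoded = f.read()
            example = tf.train.Example(features=tf.train.Features(feature={
                "image/encoded": _bytes_feature(encoded),
                "image/class/label": _int64_feature(label),
            }))
            writer.write(example.SerializeToString())

Shards of roughly 140 MB each keep reads large and sequential, which is generally friendly both to distributed file systems and to the benchmark's TFRecord readers.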

DOCKER CONTAINER

All tests were run from within a Docker container on the GPU server. NVIDIA GPU Cloud provides GPU-accelerated Docker containers for deep learning software. MapR provides the Persistent Application Client Container (PACC), a Docker-based container image that includes a secure, container-optimized MapR client for POSIX access to the filesystem. In addition to the pre-built PACC, MapR also supports building a PACC from an existing Docker container image. For our tests, we used the mapr-setup tool to build a PACC from an NVIDIA GPU Cloud TensorFlow container. This gives us a Docker container with all of the GPU-optimized libraries and tools provided by the NVIDIA GPU Cloud container as well as the secure and optimized MapR client. Appendix B provides instructions for building this container.
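Before launching training, a quick sanity check inside the resulting container (illustrative only, not part of the benchmark procedure) can confirm that TensorFlow sees all eight GPUs and that the MapR mount is present:

# Illustrative sanity check run inside the PACC.
import os
from tensorflow.python.client import device_lib  # TF 1.x device listing

gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print("Visible GPUs:", gpus)  # expect 8 entries on the BM.GPU3.8 server
print("MapR mount present:", os.path.ismount("/mapr"))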

TESTING METHODOLOGY

To load all image data into the Linux buffer cache on the GPU server, we first ran tf_cnn_benchmarks through one epoch using the 143 GB of image data on local disk. To establish baseline numbers, we then ran all tests reading data from the Linux buffer cache in local memory, which shows the performance of the system with no storage I/O cost. We then ran the benchmarks against data on the MapR cluster, clearing all local and MapR caches before each run, to measure the impact of reading data from the distributed MapR Data Platform. Each test was run twice, with average results in images per second shown in the results section below.
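The exact cache-clearing commands are not spelled out above; a typical way to drop the Linux page cache between runs is sketched below (it requires root, and the MapR-side caches are assumed to be cleared separately):

# Illustrative helper: drop the Linux page cache so a run starts cold.
import subprocess

def drop_linux_page_cache():
    """Flush dirty pages, then drop the page cache, dentries, and inodes."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:  # requires root
        f.write("3\n")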

ResNet-50 and ResNet-152 were run using the MapR FUSE client within the container (mapr-posix-client-container). This POSIX client provides throughput of up to 1 GB/sec. See Appendix A.
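Independent of the training job, raw read throughput over the FUSE mount can be spot-checked by streaming every TFRecord shard once and timing it. This is an illustrative check, not the method behind the 1 GB/sec figure; the directory path used in the example is the one shown in Appendix A.

# Illustrative throughput check over the MapR FUSE mount.
import os
import time

def measure_read_throughput(data_dir, block_size=8 * 1024 * 1024):
    """Sequentially read every file under data_dir and return MB/s."""
    total_bytes = 0
    start = time.time()
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
    return total_bytes / (time.time() - start) / 1e6

print(measure_read_throughput(
    "/mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/train_tfrecords"))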


RESULTS

Benchmark results for 1, 2, 4, and 8 GPUs are shown below. They compare the baseline result, with image data in the Linux buffer cache, against image data on MapR.

The training rate for the ResNet-50 and ResNet-152 models was nearly identical whether the data resided in local buffer cache on the GPU server or remotely on storage media on the MapR nodes. This shows us that MapR can provide data to the GPUs as fast as it can be consumed. Performance is identical whether the training data is first copied to the GPU server or simply accessed directly from the MapR cluster during the training run.
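For readers reproducing the runs, a small helper like the one below (illustrative, with no numbers from this paper baked in) turns the per-run images/sec readings reported by tf_cnn_benchmarks into speedup and scaling-efficiency figures:

# Illustrative scaling summary from measured images/sec values.
def scaling_summary(images_per_sec_by_gpus):
    """images_per_sec_by_gpus: e.g. {1: ..., 2: ..., 4: ..., 8: ...}."""
    base = images_per_sec_by_gpus[1]
    rows = []
    for gpus, ips in sorted(images_per_sec_by_gpus.items()):
        speedup = ips / base
        rows.append({
            "gpus": gpus,
            "images_per_sec": ips,
            "speedup": round(speedup, 2),
            "efficiency": round(speedup / gpus, 2),  # 1.0 means perfectly linear
        })
    return rows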


GPU UTILIZATION

GPU utilization, as measured by the nvidia-smi utility, shows that the GPUs are nearly fully utilized. As the number of GPUs used to train the network models increases, aggregate utilization scales nearly linearly. This is consistent with the benchmark results.
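The utilization figures were read from nvidia-smi; one simple way to log them over the course of a run is sketched below. The one-second sampling interval is an assumption, not the cadence used for these tests.

# Illustrative GPU-utilization logger built on nvidia-smi.
import subprocess
import time

def log_gpu_utilization(samples=300, interval_s=1.0, logfile="gpu_util.csv"):
    """Write one CSV row per GPU per sample: index, GPU util %, memory used (MiB)."""
    with open(logfile, "w") as out:
        for _ in range(samples):
            result = subprocess.run(
                ["nvidia-smi",
                 "--query-gpu=index,utilization.gpu,memory.used",
                 "--format=csv,noheader,nounits"],
                stdout=subprocess.PIPE, universal_newlines=True, check=True)
            out.write(result.stdout)
            time.sleep(interval_s)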

[Chart: ResNet-50 GPU utilization]


[Chart: ResNet-152 GPU utilization]


CONCLUSION

The results in this white paper show that the MapR Data Platform running on Oracle Cloud Infrastructure with NVIDIA GPUs provides a cloud-based solution for use cases requiring DL workloads. They also show linear scalability across GPUs, with performance equivalent to reading local, in-memory data. Using the MapR PACC container with the NVIDIA drivers allows training jobs to stream large amounts of data to the GPUs, which means models can be trained on larger datasets, leading to higher accuracy and quicker training iterations.

ADDITIONAL RESOURCES

• MapR Now Validated on Oracle Cloud Infrastructure

• MapR Data Platform Reference Architecture for Oracle Cloud Infrastructure Deployments


APPENDIX A: TENSORFLOW CNN BENCHMARK OPTIONS

Runstrings for the 8-GPU tests, showing all specified options, are listed below for each model. These commands were invoked from inside the Docker container.

ResNet-50

python \
  /mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/benchmarks-cnn_tf_v1.8_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --batch_size=256 \
  --model=resnet50 \
  --optimizer=momentum \
  --variable_update=independent \
  --nodistortions \
  --gradient_repacking=1 \
  --num_gpus=8 \
  --num_epochs=1 \
  --data_dir=/mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/train_tfrecords \
  --train_dir=/mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/checkpoint7d4fbfb00fd3 \
  --use_fp16=True \
  --print_training_accuracy \
  --local_parameter_device=gpu \
  --data_name=imagenet

ResNet-152

python \
  /mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/benchmarks-cnn_tf_v1.8_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --batch_size=128 \
  --model=resnet152 \
  --optimizer=momentum \
  --variable_update=independent \
  --nodistortions \
  --gradient_repacking=1 \
  --num_gpus=8 \
  --num_epochs=1 \
  --data_dir=/mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/train_tfrecords \
  --train_dir=/mapr/mapr-cluster.private3.maprvcn.oraclevcn.com/apps/imagenet/checkpointe53f9c4eb867 \
  --use_fp16=True \
  --print_training_accuracy \
  --local_parameter_device=gpu \
  --data_name=imagenet


APPENDIX B: BUILDING THE DOCKER CONTAINER

MapR provides a script (mapr-setup.sh) to build a PACC container from the NVIDIA GPU Cloud container. Complete instructions for downloading and invoking the script can be found in the MapR documentation at https://mapr.com/docs/home/AdvancedInstallation/CreatingPACCImage.html

To create the Docker container, run mapr-setup.sh as shown below. When run on an Ubuntu 16.04 system, accept all defaults except when prompted for the “Docker FROM base image name:tag”. The output shown below has been trimmed for brevity.

andy@mapr-gpu:~$ ./mapr-setup.sh docker client

MapR Distribution Initialization and Update
Copyright 2018 MapR Technologies, Inc., All Rights Reserved

http://www.mapr.com

Build MapR client image? (y/n) [y]:
Image OS class (centos7, ubuntu16) [ubuntu16]:
Docker FROM base image name:tag [ubuntu:16.04]: nvcr.io/nvidia/tensorflow:18.06-py3
MapR core version [6.0.1]:
MapR MEP version [5.0.0]:
Install Hadoop YARN client (y/n) [n]:
MapR client image tag name [maprtech/pacc:6.0.1_5.0.0_ubuntu16]: maprtech/pacc:6.0.1_5.0.0_TF_18.06_py3
Container network mode (bridge|host) [bridge]:
Container memory: specify host XX[kmg] or 0 for no limit [0]:
Building maprtech/pacc:6.0.1_5.0.0_TF_18.06_py3...
Sending build context to Docker daemon 123.9kB
Step 1/7 : FROM nvcr.io/nvidia/tensorflow:18.06-py3
...
Step 7/7 : ENTRYPOINT ["/opt/mapr/installer/docker/mapr-setup.sh", "container"]
 ---> Running in 477a9ffb5c54
Removing intermediate container 477a9ffb5c54
 ---> e2d641e3eda3
Successfully built e2d641e3eda3
Successfully tagged maprtech/pacc:6.0.1_5.0.0_TF_18.06_py3

Edit ‘/home/andy/docker_images/client/mapr-docker-client.sh’ to set MAPR_CLUSTER and MAPR_CLDB_HOSTS and then execute it to start the container



Upon completion, mapr-setup.sh creates a script (mapr-docker-client.sh) to invoke the newly created Docker container. Modify the environment variables in that script to specify the MapR cluster name, the MapR container location database (CLDB) hosts, and the MapR user information and ticket file for secure access to the cluster. To use the NVIDIA Docker runtime, specify --runtime=nvidia in the MAPR_DOCKER_ARGS variable. Sample settings are shown below.

MAPR_CLUSTER=mapr-cluster.private3.maprvcn.oraclevcn.com
MAPR_CLDB_HOSTS=10.0.5.2,10.0.5.5,10.0.5.6
MAPR_MOUNT_PATH=/mapr
MAPR_TICKET_FILE=/home/andy/mapr_ticket
MAPR_CONTAINER_USER=andy
MAPR_CONTAINER_UID=1001
MAPR_CONTAINER_GROUP=users
MAPR_CONTAINER_GID=100
MAPR_DOCKER_ARGS="--runtime=nvidia"

Contact: [email protected]

Try MapR: download

For more information: [email protected]

MapR and the MapR logo are registered trademarks of MapR and its subsidiaries in the United States and other countries. Other marks and brands may be claimed as the property of others. The product plans, specifications, and descriptions herein are provided for information only and subject to change without notice, and are provided without warranty of any kind, express or implied. Copyright © 2018 MapR Technologies, Inc.