OnTap AI Test Drive FAQ

Best-in-class performance - trace3.com · The Test Drive allows customers interested in the DGX platform to bring their data sets, test their code, and pilot the platform


DGX POD Test Drive FAQ
Location: Flexential Hillsboro 2 Data Center, Hillsboro, OR
Reviewed By: Tony Paikeday, Andria Zou, Bill Cory, Josh Lindstrom, Karthik Mandakolathur, Hoseb Dermanilian

What is the Test Drive?

The Test Drive allows customers interested in the DGX platform to bring their data sets, test their code, and pilot the platform. It is a complimentary proof of concept for qualified customers who intend to purchase a DGX POD but want to de-risk their deployment by doing a test drive first. Traditionally, customers may be deploying AI apps in the cloud or on non-optimal servers, such as standard CPU-based systems. The ONTAP AI platform accelerates the ML/DL development workflow and helps customers see results much faster, allowing them to deploy with confidence. After successful completion of the Test Drive, customers can purchase the same system and leverage Flexential to provide the data center facilities and operational support in a cost-effective OpEx model.

Customers are offered a 5-day period during which they can access the Test Drive environment and exercise the platform using their preferred test cases or test plan, facilitated by NVIDIA. Once completed, the customer will be scheduled for deployment on the platform, which may take 1-2 weeks or possibly longer depending on demand, at which point they will have access to the live system.

What are the key benefits?

Best-in-class performance

ONTAP AI infrastructure comprises best-of-breed elements: NVIDIA DGX systems, NetApp All Flash FAS (AFF) storage, and Mellanox Spectrum Ethernet switches.


● NVIDIA DGX systems, including NVIDIA DGX-1 and NVIDIA DGX-2, deliver groundbreaking AI training capacity, integrating up to 16 of the world’s fastest data center accelerators for up to 2 PFLOPS of AI performance per system (i.e., DGX-2). DGX systems are powered by the NVIDIA DGX software stack, which is fully optimized to deliver maximum GPU-accelerated performance for the most popular AI workloads.

● The NetApp AFF A800 system is the industry’s first end-to-end NVMe solution. NetApp AFF A800 provides the highest possible throughput at the lowest possible latency. NetApp AFF is a state-of-the-art storage system that enables you to meet enterprise storage requirements with industry-leading performance, superior flexibility, cloud integration, and best-in-class data management. Designed specifically for flash, AFF systems help accelerate, manage, and protect business-critical data. Leveraging industry-leading data management capabilities, ONTAP AI enables you to manage and protect data with a single set of tools, regardless of where the data resides.

● With a 300ns port-to-port latency, Mellanox Spectrum Ethernet switches are the fastest 100GbE Ethernet switches in the industry. Spectrum switches provide a robust, high-bandwidth data path for RoCE-based GPU-GPU and GPU-storage communications. Additionally, Spectrum switches support simplified RoCE configuration and built-in advanced network telemetry to reduce mean time to issue resolution.

Ready for use on professionally run infrastructure

The ONTAP AI racks are hosted in the Flexential data center and fine-tuned for performance by Trace3.

● Flexential is a hybrid IT solutions provider with a growing fleet of 40 highly redundant data centers that span 21 domestic and international markets. Our facilities can handle the unique power requirements of AI/DL/ML, supporting high-power density deployments of over 1,500 watts per square foot and up to 50kW per cabinet. Our purpose-built facilities have 24-foot ceilings and proprietary, highly efficient cooling units that allow for exceptionally low PUEs of 1.3.

● Trace3 is a certified NVIDIA DGX and NetApp partner. Trace3’s Data Intelligence practice is a strategic implementation partner for the Test Drive program. Our experts execute solutions and help create, sustain, and improve these capabilities. Our goal is to enable your business to make cost-effective decisions about your data and infrastructure without compromising reliability, security, or speed.
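To put the power figures quoted for the Flexential facilities in perspective, the following is a minimal sketch of how PUE (power usage effectiveness, the ratio of total facility power to IT equipment power) relates a cabinet's IT load to the total power the facility must supply. The function names are hypothetical. The 50kW load and the PUE of 1.3 come from the text above, and treating PUE as a ratio of instantaneous power rather than of energy over time is a simplifying assumption.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value.

    PUE = total facility power / IT equipment power, so total = IT load * PUE.
    Function name and signature are illustrative assumptions.
    """
    if it_load_kw < 0:
        raise ValueError("IT load must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power spent on cooling and other non-IT overhead for a given IT load."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


if __name__ == "__main__":
    # A fully loaded 50kW cabinet at the quoted PUE of 1.3.
    cabinet_kw = 50.0
    pue = 1.3
    print(f"Total facility power: {facility_power_kw(cabinet_kw, pue):.1f} kW")
    print(f"Cooling and overhead: {overhead_kw(cabinet_kw, pue):.1f} kW")
```

Under these assumptions, a lower PUE means less overhead power for every kilowatt delivered to the DGX systems.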

Flexible consumption models

Customers have three flexible ways to consume the infrastructure once the Test Drive is over:

● Buy or lease the ONTAP AI rack to run workloads on-premises.


● Buy the ONTAP AI rack outright and host it in Flexential’s state-of-the-art data center.

● Optimize capital expenses by leasing the ONTAP AI rack and using it as a service.

Where is this available?

Today, the ONTAP AI Test Drive platform is available only in Flexential’s Hillsboro 2 data center in the Portland, Oregon area, but the platform will expand into other markets depending on demand. Flexential’s Hillsboro location was selected as the initial venue for five reasons:

1. The Flexential facility in Hillsboro is state-of-the-art and can cool up to 50kW per cabinet.

2. The proximity to Northern California and Seattle provides low latency connectivity to HQ locations for many of our target ONTAP AI customers.

3. There is no sales tax in Oregon. When customers move forward with a purchase of ONTAP AI infrastructure at the end of the Test Drive, they save almost 10% on the capital purchase relative to California or Washington, as long as they keep the equipment in Flexential’s data center in Oregon.

4. The Flexential data center is among the most highly connected facilities in the Northwest, providing numerous options for IP Transit or transport services.

5. Flexential has a footprint of 40 data centers in 21 markets in North America, so if a customer chooses to deploy outside of Oregon, Flexential is still able to assist them.

What is the platform?

The ONTAP AI reference architecture is a joint platform developed by NVIDIA, Mellanox, and NetApp, and implemented by Trace3 and Flexential. It includes:

- NetApp - All Flash FAS System
- NVIDIA - 4 DGX-1 offering available now, DGX-2 offering soon based on demand
- Mellanox - SN2700 32x100GbE Ethernet switches
- Flexential providing rack, network, and data center power and cooling
- Trace3 providing VAR and SI services

This integrated solution is provided today by VARs or resellers, such as our partner Trace3. Trace3 is responsible for selling the hardware to the end customer, should they wish to buy it, and for handling any customization of the configuration based on the Test Drive results.

The platform is not for generalized compute. It is for ML and AI applications and runs the Linux-based “CUDA” SDK and APIs. It is also not for VDI.

See more here: https://www.netapp.com/us/products/ontap-ai.aspx


Why Flexential?

Flexential is providing colocation, space, power, and network to support this project. Flexential has 40 data centers in 21 domestic and international markets, many of which are ideal locations for AI workloads, which can draw more than 50kW of power per rack and require the associated cooling. Many data centers, particularly older ones operated by enterprise customers, are unable to provide such density and cooling. Flexential can provide this without additional costs related to in-row cooling, liquid cooling, or larger containment features. As demand builds, it is important to know that each application, configuration, and data center is different, so it will be necessary to work with Flexential’s solutions architects to design the specific fit.

Flexential also offers network services, hyper-scale cloud connectivity, security, and professional services that make it an ideal partner. We can help customers move data, integrate to hyper-scale or private cloud, and provide the services to secure their platforms.

Hyper-scale cloud connectivity is available to any Test Drive customer by using IP Transit. Private connectivity, such as Azure ExpressRoute or AWS Direct Connect, is also available for those that require it, but it will incur additional costs and require further conversation with Flexential to design the right solution.

Who is our first contact for this program?

Please contact your Account Manager or send an email to [email protected]

How do I register customers for the Test Drive?

Prospective customers wishing to take a Test Drive can work with their existing NVIDIA or NetApp representative or be connected through the following alias, to be used by Flexential or Trace3:

Sales account execs should introduce the opportunity to NVIDIA via:

[email protected]
SUBJECT LINE: FLEXENTIAL ONTAP AI TEST DRIVE
CC: ACCOUNT EXEC EMAIL, PROSPECT CONTACT EMAIL

An NVIDIA solution engineer will work with the customer on their use case, solutions design, test plan, and success criteria. This will be validated before any tests are scheduled. This process will take 1-2 weeks. Testing will be available 30 days or longer, based on demand.

What if a customer wants to buy the solution?


Trace3 is the integrator and fulfillment partner and will be involved in the solutions design process post testing (or if testing is not required). Trace3 will work with the account team to size the customer’s need, and will work with all relevant partners (NVIDIA, Mellanox, Flexential, NetApp) on a final solution design.

This will result in:
- Customer transacting a purchase OR lease of hardware with Trace3, licensing, etc.
- A colocation and network services contract with Flexential.
- NetApp, NVIDIA, and Mellanox will provide support as contracted.

Customer or Trace3/leasing company holds the hardware paper, including appropriate setup services, racks, etc. We will be working with Trace3 to ensure a closed loop process and communications throughout.

Trace3 has a dedicated team for the Test Drive program covering both sales and technical matters. All requests should be sent to Josh Lindstrom.

Sales: Josh [email protected]

Technical: Eric [email protected]

Flexential will have more information on sizing, bundles, and recommended colocation services as this matures with real customer use cases. Rough sizing and solution diagrams have been created to meet the needs of small, medium, and large T-shirt sized colocation deployments, based on reference architectures provided by NVIDIA.

Will software developers get involved and write software for particular tasks?

No. We will ensure this process is plug and play. There are three methods of engagement.

1) Run benchmark

● Any commonly available benchmark (e.g. MLPerf)

● No restrictions on data access

● NGC Containers

● Jupyter notebooks: Anomaly Detection, RAPIDS

2) Test with public data sets


● Object Detection: Mask R-CNN | SSD

● Image Recognition: ResNet-50

● Translation: Transformer | NMT

● Language: BERT - Question & Answering

● Speech Synthesis: Tacotron2

● Recommendation: Neural Collaborative Filtering (NCF)

● Machine Learning: Decision Trees and Anomaly Detection (XGBoost), Clustering (DBSCAN), ETL Data Loading, Graph Analytics Visualization

● Dataset size: The larger the better

3) Run your own workloads

● Test your models and datasets on NVIDIA DGX POD

What kind of customers are ideal for the Test Drive program?

The ideal customers are those that want to scale their small DGX deployment and extend it to their production environment, but have challenges supporting a 17-45 kW/rack deployment at their own data center. The customer should already have a business application workload in mind, along with a large data set that is ready for performance benchmarking.

This program will not work for a customer that is starting from scratch and does not have a data set ready to go. It is too much of an uphill climb to get a customer of that type ready for the Test Drive.

How secure is customers’ data?

This is a single-tenancy Test Drive platform, dedicated to the customer’s sole use. The customer is responsible for wiping their data from the test platform when the Test Drive is completed. We will provide a certificate confirming that the system has been wiped clean.

The Flexential Hillsboro 2 data center in which this Test Drive equipment sits has six layers of security, and the ONTAP AI system is racked within a cage in the data hall, which can only be accessed by Flexential employees with appropriate credentials.

What are the costs associated with this Test Drive program?

The cost for the Test Drive is $0. Trace3 will request a zero-dollar PO from the customer in order to execute.


What are the process and pricing associated with a purchase after the Test Drive program?

There will be two pricing models. Please contact Josh Lindstrom at Trace3. We will provide a DGX POD platform cost, the cost of additional services, and the co-lo service cost in $/kW/rack/month.

What are the desirable industry verticals for the Test Drive program?

The target industry verticals are healthcare, insurance, government, and higher education.

What is the SKU kit if my customer is interested in purchasing the same DGX POD?

A single SKU will be provided by Trace3 at the time of request.

What is the ONTAP AI product lead time to order a DGX-1 POD at Flexential, and who should I contact?

The lead time to order a new system is 4 to 8 weeks, depending on the ease of deployment.

Trace3 Contact: Josh [email protected]

Can customers from different continents access this Test Drive solution?

Yes. The customer can access this ONTAP AI capability globally, as the environment is connected to IP Transit. Keep in mind that network performance will be a limiting factor on the performance of the Test Drive for someone who is many thousands of miles away from the Oregon data center.
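One common reason distance limits network performance is that a single TCP stream cannot move more than one window of data per round trip. The sketch below illustrates that ceiling. It is not a measurement of the Test Drive environment: the function name, the 1 MiB window, and the example round-trip times are assumptions chosen purely for illustration.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput, in megabits per second.

    A sender can have at most one window of data in flight per round trip,
    so throughput <= window size / round-trip time.
    """
    if window_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("window_bytes and rtt_ms must be positive")
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    window = 1024 * 1024  # hypothetical 1 MiB TCP window
    # Hypothetical round-trip times for a nearby and a distant user.
    for label, rtt in [("nearby user", 20.0), ("distant user", 200.0)]:
        ceiling = max_tcp_throughput_mbps(window, rtt)
        print(f"{label}: RTT {rtt:.0f} ms -> at most {ceiling:.1f} Mbps per stream")
```

With the same window, a tenfold increase in round-trip time cuts the per-stream ceiling tenfold, which is why a distant user may see lower throughput regardless of link capacity.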

How does a customer move data to the ONTAP AI Test Drive platform?

IP Transit is provided as part of the architecture of the Test Drive. The customer will upload a data set that is large enough to be representative of a real-life scenario, but not so large that it will take weeks to upload across the wire.

There are two principal options for placing data in the environment. If the data set size lends itself to copying over the VPN tunnel, using the credentials supplied to the users, that is the preferred option. The time to copy is dictated by the quality of the customer's source connectivity to the 1Gbps connections into the POD environment in Oregon. An initial speed test would be performed to estimate the copy time, and it is generally obvious from the result whether this approach is feasible.

It would also be possible for the customer to ship media to the facility, generally presumed to be one or more hard drives in USB carriers for direct attachment to the environment for copying.
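The feasibility check described above can be sketched as a small calculation. This is an illustration only, not part of any Test Drive tooling: the function names, the example inputs, and the 7-day cutoff for preferring shipped media are assumptions. Only the 1Gbps connection into the POD comes from the text.

```python
SECONDS_PER_DAY = 86_400


def estimate_copy_days(dataset_tb: float, measured_mbps: float,
                       pod_link_mbps: float = 1_000.0) -> float:
    """Estimate the days needed to copy a data set over the VPN tunnel.

    dataset_tb: data set size in decimal terabytes (1 TB = 10**12 bytes).
    measured_mbps: throughput from the initial speed test, in megabits/s.
    pod_link_mbps: capacity of the connection into the POD (1Gbps per the FAQ).
    Protocol overhead is ignored, so real copies will take somewhat longer.
    """
    if dataset_tb < 0 or measured_mbps <= 0 or pod_link_mbps <= 0:
        raise ValueError("sizes and rates must be positive")
    # The slower of the customer's connection and the POD link sets the pace.
    effective_mbps = min(measured_mbps, pod_link_mbps)
    bytes_per_second = effective_mbps * 1_000_000 / 8
    return dataset_tb * 1e12 / bytes_per_second / SECONDS_PER_DAY


def choose_transfer_method(dataset_tb: float, measured_mbps: float,
                           max_days: float = 7.0) -> str:
    """Pick between copying over the VPN and shipping drives.

    max_days is a hypothetical cutoff, not a program rule.
    """
    days = estimate_copy_days(dataset_tb, measured_mbps)
    return "vpn_copy" if days <= max_days else "ship_media"


if __name__ == "__main__":
    # Hypothetical customers: (data set size in TB, speed test result in Mbps).
    for size_tb, mbps in [(5.0, 200.0), (50.0, 100.0)]:
        days = estimate_copy_days(size_tb, mbps)
        print(f"{size_tb} TB at {mbps} Mbps: {days:.1f} days ->",
              choose_transfer_method(size_tb, mbps))
```

Capping the rate at the POD link speed reflects the point in the text that a faster customer connection does not help once the 1Gbps connection becomes the bottleneck.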

What is the data transfer limit for this DGX-1 Test Drive POD?


A customer can transfer their data out of the Test Drive DGX-1 POD at up to about 10 TB/day, a rate that is limited by the 1GbE link.
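The 10 TB/day figure is consistent with the raw line rate of a 1GbE link. As a rough check, assuming decimal units and ignoring protocol overhead (so the practical rate will be somewhat lower):

```latex
\[
1\,\text{Gb/s} = \frac{10^{9}\,\text{bit/s}}{8\,\text{bit/byte}} = 1.25 \times 10^{8}\,\text{B/s}
\]
\[
1.25 \times 10^{8}\,\text{B/s} \times 86{,}400\,\text{s/day} = 1.08 \times 10^{13}\,\text{B/day} \approx 10.8\,\text{TB/day}
\]
```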

Where do I find more information about this program?

● NVIDIA Website: https://www.NVIDIA.com/en-us/data-center/dgx-pod-reference-architecture/

● NetApp Website: https://www.netapp.com/us/products/ontap-ai.aspx

● Flexential Blog: https://www.flexential.com/knowledge-center/closing-gap-your-ai-proof-concept-techniques-reduce-tco-and-improve-your-roi

● High Level Solution Slides: https://flexential.sharepoint.com/CINO/NVIDIA_ONTAPAI_SOLUTION_V1.pptx?d=w6bfd111d4ee7459eb67cd9aac5e58272