Accelerate Innovation in the Enterprise Solutions and Reference … · 2020. 7. 9. · What are...

Preview:

Citation preview

Solutions and Reference architecture

NandaVijaydev– BlueData(recentlyacquiredbyHPE)TheAIConference,NewYork

AccelerateInnovationintheEnterprisewithDistributedML/DL

•  AI,MachineLearning(ML),andDeepLearning(DL)•  ExampleEnterpriseUseCases•  DeploymentChallengesforDistributedML/DL•  DistributedTensorFlowandHorovodonContainerswithIntelXeonprocessorsandIntelMKL

•  LessonsLearnedandKeyTakeaways

Agenda

AI, Machine Learning, and Deep Learning

Let’s Get Grounded…What is AI? What are Machine Learning and Deep Learning?

Artificial intelligence

Machine learning

Deep learning

Artificial intelligence (AI) Mimics human behavior. Any technique that enables machines to solve a task in a way like humans do.

Machine learning (ML) Algorithms that allow computers to learn from examples without being explicitly programmed.

Deep learning (DL) Subset of ML, using deep artificial neural networks as models, inspired by the structure and function of the human brain.

Example: Siri

Example: Google Maps

Example: Self-driving car

Why Should You Be Interested in AI / ML / DL?

EveryonewantsAI/ML/DLandadvancedanalytics….

….butfacemanychallenges

Usecases

Newroles,skillgaps

Cultureandchange

Datapreparation

Legacyinfrastructure

AIandadvancedanalyticsinfrastructurecouldconstitute

15-20%ofthemarketby20211

AIandadvancedanalyticsrepresent

2oftop3CIOpriorities

EnterpriseAIadoption

2.7Xgrowthinlast4years2

1 IDC. Goldman Sachs. HPE Corporate Strategy.2018 2 Gartner - “2019 CIO Survey: CIOs Have Awoken to the Importance of AI”

Key Questions Remain …

How do you integrate your AI and data ecosystem for ML / DL and advanced analytics?

How do you modernize, consume, and prepare your EDW or Hadoop big data foundation for AI?

How do you get started with gaining intelligence with your data?

What is the best way to prepare your company for a data-centric and AI future?

What opportunities does AI bring to your business? What are the major use cases?

AI / ML / DL Adoption in the Enterprise

Health Personalized medicine, image analytics

Manufacturing Predictive and prescriptive maintenance

Consumer tech Chatbots

Financial services Fraud detection, ID verification

Government Cyber-security, smart cities and utilities

Energy Seismic and reservoir modeling

Service providers Media delivery

Retail Video surveillance, shopping patterns

Example Enterprise Use Cases

Financial Services Use Cases

FraudDetection

•  Real-TimeTransactions•  CreditCard•  Merchant•  Collusion•  Impersonation•  SocialEngineering

Fraud

RiskModeling&CreditWorthiness

Check

•  LoanDefaults•  DelayedPayments•  Liquidity•  Market&Currencies•  Purchasesand

Payments•  TimeSeries

CLVPredictionand

Recommendation

•  HistoricalPurchaseView

•  PatternRecognition•  RetentionStrategy•  Upsell•  Cross-Sell•  Nurturing

CustomerSegmentation

•  BehavioralAnalysis•  Understanding

CustomerQuadrant•  EffectiveMessaging&

ImprovedEngagement•  TargetedCustomer

Support•  EnhancedRetention

Other

•  ImageRecognition•  NLP•  Security•  VideoAnalysis

WideRangeofML/DLUseCasesforWholesale/CommercialBanking,CreditCard/Payments,RetailBanking,etc.

CLV:CustomerLifetimeValue

Fraud Detection Use Case

•  OneofthemostcommonusecasesforML/DLinFinancialServicesistodetectandpreventfraud

•  Thisrequires:–  DistributedBigDataprocessingframeworkssuchasSpark– ML/DLtoolssuchasTensorFlow,H2O,andothers–  Continuousmodeltraininganddeployment– Multiplelargedatasets

Fraud Detection Use Case (cont’d)

•  DatascienceteamsneedtheabilitytocreatedistributedML/DLenvironmentsforsandboxaswellastrialanderrorexperimentation

•  Thisrequires:–  Hardwareacceleration(e.g.Xeon,MKL)– MultipledifferentML/DLanddatasciencetools–  Fastandrepeatabledeploymentofclusters

•  PrecisionMedicineandPersonalSensing–  Diseaseprediction,diagnosis,anddetection

(e.g.genomicsresearch)–  Usingdatafromlocalsensors(e.g.mobile

phones)toidentifyhumanbehavior•  ElectronicHealthRecord(EHR)correlation

–  “Smart”healthrecords•  ImprovedClinicalWorkflow

–  Decisionsupportforclinicians•  ClaimsManagementandFraudDetection

–  Identifyfraudulentclaims•  DrugDiscoveryandDevelopment

ML / DL in Healthcare – Use Cases

•  Manytypesofdata–  Genomic–  Microbiome–  Epigenome–  Etc.

•  Hugevolumesofdata(petabytes>exabytes)

Use Case: Precision Medicine

360° View of the Patient

Visit

CareSite

Rx Patient

Demographics

Studies

Genomics

Diagnosis

Labs

Deployment Challenges for Distributed ML / DL

Why Distributed ML / DL?

Speed

LargeDataVolumes

FaultTolerance

•  Complexity,lackofrepeatabilityandreproducibilityacrossenvironments

•  Sharingdata,notduplicatingdata

•  Needagilitytoscaleupanddowncomputeresources

•  Deployingmultipledistributedplatforms,libraries,applications,andversions

•  Onesizeenvironmentfitsnone

•  Needaflexibleandfuture-proofsolution

Laptop On-PremCluster

Off-PremCluster

Distributed ML / DL – Challenges

Example Deployment Challenges

•  Howtorunclustersonheterogeneoushosthardware–  CPUsandGPUs,includingmultipleGPUversions

•  Howtomaximizeuseofexpensivehardwareresources•  Howtominimizemanualoperations

–  Automatingtheclustercreationandanddeploymentprocess–  Creatingreproducibleclustersandreproducibleresults–  Enablingon-demandprovisioningandelasticity

Example Deployment Challenges

•  Howtosupportthelatestversionsofsoftware–  Deploymentcomplexityandupgrades–  Versioncompatibility

•  Howtoensureenterprise-classsecurity–  Network,storage,userauthentication,andaccess

Dockerissoftwarethatperformsoperating-system-levelvirtualization

alsoknownascontainerization.

Containerizationallowstheexistenceofmultipleinstancesonaserver.

Source:https://en.wikipedia.org/wiki/docker_(software)

Modern Technology Innovations

SimplifyDeployments InnovateFaster DeployAnywhere

Distributed ML / DL and Containers

•  ML/DLapplicationsarecomputehardwareintensive

•  Theycanbenefitfromtheflexibility,agility,andresourcesharingattributesofcontainerization

•  Butcaremustbetakeninhowthisisdone,especiallyinalarge-scaledistributedenvironment

22

IT

Data

DataPlatforms

DataScienceandML/DL

Tools

Solutions GenomeResearch VideoSurveillanceCustomer360

ExampleIndustryUseCases

FraudDetection

HDFS/NFS

UserAccess Security TimetoDeploy Multi-Tenant

DataDuplicationDataStore Cloud

AI-Driven Solutions for the Enterprise

Turnkey Container-Based Solution

IOBoost™–ExtremeperformanceandscalabilityElasticPlane™–Self-service,multi-tenantclusters

DataTap™–In-placeaccesstodataon-premorinthecloud

BlueDataEPIC™SoftwarePlatformDataScientists Developers DataEngineers

DataAnalysts

BI/AnalyticsTools Bring-Your-Own

NFS HDFS

Compute

Storage

On-Premises PublicCloud

BigDataTools ML/DLTools DataScienceTools

CPUs GPUs

TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL

Distributed TensorFlow – Concepts

•  RunningTensorFlowtraininginparallel,onmultipledevices

•  Goalistoimproveaccuracyandspeed•  Differentlayersmaybetrainedondifferentnodes(modelparallelism)

•  Samemodelcanappliedondifferentsubsetofdata,indifferentnodes(dataparallelism)

https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4

Distributed TensorFlow – Schemes •  Dataparallelismimplementation

–  Needstosyncmodelparameters–  Usesacentralizedordecentralizedschemeto

communicateparameterupdate

•  CentralizedschemesuseParameterServertocommunicateupdatestoparameters(gradients)betweennodes

•  Decentralizedschedulesusering-allreducescheme

•  HorovodisanopensourceframeworkdevelopedbyUberthatsupportsallreduce

https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4

Meet Horovod

•  Distributedtrainingframeworkfor–  Tensorflow,PyTorch,Keras

•  SeparatesinfrastructurecapabilitiesfromML•  InstallseasilyonexistingMLframework

–  pipinstallhorovod•  Usesbandwidthoptimalcommunication

protocol–  RDMA,InfiniBandifavailable

SharedData

TensorFlow with Horovod on Docker DockerContainers

MPI3.1.3TensorFlow1.7*

MKL

MPI3.1.3TensorFlow1.7*

MKL

MPI3.1.3TensorFlow1.7*

MKL

Horovodclusteronmultiplecontainers,andmachines

TensorFlow with Horovod

•  tensorflow_wrd2vec.pyfromgithttps://github.com/horovod/horovodexamples

•  DatacomesfromsharedNFSmounts,automaticallysurfacedbyBlueDataintocontainers

•  Passwordlesssshsetupduringclustercreation•  Allprerequisitesinstalledonallnodes,including:

–  MKL–MathKernelLibrary–  tensorflow,pytorch,scikit-learn,...(computeframeworks)–  openmpi(Todistributethejob)–  tensorboardforvisualization

App Store with Pre-Built ML / DL Images

*

Dockerimagesformultipleapplications

andversions

Abilitytocreateandaddnewimages

mpirun–np4/--allow-run-as-root/-d-Hbluedata-302.bdlocal:2,bluedata-301.bdlocal:4/-bind-tonone-map-byslot/-xLD_LIBRARY_PATH/-xPATH/-mcapmlob1/-mcabtl^openibpythontensorflow_word2vec_logs.py

TensorFlow with Horovod

Lessons Learned and Key Takeaways

Lessons Learned and Takeaways

•  EnterprisesareusingML/DLtodaytosolvedifficultproblems(exampleusecases:frauddetection,diseaseprediction)

•  DistributedML/DLintheenterpriserequiresacomplexstack,withmultipledifferenttools(TensorFlowisonepopularoption)

•  Theonlyconstantischange…beprepared–  Businessneeds,usecases,andtoolswillconstantlyevolve

•  Deploymentsarechallenging,withmanypotentialpitfalls–  Containerizationcandeliveragilityandcostsavingbenefits

Lessons Learned and Takeaways

•  Leverageaflexible,scalable,andelasticplatformforsuccess–  BlueDataprovidesaturnkeycontainer-basedplatformforlarge-scaledistributedAI/ML/DLintheenterprise

–  Enterprise-gradesecurityandperformance,proveninproductionatleadingGlobal2000organizations

–  Decouplecomputefromstorageforgreaterefficiency,anddeployon-premises,inahybridmodel,ormulti-cloud

–  Savetime,savemoney,andaccelerateinnovation

Tolearnmore,visittheBlueDataboothintheExpoHall

Thank You

www.bluedata.com

Recommended