Solutions and Reference architecture Nanda Vijaydev BlueData (recently acquired by HPE) The AI Conference, New York Accelerate Innovation in the Enterprise with Distributed ML / DL

Accelerate Innovation in the Enterprise Solutions and Reference … · 2020. 7. 9. · What are Machine Learning and Deep Learning? Artificial intelligence Machine learning Deep learning

Accelerate Innovation in the Enterprise with Distributed ML / DL

Solutions and Reference Architecture

Nanda Vijaydev, BlueData (recently acquired by HPE)
The AI Conference, New York

Agenda

•  AI, Machine Learning (ML), and Deep Learning (DL)
•  Example Enterprise Use Cases
•  Deployment Challenges for Distributed ML / DL
•  Distributed TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL
•  Lessons Learned and Key Takeaways

AI, Machine Learning, and Deep Learning

Let’s Get Grounded…What is AI? What are Machine Learning and Deep Learning?

Artificial intelligence

Machine learning

Deep learning

Artificial intelligence (AI): Mimics human behavior. Any technique that enables machines to solve a task the way humans do.

Machine learning (ML): Algorithms that allow computers to learn from examples without being explicitly programmed.

Deep learning (DL): A subset of ML that uses deep artificial neural networks as models, inspired by the structure and function of the human brain.

Example: Siri

Example: Google Maps

Example: Self-driving car

Why Should You Be Interested in AI / ML / DL?

Everyone wants AI / ML / DL and advanced analytics…

•  AI and advanced analytics infrastructure could constitute 15-20% of the market by 2021¹
•  AI and advanced analytics represent 2 of top 3 CIO priorities
•  Enterprise AI adoption: 2.7X growth in last 4 years²

…but faces many challenges

•  Use cases
•  New roles, skill gaps
•  Culture and change
•  Data preparation
•  Legacy infrastructure

¹ IDC. Goldman Sachs. HPE Corporate Strategy. 2018
² Gartner, "2019 CIO Survey: CIOs Have Awoken to the Importance of AI"

Key Questions Remain …

How do you integrate your AI and data ecosystem for ML / DL and advanced analytics?

How do you modernize, consume, and prepare your EDW or Hadoop big data foundation for AI?

How do you get started with gaining intelligence with your data?

What is the best way to prepare your company for a data-centric and AI future?

What opportunities does AI bring to your business? What are the major use cases?

AI / ML / DL Adoption in the Enterprise

•  Health: Personalized medicine, image analytics
•  Manufacturing: Predictive and prescriptive maintenance
•  Consumer tech: Chatbots
•  Financial services: Fraud detection, ID verification
•  Government: Cyber-security, smart cities and utilities
•  Energy: Seismic and reservoir modeling
•  Service providers: Media delivery
•  Retail: Video surveillance, shopping patterns

Example Enterprise Use Cases

Financial Services Use Cases

Wide Range of ML / DL Use Cases for Wholesale / Commercial Banking, Credit Card / Payments, Retail Banking, etc.

Fraud Detection
•  Real-Time Transactions
•  Credit Card
•  Merchant
•  Collusion
•  Impersonation
•  Social Engineering Fraud

Risk Modeling & Credit Worthiness Check
•  Loan Defaults
•  Delayed Payments
•  Liquidity
•  Market & Currencies
•  Purchases and Payments
•  Time Series

CLV Prediction and Recommendation
•  Historical Purchase View
•  Pattern Recognition
•  Retention Strategy
•  Upsell
•  Cross-Sell
•  Nurturing

Customer Segmentation
•  Behavioral Analysis
•  Understanding Customer Quadrant
•  Effective Messaging & Improved Engagement
•  Targeted Customer Support
•  Enhanced Retention

Other
•  Image Recognition
•  NLP
•  Security
•  Video Analysis

CLV: Customer Lifetime Value

Fraud Detection Use Case

•  One of the most common use cases for ML / DL in Financial Services is to detect and prevent fraud
•  This requires:
   –  Distributed Big Data processing frameworks such as Spark
   –  ML / DL tools such as TensorFlow, H2O, and others
   –  Continuous model training and deployment
   –  Multiple large data sets

Fraud Detection Use Case (cont’d)

•  Data science teams need the ability to create distributed ML / DL environments for sandbox as well as trial and error experimentation
•  This requires:
   –  Hardware acceleration (e.g. Xeon, MKL)
   –  Multiple different ML / DL and data science tools
   –  Fast and repeatable deployment of clusters

ML / DL in Healthcare – Use Cases

•  Precision Medicine and Personal Sensing
   –  Disease prediction, diagnosis, and detection (e.g. genomics research)
   –  Using data from local sensors (e.g. mobile phones) to identify human behavior
•  Electronic Health Record (EHR) correlation
   –  "Smart" health records
•  Improved Clinical Workflow
   –  Decision support for clinicians
•  Claims Management and Fraud Detection
   –  Identify fraudulent claims
•  Drug Discovery and Development

Use Case: Precision Medicine

•  Many types of data
   –  Genomic
   –  Microbiome
   –  Epigenome
   –  Etc.
•  Huge volumes of data (petabytes to exabytes)

360° View of the Patient

[Diagram: the Patient at the center, surrounded by Visit, Care Site, Rx, Demographics, Studies, Genomics, Diagnosis, and Labs]

Deployment Challenges for Distributed ML / DL

Why Distributed ML / DL?

•  Speed
•  Large Data Volumes
•  Fault Tolerance

Distributed ML / DL – Challenges

•  Complexity, lack of repeatability and reproducibility across environments
•  Sharing data, not duplicating data
•  Need agility to scale up and down compute resources
•  Deploying multiple distributed platforms, libraries, applications, and versions
•  One size environment fits none
•  Need a flexible and future-proof solution

Environments: Laptop, On-Prem Cluster, Off-Prem Cluster

Example Deployment Challenges

•  How to run clusters on heterogeneous host hardware
   –  CPUs and GPUs, including multiple GPU versions
•  How to maximize use of expensive hardware resources
•  How to minimize manual operations
   –  Automating the cluster creation and deployment process
   –  Creating reproducible clusters and reproducible results
   –  Enabling on-demand provisioning and elasticity

Example Deployment Challenges

•  How to support the latest versions of software
   –  Deployment complexity and upgrades
   –  Version compatibility
•  How to ensure enterprise-class security
   –  Network, storage, user authentication, and access

Modern Technology Innovations

Docker is software that performs operating-system-level virtualization, also known as containerization. Containerization allows the existence of multiple instances on a server.

Source: https://en.wikipedia.org/wiki/docker_(software)

•  Simplify Deployments
•  Innovate Faster
•  Deploy Anywhere

Distributed ML / DL and Containers

•  ML / DL applications are compute hardware intensive
•  They can benefit from the flexibility, agility, and resource sharing attributes of containerization
•  But care must be taken in how this is done, especially in a large-scale distributed environment

AI-Driven Solutions for the Enterprise

[Diagram: a layered stack with IT, Data, Data Platforms, Data Science and ML / DL Tools, and Solutions]

•  Example industry use cases: Genome Research, Video Surveillance, Customer 360, Fraud Detection
•  Other labels in the diagram: HDFS / NFS, User Access, Security, Time to Deploy, Multi-Tenant, Data Duplication, Data Store, Cloud

Turnkey Container-Based Solution

BlueData EPIC™ Software Platform

•  ElasticPlane™: Self-service, multi-tenant clusters
•  IOBoost™: Extreme performance and scalability
•  DataTap™: In-place access to data on-prem or in the cloud

[Diagram labels]
•  Users: Data Scientists, Developers, Data Engineers, Data Analysts
•  Tools: BI / Analytics Tools, Big Data Tools, ML / DL Tools, Data Science Tools, Bring-Your-Own
•  Compute: CPUs, GPUs
•  Storage: NFS, HDFS
•  On-Premises, Public Cloud

TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL

Distributed TensorFlow – Concepts

•  Running TensorFlow training in parallel, on multiple devices
•  Goal is to improve accuracy and speed
•  Different layers may be trained on different nodes (model parallelism)
•  Same model can be applied on different subsets of data, in different nodes (data parallelism)

https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4
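
The data parallelism idea on this slide can be illustrated with a minimal sketch in plain Python. It is not TensorFlow code: the one-weight model `y = w * x`, the toy data set, and the helper names `gradient`, `split`, and `data_parallel_step` are all assumptions made for illustration. Each simulated worker computes a gradient on its own equally sized shard, and the averaged gradient drives the same update on every copy of the model.

```python
# Data parallelism in miniature: every worker holds the same model (a single
# weight w) and computes a gradient on its own shard of the data.

def gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def split(data, num_workers):
    """Split data into equally sized contiguous shards (length must divide evenly)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, data, num_workers, lr):
    """One synchronous step: local gradients, then an average, then one shared update."""
    local_grads = [gradient(w, shard) for shard in split(data, num_workers)]
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy samples of y = 3x

w_single = 0.0    # trained by one worker on the full batch
w_parallel = 0.0  # trained by four simulated workers on two samples each
for _ in range(200):
    w_single = data_parallel_step(w_single, data, num_workers=1, lr=0.01)
    w_parallel = data_parallel_step(w_parallel, data, num_workers=4, lr=0.01)

print(w_single, w_parallel)
```

With equally sized shards, the average of the per-shard gradients equals the full-batch gradient up to rounding, so both runs follow the same trajectory. Model parallelism, by contrast, would place different layers on different workers and is not shown here.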

Distributed TensorFlow – Schemes

•  Data parallelism implementation
   –  Needs to sync model parameters
   –  Uses a centralized or decentralized scheme to communicate parameter updates
•  Centralized schemes use a Parameter Server to communicate updates to parameters (gradients) between nodes
•  Decentralized schemes use the ring-allreduce scheme
•  Horovod is an open source framework developed by Uber that supports allreduce

https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4
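
To make the decentralized scheme concrete, here is a small single-process simulation of ring-allreduce in plain Python. It is a sketch of the general technique, not Horovod's implementation: the function name `ring_allreduce`, the list-based "workers", and the requirement that the vector length divide evenly by the worker count are assumptions for illustration. Each worker only ever passes one chunk to its right-hand neighbor, so no central parameter server is involved.

```python
def ring_allreduce(worker_vectors):
    """Simulate ring-allreduce (sum) over equally long per-worker vectors."""
    n = len(worker_vectors)
    length = len(worker_vectors[0])
    assert all(len(v) == length for v in worker_vectors)
    assert length % n == 0, "sketch assumes the vector splits evenly into n chunks"
    size = length // n

    # chunks[i][c] is worker i's local copy of chunk c
    chunks = [[list(v[c * size:(c + 1) * size]) for c in range(n)]
              for v in worker_vectors]

    # Phase 1, scatter-reduce: after n-1 steps worker i holds the complete
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so every send uses the pre-step state.
        messages = [(i, (i - step) % n, list(chunks[i][(i - step) % n]))
                    for i in range(n)]
        for sender, c, payload in messages:
            receiver = (sender + 1) % n
            chunks[receiver][c] = [a + b for a, b in zip(chunks[receiver][c], payload)]

    # Phase 2, allgather: completed chunks travel around the ring and
    # overwrite the partial copies.
    for step in range(n - 1):
        messages = [(i, (i + 1 - step) % n, list(chunks[i][(i + 1 - step) % n]))
                    for i in range(n)]
        for sender, c, payload in messages:
            receiver = (sender + 1) % n
            chunks[receiver][c] = payload

    return [[x for chunk in worker for x in chunk] for worker in chunks]

# Three simulated workers, each with its own local gradient vector.
grads = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(grads)
print(reduced[0])  # every worker ends with [111, 222, 333, 444, 555, 666]
```

Dividing each element of the result by the number of workers gives the averaged gradient used in a synchronous data-parallel update.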

Meet Horovod

•  Distributed training framework for
   –  TensorFlow, PyTorch, Keras
•  Separates infrastructure capabilities from ML
•  Installs easily on existing ML framework
   –  pip install horovod
•  Uses bandwidth optimal communication protocol
   –  RDMA, InfiniBand if available
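
As a rough idea of how little Horovod asks of an existing training script, the sketch below wraps a toy model using the TensorFlow 1.x API, which matches the TensorFlow 1.7 stack shown later in the deck. It is an assumption-laden illustration, not the presenter's code: the toy model, the `train` function, and the `scaled_learning_rate` helper (a common linear-scaling heuristic) are hypothetical, and `train()` only works when TensorFlow 1.x and Horovod are installed and the process is started by an MPI launcher.

```python
def scaled_learning_rate(base_lr, num_workers):
    """Linear scaling heuristic: the effective batch grows with the worker count."""
    return base_lr * num_workers

def train(base_lr=0.01, num_steps=100):
    # Imported here so the helper above stays usable without these packages.
    import tensorflow as tf            # assumes the TensorFlow 1.x API
    import horovod.tensorflow as hvd

    hvd.init()                         # one process per slot, started by mpirun

    # Toy model fitting y = 3x; a real job would read its share of the shared data.
    x = tf.random_normal([32, 1])
    y = 3.0 * x
    w = tf.Variable(tf.zeros([1, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

    opt = tf.train.GradientDescentOptimizer(
        scaled_learning_rate(base_lr, hvd.size()))
    opt = hvd.DistributedOptimizer(opt)    # averages gradients across workers
    global_step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=global_step)

    hooks = [
        hvd.BroadcastGlobalVariablesHook(0),           # start from rank 0's weights
        tf.train.StopAtStepHook(last_step=num_steps),
    ]
    with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)
```

The training loop itself is unchanged from a single-node script; the Horovod-specific parts are the `hvd.init()` call, the optimizer wrapper, and the broadcast hook. A script would call `train()` at the bottom and be launched once per worker by `mpirun`.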

TensorFlow with Horovod on Docker

[Diagram: three Docker containers, each running MPI 3.1.3, TensorFlow 1.7*, and MKL, all attached to Shared Data]

Horovod cluster on multiple containers and machines

TensorFlow with Horovod

•  tensorflow_word2vec.py from git https://github.com/horovod/horovod examples
•  Data comes from shared NFS mounts, automatically surfaced by BlueData into containers
•  Passwordless ssh setup during cluster creation
•  All prerequisites installed on all nodes, including:
   –  MKL (Math Kernel Library)
   –  tensorflow, pytorch, scikit-learn, ... (compute frameworks)
   –  openmpi (to distribute the job)
   –  tensorboard for visualization
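
A quick way to confirm that a node has the prerequisites listed above is a small check script like the following sketch. The function name `check_prerequisites` and the exact module and executable names passed to it are assumptions for illustration (for example, scikit-learn is imported as `sklearn` and PyTorch as `torch`); it only reports what is discoverable on the current node and does not verify MKL or version compatibility.

```python
import importlib.util
import shutil

def check_prerequisites(modules, executables):
    """Return {name: True/False} for importable modules and executables on PATH."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        report[exe] = shutil.which(exe) is not None
    return report

# Names below are assumptions about how the listed tools are exposed on a node.
report = check_prerequisites(
    modules=["tensorflow", "torch", "sklearn", "horovod", "tensorboard"],
    executables=["mpirun", "ssh"],
)
missing = sorted(name for name, found in report.items() if not found)
print("missing:", missing)
```

Running the same script on every container before a job starts is one simple way to catch a node that was built from a different image.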

App Store with Pre-Built ML / DL Images

•  Docker images for multiple applications and versions
•  Ability to create and add new images

TensorFlow with Horovod

mpirun -np 4 \
    --allow-run-as-root \
    -d -H bluedata-302.bdlocal:2,bluedata-301.bdlocal:4 \
    -bind-to none -map-by slot \
    -x LD_LIBRARY_PATH \
    -x PATH \
    -mca pml ob1 \
    -mca btl ^openib python tensorflow_word2vec_logs.py
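
When the host list changes from cluster to cluster, it can help to assemble this launch command programmatically. The sketch below is one hypothetical way to do that: `build_mpirun_command` and its parameters are not part of Horovod or Open MPI, and the flags it emits are simply those from the command on this slide. It also checks that the requested process count fits in the declared slots.

```python
import shlex

def build_mpirun_command(hosts, script, num_procs,
                         forwarded_env=("LD_LIBRARY_PATH", "PATH"),
                         debug=False):
    """Build the mpirun argument list for a Horovod job.

    hosts is a list of (hostname, slots) pairs, e.g. [("node-a", 2)].
    """
    total_slots = sum(slots for _, slots in hosts)
    if num_procs > total_slots:
        raise ValueError(
            f"{num_procs} processes requested but only {total_slots} slots declared")

    host_arg = ",".join(f"{host}:{slots}" for host, slots in hosts)
    cmd = ["mpirun", "-np", str(num_procs), "--allow-run-as-root"]
    if debug:
        cmd.append("-d")
    cmd += ["-H", host_arg, "-bind-to", "none", "-map-by", "slot"]
    for var in forwarded_env:
        cmd += ["-x", var]             # forward this environment variable
    cmd += ["-mca", "pml", "ob1", "-mca", "btl", "^openib"]
    cmd += ["python", script]
    return cmd

cmd = build_mpirun_command(
    hosts=[("bluedata-302.bdlocal", 2), ("bluedata-301.bdlocal", 4)],
    script="tensorflow_word2vec_logs.py",
    num_procs=4,
    debug=True,
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd)` on the node that has passwordless ssh access to the other hosts.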

Lessons Learned and Key Takeaways

Lessons Learned and Takeaways

•  Enterprises are using ML / DL today to solve difficult problems (example use cases: fraud detection, disease prediction)
•  Distributed ML / DL in the enterprise requires a complex stack, with multiple different tools (TensorFlow is one popular option)
•  The only constant is change… be prepared
   –  Business needs, use cases, and tools will constantly evolve
•  Deployments are challenging, with many potential pitfalls
   –  Containerization can deliver agility and cost saving benefits

Lessons Learned and Takeaways

•  Leverage a flexible, scalable, and elastic platform for success
   –  BlueData provides a turnkey container-based platform for large-scale distributed AI / ML / DL in the enterprise
   –  Enterprise-grade security and performance, proven in production at leading Global 2000 organizations
   –  Decouple compute from storage for greater efficiency, and deploy on-premises, in a hybrid model, or multi-cloud
   –  Save time, save money, and accelerate innovation

To learn more, visit the BlueData booth in the Expo Hall

Thank You

www.bluedata.com