Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Solutions and Reference architecture
NandaVijaydev– BlueData(recentlyacquiredbyHPE)TheAIConference,NewYork
AccelerateInnovationintheEnterprisewithDistributedML/DL
• AI,MachineLearning(ML),andDeepLearning(DL)• ExampleEnterpriseUseCases• DeploymentChallengesforDistributedML/DL• DistributedTensorFlowandHorovodonContainerswithIntelXeonprocessorsandIntelMKL
• LessonsLearnedandKeyTakeaways
Agenda
AI, Machine Learning, and Deep Learning
Let’s Get Grounded…What is AI? What are Machine Learning and Deep Learning?
Artificial intelligence
Machine learning
Deep learning
Artificial intelligence (AI) Mimics human behavior. Any technique that enables machines to solve a task in a way like humans do.
Machine learning (ML) Algorithms that allow computers to learn from examples without being explicitly programmed.
Deep learning (DL) Subset of ML, using deep artificial neural networks as models, inspired by the structure and function of the human brain.
Example: Siri
Example: Google Maps
Example: Self-driving car
Why Should You Be Interested in AI / ML / DL?
EveryonewantsAI/ML/DLandadvancedanalytics….
….butfacemanychallenges
Usecases
Newroles,skillgaps
Cultureandchange
Datapreparation
Legacyinfrastructure
AIandadvancedanalyticsinfrastructurecouldconstitute
15-20%ofthemarketby20211
AIandadvancedanalyticsrepresent
2oftop3CIOpriorities
EnterpriseAIadoption
2.7Xgrowthinlast4years2
1 IDC. Goldman Sachs. HPE Corporate Strategy.2018 2 Gartner - “2019 CIO Survey: CIOs Have Awoken to the Importance of AI”
Key Questions Remain …
How do you integrate your AI and data ecosystem for ML / DL and advanced analytics?
How do you modernize, consume, and prepare your EDW or Hadoop big data foundation for AI?
How do you get started with gaining intelligence with your data?
What is the best way to prepare your company for a data-centric and AI future?
What opportunities does AI bring to your business? What are the major use cases?
AI / ML / DL Adoption in the Enterprise
Health Personalized medicine, image analytics
Manufacturing Predictive and prescriptive maintenance
Consumer tech Chatbots
Financial services Fraud detection, ID verification
Government Cyber-security, smart cities and utilities
Energy Seismic and reservoir modeling
Service providers Media delivery
Retail Video surveillance, shopping patterns
Example Enterprise Use Cases
Financial Services Use Cases
FraudDetection
• Real-TimeTransactions• CreditCard• Merchant• Collusion• Impersonation• SocialEngineering
Fraud
RiskModeling&CreditWorthiness
Check
• LoanDefaults• DelayedPayments• Liquidity• Market&Currencies• Purchasesand
Payments• TimeSeries
CLVPredictionand
Recommendation
• HistoricalPurchaseView
• PatternRecognition• RetentionStrategy• Upsell• Cross-Sell• Nurturing
CustomerSegmentation
• BehavioralAnalysis• Understanding
CustomerQuadrant• EffectiveMessaging&
ImprovedEngagement• TargetedCustomer
Support• EnhancedRetention
Other
• ImageRecognition• NLP• Security• VideoAnalysis
WideRangeofML/DLUseCasesforWholesale/CommercialBanking,CreditCard/Payments,RetailBanking,etc.
CLV:CustomerLifetimeValue
Fraud Detection Use Case
• OneofthemostcommonusecasesforML/DLinFinancialServicesistodetectandpreventfraud
• Thisrequires:– DistributedBigDataprocessingframeworkssuchasSpark– ML/DLtoolssuchasTensorFlow,H2O,andothers– Continuousmodeltraininganddeployment– Multiplelargedatasets
Fraud Detection Use Case (cont’d)
• DatascienceteamsneedtheabilitytocreatedistributedML/DLenvironmentsforsandboxaswellastrialanderrorexperimentation
• Thisrequires:– Hardwareacceleration(e.g.Xeon,MKL)– MultipledifferentML/DLanddatasciencetools– Fastandrepeatabledeploymentofclusters
• PrecisionMedicineandPersonalSensing– Diseaseprediction,diagnosis,anddetection
(e.g.genomicsresearch)– Usingdatafromlocalsensors(e.g.mobile
phones)toidentifyhumanbehavior• ElectronicHealthRecord(EHR)correlation
– “Smart”healthrecords• ImprovedClinicalWorkflow
– Decisionsupportforclinicians• ClaimsManagementandFraudDetection
– Identifyfraudulentclaims• DrugDiscoveryandDevelopment
ML / DL in Healthcare – Use Cases
• Manytypesofdata– Genomic– Microbiome– Epigenome– Etc.
• Hugevolumesofdata(petabytes>exabytes)
Use Case: Precision Medicine
360° View of the Patient
Visit
CareSite
Rx Patient
Demographics
Studies
Genomics
Diagnosis
Labs
Deployment Challenges for Distributed ML / DL
Why Distributed ML / DL?
Speed
LargeDataVolumes
FaultTolerance
• Complexity,lackofrepeatabilityandreproducibilityacrossenvironments
• Sharingdata,notduplicatingdata
• Needagilitytoscaleupanddowncomputeresources
• Deployingmultipledistributedplatforms,libraries,applications,andversions
• Onesizeenvironmentfitsnone
• Needaflexibleandfuture-proofsolution
Laptop On-PremCluster
Off-PremCluster
Distributed ML / DL – Challenges
Example Deployment Challenges
• Howtorunclustersonheterogeneoushosthardware– CPUsandGPUs,includingmultipleGPUversions
• Howtomaximizeuseofexpensivehardwareresources• Howtominimizemanualoperations
– Automatingtheclustercreationandanddeploymentprocess– Creatingreproducibleclustersandreproducibleresults– Enablingon-demandprovisioningandelasticity
Example Deployment Challenges
• Howtosupportthelatestversionsofsoftware– Deploymentcomplexityandupgrades– Versioncompatibility
• Howtoensureenterprise-classsecurity– Network,storage,userauthentication,andaccess
Dockerissoftwarethatperformsoperating-system-levelvirtualization
alsoknownascontainerization.
Containerizationallowstheexistenceofmultipleinstancesonaserver.
Source:https://en.wikipedia.org/wiki/docker_(software)
Modern Technology Innovations
SimplifyDeployments InnovateFaster DeployAnywhere
Distributed ML / DL and Containers
• ML/DLapplicationsarecomputehardwareintensive
• Theycanbenefitfromtheflexibility,agility,andresourcesharingattributesofcontainerization
• Butcaremustbetakeninhowthisisdone,especiallyinalarge-scaledistributedenvironment
22
IT
Data
DataPlatforms
DataScienceandML/DL
Tools
Solutions GenomeResearch VideoSurveillanceCustomer360
ExampleIndustryUseCases
FraudDetection
HDFS/NFS
UserAccess Security TimetoDeploy Multi-Tenant
DataDuplicationDataStore Cloud
AI-Driven Solutions for the Enterprise
Turnkey Container-Based Solution
IOBoost™–ExtremeperformanceandscalabilityElasticPlane™–Self-service,multi-tenantclusters
DataTap™–In-placeaccesstodataon-premorinthecloud
BlueDataEPIC™SoftwarePlatformDataScientists Developers DataEngineers
DataAnalysts
BI/AnalyticsTools Bring-Your-Own
NFS HDFS
Compute
Storage
On-Premises PublicCloud
BigDataTools ML/DLTools DataScienceTools
CPUs GPUs
TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL
Distributed TensorFlow – Concepts
• RunningTensorFlowtraininginparallel,onmultipledevices
• Goalistoimproveaccuracyandspeed• Differentlayersmaybetrainedondifferentnodes(modelparallelism)
• Samemodelcanappliedondifferentsubsetofdata,indifferentnodes(dataparallelism)
https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4
Distributed TensorFlow – Schemes • Dataparallelismimplementation
– Needstosyncmodelparameters– Usesacentralizedordecentralizedschemeto
communicateparameterupdate
• CentralizedschemesuseParameterServertocommunicateupdatestoparameters(gradients)betweennodes
• Decentralizedschedulesusering-allreducescheme
• HorovodisanopensourceframeworkdevelopedbyUberthatsupportsallreduce
https://towardsdatascience.com/distributed-tensorflow-using-horovod-6d572f8790c4
Meet Horovod
• Distributedtrainingframeworkfor– Tensorflow,PyTorch,Keras
• SeparatesinfrastructurecapabilitiesfromML• InstallseasilyonexistingMLframework
– pipinstallhorovod• Usesbandwidthoptimalcommunication
protocol– RDMA,InfiniBandifavailable
SharedData
TensorFlow with Horovod on Docker DockerContainers
MPI3.1.3TensorFlow1.7*
MKL
MPI3.1.3TensorFlow1.7*
MKL
MPI3.1.3TensorFlow1.7*
MKL
Horovodclusteronmultiplecontainers,andmachines
TensorFlow with Horovod
• tensorflow_wrd2vec.pyfromgithttps://github.com/horovod/horovodexamples
• DatacomesfromsharedNFSmounts,automaticallysurfacedbyBlueDataintocontainers
• Passwordlesssshsetupduringclustercreation• Allprerequisitesinstalledonallnodes,including:
– MKL–MathKernelLibrary– tensorflow,pytorch,scikit-learn,...(computeframeworks)– openmpi(Todistributethejob)– tensorboardforvisualization
App Store with Pre-Built ML / DL Images
*
Dockerimagesformultipleapplications
andversions
Abilitytocreateandaddnewimages
mpirun–np4/--allow-run-as-root/-d-Hbluedata-302.bdlocal:2,bluedata-301.bdlocal:4/-bind-tonone-map-byslot/-xLD_LIBRARY_PATH/-xPATH/-mcapmlob1/-mcabtl^openibpythontensorflow_word2vec_logs.py
TensorFlow with Horovod
Lessons Learned and Key Takeaways
Lessons Learned and Takeaways
• EnterprisesareusingML/DLtodaytosolvedifficultproblems(exampleusecases:frauddetection,diseaseprediction)
• DistributedML/DLintheenterpriserequiresacomplexstack,withmultipledifferenttools(TensorFlowisonepopularoption)
• Theonlyconstantischange…beprepared– Businessneeds,usecases,andtoolswillconstantlyevolve
• Deploymentsarechallenging,withmanypotentialpitfalls– Containerizationcandeliveragilityandcostsavingbenefits
Lessons Learned and Takeaways
• Leverageaflexible,scalable,andelasticplatformforsuccess– BlueDataprovidesaturnkeycontainer-basedplatformforlarge-scaledistributedAI/ML/DLintheenterprise
– Enterprise-gradesecurityandperformance,proveninproductionatleadingGlobal2000organizations
– Decouplecomputefromstorageforgreaterefficiency,anddeployon-premises,inahybridmodel,ormulti-cloud
– Savetime,savemoney,andaccelerateinnovation
Tolearnmore,visittheBlueDataboothintheExpoHall
Thank You
www.bluedata.com