PowerAI World’s Fastest AI Platform for Enterprise Sumit Gupta VP, HPC, AI, and Analytics IBM Cognitive Systems May 2017

PowerAI - GPU Technology Conferenceon-demand.gputechconf.com/gtc/...sumit-gupta-ibm-powerai-deep-lea… · DL Developer Tools Spectrum Scale High-Speed File System via HDFS APIs Cluster



PowerAI
World’s Fastest AI Platform for Enterprise

Sumit Gupta
VP, HPC, AI, and Analytics
IBM Cognitive Systems

May 2017


New additions to PowerAI


Transmission Line Inspection


Data Lake

Transform & Prep Data (ETL)

Trained Model

Images of Damaged Components

Model Training

Transform & Prep Data (ETL)

Off-Line Training

Production

Live Video


Data Lake & Data Stores: Hadoop HDFS, NoSQL DBs

Distributed Computing: Spark, MPI

ML & DL Libraries & Frameworks: TensorFlow, Caffe, SparkML

Cognitive APIs (e.g., Watson): Speech, Vision, NLP, Sentiment

In-House Cognitive APIs

Applications

Segment Specific: Finance, Retail, Healthcare, etc.

Accelerated Infrastructure: Accelerated Servers, Storage

Transform & Prep Data (ETL)


Data Lake & Data Stores

Distributed Computing

ML & DL Libraries & Frameworks

Cognitive APIs (Eg: Watson)

In-House Cognitive APIs

Applications

Accelerated Servers Storage

Data Prep, ETL, Curation, Data Labeling

Performance to Reduce Training Time

Multi-tenant, Cluster Virtualization, DL Framework Scaling

Feature extraction, Selecting Right Model, Hyper-parameter tuning

Finding Right “Tagged” Data, Model Integrity

Use Case Identification, Access to Enough Data

Transform & Prep Data (ETL)


PowerAI: Enterprise Class, Ease of Use, Faster Training

Enterprise Software Distribution

Binary Package of Major Deep Learning Frameworks with Enterprise Support

Tools for Ease of Development

Graphical tools to Enhance Data Scientist Developer Experience

Faster Training Times for Data Scientists

Performance Optimized for Single Node & Distributed Computing Scaling


PowerAI: Making AI More Accessible to Developers

• AI Vision: Targeted at Application Developers

• Data Extraction, Transformation and Preparation tool

• DL Insight

• Distributed Deep Learning

Multi-tenant, Enterprise-ready Deep Learning Platform for Data Scientists


PowerAI

DL Frameworks + Libraries (TensorFlow, Caffe, ..)

IBM Data Science Experience (DSX)

Distributed Computing with Spark & MPI

DL Developer Tools

Spectrum Scale High-Speed File System via HDFS APIs

Cluster of NVLink Servers

PowerAI Enterprise (Coming soon)

IBM Enterprise Support

Application Dev Services

Enterprise Support & Services to Augment Enterprise Expertise

Packaged, Pre-Compiled Deep Learning Frameworks (TensorFlow, Caffe, Torch, ..)

Optimized for Scaling & Fast Training Time

Data Scientists Productivity Tools Targeted to DL Developers

IBM Confidential


DL Frameworks (TF, Caffe, etc)

Data Prep & ETL via Spectrum Conductor with Spark

Input Data

Deep Learning GUI: Data & Model Management, ETL Tools, Monitor, Visualize, Advise

DL Insight: Tuning Engine

AI Vision: Computer Vision App Development Toolkit

IBM Spectrum Conductor with Spark: System mgmt, Distributed ETL, Distributed Training, Hyper-Parameter Optimization

Distributed Training


Tumor Proliferation Assessment – mitosis detection
Images from electron-microscope
Size of image - 70K * 60K

Framework    Format     Input Size (Faster R-CNN)
Caffe        LMDB       1K*1K
TensorFlow   TFRecord   1K*1K

Data Transformation

Data Distribution among training, validation and testing

Data Shuffle
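
The three data-preparation steps above can be sketched in a few lines. This is only an illustration, not the PowerAI tool itself: it assumes "70K * 60K" means 70,000 x 60,000 pixels and "1K*1K" means 1,000 x 1,000 pixel tiles, and the helper names (tile_origins, split_shuffled) and the 80/10/10 split are hypothetical choices. Writing the tiles to LMDB or TFRecord is left out.

```python
import random

def tile_origins(width, height, tile):
    """Top-left (x, y) corner of every tile; edge tiles may be partial."""
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]

def split_shuffled(items, train=0.8, val=0.1, seed=0):
    """Shuffle reproducibly, then cut into train / validation / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Assumed sizes: 70,000 x 60,000 pixel image cut into 1,000 x 1,000 tiles.
tiles = tile_origins(70_000, 60_000, 1_000)
train_set, val_set, test_set = split_shuffled(tiles)
print(len(tiles), len(train_set), len(val_set), len(test_set))
```

Under these assumptions one image yields 70 x 60 = 4,200 tiles. Shuffling before the split keeps neighbouring regions of the slide from landing in a single subset.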


Import data from different formats

Transform, split and shuffle data


Random

TPE (Tree-based Parzen Estimator)

Bayesian

Multi-tenant Spark Cluster (IBM Spectrum Conductor with Spark)

Spark search jobs are generated dynamically and executed in parallel
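
A minimal sketch of the simplest of the three strategies, random search, with trials evaluated in parallel. It is not the DL Insight implementation: a local thread pool stands in for the Spark cluster, toy_loss is a hypothetical placeholder for "train a model and report validation loss", and the search ranges are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_config(rng):
    """Draw one candidate; the ranges are illustrative only."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform draw
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def toy_loss(config):
    """Hypothetical stand-in for training a model and returning validation loss."""
    lr_term = (config["learning_rate"] - 1e-3) ** 2
    return lr_term + abs(config["batch_size"] - 64) / 1000

def random_search(n_trials=32, workers=4, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_trials)]
    # Trials are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(toy_loss, configs))
    best = min(range(n_trials), key=losses.__getitem__)
    return configs[best], losses[best], losses

best_config, best_loss, all_losses = random_search()
print(best_config, best_loss)
```

Random search parallelizes trivially because no trial depends on another. TPE and Bayesian methods use earlier results to choose later candidates, so they are typically run in batches rather than all at once.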


Data preparation

Model training/tuning

Inference

Marked result


AI Vision

Data Lake & Data Stores

Distributed Computing

ML & DL Libraries & Frameworks

Accelerated Servers Storage

Service Management Layer

Data set management
Training task management
Model management
Inference API management
Image preprocessing management
Data label management

Vision Recognition Layer

Self-defined Training with visualized monitoring
Custom Learning for Image Classification
Inference API deployment
Image Labeling and Preprocessing
Video Labeling Service
Custom Learning for Object Detection


AI Vision


Result on public cloud API : white, red, yellow and teal bird

Result on public cloud API : white and black short beak bird

I’m Aethopyga

I’m Pycnonotus

We need to get a new model to classify birds with professional knowledge.

Acridotheres Acrocephalus Aethopyga

Butorides Corvus… >20 categories

User defines categories in AI Vision

Aethopyga: 0.90708

Pycnonotus: 0.99988


AI Vision


Medical image analysis for cytologic examination

AI Talents:

We need tools to speed up

(study number from China)


SAMPLE USE CASE: SALES ORDER PROCESSING

• Traditional capture is difficult on Sales Orders (SO)

• Sales orders contain line data; one SO can have hundreds or thousands of different line items

• Large enterprises might have tens of thousands of clients ordering items or services by email

• Each client might have multiple locations that each have unique order template(s)

• Sample calculation: 40 000 clients x 20 locations -> 800 000 unique Sales Order templates

• To implement using traditional capture by templating:

• 10 hours / template -> 8 million hour exercise -> very bad business case!

• Each order could have hundreds of complex order items
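
The sample calculation can be checked directly. The three inputs come from the slide; the conversion to person-years assumes 2,000 working hours per year, which is not a figure from the slide.

```python
clients = 40_000
locations_per_client = 20
hours_per_template = 10

templates = clients * locations_per_client   # one unique template per location
total_hours = templates * hours_per_template

# Assumption (not from the slide): 2,000 working hours per person-year.
person_years = total_hours / 2_000

print(templates, total_hours, person_years)
```

Under that assumption, the 8 million hours work out to roughly 4,000 person-years of templating effort.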


EXAMPLE: SALES ORDER PROCESSING USING DATACAP & ELINAR.AI

Old orders/invoices + extracted information

= Several weeks of SuperComputer capacity (Power8 Minsky + power.ai)

Trained AI Model

Datacap Validation & Verification

Incoming Order/Invoice

Datacap OCR/Layout

Datacap Extraction

Customer ERP/Finance

Order/Invoice History

AI Training


DETAILS ON IMPLEMENTATION

• Lots of training material needed

• IBM Datacap is used to create page layout.xml for each order

• Previously human-extracted values need to be matched into each layout.xml for training purposes

• Clever data preparation allows higher quality/accuracy

• We can use simple rules to tag certain types of data before it is fed into the neural network; for example, Unit of Measurement (UOM) and ZIP code are easy

• The neural network can use these “hints” to increase training accuracy when the data set is small; for example, if a page has 23 UOM tokens, it is quite obvious that there are 23 different order lines

• Implemented using Torch LSTMs
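
The rule-based "hints" described above could look roughly like the sketch below. It is not Elinar's implementation: the unit list, the five-digit ZIP pattern, the tag names and the sample text are all assumptions made for illustration, and the LSTM that would consume the tags is not shown.

```python
import re

# Hypothetical unit list; a real system would use the customer's own units.
UOM_TOKENS = {"PCS", "EA", "KG", "BOX"}
# Assumed format: five digits with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

def tag_tokens(tokens):
    """Attach a rule-based hint to each token: 'UOM', 'ZIP', or 'O' (other)."""
    tagged = []
    for token in tokens:
        if token.upper() in UOM_TOKENS:
            tagged.append((token, "UOM"))
        elif ZIP_PATTERN.match(token):
            tagged.append((token, "ZIP"))
        else:
            tagged.append((token, "O"))
    return tagged

def estimate_order_lines(tagged):
    """The slide's heuristic: one UOM token per order line."""
    return sum(1 for _, tag in tagged if tag == "UOM")

page = "Ship to 02150 : 12 PCS bolt M8 , 3 KG glue , 40 EA washer".split()
tagged_page = tag_tokens(page)
print(tagged_page)
print(estimate_order_lines(tagged_page))
```

Feeding the tag alongside each token gives the network a cheap, reliable feature, which matters most when there are few labeled orders to learn from.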


COMING SOON: ELINAR A.I. MINER FOR GDPR DATA

• Set of AIs that can reliably extract personal data and privacy information from:
  • Business documents and records
  • Databases and NoSQL data sources
  • Images

• Pipeline uses Neural Networks implemented using Caffe and Torch, augmented with IBM BigInsights text miners and business rules

• Fully developed on IBM Power platform, AIs using power.ai


WHY IBM POWER.AI?

• Nice packaging that has everything a Deep Learning Nerd needs :)

• Very fast time to value due to simple installation; everything works “out-of-the-box”

• Leverages unique Power8 CPU-GPU NVLink communications on “Minsky” and P100 GPUs

• Allows developer to run insanely powerful “Minsky” supercomputer with standard AI tooling like Caffe and Torch

• We previously developed on high-end x86; there is no going back

• Can run larger models faster


Thank You