Autonomous Driving Munich Meetup #8, Dec 18th 2018, Munich, Germany
Data Management for AD (Autonomous Driving)
Frank Kraemer, IBM Systems Architect, mailto:[email protected]
© 2018 IBM Corporation
Automotive Industry generates large amounts of data
Sources: Images from
https://www.youtube.com/watch?v=4jW0fJ80VG8
https://www.youtube.com/watch?v=dhEgD6ZFlQE
https://www.youtube.com/watch?t=21&v=39QMYkx89j0
▪ Storing sensor and video data is very costly.
▪ Handling this data is difficult, e.g. because of the high bandwidth required.
▪ For testing purposes, sensor and video data are much more complex than discrete bus signals, electronic values, etc.
Sensor / video data must be captured, stored, modified and executed synchronously with other testing data such as CAN, FlexRay, Radar, LiDAR, HiSonic, etc. The most common formats are ADTF v2/3 (Digitalwerk), RTMaps (Intempora), MDF4 and ROS/rosbag.
Automotive Sensor Setup for AD
http://currencyobserver.com/2017/12/global-automotive-sensors-market-2017-2022/
Each data source: ~2 Gbit/s
Sensor sets: ~30 Gbit/s
Data collection volume: ~15 TB/h
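The relationship between the line rates and the hourly volume above can be checked with a short calculation. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a sensor set streaming continuously at the quoted rate; the function name is illustrative only.

```python
def gbit_per_s_to_tb_per_hour(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to a data volume in decimal TB per hour."""
    bytes_per_s = gbit_per_s * 1e9 / 8      # bits -> bytes
    return bytes_per_s * 3600 / 1e12        # per second -> per hour, bytes -> TB


SENSOR_SET_RATE = 30.0   # Gbit/s, figure from the slide
PER_SOURCE_RATE = 2.0    # Gbit/s, figure from the slide

tb_per_hour = gbit_per_s_to_tb_per_hour(SENSOR_SET_RATE)
sources = SENSOR_SET_RATE / PER_SOURCE_RATE

print(f"{tb_per_hour:.1f} TB/h from roughly {sources:.0f} data sources")
```

Under these assumptions a 30 Gbit/s sensor set produces about 13.5 TB/h, which is in line with the rounded ~15 TB/h on the slide.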
Data Management for ADAS/AD
Test Drives
50-70 TB / day / car
R&D Labs: tagging
R&D Labs: developing & testing & (re-)simulation & AI training
▪ 300-500 PB data in total
> 200h / 1h driving
o Europe
o USA
o China
o Japan
o Asia
o Africa
Training Data as a Service (TDaaS)
Labeling
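To relate the per-car daily volume to the total data volume quoted above, a rough sizing calculation helps. This sketch assumes decimal units (1 PB = 1000 TB) and counts raw recordings only; it ignores compression, data reduction and derived data, and the function name is hypothetical.

```python
TB_PER_PB = 1000  # decimal units assumed


def car_days_to_fill(total_pb: float, tb_per_car_day: float) -> float:
    """Number of test-drive days (per car) needed to accumulate total_pb."""
    return total_pb * TB_PER_PB / tb_per_car_day


# Figures from the slide: 50-70 TB / day / car and 300-500 PB in total.
low = car_days_to_fill(300, 70)    # smallest total, highest daily volume
high = car_days_to_fill(500, 50)   # largest total, lowest daily volume

print(f"{low:.0f} to {high:.0f} car-days of test driving")
```

With these inputs the total corresponds to roughly 4,300 to 10,000 car-days of recording.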
The IBM AD Solution Approach for Data
1. How to implement & operate an efficient storage, workflow and management system?
2. How to distribute data globally within an enterprise and partners?
3. How to preserve digital data for decades with optimized costs?
4. How to analyze sensor and video data with fast analytics and modern Big Data tools?
5. How to run Machine Learning (ML) and AI training with Nvidia GPU technology at scale?
6. How to do efficient Container workload and IT resource scheduling?

"The Data Foundation"
Analytics & HDFS: Hortonworks / Hadoop / Spark / DSX
IBM AREMA
High-Speed WAN File Transfer: IBM Aspera / Mass Data Migration / Cloud
Spectrum Computing: Docker / Kubernetes / GPUs
IBM Object Storage (COS)
'Cold' Archiving: Long Term Archiving / Low Cost Storage / Tape / Cloud
Enterprise-Class AI: DGX / AC922 / CUDA / PowerAI Vision
IBM Spectrum Discover
Data is used in various ADAS/AD development and testing processes
Video & Ground Truth
Deep Learning / AI Training
SW Algorithm Development
MiL / SiL Dev & Testing
HiL Testing
Mostly under the process and method constraints of Automotive SPICE and ISO 26262
CNN (Convolutional Neural Networks)
HPC, Simulation Env.
HiL Environment
ALM & PLM
[HiL: Hardware in the Loop / MiL: Model in the Loop / SiL: Software in the Loop]
[ALM: Application Lifecycle Management / PLM: Product Lifecycle Management]
The AD Data Pipeline
[Diagram: Data In -> INGEST -> CLASSIFY -> ANALYZE / TRAIN -> Insights Out (Trained Model)]
Stages: Edge; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL / Data Processing; Hadoop / Spark Data Lakes; ML / DL; Archive
Storage tiers: Transient Storage; Global Ingest; Performance Tier; Capacity Tier
Tier characteristics shown: high throughput; low latency; small, random I/O; small & large I/O; high-volume ingest & index; automated tagging; throughput-oriented; performance & capacity; high scalability; large I/O; sequential writes; software defined; globally accessible
Media shown: SSD; SSD/NVMe; SSD/Hybrid; Hybrid/HDD; HDD; Tape; SDS/Cloud; Cloud
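The three phases named on the pipeline slide (ingest, classify, analyze/train) can be read as a simple state progression for each recording. The following is only an illustrative sketch: the class and function names are hypothetical, and the rule that a recording needs at least one tag before training is an assumption, not something the slide states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Phase(Enum):
    INGEST = 1
    CLASSIFY = 2
    ANALYZE_TRAIN = 3


@dataclass
class Recording:
    name: str
    phase: Phase = Phase.INGEST
    tags: Dict[str, str] = field(default_factory=dict)


def classify(rec: Recording, **tags: str) -> Recording:
    """Attach metadata tags and move an ingested recording to CLASSIFY."""
    if rec.phase is not Phase.INGEST:
        raise ValueError(f"{rec.name} is not in the INGEST phase")
    rec.tags.update(tags)
    rec.phase = Phase.CLASSIFY
    return rec


def release_for_training(rec: Recording) -> Recording:
    """Move a classified, tagged recording to ANALYZE / TRAIN."""
    if rec.phase is not Phase.CLASSIFY or not rec.tags:
        raise ValueError(f"{rec.name} must be classified and tagged first")
    rec.phase = Phase.ANALYZE_TRAIN
    return rec


rec = Recording("drive_001.bag")          # hypothetical file name
classify(rec, weather="rain", road="highway")
release_for_training(rec)
print(rec.phase.name, rec.tags)
```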
The SpectrumAI Data Pipeline for AD
[Diagram: Data In -> INGEST -> CLASSIFY -> ANALYZE / TRAIN -> Insights Out (Trained Model)]
Stages: Edge; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL / Data Processing; Hadoop / Spark Data Lakes; ML / DL; Archive
Storage roles: Transient Storage; Global Ingest
IBM products shown: Spectrum Discover; Spectrum Scale / Elastic Storage Server (SSD, SSD / NVMe, Hybrid and HDD models); Cloud Object Storage; Spectrum Archive (Tape)
Building-block "HOT": High Performance I/O File Storage
Powered by IBM Spectrum Scale
Clients: client workstations; users, containers and applications; HPC & HTC compute farm; traditional applications; new-gen applications; DGX-1/2
Access under a GLOBAL Namespace: File (POSIX, NFS, SMB); Block (iSCSI); Object (Swift, S3); Analytics (Transparent HDFS); OpenStack (Cinder, Glance, Manila)
Data services: automated data placement and data migration; encryption; file audit logging; immutability; compression; Spectrum Scale RAID
Management: Management API / Advanced GUI / RESTful API
Storage: Flash; NVMe; Disk; Tape; Shared Nothing Cluster (FPO); JBOD/JBOF; ESS; Transparent Cloud
Multi-site and cloud: Worldwide File Data Distribution (AFM) across Site A, Site B and Site C; DR Site (AFM-DR); Transparent Cloud Tier (TCT); Cloud Data Sharing; S3 Data Cloud
Hi-Perf Scale POD I/O
Introducing IBM SpectrumAI with NVIDIA DGX-1/2
Converged Solution for Data Science Productivity
A scalable, software-defined infrastructure powered by IBM Spectrum Scale and NVIDIA DGX-1 systems. IBM SpectrumAI with NVIDIA DGX is the perfect engine for your data pipeline: the workhorse of an AI data infrastructure on which companies can build their shared data service.
Composable to grow as needed
• Up to 9 DGX-1 servers (72 GPUs) in a rack
• Storage scale-out from a single 300 TB node to 8 exabytes and a yottabyte of files
High performance to feed the GPUs
• NVMe throughput of 120 GB/s in a rack
• Over 40 GB/s sustained random read per 2U
Extensible for the AI data pipeline
• Support for any tiered storage, including cloud and tape
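The rack-level figures on this slide imply an upper bound on the storage bandwidth available to each GPU. The sketch below simply divides the quoted numbers; it assumes an even split across servers and GPUs and that neither the network nor the client stack is the bottleneck, which the slide does not claim.

```python
RACK_THROUGHPUT_GB_S = 120   # NVMe throughput per rack, from the slide
DGX1_PER_RACK = 9            # from the slide
GPUS_PER_DGX1 = 8            # 9 servers x 8 GPUs = 72 GPUs, as on the slide

gpus = DGX1_PER_RACK * GPUS_PER_DGX1
per_server_gb_s = RACK_THROUGHPUT_GB_S / DGX1_PER_RACK
per_gpu_gb_s = RACK_THROUGHPUT_GB_S / gpus

print(f"{gpus} GPUs, {per_server_gb_s:.1f} GB/s per server, "
      f"{per_gpu_gb_s:.2f} GB/s per GPU")
```

Under these assumptions each DGX-1 could be fed at about 13.3 GB/s, or roughly 1.67 GB/s per GPU.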
IBM Storage and SDI
IBM NVMe-powered ESS: densest and fastest storage with up to 40 GB/s throughput
IBM ESS GSxS models based on SSDs: 10 - 40 GB/s throughput
IBM SpectrumAI with NVIDIA DGX Reference Architecture Building Blocks Description
DGX-1 or DGX-2 servers: purpose-built solutions for AI and machine learning, integrating eight (DGX-1) or sixteen (DGX-2) of the world's most advanced data center accelerators, the NVIDIA Tesla V100 Tensor Core GPU
The NVIDIA DGX software stack, optimized for maximum GPU-accelerated training performance, including the new RAPIDS framework to accelerate data science workflows
IBM Spectrum Scale v5, the leading software-defined file storage, architected specifically for AI workloads with enhanced small file, metadata and random IO performance.
NVMe all-flash storage for extremely low latency, power efficiency and data density. Using IBM Spectrum Scale distributed data protection, it delivers over 300 TB in every 2U building block and 120 GB/s of data throughput in a rack. (GA 2019)
Seamless data pipeline connectivity across multiple racks, other IBM SpectrumAI configurations, and workstations to provide the Data Scientists with a unified view of their AI data pipeline.
Mellanox IB Networking
NVIDIA DGX Servers
IBM SpectrumAI Reference Architecture
https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=86022386USEN&
IBM Cloud Object Store COS (S3)
Object Storage definition: a massively scalable, simple-to-manage storage technology that uses logical constructs to store data as discrete objects in a flat address space instead of a hierarchical, directory-based file system.
FILE STORAGE
• Stores billions of files
• Optimum Performance
• File system hierarchy
• Full POSIX Support
• NAS protocol support
• Best for file based workflows
• Best I/O Performance
• Low Latency access
OBJECT STORAGE
• Stores billions of objects
• Optimum Price
• Scales uniformly
• S3 protocol API
• Geo dispersed
• Cloud native App support
• High Latency access
Idea: "Combine the best of both worlds."
https://www.ibm.com/cloud/object-storage
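The "flat address space" in the object storage definition can be illustrated with a toy in-memory store. This is a minimal sketch, not the IBM Cloud Object Storage or S3 API: the class, method names and keys are hypothetical. It shows that an object key containing slashes is just a string, and that directory-like listings are emulated by prefix matching over one flat key space.

```python
from typing import Dict, List


class FlatObjectStore:
    """Toy in-memory object store: one flat key space, no directories."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # The whole key is an opaque name; "/" has no structural meaning.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # A "folder" view is only a prefix filter over the flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("drives/2018-12-18/cam0.mp4", b"video bytes")
store.put("drives/2018-12-18/lidar.bin", b"point cloud bytes")
store.put("labels/cam0.json", b"{}")

print(store.list_keys("drives/2018-12-18/"))
```

A POSIX file system, by contrast, would require the intermediate directories to exist and would resolve the path component by component.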
Scanning and Event Notifications
IBM Spectrum Discover
File and Object Storage | Data Activation/Optimization | Data Insight
Use Cases
Large-Scale Analytics
• Dataset identification
• Data pipeline progression
• Data discovery
Risk Mitigation
• Data inspection
• Data classification
• Data clean-up
Data Optimization
• Archival / tiering
• Duplicate data removal
• Trivial data removal
• Metadata curation
• Custom metadata tagging
• Automatic Indexing
• Policy Engine
• Action Agent API
Reporting / Dashboard / Search
Planned for 2019 and other 3rd parties
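Custom metadata tagging combined with a policy engine, as listed above, can be pictured as rules that tag catalog records and a search that filters on those tags. The sketch below is a generic illustration and does not use the IBM Spectrum Discover API; the record fields, function names, paths and the 365-day threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    path: str
    size_bytes: int
    days_since_access: int
    tags: Dict[str, str] = field(default_factory=dict)


def apply_policy(records: List[Record], matches: Callable[[Record], bool],
                 tag: str, value: str) -> List[Record]:
    """Tag every record the policy predicate matches and return the hits."""
    hits = []
    for rec in records:
        if matches(rec):
            rec.tags[tag] = value
            hits.append(rec)
    return hits


def search(records: List[Record], **wanted: str) -> List[Record]:
    """Return the records whose tags contain all requested key/value pairs."""
    return [r for r in records
            if all(r.tags.get(k) == v for k, v in wanted.items())]


catalog = [
    Record("/data/drive_0001.bag", 2_000_000_000_000, days_since_access=400),
    Record("/data/drive_0002.bag", 1_500_000_000_000, days_since_access=3),
]

# Hypothetical policy: anything untouched for over a year is an archive candidate.
apply_policy(catalog, lambda r: r.days_since_access > 365, "tier", "archive")

print([r.path for r in search(catalog, tier="archive")])
```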
Workload and data flow for AI is complex
[Diagram: Data Source -> Data Preparation -> Model Training -> Inference]
Data Source (years of data): traditional business data; sensor data; data from collaboration partners; data from mobile app and social media; legacy data
Data Preparation (hours and weeks of preparation): pre-processing; training dataset; testing dataset
Model Training (weeks and months of training): AI deep learning frameworks (TensorFlow, Caffe, …); distributed & elastic deep learning (fabric); parallel hyper-parameter search & optimization; network models; hyper-parameters; instrumentation; monitor & advise; iterate; trained model
Inference (sub-seconds to results): deploy in production using trained model; new data
Heavy IO
https://public.dhe.ibm.com/common/ssi/ecm/75/en/75016775usen/systems-hardware-ibm-spectrum-computing-analyst-paper-or-report-75016775usen-20180618.pdf
IBM Reference Architecture for AI Infrastructure
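Two steps from the workflow above, splitting data into training and testing datasets and running a parallel hyper-parameter search, can be sketched with the Python standard library alone. This is an illustration under stated assumptions: `toy_score` is a made-up stand-in for a real training run, the grid values are arbitrary, and a thread pool stands in for the distributed scheduling a real deployment would use.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (training, testing) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def toy_score(params):
    """Stand-in for one training run; best at lr=0.01 and batch size 64."""
    lr, batch = params
    return -abs(lr - 0.01) * 100 - abs(batch - 64) / 64


def grid_search(learning_rates, batch_sizes, workers=4):
    """Evaluate every combination in parallel and return the best one."""
    grid = list(itertools.product(learning_rates, batch_sizes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(toy_score, grid))
    best_score, best_params = max(zip(scores, grid))
    return best_params, best_score


train, test = split_dataset(range(100))
best_params, best_score = grid_search([0.1, 0.01, 0.001], [32, 64, 128])
print(len(train), len(test), best_params)
```

Each grid point is independent of the others, which is why this step parallelizes well across workers or GPUs.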
IBM AI Infrastructure Reference Software Stack
Stack layers: Data Layer; Runtimes, Resource & WL Managers; DL Frameworks, ML Libraries; ML/DL UI and Flow; Data Science Apps, Value-add Tools
IBM Spectrum Conductor
TensorFlow / Caffe / PyTorch / Chainer / MLlib / GraphX / scikit-learn / R / xgboost
GPU Support / Distributed / BYOF / Session Scheduler / MPI / Containers…
Anaconda / Python / Spark
PIE / DSX / Anaconda
Distributed Deep Learning (DDL)
Data Prep / Parallel Training / Model Tuning / Model Evaluation / Inference Services…
IBM Spectrum Conductor Deep Learning Impact (DLI)
PowerAI Vision / DLaaS
IBM Spectrum Scale / IBM Cloud Object Store
IBM PowerAI Enterprise
Hardware Layer: AC922 + V100 GPUs / ESS / FS9100
IBM PowerAI Vision for Data Labeling and Export
PowerAI Vision includes an intuitive toolset that empowers subject matter experts to create vision models without coding or deep learning expertise. It includes the most popular deep learning frameworks and their dependencies, and it is built for easy and rapid deployment and increased team productivity.
Reference Materials
AI in action: Autonomous vehicles
https://www.ibm.com/blogs/systems/ai-in-action-autonomous-vehicles/
IBM Storage Solutions for ADAS and Autonomous Driving (AD)
https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=34019934USEN
IBM Big Data for Autonomous Driving
https://www.youtube.com/watch?v=eGhiIHDJaqI
IBM SpectrumAI Information (Solution Brief, Reference Architecture, Benchmark Results)
https://www.ibm.com/it-infrastructure/storage/ai
Reference IBM Spectrum Scale @ DESY Hamburg, Germany
Overall capacity 100 PB and data rates up to 50 GB/sec
http://iopscience.iop.org/article/10.1088/1742-6596/664/4/042053
Reference IBM Spectrum Scale @ Mobileye, Israel for ADAS Storage
Critical factors are Data Volume and I/O Performance.
[Diagram: 12 x ESS 5U84 Storage]
Reference IBM Spectrum Scale ESS CORAL, USA
▪ 2.5 TB/sec single stream IOR as requested from ORNL
▪ 1 TB/sec 1MB sequential read/write as stated in CORAL RFP
▪ Single Node 16 GB/sec sequential read/write as requested from ORNL
▪ 50K creates/sec per shared directory as stated in CORAL RFP
▪ 2.6 Million 32K file creates/sec as requested from ORNL
▪ Summit's 250-petabyte storage system is delivered by a cluster of 77 IBM ESS storage systems that will deliver 2.5 TB/s of data throughput.
▪ Summit will have capacity for 30B files and 30B directories and will be able to create files at a rate of over 2.6 million file operations per second.
https://www.ibm.com/blogs/systems/fastest-storage-fastest-system-summit/
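Dividing the Summit figures above by the number of ESS systems gives a feel for what each building block contributes. This is a back-of-the-envelope sketch that assumes decimal units and an even distribution of capacity and throughput across the 77 systems, which the source does not state.

```python
ESS_COUNT = 77                 # from the slide
TOTAL_THROUGHPUT_TB_S = 2.5    # from the slide
TOTAL_CAPACITY_PB = 250        # from the slide

per_ess_gb_s = TOTAL_THROUGHPUT_TB_S * 1000 / ESS_COUNT
per_ess_pb = TOTAL_CAPACITY_PB / ESS_COUNT

print(f"{per_ess_gb_s:.1f} GB/s and {per_ess_pb:.2f} PB per ESS system")
```

Under these assumptions each ESS system accounts for roughly 32.5 GB/s and 3.25 PB.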