Summit & Sierra by the numbers
Single Node 16 GB/sec sequential read/write
50K creates/sec per shared directory
1 TB/sec 1MB sequential read/write
2.5 TB/sec single stream IOR
2.6 Million 32K file creates/sec
Together, more than
44,000 NVIDIA GPUs
400 PB of IBM Storage
Two of the world's most powerful supercomputers, built for AI
Nothing else like Spectrum Scale
Not even close.
© IBM Corporation 2020 2
The IBM Shark Project Team, 1994
GPFS (Spectrum Scale) started in 1998
GPFS: A Shared-Disk File System for Large Computing Clusters
Frank Schmuck and Roger Haskin, IBM Almaden Research Center, San Jose, CA
Spectrum Scale Almaden Research Lab, CA, USA
Latitude: 37°12'37.53"N / Longitude: 121°48'25.23"W
Spectrum Scale Lab Poughkeepsie, NY, USA
Spectrum Scale Lab Kelsterbach, Hesse, Germany
Spectrum Scale Lab Pune, Maharashtra, India
Spectrum Scale Lab Bengaluru, Karnataka, India
Spectrum Scale Lab Beijing, China
Spectrum Scale Parallel Architecture
No Hot Spots – Maximum Performance
▪ All NSD servers export to all clients in active-active mode
▪ Spectrum Scale stripes files across NSD servers and NSDs in units of file-system block-size
▪ File-system load is spread evenly
▪ Easy to scale file-system capacity and performance while keeping the architecture balanced
▪ Supports 25/40/100/200/400 GbE and InfiniBand
Spectrum Scale NSD Client does real-time parallel I/O to all the Spectrum Scale NSD servers and storage volumes/NSDs
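The striping scheme above can be sketched in a few lines. This is a simplified round-robin model for illustration only, not actual Spectrum Scale code; the NSD names and the 4 MiB block size are assumptions for the example.

```python
# Illustrative sketch of round-robin block striping across NSDs.
# Simplified model, not Spectrum Scale internals; NSD names are hypothetical.

BLOCK_SIZE = 4 * 1024 * 1024  # assume a 4 MiB file-system block size

def stripe(file_size: int, nsds: list) -> dict:
    """Map each file block to an NSD in round-robin order."""
    layout = {nsd: [] for nsd in nsds}
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    for block in range(num_blocks):
        layout[nsds[block % len(nsds)]].append(block)
    return layout

layout = stripe(40 * 1024 * 1024, ["nsd1", "nsd2", "nsd3", "nsd4"])
# Every NSD receives a near-equal share of the 10 blocks, so the
# file-system load is spread evenly and there are no hot spots.
print({nsd: len(blocks) for nsd, blocks in layout.items()})
```

Because every file's blocks land on every NSD, adding an NSD server scales both capacity and bandwidth while the layout stays balanced.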
Accelerate I/O Infrastructure with Spectrum Scale RDMA
Source: Mellanox
Performance with Network Acceleration and CPU/GPU Offloads
By removing CPU overhead for packet processing and data movement, latencies are lowered and storage performance is maximized, delivering best-of-breed access to storage. Advanced storage offloads include Remote Direct Memory Access (RDMA) and NVMe over Fabrics (NVMe-oF).
Source: Mellanox
ConnectX-6 Dx Ethernet SmartNIC
IBM Spectrum Scale "The data foundation"
• Users and applications: client workstations, HPC compute farm, traditional applications, analytics, containers (Kubernetes, Container Storage Interface driver)
• Access protocols: POSIX, NFS, SMB (File), Block, transparent HDFS, Object (Swift, S3), OpenStack (Cinder, Glance, Manila), REST API
• Shared namespace powered by IBM Spectrum Scale: automated data placement and data migration across Flash/NVMe, disk, tape, shared-nothing cluster / ECE, JBOD/JBOF (Spectrum Scale RAID) and Transparent Cloud Tier (TCT)
• Data services: encryption, compression, immutability, audit logging, watch folder, GUI / admin, AFM, AFM-DR (DR site)
• Worldwide data distribution and collaboration across Sites A, B and C
• Tiering from flash, to disk, to tape, to cloud
• Cloud appears as an external storage pool
• Auto tiering & migration
• High-performance read/write operations
• Public cloud-ready
• Support for multi-cloud environments
Transparent Cloud Tiering
• Targets: AWS, Azure, IBM Cloud, private cloud
• Data is replicated, compressed, encrypted and integrity-validated
• Use cases: backup, DR, tiering, archive, data sharing
Storage Tiering Architecture
IBM Spectrum Scale (HOT)
• File-based storage with Object & HDFS support
• High-end I/O performance
• Information Lifecycle Management (ILM)
• Sub-microsecond access time
IBM Cloud Object Storage (S3) (WARM)
• Site fault tolerant
• Geo-dispersed and worldwide scale
• Easy to deploy
• Millisecond access time
IBM Spectrum Archive & Tape (COLD)
• Lowest TCO
• Tape ILM target – especially frozen archive
• Long-term retention; access time in minutes
• Access as files via LTFS
• Reduced floor space requirements and energy consumption
• Up to 260 PB native capacity in a single tape library
Spectrum Scale Active Archive to optimize costs
IBM Spectrum Archive managing continuous data growth
Datalake Integration with Hadoop/HDFS
▪ IBM Spectrum Scale allows Hadoop applications to access data on
centralized or local storage
▪ Data can also be accessed through NFS, SMB and POSIX
▪ Spectrum Scale Storage can also be shared with other applications
▪ Hortonworks Data Platform (HDP) fully integrates with
IBM Spectrum Scale
• HDP uses best of breed open source Apache Hadoop components
• Fully tested and supported with centralized management GUI (Ambari)
▪ HDP can leverage the Spectrum Scale tiering function to
facilitate different performance tiers (hot, warm and cold)
▪ HDP supports federation of different data lakes
HDFS RPC
Applications
Higher-level languages:
Hive, BigSQL, JAQL, Pig …
MapReduce API
Hadoop File system APIs
HDFS Client
Spectrum Scale HDFS Connector
Global Name Space
NFS, POSIX, SMB
Spectrum Scale Reference Customer
https://www.spectrumscaleug.org/wp-content/uploads/2019/12/SC19-Spectrum-Scale-Data-Migration-with-AFM-Nuance.pdf
Nuance File Distribution by Size
❑ Spectrum Scale stripes data, metadata and the journal across pooled disks/SSDs in order to achieve scalable performance
❑ File data and metadata are striped across all relevant disks/SSDs
❑ Metadata and potentially small writes are first committed to the FS journal (LOG) in order to be able to recover from "short writes".
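The journal's role in recovering "short writes" can be illustrated with a toy write-ahead log. This is a conceptual sketch of the general write-ahead-logging idea, not Spectrum Scale internals.

```python
# Toy write-ahead journal, illustrating why small writes are first
# committed to the file-system log: after a crash, replaying the log
# recovers writes that never fully reached their data disks.
# Simplified model for illustration, not Spectrum Scale code.

class Journal:
    def __init__(self):
        self.log = []    # durable journal entries (committed intents)
        self.disk = {}   # offset -> bytes actually on the data disks

    def write(self, offset, data):
        self.log.append((offset, data))   # 1. commit the intent to the log

    def flush(self, entry_index):
        offset, data = self.log[entry_index]
        self.disk[offset] = data          # 2. later, apply to the data disks

    def replay(self):
        # After a crash, re-apply every logged write, recovering any
        # "short write" that was logged but not yet flushed to disk.
        for offset, data in self.log:
            self.disk[offset] = data

j = Journal()
j.write(0, b"meta")
j.write(4096, b"small-write")
j.flush(0)              # only the first write reached the data disks
# -- crash happens here; j.disk is missing the second write --
j.replay()              # recovery from the journal restores it
```

The key property: once an entry is in the log, a crash between logging and flushing loses nothing, because replay makes the data disks catch up.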
I/O Software Stack for demanding applications
The software used to provide data model support and to transform I/O to better perform on today’s I/O systems is often referred to as the I/O stack.
Data Model I/O libraries supported by Spectrum Scale
Source http://bit.ly/ATPESC-2019-HDF5
IBM Data Management Platform – “Stay in Control”
Grafana integration with Spectrum Scale
Grafana
▪ talks to backends over REST HTTP API
IBM Spectrum Scale BRIDGE for Grafana
▪ standalone Python application
▪ openTSDB data exchange format
▪ full set of IBM Spectrum Scale supported metrics
▪ communicates with Grafana via port 4242 (the openTSDB default)
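Since the bridge speaks the openTSDB HTTP API on port 4242, the kind of query Grafana issues can be sketched as below. The host and the metric name `gpfs_fs_bytes_read` are hypothetical placeholders, not necessarily actual sensor metric names.

```python
# Sketch of an openTSDB-style HTTP query as Grafana would send it to
# the IBM Spectrum Scale bridge on port 4242. Host and metric name are
# hypothetical examples.
from urllib.parse import urlencode

def build_query_url(host, metric, start="1h-ago", aggregator="sum"):
    """Build an openTSDB /api/query URL for the given metric."""
    params = {"start": start, "m": f"{aggregator}:{metric}"}
    return f"http://{host}:4242/api/query?{urlencode(params)}"

url = build_query_url("scale-gui.example.com", "gpfs_fs_bytes_read")
print(url)
# A real deployment would GET this URL and receive JSON data points in
# the openTSDB exchange format, which Grafana then renders as a panel.
```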
– Supported LDAP servers:https://www.ibm.com/support/knowledgecenter/en/SSEQTP_liberty/com.ibm.websphere.liberty.autogen.base.doc/ae/rwlp_config_ldapRegistry.html
– More than one LDAP possible
– Test LDAP connectivity (authenticate a user)
GUI 1 - External LDAP for users (Release 5.0.3)
GUI 2 - Migrate to External Pool
• Migration to external pool
• Provides best practices
• Exclude snapshots and DMAPI files
• Exclude small files
• Exclude recently accessed files
• Exclude migrated files
Release 5.0.3
GUI 3 - Management of NFS Exports & SMB Shares
▪ Effective management of NFS clients
▪ Support for NFSv4 pseudo paths
▪ Improved performance monitoring of NFS exports and SMB Shares
Release 5.0.3
GUI 4 - Manage Quotas
• Separate views for each quota type:
• User, Group and Fileset Quotas
• Define both capacity and inode quotas for each type
• Defining and enabling default quotas for user, group, and fileset
• Any other possible quota setting (enable/disable, quota scope, grace time)
Release 5.0.3
GUI 5 - Recovery Group Server Nodes (Release 5.0.3)
▪ New page for displaying details of I/O nodes from an ESS perspective
▪ List all physical disks that have an active path to the selected recovery group node
GUI 6 - Declustered Arrays (Release 5.0.3)
▪ Show all nodes, pdisks, vdisks, recovery groups
▪ Characteristics of physical disk hardware (rotating, SSD or NVRAM), product, vendor, FRU, rotational speed
▪ Display capacity details
▪ Hide / show log arrays
▪ Background tasks (disk scrubbing, etc.)
GUI 7 - Edit Sensor Configuration
• Enable/disable Sensor
• Specify data collection interval
• Select the nodes on which to run the sensor
• Default recommendation for interval and nodes provided
Release 5.0.2
Spectrum Scale on AWS
https://aws.amazon.com/de/quickstart/architecture/ibm-spectrum-scale/
Spectrum Scale Container Storage Interface (CSI)
Knowledge Center for Spectrum Scale CSI
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.csi.v5r04.doc/bl1csi_kc_landing.html
OperatorHub (Spectrum Scale CSI Operator)
https://operatorhub.io
Quay Images
https://quay.io/ibm-spectrum-scale/ibm-spectrum-scale-csi-driver:v1.0.0
https://quay.io/ibm-spectrum-scale/ibm-spectrum-scale-csi-operator:v1.0.0
GitHub repositories for contributing to our projects or for building the driver/operator yourself
https://github.com/IBM/ibm-spectrum-scale-csi-driver
https://github.com/IBM/ibm-spectrum-scale-csi-operator
Kubernetes listing of all CSI drivers
https://kubernetes-csi.github.io/docs/drivers.html#production-drivers
IBM Spectrum Scale Container Storage Interface Driver
Global Namespace / Global Data for Engineers
Source: Bosch, Abstatt, Germany
Spectrum Scale Global Data Sharing
— Remote file system mount (cross-cluster mount) allows sharing files between sites synchronously at high speed
— IBM Spectrum Scale Active File Management (AFM) allows sharing files asynchronously between sites
• Files are globally visible and only locally present
when accessed or pre-fetched
• Tolerates reliability and latency of WAN connections
— IBM Aspera can be used for efficient long-distance file transfer
Diagram: three Spectrum Scale clusters, each exposing a Global Name Space via NFS, SMB, POSIX, Swift, S3 and HDFS, connected by remote mount, Active File Management (AFM WAN caching) and file transfer.
Demo: Aspera Async vs Linux Rsync – (Thanks to Nils)
Setup: Aspera async replication from /gpfs/gpfs_src (San Francisco, USA) to /ibm/userfs (Kelsterbach, Germany), across multiple IBM firewalls and the Internet, between two Spectrum Scale clusters.
Fileset size Async Rsync Speedup
256 MB fileset 48 s 4m 8s 5x
512 MB fileset 1m 16s 9m 41s 7x
1024 MB fileset 2m 7s 27m 10s 12x
2048 MB fileset 3m 55s 56m 57s 14x
Comparing rsync with IBM Aspera async
Chart: transfer time in seconds and improvement multiplier vs. fileset size (64M–2048M), with standard deviations for async and rsync.
• rsync transfer times can differ significantly
• async transfer times remain consistent
• Speed improvements increase with transfer size
Aspera sync provides:
• Speed
• Predictability of transfer times
• Ability to move immutable files
Spectrum Scale Cloud data exchange via TCT
— With IBM Spectrum Scale Transparent Cloud Tiering files are
copied to Object Storage (S3)
• Object storage can be on or off-premises
• Mapping of files and objects is included (manifest)
— Objects can be imported as files into another TCT instance (cluster)
• Based on the file-to-object mapping (manifest)
• Import creates stub files; it does not transfer data
• Objects are transferred upon access or a pre-fetch operation
— No global locking, last writer wins
Diagram: one cluster's TCT exports to object storage; TCT instances in other clusters import from it, each behind its own Global Name Space.
Multicloud Spectrum Scale Transparent Cloud Tiering (TCT)
Diagram: a Spectrum Scale cluster with CES protocol nodes, NSD storage nodes and cloud service node classes; cloud clients map the file system and filesets through cloud accounts to containers on two cloud object storage providers, including access from a remote-mounted second Spectrum Scale cluster.
Learn more IBM Aspera & Spectrum Scale
IBM Redpaper describes the integration of IBM Aspera
sync with IBM Spectrum Scale
• Describes Aspera sync tools (async and ascp) and
important command line options
• Describes the integration of Aspera sync with Spectrum
Scale policy engine
• Describes use cases
• Differentiates the solution from Spectrum Scale AFM
https://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/redp5527.html
A hosted service that enables organizations to securely and
reliably move large files and data sets across on-premises
and hybrid cloud environments at unrivaled speed.
On-Premises Data Centers
Private Cloud
IBM Aspera on Cloud
Cloud & Hybrid Multicloud
Spectrum Scale Deployment Strategy 2020 (#1)
Unified Installation and Configuration through reusable Ansible playbooks
Spectrum Scale Deployment Strategy 2020 (#2)
Cluster and Resource configuration that can be stored, versioned and replayed
Spectrum Scale Deployment Strategy 2020 (#3)
Stored Cluster and Infrastructure configuration can be used to replicatevalidated and tested configurations from existing clusters to new ones
Spectrum Scale Cloud: Cloud DevOps Deployment (#1)
Containerized installation and deployment on any Cloud
Spectrum Scale Cloud: Cloud DevOps Deployment (#2)
Containerized installation and deployment on any Cloud
Spectrum Scale Cloud: Cloud DevOps Deployment (#3)
Cloud-agnostic infrastructure provisioning through Terraform
Cloud-agnostic installation and configuration through Ansible
Spectrum Scale Cloud: Cloud DevOps Deployment (#4)
What is the difference between Bosch Green and Bosch Blue?
https://www.coolblue.be/en/advice/bosch-green-vs-bosch-blue.html
ECE = Erasure Code Edition ESS = Elastic Storage System
Hardware Architecture Comparison for Spectrum Scale
ESS
• Twin-tailed disks with dual servers provide very high availability
• However, if both the master and backup servers fail, data becomes unavailable
ECE
• Network RAID across disk-rich commodity servers with internal disks
• Tolerates the concurrent failure of an arbitrary pair of servers (or 3 servers with an 8+3p erasure code) and disks
DIY - "Green" Pro - "Blue"
Erasure Code Edition overview
• Servers have internal JBOD disks
– At least 4 servers required
– Each server must have at least 4 disks
– Plus one NVMe or SSD drive
• Servers are connected with high speed network
• Data blocks are erasure encoded and stored across all servers– Number of servers determines RAID code
• If server fails data remain available
Data Block
Tracks
Spectrum Scale RAID codes
• Two types of RAID with 2-fault and 3-fault tolerant codes – 3 or 4 way replication– 8 + 2P or 8 + 3P, with ECE additionally 4 + 2P or 4 + 3P
2-fault tolerant codes: 3-way replication (1+2) or 4 / 8 + 2p Reed-Solomon
3-fault tolerant codes: 4-way replication (1+3) or 4 / 8 + 3p Reed-Solomon
Replication stores 1 strip (one GPFS block) plus 2 or 3 replicated strips; Reed-Solomon stores 8 strips (one GPFS block) plus 2 or 3 redundancy strips.
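The strip-plus-redundancy idea can be illustrated with a deliberately simplified single-parity code. Spectrum Scale RAID actually uses Reed-Solomon codes (8+2p / 8+3p) that tolerate 2 or 3 concurrent failures; the XOR parity sketched here is an assumption-laden toy that tolerates exactly one lost strip, shown only to make the layout concrete.

```python
# Simplified illustration of erasure-coded strips: 8 data strips plus
# one XOR parity strip. Real Spectrum Scale RAID uses Reed-Solomon
# (8+2p / 8+3p); a single XOR parity, as here, survives only one
# lost strip. Toy example, not the actual code.

def xor_strips(strips):
    """XOR a list of equal-length strips byte by byte."""
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

# Split one "GPFS block" into 8 data strips and compute the parity strip.
block = bytes(range(64))
data_strips = [block[i * 8:(i + 1) * 8] for i in range(8)]
parity = xor_strips(data_strips)

# A server holding strip 3 fails: rebuild it from the survivors,
# since XOR-ing the 7 surviving data strips with the parity strip
# cancels everything except the missing strip.
surviving = data_strips[:3] + data_strips[4:] + [parity]
rebuilt = xor_strips(surviving)
assert rebuilt == data_strips[3]
```

Reed-Solomon generalizes this: with 2 or 3 independent redundancy strips, any 2 or 3 concurrent strip losses can be solved for, which is what lets ECE tolerate an arbitrary pair (or triple) of failed servers.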
Software Architecture / Failover Scenario
Diagram: ESS vs. ECE failover scenario with 4U106 storage enclosures.
The fastest servers in the world are the world's slowest servers if they are waiting for data.
ALL GPUS & CPUS WAIT AT THE SAME SPEED
Composable to grow as needed
• Up to 9 DGX-1 or 3 DGX-2 servers in a rack
• Scale-out storage from a single 300 TB node to yottabytes of data (8 EB in a single file system)
High-Performance to feed the GPUs
• NVMe throughput of 120GB/s in a rack
• >40GB/s sustained random read per 2U
Converged Solution for Data Science Productivity
Extensible for the AI Data Pipeline
• Support for any tiered storage,
including Cloud and Tape
Introducing IBM Spectrum Storage for AI
with NVIDIA DGX Systems
A scalable, software-defined infrastructure powered by IBM Spectrum Scale and NVIDIA DGX-1 and DGX-2 systems. IBM Spectrum Storage for AI with NVIDIA DGX Systems is a powerful engine for your data pipeline.
The workhorse of an AI data infrastructure on which companies can build their shared data service.
IBM Elastic Storage System 3000
The best Storage for Autonomous Driving deploy IBM Spectrum Scale for AI training and re-simulation
Co-hosting NVIDIA CEO, Jensen Huang, in the IBM booth at SC19 in Denver with Sam Werner, Eleanor Lewin, ….
IBM Spectrum Storage for AI with NVIDIA DGX Systems
High-performance, highly available, parallel file system software equipped to address the capacity and
performance requirements of AI/ML/DL
NVIDIA DGX
IBM Elastic Storage System 3000
• New integrated system powered by NVMe flash technology and IBM Spectrum Scale
• Combines Non-Volatile Memory Express (NVMe) performance and reliability with leading software-defined file storage
NVIDIA DGX-1 or DGX-2 systems, storage, networking, and NVIDIA AI
software to support workgroups
IBM Spectrum Storage for AI with NVIDIA DGX-1 Architectures
One DGX-1 system to one ESS 3000 configuration with one Mellanox switch
Four DGX-1 systems in a two ESS 3000 configuration with two Mellanox switches
Nine DGX-1 systems in a three ESS 3000 configuration with two Mellanox switches
Reference Architecture
IBM ESS 3000
IBM Spectrum Storage for AI with NVIDIA DGX-2 Architecture
3 DGX-2 systems + 3 ESS 3000s
Reference Architecture
IBM Spectrum Storage for AI with Nvidia DGX Systems
Reference architecture: https://www.ibm.com/downloads/cas/MNEQGQVP
Solution brief: https://www.ibm.com/downloads/cas/QMPXQV1B
IBM Spectrum Storage for AI with Dell DSS 8440
High-performance, highly available, parallel file system software equipped to address the capacity and
performance requirements of AI/ML/DL
Dell DSS 8440
IBM Elastic Storage System 3000
• New integrated system powered by NVMe flash technology and IBM Spectrum Scale
• Combines Non-Volatile Memory Express (NVMe) performance and reliability with leading software-defined file storage
https://blog.dellemc.com/en-us/dell-emc-dss-8440-dynamic-machine-learning-server/
The DSS 8440 is a system optimized to support AI applications. It is a combination of Dell Technologies and NVIDIA technology: Dell Technologies servers with NVIDIA Tesla V100 Tensor Core GPUs.
IBM Spectrum Storage for AI with NVIDIA DGX Architectures
I/O Performance at Scale
• Grow capacity in a cost-effective, modular approach
• Each config delivers balanced performance, capacity and scale
• IBM Spectrum Scale supports IB RDMA or RoCE Ethernet
I/O Scaling with NVIDIA DGX-1
IBM Spectrum Storage for AI with NVIDIA DGX
Throughput Scaling
• Near-linear scaling by adding 40 GB/s per 2U appliance
• No need for downtime or reconfiguration
• Best-in-class throughput
• Much faster than NFS-based solutions
• Native RDMA verbs support
IBM Spectrum Scale v5 on NVMe Flash
Sequential Read
Random Read
Delivering the random read throughput needed for AI:
• Sustained random read performance of about 40 GB/s in a 2U NVMe array
• Sequential reads take advantage of prefetch for highest peak performance
• As the workload expands, all data access patterns become random
High bandwidth, low latency, and CPU offloads
https://www.ibm.com/downloads/cas/MNEQGQVP
NVIDIA Collective Communications Library (NCCL)
Massively Scale Your Deep Learning Training
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, that are optimized to achieve high bandwidth over high-speed interconnects.
IBM AC922 / NVIDIA
https://developer.nvidia.com/nccl
Various Approaches for Distributed Training using TensorFlow
Reduce training time for deep neural networks by using many GPUs
Horovod offers fast and easy distributed deep learning in TensorFlow. The “data parallel” approach to distributed training involves splitting up the data and training on multiple nodes in parallel.
TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them.
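The data-parallel idea above can be sketched without GPUs or MPI: each worker computes gradients on its own data shard, then an all-reduce averages them so every replica applies the same update. This pure-Python simulation only illustrates the averaging step that NCCL and Horovod perform; the gradient values are made up for the example.

```python
# Pure-Python simulation of the gradient all-reduce used in
# data-parallel training (the operation NCCL/Horovod optimize):
# average the per-worker gradients so every replica applies an
# identical update. Illustrative sketch; values are made up.

def allreduce_average(per_worker_grads):
    """Average the gradient vectors contributed by all workers."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [sum(g[i] for g in per_worker_grads) / n_workers
            for i in range(n_params)]

# Four workers, each with gradients from its own data shard.
grads = [
    [0.1, 0.2],   # worker 0
    [0.3, 0.2],   # worker 1
    [0.1, 0.6],   # worker 2
    [0.1, 0.2],   # worker 3
]
avg = allreduce_average(grads)
print(avg)  # every worker now applies this same averaged gradient
```

In a real cluster the same averaging runs as a ring or tree all-reduce over NVLink/InfiniBand, which is where the high-bandwidth interconnect and the storage feeding the GPUs both matter.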
Spectrum Scale FAQ & Redbooks
Contact scale@us.ibm.com if you need more info.
https://www.ibm.com/support/knowledgecenter/STXKQY/ibmspectrumscale_welcome.html
(Coming soon!)
IBM Spectrum Scale
Performance: remove data-related bottlenecks
• with a parallel, scale-out solution
• 2.5 TB/s demonstrated throughput
Ease of management: enable global collaboration
• with unified storage and a global namespace
• Data lake serving HDFS, files and objects across sites
Economics: optimize cost and performance
• with automated data placement
• thin-provisioning preview and TRIM support, QoS on project preview
Robust: ensure data availability, integrity and security
• with erasure coding, replication, snapshots and encryption
• end-to-end checksum, Spectrum Scale RAID, NIST/FIPS certification
Highly scalable high-performance unified storage for files and objects with integrated analytics