A match made in Userspace heaven: ISA-L, DPDK Framework, & SPDK Virtual Block Devices
Greg Tucker, ISA-L Technical Lead
Prital Shah, ISA-L Product Manager
Agenda
• ISA-L Update
• ISA-L Compression
• DPDK Acceleration Framework
• SPDK VBDEV Crypto POC
• SPDK BDEV Examples
Intelligent Storage Acceleration Library (ISA-L)
• Pure assembly library
• Operating System Agnostic: Linux, Mac, Windows, BSD
• Future Proof & Backward Compatible
• Free and Open Source
• Performance
ISA-L Functions & Integration
Data Integrity
• CRC: 16 bit, 32 bit, 64 bit
Data Protection
• RS-EC encode/decode
• RAID5,6
Crypto (128, 256 bit)
• AES_GCM
• AES_XTS
• AES_CBC
Hashing
• Multi-buffer hash: SHA1, SHA256, SHA512, MD5
• Multi hash: MH_SHA1, MH_SHA256, MH_SHA1 + murmur
• Rolling hash
Compression
• Fast deflate (Level 0: Static Huffman; Level 1,2,3: Fully Dynamic)
• Fast inflate
Integration
• Swift, Ceph, HDFS, DPDK FW, Spark SQL, GATK
Legend: Released; 1H'18
ISA-L Architecture Updates
[Diagram: the ISA-L Functions & Integration map repeated, with "Arch updates" added to the legend.]
ISA-L New Functions
[Diagram: the ISA-L Functions & Integration map repeated, with "New functions" added to the legend and one new entry: T10 DIF + Copy.]
ISA-L Future Improvements
[Diagram: the ISA-L Functions & Integration map repeated, with "Future Work" added to the legend.]
ISA-L Data Compression – Fast Deflate
DEFLATE (aka zlib, gzip, pkzip, etc.)
• Lossless compression
• Ubiquitous adoption
Fast Compression
• Novel, fully zlib-compatible implementation
• Levels 0-3 optimized for throughput
Optimized Decompression
• Any level: fully compatible with zlib and gzip archives
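A minimal sketch of the ratio metric used on the following charts (original size / compressed size). It uses Python's standard zlib module as a stand-in DEFLATE implementation, not ISA-L itself; the sample input and the chosen levels are assumptions for illustration only, and zlib's level numbers do not map onto ISA-L's levels.

```python
import zlib


def compression_ratio(data: bytes, level: int) -> float:
    """Return original/compressed size for one zlib DEFLATE level."""
    compressed = zlib.compress(data, level)
    # DEFLATE is lossless: inflating must return the input unchanged.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical sample input; any compressible byte string will do.
sample = b"storage acceleration with fast deflate. " * 2000

for level in (1, 6, 9):
    print(f"zlib level {level}: ratio {compression_ratio(sample, level):.1f}")
```

Because the slide describes the ISA-L output as fully zlib-compatible, a stream produced by ISA-L should inflate with the same zlib.decompress call; that compatibility claim comes from the slide and is not exercised by this sketch.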
Compression of Silesia corpus
§ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
§ Configurations: see slide 50.
§ For more information go to www.intel.com/benchmarks
[Figure: Compression Speed vs Ratio, Calgary Corpus, Intel® Xeon® Platinum 8180 2.5GHz. X axis: Throughput in MB/s (0 to 350). Y axis: Compression Ratio (original/compressed) (1 to 3.5). Series: isal_2.22.0 -0; brotli_2017-12-12 -0; lz4_1.8.0; snappy_1.1.4; zlib_1.2.11 -1; zstd_1.3.3 -1. Annotations: Faster; Better compression.]
[Figure: Decompression Throughput vs Compression Ratio, Silesia Corpus, Intel® Xeon® Platinum 8180 2.5GHz. X axis: Throughput in MB/s (0 to 600). Y axis: Compression Ratio (original/compressed) (1 to 3.5). Series: zlib_1.2.11 -9; zlib_1.2.11 -1; isal_2.22.0 -1; isal_2.22.0 -0; isal_2.22.0 -z9. Annotations: Faster; Better compression; 2.2 X.]
All results collected by Intel Corporation. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations (http://www.intel.com/performance/resources/benchmark_limitations.htm). For more information go to http://www.intel.com/performance
Storage Workload Acceleration
Software and Hardware-based Approaches and Trade-offs
[Diagram: a spectrum from Software-based Approaches to Fixed-function and Reconfigurable Approaches: ISA-L (SIMD), ASIC (QAT), FPGA, GPGPU. Axes: Application Flexibility; Ease of Programming; Distance from CPU Core; Latency, Granularity. Attach points: On-core; On-chip; QPI-attach; PCIe-attach. Other labels: Fixed-function; Reconfigurable; CPU.]
DPDK Framework
Generic APIs
Application is abstracted from the underlying SW and HW with DPDK
Preserve Platform and Application software investment
Optimized platform software ingredients (e.g. vSwitch) to take advantage of HW and SW ingredients
Flexible, high-performing data plane
Accelerators (Intel® QAT)
DPDK Framework
Application
Smart NIC Integrated FPGA
UPI
Optimized Platform Software
Application Abstracted from Platform
Optimized Software on CPU ISA (e.g. AES, AVX)
DPDK Support for Accelerators
Compression
[Diagram: Application on top of the rte_compressdev APIs (Algorithm Definitions; Operation APIs; Session APIs; Burst APIs; Stats APIs; Capabilities APIs), which sit on the Compression PMDs: QAT, ISA-L, Zlib.]
Similar to cryptodev, a new API (compressdev) will be created to support hardware and software accelerators for compression.
For hardware, this will support Intel® QuickAssist Technology.
For software, this will support compression capabilities that exist within the Intel® Storage Acceleration Library (ISA-L), and also a zlib implementation.
The first version is planned for release 18.05.
SPDK Encrypt BDEV
Work in progress: patch in review
• AES_CBC 128, 3DES
• Multi-buffer AES-NI PMD or QAT
[Diagram labels: SPDK; NVMe; NVMe-oF Initiator; NVMe-oF Target; BDEV; Crypto vbdev; DPDK cryptodev; NVMe BD; NVMeoF BD; NVMe Driver; SSD for Datacenter. Legend: Released; In progress.]
SPDK Virtual BDEV
Perfect place to add storage algorithms
BDEV enables stackable SW
BDEV provides abstraction for storage solutions to be inserted
Storage Services can be:
• Encryption
• Compression
• Dedup
• Analytics
[Diagram labels: SPDK; NVMe; NVMe-oF Initiator; NVMe-oF Target; BDEV; Storage Algo; Storage Services; NVMe BD; NVMeoF BD; NVMe Driver; SSD for Datacenter. Legend: Released; Pathfinding; 3rd Party.]
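The stacking idea on this slide can be sketched as follows. This is a toy model only: the class names (MemoryBdev, CompressVbdev, XorMaskVbdev) and their read/write methods are hypothetical and are not the SPDK bdev API, which is a C interface with asynchronous I/O and fixed-size blocks. The sketch ignores both and only shows how each service wraps the device below it.

```python
import zlib
from typing import Dict


class MemoryBdev:
    """Hypothetical base block device that keeps blocks in a dict."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]


class CompressVbdev:
    """Hypothetical virtual bdev: deflate on write, inflate on read."""

    def __init__(self, base) -> None:
        self.base = base

    def write(self, lba: int, data: bytes) -> None:
        self.base.write(lba, zlib.compress(data, 1))

    def read(self, lba: int) -> bytes:
        return zlib.decompress(self.base.read(lba))


class XorMaskVbdev:
    """Hypothetical virtual bdev standing in for encryption.

    A one-byte XOR mask is NOT real cryptography; it only marks where a
    cipher such as AES-XTS would sit in the stack.
    """

    def __init__(self, base, mask: int = 0x5A) -> None:
        self.base = base
        self.mask = mask

    def _apply(self, data: bytes) -> bytes:
        return bytes(b ^ self.mask for b in data)

    def write(self, lba: int, data: bytes) -> None:
        self.base.write(lba, self._apply(data))

    def read(self, lba: int) -> bytes:
        return self._apply(self.base.read(lba))


# Stack the services: compression over "encryption" over storage.
disk = MemoryBdev()
stack = CompressVbdev(XorMaskVbdev(disk))

payload = b"stackable storage services " * 64
stack.write(0, payload)
assert stack.read(0) == payload   # the layers undo each other on read
assert disk.blocks[0] != payload  # the base device never sees the raw payload
```

Each layer only knows the read/write interface of the layer beneath it, which is the property that lets services be inserted or reordered without touching the application or the driver.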
SPDK Cryptographic Hash BDEV
Employ ISA-L Multi-buffer hashing
[Diagram labels: SPDK; NVMe; NVMe-oF Initiator; NVMe-oF Target; BDEV; Hash; Dedup/Data integrity; NVMe BD; NVMeoF BD; NVMe Driver; SSD for Datacenter. Legend: Released; Proposed; 3rd Party.]
Intel® Xeon® Platinum 8180 Processor @ 2.5 GHz, 1 Socket
ISA-L function             Cycles/Byte (lower is better)   Single-core throughput (higher is better)
CRC 64 bit                 0.20                            12 GB/s
Rolling Hash 64 bit        2.54                            985 MB/s
Multihash SHA-1            0.41                            5.8 GB/s
Multihash SHA-1+Murmur     0.63                            3.8 GB/s
Multihash SHA-256          0.82                            2.9 GB/s
Multibuffer SHA-1          0.45                            5.4 GB/s
Multibuffer SHA-256        0.88                            2.7 GB/s
Multibuffer SHA-512        1.07                            2.2 GB/s
Multibuffer MD5            0.25                            9.7 GB/s
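As an illustration of the dedup use case named on this slide, here is a minimal sketch of fingerprint-based block dedup. It uses Python's hashlib (SHA-256) as a stand-in and hashes one block at a time, whereas ISA-L's multi-buffer hashing processes several buffers in parallel. BLOCK_SIZE, dedup_blocks and rebuild are hypothetical names chosen for the example.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedup_blocks(data: bytes) -> Tuple[Dict[str, bytes], List[str]]:
    """Split data into blocks and store each unique block once.

    Returns (store, recipe): store maps digest -> block contents,
    recipe lists the digests in original block order.
    """
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep the first copy only
        recipe.append(digest)
    return store, recipe


def rebuild(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Reassemble the original data from the store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks and one different block: two unique blocks stored.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, recipe = dedup_blocks(data)
assert len(store) == 2 and len(recipe) == 4
assert rebuild(store, recipe) == data
```

The same digests double as a data-integrity check: recomputing a block's hash on read and comparing it with the stored digest detects corruption.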
SPDK RAID BDEV
Employ ISA-L RAID calculations
[Diagram labels: SPDK; NVMe; NVMe-oF Target; BDEV; RAID; NVMe BD; NVMe Driver; SSD for Datacenter. Legend: Released; Proposed.]
Intel® Xeon® Platinum 8180 Processor @ 2.5 GHz, 1 Socket
             ISA-L (cold tests)                ISA-L (warm tests)
Function     Cycles/Byte    Throughput         Cycles/Byte    Throughput
RAID5 XOR    0.11           23.2 GB/s          0.03           88 GB/s
RAID6 PQ     0.11           22.4 GB/s          0.05           47 GB/s
Cycles/Byte: lower is better. Single-core throughput: higher is better.
[Diagram labels, continued: NVMe Driver; BDEV; NVMe BD; …]
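The RAID5 XOR row measures a parity calculation that can be sketched in a few lines. This is a plain-Python illustration of XOR parity and single-block recovery, not ISA-L's SIMD implementation; xor_parity and the sample blocks are hypothetical, and the RAID6 Q parity (which needs Galois-field arithmetic) is not shown.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte (RAID5-style P parity)."""
    assert len({len(b) for b in blocks}) == 1, "blocks must be the same size"
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe of three data blocks.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(data)

# Lose data[1], then rebuild it from the surviving blocks plus parity.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
```

Recovery works because XOR is its own inverse: XOR-ing the parity with every surviving block cancels them out and leaves the missing one.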
SPDK Erasure code BDEV
Employ ISA-L erasure coding
[Diagram labels: SPDK; NVMe; NVMe-oF Initiator; NVMe-oF Target; BDEV; EC; NVMe BD; NVMeoF BD; NVMe Driver; SSD for Datacenter. Legend: Released; Proposed.]
Intel® Xeon® Platinum 8180 Processor @ 2.5 GHz, 1 Socket
ISA-L function            Cycles/Byte (lower is better)   Single-core throughput (higher is better)
Reed Solomon EC (10+4)    0.19                            12.6 GB/s
SPDK Data Compression Example
Employ ISA-L Fast Deflate
[Diagram labels: SPDK; blobfs; Compress extent; SSD for Datacenter. Legend: Released; Proposed.]
Intel® Xeon® Platinum 8180 Processor @ 2.5 GHz, 1 Socket
ISA-L function   Cycles/Byte (lower is better)   Single-core throughput (higher is better)   Compression ratio (lower is better)
Deflate - 1      7.18                            348 MB/s                                    36.85 %
Inflate - 1      5.27                            475 MB/s                                    36.85 %
[Diagram labels, continued: NVMe Driver; BDEV; NVMe BD; blobstore; Application; Compress.]
SPDK Encrypt BDEV
Employ ISA-L Block Ciphers
[Diagram labels: SPDK; NVMe; NVMe-oF Initiator; NVMe-oF Target; BDEV; Block Cipher; Key Manage; NVMe BD; NVMeoF BD; NVMe Driver; SSD for Datacenter. Legend: Released; Proposed.]
Intel® Xeon® Platinum 8180 Processor @ 2.5 GHz, 1 Socket
ISA-L function        Cycles/Byte (lower is better)   Single-core throughput (higher is better)
AES-XTS 128           0.69                            3.5 GB/s
AES-XTS 256           0.92                            2.6 GB/s
AES-CBC 128 Decode    0.66                            3.6 GB/s
AES-CBC 192 Decode    0.77                            3.1 GB/s
AES-CBC 256 Decode    0.89                            2.7 GB/s
AES-GCM 128           0.69                            3.5 GB/s
AES-GCM 256           0.90                            2.7 GB/s
Summary
Don’t be afraid to touch data!
• ISA-L has the throughput to hit many storage use cases.
• Data compression is not a huge obstacle: there are options with SW or HW acceleration.
• VBDEVs are a great place to put storage algorithms.
For more info on ISA-L see https://github.com/01org/isa-l/wiki
Backup
Intel® ISA-L Performance Overview
Units of Measurement
• Cycles/Byte
• Throughput (MB/s, GB/s)
• Calgary Corpus Weighted Ave
• Compression Ratio
CPU Gen over Gen Performance
• Intel® Xeon® processor generation over generation performance metrics
Functional Library Comparisons (performance vs. other libraries available)
• ISA-L 2.22
• OpenSSL 1.0.2n
• zlib 1.2.11
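The two units on this slide are related through the core clock: throughput is roughly clock frequency divided by cycles per byte. A small sketch of that conversion follows; the function names are hypothetical, and the example uses the 2.5 GHz clock and the Deflate figure of 7.18 cycles/byte quoted in this deck. Other rows in the deck may not reproduce exactly, presumably because of rounding and differing MB/GB conventions.

```python
def throughput_bytes_per_s(cycles_per_byte: float, core_hz: float) -> float:
    """Single-core throughput implied by a cycles/byte figure."""
    return core_hz / cycles_per_byte


def cycles_per_byte(throughput_bytes_s: float, core_hz: float) -> float:
    """Cycles/byte implied by a measured single-core throughput."""
    return core_hz / throughput_bytes_s


CORE_HZ = 2.5e9  # Intel Xeon Platinum 8180 nominal clock, Turbo disabled

deflate_mb_s = throughput_bytes_per_s(7.18, CORE_HZ) / 1e6
print(f"Deflate at 7.18 cycles/byte: about {deflate_mb_s:.0f} MB/s")
```

With decimal megabytes this gives about 348 MB/s, in line with the Deflate row of the data compression example.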
Intel® ISA-L Performance Overview
Platform configuration details
BIOS Configuration
P-States: Disabled
Turbo: Disabled
Speed Step: Disabled
C-States: Disabled
Power Performance Tuning: Disabled
ENERGY_PERF_BIAS_CFG: PERF
Isochronous: Disabled
Memory Power Savings: Disabled
Intel® Atom™ Processor C3000
C3958, 16C, 2.0 GHz, C0
Ostrich Bay CRB
2x16 GB DDR4 2400 MT/s ECC RDIMM
Intel® Xeon® Processor Scalable Family
Platinum 8180 Processor, 28C, 2.5 GHz, H0
Neon City CRB
6x16 GB DDR4 2666 MT/s ECC RDIMM
Intel® Xeon® Processor E5-2600v4
E5-2650v4, 12C, 2.2 GHz, M0
Aztec City CRB
4x8 GB DDR4 2400 MT/s ECC RDIMM
Intel® Xeon® Processor D-1500
D-1541, 8C, 2.1 GHz, V2
Camelback Mountain CRB
2x8 GB DDR4 2400 MT/s ECC RDIMM
Intel® Xeon® Processor D-2183IT
D-2183IT, 16C, 2.2 GHz, M1
Yuba City CRB
4x8 GB DDR4 2400 MT/s ECC RDIMM
Intel® Atom™ Processor C2000
C2750, 8C, 2.4 GHz, B0
Mohon Peak CRB
2x8 GB DDR3 1600 MT/s ECC RDIMM
Intel® ISA-L Performance Tests
Loop to Reduce Timer Latencies and Transients
• Start timer
• Iterate over data set
• Stop timer
• Report total bytes processed/time
Turbo Off for Repeatability
Cold Cache Tests
• Pick large data set by default (larger than last-level cache)
• Ensures memory fetch/put included
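The timing loop described on this slide can be sketched as below. This is an illustration of the method only, not the ISA-L test harness: measure_throughput is a hypothetical helper, zlib.crc32 is a stand-in workload, and the 64 MiB buffer is an assumption meant to exceed a typical last-level cache. Interpreter overhead means the number printed is not comparable with the figures in this deck.

```python
import time
import zlib


def measure_throughput(func, data: bytes, iterations: int = 5) -> float:
    """Return bytes processed per second over several passes of func(data)."""
    start = time.perf_counter()              # start timer
    for _ in range(iterations):              # iterate over data set
        func(data)
    elapsed = time.perf_counter() - start    # stop timer
    return len(data) * iterations / elapsed  # total bytes processed / time


# Assumed cold-cache setup: a buffer larger than the last-level cache.
buf = bytes(64 * 1024 * 1024)
rate = measure_throughput(zlib.crc32, buf)
print(f"CRC-32 stand-in workload: {rate / 1e6:.0f} MB/s")
```

Timing several passes in one interval, rather than each pass separately, spreads the cost of reading the timer over the whole run, which is the point of the loop on the slide.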
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
Some results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
The benchmark results may need to be revised as additional testing is conducted. The results depend on the specific platform configurations and workloads utilized in the testing, and may not be applicable to any particular user's components, computer system or workloads. The results are not necessarily representative of other benchmarks and other benchmark results may show greater or lesser impact from mitigations.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel based product, combined with a number of situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction.
© 2018 Intel Corporation. Intel, the Intel logo, Xeon and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Notices and Disclaimers