STORAGE and PERFORMANCE
Darren Williams, Technical Director, EMEA & APAC
3 TB SQL – 17k IOPS
and Batch – 20k IOPS
and OLTP – 10k IOPS
and… VDI, HPC, Analytics, OLTP, Database
THE PROBLEM WITH PERFORMANCE
Storage decisions must accelerate workloads, accelerate productivity, and decrease costs.
Workload demand vs. resources: 11k IOPS at 0% write, 13k IOPS at 25% write, 17k IOPS at 80% write.
A “More Assets” Problem
Meeting that demand for a 3 TB workload with spinning disk means ever more assets: 60 drives, then 72 drives, then 96 drives, or more discs, more cache, more arrays. Space, energy, and personnel costs climb while speed, productivity, and total costs suffer.
A Demand Solution
Match resources to the workload instead: 12 TB of solid-state storage scales with demand (batch, video, and more) while reducing total costs.
SINCE 1956, HDDS HAVE DEFINED APPLICATION PERFORMANCE

Speed
• 10s of MB/s data transfer rates
• 100s of write/read operations per second
• Latency measured in milliseconds (10⁻³ s)

Design
• Motors
• Spindles
• High energy consumption
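Those mechanical limits are easy to put numbers on. A back-of-the-envelope sketch in Python, with assumed drive parameters (the deck gives only orders of magnitude):

# A random I/O on an HDD costs one average seek plus half a rotation.
avg_seek_ms = 8.5                        # assumed typical 7,200 RPM SATA drive
rpm = 7200
half_rotation_ms = 0.5 * 60_000 / rpm    # ~4.17 ms average rotational latency

service_time_ms = avg_seek_ms + half_rotation_ms
iops = 1000 / service_time_ms

print(f"service time ≈ {service_time_ms:.1f} ms -> ≈ {iops:.0f} IOPS per drive")
# service time ≈ 12.7 ms -> ≈ 79 IOPS per drive
# Hence "100s of operations per second" takes short-stroking or many spindles.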
FLASH ENABLES APPLICATIONS TO WRITE FASTER

Speed
• 100s of MB/s data transfer rates
• 1000s of write or read operations per second
• Latency measured in microseconds (10⁻⁶ s)

Design
• Silicon
• MLC/SLC NAND
• Low energy consumption
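With no seek and no rotation, latency is set by silicon, and arrays multiply throughput through parallelism. A rough sketch with assumed figures (not vendor specs):

# A single NAND die programs a page in roughly 200-300 µs, but arrays overlap
# operations across many dies and channels, so IOPS scale with parallelism.
page_program_us = 250          # assumed typical MLC page program time
dies = 64                      # assumed number of parallel dies in a small array

per_die_iops = 1_000_000 / page_program_us     # ≈ 4,000
aggregate_iops = per_die_iops * dies           # ≈ 256,000

print(f"{per_die_iops:.0f} IOPS/die x {dies} dies ≈ {aggregate_iops:,.0f} IOPS")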
USE OF FLASH – HOST SIDE – PCIe / FLASH DRIVE DAS

• PCIe
– Very fast and low latency
– Expensive per GB
– No redundancy
– CPU/memory stolen from the host
• Flash SATA/SAS
– More cost effective
– Can't get more than 2 drives per blade
– Unmanaged, can have performance/endurance issues
USE OF FLASH – ARRAY-BASED CACHE / TIERING

• Array flash cache
– Typically read-only
– PVS already caches most reads
– Effectiveness limited by a storage array designed for hard disks
• Automated storage tiering
– “Promotes” hot blocks into the flash tier
– Only effective for reads
– Cache misses still result in “media” reads
USE OF FLASH – FLASH IN THE TRADITIONAL ARRAY

• Flash in a traditional array
– Typically uses SLC or eMLC media
– High cost per GB
– Array is not designed for flash media
– Unmanaged, will result in poor random write performance
– Unmanaged, will result in poor endurance
USE OF FLASH – FLASH IN THE ALL-FLASH ARRAY

• Optimized to sustain high write and read throughput
• High bandwidth and IOPS; low latency
• Multi-protocol
• Per-LUN tunable performance
• Software designed to enhance lower-cost MLC NAND flash by optimizing high write throughput while substantially reducing wear
• RAID protection and replication
RACERUNNER OS
NAND FLASH FUNDAMENTALS: HDD WRITE PROCESS REVIEW

An HDD stores data in 4K data blocks, and any block can be rewritten in place. A physical HDD is a bit-addressable medium with virtually limitless write and rewrite capability.
STANDARD NAND FLASH ARRAY WRITE I/O

Path: host → fabric (iSCSI / FC / SRP) → HBAs → unified transport → RAID → NAND flash ×8.

1. Write request from host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to RAID.
3. Request is written to media.
NAND FLASH FUNDAMENTALS: FLASH WRITE PROCESS

Flash cannot rewrite in place. To change any data within a 2MB NAND page:
1. NAND page contents are read into a buffer.
2. The NAND page is erased (aka “flashed”).
3. The buffer is written back with the previous data and any changed or new blocks, including zeroes.
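A minimal sketch of that read-erase-program cycle. The sizes come from the slide; the function and buffer handling are illustrative, not WHIPTAIL internals:

# To change even one 4K block, the whole NAND page is read out, erased,
# and reprogrammed.
PAGE_SIZE = 2 * 1024 * 1024        # 2MB NAND page, as on the slide
BLOCK_SIZE = 4 * 1024              # typical host write

def rewrite_block(page: bytearray, offset: int, new_block: bytes) -> bytearray:
    # 1. Page contents are read to a buffer.
    buffer = bytearray(page)
    # 2. Page is erased ("flashed"); all bits reset. This is the slow,
    #    non-deterministic step (2-6 ms per the deck).
    page[:] = b"\xff" * PAGE_SIZE
    # 3. Buffer is written back with previous data plus the changed block,
    #    including zeroes: the full 2MB is programmed for a 4KB change.
    buffer[offset:offset + len(new_block)] = new_block
    page[:] = buffer
    return page

page = bytearray(PAGE_SIZE)
rewrite_block(page, 0, b"\x42" * BLOCK_SIZE)   # one 4KB write costs a 2MB cycle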
UNDERSTANDING ENDURANCE / RANDOM WRITE PERFORMANCE

• Endurance
– Each cell has physical limits (dielectric breakdown): 2K–5K program/erase cycles
– Time to erase a block is non-deterministic (2–6 ms)
– Program time is fairly static, based on geometry
– Failure to control write amplification *will* cause wear-out in a short amount of time
– Desktop workloads are among the worst for write amplification; most writes are 4–8KB
• Random write performance
– Write amplification not only causes wear-out, it also creates unnecessary delays in small random write workloads
– What is the point of higher-cost flash storage with latency between 2–5 ms?
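The arithmetic behind that warning, as a sketch. The page size, write size, and P/E range are the deck's; the device capacity and workload are assumptions:

# An unmanaged 4KB random write that triggers a full 2MB page cycle
# amplifies the write ~512x.
page_bytes = 2 * 1024 * 1024
host_write_bytes = 4 * 1024
waf = page_bytes / host_write_bytes       # 512x write amplification

pe_cycles = 3000          # within the deck's 2K-5K P/E range
capacity_tb = 1           # assumed device size
daily_host_writes_tb = 1  # assumed workload: one drive-write per day

media_writes_tb_per_day = daily_host_writes_tb * waf
days_to_wearout = (capacity_tb * pe_cycles) / media_writes_tb_per_day
print(f"WAF = {waf:.0f}x -> wear-out in ~{days_to_wearout:.0f} days "
      f"instead of ~{pe_cycles / daily_host_writes_tb:,.0f}")
# WAF = 512x -> wear-out in ~6 days instead of ~3,000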
RACERUNNER OS: DESIGN AND OPERATION

Path: host → fabric (iSCSI / FC / SRP) → HBAs → unified transport → RaceRunner Block Translation Layer (alignment | linearization) → data integrity layer → enhanced RAID → NAND SSD ×8.

1. Write request from host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to the BTL.
3. Incoming blocks are aligned to the native NAND page size.
4. Request is written to media.
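A toy sketch of what “alignment | linearization” could look like: small random host writes are buffered into full, page-aligned stripes, so the media only ever sees whole-page programs. All names and structures here are hypothetical, not RaceRunner internals:

PAGE = 2 * 1024 * 1024
BLK = 4 * 1024
BLOCKS_PER_PAGE = PAGE // BLK

class BlockTranslationLayer:
    def __init__(self):
        self.pending = []        # (logical block address, data) awaiting a full page
        self.mapping = {}        # logical block -> (physical page, slot)
        self.next_page = 0

    def write(self, lba: int, data: bytes):
        self.pending.append((lba, data))
        if len(self.pending) == BLOCKS_PER_PAGE:
            self.flush()

    def flush(self):
        # One sequential full-page program replaces hundreds of
        # read-erase-program cycles; no page is ever partially rewritten.
        for slot, (lba, _) in enumerate(self.pending):
            self.mapping[lba] = (self.next_page, slot)
        self.next_page += 1      # media write of the assembled page goes here
        self.pending.clear()

btl = BlockTranslationLayer()
for i in range(BLOCKS_PER_PAGE):             # 512 random 4KB writes...
    btl.write(i * 7919 % 100000, b"x" * BLK)
print(btl.next_page, "page program(s)")      # ...cost exactly 1 page program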
THE DATA WAITING DAYS ARE OVER

Scalability path:
• ACCELA: 1.5TB–12TB, 250,000 IOPS, 1.9 GB/s bandwidth
• INVICTA: 2–6 nodes, 6TB–72TB, 650,000 IOPS, 7 GB/s bandwidth
• INVICTA – INFINITY (Q1/13): 7–30 nodes, 21TB–360TB, 800,000–4 million IOPS, 40 GB/s bandwidth
THE DATA WAITING DAYS ARE OVER

            | ACCELA          | INVICTA        | INVICTA INFINITY
Height      | 2U              | 6U–14U         | 16U–64U
Capacity    | 1.5TB–12TB      | 6TB–72TB       | 21TB–360TB
IOPS        | Up to 250K      | 250K–650K      | 800K–4M
Bandwidth   | Up to 1.9GB/sec | Up to 7GB/sec  | Up to 40GB/sec
Latency     | 120µs           | 220µs          | 250µs

Interfaces: 2/4/8 Gbit/sec FC, 1/10 GbE, InfiniBand
Protocols: FC, iSCSI, NFS, QDR

Features:
• ACCELA: RAID protection & hot sparing, async replication, VAAI, write protection buffer
• INVICTA / INVICTA INFINITY: RAID protection & hot sparing, LUN mirroring & LUN striping, async replication, VAAI, write protection buffer

Options:
• ACCELA: vCenter plugin / INVICTA node kit
• INVICTA: vCenter plugin / INFINITY switch kit
• INVICTA INFINITY: vCenter plugin
MULTI-WORKLOAD REFERENCE ARCHITECTURE

Workload engines: 8 servers driving one INVICTA array (350,000 IOPS, 3.5 GB/s, 18 TB).

Workload type | Workload demand
Dell DVD Store, MS SQL Server: 1,200 transactions per second (continuous) | 4,000 IOPS, .05 GB/s
VMware View: 600-desktop boot storm (2:30) | 109,000 IOPS, .153 GB/s
SQLIO, MS SQL Server: heavy OLTP simulation, 100% 4K writes (continuous) | 86,000 IOPS, .350 GB/s
Batch report simulation: 100% 64K reads (continuous) | 16,000 IOPS, 1 GB/s
Total | 215,000 IOPS, 1.553 GB/s

RAID 5 HDD equivalent = 3,800 drives; RAID 10 HDD equivalent = 2,000 drives.

In 2012, Mercury traveled to Barcelona, New York, San Francisco, Santa Clara, and Seattle, demonstrating the ability to accelerate multiple workloads on solid-state storage.
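Those drive equivalents follow from standard RAID sizing arithmetic. A sketch with assumed per-drive IOPS and an assumed read/write split (the deck states only the totals and the answers), so the results land near but not exactly on its figures:

total_iops = 215_000
write_iops = 86_000            # the SQLIO engine is 100% writes (assumed split)
read_iops = total_iops - write_iops
per_drive_iops = 125           # assumed short-stroked 15K RPM drive

for raid, penalty in (("RAID 5", 4), ("RAID 10", 2)):
    backend = read_iops + penalty * write_iops   # each host write costs extra disk I/Os
    drives = backend / per_drive_iops
    print(f"{raid}: {backend:,} backend IOPS -> ~{drives:,.0f} drives")
# RAID 5:  473,000 backend IOPS -> ~3,784 drives  (deck: 3,800)
# RAID 10: 301,000 backend IOPS -> ~2,408 drives  (deck: 2,000; its split and
# per-drive assumptions evidently differ slightly from the ones used here)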
FASTER DATABASE BENCHMARKING

AMD's systems engineering department needed to bring various database workloads up quickly and efficiently in the Opteron Lab and to eliminate the time spent performance-tuning disk-based storage systems. It replaced 480 short-stroked hard disk drives with one 6 TB WHIPTAIL array supporting multiple storage protocols.

• $13,000 power cost reduction; 35U reduced to 2U
• 50x reduction in latency
• 40% improvement in database load times
• Engineering team improved workload cycle times
WHAT WHIPTAIL CAN OFFER

Highly experienced: 250+ customers since 2009 across VDI, database, analytics, and more. Best-in-class performance at the most competitive price.

IOPS ……………… 250K – 4M
Throughput ….. 1.9 GB/s – 40 GB/s
Latency ………… 120 µs
Power ………….. 90% less
Floor space …… 90% less
Cooling ………… 90% less
Endurance ……. 7.5 years guaranteed
Making decisions faster …. POA
Q&A
Email: [email protected]