Talk by Peter Zaitsev at HighLoad++ 2014.
Peter Zaitsev, CEO, Percona
November 1, 2014
Highload++ 2014
Moscow, Russia
SSD/Flash for Modern Databases
www.percona.com
Percona
We love Open Source Software
• Percona Server
• Percona XtraBackup
• Percona XtraDB Cluster
• Percona Toolkit
We want to help you to succeed with MySQL and Beyond
• Consulting
• Support
• Managed Services
In this Presentation
Flash technology overview
Review some of the available technology
What does this mean for databases?
Specific opportunities for MySQL
Before SSDs
There were HDDs
Good at Sequential Reads/Writes
Response Time (RT) = Seek Time + Rotational Latency
Reads/Writes: Similar Latency
No Specific Write Limits
Retain data for a long time
One IO Request at a Time (no parallelism)
Low cost per GB
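To put numbers on the response-time formula from this slide, here is a minimal sketch. The drive parameters (15,000 RPM, 3 ms average seek) are assumed example values, not figures from the talk, and transfer time is ignored.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def hdd_response_time_ms(seek_ms: float, rpm: float) -> float:
    """RT = seek time + rotational latency (transfer time ignored)."""
    return seek_ms + rotational_latency_ms(rpm)


# Assumed example drive: 15,000 RPM with a 3 ms average seek.
rt_ms = hdd_response_time_ms(seek_ms=3.0, rpm=15_000)

# An HDD serves one request at a time, so random IO/s is roughly 1 / RT.
random_iops = 1000 / rt_ms

print(f"response time: {rt_ms:.1f} ms, about {random_iops:.0f} random IO/s")
```

With these assumed values a single drive tops out at a few hundred random IOs per second, which is why combining many HDDs was the usual way to scale.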
RAID and SAN
Using Many HDDs together
Caching Reads
Buffering Writes (Writeback Cache)
Better Sequential Read/Write speed
Better throughput at high concurrency
Higher IO latencies for uncached IO
Flash Revolution
Use Flash chips instead of platters
No moving parts
No seeks
NAND Flash
Cell
Page/Read Block
Erase Block
Write but no overwrite
Wears with writes (erases)
Writing to the Flash
• Erase: set all bits to 1 (“1111111…”)
• Write: set some of the bits to 0 (“0100111..”)
• Change zero to one: impossible. Do Erase, then Write
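The erase/write rules on this slide can be illustrated with a toy model. This is only a sketch: the `EraseBlock` class, its sizes, and its method names are hypothetical and far simpler than a real NAND device.

```python
class EraseBlock:
    """Toy model of one NAND erase block (sizes are illustrative only)."""

    def __init__(self, pages: int = 4, bits_per_page: int = 8):
        self.mask = (1 << bits_per_page) - 1
        self.pages = [self.mask] * pages  # erased state: every bit is 1

    def erase(self) -> None:
        """Erase works on the whole block and sets every bit back to 1."""
        self.pages = [self.mask] * len(self.pages)

    def program(self, page: int, value: int) -> None:
        """Programming can only clear bits (1 -> 0), never set them."""
        current = self.pages[page]
        if value & ~current & self.mask:
            raise ValueError("cannot change 0 to 1 without erasing the block")
        self.pages[page] = value & self.mask


block = EraseBlock()
block.program(0, 0b01001110)      # fine: only clears bits
try:
    block.program(0, 0b11111111)  # would need 0 -> 1, so it is rejected
    overwrite_ok = True
except ValueError:
    overwrite_ok = False
block.erase()                     # erase the whole block first...
block.program(0, 0b11111111)      # ...then the write succeeds
print("in-place overwrite allowed:", overwrite_ok)
```

Because an erase affects every page in the block, a real controller writes updated data to a fresh page elsewhere and reclaims old blocks later, which is where garbage collection and wear leveling come in.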
Types of NAND Flash
From AnandTech:
Flash Storage Design
Cache
Battery/Super Capacitor
Controller + Complex Firmware
Built-in Parallelism
Flash Controller Tasks
Write wear leveling
Garbage collection
Error correction
Bad block mapping
Read scrubbing
Read disturb management
Encryption
Flash Properties
Lots of IOs per device! (100K+)
Less random IO penalty
Writes more expensive than reads (but can be faster)
Lifetime limited by amount of writes
Limited data retention
Concurrent execution on single device
Fast write acknowledgement (safe or not)
Can burst writes
Flash Interface Designs
DIMM
PCI-E
SFF-8639
SATA/SAS
FC and Network
Transitioning
AHCI → NVMe
AHCI vs NVMe
• Source: AnandTech.com
Sandisk ULLtraDIMM
HGST Virident
Sandisk FusionIO
Intel P3700
Intel 730 (SATA)
mSATA
M.2 Interface
Violin Memory
“Consumer” vs “Enterprise”
Performance
Endurance
Durability
Retention
Encryption
Not your HDD
All HDDs are the same; All SSDs are different
Evaluation
Performance changes over time
Empty Space Matters
Complex internals
Watch stability carefully
How Flash Fails
End of life (EOL) is clearly defined by the amount written (but devices can often handle a lot more)
One day… it’s gone
“Power Loss Protection”
Internal ECC and redundancy
To RAID or not to RAID?
More valuable for consumer grade
Watch for good Flash support
RAID controller logic may slow things down
Use a redundant array of inexpensive servers instead?
Redundancy
Device internal redundancy
Hardware RAID
Software RAID
Filesystem “RAID”
OS Support
Flash support is actively being improved
TRIM
Sparse Files
Flash And Databases
Database History
Most were designed in the HDD era
Optimize for sequential IO
Count on cheap sequential writes
Rely on RAID and BBU (battery backup unit) to improve performance
It’s time for Flash
Your OLTP Database should live on Flash
But What Flash?
Pick a flash type that is right for your application
IO vs Memory
Warmup
Much faster warmup times
Even if the database fits in memory, SSD might be justified
Tolerate more IO bound load
HDD
• 5ms
• Can do 20 IOs within a 100ms response time (non parallel)
Flash
• 0.1ms
• Can do 1000 IOs within a 100ms response time (non parallel)
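The arithmetic behind this slide is a simple latency budget: how many IOs fit into a 100 ms response time when each one has to finish before the next starts. A minimal sketch using the slide's latencies (the function and variable names are illustrative):

```python
def serial_ios_in_budget(budget_us: int, io_latency_us: int) -> int:
    """How many back-to-back (non-parallel) IOs fit into the time budget."""
    return budget_us // io_latency_us


BUDGET_US = 100_000  # 100 ms response-time target
LATENCY_US = {"HDD": 5_000, "Flash": 100}  # 5 ms and 0.1 ms per IO

ios = {name: serial_ios_in_budget(BUDGET_US, lat) for name, lat in LATENCY_US.items()}
for name, count in ios.items():
    print(f"{name}: {count} serial IOs within 100 ms")
```

A query that needs a few hundred uncached page reads blows the budget on an HDD but fits comfortably on flash, which is the sense in which flash tolerates a more IO-bound load.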
Endurance
Might be a top consideration
Endurance Math
HGST FlashMax III 2200GB
• 4400GB/day over 5 Years
• 1400MB/sec peak writes
• 66 days at peak write throughput
Crucial M500 960GB
• 72TB total lifetime writes
• 400MB/sec write
• 52 hours at peak write throughput
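The endurance figures on this slide can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide. Which unit convention the slide used is an assumption: decimal units reproduce the 66 days, while the 52 hours is consistent with 1024-based units (decimal units give about 50 hours).

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def time_at_peak_s(total_written_mb: float, peak_mb_per_s: float) -> float:
    """Seconds of nonstop peak-rate writing needed to use up rated endurance."""
    return total_written_mb / peak_mb_per_s


# HGST FlashMax III 2200GB: 4400 GB/day for 5 years, 1400 MB/s peak writes.
hgst_total_gb = 4400 * 365 * 5
hgst_days = time_at_peak_s(hgst_total_gb * 1000, 1400) / SECONDS_PER_DAY
hgst_drive_writes_per_day = 4400 / 2200

# Crucial M500 960GB: 72 TB lifetime writes, 400 MB/s writes.
crucial_hours_decimal = time_at_peak_s(72 * 1000 * 1000, 400) / SECONDS_PER_HOUR
crucial_hours_binary = time_at_peak_s(72 * 1024 * 1024, 400) / SECONDS_PER_HOUR

print(f"HGST: {hgst_total_gb / 1000:.0f} TB rated, {hgst_days:.1f} days at peak")
print(f"Crucial: {crucial_hours_decimal:.1f} h (decimal), "
      f"{crucial_hours_binary:.1f} h (binary) at peak")
```

Either way the gap is about two orders of magnitude in total rated writes, which is why endurance can be a top consideration when choosing a device for a write-heavy database.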
Databases and Flash
How do we optimize databases to use Flash best?
“Torn Page” problem
Flash can avoid this at little cost due to its internal design
FusionIO NVMFS (Atomic Writes)
Copy-on-Write File Systems
• ZFS
• BTRFS
Filesystem-level data journaling is less preferred
• data=journal for EXT4
skip-innodb-doublewrite
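A torn page is a database page that was only partly written when power was lost, so it mixes new and old content. The sketch below shows how a page checksum exposes such a page. The page layout, the 4 KB atomic write unit, and the helper names are assumptions for illustration, and `zlib.crc32` stands in for whatever checksum the database really uses.

```python
import zlib

PAGE_SIZE = 16 * 1024  # InnoDB's default page size
SECTOR = 4 * 1024      # assumed unit that the device writes atomically


def make_page(payload: bytes) -> bytes:
    """Hypothetical layout: body padded to the page size, CRC in the last 4 bytes."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_intact(page: bytes) -> bool:
    """Recompute the checksum over the body and compare with the stored value."""
    body, stored = page[:-4], int.from_bytes(page[-4:], "big")
    return zlib.crc32(body) == stored


old_page = make_page(b"old row data " * 1200)
new_page = make_page(b"new row data " * 1200)

# Power is lost after only the first sector of the new page reached the media.
torn_page = new_page[:SECTOR] + old_page[SECTOR:]

print("old intact:", is_intact(old_page))
print("new intact:", is_intact(new_page))
print("torn intact:", is_intact(torn_page))
```

The checksum only detects the damage. To repair it the database needs an intact copy of the page, which is what the doublewrite buffer provides. Atomic writes or a copy-on-write file system make that second copy unnecessary.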
Fast IO Path
Bypass OS Caching with O_DIRECT
Native Asynchronous IO
Efficient Checksumming
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
IO Cost Accounting
Sequential vs Random IO balance
IO vs CPU Balance
Smaller page sizes might make sense
• innodb_page_size=4K
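One way to see why a smaller page might help: every change flushes at least one whole page, so the bytes written per logical change scale with the page size. A rough sketch follows. The 200-byte change size is an assumed example, and the estimate ignores the redo log, the doublewrite buffer, and the flash device's own internal write amplification.

```python
KIB = 1024


def write_amplification(page_size: int, changed_bytes: int) -> float:
    """Bytes flushed per byte changed, if each change dirties one page."""
    return page_size / changed_bytes


CHANGED_BYTES = 200  # hypothetical size of one row update

amp = {
    size_kib: write_amplification(size_kib * KIB, CHANGED_BYTES)
    for size_kib in (16, 4)  # InnoDB default page size vs. the 4K suggested here
}
for size_kib, factor in amp.items():
    print(f"{size_kib}K pages: about {factor:.0f}x bytes flushed per byte changed")
```

The trade-off is more pages to track and more CPU work per byte of data, so the IO vs CPU balance has to be checked against the actual workload.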
Less Pre-fetching
Most pre-fetched data must be used
Often best to try it out
Less merging on flushing
Do not assume that flushing several neighboring dirty pages costs the same as flushing one
innodb_flush_neighbors=0
Less Space on Disk
InnoDB Compression (2x typical)
TokuDB Compression (5-10x typical)
Archiving data off OLTP System
Fewer Writes to Flash
Hybrid Flash/HDD System
Transaction logs and other logs on HDD with RAID and BBU
Small Temporary objects on tmpfs
innodb_log_file_size=<LARGE>
Logs on RAID can be fast
Single Intel 730 Sysbench
IOPS
Is Flash Too Fast?
• Multiple instances might scale better
Other Thoughts
Host hardware and OS matter, especially with high-end flash
Virtualization has higher relative overhead
Network has higher relative overhead
Peter Zaitsev
@PeterZaitsev
https://www.linkedin.com/in/peterzaitsev
Thank You!