Solid State Storage Deep Dive

And How It Affects SQL Server

Page 1: Solid State Storage Deep Dive

And How It Affects SQL Server

Page 2: Solid State Storage Deep Dive

Today’s Topic Covers…
• NAND Flash Structure
• MLC and SLC Compared
• NAND Flash Read Properties
• NAND Flash Write Properties
• Wear-Leveling
• Garbage Collection
• Write Amplification
• TRIM
• Error Detection and Correction
• Reliability
• Form Factor
• Performance Characteristics
• Determining What’s Right for You
• Not All SSDs Are Created Equal

Page 3: Solid State Storage Deep Dive

• Two main flavors: NAND and NOR.
• NOR
  – Operates like RAM.
  – NOR is parallel at the cell level.
  – NOR reads slightly faster than NAND.
  – Can execute directly from NOR without a copy to RAM.
• NAND
  – NAND operates like a block device, a.k.a. a hard disk.
  – NAND is serial at the cell level.
  – NAND writes significantly faster than NOR.
  – NAND erases much faster than NOR: 4 ms vs. 5 s.

Page 4: Solid State Storage Deep Dive

• Serial array of transistors.
  – Each transistor holds 1 bit (or more).
• Arrays grouped into pages.
  – 4096 bytes in size.
  – Contains “spare” area for ECC and other ops.
• Pages grouped into blocks.
  – 64 to 128 pages.
  – Smallest erasable unit.
• Blocks grouped into chips.
  – As big as 16 gigabytes.
• Chips grouped onto devices.
  – Usually in a parallel arrangement.
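
The hierarchy above turns into simple arithmetic. This is a minimal sketch, assuming the 4096-byte page and the 128-pages-per-block upper bound from the slide. The blocks-per-chip count is a hypothetical value chosen only to show the calculation, and the spare area is ignored.

```python
PAGE_BYTES = 4096        # page size from the slide (spare area not counted)
PAGES_PER_BLOCK = 128    # upper end of the 64 to 128 range on the slide
BLOCKS_PER_CHIP = 8192   # hypothetical value, for illustration only


def block_bytes(page_bytes=PAGE_BYTES, pages_per_block=PAGES_PER_BLOCK):
    """Size of the smallest erasable unit, in bytes."""
    return page_bytes * pages_per_block


def chip_bytes(blocks_per_chip=BLOCKS_PER_CHIP):
    """Data capacity of one chip, in bytes."""
    return block_bytes() * blocks_per_chip


print(block_bytes() // 1024, "KiB per erase block")
print(chip_bytes() // 2**30, "GiB per chip")
```

The point of the numbers: changing a single 4 KiB page eventually costs an erase of a unit over a hundred times larger.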

Page 5: Solid State Storage Deep Dive

NAND Flash Structure. Gates, Cells, Pages and Strings.

Page 6: Solid State Storage Deep Dive

• MLC (Multi-Level Cell)
  – Higher capacity (two bits per cell).
  – Low P/E cycle count: roughly 3k to 10k.
  – Cheaper per gigabyte.
  – High ECC needs.
• SLC (Single-Level Cell)
  – Fast read speed.
    • Roughly 25 µs vs. 50 µs per page read.
  – Fast write speed.
    • Roughly 220 µs vs. 900 µs per page program.
  – High P/E cycle count: roughly 100k to 300k.
  – These tend to be conservative numbers.
  – Minimal ECC requirements.
    • 1 bit per 512 bytes vs. roughly 12 bits per 512 bytes for MLC.
  – Expensive.
    • Up to 5x the cost of MLC.

Page 7: Solid State Storage Deep Dive

It isn’t RAM.
  ◦ Slower access times.
      Nanoseconds vs. microseconds.
  ◦ No write in place.
It isn’t a hard disk.
  ◦ Much faster access times.
      Microseconds vs. milliseconds.
  ◦ No moving parts.

Page 8: Solid State Storage Deep Dive

Program/Erase Cycle
  ◦ Erased state: all bits are 1.
  ◦ Programmed bits are 0.
  ◦ Programmed a page at a time.
      One-pass programming.
  ◦ Erased a block at a time (128 pages).
      Must erase the entire block to program a single page again.
  ◦ Finite life cycle: roughly 10k cycles for MLC, 100k for SLC.
      A block that has failed to erase may still be readable.
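
The program/erase rules above can be sketched as a toy model. This is an illustration only, not how any real controller is written: the class name and its methods are hypothetical, and the model tracks whole pages rather than individual bits.

```python
class NandBlock:
    """Toy model of one erase block; names are illustrative only."""

    def __init__(self, pages=128, page_bytes=4096):
        self.page_bytes = page_bytes
        self.erase_count = 0
        # Erased state: every bit is 1, so every byte is 0xFF.
        self.pages = [bytes([0xFF]) * page_bytes for _ in range(pages)]
        self.programmed = [False] * pages

    def program(self, page_no, data):
        """Program one page; a programmed page cannot be written in place."""
        if self.programmed[page_no]:
            raise ValueError("page already programmed; erase the block first")
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        self.pages[page_no] = bytes(data)
        self.programmed[page_no] = True

    def erase(self):
        """Erase the whole block; there is no per-page erase."""
        self.pages = [bytes([0xFF]) * self.page_bytes for _ in self.pages]
        self.programmed = [False] * len(self.pages)
        self.erase_count += 1  # each erase uses up one P/E cycle
```

Programming page 0 twice without calling `erase()` raises an error, which is the "no write in place" behavior. The `erase_count` field is what a wear-leveling scheme would watch.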

Page 9: Solid State Storage Deep Dive

Data written in pages and erased in blocks. Blocks are becoming larger as NAND Flash die sizes shrink.

Page 10: Solid State Storage Deep Dive

• Wear-Leveling
  – Spreads writes across blocks.
  – Ideally, write to every block before erasing any.
  – Data falls into two patterns.
    • Static: written once and read many times.
    • Dynamic: written often, read infrequently.
  – If you only wear-level data in motion, you quickly burn out the blocks it lands on.
  – If you wear-level static data, you incur extra I/O.
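
The trade-off between the two approaches can be sketched in a few lines. This is a simplified illustration under stated assumptions: both function names and the threshold value are hypothetical, and real firmware policies are vendor-specific and far more involved.

```python
def pick_block_for_write(erase_counts, free_blocks):
    """Dynamic wear-leveling: send the next write to the least-erased free block."""
    return min(free_blocks, key=lambda block: erase_counts[block])


def needs_static_leveling(erase_counts, threshold=100):
    """Static wear-leveling trigger (hypothetical policy).

    When the gap between the most-erased and least-erased block grows past
    the threshold, cold data is parked on young blocks and is worth moving,
    at the cost of extra I/O.
    """
    return max(erase_counts.values()) - min(erase_counts.values()) > threshold


# Block 1 holds static data and has barely been erased.
counts = {0: 120, 1: 5, 2: 40, 3: 7}
print(pick_block_for_write(counts, free_blocks=[0, 2, 3]))
print(needs_static_leveling(counts))
```

Block 1 never appears in the free list because its data never changes, so dynamic leveling alone can never use its remaining cycles.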

Page 11: Solid State Storage Deep Dive

Background Garbage Collection
  ◦ Defers the P/E cycle.
  ◦ Pages are marked as dirty and erased later.
  ◦ Requires spare area.
  ◦ Incurs additional I/O.
  ◦ Can be put under pressure by frequent small writes.

Page 12: Solid State Storage Deep Dive

Write Amplification
  ◦ Ripples in a pond.
  ◦ Device moves blocks around.
  ◦ Incoming I/O is greater than what the device has free.
  ◦ Every write causes additional writes.
      Small writes can be a real problem.
      OLTP workloads are a good example.
      TRIM can help.

Page 13: Solid State Storage Deep Dive

Initial Write of 4 pages to a single erasable block.

Page 14: Solid State Storage Deep Dive

Four new pages and four replacement pages written. Original pages are now marked invalid.

Page 15: Solid State Storage Deep Dive

Garbage collection comes along, moves all valid pages to a new block, and erases the other block.
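
The three-step example on these slides can be turned into a write amplification figure. This is a rough sketch, assuming the usual definition (pages physically written divided by pages the host asked to write) and assuming that the eight pages still valid after the second step are exactly the ones garbage collection copies. The function name is hypothetical.

```python
def write_amplification(host_pages_written, gc_pages_moved):
    """Pages physically written divided by pages the host asked to write."""
    return (host_pages_written + gc_pages_moved) / host_pages_written


# Step 1: 4 pages. Step 2: 4 new pages plus 4 replacement pages.
host_pages = 4 + 4 + 4
# Step 3: the 4 original pages are invalid, so 8 valid pages get copied.
gc_moved = 8

print(round(write_amplification(host_pages, gc_moved), 2))  # 1.67
```

Twelve pages of host writes cost twenty pages of flash writes, plus one block erase.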

Page 16: Solid State Storage Deep Dive

• TRIM
  – Supported out of the box on Windows 7 and Windows Server 2008 R2.
    • Some manufacturers are shipping a TRIM service that works with their driver.
  – Acts like spare area for garbage collection.
  – OS and file system tell the drive a block is empty.
  – Filling the file system defeats TRIM.
  – File fragmentation can hurt TRIM.
    • Grow your files manually!
    • Don’t run disk defrag!

Page 17: Solid State Storage Deep Dive

Many things cause errors on Flash!
• Write Disturb
  – Data cells NOT being written to are corrupted.
    • Fixed with normal erase.
• Read Disturb
  – Repeated reads on the same page affect other pages in the block.
    • Fixed with normal erase.
• Charge Loss/Gain
  – Transistors may gain or lose charge over time.
    • Affects flash devices at rest or rarely accessed data.
    • Fixed with normal erase.

All of these issues are generally dealt with very well using standard ECC techniques.
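
To make "ECC" concrete, here is a minimal sketch of single-bit error correction. Real flash controllers use much stronger codes over whole sectors; the Hamming(7,4) code below is swapped in only because it fits in a few lines, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit
    if error_position:
        c[error_position - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one disturbed cell
print(hamming74_decode(stored))
```

The corrected read still costs the controller extra work, and a code like this one cannot repair two flipped bits in the same codeword.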

Page 18: Solid State Storage Deep Dive

As cells are programmed, other cells may experience voltage change.

Page 19: Solid State Storage Deep Dive

As cells are read, other cells in the same block can suffer voltage change.

Page 20: Solid State Storage Deep Dive

If flash is at rest or rarely read, cells can suffer charge loss.

Page 21: Solid State Storage Deep Dive

• Not all drives are benchmarked the same.
• Short-stroking
  – Only using a small portion of the drive.
  – Allows for lots of spare capacity via TRIM.
• Huge queue depths.
  – Increases latency.
  – Can be unrealistic.
• Odd block transfer sizes.
  – Random IO testing.
    • Some use 512 bytes while others use 4k.
  – Sequential IO testing.
    • Most use 128k.
    • Some use 64k to better fit into large buffers.
    • Some use 1 MB and high queue depths.
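
Why huge queue depths increase latency follows from Little's Law: outstanding I/Os equal throughput multiplied by average latency. A minimal sketch, with a hypothetical function name and made-up benchmark figures:

```python
def avg_latency_ms(queue_depth, iops):
    """Average per-I/O latency implied by Little's Law, in milliseconds."""
    return queue_depth / iops * 1000.0


# Hypothetical datasheet figure: 40,000 IOPS measured at queue depth 32.
print(avg_latency_ms(32, 40_000))
# The same IOPS figure at queue depth 256 implies eight times the latency.
print(avg_latency_ms(256, 40_000))
```

A headline IOPS number quoted at a deep queue can hide per-request latency that a low-queue-depth workload would never tolerate.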

Page 22: Solid State Storage Deep Dive

Read the numbers carefully.
  ◦ Random IO bench usually 4k.
      SQL Server works on 8k.
  ◦ Sequential IO bench usually 128k.
      SQL Server works on 64k to 128 MB.
  ◦ Queue depths set high.
      SQL Server usually configured for low queue depth.
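
A quoted 4k IOPS figure can be restated at SQL Server's 8k page size with a little arithmetic. This sketch assumes the drive is bandwidth-bound, which is a simplification, so treat the result as an upper bound. The function names and the quoted figure are hypothetical.

```python
def throughput_mb_s(iops, block_bytes):
    """Throughput in MB/s implied by an IOPS figure at a given transfer size."""
    return iops * block_bytes / 1_000_000


def iops_at_block_size(throughput_mb, block_bytes):
    """IOPS achievable at a transfer size if throughput is the only limit."""
    return throughput_mb * 1_000_000 / block_bytes


quoted_iops_4k = 50_000  # hypothetical datasheet number
bandwidth = throughput_mb_s(quoted_iops_4k, 4096)
print(bandwidth, "MB/s at 4k")
print(iops_at_block_size(bandwidth, 8192), "IOPS at 8k, at best")
```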

Page 23: Solid State Storage Deep Dive

• SLC is ready “out of the box.”
  – Requires much less infrastructure on disk to support robust write environments.
• MLC needs some help.
  – Requires lots of spare area and smarter controllers to handle extra ECC.
  – eMLC has all management functions built onto the chip.
• Both configured similarly.
  – RAID of chips.
  – TRIM, GC and wear-leveling.

Page 24: Solid State Storage Deep Dive

Longevity differences between devices can be huge.
Consumer grade drives are consumable.
  ◦ Aren’t rated for full drive writes.
      Desktop drives are usually tested on a fraction of drive capacity!
  ◦ Aren’t rated for continuous writes.
      It may say three-year life span.
      It could be much shorter; look at total writes.
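
"Look at total writes" can be made concrete with a back-of-the-envelope estimate. This is a sketch under stated assumptions: perfect wear-leveling, a constant daily write volume, and a write amplification factor you supply yourself. The function name and all input figures are hypothetical.

```python
def estimated_life_years(capacity_gb, pe_cycles, host_gb_per_day,
                         write_amplification=1.0):
    """Years until the rated P/E cycles are used up, assuming even wear."""
    total_flash_writes_gb = capacity_gb * pe_cycles
    flash_gb_per_day = host_gb_per_day * write_amplification
    return total_flash_writes_gb / flash_gb_per_day / 365


# Hypothetical 100 GB MLC drive rated for 10,000 cycles.
print(estimated_life_years(100, 10_000, host_gb_per_day=50,
                           write_amplification=3.0))    # light desktop use
print(estimated_life_years(100, 10_000, host_gb_per_day=2000,
                           write_amplification=3.0))    # busy database server
```

The same drive that outlasts a desktop by years can fall well short of a three-year rating under a continuous server write load.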

Page 25: Solid State Storage Deep Dive

• SAS is the king of your heavy workloads.
• Command Queuing
  – SAS supports up to 2^16, usually capped at 64.
  – SATA supports up to 32.
• Error recovery and detection.
  – SMART isn’t.
  – SCSI command set is better.
• Duplex
  – SAS is full duplex and dual ported per drive.
  – SATA is half duplex and single ported.
• Multi-path IO
  – Native to SAS at the drive level.
  – Available to SATA via expanders.

Page 26: Solid State Storage Deep Dive

• Flash comes in lots of form factors.
• Standard 2.5” and 3.5” drives.
• Fibre attached
  • Texas Memory Systems RamSan-620
  • Violin Memory
• PCIe add-in cards.
  • Few “native” cards.
    • Fusion-io
    • Texas Memory Systems RamSan-20
  • Bundled solutions.
    • LSI SSS6200
    • OCZ Z-Drive
    • OCZ RevoDrive
• PCIe to disk
  • 2.5” form factor and plugs.
  • Skips SAS/SATA for direct PCIe lanes.

Page 27: Solid State Storage Deep Dive

You MUST understand your workloads.
  ◦ Monitor virtual file stats.
      http://sqlserverio.com/2011/02/08/gather-virtual-file-statistics-using-t-sql-tsql2sday-15/
      Track random vs. sequential.
      Track size of transfers.
  ◦ Capture IO patterns.
      http://sqlserverio.com/2010/06/15/fundamentals-of-storage-systems-capturing-io-patterns/
  ◦ Benchmark!
      http://sqlserverio.com/2010/06/15/fundamentals-of-storage-testing-io-systems/
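
Tracking transfer sizes from virtual file stats comes down to differencing two snapshots of cumulative counters. The sketch below is not taken from the linked posts. It assumes each snapshot is a dict keyed by column names from `sys.dm_io_virtual_file_stats`, and the function name and sample values are hypothetical.

```python
def io_profile(before, after):
    """Average transfer sizes and latencies between two counter snapshots."""
    def delta(column):
        # The DMV counters are cumulative, so only the difference matters.
        return after[column] - before[column]

    reads, writes = delta("num_of_reads"), delta("num_of_writes")
    return {
        "avg_read_kb": delta("num_of_bytes_read") / reads / 1024 if reads else 0.0,
        "avg_write_kb": delta("num_of_bytes_written") / writes / 1024 if writes else 0.0,
        "avg_read_ms": delta("io_stall_read_ms") / reads if reads else 0.0,
        "avg_write_ms": delta("io_stall_write_ms") / writes if writes else 0.0,
        "read_fraction": reads / (reads + writes) if reads + writes else 0.0,
    }


# Hypothetical counter values, made up for illustration.
before = {"num_of_reads": 1000, "num_of_bytes_read": 8_192_000,
          "io_stall_read_ms": 500, "num_of_writes": 400,
          "num_of_bytes_written": 26_214_400, "io_stall_write_ms": 800}
after = {"num_of_reads": 3000, "num_of_bytes_read": 139_264_000,
         "io_stall_read_ms": 2500, "num_of_writes": 900,
         "num_of_bytes_written": 30_310_400, "io_stall_write_ms": 1800}

print(io_profile(before, after))
```

Average transfer size is only a hint about random versus sequential access; large averages suggest scans or read-ahead, but confirming the pattern needs a trace of the actual I/O.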

Page 28: Solid State Storage Deep Dive

• From new
  – Best possible performance.
  – Drive will never be this fast again.
• Previous writes affect future reads.
  – Large sequential writes are nice for GC.
  – Small random writes slow GC down.
  – Wait for GC to catch up when benching the drive.
    • Give the GC time to settle in when going from small random to large sequential or vice versa.
    • Steady state is what we are after.
• Performance slows over time.
  – Cells wear out.
    • Causes multiple attempts to read or write.
    • ECC saves you, but the IO is still spent.

Page 29: Solid State Storage Deep Dive

• Not all drives are equal.
• Understand drives are tuned for workloads.
  – Desktop drives don’t favor 100% random writes…
  – Enterprise drives are expected to get punished.
• Fix it with firmware.
  – Most drives will have edge cases.
    • OCZ and Intel suffered poor performance after drive use over time.
• Be wary of updates that erase your drive.
  – Gives you a temporary performance boost.

Page 30: Solid State Storage Deep Dive

• Flash read performance is great, sequential or random.
• Flash write performance is complicated, and can be a problem if you don’t manage it.
• Flash wears out over time.
  – Not nearly the issue it used to be, but you must understand your write patterns.
  – Plan for over-provisioning and TRIM support.
    • It can have a huge impact on how much storage you actually buy.
  – Flash can be error prone.
    • Be aware that writes and reads can cause data corruption.
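
The effect of over-provisioning on how much storage you buy is easy to estimate. This is a rough sketch: the 75% fill ceiling is a hypothetical planning target rather than a figure from the slides, and the function name is made up for illustration.

```python
def capacity_to_buy_gb(data_gb, max_fill=0.75, growth=0.0):
    """Capacity needed so the file system never rises above max_fill.

    max_fill is the fraction of the drive you allow to fill, leaving the
    rest free for TRIM and garbage collection; growth is the expected
    fractional data growth over the planning period.
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in the range (0, 1]")
    return data_gb * (1 + growth) / max_fill


print(capacity_to_buy_gb(600))               # 600 GB of data today
print(capacity_to_buy_gb(600, growth=0.5))   # with 50% expected growth
```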

Page 31: Solid State Storage Deep Dive

Solid State Storage Deep Dive
Wes Brown