Page 1: An Overview of Flash Storage for Databases

An Overview of Flash Storage for Databases

Morgan Tocker <[email protected]>

Wednesday, March 9, 2011

Page 2: An Overview of Flash Storage for Databases

Introduction

★ No vested interest in which hardware I recommend.
  ✦ [Disclaimer] Some hardware vendors have engaged our services to evaluate and improve the performance of their products.

[Me]
Director of Training. Previously worked at MySQL, Sun Microsystems.

[Percona]
Consulting, Training, Support & Development for MySQL.

Page 3: An Overview of Flash Storage for Databases

What this talk is about

★ Flash technologies (NAND, NOR).
★ Server usage.
  ✦ Not USB thumb drives.
  ✦ Not consumer usage.
★ “For Databases” == MySQL.
  ✦ Should be more or less applicable to all databases.

Page 4: An Overview of Flash Storage for Databases

Agenda

★ Introduction.
★ A look at the current market.
★ Applications.

Page 5: An Overview of Flash Storage for Databases

Revolutionary

★ Change in technology:
  ✦ From spinning disk to solid state.
★ No mechanical moving parts.
★ Jump in performance.
★ Requires changes in the application.
★ Hard not to predict that SSDs will quickly replace hard disks everywhere in the next 5-10 years.*

* However, at the moment hard disks are still getting cheaper (per unit of capacity) faster than SSDs!

Page 6: An Overview of Flash Storage for Databases

“Numbers everyone should know”

L1 cache reference                    0.5 ns
Branch mispredict                     5 ns
L2 cache reference                    7 ns
Mutex lock/unlock                     25 ns
Main memory reference                 100 ns
Compress 1K bytes with Zippy          3,000 ns
Send 2K bytes over 1 Gbps network     20,000 ns
NAND Flash (my estimate)              50,000 ns
Read 1 MB sequentially from memory    250,000 ns
Round trip within same datacenter     500,000 ns
Disk seek                             10,000,000 ns
Read 1 MB sequentially from disk      20,000,000 ns
Send packet CA->Netherlands->CA       150,000,000 ns

See: http://www.linux-mag.com/cache/7589/1.html and Google: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
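To put the latency numbers in perspective, the short Python sketch below works out a few ratios from them. The dictionary keys and helper names are made up for illustration, and the flash figure is the speaker's own estimate rather than a measured value.

```python
# Latencies in nanoseconds, copied from the "numbers everyone should know" list.
LATENCY_NS = {
    "main_memory_reference": 100,
    "nand_flash_read": 50_000,   # the speaker's estimate, not a measurement
    "disk_seek": 10_000_000,
}


def times_slower(slow, fast):
    """How many times longer `slow` takes than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def serial_ops_per_second(op):
    """Operations per second if they are issued strictly one after another."""
    return 1_000_000_000 / LATENCY_NS[op]


print(times_slower("disk_seek", "nand_flash_read"))              # 200.0
print(times_slower("nand_flash_read", "main_memory_reference"))  # 500.0
print(serial_ops_per_second("disk_seek"))                        # 100.0
print(serial_ops_per_second("nand_flash_read"))                  # 20000.0
```

In other words, a flash read sits roughly 200x closer to memory than a disk seek does, but it is still about 500x slower than a main memory reference.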

Page 7: An Overview of Flash Storage for Databases

Physics Behind

★ “Floating-gate transistors”
  ✦ Non-volatile memory.
★ One bit per cell - Single-Level Cell (SLC)
  ✦ Faster, more reliable, more expensive.
★ Multiple bits per cell - Multi-Level Cell (MLC)
  ✦ Usually 4 states (2 bits per cell).
  ✦ Slower, less reliable, cheaper.

Page 8: An Overview of Flash Storage for Databases

Classification

★ NOR
  ✦ Speeds like memory for reads.
  ✦ Much, much slower for erasing/writing data.
  ✦ Practical use: storing firmware.
★ NAND
  ✦ Faster writes.
  ✦ Only block-level read access (4K).
  ✦ The idea is to pack as many cells as possible into limited space, to make it competitive with hard drives.

Page 9: An Overview of Flash Storage for Databases

Erasing (NAND)

★ Erasing sets all bits to “1111...”
  ✦ The erase process is similar to the “flash” in cameras - this is where the name FLASH comes from.
  ✦ Erase is slow and done in batch operations (up to 1MB).
★ Changing “1” -> “0” is fast.
★ Changing “0” -> “1” is possible only by erasing.
  ✦ 1st write: “1111” -> “1110”. Block marked as “written”.
  ✦ 2nd write: even “1110” -> “1010” is not possible.
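The erase and write rules above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the class name `NandBlock`, the 4-bit pages and the 4-page block are invented here, and real devices work on far larger pages and blocks.

```python
class NandBlock:
    """Toy model of one NAND erase block made of write-once pages."""

    def __init__(self, pages=4, bits_per_page=4):
        self.erased_value = (1 << bits_per_page) - 1   # all ones, e.g. 0b1111
        self.pages = [self.erased_value] * pages
        self.written = [False] * pages
        self.erase_count = 0

    def program(self, page, value):
        """Write a page once; programming can only clear bits (1 -> 0)."""
        if self.written[page]:
            raise ValueError("page already written; erase the whole block first")
        if value & ~self.erased_value:
            raise ValueError("value does not fit in a page")
        self.pages[page] &= value
        self.written[page] = True

    def erase(self):
        """Slow batch operation: reset every page in the block to all ones."""
        self.pages = [self.erased_value] * len(self.pages)
        self.written = [False] * len(self.pages)
        self.erase_count += 1


block = NandBlock()
block.program(0, 0b1110)          # 1st write: 1111 -> 1110
try:
    block.program(0, 0b1010)      # 2nd write to the same page is refused
except ValueError as exc:
    print("rejected:", exc)
block.erase()                     # only an erase makes the page writable again
block.program(0, 0b1010)
print(format(block.pages[0], "04b"))   # 1010
```

Note that the second write is refused even though it would only clear one more bit, which matches the "block marked as written" behaviour described on the slide.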

Page 10: An Overview of Flash Storage for Databases

Erase Challenges

★ Erase is slow.
  ✦ You want to erase many blocks in a single “flash”.
  ✦ Block management.
★ [via software] When you write, the card never overwrites the same block in place.
★ A background process runs garbage collection.
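A minimal Python sketch of this block management idea follows. It is not any vendor's firmware: `TinyFTL` and its methods are hypothetical names, garbage collection runs on demand rather than in the background, and surviving pages are held in memory during the erase, where a real device would move them to spare (over-provisioned) blocks.

```python
class TinyFTL:
    """Toy flash translation layer: out-of-place writes plus garbage collection."""

    def __init__(self, blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # physical[b][p] is (logical_page, data), or None when the page is erased
        self.physical = [[None] * pages_per_block for _ in range(blocks)]
        self.mapping = {}                  # logical page -> (block, page) of live copy
        self.erase_counts = [0] * blocks

    def _free_slot(self):
        for b, block in enumerate(self.physical):
            for p, slot in enumerate(block):
                if slot is None:
                    return b, p
        return None

    def _live_pages(self, b):
        return [slot for p, slot in enumerate(self.physical[b])
                if slot is not None and self.mapping.get(slot[0]) == (b, p)]

    def write(self, logical, data):
        slot = self._free_slot()
        if slot is None:                   # no erased page left: reclaim stale ones
            self.garbage_collect()
            slot = self._free_slot()
            if slot is None:
                raise RuntimeError("device full")
        b, p = slot
        self.physical[b][p] = (logical, data)
        self.mapping[logical] = (b, p)     # the previous copy is now stale

    def read(self, logical):
        b, p = self.mapping[logical]
        return self.physical[b][p][1]

    def garbage_collect(self):
        # Pick the block with the fewest live pages, erase it, keep the survivors.
        victim = min(range(len(self.physical)), key=lambda b: len(self._live_pages(b)))
        survivors = self._live_pages(victim)
        self.physical[victim] = [None] * self.pages_per_block   # the slow erase
        self.erase_counts[victim] += 1
        for p, (logical, data) in enumerate(survivors):
            self.physical[victim][p] = (logical, data)
            self.mapping[logical] = (victim, p)


ftl = TinyFTL(blocks=2, pages_per_block=2)
for version in range(5):
    ftl.write(0, f"v{version}")            # same logical page, new physical page
    print(version, ftl.mapping[0])
print(ftl.read(0), ftl.erase_counts)
```

The fuller the device is with live data, the fewer stale pages each erase reclaims, which is why garbage collection gets harder as the device fills up.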

Page 11: An Overview of Flash Storage for Databases

Erase Lifecycle

★ SLC: ~100K erase cycles per cell (may vary).
★ MLC: ~10K erase cycles per cell (may vary).
★ For many this is a major point of discussion.
  ✦ How big an issue it is depends a lot on the firmware.
  ✦ Many cells and even distribution (“wear levelling”) make a device last a couple of years under heavy workload.
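As a rough back-of-the-envelope check, the Python sketch below estimates device lifetime from the erase-cycle figures above. It assumes perfect wear levelling; the daily write volume and the write amplification factor are hypothetical inputs chosen only for illustration.

```python
def estimated_lifetime_years(capacity_gb, erase_cycles, writes_gb_per_day,
                             write_amplification=1.0):
    """Years until every cell has used up its erase cycles, assuming perfectly
    even wear. write_amplification is the ratio of physical to logical writes."""
    total_writable_gb = capacity_gb * erase_cycles / write_amplification
    return total_writable_gb / writes_gb_per_day / 365


# Hypothetical workload: 2,000 GB written per day, write amplification of 2.
mlc = estimated_lifetime_years(320, 10_000, 2_000, write_amplification=2.0)
slc = estimated_lifetime_years(160, 100_000, 2_000, write_amplification=2.0)
print(round(mlc, 1))   # 2.2
print(round(slc, 1))   # 11.0
```

Under these assumed numbers the MLC card lands at "a couple of years", while the smaller SLC card lasts several times longer because of its 10x higher cycle count.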

Page 12: An Overview of Flash Storage for Databases

Write degradation

★ Expected.
  ✦ The fuller the device, the harder it is to garbage collect.
★ Graph for Fusion-io 320G MLC card:

[Graph not included in this transcript.]

Page 13: An Overview of Flash Storage for Databases

Firmware Really Matters (1)

★ I would expect even less flat performance on cheaper, non-enterprise-class hardware.
  ✦ Come to my talk on Friday.
  ✦ I will tell you that consistency of performance is more important than anything else.

Page 14: An Overview of Flash Storage for Databases

Firmware Really Matters (2)

★ Many revisions of firmware for each vendor.
  ✦ Important to compare apples to apples in any comparison.
  ✦ I heard a rumour that one large SSD vendor is on their 4th successful complete ground-up implementation ;)

Page 15: An Overview of Flash Storage for Databases

Agenda

★ Introduction.
★ A look at the current market.
★ Applications.

Page 16: An Overview of Flash Storage for Databases

The current market (1)

★ Fusion-io.
  ✦ Established player with a large product line.
  ✦ Enjoyed a near-monopoly for a while as the only PCI card vendor.
★ Virident.
  ✦ Previously a MySQL appliance vendor.
  ✦ Switched business model in ~2010 to just ship PCI flash cards.
  ✦ Very good, consistent results.

Page 17: An Overview of Flash Storage for Databases

The current market (2)

★ Intel/OCZ/other.
  ✦ Typically aim for the pro-desktop market.
  ✦ Do not necessarily offer the same features/promises as the “enterprise hardware”...

Page 18: An Overview of Flash Storage for Databases

You pay more for...

★ Greater amount of over-provisioning (more consistent).
★ Internal redundancy (aka RAID).
★ More complex firmware (more consistent).
★ Guarantee of durability (such as a capacitor).
★ Greater life-span (more write cycles).
★ Better performance (many more IOPS).

Page 19: An Overview of Flash Storage for Databases

Fusion-io


Page 20: An Overview of Flash Storage for Databases

Performance Specification

★ 160G SLC
  ✦ 110K read IOPS (4K).
  ✦ 26us read latency.
★ 320G MLC
  ✦ 71K read IOPS.
  ✦ 41us read latency.
★ “Duo” range (not covered).
★ Lifetime:
  ✦ SLC flash @ 40% write duty | 25 calendar years
  ✦ MLC flash @ 20% write duty | 10 calendar years
  ✦ MLC flash @ 40% write duty | 5 calendar years

Page 21: An Overview of Flash Storage for Databases

Fusion-io Overview

★ Fast. Very fast.
  ✦ Cheaper than disks in terms of $ per IOPS.
★ PCI-E - closest to CPU.
★ Durability.
★ Shares host memory / CPU.
★ Most complex part - firmware.
★ Large amount of space reservation for heavy writes.

Page 22: An Overview of Flash Storage for Databases

Fusion-io drawbacks

★ Expensive. Let’s say “$6000+” (retail; your price may be less).
  ✦ For full performance, requires an additional 25% space reservation.
  ✦ DRAM is actually probably cheaper per GB.
★ PCI-E is not hot-swappable.
  ✦ Also has potential for errors (when the host fails, garbage keeps being sent; Fusion-io handles this well).
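The "cheaper per IOPS, expensive per GB" trade-off can be made concrete with a small Python calculation. Pairing the $6000 price with the 160G SLC card and its 110K read IOPS is an assumption made here, and the hard disk figures (price, IOPS, capacity) are purely hypothetical placeholders.

```python
def dollars_per_iops(price_usd, iops):
    return price_usd / iops


def dollars_per_gb(price_usd, capacity_gb):
    return price_usd / capacity_gb


# Flash card: price and spec figures quoted in the slides (pairing them is assumed).
flash = {"price": 6000, "iops": 110_000, "gb": 160}
# Hard disk: hypothetical 15K RPM drive, numbers invented for illustration.
disk = {"price": 300, "iops": 200, "gb": 146}

for name, dev in (("flash", flash), ("disk", disk)):
    print(name,
          round(dollars_per_iops(dev["price"], dev["iops"]), 3), "$/IOPS,",
          round(dollars_per_gb(dev["price"], dev["gb"]), 2), "$/GB")
```

With these inputs the card is far cheaper per IOPS and far more expensive per GB, which is the shape of the argument on the slides regardless of the exact prices.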

Page 23: An Overview of Flash Storage for Databases

Fusion-io durability

★ Cache is located on the host system.
★ “Transaction log” to prevent lost data.
  ✦ Crash recovery.

Page 24: An Overview of Flash Storage for Databases

Fusion-io read performance

160GB SLC card.
8 threads: 33K IOPS (525MB/sec), 0.28 ms 95% response time.

RAID 10 is a Dell PERC 6i on 8 disks (2.5” 15K RPM SAS).

Page 25: An Overview of Flash Storage for Databases

Fusion-io write performance

★ 8 threads: 20K IOPS (314MB/sec), 0.26 ms 95% response time.

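The IOPS and MB/sec figures quoted for the read and write benchmarks are linked by the size of each IO. The slides do not state the block size, so the Python sketch below derives it from the published numbers; treating 1 MB as 1024 KB is an assumption.

```python
def implied_io_size_kb(throughput_mb_per_s, iops):
    """Average IO size implied by a throughput and an IOPS figure."""
    return throughput_mb_per_s * 1024 / iops


print(round(implied_io_size_kb(525, 33_000), 1))   # reads:  16.3
print(round(implied_io_size_kb(314, 20_000), 1))   # writes: 16.1
```

Both results come out at roughly 16 KB per IO, which happens to match the default InnoDB page size, so the benchmark was plausibly run with 16 KB blocks.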

Page 26: An Overview of Flash Storage for Databases

Fusion-io databases

★ Many read / write threads are needed to utilize the throughput.
★ “MySQL” is not able to fully use it.
  ✦ Better in 5.5, MySQL-5.1-plugin, XtraDB.
★ InnoDB IO path “needs work”.

Page 27: An Overview of Flash Storage for Databases

Virident TachIOn


Page 28: An Overview of Flash Storage for Databases

Virident

★ PCI interface.
★ Has NAND flash upgrade modules.
★ Good stable results.
★ Advertised 300,000 IOPS at a 75:25 read:write mix.

Page 29: An Overview of Flash Storage for Databases

Virident Options

★ 300G, 400G, 600G, 800G SLC cards.
  ✦ 400G is $13,600.
★ (More or less the same price range as Fusion-io).

Page 31: An Overview of Flash Storage for Databases

Intel SSDs


Page 32: An Overview of Flash Storage for Databases

Intel SSDs

★ Were awesome in 2008.
  ✦ Many accolades; probably the first SSDs that made sense for a lot of pro-desktop users.
★ A couple of iterations of firmware, but mostly Intel treated customers like mushrooms for 2 years.
  ✦ No clear advance warning of the road map.
  ✦ Finally a replacement, the 510 series, was announced last month.
    • Slides don’t feature these. Have not used them.

Page 33: An Overview of Flash Storage for Databases

Intel Overview

★ SATA form factor.
★ Intel X25-M Gen I (50nm) & Gen II (34nm).
  ✦ MLC.
★ Intel X25-E (50nm).
  ✦ SLC.
  ✦ “Enterprise”.
★ New 510 series - just released last month.

Page 34: An Overview of Flash Storage for Databases

X25-E

★ 32G / 64G.
★ Throughput: 35K IOPS reads, 3.5K IOPS writes.
★ Latency: 75us reads, 85us writes.
★ 64G - $725.
  ✦ ~$11/GB.
★ Write endurance:
  ✦ 1 petabyte of random writes (32G).
  ✦ 2 petabytes of random writes (64G).

Page 35: An Overview of Flash Storage for Databases

X25-M Gen II

★ 80G / 160G.
★ Throughput: 35K IOPS reads, 6.5K / 8.5K IOPS writes.
★ Latency: 65us reads, 85us writes.
★ 160G - $415.
  ✦ ~$3/GB.
★ Write endurance:
  ✦ Not mentioned in the official specification.
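The per-GB prices and the X25-E endurance figures can be checked with a few lines of Python. The helper names are invented for this sketch, and the conversion assumes decimal units (1 PB = 1,000,000 GB).

```python
def dollars_per_gb(price_usd, capacity_gb):
    return price_usd / capacity_gb


def full_drive_writes(endurance_pb, capacity_gb):
    """How many times the whole drive can be rewritten before the rated
    endurance is used up (decimal units: 1 PB = 1,000,000 GB)."""
    return endurance_pb * 1_000_000 / capacity_gb


print(round(dollars_per_gb(725, 64), 2))    # X25-E 64G:  11.33
print(round(dollars_per_gb(415, 160), 2))   # X25-M 160G: 2.59
print(full_drive_writes(1, 32))             # X25-E 32G:  31250.0
print(full_drive_writes(2, 64))             # X25-E 64G:  31250.0
```

Both X25-E capacities are rated for the same number of full-drive rewrites, so the quoted endurance simply scales with capacity.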

Page 36: An Overview of Flash Storage for Databases

X25-E and X25-M

★ Even if “E” is enterprise - power loss means data loss.
  ✦ Loss of transactions.
★ You can disable the write cache, but performance is woeful.

Page 37: An Overview of Flash Storage for Databases

X25 Deployments

★ RAID
  ✦ Software / hardware?
  ✦ Level 0? 1? 10? 5? 50?
★ The engineering process could be complicated and expensive.
  ✦ There are/were ready-made solutions (Schooner[1], Gear6[2], Cisco servers).

[1] Changed business model recently.
[2] Went broke.

Page 38: An Overview of Flash Storage for Databases

Agenda

★ Introduction.
★ A look at the current market.
★ Applications.

Page 39: An Overview of Flash Storage for Databases

MySQL Specific (1)

★ SSD is very good at random reads.
  ✦ Not so good at sequential writes!
★ Data files on SSD.
  ✦ Table files (*.ibd).
  ✦ Rollback segments (ibdata1).
★ Logs on RAID with BBU.
  ✦ Binary logs.
  ✦ Transaction logs.
  ✦ Doublewrite buffer.
  ✦ Insert buffer.
  ✦ Slow log, error log, general log.

See: http://yoshinorimatsunobu.blogspot.com/2009/05/tables-on-ssd-redobinlogsystem.html

Page 40: An Overview of Flash Storage for Databases

MySQL Specific (2)

★ Buy memory, or buy SSDs?
  ✦ [Usually] Buy memory when it’s possible.

Page 41: An Overview of Flash Storage for Databases

Other Reasons to use Flash (1)

★ Server consolidation.
  ✦ Hard drives do ~100-200 IOPS.*
  ✦ Now one card can get 100K (theoretical)!
  ✦ ~2x - 10x reduction in many cases (see Craigslist).

* Assuming no RAID controller performing additional merging.
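The consolidation argument is simple arithmetic, sketched below in Python. The function name is made up for illustration, and the result is a theoretical upper bound because it compares a card's peak IOPS with per-drive random IOPS and ignores capacity, CPU and redundancy.

```python
import math


def drives_needed(target_iops, iops_per_drive):
    """Number of hard drives required to match a given IOPS figure."""
    return math.ceil(target_iops / iops_per_drive)


print(drives_needed(100_000, 200))   # 500
print(drives_needed(100_000, 100))   # 1000
```

Real-world consolidation ratios are far lower than this (the slide cites ~2x - 10x), since few workloads are purely bound by random IO.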

Page 42: An Overview of Flash Storage for Databases

Other Reasons to use Flash (2)

★ Power consumption reduction.
  ✦ “Transactions per watt” is dramatically better.
    • See: http://www.percona.com/files/percona-live/jeremy-Craigslist.pptx.pdf
  ✦ Important for a large number of people. Even if power is cheap, colo facilities often limit the power available per rack.

Page 43: An Overview of Flash Storage for Databases

Other Reasons to use Flash (3)

★ Limit variance / risk of operational issues from cold starts.
  ✦ Easy to see something like an advertising network miss its response time goals when the aim is 50ms/page.
    • Each IO is ~10ms.
    • Follow a few secondary keys to a primary key and you miss it.
★ Good for throughput too.
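The response-time budget can be worked through in Python as follows. The lookup counts are hypothetical: the sketch assumes a cold cache where each secondary-key lookup costs two IOs (one for the secondary index page, one for the primary key page), and it reuses the 50,000 ns flash estimate from earlier in the talk.

```python
def page_time_ms(lookups, ios_per_lookup, io_latency_ms):
    """Time spent waiting on storage when every IO misses the cache."""
    return lookups * ios_per_lookup * io_latency_ms


GOAL_MS = 50

disk_ms = page_time_ms(lookups=3, ios_per_lookup=2, io_latency_ms=10)
flash_ms = page_time_ms(lookups=3, ios_per_lookup=2, io_latency_ms=0.05)

print(disk_ms, disk_ms <= GOAL_MS)               # 60 False
print(round(flash_ms, 2), flash_ms <= GOAL_MS)   # 0.3 True
```

Three cold lookups on disk already blow the 50ms goal, while the same access pattern on flash uses well under 1% of the budget.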

Page 44: An Overview of Flash Storage for Databases

Applications must change


Page 45: An Overview of Flash Storage for Databases

Short Term (1)

★ Multi-threaded IO is required to exploit all the throughput offered.
  ✦ InnoDB Plugin, MySQL 5.5 are ready.
  ✦ Many other databases are not ready.
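As an illustration of what multi-threaded IO means in practice, the Python sketch below issues random page reads from a pool of worker threads. It is not a benchmark: the scratch file is small and will be served from the OS page cache, the 16 KB page size and all function names are assumptions, and `os.pread` is only available on Unix-like systems.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024   # assumed page size, chosen only for illustration


def read_page(fd, page_no):
    # os.pread takes an explicit offset, so threads can share one descriptor.
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def random_reads(path, page_count, reads, threads):
    """Issue `reads` random page reads from `threads` workers; return bytes read."""
    page_numbers = [random.randrange(page_count) for _ in range(reads)]
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            pages = pool.map(lambda n: read_page(fd, n), page_numbers)
            return sum(len(page) for page in pages)
    finally:
        os.close(fd)


def make_scratch_file(page_count):
    """Create a scratch file of `page_count` random pages and return its path."""
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "wb") as f:
        f.write(os.urandom(page_count * PAGE_SIZE))
    return path


path = make_scratch_file(page_count=64)
try:
    for threads in (1, 8):
        total = random_reads(path, page_count=64, reads=256, threads=threads)
        print(threads, "thread(s) read", total, "bytes")
finally:
    os.remove(path)
```

A single thread can only have one request outstanding, so its throughput is capped at one IO per device latency; keeping many requests in flight is what lets a flash device reach its rated IOPS.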

Page 46: An Overview of Flash Storage for Databases

Short Term (2)

★ Opportunities for multi-level caches when data exceeds the SSD’s size.
  ✦ See Flashcache (Facebook), ZFS L2ARC, Veritas.
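The multi-level cache idea can be sketched in Python as a small RAM tier in front of a larger flash tier in front of slow disk. This is a generic illustration, not how Flashcache or ZFS L2ARC are implemented; the class names are hypothetical and plain dictionaries stand in for the flash device and the disk.

```python
from collections import OrderedDict

_MISSING = object()


class LRUTier:
    """Fixed-size cache tier with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return _MISSING
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert a page; return the evicted (key, value) pair, or None."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)
        return None


class TieredCache:
    def __init__(self, ram_pages, flash_pages, disk):
        self.ram = LRUTier(ram_pages)
        self.flash = LRUTier(flash_pages)
        self.disk = disk                     # dict standing in for slow disk
        self.hits = {"ram": 0, "flash": 0, "disk": 0}

    def read(self, key):
        value = self.ram.get(key)
        if value is not _MISSING:
            self.hits["ram"] += 1
            return value
        value = self.flash.get(key)
        if value is not _MISSING:
            self.hits["flash"] += 1
        else:
            value = self.disk[key]
            self.hits["disk"] += 1
        evicted = self.ram.put(key, value)   # promote into RAM
        if evicted is not None:
            self.flash.put(*evicted)         # demote the RAM victim to flash
        return value


disk = {n: f"page-{n}" for n in range(10)}
cache = TieredCache(ram_pages=2, flash_pages=4, disk=disk)
for page in (0, 1, 2, 0, 2):
    cache.read(page)
print(cache.hits)   # {'ram': 1, 'flash': 1, 'disk': 3}
```

Pages evicted from RAM get a second chance on flash, so a re-read costs a flash access instead of a disk seek.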

Page 47: An Overview of Flash Storage for Databases

Long Term

★ Decades of hard drive assumptions about the cost of random IO need to be unwound.
  ✦ For example, InnoDB, Oracle, PostgreSQL work like this...

Page 48: An Overview of Flash Storage for Databases

Basic Operation (High Level)

SELECT * FROM City WHERE CountryCode='AUS'

[Diagram: Log Files, Buffer Pool, Tablespace]

Page 54: An Overview of Flash Storage for Databases

Basic Operation (cont.)

UPDATE City SET name = 'Morgansville'
WHERE name = 'Brisbane' AND CountryCode='AUS'

[Diagram: Log Files, Buffer Pool, Tablespace]

Page 58: An Overview of Flash Storage for Databases

Basic Operation (cont.)

UPDATE City SET name = 'Morgansville'
WHERE name = 'Brisbane' AND CountryCode='AUS'

[Diagram: Log Files, Buffer Pool, Tablespace, with the changed data shown as “01010”]
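The flow in these diagrams (log files, buffer pool, tablespace) can be approximated with the toy Python model below. It is a generic write-ahead logging sketch, not InnoDB's actual code: `MiniEngine` and its methods are hypothetical, and Python containers stand in for the files on disk.

```python
class MiniEngine:
    """Toy model of the log / buffer pool / tablespace flow."""

    def __init__(self, tablespace):
        self.tablespace = dict(tablespace)   # durable pages; flushing is random IO
        self.log = []                        # durable append-only log (sequential IO)
        self.buffer_pool = {}                # in-memory page cache, lost on a crash
        self.dirty = set()                   # pages changed in memory, not yet flushed

    def read(self, key):
        if key not in self.buffer_pool:      # miss: one random read from the tablespace
            self.buffer_pool[key] = self.tablespace[key]
        return self.buffer_pool[key]

    def update(self, key, value):
        self.read(key)                       # bring the page into the buffer pool
        self.log.append((key, value))        # 1. append the change to the log
        self.buffer_pool[key] = value        # 2. modify the cached page
        self.dirty.add(key)

    def checkpoint(self):
        for key in self.dirty:               # 3. later, write dirty pages back
            self.tablespace[key] = self.buffer_pool[key]
        self.dirty.clear()
        self.log.clear()                     # flushed changes no longer need the log

    def crash(self):
        self.buffer_pool.clear()             # memory contents are gone
        self.dirty.clear()

    def recover(self):
        for key, value in self.log:          # replay the log over the tablespace
            self.tablespace[key] = value
        self.log.clear()


engine = MiniEngine({"city:1": "Brisbane"})
engine.update("city:1", "Morgansville")
print(engine.tablespace["city:1"])   # Brisbane (page not flushed yet)
engine.crash()
engine.recover()
print(engine.read("city:1"))         # Morgansville (recovered from the log)
```

The design trades one cheap sequential log write now for an expensive random tablespace write later, which is exactly the kind of hard-drive-era assumption the talk argues needs revisiting on flash.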

Page 62: An Overview of Flash Storage for Databases

Long Term (cont.)

★ Examples of “the database is the log” for MySQL are the PBXT and RethinkDB storage engines.

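A minimal Python sketch of the “the database is the log” idea follows. It makes no claim about how PBXT or RethinkDB are actually built: `LogStructuredStore` is a hypothetical name, a `bytearray` stands in for an append-only file, and compaction of stale records is left out.

```python
import json


class LogStructuredStore:
    """Toy key-value store where the append-only log is the only copy of the data."""

    def __init__(self):
        self.log = bytearray()   # stands in for an append-only file on flash
        self.index = {}          # key -> (offset, length) of the newest record

    def put(self, key, value):
        record = json.dumps({"k": key, "v": value}).encode("utf-8")
        offset = len(self.log)
        self.log += record                       # sequential append, never in place
        self.index[key] = (offset, len(record))  # older records become garbage

    def get(self, key):
        offset, length = self.index[key]         # one random read, cheap on flash
        record = self.log[offset:offset + length].decode("utf-8")
        return json.loads(record)["v"]


store = LogStructuredStore()
store.put("city:1", "Brisbane")
store.put("city:1", "Morgansville")
print(store.get("city:1"))   # Morgansville
```

Every write is a sequential append and every read is a random lookup, which is a poor fit for hard disks but suits a device where random reads are cheap.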

Page 63: An Overview of Flash Storage for Databases

Storage Hardware also changes

★ Most of us are used to buying RAID controllers and placing disks below them.
  ✦ Only a very limited number of RAID controllers understand SSDs.
  ✦ RAID controllers are used to optimizing IO for devices capable of 100-200 IOPS.
  ✦ If we look at Fusion-io, the devices also RAID internally (~RAID4).

Page 64: An Overview of Flash Storage for Databases

Technologies to look at

★ More PCI Express cards.
  ✦ Potential to lower the barrier to entry - only ~2-3 players, competition not as hot as it could be (yet).
★ More enterprise-focused MLC.
  ✦ Better software (firmware) means more wear levelling, improved performance, etc.
  ✦ More storage in fewer cells = lower cost.
★ Violin Memory.
  ✦ I am not hands-on familiar with their technology, but they have some very high-end offerings.
  ✦ Expect more awesome high-end offerings (all vendors).

Page 65: An Overview of Flash Storage for Databases

Questions

★ Thank you to Confoo for letting me speak about such a niche topic!
★ If I’m out of time, please feel free to catch me around.