It’s a Solid State World How Exadata X3 leverages flash memory Gwen Shapira Marc Fielding

OOW13: It's a solid state-world


Page 1: OOW13: It's a solid state-world

It’s a Solid State World

How Exadata X3 leverages flash memory

Gwen Shapira

Marc Fielding

Page 2: OOW13: It's a solid state-world

About Gwen

– Solutions Architect, Cloudera
– Oracle ACE Director
– Presents, Blogs, Tweets
– @gwenshap

© 2013 Pythian 2

Page 3: OOW13: It's a solid state-world

About Marc

• Senior Consultant with Pythian’s Advanced Technology Group
• 12+ years Oracle production systems experience starting with Oracle 7
• Blogger and conference presenter
– pythian.com/news/author/fielding
• Occasionally on Twitter: @mfild

Page 4: OOW13: It's a solid state-world

Remember your first SSD?

… you’ll never forget it


Page 5: OOW13: It's a solid state-world

Sh*t people say about SSDs


Fast for reads

Don’t use for writes

Use for random writes

Don’t use for REDO

Used for REDO

Only used in Exadata

Only Sun flash devices are supported

Unreliable

Becomes slower over time

Type of SSD matters

Use SATA SSD

Use PCI SSD

Use SSD in SAN

Too expensive

Is it same as Flash?

Page 6: OOW13: It's a solid state-world

Solid State Disk = No moving parts = Low-latency random I/O

Page 7: OOW13: It's a solid state-world

The technology: NAND flash

• Slower than RAM, but both nonvolatile and affordable in large capacities
• SLC
– One bit per cell
– High performance
• MLC
– Two bits per cell
– More capacity = cheaper

[Figure: cell states. SLC: 0, 1. MLC: 00, 01, 10, 11.]

Page 8: OOW13: It's a solid state-world

We will talk about

• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Page 9: OOW13: It's a solid state-world

Cells, pages, and blocks

• Cell: 1 bit
• Page: 4K
• Block: 128 pages = 512K
• Plane: 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
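The sizes on this slide follow from simple multiplication. A minimal sketch that rechecks them, assuming "4K" means 4096 bytes (binary units are an assumption here; the constant names are illustrative only):

```python
# Geometry from the slide; binary units (1K = 1024 bytes) are assumed.
PAGE_BYTES = 4 * 1024        # one page = 4K
PAGES_PER_BLOCK = 128        # one block = 128 pages
BLOCKS_PER_PLANE = 1024      # one plane = 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE

print(block_bytes // 1024, "K per block")            # 512 K per block
print(plane_bytes // (1024 * 1024), "MB per plane")  # 512 MB per plane
```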

Page 10: OOW13: It's a solid state-world

The big gotcha

• Reads = 4KB pages
• Writes = 4KB pages
• Deletes = 512KB blocks

Page 11: OOW13: It's a solid state-world

Reads: orders of magnitude

• CPU registers – 0.3 ns (1 cycle)
• CPU Cache L1 – 1.2 ns
• CPU Cache L2 – 3.0 ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices ~ 15,000,000 ns
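To make the gaps concrete, the latencies listed on this slide can be turned into ratios. This is only a sketch using the slide's round numbers (with the upper RAM figure of 100 ns); the dictionary and function names are made up for illustration:

```python
# Read latencies in nanoseconds, copied from the slide.
READ_LATENCY_NS = {
    "RAM": 100,
    "SSD": 60_000,
    "disk": 3_000_000,
    "SAN": 15_000_000,
}

def times_slower(slow, fast):
    """How many times slower one tier is than another."""
    return READ_LATENCY_NS[slow] / READ_LATENCY_NS[fast]

print(times_slower("disk", "SSD"))  # 50.0
print(times_slower("SAN", "SSD"))   # 250.0
print(times_slower("SSD", "RAM"))   # 600.0
```

So an SSD read is about 50 times faster than a disk read, yet still hundreds of times slower than RAM.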

Page 12: OOW13: It's a solid state-world

Don’t forget throughput

• 15K RPM SAS HDD – 120-200MB/s
• PCIe SSD – 1-2GB/s
• But … How many disks do you use?
• Network bandwidth?
• CPU Bus bandwidth?
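The "how many disks do you use?" question can be answered with rough arithmetic. A small sketch, assuming 1GB/s = 1000MB/s and perfectly striped sequential I/O (both are simplifying assumptions, and the function name is illustrative):

```python
import math

def disks_to_match(ssd_mb_s, hdd_mb_s):
    """Number of striped HDDs needed to reach one SSD's throughput."""
    return math.ceil(ssd_mb_s / hdd_mb_s)

# Slow SSD against fast disks, then fast SSD against slow disks.
fewest = disks_to_match(1000, 200)
most = disks_to_match(2000, 120)
print(fewest, most)  # 5 17
```

A modest disk shelf can match a single PCIe card on raw throughput, which is why the network and bus questions on this slide matter.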

Page 13: OOW13: It's a solid state-world

Writes

• Writes on new SSD – 250,000 ns

• Comparable to rotating disk

How much data can you write to a new 250GB SSD?

Page 14: OOW13: It's a solid state-world

Deletes

• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To overwrite a page:
– Read 127 pages
– Write the 127 pages to a free block
– Delete old block
– Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
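The overwrite sequence above can be sketched as a toy model. This is not how any real controller is implemented; the function name and the list-of-pages representation are assumptions made for illustration:

```python
PAGES_PER_BLOCK = 128

def overwrite_page(old_block, index, new_data):
    """Relocate a block so that one page can be overwritten.

    Returns (new_block, pages_read, pages_written).
    """
    # Steps 1 and 2: read the 127 pages we keep and write them to a free block.
    new_block = [None] * PAGES_PER_BLOCK
    pages_read = pages_written = 0
    for i, page in enumerate(old_block):
        if i == index:
            continue
        new_block[i] = page
        pages_read += 1
        pages_written += 1
    # Step 3: delete (erase) the old block as a whole.
    old_block[:] = [None] * PAGES_PER_BLOCK
    # Step 4: perform the write that was originally requested.
    new_block[index] = new_data
    pages_written += 1
    return new_block, pages_read, pages_written

block = [f"page-{i}" for i in range(PAGES_PER_BLOCK)]
new_block, reads, writes = overwrite_page(block, 5, "updated")
print(reads, writes)  # 127 128
```

One 4K update costs 127 page reads, 128 page writes, and a block erase in this worst case.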

Page 15: OOW13: It's a solid state-world

The SSD controller

• Does the “magic” behind the scenes
• Deletes in the background (“garbage collection”)
• Tracks free space
• Balances I/O over cells (“wear leveling”)
• Manages spare capacity (“overprovisioning”)
• Manages RAM cache

Page 16: OOW13: It's a solid state-world

The consequences

• Write Amplification
– How much data is really written when we write 1MB
– 1 means no overhead
– The closer to 1 the better
– Less than 1 means the vendor is lying
• Never benchmark a brand-new SSD
– Run benchmarks long enough to run out of overprovisioned space
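Write amplification is the ratio of bytes physically written to flash to bytes the host asked to write. A minimal sketch, using the 4K page and 128-page block sizes from the earlier slides (the function name is illustrative):

```python
def write_amplification(host_bytes, flash_bytes):
    """Bytes physically written to flash per byte written by the host."""
    return flash_bytes / host_bytes

PAGE = 4 * 1024
BLOCK = 128 * PAGE

# Worst case: a 4K overwrite forces the whole 512K block to be rewritten.
worst_case = write_amplification(PAGE, BLOCK)
# Ideal case: 1MB requested, 1MB written.
ideal = write_amplification(1024 * 1024, 1024 * 1024)
print(worst_case, ideal)  # 128.0 1.0
```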

Page 17: OOW13: It's a solid state-world

We will talk about

• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Page 18: OOW13: It's a solid state-world


Page 19: OOW13: It's a solid state-world

Solid-state your whole database?

• SSDs solve I/O latency problems
• But not if db file sequential read is not in your top 5 wait events
• And not if you haven’t maxed out your RAM for buffer cache (yet)
• If your CPU utilization is high, solve this first.

Page 20: OOW13: It's a solid state-world

SSD mistakes

• SSD in primary but not DR site

– I/O capacity to apply real-time updates

– What if you need a switchover

• Over-managing active segments

– If DBAs didn’t have enough to do already…

• Database smart flash cache


Page 21: OOW13: It's a solid state-world

Database “smart” flash cache

[Diagram: Disk, SGA, Flash Cache]

• Block read from disk
• Block evicted from SGA is written to SSD cache by DBWR
• If block is needed, it is read from SSD
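The flow in this diagram is a victim cache: blocks pushed out of the SGA land in flash and are picked up from there on the next read. A toy model follows, assuming plain LRU eviction in both tiers; the class name, method names, and slot counts are made up for illustration and say nothing about Oracle's actual algorithm:

```python
from collections import OrderedDict

class VictimCache:
    """Toy two-tier cache: SGA in front, flash holding evicted blocks."""

    def __init__(self, sga_slots, flash_slots):
        self.sga = OrderedDict()
        self.flash = OrderedDict()
        self.sga_slots = sga_slots
        self.flash_slots = flash_slots

    def read(self, block_id):
        """Return which tier served the block: 'sga', 'flash' or 'disk'."""
        if block_id in self.sga:
            self.sga.move_to_end(block_id)  # mark as most recently used
            return "sga"
        if block_id in self.flash:
            del self.flash[block_id]        # block moves back into the SGA
            source = "flash"
        else:
            source = "disk"
        self._admit(block_id)
        return source

    def _admit(self, block_id):
        self.sga[block_id] = True
        if len(self.sga) > self.sga_slots:
            victim, _ = self.sga.popitem(last=False)  # least recently used
            self.flash[victim] = True                 # evicted block goes to flash
            if len(self.flash) > self.flash_slots:
                self.flash.popitem(last=False)

cache = VictimCache(sga_slots=2, flash_slots=2)
trace = [cache.read(b) for b in (1, 2, 3, 1, 3, 2)]
print(trace)  # ['disk', 'disk', 'disk', 'flash', 'sga', 'flash']
```

Note that every eviction costs a flash write, which is the DBWR overhead the next slide lists under cons.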

Page 22: OOW13: It's a solid state-world

Database “smart” flash cache

• Pros:
– Automatically keeps active data in SSD
• Cons:
– Large overhead for managing cache, all taken from SGA
– Overhead for DBWR
– No benefit and some overhead for writes
– Only one disk

Using Smart Flash Cache will make your I/O faster than using just disks, but smartly placing data on SSD will be even faster.

Page 23: OOW13: It's a solid state-world

We will talk about

• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Page 24: OOW13: It's a solid state-world

In the beginning

• Exadata V1, 2008
• Joint project of HP and Oracle
• Designed for big and long-running queries (think data warehouses)
• No flash cache

Page 25: OOW13: It's a solid state-world

And then

• Exadata V2, 2009
• Brand-new PCI-based flash cache
• Integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F20 flash accelerator cards
– 96GB * 4 * 14 = 5.4TB SLC flash
– 75 GB/sec flash throughput
– 1.5m IOPS
• Note that InfiniBand will limit you to 4GB/sec per DB node

Page 26: OOW13: It's a solid state-world

Fast-forward to 2012

• Exadata X3, 2012
• Still integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F40 flash accelerator cards
– 400GB * 4 * 14 = 22.4TB MLC flash
– 100 GB/sec flash throughput
– 1.5m IOPS
• Same InfiniBand speeds
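The V2 and X3 capacity figures can be rechecked with the same "4 * 14" formula the slides use, read here as 4 cards in each of 14 storage servers. A quick sketch with decimal terabytes (the names are illustrative):

```python
CARDS_PER_STORAGE_SERVER = 4
STORAGE_SERVERS_PER_RACK = 14

def rack_flash_gb(gb_per_card):
    """Total flash in a full rack, in GB."""
    return gb_per_card * CARDS_PER_STORAGE_SERVER * STORAGE_SERVERS_PER_RACK

v2_gb = rack_flash_gb(96)    # Sun F20 cards, SLC
x3_gb = rack_flash_gb(400)   # Sun F40 cards, MLC
print(v2_gb, x3_gb)  # 5376 22400
```

That is 5.4TB against 22.4TB: roughly four times the flash, obtained by moving from SLC to MLC.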

Page 27: OOW13: It's a solid state-world

Just announced

• Flash cache compression
– Fit more data into your flash
– Exadata hardware support TBD
– Only if the data isn’t already compressed (HCC)

Page 28: OOW13: It's a solid state-world

Exadata smart flash cache

• Not the database smart flash cache

• No victim caching here

• Flash memory on storage servers

• Can be used for traditional storage too (but you lose capacity to redundancy)

Page 29: OOW13: It's a solid state-world

Uncached reads

1. Uncached data is read from disk first
2. Sent to the database
3. And then copied to cache

[Diagram: Database, cellsrv, Disks, SSD Cache]

Page 30: OOW13: It's a solid state-world

Cached reads

– Cached blocks come from flash cache directly
– Except smart scans: disk only
– If you set cell_flash_cache keep, smart scans read from both disk and flash

[Diagram: Database, cellsrv, Disks, SSD Cache]

Page 31: OOW13: It's a solid state-world

Writes (1)

– Writes go to disk first
– Then copied to cache, sometimes
• Indexes and tables with random read I/O are prioritized
• Or use cell_flash_cache keep

[Diagram: Database, cellsrv, Disks, SSD Cache]

Page 32: OOW13: It's a solid state-world

Writes (2)

– Write back cache
– 11.2.0.3 BP9+
– Writes go to SSD first
– Then copied to disk, eventually

[Diagram: Database, cellsrv, Disks, SSD Cache]
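The difference between the two write paths comes down to which device receives the write first. A toy sketch of that ordering follows; the class and method names are hypothetical, and the "sometimes copied to cache" step of the first write path is deliberately left out:

```python
class ToyCell:
    """Toy storage cell showing only where a write lands first."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}
        self.flash = {}
        self.dirty = set()  # blocks in flash that are not yet on disk

    def write(self, block_id, data):
        if self.write_back:
            # Writes (2): the block lands in flash first.
            self.flash[block_id] = data
            self.dirty.add(block_id)
        else:
            # Writes (1): the block lands on disk first.
            self.disk[block_id] = data

    def flush(self):
        """Copy dirty blocks from flash to disk ("eventually")."""
        for block_id in sorted(self.dirty):
            self.disk[block_id] = self.flash[block_id]
        self.dirty.clear()

cell = ToyCell(write_back=True)
cell.write(42, "row data")
on_disk_before_flush = 42 in cell.disk
cell.flush()
on_disk_after_flush = 42 in cell.disk
print(on_disk_before_flush, on_disk_after_flush)  # False True
```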

Page 33: OOW13: It's a solid state-world

Exadata smart flash logging

• In some Exadata systems: I/O outliers
• Slow log file syncs
• But aren’t flash writes slow?
• We now write to both disk and flash
• Puts an upper limit on latency
• Data corruption bug fixed in 11.2.3.2.1, and ASM resilvering bug fixed in 11.2.0.3 BP9
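Why writing to both devices caps latency: if the redo write is treated as complete as soon as either copy finishes, the effective latency is the minimum of the two. A sketch with invented latency samples (the numbers are hypothetical, chosen only to show one outlier on each device):

```python
def log_write_latency(disk_ms, flash_ms):
    """The write is done when the first of the two copies completes."""
    return min(disk_ms, flash_ms)

# Hypothetical per-write latencies in milliseconds, one outlier on each device.
disk_samples = [0.5, 0.6, 40.0, 0.5]
flash_samples = [1.0, 1.0, 1.0, 12.0]

effective = [log_write_latency(d, f) for d, f in zip(disk_samples, flash_samples)]
print(effective)       # [0.5, 0.6, 1.0, 0.5]
print(max(effective))  # 1.0
```

An outlier only hurts when both devices stall on the same write, so flash does not have to be fast on average, just rarely slow at the same moment as the disk.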

Page 34: OOW13: It's a solid state-world

Mixed workloads

• Classic example: OLTP and DW on same system
• DW does long-running, I/O-intensive queries
• OLTP does relatively little I/O transfer
• But OLTP is very latency sensitive
• DW monopolizes the flash cache
• How to prioritize cache for OLTP?

Page 35: OOW13: It's a solid state-world

The workaround

• Control via I/O resource manager

    alter iormplan dbplan=((name=dss, level=1, flashcache=off), (name=other, level=1, flashcache=on));

• Disables flash cache entirely for a DB
• Very coarse control: on or off
• Obvious effect in I/O performance
• Use only if you need it
• cellcli list flashcachecontent can show what is in the cache

Page 36: OOW13: It's a solid state-world

We will talk about

• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Page 37: OOW13: It's a solid state-world

Interfaces

• SATA
– 32 outstanding IO
– 6Gb/s = 600MB/s
– significant latency
• SAS
– 256 outstanding IO
– 6Gb/s = 600MB/s

Page 38: OOW13: It's a solid state-world

Interfaces

• PCIe
– “Flash” “Accelerator”
– Multiple 500 MB/s lanes
– Low latency
– Multiple SAS/SATA controllers on card for extra throughput
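The "6Gb/s = 600MB/s" figure on the SATA/SAS slide is not a typo for 750MB/s. A sketch of the arithmetic, assuming 8b/10b line encoding (10 bits on the wire per data byte), which is background knowledge rather than something stated on the slides; the 8-lane card is a hypothetical example:

```python
def payload_mb_per_s(line_rate_mbit_s):
    """Usable MB/s on an 8b/10b-encoded link: each byte costs 10 line bits."""
    return line_rate_mbit_s // 10

sata_or_sas = payload_mb_per_s(6000)   # 6Gb/s link
pcie_lane = payload_mb_per_s(5000)     # a 5Gb/s lane, matching "500 MB/s lanes"
pcie_x8_card = 8 * pcie_lane           # hypothetical 8-lane card
print(sata_or_sas, pcie_lane, pcie_x8_card)  # 600 500 4000
```

A single SATA or SAS port tops out at 600MB/s, while a multi-lane PCIe card has room for the 1-2GB/s quoted earlier.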

Page 39: OOW13: It's a solid state-world

Interfaces

• Fibre Channel
– Use existing storage infrastructure
– High latency
– Shared: works with RAC
• Proprietary PCI
– By flash array vendors
– Avoids latency penalty of FC

Page 40: OOW13: It's a solid state-world

We will talk about

• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Page 41: OOW13: It's a solid state-world

Write faster than read?

Page 42: OOW13: It's a solid state-world

Identical read/write?

Intel SSD 910

Page 43: OOW13: It's a solid state-world


Page 44: OOW13: It's a solid state-world


RAMSAN

Page 45: OOW13: It's a solid state-world


Page 46: OOW13: It's a solid state-world

Wrapping up

• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Exadata’s smart flash cache speeds up random reads
• Not all SSDs are the same
• Read vendor specs carefully

Page 47: OOW13: It's a solid state-world

Thank you and Q&A


[email protected]

@gwenshap

[email protected]

@mfild