Databases in a Solid State World
How Exadata X3 and Other Database Systems Leverage the Performance of Flash
Gwen Shapira, Senior Consultant
February, 2013


About Me

– Oracle ACE Director
– Member of Oak Table
– 14 years of IT
– Performance Tuning
– Troubleshooting
– Hadoop
– Presents, Blogs, Tweets
– @gwenshap

© 2013 Pythian

About Pythian

• Recognized Leader:

– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server

– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments

• Expertise:

– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs.

– Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

• Global Reach & Scalability:

– Around the clock global remote support for DBA and consulting, systems administration, special projects or emergency response


You Never Forget Your First SSD

Sh*t People Say about SSD:

Fast for reads

Don’t use for writes

Use for random writes

Don’t use for REDO

Used for REDO

Only used in Exadata

Only Sun flash devices are supported

Unreliable

Becomes slower over time

Type of SSD matters

Use SATA SSD

Use PCI SSD

Use SSD in SAN

Too expensive

Is it same as Flash?

Solid State Disk = No Spinning = Low Latency Random IO

We are talking about: NAND FLASH

• As opposed to RAM Flash which is rare but awesome
• SLC
– One bit per cell (0 or 1)
– High performance
• MLC
– Two bits per cell (00, 01, 10 or 11)
– High capacity

Will Talk About:

• IO Performance
• Using SSDs for Oracle
• How Exadata and ODA use SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs

Anatomy of an SSD

• Cell: 1 bit
• Page: 4K
• Block: 128 pages = 512K
• Plane: 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
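The sizes on this slide follow from each other by multiplication. A minimal sketch of that arithmetic, assuming SLC (one bit per cell) for the cell count; the constant names are illustrative, not taken from any vendor documentation:

```python
# Sizes from the slide; SLC (1 bit per cell) is assumed for the cell count.
PAGE_BYTES = 4 * 1024        # Page: 4K
PAGES_PER_BLOCK = 128        # Block: 128 pages
BLOCKS_PER_PLANE = 1024      # Plane: 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE
cells_per_page = PAGE_BYTES * 8  # one bit per cell

print(f"block = {block_bytes // 1024}K")        # block = 512K
print(f"plane = {plane_bytes // 1024**2}MB")    # plane = 512MB
print(f"cells per page = {cells_per_page}")     # cells per page = 32768
```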

The Big Catch:

We read and write pages
But delete blocks

IO Operations

Reads

• CPU registers – 0.3* ns (1 cycle)
• CPU Cache L1 – 1.2* ns
• CPU Cache L2 – 3.0* ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices ~ 15,000,000 ns
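To put the read latencies in proportion, here is a small sketch that computes the ratios between tiers. The numbers are the slide's; where the slide gives a range, the upper bound is used, which is a choice made for this example only.

```python
# Read latencies in nanoseconds, taken from the slide.
# RAM uses the upper bound of the 60-100 ns range.
LATENCY_NS = {
    "RAM": 100,
    "SSD": 60_000,
    "Disk": 3_000_000,
    "SAN": 15_000_000,
}

def times_slower(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

print(times_slower("Disk", "SSD"))  # 50.0
print(times_slower("SSD", "RAM"))   # 600.0
print(times_slower("SAN", "SSD"))   # 250.0
```

SSD closes most of the gap to disk, but it is still hundreds of times slower than RAM.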

What about throughput?

• 15K RPM SAS HDD – 120-200MB/s
• PCIe SSD – 1-2GB/s
• But … How many disks do you use?
• Network bandwidth?
• CPU Bus bandwidth?
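The "how many disks do you use?" question can be made concrete with a rough calculation. This is a sketch under simplifying assumptions: throughput scales linearly with the number of striped disks, and 1GB is taken as 1000MB. `disks_to_match` is a name made up for this example.

```python
import math

def disks_to_match(ssd_mb_s: float, hdd_mb_s: float) -> int:
    """Number of striped HDDs needed to reach one SSD's throughput."""
    return math.ceil(ssd_mb_s / hdd_mb_s)

# Slide figures: HDD 120-200MB/s, PCIe SSD 1-2GB/s.
print(disks_to_match(1000, 200))  # 5
print(disks_to_match(2000, 120))  # 17
```

A modest disk array can match a single PCIe SSD on throughput, which is why latency, not throughput, is usually the reason to buy SSD.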

Writes

• Writes on new SSD – 250,000 ns
• Similar to sequential write to disk

How much data can you write to a new 250GB SSD?

Deletes

• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To overwrite a page:
– Read 127 pages
– Write 127 to a free block
– Delete old block
– Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
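The overwrite steps above can be sketched as a toy model that counts the operations involved. Blocks are plain Python lists here; `overwrite_page` and its counters are invented for illustration and do not describe how any particular controller is implemented.

```python
PAGES_PER_BLOCK = 128

def overwrite_page(block: list, page_no: int, data: str, free_block: list) -> dict:
    """Overwrite one page following the slide's steps; returns operation counts."""
    ops = {"page_reads": 0, "page_writes": 0, "block_erases": 0}
    # Steps 1-2: read the other 127 pages and write them to a free block.
    for i, payload in enumerate(block):
        if i == page_no:
            continue
        ops["page_reads"] += 1
        free_block[i] = payload
        ops["page_writes"] += 1
    # Step 3: delete (erase) the old block.
    block[:] = [None] * PAGES_PER_BLOCK
    ops["block_erases"] += 1
    # Step 4: perform the write we originally requested.
    free_block[page_no] = data
    ops["page_writes"] += 1
    return ops

old = [f"page-{i}" for i in range(PAGES_PER_BLOCK)]
new = [None] * PAGES_PER_BLOCK
print(overwrite_page(old, 5, "updated", new))
# {'page_reads': 127, 'page_writes': 128, 'block_erases': 1}
```

One 4K update costs 127 reads, 128 writes and an erase in this worst case.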

The Controller

• Over-provision SSDs
• Maintain free lists
• Delete and cleanup in background
• Balance use of cells (Wear leveling)
• RAM caching

Consequences:

• Write Amplification
– How much data is really written when we write 1MB
– 1 means no overhead
– The closer to 1 the better
• Benchmarks on new SSD are worthless
– Run benchmarks long enough to run out of overprovisioned space
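Write amplification is the ratio of data physically written to flash to data the host asked to write. A minimal sketch of the two extremes, using the 4K page and 128-page block from the earlier slides; the function name is an assumption of this example:

```python
def write_amplification(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Bytes physically written to flash per byte the host asked to write."""
    return flash_bytes_written / host_bytes_written

PAGE = 4 * 1024
# Best case: a 4K write lands on a free page.
print(write_amplification(1 * PAGE, 1 * PAGE))    # 1.0
# Worst case when overwriting in place: 127 pages copied plus the new page.
print(write_amplification(128 * PAGE, 1 * PAGE))  # 128.0
```

A new drive has plenty of free pages and stays near 1, which is why a short benchmark on a fresh SSD says little about steady-state behavior.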



Redo Logs

A: Redo log writes are sequential writes and therefore won’t benefit from SSD

B: Log file sync times are critical to Oracle performance. Therefore placing redo logs on SSD will have dramatic impact on performance.


Don’t use SSD for redo if:

• You don’t have “log file sync” related performance problems
• You have dedicated disks for each redo log
• Even better if multiple disks, striped.
• Your SAN is well configured and has ample caching
• You have RAC and no shared SSDs

SSD can make Redo faster if:

• You are suffering from high “log file parallel write”
• And your storage admin won’t even discuss it
• Redo is on LUN shared with:
– Redo from multiple databases
– Other services (SAP, etc)
• Not enough cache on storage array
• Storage network is a bottleneck

Placing Data on SSD

Should you place data on SSD?

• SSD solves IO latency problems
• If “DB File Sequential Read” is not in your top 5 wait events, you probably don’t need your data on SSD.
• If you don’t maximize RAM use for buffer cache – don’t get SSD (yet)
• If your CPU utilization is high, solve this first.

Not enough space?

• Move most active segments
• Random reads get most benefits from SSD
• Active indexes with unique-scans
• Fewer writes is better
• AWR has IO statistics per segment
• https://github.com/gwenshap/Oracle-DBA-Scripts/blob/master/SSD.sql
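One way to act on these criteria once per-segment IO statistics are in hand is a simple greedy ranking. The sketch below is not the logic of the SSD.sql script linked above; the `Segment` fields, the scoring rule (reads per GB, write-heavy segments skipped) and all numbers are assumptions made up for illustration.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    name: str
    size_gb: float
    physical_reads: int
    physical_writes: int

def pick_for_ssd(segments, ssd_gb):
    """Greedy pick: most reads per GB first, skipping write-heavy segments."""
    read_heavy = [s for s in segments if s.physical_reads > s.physical_writes]
    ranked = sorted(read_heavy, key=lambda s: s.physical_reads / s.size_gb, reverse=True)
    chosen, used = [], 0.0
    for seg in ranked:
        if used + seg.size_gb <= ssd_gb:
            chosen.append(seg.name)
            used += seg.size_gb
    return chosen

# Made-up numbers for illustration only.
segments = [
    Segment("ORDERS_PK", 20, 9_000_000, 400_000),
    Segment("ORDERS", 150, 12_000_000, 3_000_000),
    Segment("AUDIT_LOG", 80, 100_000, 8_000_000),
    Segment("CUSTOMERS_IDX", 10, 2_000_000, 50_000),
]
print(pick_for_ssd(segments, ssd_gb=100))  # ['ORDERS_PK', 'CUSTOMERS_IDX']
```

The small, read-hot indexes win; the large table does not fit and the write-heavy log is excluded.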

Why Choose?

• SAN Devices that contain both HDD and SSD
• Smart controllers move most active data to SSD automatically.
• Pros: No need to choose and manually migrate data
• Cons: Your most active data will move without advance notice

Top Mistakes

• Using SSD for production and HDD for Standby
– If production needs SSD…
– Good chance that standby will fall behind
• Database Smart Flash Cache

Database Smart Flash Cache

[Diagram: Disk, SGA, Flash Cache]

• Block read from disk
• Block evicted from SGA is written to SSD cache by DBWR
• If block is needed, it is read from SSD
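The flow in this diagram behaves like a two-level cache. Below is a toy model of it, assuming LRU eviction at both levels and assuming a block leaves the flash cache when it is read back into the SGA; both are simplifications of this sketch, not statements about Oracle's implementation.

```python
from collections import OrderedDict

class TwoLevelCache:
    """Toy model: an LRU buffer cache (SGA) that spills evicted blocks to flash."""

    def __init__(self, sga_blocks: int, flash_blocks: int):
        self.sga = OrderedDict()
        self.flash = OrderedDict()
        self.sga_blocks = sga_blocks
        self.flash_blocks = flash_blocks

    def read(self, block_id: int) -> str:
        """Return where the block was found: 'sga', 'flash' or 'disk'."""
        if block_id in self.sga:
            self.sga.move_to_end(block_id)
            return "sga"
        source = "flash" if block_id in self.flash else "disk"
        self.flash.pop(block_id, None)
        self._load_into_sga(block_id)
        return source

    def _load_into_sga(self, block_id: int) -> None:
        self.sga[block_id] = True
        if len(self.sga) > self.sga_blocks:
            evicted, _ = self.sga.popitem(last=False)
            # The evicted block is written to the flash cache (DBWR's job).
            self.flash[evicted] = True
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)

cache = TwoLevelCache(sga_blocks=2, flash_blocks=4)
print([cache.read(b) for b in (1, 2, 3, 1, 3)])
# ['disk', 'disk', 'disk', 'flash', 'sga']
```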

Database Smart Flash Cache

• Pros:
– Automatically keeps active data in SSD
• Cons:
– Large overhead for managing cache, all taken from SGA
– Overhead for DBWR
– No benefit and some overhead for writes
– Only one SSD device

Using Smart Flash Cache will make your IO faster than using just disks, but smartly placing data on SSD will be even faster.


Exadata has LOTS of SSD

• Quarter rack has 3 storage cells
• Each with 4 Sun Flash Accelerator F40
• 400GB * 4 * 3 = 4.8TB
• 21.5GB/s throughput
• 375,000 IOPS
• Note that IB will limit you to 4GB/s per DB node
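The capacity figure and the InfiniBand note can be checked with a few lines of arithmetic. All constants come from the slide except the number of database nodes, which the slide does not state; the value 2 used below is an assumption of this sketch.

```python
CELLS = 3              # storage cells in a quarter rack (slide)
CARDS_PER_CELL = 4     # Sun Flash Accelerator F40 cards per cell (slide)
CARD_GB = 400          # capacity per card (slide)
FLASH_GB_S = 21.5      # aggregate flash throughput (slide)
IB_GB_S_PER_NODE = 4   # InfiniBand limit per DB node (slide)

def flash_capacity_tb() -> float:
    return CELLS * CARDS_PER_CELL * CARD_GB / 1000

def ib_ceiling_gb_s(db_nodes: int) -> float:
    """Upper bound on data the database nodes can receive per second."""
    return min(FLASH_GB_S, db_nodes * IB_GB_S_PER_NODE)

print(flash_capacity_tb())   # 4.8
print(ib_ceiling_gb_s(2))    # 8 (node count of 2 is assumed)
```

Under that assumption the flash can deliver more than the database nodes can receive, so full flash throughput is only reachable for work that stays on the storage cells.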

Exadata Smart Flash Logging

• Redo log writes are written to disk and SSD together.
• Log sync is finished when one write is successful.
• Can’t Lose.
• Can’t try that at home
• This improves performance for redo when disks are busy with high throughput operations
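Because the log sync completes when the first of the two writes finishes, the effective latency is the minimum of the two. A minimal sketch with hypothetical latencies (the sample values are invented, not measurements):

```python
def log_sync_ms(disk_ms: float, flash_ms: float) -> float:
    """The redo write is acknowledged when the faster device finishes."""
    return min(disk_ms, flash_ms)

# Hypothetical latencies in milliseconds: (disk, flash).
samples = [(0.4, 0.5), (12.0, 0.5), (0.6, 3.0)]
print([log_sync_ms(d, f) for d, f in samples])  # [0.4, 0.5, 0.6]
```

An occasional slow write on either device is hidden as long as the other one is having a normal day.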

Exadata Smart Flash Cache

• Not same as DB Smart Flash Cache
• SSDs are on storage cells
• SSD on Exadata can also be used as ASM disks and not cache.

Exadata Smart Flash Cache

• Reading un-cached data:
1. Un-cached data is read from disk first
2. Sent to the database
3. and then copied to cache

[Diagram: Disks, SSD Cache, Cellsrv, Database]

Exadata Smart Flash Cache

• Cached reads:
– Read from disk and SSD simultaneously
– Whichever returns first
– Effectively increase read throughput
– Smart scans mostly read from disk
– Except for objects using “cell_flash_cache” KEEP clause.

[Diagram: Disks, SSD Cache, Cellsrv, Database]

Exadata Smart Flash Cache

• Writes:
– Write through cache
– Writes go to disk first
– Then copied to cache, sometimes
– Indexes and tables with random IO
– ALTER TABLE customers STORAGE (CELL_FLASH_CACHE KEEP)

[Diagram: Disks, SSD Cache, Cellsrv, Database]

Exadata Smart Flash Cache

• Writes:
– Write back cache
– Writes go to SSD first
– Then copied to disk, eventually

[Diagram: Disks, SSD Cache, Cellsrv, Database]
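The difference between the write-through and write-back modes on the last two slides can be sketched with a toy model. The `CachedStore` class and its `cacheable` flag are invented for this example; the real decision about which writes are cached is made by the storage cell software and is not modeled here.

```python
class CachedStore:
    """Toy model of one storage cell with disks and an SSD cache."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.disk = {}
        self.ssd = {}
        self.dirty = set()   # blocks on SSD that are not yet on disk

    def write(self, block_id: int, data: str, cacheable: bool = True) -> str:
        """Return the device that acknowledged the write."""
        if self.write_back:
            self.ssd[block_id] = data
            self.dirty.add(block_id)
            return "ssd"
        self.disk[block_id] = data
        if cacheable:              # "then copied to cache, sometimes"
            self.ssd[block_id] = data
        return "disk"

    def flush(self) -> int:
        """Copy dirty blocks to disk ("eventually"); return how many."""
        for block_id in self.dirty:
            self.disk[block_id] = self.ssd[block_id]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

wt = CachedStore(write_back=False)
wb = CachedStore(write_back=True)
print(wt.write(1, "a"), wb.write(1, "a"))      # disk ssd
print(1 in wb.disk, wb.flush(), 1 in wb.disk)  # False 1 True
```

Write-back acknowledges at SSD speed but leaves a window in which the only copy of the block is on flash.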

ODA and SSD

• “Four 2.5-inch 200 GB SAS-2 SLC SSDs per shelf for database redo logs”
• Allows multiple databases on ODA
• Reduces risk of disk bottlenecks


Interfaces

• SATA
– 32 outstanding IO
– 6Gb/s = 600MB/s
– significant latency
• SAS
– 256 outstanding IO
– 6Gb/s = 600MB/s
– Used on ODA shared storage

Interfaces

• PCIe
– “Flash” “Accelerator”
– Multiple 500 MB/s lanes
– Low latency
– Multiple SAS/SATA controllers on card for extra throughput
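The "6Gb/s = 600MB/s" and "500 MB/s lanes" figures on the interface slides both follow from 8b/10b line coding, in which every 8 data bits travel as 10 bits on the wire. A small sketch of that conversion; the PCIe generation (2.0, at 5GT/s per lane) and the x8 lane count are assumptions, since the slides do not state them.

```python
def usable_mb_s(line_rate_gbit_s: float, payload_bits: int = 8, coded_bits: int = 10) -> float:
    """Payload bandwidth in MB/s for a link using 8b/10b line coding."""
    data_bits_per_s = line_rate_gbit_s * 1e9 * payload_bits / coded_bits
    return data_bits_per_s / 8 / 1e6

print(usable_mb_s(6))      # 600.0  (SATA / SAS at 6Gb/s)
print(usable_mb_s(5))      # 500.0  (one PCIe 2.0 lane at 5GT/s, generation assumed)
print(8 * usable_mb_s(5))  # 4000.0 (an x8 card, lane count assumed)
```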

Interfaces

• Fiber
– Use existing enterprise infrastructures
– Shared storage
– Usual SAN headache
– Mandatory for RAC


Write latency lower than read?

Intel SSD 910

identical read/write latency?


RAMSAN


Quick Recap

• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Place segments with many random reads on SSD
• Exadata uses Smart Flash Cache to increase throughput
• Not all SSDs are the same
• Read specs carefully

Thank you – Q&A

To contact us

[email protected]

1-877-PYTHIAN

To follow us

http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian
