Oracle Aware Flash: Maximizing Performance and Availability for your Database
Gurmeet Goindi Principal Product Manager Oracle Kirby McCord Database Architect US Cellular
Kodi Umamageswaran Vice President, Exadata Software Oracle
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
Flash and Databases
Oracle’s Innovations in Flash
Customer Case Study – US Cellular
Questions & Answers
The Promise Of Flash

Replace expensive 15K RPM disks with fewer solid state (flash) devices
– Reduce failures and replacement costs
– Reduce the cost of the storage subsystem
– Reduce energy costs

Lower transaction and replication latencies by eliminating seek and rotational delays

Replace power-hungry, hard-to-scale DRAM with denser, cheaper devices
– Reduce the cost of the memory subsystem
– Reduce energy costs
The Complexity Of Flash

Data loss
– Caused by defects, erase cycles (endurance), and interference

Density vs. endurance
– SLC: single bit per cell; lower density, greater endurance
– MLC: multiple bits per cell; greater density, lower endurance

Write-in-place is prohibitive
– Read, erase, rewrite must be avoided

Low per-chip bandwidth

Complex firmware compensates for these limitations
Applications Of Flash In Databases

As an additional storage layer (caching)
– Stage active database objects in flash
– Accelerate reads and writes to these objects

For data files
– Improves user transaction response time
– Increases overall throughput for I/O-intensive workloads
To Cache Or Not To Cache

Random reads against tables and indexes
– Cached: more likely to have subsequent reads

Sequentially read tables, or scans
– Not cached: sequentially accessed data is unlikely to be followed by reads of the same data

Backups, mirrored copies of blocks
– Not cached: this data is rarely, if ever, read again

But most general-purpose flash solutions are database agnostic and cache all of the above workloads
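The database-aware decision above can be sketched as a simple admission policy. The `IOType` values and `should_cache` function are hypothetical names for illustration, not an actual Exadata interface:

```python
from enum import Enum, auto

class IOType(Enum):
    # Hypothetical I/O classifications a database could pass down to storage
    RANDOM_READ = auto()      # table/index single-block reads
    SEQUENTIAL_SCAN = auto()  # full-table scans
    BACKUP = auto()           # backup reads/writes
    MIRROR_COPY = auto()      # secondary mirror copies of a block

def should_cache(io_type: IOType) -> bool:
    """Admit only I/O that is likely to be re-read soon."""
    return io_type == IOType.RANDOM_READ

# A database-agnostic flash cache, by contrast, admits everything:
generic_policy = lambda io_type: True
```

The point of the sketch is only that the admission decision needs the I/O classification, which a storage device without database metadata cannot see.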
Flash And Database Logs

Flash has very good average write latency, which greatly improves user transaction response time

Flash has occasional outliers that are one or two orders of magnitude slower
– Garbage collection and similar background work contribute to that delay

OLTP workloads dislike such large variations
Where To Introduce A Flash Device

Direct attached
– Mount flash inside your server
– PCI-E or SSD

Networked storage (NAS/SAN)
– Share the device on the network (FC or 10GbE)
– Popular implementations:
  Tiered storage: multiple tiers of disk drives (SSD, FC, SAS) with different performance characteristics; data moves between these tiers
  Hybrid storage: a combination of direct-attached flash on the storage controller and HDDs in expansion shelves
  All-flash arrays: all storage is some form of flash device, either SSDs or custom flash modules
Server Attached Flash

PCI Express attached flash I/O cards
– Extremely fast; improves performance for currently active data
– Eliminates the round trip between server and storage, giving the fastest I/O response
– Easy to install and configure, but usually not hot swappable

SSDs attached via SATA or SAS
– Slower than PCI Express
– Cheaper than PCI Express attached flash
Server Attached Flash - Limitations

Database limited to the size of a single server
– Storage limited by the number of PCIe or disk slots in the server

ANY component failure = loss of database access
– Firmware, card, driver, etc.
– A local second server with identical flash is recommended ($$$ x 2)
– MTBF of the server decreases as the number of flash cards increases

Zero data loss requires synchronous log shipping to a standby server
– Negates the performance benefit of flash writes

Poor resource utilization
– Local storage results in islands of storage
– No I/O resource management for prioritizing multiple databases
Tiered Or Hybrid Storage

Hierarchy of storage tiers based on performance
The storage controller moves data between these layers based on usage
More resilient and scalable architecture compared to server-attached flash
Better storage resource utilization, since the storage is shared among multiple servers
Server-attached flash is usually faster than network-attached flash
Tiered Or Hybrid Storage - Limitations

The storage controller software determines which data is hot
– Has limited visibility into the content of the data
– Driven by usage patterns, not application needs
– Usage patterns change as the database workload changes (OLTP vs. reporting), making the achieved performance less predictable
– Usually caches data based on yesterday’s workload

Storage controllers are I/O bound
– Flash can exacerbate that limitation
All Flash Arrays

No spinning media: all SSDs, proprietary flash modules, or a combination
Best-in-class performance among storage-based solutions
Expensive, but innovations in flash and storage management are driving the cost down
Limited throughput, usually in the low single digits of GB/s
Most implementations don’t scale very well
Availability is not at the same maturity level as traditional storage arrays; they are missing functionality that traditional arrays have added over the last two decades
Putting It All Together

Flash devices in the application tier (server-attached flash) lack enterprise-class scalability and high availability
Flash devices in traditional storage arrays are not efficient, since storage controllers don’t respond quickly enough to workload changes and are I/O bound
All-flash arrays lack the features and stability of traditional arrays
Program Agenda
Flash and Databases
Oracle’s Flash Architecture and Innovations in Flash
Customer Case Study – US Cellular
Questions & Answers
Oracle’s Flash Architecture

Scale-out architecture
– Adds flash capacity and performance by adding storage servers
– Adds the networking and CPU needed to process flash in one unit

Database-aware storage
– Metadata about the I/O is present on the cell

Flash on the storage server enables sharing
– A block on disk is stored in only one flash cache
Oracle’s Innovations in Flash

Exadata Smart Flash Cache
Exadata Smart Flash Log
Exadata Smart Flash Cache Compression
Exadata Smart Flash Cache Scan Awareness
Exadata Smart File Initialization
Exadata Smart Flash Cache

Understands the different types of I/O from the database
– Skips caching I/O for backups, Data Pump, archive logs, and tablespace formatting
– Caches control file reads and writes, file headers, and data and index blocks

Write-back flash cache
– Caches writes from the database, not just reads

RAC-aware from day one
Exadata Smart Flash Log

Outliers in log I/O slow down many clients
Outliers from any one copy of the mirror affect response time
Performance-critical algorithms such as space management and index splits are sensitive to log write latency
Legacy storage I/O cannot differentiate redo log I/O from other I/O

[Diagram: multiple foreground clients filling the log buffer, which is drained by the log writer]
Exadata Smart Flash Log

Smart Flash Log uses flash as a write cache operating in parallel with the disk controller cache
Whichever write completes first wins (disk or flash)
Reduces response time and outliers
– The “log file parallel write” histogram improves
– Greatly improves “log file sync”
Uses almost no flash capacity (< 0.1%)
Completely automatic and transparent
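The first-write-wins idea can be sketched in a few lines, assuming simulated devices; the latencies below are illustrative, not real measurements:

```python
import concurrent.futures
import time

def write_to(device: str, latency_s: float) -> str:
    """Simulated redo write to one device; latency is a stand-in value."""
    time.sleep(latency_s)
    return device

def dual_write(disk_latency_s: float, flash_latency_s: float) -> str:
    """Issue the same redo write to disk and flash in parallel and
    acknowledge the commit as soon as either copy is durable."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(write_to, "disk", disk_latency_s),
                   pool.submit(write_to, "flash", flash_latency_s)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

# Normally flash wins; during a flash garbage-collection stall, the disk
# write completes first instead, so the outlier never reaches the client.
```

This is why the outliers disappear: the client sees the minimum of the two latencies, not the maximum.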
Exadata Smart Flash Cache Compression

Exadata automatically compresses all data in the Smart Flash Cache
– Compression engine built into the flash card
– Zero performance overhead on reads and writes
– Logical size of the flash cache increases up to 2x

Data Format                              Compression
Uncompressed tables                      1.3x to 4x
Indexes                                  1.3x to 4x
Oracle E-Business Suite uncompressed DB  4x
OLTP compressed tables                   1.2x to 2x
HCC compressed tables                    Minimal

Great for OLTP
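Given the up-to-2x logical expansion stated above, the effective flash-cache capacity follows from simple arithmetic. The function name and example sizes below are hypothetical, chosen only to illustrate the cap:

```python
def logical_cache_size_gb(physical_gb: float, compression_ratio: float,
                          max_expansion: float = 2.0) -> float:
    """Logical flash-cache capacity: physical media size times the achieved
    compression ratio, capped by the logical address-space expansion
    (the slides state the logical size increases up to 2x)."""
    return physical_gb * min(compression_ratio, max_expansion)

# 1,600 GB of physical flash holding 4x-compressible data is still capped
# at the 2x logical expansion, i.e. 3,200 GB of cacheable user data.
```

Data that compresses at 1.3x, by contrast, yields the full 1.3x benefit because it sits below the cap.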
Exadata Smart Flash Cache Compression

[Diagram: a flash device, showing the user-data address space against the flash media data]

As the user adds more data, the data is compressed and written. Eventually the flash device has no logical space left for user data, but still has a lot of physical space.
Exadata Smart Flash Cache Compression

Extend the logical address space so the device can store more data
Exadata Smart Flash Cache Compression

In reality, data does not compress at a uniform ratio
If a newly written block (green) compresses better than the previous block (red), there is nothing to do; there is simply more free space
Exadata Smart Flash Cache Compression

What if a new block (green) does not compress as well as the previous block (red)? Could the device run out of space? That should never be allowed to happen
Exadata Smart Flash Cache Compression

Exadata periodically monitors free space
Performs TRIMs on blocks to ensure enough free space for all I/Os
Space is reclaimed from the LRU tail, not from the next block in address order
TRIMs show up as writes (watch for them in iostat)
The device size shown in Linux is much larger than 2x the physical size
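A toy model of this free-space maintenance, under assumed simplifications (the class name, sizes, and low-water-mark threshold are illustrative, not Exadata's implementation):

```python
from collections import OrderedDict

class CompressedFlashCache:
    """Toy model: when free physical space drops below a low-water mark,
    TRIM blocks from the LRU tail (coldest first), not simply the next
    block in address order."""

    def __init__(self, physical_capacity: int, low_water_mark: int):
        self.capacity = physical_capacity
        self.low_water = low_water_mark
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> compressed size; insertion order = LRU

    def write(self, block_id: int, compressed_size: int) -> None:
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = compressed_size
        self.used += compressed_size
        self._reclaim()

    def _reclaim(self) -> None:
        # Periodic monitoring, modelled here as a check after each write
        while self.capacity - self.used < self.low_water and self.blocks:
            _, size = self.blocks.popitem(last=False)  # evict from the LRU tail
            self.used -= size  # in practice the TRIM shows up as device writes
```

Evicting from the LRU tail means the space reclaimed belongs to the coldest data, so the cache keeps its hit rate while guaranteeing headroom for incoming writes.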
Exadata Smart Flash Cache Compression

Exadata automatically compresses all data in the Smart Flash Cache
– Compression engine built into the flash card
– Zero performance overhead on reads and writes
– Logical size of the flash cache increases up to 2x
– The user fits a larger amount of data in flash for the same media size
– Enabled via “cellcli -e alter cell flashCacheCompression=TRUE”
– Elasticity of the flash cache is completely automatic and transparent
Exadata Smart Flash Cache Scan Awareness

On a traditional cache, if you scan a dataset larger than the cache size:
– Blocks 0, 1, 2, 3 are brought into the cache; the cache is now full
– Blocks 20, 21, 22, 23, say, replace blocks 0, 1, 2, 3

Repeat the same scan:
– Blocks 0, 1, 2, 3 replace blocks 20, 21, 22, 23
– Blocks 20, 21, 22, 23 again replace blocks 0, 1, 2, 3

Traditional caches churn with no actual benefit
Some implementations call inserting new blocks into the middle of the LRU list “scan resistant”
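The churn above is easy to demonstrate with a plain LRU cache sketch:

```python
from collections import OrderedDict

class LRUCache:
    """Plain LRU block cache, used only to illustrate scan churn."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def access(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # refresh recency
        else:
            self.misses += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the coldest block
            self.blocks[block_id] = True

cache = LRUCache(capacity=4)
for _ in range(2):               # scan the same 8-block table twice
    for block in range(8):
        cache.access(block)
# Every block is evicted before it is re-read, so the repeat scan
# gets zero hits: all 16 accesses miss.
```

The cache does maximum work (16 insertions and 12 evictions) for zero benefit, which is exactly the churn the slide describes.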
Exadata Smart Flash Cache Scan Awareness

Exadata Smart Flash Cache is scan resistant
– Can bring a subset of the data into the cache without churn
– OLTP and DW scan blocks can co-exist

Nested scans bring in repeated accesses
– For each item in a large table, the small table is scanned again
– Smart enough to pull the small table into flash, since it is accessed repeatedly, even though the large table alone is bigger than the flash cache

No need to set the “KEEP” attribute in data warehouses
Happens automatically; no tuning or configuration needed
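One way to sketch scan resistance is a second-access admission policy: a block enters the cache only after being seen twice, so a one-pass large scan never displaces repeatedly read data. This is an illustrative scheme, not Exadata's actual algorithm:

```python
from collections import OrderedDict

class ScanResistantCache:
    """Toy scan-resistant policy: first access of a block is only remembered
    in a 'ghost' list; the block is admitted to the cache on its second
    access. One-pass scans therefore never pollute the cache."""

    def __init__(self, capacity: int, ghost_capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()          # cached blocks
        self.ghost = OrderedDict()           # recently seen, not yet cached
        self.ghost_capacity = ghost_capacity
        self.hits = 0

    def access(self, block_id) -> None:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)
        elif block_id in self.ghost:
            del self.ghost[block_id]
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block_id] = True     # second access: admit
        else:
            if len(self.ghost) >= self.ghost_capacity:
                self.ghost.popitem(last=False)
            self.ghost[block_id] = True      # first access: remember only

# Nested scan: for each row of a large table, the small table is rescanned.
cache = ScanResistantCache(capacity=8, ghost_capacity=64)
small_table = [("small", b) for b in range(4)]
for large_block in range(100):               # large table >> cache size
    cache.access(("large", large_block))
    for block in small_table:
        cache.access(block)
```

After the run, the repeatedly scanned small table sits in the cache and serves hits, while the once-scanned large table never got admitted, matching the slide's nested-scan example.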
Exadata Smart File Initialization

[Diagram: three stages of file initialization between the Database and CELLSRV]
– File initialization: the database sends metadata, and initialization I/Os are issued in parallel
– Write-back flash cache (11.2.3.2): the initialization I/O is optimized through the cache
– Smart file initialization (11.2.3.3): only metadata is sent, and the I/O is optimized
Exadata Smart File Initialization

Combines the benefits of the earlier smart initialization with the write-back flash cache
– Writes file creation metadata to the write-back flash cache
– A tiny amount of flash space caches large portions of initialized data on disk
– File creation is sped up by an order of magnitude
– Initialization I/Os to disk are deferred, or not performed at all if data is loaded

Create tablespace, file extensions, and autoextend all show the benefit
Redo log initialization is included in Exadata 12.1.1.1.0
Happens automatically; no tuning needed
Exadata Smart Flash Benefits

Smart Flash Cache is database aware
Smart Flash Logging avoids redo log outliers
Smart Flash Cache Compression doubles flash media capacity
Smart Flash Cache Scan Awareness provides subset scanning and is table scan resistant
Smart File Initialization creates a file by writing just the metadata to the flash cache
Happens automatically; no tuning needed
Program Agenda
Flash and Databases
Oracle’s Innovations in Flash
Customer Case Study – US Cellular
Questions & Answers
Kirby McCord, Lead Integration Infrastructure Architect
USCellular

United States Cellular Corporation provides a comprehensive range of wireless products and services, excellent customer support, and a high-quality network to 5.0 million customers in 23 states. The Chicago-based company had 7,000 full- and part-time associates as of June 30, 2013.
Who am I?

Lead Infrastructure Architect at USCellular
Oracle OCP since Oracle 7
Member of the IOUG Conference Committee
• Infrastructure Track Member 2011
• Infrastructure Track Lead 2012
• Engineered System Manager 2013
IOUG Collaborate Speaker, 2012 & 2013
Contributor to IOUG’s Exadata Tip Booklet
Exadata System

ZFSSA 7410 Storage Array
X2-2 Storage Expansion (HC disks)
X2-2 Full Rack (HP disks)

4 databases:
• ODS – primarily loaded via GoldenGate, with some batch loading; mixed workloads on access; the majority of the workload is on this server
• EDW – a warehouse with an atomic layer in 3NF and some star schemas built in
• DW – a traditional warehouse with a star schema
• Misc – a small ad hoc workload

All databases are:
• Multi-instance
• Protected by physical standbys
• Backed up to the ZFSSA
Previous Solution – ODS Only

10gR2 database built around Oracle’s Reference Architecture
Monolithic Unix server
• 24 cores
• 384 GB RAM
• 10GbE networking
• 8 x 8Gb FC SAN connections
Enterprise-grade storage array
• 15K SAS drives (short-stroked disks)
• Lots of cache
• Dedicated to the database

Note: The two solutions had similar architectures, loaded via GoldenGate and Informatica, with mixed-workload applications accessing them. The Exadata-based system serves a newer billing system, which generates 2-3x the volume of changes to load via GoldenGate. We also have many more applications accessing the ODS. The new workload is larger than what the previous hardware could handle.
What Did We See - Exadata ODS

1.49 ms single-block reads, while doing 42K read IOPS and 11K write IOPS over a one-hour period.

Note: The other databases were active on the Exadata system during this time.

What? Writes are supposed to be fast! Wait for the later slides.
Comparison To Old System

Metric                Exadata ODS   Monolithic Hardware ODS   Comparison
Single-block reads    1.5 ms        3.8 ms                    > 2x faster
Log file sync waits   0.85 ms       5.7 ms                    > 6x faster

Note: The Exadata ODS carries over twice the workload of the previous version. In addition, the Exadata system is shared with several databases, while the monolithic hardware was dedicated.
Write Back Flash Enablement

Designed to accelerate write-intensive workloads.
From the previous slide, we had lots of “free buffer waits”.
We enabled this feature on the X2-2.
Result: no more “free buffer waits”.
Maintenance Activities

Preventative maintenance operations
• Battery replacement
• Proactive disk replacement

Unplanned operations
• Disk replacement
• Flash drive replacement
• Motherboard replacements

Patching
• All patches for over a year

All have been done in a rolling fashion!
What This Means to Us

More flexibility in system use
• We are less concerned about unplanned activities on the system. Users can go after the system when they need to, not only during certain windows.
• Maintenance activities have less impact on system availability.

More use of the data
• Exadata’s flash reduces the I/O contention between mixed workloads within a database and between competing databases.
• More concurrent users mean more business questions being answered.

Faster access to the data
• Faster I/O means less time waiting for queries to return and more time to analyze the results.
Program Agenda
Flash and Databases
Oracle’s Innovations in Flash
Customer Case Study – US Cellular
Questions & Answers