Copyright © 2006 Quest Software
RAC be Nimble,RAC be Quick
Bert Scalzo, Domain Expert, Oracle Solutions
2
About the Author …Domain Expert & Product Architect for Quest Software
Oracle Background:• Worked with Oracle databases for well over two decades (starting with version 4)• Work history includes time at both Oracle Education and Oracle Consulting
Academic Background:• Several Oracle Masters certifications• BS, MS and PhD in Computer Science• MBA (general business)• Several insurance industry designations
Key Interests:• Data Modeling• Database Benchmarking• Database Tuning & Optimization• "Star Schema" Data Warehouses• Oracle on Linux – and specifically: RAC on Linux
Articles for:• Oracle’s Technology Network (OTN)• Oracle Magazine,• Oracle Informant• PC Week (eWeek)
Articles for:• Dell Power Solutions
Magazine• The Linux Journal• www.linux.com• www.orafaq.com
4
Agenda
• RAC Performance– “RAC in the Box” syndrome
– Must Optimize every subsystem
• RAC Optimization Approach– RAC Optimization Methodology
– via Quest’s RAC Tools (just example - not a sales pitch)
• Real-world Scenario– Dell’s use of Quest’s Solution for Oracle RAC
– “Best Practices” applied incrementally & results
5
Optimization Focus
• “Low Hanging Fruit”
• Obvious yet Overlooked
• Subtle yet Highly Critical
Radically Different than other RAC tuning sessions:
Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations,
Just easy stuff that makes a big difference
6
Oracle RAC is Great, but …
• Too often people expect RAC to “auto-magically” function out of the box with little to no optimization
• During RAC optimization attempts, people far too often concentrate on just a single dimension – the Oracle database “stuff” (i.e. waits, parameters, etc)
• During RAC optimization attempts, Oracle is often too heavily weighted as the source and reason for most, if not all, of the performance bottlenecks
• Not enough “application nature” is identified and accounted for during the optimization process
• Result: Too many people achieve sub-par results!
7
I call it “RAC in the BOX” syndrome
• That’s just my stupid name for it (hope it catches)
• But nearly half the RAC sites I visit are suffering from RAC performance issues related to this!
• Still far too RAC experts among the general DBA population (although improving every day)
• Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders”
• Many people bail on RAC far too soon and simply fall-back to big SMP boxes (the evil that they know)
• But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free
9
RAC Performance = Sum of its Parts
• Application Nature (affects everything else below)
• Public Network
• Storage Network
• Storage Sub-System
• Oracle Instance Configuration
• Oracle Cluster Configuration
• Private Network (i.e. Interconnect)
10
RAC Performance
Testing Methodology
1. Benchmark Factory www.TPC.org testsor Oracle trace file playback generator
3. TOAD with DBA Module“AWR/ADDM Reports “
Identify and correctTop five wait events
RAC Performance Testing Methodology
2. Spotlight on RAC Monitor RAC system for performance bottlenecks
11
Quest Solutions for Oracle RAC
Benchmark Factory®
• Test Oracle RAC environments rapidly and reliably
• Perform database “scalability” or “goal” testing to determine the most optimal RAC configuration
• Tests:
– TPC-C
– TPC-H
– Trace File playback
– etc, etc, etc …
• Let’s DBA concentrate on task at hand - Optimization
13
Quest Solutions for Oracle RAC Spotlight® on RAC
• Monitor Oracle RAC environments rapidly and reliably
• Diagnose Oracle RAC environment health levels at– Node
– Cluster
– ASM
– Instance
– Interconnect
• Intelligent performance alerts plus market-leading GUI for entire RAC to instances architecture & bottlenecks
• Let’s DBA concentrate on task at hand - Diagnosing
17
Quest Solutions for Oracle RAC TOAD® with DBA Module
• Expedite typical DBA management & tuning tasks
• Great Productivity Enhancing Features– Database Health Check
– Database Probe
– Database Monitor
– AWR/ADDM Reports
– UNIX Monitor
– Stats Pack Reports (coming soon)
• See Toad World paper– Title: “Maximize Database Performance Via Toad for Oracle”
– http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
• Let’s DBA concentrate on task at hand – Correcting (i.e. Fixing)
19
Real-world Scenario
Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell Power Edge servers and EMC Clarion SAN & iSCSI Disk Arrays
23
Step-By-Step Example
Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize
and quantify the approximate percentage of the improvements
Note – will quote some specific examples for a given RAC setup, your mileage will surely vary
Test = TPC-C (OLTP) for 200-2000 users, 10 GB
24
Step 1 - Application NatureKnow Your Application Demands (this info flows downstream)
• OLTP vs. Data Warehousing– Primarily Read vs. Write
– Average Transaction Size
– Likelihood of “Dead Lock”
– Logging, Flashback and Recovery Requirements
– etc, etc, etc …
• Concurrent User Load Profile (i.e. user load over time)
• Focus on User Response Time Requirements– For example, TPC-C must run each transaction <= 2 seconds
– Response Time ~= Wait Events
• Cary Millsap (Hotsos) calls this “Method R”
• Anjo Kolk & et al … (Oracle) call this “YAPP Method”
• Kyle Haily (PerfVision) paper “Waits Defined”
• etc, etc, etc …
• Don’t skip this step – cost can be enormous – and that no network, OS, or database tuning can compensate for !!!!!
25
Application “Best Practices”Well known rules:
• Write efficient SQL and/or PL/SQL code (explain plans)
• Use bind variables to reduce unnecessary “re-parsing”
• Often the underlying Application Code (e.g. Benchmark) NOT changeable, so you can’t do anything
Deeper TPC-C Analysis (remember across all next steps):
• Primarily Read with Some Writes
• Small Average Transaction Size
• High Concurrency with Potential Deadlocks
• Logging for ACID compliance and no flashback
26
Public Network “Best Practices”
Well known rules:
• Isolate Network (for single or related applications only)
• Use Gigabit Ethernet (consider “bonding” multiple cards)
• Use Layer 2 or 3 Switches and verify Gigabit throughput
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = No jumbo frames, Standard SDU/TCU
• High Concurrency = Multiple Ethernet Segments (collisions)
• No Logging, etc… = Nothing
27
Storage Network “Best Practices”
Well known rules:
• Isolate Network (for single or related applications only)
• Use Fiber Channel for SAN, 10GB Ethernet for NAS/iSCSI
• Consider multiple pathways per storage controller and HBA
• Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Jumbo frames since “Block Level” IO
• High Concurrency = Fiber Channel and Multiple Pathways (if budget)
• No Logging, etc… = Nothing
28
Storage Sub-System “Best Practices”Well known rules:• More Smaller Disks generally higher overall throughput• More memory cache generally higher overall throughput• Avoid “write-back” mode if no backup power source (e.g. battery)• Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc• Stripe Depth (i.e. size) from 256 KB to 1 MB• Stripe Width (i.e. # disks) between 4 and 16• Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request• Average I/O <= Stripe Width X Stripe Depth• Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered)
TPC-C Analysis Ramifications:• Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms• Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count• High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s• No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK
29
Oracle Instance “Best Practices”
Well known rules:
• Size & Tune the SGA appropriately for application nature
• Choose reasonable block size based on application nature (?8K?)
• Partition large objects & indexes across storage devices & spindles
• Don’t assume any “golden rules” – i.e. test all assumptions!
TPC-C Analysis Ramifications:
• Primarily Reads = opt_index_caching=80, opt_index_adj_cost=20
• Small Transaction = Size redo logs correctly for small size X high load
• High Concurrency = cursor_space_for_time=t, cursor_sharng=similar
• No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING”
30
Oracle Cluster “Best Practices”
Well known rules:
• Increase default SGA size for all ASM instances (64M too small)
• Interconnect is the most important bottleneck – many bonded NIC’s
• Consider hash partitions & reverse indexes to spread IO across nodes
• Don’t assume any “golden rules” – i.e. really test all assumptions!!!
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Decrease db_file_multiblock_read_count
• High Concurrency = Decrease block size (?4K?) – see next slide
• No Logging, etc… = Nothing
33
Private Network “Best Practices”
Well known rules:
• Isolate Network (for single cluster only – and 0% public)
• Use 10GB Ethernet or Inifini-Band (Dell found 15% RTI)
• Consider multiple pathways per HBA and storage controller
• Jumbo frames since high “Block Level” IO between nodes
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Nothing
• High Concurrency = Lower block size until interconnect traffic OK
– Consider increasing the OS priority of the global cache cluster services
• No Logging, etc… = Nothing
34
Let’s Apply the Recommendations
Read Count
Block Size
Cursor
Space
Cursor
Share
Index Cach
e
Index Cost
Jumbo
Test 1 16 8 False Exact 0 100 False
Test 2 2 8 False Exact 0 100 False
Test 3 2 4 False Exact 0 100 False
Test 4 2 4 True True 80 20 False
Test 5 2 4 True True 80 20 True
35
Results – TPS (lesser interest here)
Transactions / Second
0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
50 100 150 200 250 300 350 400 450 500
Run 1
Run 2
Run 3
Run 4
Run 5
36
Results – Average Response Time
Average Response Time
0.00
1.00
2.00
3.00
4.00
5.00
6.00
50 100 150 200 250 300 350 400 450 500
Run 1
Run 2
Run 3
Run 4
Run 5
SubSecond
37
Thank youPlease offer any questions or comments
Remember:
• Eliminate “RAC in the Box” syndrome – eat low hanging fruit
• Example was for TPC-C or OLTP type application
• TPC-H or Data Warehouse would NOT be the same
• Your mileage may well vary (especially percentages)
Toad World Article: “Maximize Database Performance Via Toad for Oracle”
http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
Dell Power Solutions article:
http://www.quest.com/success_stories/Dell-Quest.pdf