Resource Mapping: A Wait Time Based Methodology for Database Performance Analysis
Prepared for Hotsos Symposium, 2005
Presented by Matt Larson
Chief Technology Officer, Confio Software
2
Presentation Agenda
• Introduction
• Conventional Tuning vs. Wait-based Tuning
• Foundation: Resource Mapping Methodology
• 5 Key Steps of Applying RMM
• RMM Advantages
• Conclusion
3
Who am I?
Former DBA consultant specializing in Oracle performance tuning
Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2nd Edition, Oracle8 Server Unleashed)
Co-author of two other database related books
CTO and founder of an Oracle performance software company
4
Problems with Conventional Tuning Tools: Like the Drunk Under the Streetlight
5
Conventional Tuning
• Art, not a science
• Ratio-based (cache hit ratios, etc.)
• Sometimes fruitless
• It’s “tuned” (I guess?)
• Different tuning/investigation process for each DBA, DBA team, or company
6
Problems with Conventional Tuning Tools
• Optimize systems, not business results
• Conventional tools:
  • V$ Views: limited visibility & granularity
  • Statspack: averages across entire database
  • Explain Plan: deemphasizes how non-object resources affect performance
• Incorrect data hides real results:
  • System-wide averages
  • Event counters
  • Incomplete visibility
7
What Problems are you Trying to Solve?
• I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why?
• Will more hardware make my application run faster? By how much?
• Will the new application run efficiently on the production server?
• Why does one application keep impacting my SLA compliance?
• If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?
8
You know you are working on the wrong thing when…
After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with:
• A. An all-expenses-paid vacation
• B. A free lunch
• C. A stale donut
• D. Reward? Nobody even noticed!
9
You know you have a visibility problem when…
You measure database performance based on:
• A. Increasing trends in user response time
• B. Increasing system down time
• C. Increasing help desk calls
• D. Increasing decibel levels from irate users
10
Your role is sub-essential to the business of your organization when…
Your role in the rollout of a new customer-facing application results in:
• A. Keys to drive the CEO’s Porsche
• B. Keys to use the executive restroom
• C. A mop to use in the executive restroom
• D. Your office has been moved to the restroom
11
You know you are accustomed to measuring the wrong thing when…
You measure the commute time to work based on:
• A. The time it takes to get there
• B. Counting the times your wheels rotate
• C. Monitoring your tachometer
• D. The number of speeding tickets
12
Wait-based Performance Tuning
Emerging best-practice for database tuning
Proponents include leading consultants, trainers and authors
Oracle is starting to build wait-based tuning tools into the database, particularly in 10g
Tune by determining where processing time is spent
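Tuning by where processing time is spent can be sketched as a simple resource profile: split one response time into CPU and individual wait events, then rank the components by their share of the total. This is a minimal Python illustration; the function name `resource_profile` and all the figures are hypothetical and not taken from any Oracle view.

```python
def resource_profile(components):
    """Return (component, seconds, percent of total) rows, largest time first.

    `components` maps a time component (CPU or a wait event name) to seconds.
    """
    total = sum(components.values())
    rows = sorted(components.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, round(100 * secs / total, 1)) for name, secs in rows]


# Hypothetical breakdown of one user's response time, in seconds.
profile = resource_profile({
    "CPU": 12.0,
    "db file sequential read": 60.0,
    "SQL*Net message from client": 8.0,
})

for name, secs, pct in profile:
    print(f"{name:30s} {secs:8.1f} s {pct:5.1f}%")
```

The top row is where tuning effort pays off first; a component with a small share is not worth fixing, however bad its ratios look.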
13
Oracle 10g - Moving towards wait-based
• Adding wait-based columns to existing views
• New wait-based views
• Example: v$session_wait_history
  • Provides the last 10 wait events for a session
  • Session ID, Username, Event, Wait_Time, etc.
  • Used to provide wait_time for only a few events
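A view that keeps "the last 10 wait events for a session" behaves like a bounded ring buffer per session. The sketch below models only that retention behavior in Python; the class `SessionWaitHistory` and its methods are hypothetical and say nothing about how Oracle implements the view internally.

```python
from collections import defaultdict, deque

HISTORY_DEPTH = 10  # depth the slide gives for v$session_wait_history


class SessionWaitHistory:
    """Hypothetical in-memory model of a bounded per-session wait history."""

    def __init__(self, depth=HISTORY_DEPTH):
        # Each session gets its own ring buffer of at most `depth` waits.
        self._by_sid = defaultdict(lambda: deque(maxlen=depth))

    def record(self, sid, event, wait_time):
        # Appending to a full deque silently discards the oldest wait.
        self._by_sid[sid].append((event, wait_time))

    def history(self, sid):
        # Oldest first; an unknown session yields an empty list.
        return list(self._by_sid.get(sid, ()))


h = SessionWaitHistory()
for i in range(12):
    h.record(sid=42, event="db file sequential read", wait_time=i)

print(len(h.history(42)), h.history(42)[0])
```

After 12 recorded waits only the newest 10 remain, which is why such a view suits spot checks rather than long-term trend analysis.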
14
DBA Success Stories using RMM
DBA solves a “Cold Case”: a problem unresolved for 1 year with traditional tools; solution identified in 10 minutes during hands-on training
DBA ends a “Crit Sit”: a 2-week situation ends quickly after identification of library cache pin waits and load locks. Metalink identifies an Oracle bug; patch successfully applied
DBA saves $700K: 90% CPU utilization triggers a planned expansion from a 12- to a 24-CPU server. DBA identifies parallel queries across 16 parallel threads as the source of the bottleneck. CPU eliminated as a constraint; no new server required.
15
RMM: Confio’s Underlying Methodology
Resource Mapping Methodology: Three Key Principles of RMM
1. SQL View: All statistics at SQL statement level
2. Time View: Measure time, not the number of times a resource is utilized
3. Full View: Separately measure every resource to isolate the source of problems
• Wait-Event Analysis: general approach, best practice
• Resource Mapping Methodology: rigorous, complete requirements
• DBFlash: packaged product implementation
16
[Figure: counter-based view with blind spots, showing only counters (CPU 74%, Reads 1789327; CPU 38%, Reads 4955)]
Confio’s Resource Mapping Methodology
• The principles of RMM can be illustrated with the analogy that data processing is like an assembly line. Data goes in one end, is subject to a series of changes, and comes out the other end as a finished product.
• The assembly line (or SQL statement) must be observed at the lowest level where a unit of work is being performed (SQL View principle)
• Measurements are made in terms of time instead of counting how often an event occurs (Time View principle)
• All resources system-wide must be monitored to get a full view of potential bottlenecks, i.e., no blind spots (Full View principle)
[Figure: time-based view that follows a unit of work through every operation (145 seconds vs. 8726 seconds)]
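The three principles can be combined into one small data structure: a map keyed by (SQL statement, resource) whose values are accumulated seconds. Keying on SQL gives the SQL View, summing seconds gives the Time View, and treating CPU like any other resource gives the Full View. This is a minimal Python sketch; the sample data and the helper names `resource_map` and `time_per_sql` are hypothetical.

```python
from collections import Counter

# Hypothetical samples: (SQL statement, resource waited on, seconds spent).
# CPU is listed like any other resource so that nothing is a blind spot.
SAMPLES = [
    ("SQL 1", "I/O", 30.0),
    ("SQL 1", "Network", 5.0),
    ("SQL 2", "Locks", 200.0),
    ("SQL 2", "CPU", 10.0),
    ("SQL 3", "I/O", 100.0),
]


def resource_map(samples):
    """Sum time, not occurrences, for every (SQL, resource) pair."""
    totals = Counter()
    for sql, resource, seconds in samples:
        totals[(sql, resource)] += seconds
    return totals


def time_per_sql(rmap):
    """Roll the map up to one total per SQL statement."""
    per_sql = Counter()
    for (sql, _resource), seconds in rmap.items():
        per_sql[sql] += seconds
    return per_sql


rmap = resource_map(SAMPLES)
worst_pair = max(rmap, key=rmap.get)
print(worst_pair, rmap[worst_pair])
print(time_per_sql(rmap).most_common())
```

The worst pair names both the statement and the resource to fix, which a system-wide total cannot do.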
17
Track SQL Time, Not System Counters
[Figure: SQL 1, SQL 2, and SQL 3 mapped to resources (I/O, Network, Redo, Locks), contrasting per-SQL counters and times with total system counters such as 80K reads, 5K packets, and 216K writes]
• Watching counters leads to wrong conclusions: time is more relevant
• Total system counters hide information: a breakdown to individual SQL statements is needed
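Why counters mislead can be shown with three statements ranked two ways. The numbers below are hypothetical; the point is only that the statement with the most reads need not be the one consuming the most time, and that a system-wide counter reduces everything to a single sum.

```python
# Hypothetical per-SQL statistics for one interval.
STATS = {
    "SQL 1": {"reads": 45, "seconds": 300.0},
    "SQL 2": {"reads": 30, "seconds": 600.0},
    "SQL 3": {"reads": 10, "seconds": 1500.0},
}

# A system-wide counter only ever sees the sum.
system_reads = sum(s["reads"] for s in STATS.values())

# Ranking the same statements two ways gives different answers.
top_by_reads = max(STATS, key=lambda sql: STATS[sql]["reads"])
top_by_time = max(STATS, key=lambda sql: STATS[sql]["seconds"])

print(f"System counter: {system_reads} reads")
print(f"Most reads: {top_by_reads}; most time: {top_by_time}")
```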
18
RMM-compliant Performance Tools
Oracle Tracing
• RMM compliant when wait events are traced
• Shows SQL-level statistics (SQL View), all events (Full View), and events by time (Time View)
• Text-based, short-term technical reporting
• Primarily used for reactive tuning
Confio DBFlash for Oracle
• RMM compliant
• 24/7 proactive monitoring
• Graphical, long-term trend reporting
• RMM-based alerting
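With Oracle tracing, wait events show up as `WAIT` lines in the extended SQL trace file (event 10046 at level 8 or 12). The Python sketch below sums elapsed wait time per cursor and event from such lines. It assumes the `WAIT #n: nam='...' ela= ...` line shape and that `ela` is in microseconds (Oracle 9i and later); the function name is hypothetical, and mapping cursor numbers back to SQL text (via the trace's `PARSING IN CURSOR` lines) is left out.

```python
import re
from collections import defaultdict

# Assumed shape of an extended SQL trace wait line, for example:
#   WAIT #3: nam='db file sequential read' ela= 1200 file#=4 block#=567 blocks=1
WAIT_RE = re.compile(r"^WAIT #(\d+): nam='([^']*)' ela=\s*(\d+)")


def wait_time_by_cursor_and_event(lines):
    """Sum ela (assumed microseconds) per (cursor number, event name)."""
    totals = defaultdict(int)
    for line in lines:
        m = WAIT_RE.match(line)
        if m:  # non-WAIT lines (PARSE, EXEC, FETCH, ...) are skipped
            cursor, event, ela = int(m.group(1)), m.group(2), int(m.group(3))
            totals[(cursor, event)] += ela
    return dict(totals)


trace = [
    "WAIT #3: nam='db file sequential read' ela= 1200 file#=4 block#=567 blocks=1",
    "WAIT #3: nam='db file sequential read' ela= 800 file#=4 block#=568 blocks=1",
    "WAIT #5: nam='SQL*Net message from client' ela= 5000 driver id=1650815232 #bytes=1",
    "FETCH #3:c=0,e=2100,p=2,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=1",
]
totals = wait_time_by_cursor_and_event(trace)
print(totals)
```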
19
Applying RMM for Business Results
Five Step Process focusing on what matters
1. Identify
2. Allocate
3. Quantify
4. Prioritize
5. Assign
20
Step 1: Identify
• Identify the SQL statements having the largest impact (SQL View and Time View principles)
• Longest wait times = most significant “pain points” for customers
• Conversely, low cache hit ratios or high latch usage may not impose high wait times on users (so why fix them?)
[Chart: SQL statements prioritized by total wait time]
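Step 1 amounts to a top-N query over accumulated wait time per statement. A minimal Python sketch, assuming wait records have already been collected as `(sql_id, wait_seconds)` pairs; the identifiers and the function `top_sql_by_wait` are hypothetical.

```python
import heapq
from collections import defaultdict


def top_sql_by_wait(records, n=3):
    """records: iterable of (sql_id, wait_seconds); return the n largest totals."""
    totals = defaultdict(float)
    for sql_id, wait_seconds in records:
        totals[sql_id] += wait_seconds
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


# Hypothetical per-execution wait records.
records = [("a1", 4.0), ("b2", 30.0), ("a1", 40.0), ("c3", 1.0), ("d4", 12.0)]
top = top_sql_by_wait(records, n=2)
print(top)
```

Totals are accumulated before ranking, so a statement with many short waits can outrank one with a single long wait.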
21
Step 2: Allocate
• Allocate impact to real customers (internal or external)
• Allocate wait time to program, session, and machine
  • The SQL View principle makes this connection
• Understand the database’s customers and applications
[Chart: programs prioritized by total wait time]
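Allocation is the same aggregation as Step 1, grouped by a different attribute of the session that issued the SQL. The sketch assumes each wait record already carries `program` and `machine` fields; the field names, values, and the function `allocate` are hypothetical.

```python
from collections import defaultdict


def allocate(records, dimension):
    """Sum wait seconds by one attribute, e.g. 'program' or 'machine'."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dimension]] += rec["wait_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical wait records tagged with the session attributes.
records = [
    {"sql_id": "a1", "program": "billing.exe", "machine": "app01", "wait_seconds": 40.0},
    {"sql_id": "b2", "program": "reports.exe", "machine": "app02", "wait_seconds": 30.0},
    {"sql_id": "a1", "program": "billing.exe", "machine": "app03", "wait_seconds": 4.0},
]

by_program = allocate(records, "program")
by_machine = allocate(records, "machine")
print(by_program)
print(by_machine)
```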
22
Step 3: Quantify
• How much time/money is saved if the problem is fixed?
• Enabled by the Full View and Time View principles
• Soft dollar savings
  • Data entry clerks
  • DBA time spent in problem resolution
• Hard dollar savings
  • Reduce hardware upgrades
  • Meet SLAs, avoiding penalties
  • Ensure business isn’t lost due to a poorly performing or unavailable system
[Chart: quantifiable benefit of tuning a specific statement]
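A soft-dollar estimate can be computed as seconds saved per execution × executions per day × working days, converted to hours and multiplied by a loaded hourly cost. The Python sketch below uses that formula; the 250-day year, the hourly cost, and all inputs are hypothetical assumptions, not figures from the presentation.

```python
def annual_savings(seconds_saved_per_exec, execs_per_day, hourly_cost, days_per_year=250):
    """Soft-dollar estimate: hours of user waiting avoided times a loaded hourly cost."""
    hours = seconds_saved_per_exec * execs_per_day * days_per_year / 3600
    return hours, hours * hourly_cost


# Hypothetical: a fix saves 6 s per execution, run 1200 times a day, at $30/hour.
hours, dollars = annual_savings(seconds_saved_per_exec=6, execs_per_day=1200, hourly_cost=30)
print(f"{hours:.0f} hours/year, about ${dollars:,.0f}")
```

Here 6 × 1200 × 250 = 1,800,000 seconds, or 500 hours per year.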
23
Step 4: Prioritize
If the last step was properly executed, this step is fairly straightforward
Allows the DBA to cut through the clutter of potential new projects, investigations, and trials
Better justification for priorities (e.g., “We aren’t working on your problem because this other one has a higher demonstrable business impact”)
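Once each candidate has a quantified benefit, prioritizing is a sort. The Python sketch orders by benefit and breaks ties by lower effort; the `Candidate` type, the effort field, and the example items are hypothetical additions, not part of the original five steps.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    annual_benefit: float  # quantified in Step 3, in dollars
    effort_days: float     # hypothetical estimate of DBA effort


def prioritize(candidates):
    """Order by demonstrable benefit (highest first), then by lower effort."""
    return sorted(candidates, key=lambda c: (-c.annual_benefit, c.effort_days))


queue = prioritize([
    Candidate("Rewrite billing query", 15000, 3),
    Candidate("Raise buffer cache hit ratio", 0, 5),
    Candidate("Add index for report", 15000, 1),
])
order = [c.name for c in queue]
print(order)
```

A task with no measurable wait-time benefit, such as chasing a ratio, falls to the bottom automatically.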
24
Step 5: Assign
• Assign the right people to the problem, e.g.:
  • Log_buffer waits
  • Network issues
  • Same query executed 10,000 times/hour
• Enabled by the Full View principle
• Avoid finger-pointing by assigning accurately and quickly
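Assignment can start from a routing table that maps the dominant wait event to an owning team. The table below is a hypothetical starting point written in Python, not a rule from the methodology; as Example 2 later shows, a SQL*Net wait can turn out to be a configuration problem rather than a network one, so a human still confirms each assignment.

```python
# Hypothetical routing table: wait event prefix -> responsible team.
ROUTES = [
    ("log buffer space", "Database/OS Team"),
    ("SQL*Net", "Network Team"),
    ("db file", "Storage/OS Team"),
    ("enq:", "Custom App Team"),
]


def assign(event, default="Database/OS Team"):
    """Return the team owning the first matching event prefix, else the default."""
    for prefix, team in ROUTES:
        if event.startswith(prefix):
            return team
    return default


for event in ("SQL*Net message from client", "db file scattered read", "latch free"):
    print(f"{event:30s} -> {assign(event)}")
```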
25
Resource Mapping Methodology
[Diagram: RMM encompassing wait-based tuning plus network, storage, application, web, etc.]
26
Silo Monitoring
Each team uses its own tool to partially monitor its non-Oracle layer. There is no view across layers, and management has no clear view.
Software layer: team (tool)
• Web Server: Web Team (SiteScope)
• Custom Biz Logic: Custom App Team (often no commercial tools)
• Network: Network Team (HP OpenView)
• Database Server: Database/OS Teams (wait-based tuning)
• Storage Box: Storage/OS Teams (EMC Control Center)
• IT Management and Business Management: limited view
27
The Solution - Integrated Vision
RMM across the stack: Web Server, Custom Biz Logic, Network, Database Server, Storage Box
Teams: Web, Custom App, Network, Database/OS, and Storage/OS, plus IT Management and Business Management
All teams see a complete picture of all layers and dependencies. Enables a more efficient “Umbrella” solution.
28
RMM Achieved Business Benefits
RMM does → Business benefit
• Recovers unused capacity (35% reduction in database capacity requirement) → Reduce capital investment; avoid unnecessary additions
• Standardizes “expert” analysis ability across the entire DBA team → Reduce training & consulting costs
• Quantifies performance impact → Focus tuning efforts on biggest business impacts
• Identifies problem root cause and resolution → Assign human resources and responsibility
• Anticipates and resolves performance bottlenecks → Maintain SLA and end-user performance
29
Example 1: Problem Observed
Critical situation: Secure Service Center application performance unsatisfactory
• Response time between 2400 and 9000 seconds
• Very high network traffic (3x-4x normal), indicating time-outs and user refreshes
• “CritSit” declared: major effort to resolve problem
30
Observations using Resource Mapping Methods
[Chart: accumulated waits dominated by library cache pin and library cache load lock; note scale > 8000 secs]
1: Identify accumulated waits
2: Identify specific resources used
31
Results
Library cache pin nearly unobservable
Library cache load lock no longer observable
Notice scale: < 1400 secs max
32
Results
Response time improvement from 8000 seconds (worst case) to 900 seconds
Variance improvement:
• Before: response time 2400 - 8000 sec
• After: response time 800 - 900 sec
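The size of these improvements follows from the numbers on this slide alone. A short Python check; `pct_reduction` is a hypothetical helper name.

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100 * (before - after) / before


worst_case = pct_reduction(8000, 900)  # worst-case response time, seconds
spread_before = 8000 - 2400            # best-to-worst range before the fix
spread_after = 900 - 800               # best-to-worst range after the fix

print(f"Worst case improved by {worst_case:.2f}%")
print(f"Range narrowed from {spread_before} s to {spread_after} s")
```

The worst case drops by 88.75%, and the spread between best and worst response time shrinks from 5600 to 100 seconds, so users also get far more predictable performance.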
33
Example 2: Performance Drain – Identify the Source
Slow response reported; the DBA and database are the focus of the delays
Database problem?
No – SQL*Net message identified as the source of delay (2nd highest wait event)
34
RMM Drill Down identifies source of problem
Single application generates all SQL*Net Messages
App on same server as Oracle!
Answer: Misconfiguration – TCP/IP used within the server
Changing to IPC eliminates NIC traffic and 30% of wait time
Solution requires knowing: Which SQL, What Wait Time, Which Resource
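The three facts named here (which SQL, what wait time, which resource) can be combined into a simple check: what share of total wait time is SQL*Net waiting by clients running on the database host itself? Such clients are candidates for IPC instead of TCP/IP. This is a hypothetical Python sketch that assumes wait records already carry the client machine name; it flags candidates but cannot confirm which protocol a client actually uses.

```python
def local_sqlnet_share(records, db_host):
    """Fraction of all wait time that is SQL*Net waits from clients on the DB host."""
    total = sum(r["seconds"] for r in records)
    local = sum(
        r["seconds"]
        for r in records
        if r["event"].startswith("SQL*Net") and r["machine"] == db_host
    )
    return local / total if total else 0.0


# Hypothetical wait records already attributed to program and client machine.
RECORDS = [
    {"program": "loader", "machine": "dbhost", "event": "SQL*Net message to client", "seconds": 25.0},
    {"program": "web", "machine": "app01", "event": "db file sequential read", "seconds": 55.0},
    {"program": "loader", "machine": "dbhost", "event": "db file scattered read", "seconds": 20.0},
]

share = local_sqlnet_share(RECORDS, db_host="dbhost")
print(f"{share:.0%} of wait time is SQL*Net traffic from the database host itself")
```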
35
Example 3: Scattered Reads
Situation: LINS06 database. Hourly profile identifies a high-wait anomaly, 3-10x higher than other periods, that requires investigation
[Chart: hourly wait time, peaking at 42,000 seconds during 10:00-11:00]
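An hourly profile can flag anomalies automatically by comparing each hour with the median of the other hours. In the Python sketch below, only the 42,000-second value for 10:00 comes from the slide; the other hourly totals, the 3x threshold, and the function name `anomalous_hours` are hypothetical.

```python
from statistics import median


def anomalous_hours(wait_by_hour, threshold=3.0):
    """Flag hours whose wait time is at least `threshold` x the median of the other hours."""
    flagged = {}
    for hour, seconds in wait_by_hour.items():
        baseline = median(s for h, s in wait_by_hour.items() if h != hour)
        if baseline and seconds / baseline >= threshold:
            flagged[hour] = round(seconds / baseline, 1)
    return flagged


# Hypothetical hourly wait totals in seconds; only 10:00 echoes the slide.
hourly = {"08:00": 5000, "09:00": 6000, "10:00": 42000, "11:00": 7000, "12:00": 5500}
print(anomalous_hours(hourly))
```

Using the median of the other hours keeps the anomaly itself from inflating the baseline it is compared against.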
36
Drill Down to Key RMM Parameters
[Chart: db file scattered reads; note scale > 6000 secs]
37
Conclusion
• Look for what has an impact
• Resource Mapping is more than wait time. Analysis must include:
  • SQL-level granularity
  • Full resource granularity
Isolating the SQL and Resource allows you to find and fix the Root Cause
DBAs can have an impact and be heroes!
38
Thank you for coming
Matt Larson
Contact Information
• 303-938-8282 ext. 110
• Company website: www.confio.com