Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
<Insert Picture Here>
Fault-Diagnostic Best Practices: What Every DBA Must Know About Oracle Database 11g Mark RamacherDirector, Server ManagementServer Technologies
Oracle Database 11g R1 Fault Diagnostic Automation
Realistic Testingand
Automatic HealthChecks
DiagnosticWorkflow
Automation
IntelligentResolution
ProactivePatching
Goal: Reduce Time to Problem Resolution
Prevention Resolution
Diagnostic Solution Delivery
Diagnostic Workflow Automation
Historic Issues with RDBMS Diagnostic Data
• No organization• DBA must search around for relevant diagnostics to send
• No catalog of failures• Just a text stream (alert log) for history• DBA: How healthy has my database been this last quarter?• DBA: Have I seen this failure before?
• Not always sufficient on first failure• DBA must reproduce the failure with debug switches• Cause of multiple round trips between customer and support
Historic Issues with RDBMS Diagnostic Data (continued)
• Unmanaged• Grows forever• DBA must decide when and which files to delete
• Unrestrained• Floods of data from repeated occurrences of an error• DBA must perform emergency space management
The New World of 11g Diagnostics
• Organized• Diagnostic data is annotated and can be queried and
correlated• DBA uses automated tool to find failure data
• Cataloged• Automated Problem and incident management• DBA can query to see history of failures and which are
duplicates
• First Failure Capture• DBA’s work is done after sending initial diagnostic package
The New World of 11g Diagnostics
• Managed• Auto purging• DBAs don’t have to monitor space usage of trace files
• Constrained• Flood control• One less worry for a DBA in time of crisis
Concepts : Problems and Incidents
• Problems are fundamental code or configuration issues that can cause execution failures• They exist until they are corrected, e.g. by patch• They are managed to resolution
• An incident is a single occurrence of a problem• They happen at point(s) in time and thus have timestamps• They induce diagnostic actions like dumps and traces• They are associated to problems by a “problem key”
• E.g. error code
Automatic Diagnostic Repository (ADR)
• Stores diagnostic data in a directory hierarchy• Holds data concurrently for multiple Oracle products• Each product instance has its own diagnostic workspace
• ADR data is highly structured• Formalizes incidents and problems and assigns them IDs• Metadata is kept for each incident and problem• Incident related diagnostic data is placed in its own directory• Alert log and trace files are annotated and can be queried
Automatic Diagnostic Repository
diag
rdbms
DBName
SID
ADRBase
$ORACLE_HOME/log
DIAGNOSTIC_DEST
ADRHome
$ORACLE_BASE
ADRCIlog.xml alert_SID.log
V$DIAG_INFO
BACKGROUND_DUMP_DEST
USER_DUMP_DEST
CORE_DUMP_DEST
alert cdump (others)hmincpkg incident
metadata
incdir_1 incdir_n…
trace
Support Workbench
Automatic Diagnostic Repository (ADR)
• Self-managing• Trace files purged after 1 month (configurable)• Incident/Problem metadata purged after 1 year (configurable)• Note: incidents can be flagged as “don’t purge” to override
purging• Repeated incidents are flood controlled (5 dumps per hour
per problem)• Recreates itself as needed
Incident Packaging Service (IPS)
Where there is structure, there can be automation…
• IPS uses the ADR structure to automate the packaging of diagnostic data
• Solves the problem of “what needs to be sent”• Gathers all relevant diagnostic data for a problem• Correlates related incidents to make sure it captures
root cause
Problem ID
ADR
Zip FileDiagnostic
Incident Packaging Service (IPS)
Package BOMCorrelation
Generate Package Modify Contents
Add
Delete
Scrub
Incident Packaging Service (IPS)
• Recommends further diagnostic actions for DBA • For example “build SQL test case”
• Packages structure and metadata so that Oracle side automation can takes place
• Use of IPS is critical to speed up problem resolution!
11g Health Monitoring
• Health Monitoring is designed to help the DBA:• Find problems before they impact service availability• Determine the scope of a problem• Validate that a problem is resolved
• Provides a number of “health checkers”• Dictionary• DB structure integrity (control files, data file headers, etc.)• Redo log content• Undo Segment integrity• Data block integrity
11g Health Monitoring
• Checkers can be “reactively” activated during incidents• Targeted• E.g. check integrity of blocks near a corrupted block
• All checkers can be activated on demand
First Failure Analysis
• Dumping the required diagnostic data “out of the box”• Reduces round trips with Oracle support
• Internal logic activates detailed dumps for the given failure circumstances
• Additional data is detailed but targeted• Minimal increase in overall diagnostic data size
EM Support Workbench
• Support Workbench reduces the DBA’s diagnostic management to a few clicks
• Two main entry flows• From an incident alert on the DB home page• From the Support Workbench home page
• Support Workbench home page• View recent and historical problems• View diagnostics packages• View health checker findings
Support Workbench Home
Support Workbench – Problem Details
• View and process a problem• View “reactive” checker findings for this problem• View all incidents of this problem• View associated Metalink service request• Package diagnostics for the problem
• Perform guided resolution on the problem• Data Repair advisor (link appears only if relevant) • SQL Repair advisor
Problem Details
Support Workbench - Packaging
• Two flows that guide you through IPS packaging
• Quick package• Wizard to guide you through the basic packaging steps• Cannot modify contents
• Advanced packaging• Content Editing• Additional user dumps
• Automated upload to Oracle*• Automated service request creation
* Requires OCM (Oracle Configuration Manager)
Advanced Packaging
Intelligent Repair Advisors
• Data Recovery Advisor• Guided expert data recovery system using diagnostic data
and health check output
• SQL Test Case Builder• Automatically retrieves exact environment information from
ADR to build SQL test cases and replicate SQL issues
• SQL Repair Advisor• Analyzes failing SQL statements to isolate bug• May recommend SQL Patch as work around
Data Recovery Advisor
• In an outage, uncertainty and confusion are common
• Largest part of downtime is:• Investigating the problem, planning a solution
• Data Recovery Advisor• Automates investigation, reports all problems• Intelligently determines plan for recovery
• Handles multiple failure situations • Presents only feasible recovery options
• Are there backups, is there a standby? • Ranked by repair time and data loss
• Can automatically apply recovery planTime to Repair
Recovery Time
Investigation Time
Planning Time
Reduces downtime by Eliminating Confusion
SQL Test Case Builder
Business Requirement• Bug resolution
• Test case required for fast bug resolution• Not always easy to provide a test case
• What information should be provided?• How much data is need?
• Getting the test case to Oracle can be trickySolution• Oracle automatically creates a test case• Collects necessary information relating to a SQL incident• Collected data is packaged to be sent to Oracle
• Collected data allows a developer to reproduce the problem
SQL Repair Advisor
Solution• Advisor
• Investigates the incident locally• Automatically determines the root cause• Provides a workaround (SQL Patch) for just the effected SQL• If not, sends necessary diagnostic information to Oracle
Business Requirement• The most common types of SQL problems - exception, performance
regression etc., are hard to diagnose• A lot of time is spent trying to reproduce the problem• If a workaround is found it has to be applied to entire system
SQL Repair Advisor FlowSQL
statement Execute Statementcrashes
Generateincident in ADRautomatically
DBA runsSQL Repair Advisor
DBA getsalerted
SQL
Rep
air A
dvis
orin
vest
igat
es
SQL patchgenerated
DBA acceptsSQL patch
Statementexecutes
successfullyagain
SQL statementpatched
Execute
Trace files
Automatic Diagnostic WorkflowCriticalError
ADR
Zip FileIPS
DBA
Support Workbench
Repair
Checkers
IPS
alert
Oracle Support
Summary of Important Changes to 11g Diagnostic Data Management
• Location of traces and dumps has changed• Now located within the ADR file hierarchy• Dumps are separated out from traces
• Format of alert.log is changed• New XML format allows easy parsing• Old style still available for backward compatibility
• Automatic space management• Incident flood control minimizes number of traces• ADR auto-purging reduces disk footprint
• Use IPS (through Support Workbench) to package diagnostic data sent to Oracle!
Recommended Campground Demos
Moscone West Exhibit Hall
Change Management & Data Masking for DBAs
Moscone West Exhibit Hall
Self-Managing Database: Automatic Application & SQL Tuning
Moscone West Exhibit Hall
Self-Managing Database: Oracle Database 11g SQL Plan Management
Moscone West Exhibit Hall
Self-Managing Database: Automatic Fault Diagnostics
Moscone West Exhibit Hall
Oracle Real Application Testing: Database Replay
Moscone West Exhibit Hall
Oracle Real Application Testing: SQL Performance Analyzer
Moscone West Exhibit Hall
Self-Managing Database: Automatic Performance Diagnostics
LocationDemo