Oracle Open World 2014 presentation [CON8127] on Maximizing Oracle RAC Uptime. This presentation discusses tools integrated into the Oracle RAC Stack and shows which tools to use in the various stages of the system's lifecycle to ensure smooth operation.
Maximizing Oracle RAC Uptime
Ian Cookson, Markus Michalewicz
Oracle Real Application Clusters (RAC) Product Management / Development
September 29, 2014
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
The System Lifecycle
Installation → Implementation → Operation → Monitoring → Diagnosis
The System Lifecycle: Installation
Installation – System assumed for this presentation
• Server OS:
– Hub nodes: 4 GB+ memory recommended
• One Hub node at a time hosts the Grid Infrastructure Management Repository (GIMR) database.
• Only Hub nodes host (Flex) ASM instances.
• Leaf nodes can have less memory, depending on the use case.
• The installer enforces the Hub minimum memory requirement.
– Oracle Linux (OL) 6.5 with UEK (other kernels are supported)
[Figure: five-node Flex Cluster. Hub nodes germany, argentina and brazil run Oracle GI as HUB; leaf nodes spain and italy run Oracle GI as Leaf; Oracle RAC runs on several of these nodes.]
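You can check the Hub memory guideline before starting the installer. The sketch below is a hypothetical helper and is not part of OUI or CVU. It assumes the Linux `/proc/meminfo` format, where `MemTotal` is reported in kB. It compares that value with 4 GB, which CVU expresses as 4194304.0KB. The kernel reserves some memory, so `MemTotal` reads slightly below the installed RAM. Treat a near miss with judgment.

```python
"""Check whether a server meets the Hub memory guideline (4 GB+).

A minimal sketch: `meets_hub_memory` is a hypothetical helper. It assumes the
Linux /proc/meminfo format, where MemTotal is reported in kB.
"""

HUB_MIN_KB = 4 * 1024 * 1024  # 4 GB expressed in kB (4194304)


def mem_total_kb(meminfo_text: str) -> int:
    """Return the MemTotal value (in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:        8061364 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def meets_hub_memory(meminfo_text: str, min_kb: int = HUB_MIN_KB) -> bool:
    """True if the reported total memory is at least the Hub minimum."""
    return mem_total_kb(meminfo_text) >= min_kb
```

On a candidate Hub node, call `meets_hub_memory(open("/proc/meminfo").read())`.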
Installation
• Installation is an infrequent task
• It should be standardized
– Follow: http://www.slideshare.net/MarkusMichalewicz/oracle-rac-12c-collaborate-best-practices-ioug-2014-version
– and come to the Oracle RAC demo booth (3787)
• Tools to use:
1. Linux: pre-install package
2. Cluster Verification Utility (CVU)
3. Oracle Universal Installer (OUI)
[root@germany ~]# uname -a
3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013
x86_64 x86_64 x86_64 GNU/Linux
# List the available pre-install packages
[root@germany Desktop]# yum list oracle-*
oracle-rdbms-server-11gR2-preinstall.x86_64 1.0-7.el6 ol6_latest
oracle-rdbms-server-12cR1-preinstall.x86_64 1.0-8.el6 ol6_latest
# Install the package that matches the release (12c Release 1 here)
[root@germany Desktop]# yum install oracle-rdbms-server-12cR1-preinstall
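To script the package choice, you can parse the `yum list` output shown above. The sketch below assumes the three-column layout of name.arch, version-release and repository. Lines with any other column count, such as the "Available Packages" header, are skipped. `parse_yum_list` and `pick_preinstall` are hypothetical helpers, not part of any Oracle tool.

```python
"""Pick the Oracle pre-install package for a given release from `yum list` output.

A minimal sketch, assuming the three-column `yum list` layout
(name.arch, version-release, repository). Helper names are hypothetical.
"""
from typing import NamedTuple, Optional


class YumPackage(NamedTuple):
    name: str     # package name including the architecture suffix
    version: str  # version-release string, e.g. "1.0-8.el6"
    repo: str     # repository the package comes from


def parse_yum_list(output: str) -> list[YumPackage]:
    """Keep only lines with exactly three whitespace-separated columns."""
    packages = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            packages.append(YumPackage(*fields))
    return packages


def pick_preinstall(packages: list[YumPackage], release: str) -> Optional[YumPackage]:
    """Return the pre-install package for a release tag such as '12cR1', if listed."""
    for pkg in packages:
        if f"-{release}-preinstall" in pkg.name:
            return pkg
    return None
```

For the listing above, `pick_preinstall(parse_yum_list(text), "12cR1").name` yields `oracle-rdbms-server-12cR1-preinstall.x86_64`. You can then pass that name to `yum install`.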
Oracle Universal Installer (OUI)
• OUI provides a simple GUI for:
– Installation and configuration
– Upgrades
• OUI calls cluvfy for:
– Verification checks
– Generating ‘fixup’ scripts
The System Lifecycle: Implementation
Implementation
• Implementation is a recurring task
– Initial implementation
– Change implementation(s) as required
• Implementation tasks are system-specific
• Tools to use:
1. CVU
2. OraChk
Cluster Verification Utility (CVU) – Introduction
• Purpose:
– Verification of pre-install & post-install cluster setup
– Run manually (command: cluvfy) or as part of the OUI
– Available from OTN and included in Oracle Grid Infrastructure
– Supports the Oracle RAC stack since version 10g Rel. 1
• What does it do?
– Runs specified verification tests and optionally generates a ‘fixup’ script (to be run as root)
– Utilizes a ‘stage’ concept, enabling users to run the tests needed before (‘pre’) or after (‘post’) a given installation stage
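The stage concept maps directly onto the `cluvfy stage -pre|-post <stage>` command form. The sketch below builds those command lines so they can be scripted consistently. `cluvfy_stage_cmd` and `run_cluvfy` are hypothetical helpers, and the stage list is only a subset. Confirm the stage names and the `-n`, `-fixup` and `-verbose` options with `cluvfy stage -help` on your release.

```python
"""Build `cluvfy stage` command lines for 'pre' and 'post' checks.

A minimal sketch: `cluvfy_stage_cmd` and `run_cluvfy` are hypothetical helpers.
Stage names and options should be confirmed with `cluvfy stage -help`.
"""
import subprocess
from typing import Sequence

# A subset of the stages cluvfy knows about, grouped by 'pre' / 'post'.
STAGES = {
    "pre": {"crsinst", "dbinst", "nodeadd"},
    "post": {"hwos", "crsinst", "nodeadd"},
}


def cluvfy_stage_cmd(when: str, stage: str, nodes: Sequence[str],
                     fixup: bool = False, verbose: bool = True) -> list[str]:
    """Return the argument list for e.g. `cluvfy stage -post hwos -n n1,n2 -verbose`."""
    if when not in STAGES:
        raise ValueError(f"'when' must be 'pre' or 'post', got {when!r}")
    if stage not in STAGES[when]:
        raise ValueError(f"stage {stage!r} is not in this sketch's {when} list")
    if not nodes:
        raise ValueError("at least one node name is required")
    cmd = ["cluvfy", "stage", f"-{when}", stage, "-n", ",".join(nodes)]
    if fixup:
        # This sketch only requests fixup scripts for 'pre' stages.
        if when != "pre":
            raise ValueError("fixup is only requested for 'pre' stages here")
        cmd.append("-fixup")
    if verbose:
        cmd.append("-verbose")
    return cmd


def run_cluvfy(cmd: list[str]) -> int:
    """Run cluvfy (as the Grid Infrastructure owner) and return its exit status."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `cluvfy_stage_cmd("post", "hwos", ["germany", "argentina"])` produces the same command as the post-hwos example in this deck.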
What does CVU Check?
• System requirements – Are the installation requirements met for Clusterware, or RAC?
• Network and connectivity
• Cluster Time Synchronization (CTSS or NTP)
• Existence of required OS users and permissions
• Prerequisites for adding nodes
• etc.
CVU for Pre-Implementation Checks
• Purpose:
– Verification of the configuration after installation, prior to implementation (is the system ready?)
• Which checks should be made?
– Use ‘post’ checks to verify that the system is indeed ready, and
– Confirm that post-installation changes made to the system will not cause problems
• Examples:
– cluvfy comp healthcheck -collect cluster -mandatory -deviations -save
CVU for Pre-Implementation Checks - Example
$ cluvfy stage -post hwos -n germany,argentina -verbose
Performing post-checks for hardware and operating system setup
Checking node reachability...
Check: Node reachability from node "germany"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  germany                               yes
  argentina                             yes
Result: Node reachability check passed from node "germany"
Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name                             Status
  ------------------------------------  ------------------------
  argentina                             passed
  germany                               passed
Result: User equivalence check passed for user "grid"
…
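Verbose CVU output gets long. A quick summary of the "Result:" lines shows whether anything failed. The sketch below only understands the result wording shown above, "Result: <check name> check passed|failed …". Other CVU phrasings are ignored. `parse_results` and `summarize` are hypothetical helpers.

```python
"""Summarize the "Result:" lines of `cluvfy stage ... -verbose` output.

A minimal sketch that only understands lines of the form
"Result: <check name> check passed|failed ..."; other wordings are ignored.
"""
import re
from collections import Counter

RESULT_RE = re.compile(r"Result:\s+(?P<check>.+?)\s+check\s+(?P<status>passed|failed)\b")


def parse_results(output: str) -> list[tuple[str, str]]:
    """Return (check name, 'passed' | 'failed') for each recognized Result line."""
    results = []
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results.append((match.group("check"), match.group("status")))
    return results


def summarize(output: str) -> tuple[Counter, list[str]]:
    """Return (counts per status, names of failed checks)."""
    results = parse_results(output)
    counts = Counter(status for _, status in results)
    failed = [name for name, status in results if status == "failed"]
    return counts, failed
```

For the example output above, `parse_results` yields `("Node reachability", "passed")` and `("User equivalence", "passed")`, and the failed list is empty.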
OraChk
• OraChk – formerly RACchk or RACcheck
– aka ExaChk
• RAC Configuration Audit Tool
– For details see MOS note ID 1268927.1
• Checks the Oracle stack:
– Standalone Database
– Grid Infrastructure & RAC
– Maximum Availability Architecture (MAA) Validation
– Oracle Hardware
• Engineered Systems require less initial testing
OraChk – Installation and Configuration
• Installation:
– Download the latest version of orachk (90-day reminder…)
– Unzip it into a local directory as the oracle user
– Check that the permissions on orachk are 755
• Configuration:
– Run manually or in silent mode (via the daemon)
– Implementation: run individually (manually) to validate the system setup etc. prior to going live
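The 755 permission check from the install steps is easy to automate. The sketch below does the equivalent of inspecting `ls -l orachk` and running `chmod 755 orachk`. The path you pass in, such as your orachk unzip directory, is an assumption left to you.

```python
"""Verify (and if needed set) mode 755 on the orachk script.

A minimal sketch equivalent to checking `ls -l orachk` and running
`chmod 755 orachk`; the path passed in is up to the caller.
"""
import os
import stat


def has_mode_755(path: str) -> bool:
    """True if the permission bits are exactly rwxr-xr-x."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o755


def ensure_mode_755(path: str) -> bool:
    """Set mode 755 if it differs; return True if a change was made."""
    if has_mode_755(path):
        return False
    os.chmod(path, 0o755)
    return True
```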
OraChk – Usage
• Usage: ./orachk [-abvhpfmsuSo:c]
-a - all checks
-b - best practices only
-p - patch recommendations only
-f - offline (reports from existing data only)
-u - pre-upgrade checks
-S or -s - silent mode runs, with or without sudo capabilities
-c - check individual components (e.g., orachk -a -c ASM)
-o - invoke optional functionality (e.g., display only non-passing audit checks, verbose format, etc.)
-m - exclude MAA checks
-v - display the tool version
OraChk – Example
Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)
OraChk report in HTML format: summary with links to content
Database Server
Check Id | Status | Type | Message | Status On | Details
E960DB20CA5A634FE04312C0E50A62E0 | FAIL | SQL Check | Table containing SecureFiles LOB storage belongs to a tablespace with extent allocation type that is not SYSTEM managed (not AUTOALLOCATE) | All Databases | View
6580DCAAE8A28F5BE0401490CACF6186 | WARNING | OS Check | The number of async IO descriptors is too low (/proc/sys/fs/aio-max-nr) | All Database Servers | View
5ADD88EC8E0AFF2EE0401490CACF0C10 | WARNING | OS Check | net.core.wmem_max Is NOT Configured According to Recommendation | All Database Servers | View
84BE4DE1F00AD833E040E50A1EC07771 | INFO | OS Check | Kernel Parameter fs.file-max Is Lower Than The Recommended Value | All Database Servers | View
66E70B43167837ABE040E50A1EC02FEA | INFO | OS Check | ORA-00600 errors found in alert log | All Database Servers | View
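When orachk runs on a schedule, it helps to triage each report the same way: worst findings first. The sketch below assumes the findings have already been extracted from the report into simple records. That extraction is not shown. The sketch deliberately does not reproduce how orachk computes its System Health Score. `Finding` and `triage` are hypothetical names.

```python
"""Triage OraChk findings: count statuses and list non-passing checks worst first.

A minimal sketch: `Finding` records must be extracted from the orachk report
by other means (not shown). This does NOT reproduce the System Health Score.
"""
from collections import Counter
from dataclasses import dataclass

# Most to least severe, in the order the statuses appear in the report above.
SEVERITY = ("FAIL", "WARNING", "INFO")


@dataclass(frozen=True)
class Finding:
    check_id: str
    status: str      # e.g. FAIL, WARNING, INFO, PASS
    check_type: str  # e.g. "OS Check", "SQL Check"
    message: str


def triage(findings: list[Finding]) -> tuple[Counter, list[Finding]]:
    """Return (counts of every status, non-passing findings sorted by severity)."""
    counts = Counter(f.status for f in findings)
    # Statuses outside SEVERITY (such as PASS) are left out of the worklist.
    non_passing = [f for f in findings if f.status in SEVERITY]
    # list.sort() is stable, so equal-severity findings keep their report order.
    non_passing.sort(key=lambda f: SEVERITY.index(f.status))
    return counts, non_passing
```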
OraChk – Example: MAA Scorecard
Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)
DATA CORRUPTION PREVENTION BEST PRACTICES
Status | Type | Message | Status On | Details
FAIL | OS Check | Active Data Guard is not configured | All Database Servers | View
FAIL | SQL Parameter Check | Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value | All Instances | View
OraChk highlights failures. Here: Data Guard is not set up.
The System Lifecycle: Operation
Operation
• Operation is an ongoing task
– Oracle Grid Infrastructure provides all necessary tools for normal operation.
• Operation should not create extra tasks
– Automation is the key
• Tools to use:
1. CVU (periodic runs)
2. OraChk (interval runs via daemon)
3. Cluster Health Monitor (CHM/OS)
Operations – Periodic CVU Checks are the Default
[GRID]> crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       argentina                STABLE
               ONLINE  ONLINE       brazil                   STABLE
               ONLINE  ONLINE       germany                  STABLE
...
ora.cvu
      1        ONLINE  ONLINE       brazil                   STABLE
ora.germany.vip
      1        ONLINE  ONLINE       germany                  ...
[GRID]> crsctl status res ora.cvu -p
NAME=ora.cvu
TYPE=ora.cvu.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTIONS=
ACTION_SCRIPT=
ACTION_TIMEOUT=60
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=60
CHECK_RESULTS=PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,PRVF-4657 : Name resolution setup check for "cupscan.cupgnsdom.localdomain" (IP address: 10.1.1.55) failed,PRVF-4090 : Node connectivity failed for interface "*",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVF-7530 : Sufficient physical memory is not available on node "germany" [Required physical memory = 4GB (4194304.0KB)],PRVF-4354 : Proper hard limit for resource "maximum open file descriptors" not found on node "germany" [Expected = "65536" ; Found = "4096"…
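The `CHECK_RESULTS` value packs every finding of the periodic CVU run into one comma-separated string. Some messages, such as the node list in PRVG-11050, contain commas themselves. The sketch below therefore splits only on a comma that starts a new message ID. It assumes every entry begins with a `PRV*` ID followed by " : ", as in the output above. IDs with other prefixes would not be recognized.

```python
"""Split the ora.cvu CHECK_RESULTS attribute into individual CVU messages.

A minimal sketch, assuming each entry starts with a PRV* message ID followed
by " : ". Only commas that start a new message ID are treated as separators.
"""
import re
from collections import Counter

_SPLIT = re.compile(r",(?=PRV[A-Z]-\d+ : )")
_ENTRY = re.compile(r"(?P<code>PRV[A-Z]-\d+) : (?P<message>.*)", re.DOTALL)


def parse_check_results(value: str) -> list[tuple[str, str]]:
    """Return (message ID, message text) pairs in the order they appear."""
    value = value.strip()
    if value.startswith("CHECK_RESULTS="):
        value = value.split("=", 1)[1]  # accept the raw `crsctl ... -p` line too
    entries = []
    for chunk in _SPLIT.split(value):
        match = _ENTRY.fullmatch(chunk.strip())
        if match:
            entries.append((match.group("code"), match.group("message")))
    return entries


def count_by_code(value: str) -> Counter:
    """How often each message ID occurs (repeats are common, e.g. PRVF-4090)."""
    return Counter(code for code, _ in parse_check_results(value))
```

Counting by ID quickly separates the repeated interconnect findings (PRVF-4090, PRVG-11050) from one-off items such as the memory shortfall on node "germany".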
Operations – Set Up Periodic OraChk System Checks
<<< Configure & start orachk daemon for scheduled interval runs >>>
$ ./orachk -id DBA -set \
> "[email protected];\
> AUTORUN_SCHEDULE = 4,8,12,16,20 * * *;\
> AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30"
$ ./orachk -d start
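As we read the orachk documentation, `AUTORUN_SCHEDULE` has four fields: hour, day, month and day of week, with no minute field. Under that reading, "4,8,12,16,20 * * *" runs the DBA profile five times a day. The sketch below computes the next run from such a value. It assumes runs start on the hour and supports only a comma-separated hour list, with "*" in the other three fields. `next_run` is a hypothetical helper.

```python
"""Compute the next orachk daemon run from an AUTORUN_SCHEDULE value.

A minimal sketch, assuming the field order "hour day month day_of_week"
(no minute field) and that runs start on the hour. Only a comma-separated
hour list is supported; the other fields must be "*".
"""
from datetime import datetime, timedelta


def parse_hours(schedule: str) -> list[int]:
    """'4,8,12,16,20 * * *' -> [4, 8, 12, 16, 20]"""
    fields = schedule.split()
    if len(fields) != 4:
        raise ValueError("expected 'hour day month day_of_week'")
    hour_field, *others = fields
    if any(field != "*" for field in others):
        raise ValueError("this sketch only supports '*' for day, month and day_of_week")
    hours = sorted({int(h) for h in hour_field.split(",")})
    if not all(0 <= h <= 23 for h in hours):
        raise ValueError("hours must be in the range 0-23")
    return hours


def next_run(schedule: str, now: datetime) -> datetime:
    """First scheduled hour strictly after `now`, rolling over to the next day."""
    hours = parse_hours(schedule)
    for hour in hours:
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            return candidate
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=hours[0], minute=0, second=0, microsecond=0)
```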
Cluster Health Monitor (CHM/OS)
• Service integrated with the Oracle Clusterware stack
• Introduced in 11.2.0.2 (Linux, Solaris, Windows) and 11.2.0.3 (AIX)
• Gathers OS-level metrics to monitor resource degradation and failure
• Stores data in a central repository (GIMR)
• Runs in real time with locked-down memory for last-gasp analysis
• Integration with QoS (Memory Guard) and CRS (server pool categorization)
• Integrated into EM Cloud Control
[Figure: an osysmond daemon runs under Oracle GI on each node (germany, argentina, italy, brazil) and feeds a single ologgerd.]
Cluster Health Monitor – Daemons / Processes
osysmond
• Function: collect OS metrics; process raw data for a subset of processes; compress and send data to ologgerd; store/forward in case of network failures
• Managed by: ohasd
• Instances and location: every node of the cluster (including leaf nodes)
ologgerd
• Function: consume data from all active osysmonds; store data in the repository; service requests from clients
• Managed by: osysmond
• Instances and location: one per cluster (replica for 11.2.x)
oclumon
• Function: display OS-level metrics in historic/real-time mode; perform repository management operations
• Managed by: command line utility
• Instances and location: can be invoked from any hub node in the cluster
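The "store/forward in case of network failures" duty of osysmond is a general pattern. Samples are kept locally while the link to the collector is down and then delivered in order once it recovers. The sketch below illustrates that pattern only. It is not Oracle's implementation, and the `send` transport callback and buffer size are hypothetical.

```python
"""Store-and-forward: buffer samples while the link is down, send them once it is back.

A conceptual sketch of the pattern, not Oracle's implementation. `send` is a
hypothetical transport callback that returns True on successful delivery.
"""
from collections import deque
from typing import Callable, Deque


class StoreAndForwardSender:
    def __init__(self, send: Callable[[dict], bool], max_buffered: int = 1000):
        self._send = send
        # Bounded buffer: when full, the oldest sample is discarded first.
        self._buffer: Deque[dict] = deque(maxlen=max_buffered)

    def submit(self, sample: dict) -> None:
        """Queue a new sample, then try to deliver everything that is pending."""
        self._buffer.append(sample)
        self.flush()

    def flush(self) -> int:
        """Send pending samples in order; stop at the first failure. Returns the number sent."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # link still down: keep the sample for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

Bounding the buffer is a design choice. A long outage then costs the oldest samples rather than unbounded memory on the node.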
Cluster Health Monitor in EM Cloud Control
Cluster Health Monitor – command line reporting
• Command line reporting of current and historic OS metrics (oclumon)
– from any hub node in the cluster
• Example:
[germany]: > oclumon dumpnodeview -process
The System Lifecycle: Monitoring
Monitoring
• Monitoring is an ongoing task
– Optional monitoring is available for an Oracle RAC cluster via QoS and Oracle EM
– Quality of Service Management (QoS) comes with a monitoring-only feature
• Monitoring is a proactive task
• Tools to use:
1. Oracle Enterprise Manager 12c Cloud Control
2. Oracle Quality of Service Management (Memory Guard)
Monitoring the RAC Cluster with EM Cloud Control
Quality of Service Management – Memory Guard
• QoS feature externalized for general use
• Memory Guard protects resources
– Receives a stream of OS memory metrics from CHM/OS
– Issues an alert should any server be at risk
– Protects existing work and applications by automatically closing the server to new connections (i.e., stops service on the at-risk node)
– Automatically reopens the server to connections once the memory pressure has subsided
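The close-then-reopen behavior described above can be modeled as a two-state machine with hysteresis. The reopen threshold sits below the close threshold, so a node hovering near the limit does not flap between states. This is a conceptual sketch only. The thresholds, the "fraction of memory in use" metric and the action names are hypothetical and do not describe Memory Guard's actual policy.

```python
"""A hysteresis sketch of the Memory Guard behavior described above.

Conceptual only: thresholds, metric and action names are hypothetical,
not Memory Guard's actual policy.
"""
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    OPEN = "open"      # node accepts new connections
    CLOSED = "closed"  # service stopped for new connections on the at-risk node


@dataclass
class MemoryGuardSketch:
    close_at: float = 0.90   # hypothetical "at risk" level
    reopen_at: float = 0.75  # hypothetical "pressure has subsided" level
    state: NodeState = NodeState.OPEN

    def __post_init__(self) -> None:
        if not self.reopen_at < self.close_at:
            raise ValueError("reopen_at must be below close_at to avoid flapping")

    def observe(self, used_fraction: float) -> list[str]:
        """Feed one memory sample; return the actions triggered by it."""
        if self.state is NodeState.OPEN and used_fraction >= self.close_at:
            self.state = NodeState.CLOSED
            return ["alert", "stop_services"]
        if self.state is NodeState.CLOSED and used_fraction <= self.reopen_at:
            self.state = NodeState.OPEN
            return ["start_services"]
        return []
```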
Autonomous Computing
[Figure: QoS, CHM, CHA (Cluster Health Advisor), Hang Manager (HngMgr) and Policy, mapped to the goals Self-Optimizing, Self-Protecting, Self-Configuring and Self-Healing.]
Enabling Autonomous Computing
Cluster Health Monitor (CHM)/OS & QoS 11.2+
[Figure: CHM/OS with its sysmond and LOGGERD daemons]
Further QoS & CHM Enhancements in 12.1.0.2
• QoS support for measure-only mode with performance objectives and alerts
• QoS support for measuring and monitoring admin-managed databases
Cluster Health Advisor: coming soon…
The System Lifecycle: Diagnosis
Diagnosis
• Diagnosis is a recurring task
– Ideally, there will be no incidents on the system.
– Realistically, there will be more than one.
• Diagnosis is a reactive task
– It should be performed as efficiently as possible.
• Tools to use:
1. Trace File Analyzer (TFA)
Trace File Analyzer (TFA) – log collection in action
• Trace File Analyzer
– Improved, comprehensive first-failure diagnostics collection
– Efficient collection, packaging and transfer of data
– Collects for all relevant components (OS, Grid Infrastructure, ASM, RDBMS), including Exadata cell nodes
– One command to collect all information from all nodes (or single-instance, single-node)
• More information: MOS note ID 1513912.1
Trace File Analyzer (TFA) – intelligent log collection
$ ./tfactl diagcollect
Sending diagcollect request to host : argentina
Getting list of files satisfying time range [Tue Sep 03 14:17:43 PDT 2014, Tue Sep 03 18:17:43 PDT 2014]
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswiostat/germany_iostat_14.09.03.1500.dat.gz
germany: Zipping File: /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log
Trimming file : /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log with original file size : 109kB
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswtop/germany_top_14.09.03.1500.dat.gz
germany: Zipping File: /opt/oracle/oak/log/germany/oak/oakd.log
Trimming file : /opt/oracle/oak/log/germany/oak/oakd.log with original file size : 9.2MB
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/gipcd/gipcd.log
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log
Trimming file : /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log with original filesize 4.3MB
germany: Zipping File: /var/log/messages
…
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswslabinfo/germany_slabinfo_14.09.03.1800.dat
Collecting ADR incident files...
Total Number of Files checked : 10543
Total Size of all Files Checked : 3.9GB
Number of files containing required range : 68
Total Size of Files containing required range : 129MB
Number of files trimmed : 10
Total Size of data prior to zip : 144MB
Saved 63MB by trimming files
Zip file size : 8.6MB
Total time taken : 47s.
Logs are collected to:
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/germany.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/argentina.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip
• One simple command
• Relevant files only: OS Watcher files, ADR incident files
• Pruning: 144MB pruned and compressed down to 8.6MB
• 47 seconds! – 1 command, 2 nodes, 4 databases, ASM, Clusterware, OS
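The "Trimming file" lines show TFA keeping only the part of a log that overlaps the requested time range, here the four hours ending at collection time. The sketch below illustrates that idea generically. It is not TFA's implementation. `iso_prefix` assumes, purely for illustration, that log lines start with "YYYY-MM-DD HH:MM:SS". Real Oracle logs use several timestamp formats.

```python
"""Keep only the log lines that fall inside a collection time window.

A minimal sketch of time-range trimming, not TFA's implementation.
`iso_prefix` assumes lines start with "YYYY-MM-DD HH:MM:SS" (illustrative only).
"""
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, Optional


def iso_prefix(line: str) -> Optional[datetime]:
    """Parse a leading 'YYYY-MM-DD HH:MM:SS' timestamp, or return None."""
    try:
        return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def window_ending_at(end: datetime, hours: int = 4) -> tuple[datetime, datetime]:
    """A [start, end] window, e.g. the four hours ending at collection time."""
    return end - timedelta(hours=hours), end


def trim_to_window(lines: Iterable[str], start: datetime, end: datetime,
                   parse_ts: Callable[[str], Optional[datetime]] = iso_prefix) -> Iterator[str]:
    """Yield lines whose timestamp lies in [start, end].

    Lines without their own timestamp (e.g. stack-trace continuations) inherit
    the timestamp of the most recent line that had one; lines before the first
    timestamp are dropped.
    """
    current: Optional[datetime] = None
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            current = ts
        if current is not None and start <= current <= end:
            yield line
```

For scale, the collection above shrank from 144MB of data before zipping to an 8.6MB archive. That is 8.6 / 144 ≈ 0.06, roughly a 94% reduction, or about 17 times smaller.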
Trace File Analyzer (TFA) – Efficiency from A-Z
[Figure: Hub nodes germany and brazil, each running Oracle GI (HUB) and Oracle RAC, each with its own logs.]
Utility Cluster
• Utility Cluster – Centralize and standardize storage, deployment, management and diagnostics
• Architecture:
– An Oracle Grid Infrastructure based cluster
– “Solution-in-a-Box” approach on ODA
[Figure: a two-node utility cluster (Oracle Clusterware, Oracle ASM with Flex ASM storage and IOsrv on each node) hosting an Enterprise Management (EM) server, a Grid Home Server (Rapid Home Provisioning) and a storage server; Node 1 and Node 2 each run one Database Domain and three Application Domains.]