
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Best Practices for Maintaining Oracle RAC/Single-Instance Database Environments

Bryan Vongray - Senior Principal Technical Support Engineer

Bill Burton - Consulting Member of Technical Staff, Diagnostics & Machine Learning

Scott Jesse - Senior Director, Customer Support

DB Scalability

May 15th, 2019

Helping Us Help You


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.


FACT

• SRs having extended resolution times are often the result of insufficient diagnostic data

• Over 70% of incoming SRs to the Scalability Support Team are rediscoveries of known issues


Program Agenda

1. Trace File Analyzer Collector (aka TFA) Introduction

2. Core Functionality of TFA

3. ORAchk & EXAchk

4. TFA Utilities – Detecting and Analyzing Issues

5. Coming Soon

6. Maintaining TFA

7. Q&A


Introducing… Autonomous Health Framework

A collection of tools as components, which work together autonomously 24x7 to keep database systems healthy and running while minimizing human reaction time.


Avoid the Pitfalls of Inefficient and Incomplete Diagnostics Collection

Become Proactive and Avoid Encountering Known Issues

Help Us Help You!


What is TFA?

• Diagnostic collection utility designed to simplify diagnostic data collection

– A single command performs complete diagnostic collection for a given problem

– Available in Continuous Service Mode (Clusters and SIHA environments) and Standalone Mode (SIHA environments)

– Built-in Access Control allowing non-root users to collect full diagnostics (Continuous Service only)

– Automatic Diagnostic Collection when known events are seen (Continuous Service only)

– Ability to Upload Diagnostic Collections Directly to Service Requests

– Fully Integrated in the Database Cloud


Why TFA?

Provides one interface for all diagnostic needs

Collects data across the cluster and consolidates it in one place

Collects all relevant diagnostic data at the time of the problem

Reduces time required to obtain diagnostic data, which saves your business money

TFA makes it quicker & easier to detect, diagnose & resolve database problems


Customer Experience Before TFA

Lots of Pings

Oracle Grid Infrastructure & Databases

Oracle Support

1. Open new Service Request

2. Collect data from all nodes without regard to relevance

3. Upload data

4. Collect more missing data (ping)

5. Upload more missing data

6. Download tools/scripts (ping)

7. Run tools/scripts

8. Upload results of tools/scripts

Multiple iterations & pings during SR resolution


Customer Experience with TFA Autonomous Usage

Oracle Grid Infrastructure & Databases

Oracle Support

TFA

1. TFA detects a fault

2. Diagnostics are collected

3. Distributed diagnostics are consolidated and packaged

4. Notification of fault is sent

5. Diagnostic collection is uploaded to Oracle Support for root cause analysis & resolution


Customer Experience with TFA On-Demand Usage

Oracle Grid Infrastructure & Databases

Oracle Support

TFA

1. Request desired action on-demand

2. Real-time status summary

3. Diagnose with DB tools

4. Perform diagnostic collection

5. Upload diagnostic collection to Oracle Support


Supported Platforms and Versions

• All major Operating Systems are supported

– Linux (OEL, RedHat, SUSE, Itanium & zLinux)

– Oracle Solaris (SPARC & x86-64)

– AIX

– HPUX (Itanium & PA-RISC)

– Windows

• All Oracle Database & Grid versions 10.2+ are supported

• OS versions supported are the same as those supported by the Database

• Java Runtime Edition 1.8 required

• You probably already have TFA installed as it is included with:

– Oracle Grid Infrastructure: 11.2.0.4+, 12.1.0.2+, 12.2.0.1+, 18.0.0.0+

– Oracle Database: 12.2.0.1+, 18.0.0.0+


TFA Installation

Continuous Service Mode Installation (Preferred)

1. Download latest version: Doc 1513912.1

2. Transfer zip to required machine

3. Unzip

4. Execute installTFA-<platform> self extracting install script as root user

./installTFA-<platform>

• Will install/upgrade on all cluster nodes

• Will auto discover relevant Oracle Software & Exadata Storage Servers

• Will start monitoring all discovered items for significant events & collect diagnostics when necessary

Standalone Mode Installation

1. Download latest version: Doc 1513912.1

2. Transfer zip to required machine

3. Unzip

4. Execute installTFA-<platform> self extracting install script with the "-extractto <path>" keyword as the Oracle Software Owner

./installTFA-<platform> -extractto <path>


TFA Command Interfaces

Command line

• Specify all command options at the command line

tfactl <command>

Shell

1. Set and change context

2. Run commands from within the shell

tfactl

tfactl > database MyDB

MyDB tfactl > oratop

Menu

• Select menu navigation options then choose the command you want to run

tfactl menu

REST

• Invoke commands over HTTPS

tfactl rest -start

https://host:port/ords/{api}


On-Demand Diagnostic Collection with TFA

Standard Diag Collection

1. Run

tfactl diagcollect -last <n><d>|<h>

OR

tfactl diagcollect -from <date> -to <time>

OR

tfactl diagcollect

2. Upload resulting zip file to SR

Targeted Diag Collection via SRDC

1. Run

tfactl diagcollect -srdc <srdc>

2. Upload resulting zip file to SR

Note: List of SRDCs can be found under "Full List of SRDCs" (Slide 20)
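
The four command forms above can also be assembled by a script. The following is a minimal Python sketch, not part of TFA: the helper name `diagcollect_args` is hypothetical, it assumes `tfactl` is on the PATH, and it uses only the flags shown on this slide (`-last`, `-from`, `-to`, `-srdc`). It builds the argument list and runs nothing.

```python
from typing import List, Optional


def diagcollect_args(last: Optional[str] = None,
                     start: Optional[str] = None,
                     end: Optional[str] = None,
                     srdc: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the diagcollect forms shown above.

    Hypothetical helper: it only assembles arguments and executes nothing.
    """
    args = ["tfactl", "diagcollect"]
    # This sketch mirrors the slide: exactly one form per invocation.
    chosen = [last is not None,
              start is not None or end is not None,
              srdc is not None]
    if sum(chosen) > 1:
        raise ValueError("this sketch builds one form at a time")
    if srdc is not None:
        # Targeted collection: tfactl diagcollect -srdc <srdc>
        return args + ["-srdc", srdc]
    if last is not None:
        # Relative window, e.g. "12h" or "2d"
        return args + ["-last", last]
    if start is not None or end is not None:
        if start is None or end is None:
            raise ValueError("this sketch expects -from and -to together")
        return args + ["-from", start, "-to", end]
    # No options: plain tfactl diagcollect
    return args


print(diagcollect_args(last="12h"))
print(diagcollect_args(srdc="ora600"))
```

The resulting list could be passed to `subprocess.run` on a host where TFA is installed; keeping the arguments as a list avoids shell quoting problems with dates that contain spaces.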


TFA Diagnostic Collection Example

[grid@cehaovmsp1079 ~]$ tfactl diagcollect -from "Sep/24/2017 06:00:00" -to "Sep/24/2017 18:00:00"

Collecting data for all nodes

Scanning files from Sep/24/2017 06:00:00 to Sep/24/2017 18:00:00

. . .

Collection Id : 20170925125014cehaovmsp1079

Detailed Logging at :

/u01/app/grid/tfa/repository/collection_Mon_Sep_25_12_50_14_EDT_2017_node_all/diagcollect_20170925125014_cehaovmsp1079.log

2017/09/25 12:51:10 EDT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom

2017/09/25 12:51:10 EDT : Collection Name : tfa_Mon_Sep_25_12_50_14_EDT_2017.zip

2017/09/25 12:51:11 EDT : Collecting diagnostics from hosts : [cehaovmsp1079, cehaovmsp1080]

2017/09/25 12:51:11 EDT : Scanning of files for Collection in progress...

2017/09/25 12:51:11 EDT : Collecting additional diagnostic information...

2017/09/25 12:51:21 EDT : Getting list of files satisfying time range [09/24/2017 06:00:00 EDT, 09/24/2017 18:00:00 EDT]

2017/09/25 12:51:31 EDT : Collecting ADR incident files...

2017/09/25 12:52:56 EDT : Completed collection of additional diagnostic information...

2017/09/25 12:53:11 EDT : Completed Local Collection

2017/09/25 12:53:12 EDT : Remote Collection in Progress...

.-----------------------------------------.
|           Collection Summary            |
+---------------+-----------+------+------+
| Host          | Status    | Size | Time |
+---------------+-----------+------+------+
| cehaovmsp1080 | Completed | 64MB | 109s |
| cehaovmsp1079 | Completed | 76MB | 120s |
'---------------+-----------+------+------'

Logs are being collected to: /u01/app/grid/tfa/repository/collection_Mon_Sep_25_12_50_14_EDT_2017_node_all
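
If the Collection Summary needs to be checked from a script, for example to confirm that every host reports Completed, the table can be read line by line. This is a minimal Python sketch that assumes the four-column layout shown above; `parse_collection_summary` is a hypothetical name, not a TFA function.

```python
def parse_collection_summary(text: str) -> list:
    """Extract host rows from a Collection Summary table.

    Assumes the Host / Status / Size / Time layout shown in the example.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with '.', '+' or a quote
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "Host":
            continue  # skip the title row and the column header row
        host, status, size, elapsed = cells
        rows.append({"host": host, "status": status,
                     "size": size, "time": elapsed})
    return rows


sample = """
| Collection Summary |
| Host | Status | Size | Time |
| cehaovmsp1080 | Completed | 64MB | 109s |
| cehaovmsp1079 | Completed | 76MB | 120s |
"""
rows = parse_collection_summary(sample)
print(all(row["status"] == "Completed" for row in rows))
```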


TFA Diagnostic Collection Example (continued)

2017/09/25 12:51:10 EDT : Collection Name : tfa_Mon_Sep_25_12_50_14_EDT_2017.zip

2017/09/25 12:51:11 EDT : Collecting diagnostics from hosts : [cehaovmsp1079, cehaovmsp1080]

2017/09/25 12:51:11 EDT : Sending diagcollect request to host : cehaovmsp1080

2017/09/25 12:51:11 EDT : Scanning of files for Collection in progress...

2017/09/25 12:51:11 EDT : Collecting additional diagnostic information...

2017/09/25 12:51:21 EDT : Getting list of files satisfying time range [09/24/2017 06:00:00 EDT, 09/24/2017 18:00:00 EDT]

2017/09/25 12:51:21 EDT : Starting Thread to identify stored files to collect

2017/09/25 12:51:21 EDT : Getting List of Files to Collect

. . .

2017/09/25 12:51:25 EDT : Trimming file : cehaovmsp1079/diag/crs/cehaovmsp1079/crs/trace/alert.log with original file size:5MB

. . .

2017/09/25 12:51:31 EDT : Collecting ADR incident files...

2017/09/25 12:51:31 EDT : Waiting for collection of additional diagnostic information

2017/09/25 12:52:56 EDT : Completed collection of additional diagnostic information...

2017/09/25 12:53:11 EDT : Completed Zipping of all files

2017/09/25 12:53:11 EDT : Finalizing the Collection Zip File

2017/09/25 12:53:11 EDT : Finished Finalizing the Collection Zip File

2017/09/25 12:53:11 EDT : Total Number of Files checked : 4991

2017/09/25 12:53:11 EDT : Total Size of all Files Checked : 2.4GB

2017/09/25 12:53:11 EDT : Number of files containing required range : 576

2017/09/25 12:53:11 EDT : Total Size of Files containing required range : 1.3GB

2017/09/25 12:53:11 EDT : Number of files trimmed : 38

2017/09/25 12:53:11 EDT : Total Size of data prior to zip : 861MB

2017/09/25 12:53:11 EDT : Saved 1GB by trimming files

2017/09/25 12:53:11 EDT : Zip file size : 76MB

2017/09/25 12:53:11 EDT : Total time taken : 120s
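
The closing statistics in this log follow a `timestamp : label : value` pattern, so they are easy to pull into a script. The Python sketch below is illustrative only: the function names are hypothetical, and treating 1GB as 1024MB is an assumption about how the sizes are reported. It computes how much zipping reduced the data in the example.

```python
import re


def parse_fields(log: str) -> dict:
    """Collect 'timestamp : label : value' lines into a label -> value dict."""
    fields = {}
    for line in log.splitlines():
        parts = [part.strip() for part in line.split(" : ")]
        if len(parts) == 3:
            _, label, value = parts
            fields[label] = value
    return fields


# Assumption: binary units (1GB = 1024MB); the log does not say which it uses.
_UNITS_IN_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def size_in_mb(text: str) -> float:
    """Convert a size such as '861MB' or '2.4GB' to megabytes."""
    match = re.fullmatch(r"([\d.]+)\s*(KB|MB|GB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS_IN_MB[match.group(2)]


sample = """\
2017/09/25 12:53:11 EDT : Completed Zipping of all files
2017/09/25 12:53:11 EDT : Total Size of data prior to zip : 861MB
2017/09/25 12:53:11 EDT : Zip file size : 76MB
"""
fields = parse_fields(sample)
ratio = size_in_mb(fields["Total Size of data prior to zip"]) / size_in_mb(fields["Zip file size"])
print(f"compression ratio: {ratio:.1f}x")
```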


SRDC Example

• For certain types of problems Oracle Support will ask you to run a Service Request Data Collection (SRDC)

• Without TFA this would be a manual process involving:

– Reading many different support documents

– Collecting output from many different tasks

– Gathering lots of different diagnostics

– Packaging & uploading

• With TFA simply run:

tfactl diagcollect -srdc <srdc>

[oracle@cehaovmsp1079 ~]$ /u01/app/12.2.0/grid/bin/tfactl diagcollect -srdc ora600

Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :

Enter the Database Name [<RETURN>=ALL] :

1. Sep/25/2017 13:21:04 : [orcl] ORA-00600: internal error code, arguments: [dbketest_assert], [], [], [], [], [], [], [], [], [], [], []

Please choose the event : 1-1 [1] 1

Selected value is : 1 ( Sep/25/2017 13:21:04 )

Scripts to be run by this srdc: ipspack rdahcve1210 rdahcve1120 rdahcve1110

Components included in this srdc: OS CRS DATABASE NOCHMOS

Collecting data for local node(s)

Scanning files from Sep/25/2017 07:21:04 to Sep/25/2017 19:21:04

Collection Id : 20170925132228cehaovmsp1079

. . .

2017/09/25 13:22:54 EDT : Collecting ADR incident files...

2017/09/25 13:23:45 EDT : Completed collection of additional diagnostic information...

2017/09/25 13:23:49 EDT : Completed Local Collection

.-----------------------------------------.

| Collection Summary |

+---------------+-----------+------+------+

| Host | Status | Size | Time |

+---------------+-----------+------+------+

| cehaovmsp1079 | Completed | 14MB | 76s |

'---------------+-----------+------+------'

Logs are being collected to: /u01/app/grid/tfa/repository/srdc_ora600_collection_Mon_Sep_25_13_22_28_EDT_2017_node_local


Full List of SRDCs

Type of Problem | SRDC Types
ORA Errors | ORA-00020, ORA-00060, ORA-00600, ORA-00700, ORA-01031, ORA-01555, ORA-01578, ORA-01628, ORA-04030, ORA-04031, ORA-07445, ORA-08102, ORA-08103, ORA-27300, ORA-27301, ORA-27302, ORA-29548, ORA-30036
Database performance | dbperf, dbsqlperf
Database resource | dbunixresources
Other internal database errors | internalerror
Database patching | dbpatchinstall, dbpatchconflict
Transparent Data Encryption (TDE) problems | dbtde
Database Export | dbexp, dbexpdp, dbexpdpapi, dbexpdpperf, dbexpdptts
Database Import | dbimp, dbimpdp, dbimpdpperf
RMAN | dbrman, dbrman600, dbrmanperf
System change number | dbscn
GoldenGate | dbggclassicmode, dbggintegratedmode
Database install / upgrade | dbinstall, dbupgrade, dbpreupgrade
Database storage | dbasm
Corrupt block relative dba | dbblockcorruption
ASM/DBFS/DNFS/ACFS | dnfs
Partition problems | dbpartition
Slow partitioned table/index commands | dbpartitionperf
SQL performance | dbsqlperf
UNDO corruption | dbundocorruption
Exalogic | esexalogic
Listener errors | listener_services
Naming service errors | naming_services
Database Auditing | dbaudit
Excessive SYSAUX Space | dbawrspace
Database resources | dbunixresources
Database startup / shutdown | dbshutdown, dbstartup
XDB | dbxdb
Data Guard | dbdataguard
Enterprise Manager tablespace usage metric | emtbsmetrics
EM general metrics | emmetricalert
EM debug log collection | emdebugon, emdebugoff
EM target discovery | emcliadd, emclusdisc, emdbsys, emgendisc, emprocdisc
EM OMS restart | emrestartoms
EM Agent performance | emagentperf
EM crash | emomscrash
EM java heap usage or performance | emomsheap
EM OMS crash, restart or performance | emomshungcpu


ORAchk and EXAchk

Validating and Maintaining Best Practices to Avoid Rediscovery of Known Issues


ORAchk & EXAchk

Oracle Stack Health Checks

EXAchk is for Engineered Systems, ORAchk is for ODA & Everything Else

• Proactive self-service method for customers to perform Health Checks on their Engineered, RAC and Single Instance systems

• Checks Driven by Best Practices and Success Factors generated from Real-World Customer Experiences

• Pre-Upgrade Validation for 11.2.0.3+ upgrades

• Ability to Create User Defined Checks

• Fully Integrated into TFA

– Health Checks Automatically Run in TFA 18.2 and Above (Linux and Solaris, non-Engineered Systems Only)

• Can be configured to send email notifications when it detects problems

• Documentation and Standalone Versions:

– ORAchk: Doc 1268927.2

– EXAchk: Doc 1070954.1


ORAchk & EXAchk – Easy to Execute

• Ensure the latest version of TFA is installed - Doc 1513912.1

• Open TFA Shell

• Execute orachk or exachk

• Follow the prompts

Execution times vary based on size of the cluster, installed products being checked and number of resources!

[oracle@cehaovmsp1145 ~]$ /u01/app/12.1.0/grid/bin/tfactl

tfactl> orachk

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.1.0/grid?[y/n][y]y

Checking ssh user equivalency settings on all nodes in cluster

Node cehaovmsp1146 is configured for ssh user equivalency for oracle user

Node cehaovmsp1147 is configured for ssh user equivalency for oracle user

Node cehaovmsp1148 is configured for ssh user equivalency for oracle user

Searching for running databases . . . . .

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

----------------------------------------------------------------------------------
Oracle Stack Status
----------------------------------------------------------------------------------
Host Name      CRS Installed  RDBMS Installed  CRS UP  ASM UP  RDBMS UP  DB Instance
----------------------------------------------------------------------------------
cehaovmsp1145  Yes            Yes              Yes     Yes     Yes       ORCL1
cehaovmsp1146  Yes            Yes              Yes     Yes     Yes       ORCL2
cehaovmsp1147  Yes            Yes              Yes     Yes     Yes       ORCL3
cehaovmsp1148  Yes            Yes              Yes     Yes     Yes       ORCL4
----------------------------------------------------------------------------------

148 of the included audit checks require root privileged data collection. If sudo is not configured or the root password is not available, audit checks which require root privileged data collection can be skipped.

1. Enter 1 if you will enter root password for each host when prompted

2. Enter 2 if you have sudo configured for oracle user to execute root_orachk.sh script

3. Enter 3 to skip the root privileged collections

4. Enter 4 to exit and work with the SA to configure sudo or to arrange for root access and run the tool later.

Please indicate your selection from one of the above options for root access[1-4][1]:- 1

*** Checking Best Practice Recommendations (PASS/WARNING/FAIL) ***


ORAchk & EXAchk – How to Help Yourself…

• Easy-to-Read HTML Report

• System Health Score

• ALL Findings Documented with Hyper-linked References

• Proactive Patch Recommendations

• Report Compare and Merge Functionality

• JSON output available for consumption by 3rd party reporting tools

• Integration with EM via the Compliance Framework


Health Check Collection Manager

• Companion APEX Application to ORAchk and EXAchk

• Central Repository for ORAchk Collections

• Dashboard Interface to Track ORAchk Collections

– Trending of Findings Over Time

– Automatic Result Comparison

• Incident Tracking System

• UI for Authoring User Defined Checks

• Application installation file (CollectionManager_App.sql) is distributed with ORAchk - Requires Apex 5


TFA Utilities To Detect and Analyze Issues


TFA Utilities To Detect and Analyze Issues

tfactl <tool>

Tool | Description
ORAchk or EXAchk | Provides health checks for the Oracle stack. Oracle Trace File Analyzer will install EXAchk for Engineered Systems (see document 1070954.1 for more details) and ORAchk for all non-Engineered Systems (see document 1268927.2 for more details)
OSWatcher | Collects and archives OS metrics. These are useful for instance or node evictions & performance issues. See document 301137.1 for more details
oratop | Provides near real-time database monitoring. See document 1500864.1 for more details
alertsummary | Provides summary of events for one or more database or ASM alert files from all nodes
ls | Lists all files TFA knows about for a given file name pattern across all nodes
pstack | Generates process stack for specified processes across all nodes
grep | Searches alert or trace files with a given database and file name pattern, for a search string
summary | Provides high level summary of the configuration


TFA Utilities To Detect and Analyze Issues (continued)

Tool | Description
vi | Opens alert or trace files for a given database and file name pattern in the vi editor
tail | Runs a tail on alert or trace files for a given database and file name pattern
param | Shows all database and OS parameters that match a specified pattern
dbglevel | Sets and unsets multiple CRS trace levels with one command
history | Shows the shell history for the tfactl shell
changes | Reports changes in the system setup over a given time period. This includes database parameters, OS parameters and patches applied
calog | Reports major events from the Cluster Event log
events | Reports warnings and errors seen in the logs
managelogs | Shows disk space usage and purges ADR log and trace files
ps | Finds processes
triage | Summarizes oswatcher/exawatcher data


TFA Tools Execution

• Each tool can be run using TFA Shell

• Start tfactl shell with

tfactl

[oracle@cehaovmsp1079 ~]$ tfactl

tfactl>

• Run a tool with the tool name

tfactl > orachk

1. Where necessary set context with database <dbname>

tfactl > database MyDB

2. Then run tool

MyDB tfactl > oratop

3. Clear context with database

MyDB tfactl > database


Use 'tfactl' to check for recent Errors

bash-4.1# tfactl events

Output from host : myserver69

INFO :2

ERROR :2

WARNING :0

Event Timeline:

[Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: Incident details in: /scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/incident/incdir_102702/ogg112041_ora_5001_i102702.trc

[Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ksprcvsp2], [1596993584], [], [], [], [], [], [], [], [], [], []

[Oct/18/2018 02:38:37.000]: [db.ogg11204.ogg112041]: Incident details in: /scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/incident/incdir_102703/ogg112041_ora_5001_i102703.trc

[Oct/18/2018 02:38:37.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817], [1024], [1015816], [], [], [], [], [], [], [], []
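
When the event timeline is long, it can help to tally the ORA errors it contains. The Python sketch below is a rough illustration that assumes each timeline line has the `[timestamp]: [source]: message` shape shown above; the function names are hypothetical and the parsing is not part of TFA.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the sample output above:
# [Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: <message>
EVENT_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: \[(?P<source>[^\]]+)\]: (?P<message>.*)$")
ORA_CODE = re.compile(r"ORA-\d{5}")


def parse_events(output: str) -> list:
    """Turn timeline lines into dicts; lines of any other shape are skipped."""
    events = []
    for line in output.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match is None:
            continue
        events.append({
            "time": datetime.strptime(match.group("ts"),
                                      "%b/%d/%Y %H:%M:%S.%f"),
            "source": match.group("source"),
            "message": match.group("message"),
        })
    return events


def count_ora_errors(events: list) -> Counter:
    """Count ORA-nnnnn codes that appear in event messages."""
    return Counter(code
                   for event in events
                   for code in ORA_CODE.findall(event["message"]))


sample = """\
Event Timeline:
[Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ksprcvsp2], [1596993584], [], [], [], [], [], [], [], [], [], []
[Oct/18/2018 02:38:37.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817], [1024], [1015816], [], [], [], [], [], [], [], []
"""
print(count_ora_errors(parse_events(sample)))
```

Month names are parsed with `%b`, so this sketch assumes an English locale on the machine doing the parsing.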


Check to see if a change may have caused the issue

-bash-4.1# tfactl changes

Output from host : myserver69

------------------------------

[Oct/17/2018 04:54:15.397]: [RDBMS.myDB1]: Parameter: parallel_max_servers: Value: 8 => 16

[Oct/17/2018 05:12:13.344]: [RDBMS.myDB1]: Parameter: log_archive_dest_1: Value: /var => /opt
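
To line changes up against the time of an error, the parameter-change lines can be split into their parts. This is a minimal Python sketch assuming the `Parameter: <name>: Value: <old> => <new>` shape shown in the sample; `parse_changes` is a hypothetical helper, and other kinds of change line (such as patches) are simply skipped.

```python
import re

# Assumed line shape, taken from the sample output above:
# [Oct/17/2018 04:54:15.397]: [RDBMS.myDB1]: Parameter: <name>: Value: <old> => <new>
CHANGE_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: \[(?P<target>[^\]]+)\]: "
    r"Parameter: (?P<name>[^:]+): Value: (?P<old>.*?) => (?P<new>.*)$")


def parse_changes(output: str) -> list:
    """Return one dict per parameter-change line; other lines are ignored."""
    changes = []
    for line in output.splitlines():
        match = CHANGE_LINE.match(line.strip())
        if match is not None:
            changes.append(match.groupdict())
    return changes


sample = """\
Output from host : myserver69
------------------------------
[Oct/17/2018 04:54:15.397]: [RDBMS.myDB1]: Parameter: parallel_max_servers: Value: 8 => 16
[Oct/17/2018 05:12:13.344]: [RDBMS.myDB1]: Parameter: log_archive_dest_1: Value: /var => /opt
"""
for change in parse_changes(sample):
    print(change["ts"], change["name"], change["old"], "->", change["new"])
```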


Proactively Addressing Database Problems


ORAchk/EXAchk email Notification

• Automatically started & configured to run Critical Health Checks

• You only need to configure your email for notification

tfactl orachk/exachk -set “[email protected]


ORAchk/EXAchk Report


Configure Diagnostic Collection email Notification

• Set notification email for any problem detected:

tfactl set [email protected]

• To set notification email for specific ORACLE_HOMEs include the OS home owner:

tfactl set notificationAddress=oracle:[email protected]


Event Notification


Problem Resolution Automation in MOS


Problem Resolution Automation in MOS: Support Engineer View

Problem Resolution Automation in MOS: Support Engineer View

The Future of TFA

TFA User Interface

• TFA Collector will upload to a central repository

• TFA UI analyses the files and generates:

– Events timeline

– Anomaly timeline using applied machine learning

– Root cause analysis and recommendations, where available

– An interface to easily access all files and analyser reports

• Already used in Oracle Database Cloud

• Coming on-premises in 19c

TFA User Interface

Central Self Analysis of an Instance eviction

Database Writes Error to Alert Log

Errors in file /cdb1810_1/trace/cdb1810_1_lmhb_16476.trc (incident=11791) (PDBNAME=CDB$ROOT):

ORA-29770: global enqueue process LMS0 (OSID 16458_16462) is hung for more than 70 seconds

Incident details in: /cdb1810_1/incident/incdir_11791/cdb1810_1_lmhb_16476_i11791.trc

LMHB (ospid: 16476): terminating the instance due to ORA error 29770

Cause - 'ERROR: Some process(s) is not making progress.

LMHB (ospid: 16476) is terminating the instance.

ERROR: Some process(s) is not making progress.

'System state dump requested by (instance=1, osid=16476 (LMHB)), summary=[abnormal instance termination]. error - 'Instance is terminating.'

System State dumped to trace file /cdb1810_1/trace/cdb1810_1_diag_16435_20181016124352.trc

Instance terminated by USER, pid = 30173

Starting ORACLE instance (normal) (OS id: 30489)

TFA Detects and Generates a Collection

[INFO] [Thread-328-4437] addCriticalEvent: Thu Oct 18 14:37:13 PDT 2018:/cdb1810_1/trace/alert_cdb1810_1.log ORA-29770: global enqueue process LCK0 (OSID 26445) is hung for more than 70 seconds

[INFO] [Thread-2801-9133] Sleeping for 5 minutes to collect events

[INFO] [Thread-328-4437] addCriticalEvent: Thu Oct 18 14:37:29 PDT 2018:/cdb1810_1/trace/alert_cdb1810_1.log System State dumped to trace file /cdb1810_1/trace/cdb1810_1_diag_26239_20181018143729.trc

[INFO] [Thread-2801-9133] Processing event ORA-29770: global enqueue process LCK0 (OSID 26445) is hung for more than 70 seconds at .... Thu Oct 18 14:37:13 PDT 2018

[INFO] [Thread-2801-9133] Processing event System State dumped to trace file /cdb1810_1/trace/cdb1810_1_diag_26239_20181018143729.trc at .... Thu Oct 18 14:37:29 PDT 2018

[INFO] [Thread-2801-9133] Creating a composite event at Thu Oct 18 14:37:29 PDT 2018

[INFO] [Thread-2801-9133] compositeEvents: Thu Oct 18 14:37:29 PDT 2018

[INFO] [Thread-2801-9133] Done processing events
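The log above shows TFA waiting five minutes after the first critical event and then folding the events it saw into one composite event. The slides do not show how TFA implements this, so the following is only an illustrative sketch of that kind of time-window grouping. The `Event` class, the `group_into_composites` function and the rule "within five minutes of the first event in the group" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Event:
    time: datetime
    text: str


def group_into_composites(
    events: List[Event], window: timedelta = timedelta(minutes=5)
) -> List[List[Event]]:
    """Group events so that each group spans at most `window`,
    measured from the first event in that group."""
    composites: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.time):
        if composites and event.time - composites[-1][0].time <= window:
            composites[-1].append(event)  # still inside the open window
        else:
            composites.append([event])  # start a new composite event
    return composites


if __name__ == "__main__":
    # Timestamps copied from the log lines above (time zone omitted).
    events = [
        Event(datetime(2018, 10, 18, 14, 37, 13), "ORA-29770: global enqueue process LCK0 ..."),
        Event(datetime(2018, 10, 18, 14, 37, 29), "System State dumped to trace file ..."),
    ]
    for group in group_into_composites(events):
        print(len(group), "event(s) starting at", group[0].time)
```

Grouping this way means a hang and the system state dump it triggers produce one collection and one email rather than two.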

Email Received

Event: .*ORA-297(01|02|03|08|09|10|40|70|71).*

Event time: Thu Oct 18 14:37:13 PDT 2018

File containing event: /cdb1810_1/trace/alert_cdb1810_1.log

String containing event: ORA-29770: global enqueue process LCK0 (OSID 26445) is hung for more than 70 seconds

Logs will be collected at: /tfa/repository/collection_2018_10_18T13_37_13_node_myserver67

Event: .*System State dumped.*

Event time: Thu Oct 18 14:37:29 PDT 2018

File containing event: /cdb1810_1/trace/alert_cdb1810_1.log

String containing event: System State dumped to trace file /cdb1810_1/trace/cdb1810_1_diag_26239_20181018143729.trc

Logs will be collected at: /tfa/repository/collection_2018_10_18T13_37_13_node_myserver67

Analysis available at https://myserver21/tfa.php?p_incident=2018_10_18T13_37_13_node_myserver67
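The `Event:` fields in the email are regular expressions matched against alert log lines. As a small sketch of how the two patterns above select lines, the snippet below applies them with Python's `re` module. TFA's own matching engine is not described in the slides, so using Python here is an assumption, and `matching_patterns` is a hypothetical helper name.

```python
import re
from typing import List

# The two event patterns exactly as they appear in the email above.
EVENT_PATTERNS = [
    r".*ORA-297(01|02|03|08|09|10|40|70|71).*",
    r".*System State dumped.*",
]


def matching_patterns(line: str) -> List[str]:
    """Return every event pattern that matches the given log line."""
    return [p for p in EVENT_PATTERNS if re.match(p, line)]


if __name__ == "__main__":
    lines = [
        "ORA-29770: global enqueue process LCK0 (OSID 26445) is hung for more than 70 seconds",
        "System State dumped to trace file /cdb1810_1/trace/cdb1810_1_diag_26239_20181018143729.trc",
        "Starting ORACLE instance (normal) (OS id: 30489)",
    ]
    for line in lines:
        print(bool(matching_patterns(line)), "-", line)
```

The first pattern covers a fixed set of ORA-297xx codes, so an error outside that set does not match it.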

Dashboard

ORAchk/EXAchk results are automatically uploaded to TFA & automatically processed

Maintenance Slot Identification

Maintaining TFA

Maintaining TFA

Upgrade to the latest version whenever possible to include bug fixes, new features & optimizations

• Option 1 (Preferred)

– To update with latest TFA & TFA Tools Bundle:

1. Download latest version: Doc 1513912.1

2. Transfer zip to required machine

3. Unzip

4. Execute installTFA-<platform> self-extracting install script as root user

– TFA will find and update the existing installation

• Option 2

– Applying RUs, RURs or PSUs will automatically update TFA
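For step 4 of Option 1, the installer name depends on the platform. As a rough sketch, the helper below locates the single `installTFA-*` script in the directory where the zip was unpacked and returns the command to run. The function name `installer_command` and the directory argument are assumptions. The helper does not execute anything, since the slide says the script must be run as root.

```python
import glob
import os
from typing import List


def installer_command(extract_dir: str) -> List[str]:
    """Return the command for the installTFA-<platform> script found in
    extract_dir. Raises if none, or more than one, is present."""
    candidates = sorted(glob.glob(os.path.join(extract_dir, "installTFA-*")))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one installTFA-* script in {extract_dir!r}, "
            f"found {len(candidates)}"
        )
    return [candidates[0]]


if __name__ == "__main__":
    import sys

    # Usage: python find_installer.py <directory where the zip was unpacked>
    print(" ".join(installer_command(sys.argv[1] if len(sys.argv) > 1 else ".")))
```

Failing when several installers are present avoids picking a script left over from an older download.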

Help Us Help You!

Follow through with the best practices outlined in today's presentation to:

– Increase efficiency of admin staff - TFA and ORAchk/EXAchk

– Decrease resolution time on reactive issues - TFA

– Decrease the number of reactive issues - ORAchk/EXAchk

– Increase system stability, reliability, and performance - TFA and ORAchk/EXAchk

– Decrease complexity and eliminate problems with Grid Infrastructure and RAC upgrades - ORAchk Pre-Upgrade
