Real World Mission Critical Database Monitoring at AT&T with Oracle Enterprise Manager

Oracle Open World – 2010

Presented by

• Venkat Tekkalur – Principal Technical Architect

• Prem Venkatasamy – Director IT

© 2010 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.

Agenda

Company Profile

Challenges

Requirements

Approach

Infrastructure

EM Implementation Details

Benefits

Common Commands

Q/A


AT&T

•AT&T is a leading provider of wireless, Wi-Fi, high speed Internet, and voice services

•90.1 million wireless subscribers

•More than 129,000 Wi-Fi hotspots around the globe

•The nation’s fastest mobile broadband network

•AT&T’s global network handles nearly 19 petabytes of traffic on an average business day

•2.5 million AT&T U-verse TV subscribers

•100 percent of Fortune 1000 companies are AT&T customers

•In 2010, again ranked among Fortune’s 50 Most Admired Companies

•Global headquarters located in Dallas, Texas

AT&T DBA Team

One of the DBA support teams at AT&T managing databases.

• 2000+ Oracle DBs
• Multiple versions
• Features
  • RAC
  • Data Guard
  • Golden Gate
  • Streams
  • Flashback
• 60+ DBAs
• Multiple sub teams


Challenges

Database Management and Diagnostics: Wide range of ad hoc tools in use. Management complexity.

Database Monitoring: Multiple home-grown custom monitoring solutions developed over the years.

Database Scripts: Complexity with script rollout, updates and version changes.

Database Version Complexity: Hard to keep up with changing data dictionary views in newer Oracle versions.

Database Diagnostics: Growing performance and availability requirements for our databases and existing tools cannot keep up with them.

New DB Features Support Complexity: Supporting new DB features involved creating scripts, custom monitoring solutions and building tools.


Requirements We Set for EM

Provide Ease of Database Management
• Perform all database management duties using the tool
• Manage new database features with ease

Database Troubleshooting and Performance Tuning
• One common tool for enterprise to troubleshoot database performance issues

Monitor All Databases
• Provide a monitoring solution that is easy to manage and will scale well to meet all of our requirements

Database Build Automation
• Ability to provision Oracle database software and automate the database build process


Approach: Road to EM 10.2.0.4

POC
• Grid stability
• Agent scalability
• EM monitoring capabilities

Design & Development
• Design EM solution
• Develop custom solutions for EM agent deployment, availability and additional metrics for monitoring

Production Implementation
• EM with DR implementation
• Agent and monitoring deployment


Proof of Concept Findings

EM POC results showed that EM 10g can meet our requirements, but custom work was still needed in the following areas:

• Agent mass deployment
• Agent availability (automatic start/stop)
• User defined metrics to plug monitoring gaps
• Automate target configuration to the appropriate DBA teams


POC Key Decisions

• Deploy the latest EM version available at that time, which was 10.2.0.4. Work with Oracle to identify all the patches required for a stable environment.

• Automate agent deployment using the cloning technique.

• Since agent availability is critical for monitoring, develop scripts to auto start/stop the agent during server reboots and database failovers, and to restart agents when they are down for other reasons.

• Add additional monitoring using user defined metrics to meet our requirements. Define and deploy monitoring metrics through templates.

• Use EM groups for managing target ownership. Develop a custom process using EMCLI to manage groups based on our internal database inventory data.
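The last decision, keeping EM group membership in line with an internal inventory, can be pictured as a simple diff between desired and current membership. The sketch below is only an illustration in Python: the inventory row layout, the group names and the function name are hypothetical, and the slides do not show the actual process. The resulting add/remove lists would then be applied with EMCLI.

```python
from collections import defaultdict

def plan_group_changes(inventory, current_groups):
    """Return {group: (targets_to_add, targets_to_remove)}.

    inventory: iterable of (target_name, owning_team) rows (hypothetical layout).
    current_groups: {group_name: set of target names currently in that EM group}.
    """
    desired = defaultdict(set)
    for target, team in inventory:
        desired[team].add(target)

    plan = {}
    for group in sorted(set(desired) | set(current_groups)):
        want = desired.get(group, set())
        have = current_groups.get(group, set())
        to_add, to_remove = sorted(want - have), sorted(have - want)
        if to_add or to_remove:
            plan[group] = (to_add, to_remove)
    return plan

# Illustrative data only.
inventory = [("ORDERS1", "dba_team_a"), ("BILLING1", "dba_team_b"), ("ORDERS2", "dba_team_a")]
current = {"dba_team_a": {"ORDERS1", "LEGACY9"}, "dba_team_b": set()}
plan = plan_group_changes(inventory, current)
```

Computing the plan separately from applying it makes it possible to review the changes before any group is modified.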


EM Implementation – Time Capsule

• Three major phases

• Proof of concept in 2007

• Production deployment in 2008

• Monitoring implementation in 2009


Key EM Features Used

Oracle Software Cloning
• Deploy a standard and fully patched EM agent software across all grid targets.

EM Groups
• Target ownership, pushing out monitoring templates, notification, dashboards, ease of management

Monitoring Templates
• Target monitoring metrics and policies management
• Used in conjunction with groups

Notification
• OS & SNMP notification methods to page/email alerts out to the appropriate recipients within the DBA team
• Email repeat notification feature

UDMs and UDPs
• User Defined Metrics and User Defined Policies are used to meet monitoring needs related to database administration, performance, backups, Golden Gate, compliance programs like SOX, PCI

EMCLI Commands
• Extensive use of EMCLI commands for target configuration, EM group and template management, agent management, password changes


Agent Installation and Configuration - Implementation

Agent Install
• Copy the agent clone software
• Run the runInstaller command to clone the agent home

Target Configuration
• agentca -f command to discover targets
• emcli modify_target command to set the password

Setup Monitoring
• Push appropriate metrics using monitoring templates based on the target type
• emcli apply_template command to push templates

Configure EM Groups
• Add the newly added targets to the appropriate EM groups and EM roles
• emcli modify_group and emcli grant_privs commands to configure groups and roles


EM Agents – Key for a Successful Implementation

EM Agents are set up to meet the following requirements:

Performance
• Monitor agent operations (trace files, log files) from time to time. Review metric collection errors and "metrics extending beyond interval" errors. Clean up agent log files on a periodic basis.

Stability
• Standardization of agent software. Only 10.2.0.4 and above versions with Oracle recommended patches are deployed in our environment.

Availability
• Auto start/stop script integrated with VCS cluster software where applicable
• Tracking agent non-availability through EM repository views and starting agents on demand
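The availability tracking mentioned above can be pictured as a staleness check on the last time each agent reported in. This is a minimal Python sketch that assumes the rows have already been fetched from the repository; the 15-minute cutoff, the agent names and the function name are illustrative assumptions, not values from the presentation.

```python
from datetime import datetime, timedelta

def agents_needing_restart(last_upload_by_agent, now, max_silence=timedelta(minutes=15)):
    """Return the names of agents silent for longer than max_silence, sorted."""
    return sorted(
        agent
        for agent, last_upload in last_upload_by_agent.items()
        if now - last_upload > max_silence
    )

# Illustrative data only: agent name -> time of last upload.
now = datetime(2010, 9, 20, 12, 0)
rows = {
    "hostA:3872": datetime(2010, 9, 20, 11, 58),
    "hostB:3872": datetime(2010, 9, 20, 11, 10),
}
stale = agents_needing_restart(rows, now)
```

The returned list would feed whatever mechanism starts agents on demand.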


Monitoring Solution Through EM – Key Components

Metrics & Policies
• Standard Metrics
• User Defined Metrics
• Metric Thresholds
• Policies
• User Defined Policies

Monitor
• Monitoring Templates
• Agents
• Targets (Database, Host, Listener)

Notify
• OMS
• Notification Rule
• Notification Methods

We used standard metrics, UDMs, custom metric thresholds, UDPs, monitoring templates, notification rules, and OS and SNMP notification methods for our monitoring solution.


Database Monitoring – Challenges with Out of the Box Metrics

Issues:

Metrics for conditions that were not appropriate for some of our databases

Metrics that produced too many alerts

Bugs with some metrics that are based on the database server generated alerts in 10g

Metrics that didn’t exist for conditions that are deemed as required for our environment

Solution:

Disable the metrics where possible. If there is a dependency on other metrics, then nullify the thresholds

Adjust thresholds, number of occurrences to reduce the quantity of alerts

Develop User Defined Metrics for missing monitoring conditions

Work with Oracle to resolve bugs. If not possible, work around the issue with UDMs
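To see why raising the number of occurrences reduces alert volume, consider a simplified model in which an alert fires only once a threshold has been exceeded on a given number of consecutive collections. The Python sketch below is an illustration of that idea, not a description of EM internals; the sample values and the function name are made up.

```python
def count_alerts(samples, threshold, occurrences):
    """Count alerts when an alert needs `occurrences` consecutive violations.

    One alert is raised per violating streak, at the moment the streak
    reaches the required length.
    """
    alerts = 0
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak == occurrences:
            alerts += 1
    return alerts

# Illustrative CPU utilization samples with two short spikes and one sustained one.
samples = [72, 95, 70, 96, 97, 98, 60, 91]
alerts_immediate = count_alerts(samples, threshold=90, occurrences=1)
alerts_sustained = count_alerts(samples, threshold=90, occurrences=3)
```

With these samples, requiring three consecutive violations suppresses the two isolated spikes and keeps only the sustained one.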


Database Monitoring – Standard and UDM Metrics Usage

Out of the Box Metrics
• Used for database, listener and host targets
• Only required metrics are used after thorough testing for reliability
• Thresholds, number of occurrences used. Frequency never adjusted as per Oracle recommendation
• Metrics used include: Availability, performance, alert log with special filters, space, RAC, data guard

User Defined Metrics
• UDMs are only used for database targets
• UDMs used for conditions that cannot be met with standard metrics. Most UDMs are against data dictionary views
• UDM thresholds, frequency are carefully determined to make sure we don’t impact agent and database target performance
• Metrics usage includes: Performance, custom lock monitoring, RMAN backups, scheduler jobs, table space and Golden Gate monitoring


Database Monitoring - User Defined Metrics – Key Usage Requirements

User Defined Metrics is a powerful EM feature that facilitates adding additional metrics to meet monitoring requirements.

Key Points about UDM

• A UDM can return at most two columns (key and value)
• In a two-column UDM the first column is the key
• A change of key triggers a clear notification for the previous key record and a new notification for the new key record
• A UDM can only be of one type (number or string), and the type is based on the value column
• UDMs require login credentials to the database
• Can be pushed through templates
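The key-change behaviour listed above is easy to overlook when designing a two-column UDM. The Python sketch below models it in a simplified way: each collection is a mapping of key to numeric value, and comparing two collections shows which keys would clear and which would raise. The key format, the threshold and the function name are hypothetical, and this is not how EM implements it internally.

```python
def diff_udm_rows(previous, current, threshold):
    """Compare two UDM collections given as {key: numeric value}.

    Returns (cleared_keys, raised_keys): keys that stopped violating the
    threshold (or disappeared) and keys that newly violate it.
    """
    prev_alerting = {key for key, value in previous.items() if value > threshold}
    curr_alerting = {key for key, value in current.items() if value > threshold}
    cleared = sorted(prev_alerting - curr_alerting)
    raised = sorted(curr_alerting - prev_alerting)
    return cleared, raised

# Illustrative case: a lock-wait UDM keyed by session, where the blocking
# session changes between two collections.
previous = {"SID=101": 900}
current = {"SID=245": 950}
cleared, raised = diff_udm_rows(previous, current, threshold=600)
```

Even though the condition persists, the changed key produces one clear and one new alert, so a stable key choice keeps notification volume down.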


Compliance: Standard and User Defined Policies Implementation

Compliance for security is becoming more and more important for database administrators. EM provides standard policies for security and the option to create custom ones using User Defined Policies.

Enable Policies Related to Security
• We reviewed the available metrics and chose the ones that met our requirements
• Use monitoring templates to enable and disable policies

Create User Defined Policies
• Built User Defined Policies to meet our internal security and SOX, PCI controls related to databases

Reports for Policy Violations
• Created custom reports for policy violations based on repository views to meet the requirements

About User Defined Policies
• 10.2.0.4 allows UDPs to be created using EM packages. 10.2.0.5 provides a GUI screen in EM for UDP creation
• Follows a two-step process: create the UDM first, then associate the UDM metadata to create the policy

Monitoring – Lessons Learned from Our Implementation

Plan Your Metrics
• Evaluate and use metrics that are applicable and meet your requirements
• Test your metrics and create baseline metric thresholds based on DB profile (batch, OLTP, mixed) and build monitoring templates based on them

Monitoring Templates
• Use monitoring templates to push out metrics to targets
• If only a few metrics require a change, consider creating a temporary template
• Build monitoring templates for each target type

User Defined Metrics
• Use User Defined Metrics for cases where standard metrics are not available
• Plan and develop your UDMs carefully, knowing all the restrictions on using them
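The idea of baseline thresholds per DB profile, with a temporary template for the few metrics that need to differ, can be sketched as a small data structure. Everything in this Python example is a placeholder: the metric names, the threshold numbers and the function name are assumptions made for illustration and do not come from the presentation.

```python
# Placeholder baselines: profile -> metric -> (warning, critical).
BASELINE_THRESHOLDS = {
    "oltp": {"cpu_util_pct": (80, 90), "tablespace_used_pct": (85, 95)},
    "batch": {"cpu_util_pct": (95, 99), "tablespace_used_pct": (85, 95)},
    "mixed": {"cpu_util_pct": (90, 95), "tablespace_used_pct": (85, 95)},
}

def build_template(profile, overrides=None):
    """Return {metric: {"warning": w, "critical": c}} for a DB profile.

    `overrides` models a temporary template that changes only a few metrics.
    """
    if profile not in BASELINE_THRESHOLDS:
        raise ValueError(f"unknown profile: {profile}")
    merged = dict(BASELINE_THRESHOLDS[profile])
    merged.update(overrides or {})
    template = {}
    for metric, (warning, critical) in merged.items():
        if warning >= critical:
            raise ValueError(f"{metric}: warning must be below critical")
        template[metric] = {"warning": warning, "critical": critical}
    return template

batch_template = build_template("batch")
temporary_template = build_template("oltp", {"cpu_util_pct": (70, 85)})
```

Keeping the baselines in one place means a temporary override never changes the standard template for the profile.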


Monitoring – Lessons Learned from Our Implementation

Metric Collection Errors
• Remember to track and get notified for metric collection errors. They happen for various reasons (password issues, collection running a long time, bug), and failing to rectify them could result in monitoring failures
• Query the repository if required to identify these metric collection errors, if tracking them through EM screens is not an option

Monitoring and Notification
• Monitoring and notification are independent in EM. You can monitor all, but notify only a few. Leverage this feature effectively

Repository Views
• There are repository views that can help provide all the metric information. This feature comes in handy when there is a requirement to compare and validate metrics across a large number of targets
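Comparing and validating metrics across many targets amounts to checking each target's settings against an expected baseline. The Python sketch below assumes the threshold rows have already been pulled from the repository into tuples; the row layout, metric names and function name are hypothetical and are not taken from the slides.

```python
def find_threshold_drift(rows, expected):
    """Return sorted (target, metric) pairs whose thresholds differ from expected.

    rows: iterable of (target, metric, warning, critical) tuples.
    expected: {metric: (warning, critical)}.
    Metrics without an expected entry are ignored.
    """
    drift = []
    for target, metric, warning, critical in rows:
        if metric in expected and (warning, critical) != expected[metric]:
            drift.append((target, metric))
    return sorted(drift)

# Illustrative data only.
expected = {"tablespace_used_pct": (85, 95)}
rows = [
    ("ORDERS1", "tablespace_used_pct", 85, 95),
    ("BILLING1", "tablespace_used_pct", 90, 97),
    ("BILLING1", "cpu_util_pct", 80, 90),
]
drifted = find_threshold_drift(rows, expected)
```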


EM Notification Implementation

Notification/alerting requires careful planning in the overall database monitoring strategy. Notification challenges we faced and how we solved them:

Problem: Alert Customization Limitations with Standard Email Notification
• 10.2.0.4 had very little customization available for the email notification method
• Our teams required more information about our databases from our inventory records, and this required customization
• Requirement to send alerts to different addresses based on warning and critical thresholds
• Cannot utilize the default schedule

Solution: Used the EM OS Notification method as the primary alerting mechanism.

1. The EM OS Notification method calls a script on our OMS servers, which in turn performs the alerting functionality. EM passes alert information as OS variables, and we deliver the alert with formatting, additional information, and to the appropriate recipients based on the target name.

2. In addition to OS Notification we also use SNMP traps and custom alerting from repository views to meet additional alerting requirements.

3. EM 10.2.0.5 and above provides a notification customization feature for the email method.

4. EM 10.2.0.5 and above provides repeat notification capability for all methods.
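The OS notification approach in point 1 can be sketched as a script that reads the alert details from its environment, enriches them from inventory data, and picks a recipient by severity. This Python sketch is an illustration only: the variable names TARGET_NAME, SEVERITY and MESSAGE are assumed here and should be checked against the EM documentation for your release, and the inventory table, addresses and function name are hypothetical.

```python
# Hypothetical inventory lookup: target name -> owning team and contacts.
INVENTORY = {
    "ORDERS1": {
        "team": "dba_team_a",
        "page": "team-a-pager@example.com",
        "mail": "team-a@example.com",
    },
}
DEFAULT_CONTACT = {
    "team": "unassigned",
    "page": "dba-oncall@example.com",
    "mail": "dba-all@example.com",
}

def build_alert(env):
    """Return (recipient, subject, body) for one alert.

    env: mapping of variable names to values, such as os.environ when the
    script is started by the notification method (names are assumptions).
    """
    target = env.get("TARGET_NAME", "unknown")
    severity = env.get("SEVERITY", "").strip().lower()
    contact = INVENTORY.get(target, DEFAULT_CONTACT)
    # Critical alerts go to the pager address, everything else to email.
    recipient = contact["page"] if severity == "critical" else contact["mail"]
    label = severity.upper() or "UNKNOWN"
    subject = f"[{label}] {target} ({contact['team']})"
    return recipient, subject, env.get("MESSAGE", "")

example = build_alert(
    {"TARGET_NAME": "ORDERS1", "SEVERITY": "Critical", "MESSAGE": "Tablespace USERS is 97% full"}
)
```

Taking the environment as a parameter instead of reading `os.environ` directly keeps the routing logic testable outside the OMS server.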



Other Useful Customization: Alert Log Filtering

We came up with a custom alert log filter expression that only alerts for ORA errors that require the DBA’s immediate action.
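The slide does not show the actual filter expression. As a rough illustration of the idea, the Python sketch below scans alert log lines and keeps only the ORA errors on an allow-list. The list of codes is a placeholder chosen for the example, not the team's real filter, and the function name is hypothetical.

```python
import re

ORA_ERROR = re.compile(r"ORA-(\d+)")

# Placeholder allow-list of error codes treated as needing immediate action.
ACTIONABLE_CODES = {600, 7445, 4031}

def actionable_errors(alert_log_lines):
    """Return (code, line) pairs for ORA errors on the allow-list."""
    hits = []
    for line in alert_log_lines:
        for match in ORA_ERROR.finditer(line):
            code = int(match.group(1))
            if code in ACTIONABLE_CODES:
                hits.append((code, line.strip()))
    return hits

# Illustrative log lines only.
log_lines = [
    "ORA-00600: internal error code, arguments: [...]",
    "ORA-01555: snapshot too old",
    "Thread 1 advanced to log sequence 42",
]
hits = actionable_errors(log_lines)
```

An allow-list errs on the side of silence for unknown codes; a deny-list of known noise would be the opposite trade-off.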


Post Go Live: Issues

Some of the key issues we addressed after go live:

OMS Performance Tuning:

• Apache HTTP parameter tuning to handle more connections

• Loader backlog: Increase OC4J processes to handle concurrent loader files to avoid backlog

Repository Tuning:

• Increase job_queue_processes parameter to support parallel EM task processing

• Increase redo log size to avoid frequent log switches

• Run the repvfy utility on a regular basis and take actions to clean out stuck notifications

Post Go Live: Issues

Some of the key issues we addressed after go live:

Agent Issues:

• Agent crashing due to patch conflict with 10.2.0.4 database version patch – resolved by applying the right patch combination

• Agent leaving orphan database connections – resolved by a combination of patching and a housekeeping task to bounce the agent prior to hitting that condition

• Missing host performance information on the HP-UX platform – resolved by patching

• Metric collection errors on standby databases – fixed in 10.2.0.5 agent


Post Go Live: Issues

Some of the key issues we addressed after go live:

Database Monitoring Issues:

• EM dictionary queries running longer – resolved by collecting periodic dictionary statistics

• Tablespace monitoring inconsistencies – worked around by creating UDMs

Other Issues:

• Load balancer connectivity issue – resolved by LB setting


Supporting EM Infrastructure – Ongoing Support

Some of the key tasks we perform on a regular basis to support this infrastructure include:

• Daily health check: Make sure all targets are running without any issues. Investigate collection errors and pending status states, and review performance alerts related to OMS, OMR and agents

• Target discovery: We run into target discovery issues from time to time, which require manual intervention

• Agent upload problems due to connectivity issues

• Running repvfy and taking care of any issues reported

• User privilege management by super administrator


Q & A


Thank You
