Enterprise Manager 12c - Database Monitoring Implementation at Thomson Reuters

Suraj Talreja
Manager, Database & Middleware, Design and Engineering

Thomson Reuters

• Thomson Reuters is the world’s leading source of intelligent information for businesses and professionals.

• We combine industry expertise and innovative technology to deliver critical information to leading decision makers.

• We are the world’s most trusted news organization.

• We serve professionals in the financial and risk, legal, tax and accounting, intellectual property and science, and media markets.

Database & Middleware, Design and Engineering

• Collaborate with Business Partners.

• Partner with Architects in evaluating new database technologies.

• Roll out database infrastructure projects.

• Establish and govern database standards across data centers.

• Escalation contact for support groups during major incidents.

THOMSON REUTERS DATABASE SCALE

• Oracle Environment

– Over 1200 Databases Deployed

– Over 2400 Instances Deployed

– Over 1 PB of Allocated DB Storage

– Over 350 New Instances Deployed Last Year

• Exadata Database Machines

– 8 Exadata Full Racks

– 3 Exadata Quarter Racks

So why Enterprise Manager 12c?

• Implement “One Administration Approach”

• Standardize database monitoring by replacing custom scripts:

– Enable monitoring & alerting in a controlled fashion

– Meaningful & actionable alerts only

– Common monitors and default thresholds for all DBA teams; DBA teams can set exception thresholds as needed

– Email and Pager (Service Manager) integration

• Implement best practices on Database Diagnostics and Performance Management

• Holistic view for Exadata Management

Administration Group Hierarchy

[Diagram: Administration Groups organized by Lifecycle Status, then by Line of Business (LOB).]

But how do we manage privileges across DBA teams for a subset of targets within an Administration Group?

By implementing “Dynamic Groups”

• Requirements:

– Need to grant privileges on a subset of targets in an Administration Group to different DBA support groups

– Administration Groups based on “Lifecycle Status” and “Line of Business” did not align with the groups of targets needed for privilege management

• Solution:

– Keep Administration Groups intact for monitoring

– Create privilege-propagating Dynamic Groups whose membership criteria are based on “Contact”, to distinguish databases managed by different DBA support groups

– Grant the Operator privilege on each Dynamic Group to a role, and grant that role to the DBAs

– Use the same Dynamic Groups in incident rule sets
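The privilege model above can be sketched as a small Python model. This is a hypothetical simplification, not the EM 12c API: a Dynamic Group is reduced to a property-match criterion, and privilege propagation is modeled by evaluating membership at check time, so a newly added target with the matching “Contact” value is covered without a new grant. The group name “DG-ORACLE-INT”, the Contact value “DBA-SUPP-ORACLE-EXT”, the target names and the "OPERATOR" string are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    properties: dict  # target properties, e.g. "Lifecycle Status", "Line of Business", "Contact"


@dataclass
class DynamicGroup:
    name: str
    criteria: dict  # property name -> required value (hypothetical simplification)

    def members(self, targets):
        """Names of targets whose properties satisfy every criterion."""
        return {
            t.name
            for t in targets
            if all(t.properties.get(k) == v for k, v in self.criteria.items())
        }


@dataclass
class Role:
    name: str
    grants: list = field(default_factory=list)  # (privilege, DynamicGroup) pairs


def has_privilege(role, privilege, target_name, targets):
    """Privilege propagation: a grant on a group applies to its current members."""
    return any(
        priv == privilege and target_name in group.members(targets)
        for priv, group in role.grants
    )


# Hypothetical example mirroring the "Contact" attribute approach.
int_group = DynamicGroup("DG-ORACLE-INT", {"Contact": "DBA-SUPP-ORACLE-INT"})
int_role = Role("DBA-SUPP-ORCL-INT", [("OPERATOR", int_group)])

targets = [
    Target("legaldb1", {"Lifecycle Status": "Test", "Line of Business": "LEGAL",
                        "Contact": "DBA-SUPP-ORACLE-INT"}),
    Target("bsidb1", {"Lifecycle Status": "Production", "Line of Business": "BSI",
                      "Contact": "DBA-SUPP-ORACLE-EXT"}),
]
```

Because membership is derived from target properties, onboarding a database in this model only requires setting its Contact value; the role and its grant stay unchanged.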

EM 12c Monitoring Implementation at Thomson Reuters - Overview

[Diagram: Administration Group hierarchy ALL TARGETS > Lifecycle Status (Dev/Test, Staging, Prod) > Line of Business (LEGAL, DCO, BSI). A new target with attributes Lifecycle Status “TEST”, LoB “LEGAL” and Contact “DBA-SUPP-ORACLE-INT” is placed in the hierarchy and receives the lifecycle template collection. Its “Contact” attribute automatically adds it, along with other targets sharing that Contact, to the privilege-propagating Dynamic Group “Contact=DBA-SUPP-ORACLE-INT”, on which role “DBA-SUPP-ORCL-INT” holds privileges.]

Incident Management - Rule Sets

• One rule set per Dynamic Group, i.e., one for each of the three DBA support groups.

• Each rule set consists of the following rules (rule on: action summary):

– Target Down events: create incident; set priority to Very High

– Metric Alert events: create incident; set priority to High

– Agent Unreachable: create incident if the event stays open for 15 minutes; set priority to Very High

– “ORA” errors in alert.log (to capture the “ORA” error string): create incident and notify the respective DBA group

– All new incidents (from the above rules): send email

– Notification to the respective DBA group: Warning events by email only; Critical / Fatal events through HP Service Manager integration using EMAT (homegrown tool)

• Advantages:

• Quickly enable/disable notifications for a subset of targets.

• Scalable approach as new DBA groups are formed.

• Room for customization within each rule set to meet new requirements.

• Ease of management.
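The rule set logic can be summarized in a short Python sketch. It is not EM's rule engine or API: the event-type labels are hypothetical, and the priority for alert.log ORA incidents is left unset because the rule set does not state one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Delay before an "agent unreachable" event becomes an incident (from the rule set).
AGENT_UNREACHABLE_DELAY = timedelta(minutes=15)


@dataclass
class Event:
    kind: str        # hypothetical labels: "target_down", "metric_alert", "agent_unreachable", "alert_log_ora"
    severity: str    # "Warning", "Critical" or "Fatal"
    opened_at: datetime
    still_open: bool = True


@dataclass
class Incident:
    event: Event
    priority: Optional[str]  # None means the rule does not set a priority
    notify_dba_group: bool = False


def apply_rule_set(event: Event, now: datetime) -> Optional[Incident]:
    """Return the incident a DBA group's rule set would create, or None."""
    if event.kind == "target_down":
        return Incident(event, "Very High")
    if event.kind == "metric_alert":
        return Incident(event, "High")
    if event.kind == "agent_unreachable":
        # Only raise an incident if the event stays open for the full delay.
        if event.still_open and now - event.opened_at >= AGENT_UNREACHABLE_DELAY:
            return Incident(event, "Very High")
        return None
    if event.kind == "alert_log_ora":
        return Incident(event, None, notify_dba_group=True)
    return None


def notification_channels(severity: str) -> list:
    """Every new incident is emailed; Critical/Fatal also go to Service Manager via EMAT."""
    channels = ["email"]
    if severity in ("Critical", "Fatal"):
        channels.append("HP Service Manager (EMAT)")
    return channels
```

Modeling the agent-unreachable delay as "still open after 15 minutes" is what filters out short network blips: an event that clears before the delay never becomes an incident.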

Recommendations From Our Experience

• Ensure standby database monitoring is configured with a user having “SYSDBA” privilege.

• Need corrective actions such as a listener restart? Explore EM “Corrective Actions”.

• Configure the Data Guard (DG) Broker to receive advanced notifications for standby monitoring.

• Leverage Dynamic Groups for Incident Management as well.

• Use an “event-based” notification template to get the complete ORA error string from alert.log.

• Set a delay on “Agent Unreachable” alerts so that transient network latency does not raise false alerts.
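To show why the complete error string matters, here is a minimal Python sketch that pulls full ORA error lines out of alert.log text, roughly the content an event-based template would surface instead of only the matched “ORA” token. It is an illustration under assumptions (single-line messages, invented sample text), not how the EM alert log metric is implemented.

```python
import re

# "ORA-" followed by the error number and the rest of that line. alert.log may
# print numbers without leading zeros (e.g. ORA-1653), so any digit count is accepted.
ORA_LINE = re.compile(r"ORA-\d+[^\r\n]*")


def ora_error_strings(alert_log_text: str) -> list:
    """Return each complete ORA error line found in a chunk of alert.log text."""
    return [m.group(0).rstrip() for m in ORA_LINE.finditer(alert_log_text)]


# Invented sample text for illustration.
sample = (
    "Mon Jan 07 10:15:02 2013\n"
    "ORA-1653: unable to extend table SCOTT.EMP by 128 in tablespace USERS\n"
    "Thread 1 advanced to log sequence 42\n"
)
```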

BRINGING IT ALL TOGETHER

1. Building Blocks + 2. Patterns + 3. Scale + 4. Oracle =

Q&A

APPENDIX: PROJECT IMPLEMENTATION DETAILS

DATABASE MANAGEMENT – Pre-Enterprise Manager

DEPLOYING AT SCALE

– Over 350 new LION servers deployed each year

– Current approach:

• Gold image

• Cloning process via custom scripts

– Challenges:

• Revision management

• Scripting effort

• Troubleshooting failures

MONITORING AT SCALE

– Over 2400 Oracle servers on the floor

– Current approach:

• Custom scripts

• Cron, dbash

– Challenges:

• Each team has its own set of scripts

• Troubleshooting performance

• Patch management

Phases of Enterprise Manager Program (Q4 2012 to Q4 2013)

• Phase 1: Deployment (12/31)

• Phase 2: Monitoring (7/31)

• Phase 3: Diagnostics (9/30)

• Phase 4: Lifecycle Management (12/15)

[Timeline chart: phases move through Plan, Design and Execute stages across Q4 2012 to Q4 2013.]

EM 12c: Infrastructure Deployment to Maximize Availability

Level 4 HA Deployment

EM Version: 12.1.0.2

DB Plugin: 12.1.0.3

EM 12c: Database Monitoring

• Implementation focus on core database monitoring components:

– Instance Down

– Listener Down

– Alert.log

– File System Utilization

– Tablespace Utilization

– EM Agent Down

– Streams Process Down

– Data Guard Standby Lag

– CRS Nodeapps Down

– CRS Processes Down

– Max. Connections

• Agree on pilot rollouts

• Finalize the “Admin Tree” structure

• Test HA for your EM site (add-on)

• SOA Suite

• Exadata Monitoring

Incident Management – Notification Details

• Utilize “EMAT” to integrate Enterprise Manager’s incident management features with Service Manager capabilities.

• EMAT is driven by the email notifications sent from EM.

• EMAT processes generate the appropriate email and paging notifications and open an IM ticket where appropriate.

• Incident tickets are managed in Service Manager.

Severity | Target Lifecycle Status | Action
Fatal | Production | IM Ticket, Page
Fatal | Staging | IM Ticket
Fatal | Development | IM Ticket
Critical | Production | IM Ticket, Page
Critical | Staging | IM Ticket
Critical | Development | IM Ticket
Warning | Production | e-mail
Warning | Staging | e-mail
Warning | Development | e-mail
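EMAT is an in-house tool and its internals are not described here, so the sketch below only encodes the severity and lifecycle routing table as a Python lookup, making the policy explicit and testable. The function name emat_actions is hypothetical.

```python
# Severity x lifecycle status -> actions, transcribed from the routing table.
ACTIONS = {
    ("Fatal", "Production"): ["IM Ticket", "Page"],
    ("Fatal", "Staging"): ["IM Ticket"],
    ("Fatal", "Development"): ["IM Ticket"],
    ("Critical", "Production"): ["IM Ticket", "Page"],
    ("Critical", "Staging"): ["IM Ticket"],
    ("Critical", "Development"): ["IM Ticket"],
    ("Warning", "Production"): ["e-mail"],
    ("Warning", "Staging"): ["e-mail"],
    ("Warning", "Development"): ["e-mail"],
}


def emat_actions(severity: str, lifecycle_status: str) -> list:
    """Look up the notification actions for an incident; unknown combinations are errors."""
    try:
        return list(ACTIONS[(severity, lifecycle_status)])
    except KeyError:
        raise ValueError(f"no routing defined for {severity}/{lifecycle_status}") from None
```

Keeping the policy as data rather than nested conditionals makes it easy to review against the table and to reject an unexpected severity instead of silently dropping it.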

Database Diagnostics using EM 12c

• Implement Database Diagnostics Pack

• Expand core monitoring capabilities:

– GoldenGate plug-in

– “Corrective Actions” for listener restart and adding space to tablespaces

• Implement Phase-2 pilots:

– Exadata Monitoring and Management

– SOA Suite monitoring

• Pilot rollout:

– NetApp plug-in

– MySQL plug-in

– Real Application Testing (RAT)

• Performance monitoring and diagnostics:

– Automatic Workload Repository (AWR)

– Active Session History (ASH)

– Real-Time ADDM

– Real-Time SQL and PL/SQL Monitoring

– Exadata Cell Grid Performance

– Exadata Resource Utilization

DB Provisioning

• Mass deployment of Oracle software (Database, Real Application Clusters)

– Supports all versions up to 11.2 / Grid Infrastructure architecture

– Gold image cloning and standardized software deployment via Profiles

– Locked-down access for controlled, error-free deployments

EM 12c: DEPLOYING AT SCALE – Q4’13

[Diagram: Source DB systems, EM Software Library storage, Target DB systems.]

1. Save the gold image (and optionally data) from source systems to the EM Software Library.

2. Deploy the saved image and data to target systems with customizations.
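The save-once, deploy-many flow can be sketched in Python as follows. The classes are hypothetical stand-ins for the EM Software Library and provisioning profiles, not their API; the sketch only shows how versioned gold images address the revision-management problem of the earlier custom cloning scripts. The image name, host names and ORACLE_SID value used with it are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GoldImage:
    name: str
    revision: int
    source_host: str
    includes_data: bool


@dataclass
class SoftwareLibrary:
    # image name -> list of saved revisions, oldest first (hypothetical model)
    images: dict = field(default_factory=dict)

    def save(self, name: str, source_host: str, includes_data: bool = False) -> GoldImage:
        """Capture a new revision of a gold image from a source system."""
        revisions = self.images.setdefault(name, [])
        image = GoldImage(name, len(revisions) + 1, source_host, includes_data)
        revisions.append(image)
        return image

    def latest(self, name: str) -> GoldImage:
        return self.images[name][-1]


def deploy(image: GoldImage, target_host: str, customizations: dict) -> dict:
    """Build a deployment plan: a pinned image revision plus per-target customizations."""
    return {
        "image": f"{image.name}@r{image.revision}",
        "target": target_host,
        "restore_data": image.includes_data,
        "customizations": dict(customizations),
    }
```

Pinning each deployment to an explicit revision means every target can be traced back to the exact image it was built from.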

EM: Database Monitoring

Lessons Learned the Hard Way!

• EM 12c agent is NOT supported on SLES 9.

• Ensure standby database monitoring is configured with a user having “SYSDBA” privilege.

• Need corrective actions such as a listener restart? Explore EM “Corrective Actions”.

• Configure the Data Guard (DG) Broker to receive advanced notifications for standby monitoring.

Enhancement Requests

• Allow wildcards for tablespace names in metric collection settings.

• Show custom target properties as drop-down lists during target promotion.

• Allow selected target properties to be made mandatory for target promotion.

• Need EMCLI verbs for target promotion.

• Corrective action to automatically add space to a tablespace.

• Corrective action to restart the listener on failure.
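The first enhancement request (wildcards for tablespace names in metric collection settings) can be illustrated with a short Python sketch of pattern-based thresholds. This is not an existing EM feature or API; the patterns and the warning/critical percentages are assumed values chosen for illustration.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (tablespace name pattern, warning %, critical %); first matching pattern wins.
# Patterns and percentages are hypothetical.
THRESHOLDS = [
    ("UNDO*", 95, 99),
    ("TEMP*", 95, 99),
    ("*", 85, 97),
]


def thresholds_for(tablespace: str):
    name = tablespace.upper()  # Oracle stores unquoted tablespace names in upper case
    for pattern, warning, critical in THRESHOLDS:
        if fnmatchcase(name, pattern):
            return warning, critical
    raise LookupError(f"no threshold pattern matches {tablespace}")


def space_used_severity(tablespace: str, used_pct: float) -> Optional[str]:
    """Classify tablespace space used (%) against the first matching pattern."""
    warning, critical = thresholds_for(tablespace)
    if used_pct >= critical:
        return "Critical"
    if used_pct >= warning:
        return "Warning"
    return None
```

An ordered first-match list keeps specific patterns ahead of the catch-all; the trailing "*" entry plays the role of the default threshold.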