Reduce Production Incidents with Oracle Enterprise Manager 12c …and give yourself a break! Roland Evers

Page 1: Reduce Production Incidents with Oracle Enterprise Manager 12c

Reduce Production Incidents with

Oracle Enterprise Manager 12c…and give yourself a break!

Roland Evers

Page 2: Reduce Production Incidents with Oracle Enterprise Manager 12c

© 2014 Accenture. All rights reserved. 2

Roland Evers

About us

• 34 years old

• Living in The Netherlands

• 9 years at Accenture

• > 5 years Oracle Administrator since EM 10g

Tech Area

Oracle - Infrastructure & Databases; HW/SW,

Infrastructure, Monitoring, Linux

Loves:

Music, photography, theme parks, a good movie

…and repairing stuff

Oracle Enterprise Manager 12c:

Certified Implementation Specialist

Software Engineer Sr. Analyst, Oracle Technology

Page 3: Reduce Production Incidents with Oracle Enterprise Manager 12c

Accenture

• Accenture is a global management consulting, technology services and outsourcing company.

• Accenture has more than 323,000 people serving clients in more than 120 countries.

• Combining unparalleled experience, comprehensive capabilities across all industries and business functions, and extensive research on the world’s most successful companies, Accenture collaborates with clients to help them become high-performance businesses and governments.

• The company generated net revenues of US$30.0 billion for the fiscal year ended Aug. 31, 2014.

• Accenture Positioned as a Leader in 2014 Gartner Magic Quadrant for Oracle Application Implementation Services, Worldwide

Page 4: Reduce Production Incidents with Oracle Enterprise Manager 12c

1. Introduction: Business Case & Maturity Levels

Concepts, in short on R4

2. Plan: Roadmap & Requirements

3. Oracle Enterprise Manager 12c Monitoring

4. Next Steps: Team & Processes

Oracle

Enterprise Manager

Page 5: Reduce Production Incidents with Oracle Enterprise Manager 12c

Executive Summary

Situation

• Design and monitoring to detect and alert on service or component failures in the business chain are missing

• Application-level monitoring is extensive, but not at the same level for all applications

• Monitoring of infrastructure and KPIs is not fully developed

Program Objectives

• Identify monitoring improvement opportunities across the infrastructure and its systems; create a roadmap to realize the recommendations

Program Approach

• Create a plan to set up an improved monitoring framework through thorough analysis of current and desired monitoring

• Include a tooling assessment to find the tools with the best quality-to-cost ratio

• Enable 24/7 monitoring management on the newly created monitoring

Program Results

• Phased realization of this plan will ultimately result in a comprehensive monitoring catalog and new monitoring that includes updated or new tooling

Page 6: Reduce Production Incidents with Oracle Enterprise Manager 12c

Model for maturity monitoring

Level of Maturity: Non Existing, Initial, Repeatable, Defined, Managed, Optimized

Level Five (Optimized): Monitoring is optimized

Activities are being executed pro-actively and continuously according to the defined processes.

Execution is monitored & measured.

Performance is measured and discussed with individuals.

Activities are improved based upon measurement, review and evaluation.

Level Four (Managed): Monitoring is managed and measurable

Activities are being executed actively and pro-actively, continuously, according to the defined processes.

Execution is monitored and measured.

Performance is measured & discussed

Level Three (Defined): Monitoring is defined and structural

Monitoring activities are defined and executed according to the processes.

Level Two (Repeatable): Monitoring is repeatable but intuitive

Activities are executed regularly and actively, but differently across resources or teams.

Level One (Initial): Monitoring is ad hoc

Activities are executed reactively and ad hoc, and are not clear or defined.

Level Zero (Non Existing): Monitoring is not existing

Monitoring activities are not performed

Page 7: Reduce Production Incidents with Oracle Enterprise Manager 12c

Costs of a Priority 1 Incident

Example P1 Financial Impact

Cost System Downtime

• Business non-productive, resources waiting, idle

• Lost client revenue, e.g. in call centers

• Public image: publicity, trust

Cost Consultants for Resolving

• Hourly rates

• Time they cannot spend on: changes, solving structural problems, solving P2 & P3 issues, innovation

Other Factors

• P1s cannot be planned and are hard to factor into staffing & resource planning

• P1 resolution lowers team productivity: P1s disrupt planned work, changes and solving structural problems

• P1s require management focus
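
The cost factors above can be combined into a rough estimate. The sketch below is a hypothetical model, not a figure from the project: the function name, all parameter names and the sample numbers are assumptions, and it covers only the directly measurable parts (idle staff, lost revenue, consultant hours), not public image or disrupted planning.

```python
def p1_cost(downtime_hours, idle_staff, staff_hourly_cost,
            lost_revenue_per_hour, consultants, consultant_hourly_rate,
            resolution_hours):
    """Estimate the direct cost of one P1 incident (hypothetical model)."""
    # Cost of downtime: idle business staff plus revenue lost per hour.
    downtime_cost = downtime_hours * (
        idle_staff * staff_hourly_cost + lost_revenue_per_hour
    )
    # Cost of resolving: consultants who cannot work on anything else.
    resolution_cost = consultants * consultant_hourly_rate * resolution_hours
    return downtime_cost + resolution_cost


# Made-up sample inputs, for illustration only.
estimate = p1_cost(
    downtime_hours=2, idle_staff=10, staff_hourly_cost=50,
    lost_revenue_per_hour=1000, consultants=3,
    consultant_hourly_rate=100, resolution_hours=4,
)
print(estimate)
```

Replacing the sample inputs with real figures from a past P1 gives a first number to put in the business case.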

Page 8: Reduce Production Incidents with Oracle Enterprise Manager 12c

1. Introduction: Business Case & Maturity Levels

2. Plan: Roadmap & Requirements

3. Oracle Enterprise Manager 12c Monitoring

4. Best Practices & Next Steps


Page 9: Reduce Production Incidents with Oracle Enterprise Manager 12c

Event Monitoring

Be aware of availability & performance problems 24x7

• Specify critical vs. warning thresholds for metrics

• Various notification methods: email, SNMP trap, OS command

• Notification rules and schedules for alerts

• Predefined & user-defined monitoring templates
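
The idea of warning vs. critical thresholds can be sketched in a few lines. This is an illustration only, not Enterprise Manager code: the `Threshold` class, the `classify` function and the sample values are assumptions, and the sketch covers only metrics where a higher value is worse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one metric (hypothetical structure)."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> Optional[str]:
    """Return 'critical', 'warning' or None for a higher-is-worse metric."""
    # Check the critical limit first so it wins over the warning limit.
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return None


# Made-up example: file system usage in percent.
fs_usage = Threshold(warning=80.0, critical=90.0)
print(classify(85.0, fs_usage))
```

A monitoring template is then essentially a named set of such thresholds per target type.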

Page 10: Reduce Production Incidents with Oracle Enterprise Manager 12c

Levels of monitoring maturity

Level 0

Scattered Monitoring

Maturity Level

o All kinds of tools

o Different teams

Tools

o OEM

o SCOM

o Nagios

o Lots of custom scripts

Level 1

Mature Monitoring

Maturity Level

o One main tool

o Different teams

Tools

o OEM

o Fewer custom scripts

Level 2

Centralized Monitoring

Maturity Level

o One main tool

o One central team

Tools

o OEM

o BI Publisher integrated

o Minimal custom scripts

Level 3

Analytical Monitoring

Maturity Level

o One main tool

o One central team

o Advanced Analytics

Tools

o OEM

o BI Publisher integrated

o Minimum to no sanity checks

From scattered to analytical monitoring

Milestones:

0 Start Monitoring Project

1 Completed Monitoring Project

2 Monitoring Control Center

Analytics

Page 11: Reduce Production Incidents with Oracle Enterprise Manager 12c

Monitoring Levels - Technical

Infra

•(L)Unix

•Solaris

•Windows

•Databases

•Oracle

•MySQL

Application

•Web servers

•WebLogic

•BRM

•FMW

•HCM

•CRM

•Finance

•All SW components

Interfaces

•Web services

•Batches

•XML

Components Functionality

•Application Processes

•Login functionality

•Severities

•Availability

•Reports in time

•Data health

End-user monitoring and business KPIs are not included in this overview

Page 12: Reduce Production Incidents with Oracle Enterprise Manager 12c

1. Introduction: Business Case & Maturity Levels

2. Plan: Roadmap & Requirements

3. Oracle Enterprise Manager 12c Monitoring

About System & Services, Administration Groups

4. Best Practices & Next Steps


Page 13: Reduce Production Incidents with Oracle Enterprise Manager 12c

Challenges – no automated monitoring configuration

Automation

• Application deployments with new environments: what about monitoring & alerting?

• New hardware & software: new or re-arranged monitoring set-up

• Home grown tools (older versions, impact, security, manageability, responsibility, changes…)

Various & different applications: require different configurations

Similar applications, but different domains

Multiple (internal / external) parties and teams with different demands and requirements

Page 14: Reduce Production Incidents with Oracle Enterprise Manager 12c

Oracle Enterprise Manager

Diagram: Applications Management, Enterprise Ready Framework, Cloud Management, Chargeback and Capacity Planning, Middleware Management, Database Management, Application Quality Management, Configuration Management, Exadata and Exalogic Management, Provisioning and Patching

• One suite

No separate tools

Wide range of metrics, out of the box

Notifications for anyone, how they prefer

• View

Dashboards to show status of systems and services

Root cause with topology

Drill down from key business processes to a single statement / metric

• Alerting

Pro-active monitoring with notifications

Helping in finding root causes up front (before it is too late…)

Entire Oracle Stack!

Combined with the best database management, tuning and analysis

Support

Innovative

Being prepared for:

Self service

Provisioning

The cloud – Hybrid cloud management

One suite for various departments

Page 15: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Automation

• Automate monitoring setup with new hardware / software

• Flexible solution

• No differences between (application) related servers

• Role based access

• Monitoring over different domains

• Usable together with other external and internal teams and parties

Page 16: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Automation – Administration Groups

• Automatically place targets into the corresponding group by adding properties to a target (database, host, etc.)

• Automatic deployment of predefined monitoring checks to all current and future targets

• Save time: automatically apply collections of templates to similar targets for their specific purpose (HCM, Ordering, Billing, etc.)

• Incident Rules configured on an Administration Group automatically apply to any new target in that group

• No need to perform ad hoc apply operations or to compare template settings

Page 17: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Administration Groups

Oracle describes Administration groups as:

“a special type of group used to fully automate application of monitoring and other management settings to targets upon joining the group.”

The concept allows flexibility in defining your own hierarchy

• Maximum of four levels (target properties)

• In most cases, the generic target properties (Department & Line of Business) are a logical choice

Page 18: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Automation - Approach…

Best Practices: Documentation & white papers

Monitoring set-up

Manage incidents

Fine tuning

Monitoring configurations

Incident Rules for notifications

Expand Administration Groups

Administration Groups, Templates, System & Services, Notification Rules, Roles

Docs.oracle.com

Strategies for Scalable, Smarter Monitoring using Oracle Enterprise Manager Cloud Control 12c

http://www.oracle.com/technetwork/oem/sys-mgmt/wp-em12c-monitoring-strategies-1564964.pdf

Page 19: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Automation

Analyze:

• The current situation

• (Third) parties and external parties

• Requirements and needs from monitoring levels

• Break down into all levels

Maturity levels (diagram): Level Zero, Level One, Level Two, Level Three, Level Four, Level Five

Infra

• OS

• Databases

• HW checks

Application

• Web servers

• WebLogic

• BRM

• FMW

• HCM

• CRM

• Finance

• SW components

Interfaces

• Web services

• Batches

Components Functionality

• Application Processes

• Login functionality

• x

Page 20: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Implement

Administration Groups

Automatic grouping of targets into Administration groups

• Idea: Administration groups hierarchy:

Targets into groups based on following criteria:

Lifecycle status

Department

Line of business

Contacts

Automatically:

• Apply Template Collections and set thresholds

• Include targets in Incident Rules, receive notifications

• Set privileges for existing Administrators: role-based access on any group

Available criteria in OEM 12c:

Contact

Cost Center

Customer Support Identifier

Department

Lifecycle Status

Line of Business

Location

Target Version

Target Type

One challenge: with various (external) teams and several identical Line of Business values within one department, we would not be able to use generic names for Department and/or Line of Business.

A different approach is needed…

Page 21: Reduce Production Incidents with Oracle Enterprise Manager 12c

Concept > Analyze > Design > Solution

Administration groups

Automated deployment of configurations as targets join groups

Target Properties

Administration Groups are created based on three levels of membership criteria:

• Lifecycle Status: Production, Stage, Test, and Development

• Line of Business: unique application identifier

Example: Peoplesoft BM, HCM BM, FMW CM, Siebel

(BM: Business Market, CM: Consumer Market, etc.)

• Contacts: which party maintains the application or application service

Example: Customer’s DBA team, Accenture Application Maintenance, other teams

Page 22: Reduce Production Incidents with Oracle Enterprise Manager 12c


How it works

• A hierarchy of Administration Groups is generated based on these three levels;

• New targets are added automatically based on the three predefined levels;

• Existing or new targets that match the definition are placed into the corresponding Administration Groups.
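
How the hierarchy and the automatic placement fit together can be sketched as follows. This is a simplified illustration of the concept, not how Enterprise Manager implements it: the `LEVELS` dictionary and the function names are assumptions, and the level values are taken (slightly simplified) from the examples above.

```python
from itertools import product

# Level values from the slides; the data structure itself is hypothetical.
LEVELS = {
    "Lifecycle Status": ["Production", "Stage", "Test", "Development"],
    "Line of Business": ["Peoplesoft BM", "HCM BM", "FMW CM", "Siebel"],
    "Contact": [
        "Customer DBA team",
        "Accenture Application Maintenance",
        "Other teams",
    ],
}


def build_hierarchy(levels):
    """Return every leaf group as a tuple with one value per level."""
    return list(product(*levels.values()))


def find_group(target_properties, levels):
    """Return the leaf group a target joins, or None when a property is
    missing or its value is not part of the hierarchy definition."""
    path = []
    for name, allowed in levels.items():
        value = target_properties.get(name)
        if value not in allowed:
            return None
        path.append(value)
    return tuple(path)


new_database = {
    "Lifecycle Status": "Production",
    "Line of Business": "Siebel",
    "Contact": "Customer DBA team",
}
print(len(build_hierarchy(LEVELS)))
print(find_group(new_database, LEVELS))
```

The point of the sketch: a target only lands in a group when all three properties are set, which is why filling in target properties during discovery matters.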

Page 23: Reduce Production Incidents with Oracle Enterprise Manager 12c

Targets placed into Administration Groups based upon criteria

Administration Groups

Contact

1 Accenture Application Maintenance

2 Client DBA team

3 Analytics Team

4 External Teams

Line of Business

1 BI Applications

2 FMW BR

3 BRM Business

5 WebCenter

Lifecycle Status

1 Production

2 Staging

3 Test

4 Development

Admin Group hierarchy (diagram):

Lifecycle Status: Production, Staging, Test, Development

Line of Business: FMW BM, Siebel CRM, BRM Business, BI App X, Web Center

Contact: Ext. Team A, Ext. Team B, Analytics Team, Client DBA Team, Accenture AM, Accenture DBA

Page 24: Reduce Production Incidents with Oracle Enterprise Manager 12c

Targets into groups

Administration groups

Example: new targets (CRM & Fusion)

*New targets!

Discovered:

• 3 new WebLogic servers

• 2 new Cluster Databases

• 10 new hosts for CRM

Properties set for the targets:

• Test, Middleware, DBA Team

• Production, CRM, DBA Team

• Production, CRM, Testers

2 Siebel Databases

Type: Cluster Database; Line of Business: “CRM”; Lifecycle status: “Production”; Contact: “DBA Team”

3 WebLogic Servers (Fusion)

Type: WebLogic Server; Line of Business: “Middleware”; Lifecycle status: “Test”; Contact: “Test Team”

2 Middleware Databases

Type: Cluster Database; Line of Business: “Middleware”; Lifecycle status: “Test”; Contact: “DBA Team”

Targets automatically placed into corresponding Administration Groups

Diagram: Administration Groups CRM Production (Application & App Servers, Databases), Middleware Test (Databases, Middleware & App Servers) and Middleware Production (Servers, DB); contacts shown: DBA Team, Application Team, Test Team, DBA Team

Page 25: Reduce Production Incidents with Oracle Enterprise Manager 12c

Multiple templates on Administration groups

Administration groups

Example: new targets (CRM & Fusion)

Template Collection CRM Prod:

• WebLogic server template

• Cluster Databases template

• Host template

Template Collections automatically applied

• Metric thresholds

• Metric Extensions (automated deployment onto these targets)

• Compliance standards & Privileges

• And more...

Templates automatically applied onto targets in Administration Groups

Diagram: Template Collection (CRM) and Template Collection (CRM Prod) applied to the Administration Groups CRM Production (WebLogic application servers, Databases), Middleware Test (Databases, Application & App Servers) and Middleware Production (Servers, DB); the Database Template and WebLogic Template set thresholds, and Metric Extensions (ME) are set
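
With a Template Collection on more than one level of the hierarchy (CRM and CRM Prod in the diagram), a target ends up with a combination of settings. The sketch below shows one way to think about it, under the assumption that the more specific group overrides the more generic one; please verify the precedence rules in the Oracle documentation. The metric names and threshold values are made up.

```python
def effective_settings(collections_root_to_leaf):
    """Merge metric settings from the root group down to the leaf group.

    Later (more specific) collections override earlier ones.
    """
    merged = {}
    for collection in collections_root_to_leaf:
        merged.update(collection)
    return merged


# Hypothetical (warning, critical) thresholds; the slides give no numbers.
crm = {"cpu_util_pct": (80, 95), "tablespace_used_pct": (85, 97)}
crm_prod = {"cpu_util_pct": (70, 90)}

print(effective_settings([crm, crm_prod]))
```

In this sketch a CRM production target gets the stricter CPU thresholds from CRM Prod and keeps the tablespace thresholds from the generic CRM collection.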

Page 26: Reduce Production Incidents with Oracle Enterprise Manager 12c

View and Operator roles for Administrators

Diagram: roles on the “CRM_PRD” Administration Group (CRM Production: Application & App Servers, Databases)

• Role: View on all targets; overall user view, Consumers (View_Consumers)

• Role: View sub selection; Role: CRM Viewer, “viewer” (CRM Consumers)

• Role: Operator CRM Admin; Administrator AM CRM Consumers (CRM Administrators)

Privileges shown: Operator, View

Administrators with View privileges on Administration Groups

• Can view all targets in that particular Administration Group.

Administrators with Operator privileges on Administration Groups

• Can view all targets in that particular Administration Group

• Can also perform Operator activities, given Operator rights on the Administration Group.
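
The difference between View and Operator can be sketched as an ordered privilege check. This is an illustration of the idea only: the `Privilege` enum, the `GRANTS` table and the assumption that Operator includes View are ours; the role and group names are taken from the diagram.

```python
from enum import IntEnum


class Privilege(IntEnum):
    """Ordered so that a higher value includes the lower ones (assumption)."""
    NONE = 0
    VIEW = 1
    OPERATOR = 2


# Hypothetical grants: role name -> {administration group -> privilege}.
GRANTS = {
    "CRM Viewer": {"CRM_PRD": Privilege.VIEW},
    "Operator CRM Admin": {"CRM_PRD": Privilege.OPERATOR},
}


def allowed(role, group, needed):
    """True if the role holds at least the needed privilege on the group."""
    granted = GRANTS.get(role, {}).get(group, Privilege.NONE)
    return granted >= needed


print(allowed("CRM Viewer", "CRM_PRD", Privilege.OPERATOR))
print(allowed("Operator CRM Admin", "CRM_PRD", Privilege.VIEW))
```

Because the grant sits on the group, a new target that joins "CRM_PRD" is covered without touching the roles.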

Roles on Administration Groups

Administrators & Roles

Page 27: Reduce Production Incidents with Oracle Enterprise Manager 12c

High Level Services: business processes

Systems & Services

Logical structure of services, systems and business services.

From a business processes point of view:

• High Level services

Services delivered / delivery by client

• One dashboard, high level services

• One overview – your delivery

Page 28: Reduce Production Incidents with Oracle Enterprise Manager 12c

Monitor the high level services

Systems & Services

Services

• Service: in an enterprise, an entity that provides a useful function to its users

• Sub-Service: any type of service created using Cloud Control

• Aggregate Service: a service that consists of two or more services, called sub-services

• Generic Service: lets you define a service to model and monitor any business process or application

Systems

• A logical set of targets that collectively provides one or more applications or services

• “Out-of-box systems are provided for Oracle-Packaged applications and database targets.”

Billing - Business Ordering - Consumers

Sales Force Automation CRM

Page 29: Reduce Production Incidents with Oracle Enterprise Manager 12c

Systems & Services

<SYSTEM> <SYSTEM> <SYSTEM> <SYSTEM> <SYSTEM>

Host Host Host Host Host Host Host Host Host Host

Page 30: Reduce Production Incidents with Oracle Enterprise Manager 12c

Systems & Services

Page 31: Reduce Production Incidents with Oracle Enterprise Manager 12c

Systems & Services

Page 32: Reduce Production Incidents with Oracle Enterprise Manager 12c

Services defined

Systems & Services (3/4)

• Each Service consists of different Components.

• For each Service, monitoring rules are set up to check the status.

• If one Component is down, it does not mean the whole Service is down or that all other Services are impacted
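
Why one Component being down does not have to take the Service down can be shown with a small availability rule. The sketch is a simplification under our own assumptions: the function name, the "all" / "any" rule names and the component names are hypothetical, and a real Service definition has more options than this.

```python
def service_is_up(component_status, key_components, rule="all"):
    """Evaluate a service from the status of its key components.

    rule="all": every key component must be up.
    rule="any": at least one key component must be up.
    """
    states = [component_status[name] for name in key_components]
    if rule == "all":
        return all(states)
    if rule == "any":
        return any(states)
    raise ValueError(f"unknown rule: {rule}")


# Made-up status: one of two application servers is down.
status = {"wls1": True, "wls2": False, "db": True}

print(service_is_up(status, ["wls1", "wls2"], rule="any"))
print(service_is_up(status, ["wls1", "wls2"], rule="all"))
```

With redundant application servers the "any" rule keeps the Service up, while the failed server still raises its own event for the team that owns it.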

Define:

• Services – Main Business Processes

• Systems: collection of targets of a service

• Required metrics (KPIs)

• Extra metrics required

Billing - Business Ordering - Consumers

Sales Force Automation CRM

Services

Page 33: Reduce Production Incidents with Oracle Enterprise Manager 12c

Billing - Business Ordering - Consumers

Sales Force Automation CRM

Explanation

Systems & Services (4/4)

Consumer: CRM Consumers, Call Software, Ordering System

Business: CRM Business, Fusion Middleware, BRM Billing software

Ordering software

Page 34: Reduce Production Incidents with Oracle Enterprise Manager 12c

Event Management

Alerts & Notifications

An event can have different severities. The ones focused on in the project are:

• Fatal: Corresponding service is no longer available. For example, a monitored target is down (target down event). A fatal severity is the highest level severity and only applies to the Target Availability event type.

• Critical: Immediate action is required in a particular area. The area is either not functional or indicative of imminent problems.

• Warning: Attention is required in a particular area, but the area is still functional.

2x same warning threshold events: Create Incident

1x critical threshold event: Create Incident

1 New incident: Send e-mail

2x same incident: Create Problem, Send e-mail
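
The escalation flow above can be written down as a small state machine. This is a toy model for discussion, not the Enterprise Manager rule engine: the `RuleEngine` class and its method names are hypothetical, and it assumes that "same" means the same target and metric and that counters are never reset.

```python
from collections import Counter


class RuleEngine:
    """Toy model of the escalation flow on the slide (hypothetical)."""

    def __init__(self):
        self.warning_counts = Counter()   # (target, metric) -> warnings seen
        self.incident_counts = Counter()  # (target, metric) -> incidents created
        self.actions = []                 # ordered log of actions taken

    def on_event(self, target, metric, severity):
        key = (target, metric)
        if severity == "critical":
            # 1x critical threshold event: create an incident right away.
            self._create_incident(key)
        elif severity == "warning":
            # 2x same warning threshold event: create an incident.
            self.warning_counts[key] += 1
            if self.warning_counts[key] == 2:
                self._create_incident(key)

    def _create_incident(self, key):
        self.incident_counts[key] += 1
        self.actions.append(("create_incident", key))
        self.actions.append(("send_email", key))  # every new incident
        if self.incident_counts[key] == 2:
            # 2x same incident: escalate to a problem and notify again.
            self.actions.append(("create_problem", key))
            self.actions.append(("send_email", key))


engine = RuleEngine()
engine.on_event("db1", "cpu", "warning")   # first warning: nothing happens
engine.on_event("db1", "cpu", "warning")   # second warning: incident + e-mail
engine.on_event("db1", "cpu", "critical")  # second incident: problem + e-mail
for action in engine.actions:
    print(action)
```

Writing the rules down like this before configuring them helps to agree with the teams on who gets which e-mail, and when.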

Page 35: Reduce Production Incidents with Oracle Enterprise Manager 12c

1. Introduction: Business Case & Maturity Levels

2. Plan: Roadmap & Requirements

3. Oracle Enterprise Manager 12c Monitoring

4. Best Practices & Next Steps


Page 36: Reduce Production Incidents with Oracle Enterprise Manager 12c

Reducing costs…and have fun!

Key Benefits

Reduce Costs

• Manual monitoring configuration effort reduced to a minimum

• Fewer P1s, cost savings, increased ROI

• Automated configuration of monitoring and monitoring

• Improve mean time to resolution

Improve Service Levels

• Proactive monitoring of performance and availability

• Monitor key performance indicators and metrics

Align with Business Demands

• Ongoing:

• Make optimization decisions based on KPIs to be defined

• Create Service Level Agreements and Dashboards

Page 37: Reduce Production Incidents with Oracle Enterprise Manager 12c

Our Experiences

Conclusion

Enterprise Manager R4

•Flexible, easy to implement and maintain!

•Many improvements in Administration Groups, Services and administration

•Usability and efficiency for others

Administration Groups

•With the standard levels, there is already a direction for the definition of groups

•Less may be more: Flexibility! You do not need to include all 4 levels.

•Take the time to thoroughly evaluate the concept and design

•Update and include the target properties during discovery or synchronization.

Systems and Services

•Good concept when in use and described well

•Take time to go through the documentation

•If issues occur, it helps to quickly identify the actual impact on your main business processes

Diagnostics and Packs

•Reduce time for root cause analysis via diagnostics findings

•Recommendations for resolution are calculated based on performance and configuration data collected across the complete stack

Page 38: Reduce Production Incidents with Oracle Enterprise Manager 12c

Best practices!

Sources

• Oracle documentation

• Strategies for Scalable, Smarter Monitoring using Oracle Enterprise Manager Cloud Control 12c

http://www.oracle.com/technetwork/oem/sys-mgmt/wp-em12c-monitoring-strategies-1564964.pdf

• Experiences & differences between current & previous Enterprise Manager versions

• Invest time to find your best approach!

Page 39: Reduce Production Incidents with Oracle Enterprise Manager 12c

Our Ideas / plans

Conclusion

• …

• …

• …

Page 40: Reduce Production Incidents with Oracle Enterprise Manager 12c

Q&A

Roland Evers

Software Engineer Sr. Analyst

Oracle Technology

[email protected]

linkedin.com/in/rolandevers

For more information

Erwin Winkelman

Software Engineer Sr. Analyst

Oracle Technology

[email protected]