
Implementing Highly Available OpenView

Ken Herold
Senior Integration Consultant
Melillo Consulting

Outline

Overview of High Availability (HA)

Highly Available NNM

Highly Available ITO

Management of HA clusters

Integrated products

Performance products

Demand for HA solutions

Shift in focus on NSM solutions in the marketplace

– NSM is now a core IT department function
– IT departments are accountable to business units
– Driven by Service Level Agreements (SLAs)
– Impact of reporting

Highly Available NNM

Collection station failover

– Implemented in Distributed Internet Discovery & Monitoring (DIM)

– Allows a Management Station (MS) to pick up status polling responsibility for a failed Collection Station (CS)

Highly Available cluster

– Allows NNM to run on a cluster of 2 or more servers that provide continuous availability

Distributed Architecture Overview

[Diagram: management station NNM Station A connected to collection stations NNM Stations B, C, … n]

Discovery & status polling occurs locally on each collection station

Changes in status and topology are relayed from the collection stations (stations B through n) to management station A

Failover Configuration

Failover filter

– Allows specified objects to be polled by the MS during CS failure

Enabling CS failover

– xnmtopoconf -failover {stationlist}

Applying failover filter

– xnmtopoconf -failoverFilter {filter} {stationlist}
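A minimal sketch of running these from the management station. The collection station name (cs1.example.com) and the filter name (CriticalNodes, assumed to be defined in NNM's filter file) are hypothetical placeholders:

    # Mark collection station cs1.example.com for failover (hypothetical name)
    xnmtopoconf -failover cs1.example.com

    # Restrict the objects the MS polls during failover to those matching the
    # hypothetical CriticalNodes filter
    xnmtopoconf -failoverFilter CriticalNodes cs1.example.com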

Limitations of DIM for HA

Scaling

– Limited by the number of devices polled and the polling interval

– Practical size of the object database (primary & secondary objects)

– Capacity of the network connection between MS & CS

– Event data may be lost, since SNMP traps are not forwarded when a CS fails

– Multiple CS failures may create problems for MS

Highly Available ITO

Multiple Managers

– Manager of Managers
– Peer Managers
– Follow-the-Sun

Hardware Backup

– Cold Standby
– Highly Available Cluster

Multiple Managers

Manager of Managers (MoM)

– Single ITO server manages multiple ITO servers

Peer Managers

– Multiple managers with a designated responsibility providing redundancy to one another

Follow-the-Sun

– Management responsibility moves based on time of day

Limitations of Multiple Managers

Scalability

– Number of managed nodes per ITO server
– Number of messages received in the ITO browser
– Number of operator logons
– Speed of network connections to remote sites
– NNM configuration issues
– Agents must be told to report to the new ITO server (a time issue)

Hardware Redundancy

Cold Backup

– Cost effective
  • 1 server backs up multiple ITO servers
  • Shared storage device not required
– Downtime may be unacceptable
  • Configuration of the failed server is loaded after failover
– Message issues
  • No synchronization of current message data
  • Latency detecting events while message buffers are cleared
– Need to implement configuration synchronization

Hardware Redundancy

Hot Standby

– ITO servers implemented in an HA cluster
– Rapid, automated failover of a failed ITO server
– Configuration and message data shared between nodes
– Upgrades & patches require more effort to install
– Costly solution
  • Requires a shared disk array
  • MC/ServiceGuard software required

– Can provide LAN failover

Basic HA Definitions

Package - an application and its associated processes; can only run on 1 node in the cluster

Service - a process monitored by MC/SG

Original Node - node where the package existed before failover

Adoptive Node - node that takes control of a package

Overview of a Cluster

[Diagram: a two-node cluster. Node 1 (10.1.50.25), the originating node, runs Package A with virtual IP 10.1.50.27; Node 2 (10.1.50.26) is the adoptive node. Both nodes connect to shared data storage, to LAN 1 and LAN 2 via a bridge, and to a dedicated heartbeat LAN.]

Steps to Implement HA

Create a volume group on the shared device

Create logical volumes in that group:

– /etc/opt/OV/share
– /var/opt/OV/share
– /opt/OV/OpC_SG
– /u01/oradata/OpenView (DB files)
– /u01/app/oracle/product (DB binaries)

Create a fully qualified hostname & IP address for the ITO package

Activate & mount the volume group on the primary node (see the sketch below)

– vgchange -a y /dev/{volume_group}
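A minimal sketch of the shared volume group setup on HP-UX. The disk device (/dev/dsk/c4t0d0), volume group name (/dev/vgov), and logical volume names and sizes are hypothetical placeholders; adjust them to the actual shared array:

    # Create the shared volume group on the shared disk (hypothetical device)
    pvcreate /dev/rdsk/c4t0d0
    mkdir /dev/vgov
    mknod /dev/vgov/group c 64 0x040000
    vgcreate /dev/vgov /dev/dsk/c4t0d0

    # One logical volume per shared filesystem (sizes in MB are placeholders)
    lvcreate -L 512  -n lvol_etc_ov  /dev/vgov
    lvcreate -L 1024 -n lvol_var_ov  /dev/vgov
    lvcreate -L 256  -n lvol_opc_sg  /dev/vgov
    lvcreate -L 4096 -n lvol_oradata /dev/vgov
    lvcreate -L 2048 -n lvol_oraprod /dev/vgov

    # Activate the group on the primary node, build and mount the filesystems
    vgchange -a y /dev/vgov
    newfs -F vxfs /dev/vgov/rlvol_etc_ov
    mount /dev/vgov/lvol_etc_ov /etc/opt/OV/share
    # ...repeat newfs/mount for the remaining logical volumes and mount points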

Steps to Implement HA

Install Oracle binaries on the primary node

Install ITO binaries on the primary node

Install the latest ITO/NNM patches on the primary node

Configure the ITO database (opcconfig)

– Select MC/SG installation
– Use the fully qualified name for the package
– Shared lvol is /opt/OV/OpC_SG
– Configure the DB automatically
– Do not enable startup at boot time

Configure startup of ITO processes manually in the $OV_LRF directory:

– ovaddobj ovoacomm.lrf
– ovaddobj opc.lrf

Steps to Implement HA

Modify /etc/oratab to enable autostart

Verify ITO/NNM starts

Modify the ov.conf file

– Clean copy: /opt/OV/newconfig/OVNNM-RUN/conf/ov.conf
– Create ov.conf.host1 & ov.conf.host2

• Modify the HOSTNAME= field in each to match the local hostname

• Modify NNM_INTERFACE= to match the floating IP
• Modify USE_LOOPBACK= to ON (see the example after this slide)

Modify the NNM auth files

– ovw.auth
– ovwdb.auth
– ovspmd.auth
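A minimal example of the two per-host copies of ov.conf. The hostnames (host1/host2) are hypothetical, the floating address reuses the virtual IP from the cluster diagram, and the path shown is the usual NNM configuration directory (an assumption); only the fields called out above need to differ from the clean copy:

    # /etc/opt/OV/share/conf/ov.conf.host1 (assumed location; key fields only)
    HOSTNAME=host1.example.com
    NNM_INTERFACE=10.1.50.27
    USE_LOOPBACK=ON

    # /etc/opt/OV/share/conf/ov.conf.host2
    HOSTNAME=host2.example.com
    NNM_INTERFACE=10.1.50.27
    USE_LOOPBACK=ON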

Steps to Implement HA

Verify ITO/NNM starts

– First copy ov.conf.host1 to ov.conf

Install bits on the 2nd node

– Do not run opcconfig
– Verify operation of NNM

Modify the opcsvinfo file on both nodes to include the lines:

– OPC_SG TRUE
– OPC_SG_NNM TRUE

Switch the shared volume to the 2nd node (see the sketch below)

– Shutdown OpenView & Oracle
– Unmount all 5 volumes
– Deactivate the shared volume group
– Activate the VG & mount the lvols on the 2nd node
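A minimal sketch of that manual switchover, reusing the hypothetical /dev/vgov volume group and logical volume names from the earlier sketch:

    # On the primary node: stop the applications and release the shared storage
    /opt/OV/bin/ovstop                 # stop OpenView processes (Oracle must also be shut down)
    umount /etc/opt/OV/share /var/opt/OV/share /opt/OV/OpC_SG \
           /u01/oradata/OpenView /u01/app/oracle/product
    vgchange -a n /dev/vgov            # deactivate the shared volume group

    # On the 2nd node: take over the storage
    vgchange -a y /dev/vgov
    mount /dev/vgov/lvol_etc_ov /etc/opt/OV/share
    # ...mount the remaining four logical volumes the same way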

Steps to Implement HA

Copy the following to the 2nd node:

– /etc/oratab
– /etc/opt/OV/share/conf/ovdbconf

Create a new server registration file:

– cd /etc/opt/OV/share/conf/OpC/mgmt_sv
– mv svreg svreg.OLD
– touch svreg
– /opt/OV/bin/OpC/install/opcsvreg -add itosvr.reg
– rm svreg.OLD

Remove the ovserver file:

– rm /var/opt/OV/share/databases/openview/ovwdb/ovserver
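The same steps collected into a small script, which is essentially what the ito_create_new_svreg helper described later automates; this is a sketch, and it assumes itosvr.reg already exists in the mgmt_sv directory:

    #!/usr/bin/sh
    # Rebuild the ITO server registration file after a failover (sketch)
    cd /etc/opt/OV/share/conf/OpC/mgmt_sv || exit 1
    mv svreg svreg.OLD
    touch svreg
    /opt/OV/bin/OpC/install/opcsvreg -add itosvr.reg && rm svreg.OLD

    # Remove the ovserver file; it is re-created automatically
    rm -f /var/opt/OV/share/databases/openview/ovwdb/ovserver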

Steps to Implement HA

Run opcconfig

– Do not configure the DB automatically

Create the ITO package

Modify the monitor scripts to watch the necessary OpenView processes

Configure the package control scripts (see the sketch below)

Verify operation of ITO on both cluster nodes independently

Test failover by killing a monitored process
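A minimal sketch of the storage, IP, and service entries such a package control script might contain, based on the generic MC/ServiceGuard template produced by cmmakepkg -s. The volume group, logical volume names, package directory, and service name are the hypothetical placeholders used throughout these sketches:

    # Excerpt from an ito.ctl-style package control script (values are placeholders)
    VG[0]="/dev/vgov"                        # shared volume group

    LV[0]="/dev/vgov/lvol_etc_ov";  FS[0]="/etc/opt/OV/share"
    LV[1]="/dev/vgov/lvol_var_ov";  FS[1]="/var/opt/OV/share"
    LV[2]="/dev/vgov/lvol_opc_sg";  FS[2]="/opt/OV/OpC_SG"
    LV[3]="/dev/vgov/lvol_oradata"; FS[3]="/u01/oradata/OpenView"
    LV[4]="/dev/vgov/lvol_oraprod"; FS[4]="/u01/app/oracle/product"

    IP[0]="10.1.50.27"                       # floating (virtual) IP for the ITO package
    SUBNET[0]="10.1.50.0"

    SERVICE_NAME[0]="ito_mon"                # monitor service (see ito.mon below)
    SERVICE_CMD[0]="/etc/cmcluster/ITO/ito.mon"
    SERVICE_RESTART[0]=""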

Configuration Files

ito.ctl

– Package control file; contains the steps necessary to activate/deactivate ITO on failover

ito.mon

– Describes the processes that the package monitors (see the sketch below)

ito_create_new_svreg

– Creates a new server registration file on failover; invoked by ito.ctl

ito_start_sgtrapi & ito_stop_sgtrapi

– Start & stop the trap interceptor on the ITO server
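A minimal sketch of what an ito.mon-style monitor could look like. The process list, log path, and polling interval are placeholders; the script simply exits when a monitored process disappears, which causes MC/ServiceGuard to fail the service (and thus the package) when it is configured as the SERVICE_CMD shown above:

    #!/usr/bin/sh
    # ito.mon (sketch) - loop until a monitored OpenView process disappears,
    # then exit non-zero so MC/ServiceGuard treats the service as failed.
    MONITORED="ovspmd opcctlm opcmsgm ovwdb"     # placeholder process list
    LOG=/var/adm/ito_mon.log                     # placeholder log file

    while true
    do
        for proc in $MONITORED
        do
            if ! ps -e | grep -q " ${proc}$"
            then
                echo "$(date): $proc is not running - failing the ITO package" >> $LOG
                exit 1
            fi
        done
        sleep 30
    done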

Common Problems

Failure to modify the opcsvinfo file

– OPC_SG_NNM_STARTUP TRUE
– OPC_SG TRUE

Failure to remove the ovserver file

– If in doubt, remove it & let it get re-created

Failure to modify the NNM auth files

Failure to modify the ov.conf file

– HOSTNAME, NNM_INTERFACE fields

Keep in Mind

Patches must be applied to each node

Key commands (see the sketch below)

– cmhaltpkg ITO
– cmrunpkg -n {node} -v ITO
– cmviewcl
– cmmodpkg -e ITO
– cmrunnode {node}

An ovstop will cause a switchover

Processes to monitor for failover
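A minimal sketch of moving the ITO package from one node to the other by hand with these commands (host2 is a hypothetical node name); automatic package switching must be re-enabled after a manual cmrunpkg:

    cmviewcl -v                  # check where the ITO package currently runs
    cmhaltpkg ITO                # halt the package on its current node
    cmrunpkg -n host2 -v ITO     # start the package on host2
    cmmodpkg -e ITO              # re-enable automatic package switching
    cmviewcl -v                  # confirm the package is up on host2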

Managing HA Clusters

Managing the Management Cluster

– Trap Template
– Log files
– Process monitors

Managing HA packages

– Process monitors
– Log file issues

Managing HA Clusters

Shared log files

– Monitored from the active node
– Copy the contents to log.node on failover and start with a clean slate (see the sketch below)
– Use “close after read” to avoid corruption
– Use “read from last file position”
– Do not use “message on no logfile”
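A minimal sketch of the copy-and-reset step as it might appear in the package startup script; the log path is a placeholder:

    # Preserve the shared logfile on failover and start monitoring from a clean slate
    LOG=/var/opt/OV/share/log/app.log          # placeholder shared logfile
    if [ -f "$LOG" ]
    then
        mv "$LOG" "${LOG}.$(hostname)"         # keep the old contents as log.<node>
        touch "$LOG"
    fi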

Monitors

– Intelligent monitors that read the output of cmviewcl to determine package state (see the sketch below)
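A minimal sketch of that idea for a monitor script run by ITO on both nodes. The parsing assumes the default cmviewcl package line ends with the owning node name (this layout can differ between MC/ServiceGuard versions), the monitor name ito_pkg_check is hypothetical, and opcmon is the standard ITO command for reporting a monitor value:

    #!/usr/bin/sh
    # Run the real check only on the node that currently owns the ITO package,
    # so the standby node does not raise false alarms.
    PKG=ITO
    LOCAL=$(hostname)

    # Assumption: the cmviewcl line for the package ends with the owning node name
    OWNER=$(cmviewcl 2>/dev/null | awk -v p=$PKG '$1 == p {print $NF}')

    if [ "$OWNER" = "$LOCAL" ]
    then
        # Active node: placeholder check against a key OpenView process
        if ps -e | grep -q " ovspmd$"
        then
            opcmon "ito_pkg_check=0"     # healthy
        else
            opcmon "ito_pkg_check=1"     # raises the monitor condition in ITO
        fi
    else
        # Standby node: the package is not supposed to run here; report healthy
        opcmon "ito_pkg_check=0"
    fi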

Managing HA Clusters

Trap interceptor

– Assign to the virtual node
– Started on the active node with ito_start_sgtrapi
– Stopped on the inactive node with ito_stop_sgtrapi

Notification & Trouble Ticketing in HA Clusters

Connect to the TT system using the virtual hostname

– Databases will be synched
– Intelligence is needed to move SPI monitoring
– Include startup/shutdown of the SPI in the package control scripts

HW-based notification requires a connection to each server

SW-based notification can use the virtual hostname to connect

PerfView & MeasureWare in HA Environments

Management Server

– PerfView binaries loaded on every cluster node
– Use a shared data repository for MW data collection (see the sketch below)
– Each node connects to the MW agent as necessary

Managed Node

– MW agent runs on each cluster node to collect performance data
– Application data is collected on the cluster node that currently runs the package
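One possible approach (an assumption, not spelled out in these slides) to a shared MeasureWare data repository is to point the default datafiles directory at the shared volume, for example with a symlink; the shared path is a placeholder and mwa is the standard MeasureWare start/stop script:

    # Relocate the MeasureWare data files onto shared storage (sketch)
    mwa stop                                     # stop the MeasureWare agents
    mkdir -p /shared/perf_datafiles              # placeholder path on the shared volume
    cp -p /var/opt/perf/datafiles/* /shared/perf_datafiles/
    mv /var/opt/perf/datafiles /var/opt/perf/datafiles.local
    ln -s /shared/perf_datafiles /var/opt/perf/datafiles
    mwa start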

Thank You for Attending

Ken Herold
Melillo Consulting, Inc.