Session 2.5 - Tivoli Monitoring implementation at St.George Bank: Using ITM 6.1 for debugging complex applications

Sydney 3-4 August 2006

By Craig Lister – [email protected]

About the Presenter

Over 25 years in IT

Last 6 specialising in TEC and ITM

Co-authored ITM Redbook

Perl specialist

Agenda

  • Tivoli Setup at St.George
  • What is Monitored
  • The Complex Application
  • Why is my Application not Performing
  • And the solution is
  • Universal Event

Tivoli Products Deployed at St.George

We use:
  • Tivoli Configuration Manager 4.2.3
  • Tivoli Enterprise Console 3.9 FP4
  • IBM Tivoli Monitoring 6.1 and 5.1
  • IBM Tivoli Monitoring for Transaction Performance 5.3

Tivoli Setup at St.George

  • Separate joined TEC TMR
  • Managing over 1000 servers
  • Through 8 gateways, from Internet perimeters to core LANs
  • Real-time trouble ticket lodgement direct from Solaris to MSSQL DB
  • Integrated with 3rd party cash management system

Tivoli Setup at St.George Cont’d

  • In-house two-way SMS paging for first and second call
  • Server owners listed in fact files for alerting
  • Dynamic ‘who’s on call’ file kept and maintained by users via email
  • Dynamic conf file held on all classes per user group for alert generation

How St.George uses Tivoli to Monitor Key Infrastructure

  • ATMs monitored via Prognosis feed from Tandem non-stop system via TEC non-TME
  • Tandem system monitored via TEC non-TME
  • ATM configuration controlled via CM
  • All servers monitored via ITM61/ITM51
  • Bank’s internet systems monitored via live TMTP transactions

The Complex Application (A problem waiting to happen)

The application in question is a CRM (Customer Relationship Management):
  • Presents all customer A/Cs in a single view
  • Bank wide
  • Time critical
  • Accesses several large DBs
  • Divergent systems: Windows/AIX/PeopleSoft
  • Customer-facing app = executive focus

The Complex Application Cont’d

  • Initially two WebSphere servers across two sites, one at each site
  • After initial problems, increased to two WebSphere servers at each site
  • One AIX backend server
  • One AIX/DB2 database server
  • IDM validation servers
  • Globally load balanced across the sites

The Symptoms (Aka: unwanted executive focus)

  • WebSphere servers locking up after 36 hrs, needing a reboot to correct
  • Transactions through the city site slower than Kogarah
  • Customers impacted

Gathering Evidence using ITM61

  • Install WebSphere monitor on all servers
  • Set up kwe.xml
  • Analyse/collect HTTP session data
  • Analyse/collect garbage collection data
  • Analyse/collect InFlight workload data

Kwe.xml data settings used, Part I

<?xml version="1.0" encoding="UTF-8"?>
<KWEINSTR Version="130"
    CollectCPUTime="false"
    CollectHTTPSessStats="false"
    CollectInFlightWorkloads="captured"
    CollectorSessionPort="65535"
    CtgDelayPlugin=""
    DisplayEJBAs="EAR:CLASS"
    DisplayServletWorkloadAs="CLASS"
    GlobalInstrumentation="all"
    HeapAnalysisUserClasses=""
    InternalTrace="false"
    LockAnalysisSystemClasses=""
    LockAnalysisUserClasses=""
    LogFileName="kweinstr.log"
    MaxTrivialMethodInstructions="5"
    MaxClasses="2000"
    MeasureHeapDelays="true"
    NumHeapAnalysisClasses="2000"
    ShowInternalWorkloads="false"
    ShowMethodArguments="false"
    SysInstr="Direct"
    toStringIsSafeToUse="false"
>

Kwe.xml data settings used, Part II

    <Class Name="psft.pt8.*"
        ClassType="user"
        DepthSensitiveInstrumentation="true"
        DisplayServletWorkloadAs="CLASS"
        HeapAnalysis=""
        IgnoreTrivialMethods="true"
        LockAnalysis="false"
        MethodNames="*"
        MethodType="SERVLET,EJB,METHODS,PORTLET">
    </Class>
</KWEINSTR>

You will need to create one of the above entries for each class you want to instrument
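Since every instrumented class needs its own entry, the entries can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original deck: the attribute values are copied from the slide's Class entry, while the function name `class_entries` and the second pattern `com.example.crm.*` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute values copied from the <Class> entry shown on the slide.
CLASS_DEFAULTS = {
    "ClassType": "user",
    "DepthSensitiveInstrumentation": "true",
    "DisplayServletWorkloadAs": "CLASS",
    "HeapAnalysis": "",
    "IgnoreTrivialMethods": "true",
    "LockAnalysis": "false",
    "MethodNames": "*",
    "MethodType": "SERVLET,EJB,METHODS,PORTLET",
}


def class_entries(patterns):
    """Return one <Class> element per class-name pattern."""
    entries = []
    for pattern in patterns:
        attributes = {"Name": pattern}
        attributes.update(CLASS_DEFAULTS)
        entries.append(ET.Element("Class", attributes))
    return entries


if __name__ == "__main__":
    # "com.example.crm.*" is a hypothetical pattern used only for illustration.
    for element in class_entries(["psft.pt8.*", "com.example.crm.*"]):
        print(ET.tostring(element, encoding="unicode"))
```

The printed elements would still need to be pasted between the opening and closing KWEINSTR tags.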

The Analysis (Showing the executive why they pay us)

  • HTTP sessions show a sharp rise to a plateau
  • No sessions released between 9am and 5pm
  • A chunk released approx. 12 hrs after the sharp rise
  • Java heap usage rises unabated
  • More and more frequent garbage collections
  • And, the clincher: InFlight workloads growing in duration and number
  • Resulting in server reboot

The Analysis Cont’d

  • After some digging, we found a business design requirement for a 12 hr HTTP session timeout
  • This was so branches wouldn’t have to log in repeatedly, which would degrade their service
  • The session timeout was changed to 30 mins on the WebSphere servers and retained at 12 hrs on IDM
  • Situations developed to look at InFlight times
  • LFA developed for the WebSphere server log
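A rough piece of arithmetic shows why the timeout mattered so much. If sessions are abandoned rather than logged out, the number held in memory is roughly the rate of new sessions multiplied by the timeout (Little's law). The Python sketch below assumes that model; the rate of 200 new sessions per hour is hypothetical and not a figure from the talk.

```python
def steady_state_sessions(new_sessions_per_hour, timeout_hours):
    """Little's law estimate: sessions held = arrival rate x time each is held."""
    return new_sessions_per_hour * timeout_hours


if __name__ == "__main__":
    rate = 200.0  # hypothetical new sessions per hour, for illustration only
    for timeout in (12.0, 0.5):
        held = steady_state_sessions(rate, timeout)
        print(f"timeout {timeout} h: about {held:.0f} sessions held")
```

Under this model, going from 12 hrs to 30 mins cuts the sessions held by a factor of 24 whatever the arrival rate. It also fits the observation that nothing was released during the working day: with a 12 hr timeout, no session created at 9am can expire before 9pm.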

Sample of Perl Programme used in previous Situation

use Postemsg;
use Sys::Hostname;

my ($id_string) = 'sendtec';

# Global environment variables
my $local_host = hostname();
my $logsdirfile = '/var/spool/sendtec.log';

my $datetime = &loct();
# Warn rather than die so the alert is still sent if the log cannot be opened
open (DBG, ">$logsdirfile") or warn "Cannot open $logsdirfile: $!";
print DBG "$datetime Begin run of sendtec\n";

&sendevt(@ARGV);

close DBG;

exit 0;
# E N D   O F   M A I N L I N E

sub loct {
    ######################
    # Define the localtime
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    $mon = $mon+1;
    if (length($mday) < 2) { $mday="0".$mday; }   # Add a "0" to the front if len<2
    if (length($mon) < 2)  { $mon="0".$mon; }     # This is required as the localtime fn
    if (length($hour) < 2) { $hour="0".$hour; }   # doesn't prefix single digit fields
    if (length($min) < 2)  { $min="0".$min; }     # with a "0".
    if (length($sec) < 2)  { $sec="0".$sec; }

    my $datetime=(($year+1900)."-".$mon."-".$mday." ".$hour.":".$min.":".$sec);
    return $datetime;
}

sub sendevt {
    #############
    my ($class,$whotocall,$msgin,$OMNode) = @_;

    print DBG "$datetime ----->sending alert class: $class Alert to: $whotocall Message: $msgin OMNode: $OMNode\n";

    $msgin = $msgin . " OMNode=$OMNode";
    my $msg = Postemsg->new('koguxlh1');    # Set the server hostname
    $msg->setClass("$class");               # Set a valid class name
    $msg->setAdapter('TMNT');               # Set a valid adapter name
    $msg->setMessage("$msgin");
    $msg->setSeverity('CRITICAL');
    $msg->setSlotValue('hostname',"$OMNode");
    # $msg->setSlotValue('origin','10.50.16.26');
    $msg->setSlotValue('evttype','TMNT');
    $msg->setSlotValue('alerttype',"\'$whotocall\'");
    $msg->setSlotValue('infrayn','N');
    $msg->setSlotValue('infracode',"");
    $msg->setSlotValue('filename','');
    $msg->setSlotValue('errormessage',"");
    $msg->setSlotValue('filepointer','');

    $msg->sendMessage();

    return;
}
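The manual zero-padding in loct can be handed to a format string instead. The sketch below shows the same timestamp layout in Python rather than the presenter's Perl; the optional `now` parameter is an addition of this sketch so the output can be checked against a fixed time.

```python
from datetime import datetime


def loct(now=None):
    """Return local time as 'YYYY-MM-DD HH:MM:SS' with zero-padded fields."""
    if now is None:
        now = datetime.now()
    return now.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(loct())
```

In Perl, `sprintf` or `POSIX::strftime` would achieve the same padding.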

Sample Format File developed as backup alert

// [11/10/05 13:13:38:742 EST] 37d9c575 ThreadMonitor W WSVR0605W: Thread "Servlet.Engine.Transports : 790" (36694576)
// has been active for 778,031 milliseconds and may be hung. There are 144 threads in total in the server that may be hung.
// Log location = KOGNTISP044 - D:\IBM\WebSphere\AppServer\logs\fireflykog\SystemOut.log
// Log location = KOGNTISP056 - D:\IBM\WebSphere\AppServer\logs\fireflykog2\SystemOut.log
// Log location = KOGNTISP059 - D:\IBM\WebSphere\AppServer\logs\fireflykog3\SystemOut.log
// Log location = SBDNTISP012 - D:\IBM\WebSphere\AppServer\logs\fireflysbd\SystemOut.log
// Log location = SBDNTISP014 - D:\IBM\WebSphere\AppServer\logs\fireflysbd2\SystemOut.log
FORMAT Firefly_TECAD_ThreadHungServer FOLLOWS NT_Base
%s* has been active for %s milliseconds and may be hung. There are %s*
hostname DEFAULT
origin DEFAULT
thread $2
msg1 $3
severity CRITICAL
msg PRINTF("Thread has been active for %s milliseconds and may be hung. There are %s",thread, msg1)
END
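To see what the format file is matching, the same WSVR0605W line can be picked apart with a regular expression. This is a Python sketch for illustration only, not how the TEC log file adapter works internally. Unlike the format file, which keeps the tail of the line as free text in msg1, it pulls out the hung-thread count as a number. The sample line is the one quoted in the format file comments.

```python
import re

# Matches the variable parts of the WSVR0605W "may be hung" message.
HUNG_THREAD = re.compile(
    r"has been active for (?P<millis>[\d,]+) milliseconds and may be hung\. "
    r"There are (?P<hung>\d+) threads"
)

SAMPLE = (
    '[11/10/05 13:13:38:742 EST] 37d9c575 ThreadMonitor W WSVR0605W: '
    'Thread "Servlet.Engine.Transports : 790" (36694576) has been active for '
    '778,031 milliseconds and may be hung. There are 144 threads in total in '
    'the server that may be hung.'
)


def parse_hung_thread(line):
    """Return active time and hung-thread count, or None if the line differs."""
    match = HUNG_THREAD.search(line)
    if match is None:
        return None
    return {
        "active_ms": int(match.group("millis").replace(",", "")),
        "hung_threads": int(match.group("hung")),
    }


if __name__ == "__main__":
    print(parse_hung_thread(SAMPLE))
```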

Results achieved = Execs happy

  • HTTP sessions rise and fall through the day
  • Java heap size grows and falls
  • In-flight workloads no longer grow endlessly

A Further example

  • WebSphere monitor exposing database problems
  • Notice the SQL Query/Update times

Some Query SQL Changes

  • Notice, after the changes to queries, response time is under 2 secs
  • Garbage collection monitoring?

And now a word from our sponsors

Part II

Nope, it’s not intermission, it’s…

St.George ‘Universal Event’

  • An event type that we use for ‘just’ alerting
  • Used in situations where correlation is not required
  • Who to alert is carried within the event payload
  • Event blocking data is carried within the event
  • No need to continually create barocs, compile and restart TEC
  • Particularly good for LFA-type scenarios

Sample baroc structure

TEC_CLASS :
    SITM_Alert ISA TMW_Event
    DEFINES {
        evttype : STRING;
        msg : STRING, dup_detect=yes;
        alerttype : STRING;
        infrayn : STRING;
        filename : STRING;
        errormessage : STRING;
        filepointer : STRING;
        infracode : STRING;
        custommsg : STRING;
        blockdetails : STRING;
        severity: default = CRITICAL;
    };
END

Sample TMW Rule

rule: tmw_escal_rl:
(
    event: _event of_class 'TMW_Event'
        where [
            severity: outside [ 'HARMLESS', 'UNKNOWN' ]
        ],

    reception_action: tmw_escal_ra1:
    (
        bo_get_class_of(_event, _classname),
        % Only do this if class is SITM_Alert
        _classname == 'SITM_Alert',
        bo_get_slotval(_event, evttype, _evttype),
        % Dont do it if evttype is SMSConnect
        _evttype \== 'SMSConnect',
        bo_get_slotval(_event, custommsg, _custommsg),
        bo_get_slotval(_event, msg, _oldmsg),
        atomconcat(['Custom Msg is ', _custommsg,' ', _oldmsg], _new_msg),
        bo_set_slotval(_event, msg, _new_msg),
        re_mark_as_modified(_event, _),
        bo_set_slotval(_event, infra_update, 'Pending'),
        bo_set_slotval(_event, infra_action, 'New'),
        exec_program(_event, '/usr/local/esm/bin/esm_action_tec_event.pl', '', [], 'NO'),
        set_event_status(_event, 'CLOSED' ),
        commit_set
    ),

    reception_action: tmw_escal_ra2:
    (
        bo_set_slotval(_event, infra_update, 'Pending'),
        bo_set_slotval(_event, infra_action, 'New'),
        re_mark_as_modified(_event, _),
        exec_program(_event, '/usr/local/esm/bin/esm_action_tec_event.pl', '', [], 'NO')
    )
).
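The control flow of the rule is easier to follow when written out procedurally. The Python sketch below reflects one reading of the rule, modelling an event as a plain dict of slots: the first action handles SITM_Alert events (other than SMSConnect) and closes them, otherwise the second action runs. The call to the external esm_action_tec_event.pl script is left out, and the function names and the `class` and `status` keys are conventions of this sketch only.

```python
IGNORED_SEVERITIES = {"HARMLESS", "UNKNOWN"}


def first_reception_action(event):
    """Mirror tmw_escal_ra1; return True if it ran to completion."""
    if event.get("class") != "SITM_Alert":
        return False
    if event.get("evttype") == "SMSConnect":
        return False
    old_msg = event.get("msg", "")
    custom_msg = event.get("custommsg", "")
    event["msg"] = "Custom Msg is " + custom_msg + " " + old_msg
    event["infra_update"] = "Pending"
    event["infra_action"] = "New"
    event["status"] = "CLOSED"
    return True  # plays the role of commit_set: no further action runs


def second_reception_action(event):
    """Mirror tmw_escal_ra2: mark the event for the escalation script."""
    event["infra_update"] = "Pending"
    event["infra_action"] = "New"


def on_reception(event):
    """Apply the rule to one event dict and return it."""
    if event.get("severity") in IGNORED_SEVERITIES:
        return event  # filtered out by the rule's severity clause
    if not first_reception_action(event):
        second_reception_action(event)
    return event


if __name__ == "__main__":
    print(on_reception({
        "class": "SITM_Alert", "severity": "CRITICAL", "evttype": "TMNT",
        "custommsg": "call DBA", "msg": "disk full",
    }))
```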

Part I Perl Subroutine

sub newalerthandler()
#####################
{
    my $ACTION_MSG;
    my (@tempalerts, @OnCall, $numberofcommas);
    my ($infrayn, $alerts, $key, $emailorsms, $grouptoalert, $tempkey, $Infracode);

    $alerts = $evt_alerts;
    $infrayn = $evt_infrayn;
    $Infracode = $evt_Infracode;

    &Log_Frontend_Error($ESM_QUEUE_MGR_HANDLE, $ESM_ERROR_QUEUE_HANDLE, __FILE__,
        $TXN_CODES{ERROR_GENERAL}, DEBUG, __LINE__,
        "Alerts=$alerts Infra=$infrayn", $PROCESS_ID);

    @tempalerts = split(/,/, $alerts);

Part II Perl Subroutine

    foreach $key (@tempalerts) {

        ($emailorsms,$grouptoalert) = split(/\s+?/, $key);

        &Log_Frontend_Error($ESM_QUEUE_MGR_HANDLE, $ESM_ERROR_QUEUE_HANDLE, __FILE__,
            $TXN_CODES{ERROR_GENERAL}, DEBUG, __LINE__,
            "Email or SMS =$emailorsms Group=$grouptoalert", $PROCESS_ID);

        @OnCall = &get_oncall($grouptoalert, "1");    # Set to one as the contact order
        &Mail_Notif(@OnCall) if ($emailorsms eq "EMAIL");
        &SMS_Notif(@OnCall) if ($emailorsms eq "SMS");

Part III Perl Subroutine

        if ($infrayn eq "Y") {
            $INFRA_API_PARMS{strTransaction} = "NEWCALL";
            $INFRA_API_PARMS{strSystem} = "ProblemRoute";
            $INFRA_API_PARMS{strString4} = "Tivoli";
            $INFRA_API_PARMS{strString5} = "Automated Support";    # Default Infra group
            $INFRA_API_PARMS{strString7} = $Infracode;
            $INFRA_API_PARMS{strString16}= "Y" ;    #if ($InfraRouteFlag eq "ProblemRoute");
            $INFRA_API_PARMS{strString9} = $Severity{"$severity"};    # set Infra Sev
            $INFRA_API_PARMS{strString11}= "Call logged by Tivoli";
            $ACTION_MSG .= "sysadm.WRAPPER_API ";
            foreach $key (@INFRA_API_KEYS) {
                $ACTION_MSG .= "\@$key = \'$INFRA_API_PARMS{$key}\',";
            }
            chop ($ACTION_MSG);    # drop the trailing comma
            &directinfra($ACTION_MSG);
        }

    }
}
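Two string manipulations carry most of the subroutine: splitting the alert list carried in the event into channel and group pairs, and assembling the stored-procedure call for the ticketing system. The Python sketch below restates both, assuming the alert slot looks like `SMS groupA,EMAIL groupB` as the splits in the Perl imply. The group names are hypothetical, and skipping malformed entries is a choice of this sketch rather than something the Perl does.

```python
def parse_alerts(alerts):
    """Split 'SMS groupA,EMAIL groupB' into (channel, group) pairs."""
    pairs = []
    for item in alerts.split(","):
        parts = item.split()
        if len(parts) != 2:
            continue  # skip malformed entries (a choice of this sketch)
        pairs.append((parts[0], parts[1]))
    return pairs


def build_action_msg(params, keys):
    """Build "sysadm.WRAPPER_API @k = 'v',..." with no trailing comma."""
    assignments = ",".join("@{} = '{}'".format(key, params[key]) for key in keys)
    return "sysadm.WRAPPER_API " + assignments


if __name__ == "__main__":
    # "unixoncall" and "dbateam" are hypothetical group names.
    print(parse_alerts("SMS unixoncall,EMAIL dbateam"))
    print(build_action_msg(
        {"strTransaction": "NEWCALL", "strSystem": "ProblemRoute"},
        ["strTransaction", "strSystem"],
    ))
```

Joining the assignments avoids the need to chop the trailing comma afterwards.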

St.George ‘UE’ Processing

  • No TEC rule correlation processing needed
  • Can be handled in normal TMW.rls
  • Escalated by our Perl alerting daemon
  • Can handle multiple SMSs and emails to multiple parties

St.George ‘UE’ and ITM61

  • Saves on Situation definition
  • Socket MDL set up on all remote TEMS
  • Perl LFA program generates SITM_Alerts
  • Sends to socket port 7500 on TEMS
  • One situation alerting on many monitors
  • Perl LFA buffers at EP if not sent
  • Can try multiple TEMS/remotes

Sample MDL that accepts data from Perl LFAs on EPs

//APPL TSitmSock
//NAME TSitmCatcher E 7200 AddTimeStamp
//ATTRIBUTES ';'
Host D 15
Msg D 500
EventType D 10
AlertType D 50
Infra D 1
InfraCode D 25
FileName D 50
ErrorMessage D 50
CustomMessage D 500
FilePointer D 50
BlockDetails D 50
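The MDL fixes both the order of the fields and the ';' delimiter, so the sending side has to produce exactly eleven values in that order. The Python sketch below builds such a record and checks each value against the width declared in the MDL. Rejecting over-long values or values containing ';' is a precaution chosen for this sketch and is not documented behaviour of the Universal Agent. The host name in the usage example is hypothetical.

```python
# (attribute name, declared width) in the order given by the MDL.
ATTRIBUTES = [
    ("Host", 15), ("Msg", 500), ("EventType", 10), ("AlertType", 50),
    ("Infra", 1), ("InfraCode", 25), ("FileName", 50), ("ErrorMessage", 50),
    ("CustomMessage", 500), ("FilePointer", 50), ("BlockDetails", 50),
]


def build_record(values):
    """Join values with ';' after checking count, delimiter use and widths."""
    if len(values) != len(ATTRIBUTES):
        raise ValueError("expected %d values, got %d" % (len(ATTRIBUTES), len(values)))
    for (name, width), value in zip(ATTRIBUTES, values):
        if ";" in value:
            raise ValueError("%s contains the ';' delimiter" % name)
        if len(value) > width:
            raise ValueError("%s is longer than %d characters" % (name, width))
    return ";".join(values)


if __name__ == "__main__":
    # "kogux01" is a hypothetical host name.
    print(build_record([
        "kogux01", "input foo has not been found", "TMTP", "SMS unixoncall",
        "N", "", "nofilename", "noerrormessage", "nocustommsg",
        "nofilepointer", "noblockdetails",
    ]))
```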

Sample Perl send statement

use IO::Socket::INET;    # needed for the socket connection below

if (! $foundstr1) {
    if ($SockOut2 = new IO::Socket::INET (PeerAddr => 'YourTEMS/RTEMS',
                                          PeerPort => 7500,
                                          Proto    => 'tcp')) {
        print $SockOut2 "//tsocket\n";
        print $SockOut2 "$compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";
        close($SockOut2);
        print DBG "Did connect 1: $@\n";
        print DBG "Sent $compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";
    }
    else {
        # Could not reach the TEMS: buffer the record locally for a later attempt
        print DBG "Did not connect 1: $@\n";
        open (DATF, ">>$cachefile");
        print DATF "//tsocket\n";
        print DATF "$compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";
        close (DATF);
    }
}
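The send-or-buffer pattern, together with the option of trying more than one TEMS, can be sketched as a single function. This is a Python restatement for illustration, not the presenter's code: the function name, the 5 second timeout and the injectable `connect` parameter (there so the function can be exercised without a network) are all assumptions of this sketch. The `//tsocket` prefix line and port 7500 are taken from the slides.

```python
import socket


def send_or_cache(record, hosts, cache_path, port=7500, timeout=5,
                  connect=socket.create_connection):
    """Send the record to the first reachable host, else append it to a cache file.

    Returns the host that accepted the record, or None if it was cached.
    """
    text = "//tsocket\n" + record + "\n"
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
            try:
                conn.sendall(text.encode("utf-8"))
            finally:
                conn.close()
            return host
        except OSError:
            continue  # this host is unreachable, try the next one
    # No host accepted the record: buffer it locally for a later attempt.
    with open(cache_path, "a", encoding="utf-8") as cache:
        cache.write(text)
    return None
```

A later run would need to read the cache file back and resend its contents; that replay step is not shown on the slides and is not sketched here.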

Sample MDL that executes your Perl/VBS LFA

//APPL Parsefile
//NAME FileChk1 K 3600 AddTimeStamp Interval=600
//SOURCE SCRIPT C:\Tivoli\ITM\TMAITM6\scripts\perl.exe parsefile.pl
//ATTRIBUTES ' '
Host D 15
Action D 7
Msg D 150

Things to watch out for

  • Perl binaries need to be on the EP, but you don’t need to install Perl
  • Use push @INC,"/Tivoli/ITM/TMAITM6/scripts/lib";
  • We install perl.exe and perl58.dll in the scripts dir by default, with the libs below it
  • Sysroot and windir had to be set inside the Perl code before the necessary Windows libs could be located
  • If you change the UA, the alerting situation needs to be recreated

Q & A

Ideas to share

Disclaimers and Trademarks

No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.

Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided.

IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer's sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws.

The following terms are trademarks or registered trademarks of the IBM Corporation in either the United States, other countries or both: DB2, e-business logo, eServer, IBM, IBM eServer, IBM logo, Lotus, Tivoli, WebSphere, Rational, z/OS, zSeries, System z.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States and other countries.

Other company, product, or service names may be trademarks or service marks of others.

ITIL® is a Registered Trade Mark, and a Registered Community Trade Mark of the Office of Government Commerce, and is Registered in the U.S. Patent and Trademark Office.

IT Infrastructure Library® is a Registered Trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.