Session 2.5 - Tivoli Monitoring implementation at St.George Bank: Using ITM 6.1 for Debugging Complex Applications
Sydney 3-4 August 2006
By Craig Lister – [email protected]
About the Presenter
Over 25 years in IT
Last 6 specialising in TEC and ITM
Co-authored ITM Redbook
Perl specialist
Agenda
Tivoli Setup at St.George
What is Monitored
The Complex Application
Why is my Application not Performing
And the solution is
Universal Event
We use:
Tivoli Configuration Manager 4.2.3
Tivoli Enterprise Console 3.9 FP4
IBM Tivoli Monitoring 6.1 and 5.1
IBM Tivoli Monitoring for Transaction Performance 5.3
Tivoli Products Deployed at St.George
Separate joined TEC TMR
Managing over 1000 servers
Through 8 gateways, from Internet perimeters to core LANs
Real-time trouble ticket lodgement direct from Solaris to MSSQL DB
Integrated with 3rd party cash management system
Tivoli Setup at St.George
Tivoli Setup at St.George Cont’d
In-house two-way SMS paging for first and second call
Server owners listed in fact files for alerting
Dynamic ‘who’s on call’ file kept and maintained by users via email
Dynamic conf file held on all classes per user group for alert generation
ATMs monitored via Prognosis feed from Tandem NonStop system via TEC non-TME
Tandem system monitored via TEC non-TME
ATM configuration controlled via CM
All servers monitored via ITM61/ITM51
Bank's internet systems monitored via live TMTP transactions
How St.George uses Tivoli to Monitor Key Infrastructure
The application in question is a CRM (Customer Relationship Management) system
Presents all customer accounts in a single view
Bank-wide
Time-critical
Accesses several large DBs
Divergent systems: Windows/AIX/PeopleSoft
Customer-facing app = executive focus
The Complex Application (A problem waiting to happen)
Initially two WebSphere servers across two sites, one at each site
After initial problems, increased to two WebSphere servers at each site
One AIX backend server
One AIX/DB2 database server
IDM validation servers
Globally load balanced across the sites
The Complex Application Cont’d
WebSphere servers locking up after 36 hrs, needing a reboot to correct
Transactions through the city site slower than Kogarah
Customers impacted
The Symptoms (Aka: unwanted executive focus)
Install WebSphere monitor on all servers
Set up kwe.xml
Analyse/collect HTTP session data
Analyse/collect garbage collection data
Analyse/collect InFlight workload data
Gathering Evidence using ITM61
<?xml version="1.0" encoding="UTF-8"?>
<KWEINSTR Version="130"
    CollectCPUTime="false"
    CollectHTTPSessStats="false"
    CollectInFlightWorkloads="captured"
    CollectorSessionPort="65535"
    CtgDelayPlugin=""
    DisplayEJBAs="EAR:CLASS"
    DisplayServletWorkloadAs="CLASS"
    GlobalInstrumentation="all"
    HeapAnalysisUserClasses=""
    InternalTrace="false"
    LockAnalysisSystemClasses=""
    LockAnalysisUserClasses=""
    LogFileName="kweinstr.log"
    MaxTrivialMethodInstructions="5"
    MaxClasses="2000"
    MeasureHeapDelays="true"
    NumHeapAnalysisClasses="2000"
    ShowInternalWorkloads="false"
    ShowMethodArguments="false"
    SysInstr="Direct"
    toStringIsSafeToUse="false"
>
Kwe.xml data settings used, Part I
    <Class Name="psft.pt8.*"
        ClassType="user"
        DepthSensitiveInstrumentation="true"
        DisplayServletWorkloadAs="CLASS"
        HeapAnalysis=""
        IgnoreTrivialMethods="true"
        LockAnalysis="false"
        MethodNames="*"
        MethodType="SERVLET,EJB,METHODS,PORTLET">
    </Class>
</KWEINSTR>
Kwe.xml data settings used, Part II
You will need to create one of the above entries for each class you want to instrument
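Because every instrumented class needs its own entry, the entries can be generated rather than typed by hand. The following is a minimal sketch in Python (the deck's own tooling is Perl), assuming the attribute values shown in Part II are reused unchanged for each class. The helper name `class_entry` and the second pattern `com.example.crm.*` are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute values copied from the Part II slide.
# Assumption: the same settings suit every class being instrumented.
CLASS_DEFAULTS = {
    "ClassType": "user",
    "DepthSensitiveInstrumentation": "true",
    "DisplayServletWorkloadAs": "CLASS",
    "HeapAnalysis": "",
    "IgnoreTrivialMethods": "true",
    "LockAnalysis": "false",
    "MethodNames": "*",
    "MethodType": "SERVLET,EJB,METHODS,PORTLET",
}


def class_entry(pattern):
    """Return one <Class> element, as text, for a class-name pattern."""
    attributes = {"Name": pattern}
    attributes.update(CLASS_DEFAULTS)
    element = ET.Element("Class", attributes)
    element.text = "\n"  # forces an explicit </Class> closing tag
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # "com.example.crm.*" is a hypothetical second pattern.
    for pattern in ["psft.pt8.*", "com.example.crm.*"]:
        print(class_entry(pattern))
```

The generated elements would still need to be pasted inside the KWEINSTR element of kwe.xml.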
Note HTTP sessions: sharp rise to plateau
No sessions released during 9am – 5pm
Chunk released approx 12 hrs after sharp rise
Note Java heap usage rises unabated
More and more frequent garbage collections
And, the clincher: InFlight workloads growing in duration and number
Resulting in server reboot
The Analysis (Showing the executive why they pay us)
After some digging: a business design requirement for a 12 hr HTTP session timeout
This was so branches wouldn’t have to log in repeatedly, which would degrade their service
This session timeout was changed to 30 mins on the WebSphere servers and retained at 12 hrs on IDM
Situations developed to look at InFlight times
LFA developed for the WebSphere server log
The Analysis Cont’d
use Postemsg;
use Sys::Hostname;

my ($id_string) = 'sendtec';

# Global Environmental Variables
my $local_host = hostname();
my $logsdirfile = '/var/spool/sendtec.log';

my $datetime = &loct();
open (DBG, ">$logsdirfile");
print DBG "$datetime Begin run of sendtec\n";

&sendevt(@ARGV);

close DBG;

exit 0;
# E N D  O F  M A I N L I N E

sub loct {
######################
# Define the localtime
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$mon = $mon+1;
if (length $mday < 2) { $mday="0".$mday; }   # Add a "0" to the front if len<2
if (length $mon < 2) { $mon="0".$mon; }      # This is required as the localtime fn
if (length $hour < 2) { $hour="0".$hour; }   # doesn't prefix single digit fields
if (length $min < 2) { $min="0".$min; }      # with a "0".
if (length $sec < 2) { $sec="0".$sec; }

my $datetime=(($year+1900)."-".$mon."-".$mday." ".$hour.":".$min.":".$sec);
return $datetime;
}
Sample of Perl Programme used in previous Situation
sub sendevt {
#############
my ($class,$whotocall,$msgin,$OMNode) = @_;

print DBG "$datetime ----->sending alert class: $class Alert to: $whotocall Message: $msgin OMNode: $OMNode\n";

$msgin = $msgin . " OMNode=$OMNode";
$msg = Postemsg->new('koguxlh1');            # Set the server hostname
$msg->setClass("$class");                    # Set a valid class name
$msg->setAdapter('TMNT');                    # Set a valid adapter name
$msg->setMessage("$msgin");
$msg->setSeverity('CRITICAL');
$msg->setSlotValue('hostname',"$OMNode");

# $msg->setSlotValue('origin','10.50.16.26');
$msg->setSlotValue('evttype','TMNT');
$msg->setSlotValue('alerttype',"\'$whotocall\'");
$msg->setSlotValue('infrayn','N');
$msg->setSlotValue('infracode',"");
$msg->setSlotValue('filename','');
$msg->setSlotValue('errormessage',"");
$msg->setSlotValue('filepointer','');

$msg->sendMessage();

return;
}
// [11/10/05 13:13:38:742 EST] 37d9c575 ThreadMonitor W WSVR0605W: Thread "Servlet.Engine.Transports : 790" (36694576)
// has been active for 778,031 milliseconds and may be hung. There are 144 threads in total in the server that may be hung.
// Log location = KOGNTISP044 - D:\IBM\WebSphere\AppServer\logs\fireflykog\SystemOut.log
// Log location = KOGNTISP056 - D:\IBM\WebSphere\AppServer\logs\fireflykog2\SystemOut.log
// Log location = KOGNTISP059 - D:\IBM\WebSphere\AppServer\logs\fireflykog3\SystemOut.log
// Log location = SBDNTISP012 - D:\IBM\WebSphere\AppServer\logs\fireflysbd\SystemOut.log
// Log location = SBDNTISP014 - D:\IBM\WebSphere\AppServer\logs\fireflysbd2\SystemOut.log
FORMAT Firefly_TECAD_ThreadHungServer FOLLOWS NT_Base
%s* has been active for %s milliseconds and may be hung. There are %s*
hostname DEFAULT
origin DEFAULT
thread $2
msg1 $3
severity CRITICAL
msg PRINTF("Thread has been active for %s milliseconds and may be hung. There are %s",thread, msg1)
END
Sample Format File developed as backup alert
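To make the slot mapping in the format file easier to follow, here is a rough Python sketch of what the format line extracts from the WSVR0605W message. It is only an approximation: it assumes `%s*` can be modelled as "any run of text" and `%s` as a single whitespace-free token, which may not match the adapter's matching rules in every case. The function name `parse_hung_thread` is hypothetical.

```python
import re

# Rough regex equivalent of the format line:
#   %s* has been active for %s milliseconds and may be hung. There are %s*
# Assumption: %s* ~ any run of text, %s ~ one token without whitespace.
HUNG_THREAD = re.compile(
    r"^(?P<prefix>.*) has been active for (?P<thread>\S+) milliseconds "
    r"and may be hung\. There are (?P<msg1>.*)$"
)


def parse_hung_thread(line):
    """Return the slots the format file would fill, or None if no match."""
    match = HUNG_THREAD.match(line)
    if match is None:
        return None
    thread = match.group("thread")  # $2 on the slide: active time in ms
    msg1 = match.group("msg1")      # $3 on the slide: rest of the line
    return {
        "thread": thread,
        "msg1": msg1,
        "severity": "CRITICAL",
        "msg": "Thread has been active for %s milliseconds and may be hung. "
               "There are %s" % (thread, msg1),
    }


# The sample message from the format file comments, joined onto one line.
SAMPLE = (
    '[11/10/05 13:13:38:742 EST] 37d9c575 ThreadMonitor W WSVR0605W: Thread '
    '"Servlet.Engine.Transports : 790" (36694576) has been active for 778,031 '
    'milliseconds and may be hung. There are 144 threads in total in the '
    'server that may be hung.'
)

if __name__ == "__main__":
    print(parse_hung_thread(SAMPLE))
```

Note that the slot named `thread` receives the second match, which is the active time in milliseconds rather than a thread identifier.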
Results achieved = Execs happy
HTTP sessions rise and fall through the day
Java heap size grows and falls
In-flight workloads no longer grow endlessly
Some Query SQL Changes
Notice, after the changes to queries, response time is under 2 secs.
Garbage collection monitoring?
An event type that we use for ‘just’ alerting
Used in situations where correlation is not required
Who to alert is carried within the event payload
Event blocking data is carried within the event
No need to continually create BAROCs, compile and restart TEC
Particularly good for LFA-type scenarios
St.George ‘Universal Event’
Sample baroc structure
TEC_CLASS :
    SITM_Alert ISA TMW_Event
    DEFINES {
        evttype : STRING;
        msg : STRING, dup_detect=yes;
        alerttype : STRING;
        infrayn : STRING;
        filename : STRING;
        errormessage : STRING;
        filepointer : STRING;
        infracode : STRING;
        custommsg : STRING;
        blockdetails : STRING;
        severity: default = CRITICAL;
    };
END
rule: tmw_escal_rl:
(
    event: _event of_class 'TMW_Event'
        where [
            severity: outside [ 'HARMLESS', 'UNKNOWN' ]
        ],

    reception_action: tmw_escal_ra1:
    (
        bo_get_class_of(_event, _classname),
        % Only do this if class is SITM_Alert
        _classname == 'SITM_Alert',
        bo_get_slotval(_event, evttype, _evttype),
        % Do not do it if evttype is SMSConnect
        _evttype \== 'SMSConnect',
        bo_get_slotval(_event, custommsg, _custommsg),
        bo_get_slotval(_event, msg, _oldmsg),
        atomconcat(['Custom Msg is ', _custommsg,' ', _oldmsg], _new_msg),
        bo_set_slotval(_event, msg, _new_msg),
        re_mark_as_modified(_event, _),
        bo_set_slotval(_event, infra_update, 'Pending'),
        bo_set_slotval(_event, infra_action, 'New'),
        exec_program(_event, '/usr/local/esm/bin/esm_action_tec_event.pl', '', [], 'NO'),
        set_event_status(_event, 'CLOSED' ),
        commit_set
    ),

    reception_action: tmw_escal_ra2:
    (
        bo_set_slotval(_event, infra_update, 'Pending'),
        bo_set_slotval(_event, infra_action, 'New'),
        re_mark_as_modified(_event, _),
        exec_program(_event, '/usr/local/esm/bin/esm_action_tec_event.pl', '', [], 'NO')
    )
).
Sample TMW Rule
sub newalerthandler()
#####################
{
    my $ACTION_MSG;
    my (@tempalerts, @OnCall, $numberofcommas);
    my ($infrayn, $alerts, $key, $emailorsms, $grouptoalert, $tempkey, $Infracode);

    $alerts = $evt_alerts;
    $infrayn = $evt_infrayn;
    $Infracode = $evt_Infracode;

    &Log_Frontend_Error($ESM_QUEUE_MGR_HANDLE, $ESM_ERROR_QUEUE_HANDLE, __FILE__,
        $TXN_CODES{ERROR_GENERAL}, DEBUG, __LINE__,
        "Alerts=$alerts Infra=$infrayn", $PROCESS_ID);

    @tempalerts = split(/,/, $alerts);
Part I Perl Subroutine
Part II Perl Subroutine
    foreach $key (@tempalerts) {

        ($emailorsms,$grouptoalert) = split(/\s+?/, $key);

        &Log_Frontend_Error($ESM_QUEUE_MGR_HANDLE, $ESM_ERROR_QUEUE_HANDLE, __FILE__,
            $TXN_CODES{ERROR_GENERAL}, DEBUG, __LINE__,
            "Email or SMS =$emailorsms Group=$grouptoalert", $PROCESS_ID);

        @OnCall = &get_oncall($grouptoalert, "1");   # Set to one as the contact order
        &Mail_Notif(@OnCall) if ($emailorsms eq "EMAIL");
        &SMS_Notif(@OnCall) if ($emailorsms eq "SMS");
Part III Perl Subroutine
        if ($infrayn eq "Y") {

            $INFRA_API_PARMS{strTransaction} = "NEWCALL";
            $INFRA_API_PARMS{strSystem} = "ProblemRoute";
            $INFRA_API_PARMS{strString4} = "Tivoli";
            $INFRA_API_PARMS{strString5} = "Automated Support";   # Default Infra group
            $INFRA_API_PARMS{strString7} = $Infracode;
            $INFRA_API_PARMS{strString16}= "Y" ;   # if ($InfraRouteFlag eq "ProblemRoute");
            $INFRA_API_PARMS{strString9} = $Severity{"$severity"};   # set Infra Sev
            $INFRA_API_PARMS{strString11}= "Call logged by Tivoli";
            $ACTION_MSG .= "sysadm.WRAPPER_API ";
            foreach $key (@INFRA_API_KEYS){
                $ACTION_MSG .= "\@$key = \'$INFRA_API_PARMS{$key}\',";
            }
            chop ($ACTION_MSG);
            &directinfra($ACTION_MSG) ;
        }
}}
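The routing step in the subroutine above (split the alert list on commas, then split each entry into a channel and a group) can be sketched in Python as follows. This is an illustration only: the payload format `"SMS group,EMAIL group"` is inferred from the Perl code, the function name `parse_alert_targets` is hypothetical, and the group names in the usage example are made up.

```python
def parse_alert_targets(alerts):
    """Split an alert list such as 'SMS groupA,EMAIL groupB' into pairs.

    Returns a list of (channel, group) tuples, where channel is 'SMS' or
    'EMAIL'. The payload format is an assumption inferred from the slides.
    """
    targets = []
    for entry in alerts.split(","):
        parts = entry.split()  # split on any run of whitespace
        if len(parts) != 2:
            continue  # skip malformed entries rather than guess
        channel, group = parts
        if channel in ("EMAIL", "SMS"):
            targets.append((channel, group))
    return targets


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    for channel, group in parse_alert_targets("SMS unixsupport,EMAIL dbasupport"):
        print(channel, "->", group)
```

Unlike the Perl version, this sketch drops entries it cannot parse instead of passing undefined values on to the notification routines.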
No TEC rule correlation processing needed
Can be handled in normal TMW.rls
Escalated by our Perl alerting daemon
Can handle multiple SMSs and emails to multiple parties
St.George ‘UE’ Processing
Saves on situation definition
Socket MDL set up on all remote TEMS
Perl LFA program generates SITM_Alerts
Sends to socket port 7500 on TEMS
One situation alerting on many monitors
Perl LFA buffers at EP if not sent
Can try multiple TEMS/remotes
St.George ‘UE’ and ITM61
//APPL TSitmSock
//NAME TSitmCatcher E 7200 AddTimeStamp
//ATTRIBUTES ';'
Host D 15
Msg D 500
EventType D 10
AlertType D 50
Infra D 1
InfraCode D 25
FileName D 50
ErrorMessage D 50
CustomMessage D 500
FilePointer D 50
BlockDetails D 50
Sample MDL that accepts data from Perl LFAs on EPs
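The MDL above defines eleven attributes separated by ';', each with a maximum width. A sender has to emit its fields in exactly that order, so a small helper that builds the record from named values can help avoid misaligned fields. The sketch below is in Python rather than the Perl used on the endpoints. Truncating over-long values and replacing embedded ';' characters on the sending side are assumptions made here for safety, not behaviour described in the slides, and `build_record` is a hypothetical name.

```python
# Attribute names and maximum widths, in the order given by the MDL.
MDL_FIELDS = [
    ("Host", 15), ("Msg", 500), ("EventType", 10), ("AlertType", 50),
    ("Infra", 1), ("InfraCode", 25), ("FileName", 50), ("ErrorMessage", 50),
    ("CustomMessage", 500), ("FilePointer", 50), ("BlockDetails", 50),
]
DELIMITER = ";"


def build_record(values):
    """Build one ';'-delimited record from a dict of attribute values.

    Missing attributes become empty fields.
    """
    fields = []
    for name, width in MDL_FIELDS:
        text = str(values.get(name, ""))
        # Assumption: a ';' inside a value would shift every later field,
        # so it is replaced before sending.
        text = text.replace(DELIMITER, ",")
        # Assumption: truncate on the sending side to the declared width.
        fields.append(text[:width])
    return DELIMITER.join(fields)


if __name__ == "__main__":
    print(build_record({
        "Host": "KOGNTISP044",
        "Msg": "input has not been found",
        "EventType": "TMTP",
        "Infra": "N",
    }))
```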
if (! $foundstr1) {
    if ($SockOut2 = new IO::Socket::INET (PeerAddr => 'YourTEMS/RTEMS',
                                          PeerPort => 7500,
                                          Proto => 'tcp')) {

        print $SockOut2 "//tsocket\n";
        print $SockOut2 "$compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";

        close($SockOut2);
        print DBG "Did connect 1: $@\n";

        print DBG "Sent$compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";
    }
    else {
        print DBG "Did not connect 1: $@\n";
        open (DATF, ">>$cachefile");
        print DATF "//tsocket\n";
        print DATF "$compname;input $psearchstr1 has not been found at $datetime;TMTP;$whotocall;$lodgeinfra;$infracode;nofilename;noerrormessage;nocustommsg;nofilepointer;noblockdetails\n";
        close (DATF);
    }
}
Sample Perl send statement
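The send-or-buffer pattern in the Perl fragment (try the socket, and append the record to a cache file on the endpoint if the connection fails) can be sketched in Python as below. This is an illustration under assumptions, not the production code: the function name `send_or_cache`, the timeout and the return value are hypothetical, while the `//tsocket` prefix line and the port number come from the slides.

```python
import socket


def send_or_cache(record, host, port, cache_path, timeout=5.0):
    """Send one record to the socket listener, or cache it on failure.

    Returns True if the record was sent and False if it was appended to
    the cache file instead.
    """
    payload = "//tsocket\n" + record + "\n"
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
        try:
            conn.sendall(payload.encode("utf-8"))
        finally:
            conn.close()
        return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        with open(cache_path, "a") as cache:
            cache.write(payload)
        return False


if __name__ == "__main__":
    # 'YourTEMS/RTEMS' is the placeholder host name used on the slide.
    sent = send_or_cache("host;message", "YourTEMS/RTEMS", 7500, "sendtec.cache")
    print("sent" if sent else "cached for a later retry")
```

A later run could read the cache file back and resend its contents once a TEMS is reachable; that replay step is not shown in the slides and is left out here.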
Sample MDL that executes your Perl/VBS LFA
//APPL Parsefile
//NAME FileChk1 K 3600 AddTimeStamp Interval=600
//SOURCE SCRIPT C:\Tivoli\ITM\TMAITM6\scripts\perl.exe parsefile.pl
//ATTRIBUTES ' '
Host D 15
Action D 7
Msg D 150
Perl binaries need to be on the EP, but you don’t need to install Perl
Use push @INC,"/Tivoli/ITM/TMAITM6/scripts/lib";
We install perl.exe and perl58.dll in the scripts dir by default, with the libs below it
Sysroot and windir had to be set inside the Perl code before the necessary Windows libs could be located
If you change the UA, the alerting situation needs to be recreated
Things to watch out for
Disclaimers and Trademarks
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided.
IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer's sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws.
The following terms are trademarks or registered trademarks of the IBM Corporation in either the United States, other countries or both: DB2, e-business logo, eServer, IBM, IBM eServer, IBM logo, Lotus, Tivoli, WebSphere, Rational, z/OS, zSeries, System z.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
ITIL® is a Registered Trade Mark, and a Registered Community Trade Mark of the Office of Government Commerce, and is Registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library® is a Registered Trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.