8/8/2019 Ibm Summer Internship Slides
Autonomic Computing Framework for Error Recovery in IBM WebSphere MQ
A Proof of Concept
Neeraj Bisht, Pawan HN & Vikram Subramanya
Summer Interns of 2007
IBM India Software Lab, Bangalore
Manager: Arun Shivaswamy
WebSphere MQ Group
Profile: IBM ISL Software Group
IBM Software Group - largest middleware company in the world
Brands: WebSphere, Information Mgmt., Lotus, Tivoli, and Rational
Technology Areas: SOA, XML, Web 2.0, Application Servers, Databases, Autonomic Computing
Motivation for our PoC
Current Scene: WebSphere MQ cannot come out of erroneous situations by itself
  Needs manual intervention
Objective: To make MQ self-reliant
  Automatic monitoring/analysis of error
  Recovery action
Gist: Expose MQ to AC
Autonomic Computing
What's Autonomic Computing?
Aim: To create self-managing systems
  Overcome complexity by automating maintenance
AC makes the system:
  Self-Configuring: adapt to changes, use policies
  Self-Healing: diagnose H/W or S/W disruptions
  Self-Optimizing: maximize IT resource usage
  Self-Protecting: defend from threats/attacks
MAPE-K Loop Architecture
The MAPE-K Loop in AC
Monitor: Collect, filter details from managed resource
Analyze: Learn IT envt., predict future
Plan: Policy actions to achieve goals
Execute: Run the plan
Knowledge: Data shared among MAPE like symptoms & policies
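One pass through the loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the names `Knowledge`, `monitor`, `analyze`, `plan`, `execute`, `mape_k_cycle`, and the sample symptom `QMGR_CRASH` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Data shared by all four phases (hypothetical layout)."""
    symptoms: dict = field(default_factory=dict)  # log keyword -> symptom name
    policies: dict = field(default_factory=dict)  # symptom name -> action name


def monitor(raw_events):
    """Collect and filter details from the managed resource (drop blank lines)."""
    return [e.strip() for e in raw_events if e.strip()]


def analyze(events, knowledge):
    """Match the filtered events against the known symptoms."""
    found = []
    for event in events:
        for keyword, symptom in knowledge.symptoms.items():
            if keyword in event:
                found.append(symptom)
    return found


def plan(symptoms, knowledge):
    """Choose the policy action for each detected symptom."""
    return [knowledge.policies[s] for s in symptoms if s in knowledge.policies]


def execute(actions, effectors):
    """Run the plan by calling the effector registered for each action."""
    return [effectors[a]() for a in actions]


def mape_k_cycle(raw_events, knowledge, effectors):
    """One pass: Monitor -> Analyze -> Plan -> Execute, sharing Knowledge."""
    events = monitor(raw_events)
    symptoms = analyze(events, knowledge)
    actions = plan(symptoms, knowledge)
    return execute(actions, effectors)


if __name__ == "__main__":
    kb = Knowledge(
        symptoms={"queue manager ended": "QMGR_CRASH"},  # hypothetical symptom
        policies={"QMGR_CRASH": "restart_qmgr"},         # hypothetical policy
    )
    print(mape_k_cycle(
        ["", "ERROR: queue manager ended unexpectedly"],
        kb,
        {"restart_qmgr": lambda: "restart requested"},
    ))
```

Keeping the symptoms and policies in one shared object mirrors the "K" of the loop: the four phases stay stateless and only the knowledge changes as new errors are catalogued.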
IBM WebSphere MQ
What's IBM WebSphere MQ?
IBM's middleware for messaging & queuing
Communication among programs across a heterogeneous network
API calls
Messaging & Queuing
MQ analogous to email, not phone!
Queue Manager Objects
Queue: To store msg sent by programs; local or remote
Channel: Logical communication link
  Message Channel: connects 2 QMgrs
  MQI Channel: connects client to QMgr
MQ Error Scenarios
QMgr Crash Error Scenarios
QMgr crash by killing the OAM process amqzfuma.exe
  Recovery: Close connection, restart QMgr
QMgr crash due to access violation in the agent process
  Recovery: Close connection, restart QMgr
More MQ Error Scenarios
Backward version DLLs placed in the machine
  Recovery: Find installation path from registry; delete/rename
DCOM user ID configured incorrectly
  Recovery: Run amqmjpse -s r
What Did We Do?
These error scenarios were manually induced into MQ
Populated Symptom catalog with possible errors. Parsed the generated error logs to detect them
Developed AC framework (MAPE-K loop) to call recovery procedure automatically
IBM Tools Used
Error Log Analysis
Eclipse-based tool, IBM Log & Trace Analyzer (LTA)
Converts textual log records into Common Base Event (CBE) format by parsing
Log View of LTA:
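To give a feel for the parsing step, here is a minimal Python sketch that turns one textual log record into a CBE-like record. It is not LTA's parser and the result is not real CBE XML: the log layout (`<date> <time> <message id>: <text>`), the sample message ID `XYZ0001`, and the function `to_event` are assumptions made for illustration, and the keys `creationTime`, `msgId`, and `msg` are only loosely modeled on CBE fields.

```python
import re
from typing import Optional

# Hypothetical simplified layout: "<date> <time> <message id>: <text>".
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<msg_id>[A-Z]{3}\d{4}): (?P<text>.+)$"
)


def to_event(line: str) -> Optional[dict]:
    """Parse one textual log record into a CBE-like dictionary.

    Returns None when the line does not match the assumed layout.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "creationTime": f"{match['date']} {match['time']}",
        "msgId": match["msg_id"],
        "msg": match["text"],
    }


if __name__ == "__main__":
    # Sample record in the assumed layout; the message ID is made up.
    print(to_event("08/08/2007 10:15:32 XYZ0001: Sample failure text"))
```

Normalizing every record into one structure is what lets a single set of symptom rules be applied to logs from different products.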
Symptom Database
Knowledge base of problems & solutions for a software product
  Symptom description: Why the problem occurs?
  Rules to identify a problem: XPath expressions
  Recommended action
LTA provides a symptom editor
Can also be used for correlation of events
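The three parts of a symptom entry (description, XPath rule, recommended action) can be illustrated with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. This is a sketch under assumptions: the namespace-free `<events>` document stands in for real CBE XML, and the catalog layout, the message ID `XYZ0001`, and `match_symptoms` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for a document of CBE events.
EVENTS_XML = """
<events>
  <event msgId="XYZ0001" msg="Sample failure text"/>
  <event msgId="XYZ0002" msg="Another sample message"/>
</events>
"""

# Hypothetical symptom catalog: each entry has a description, an XPath rule
# that identifies the problem, and a recommended action.
SYMPTOMS = [
    {
        "description": "Sample failure reported by the product",
        "rule": "./event[@msgId='XYZ0001']",
        "action": "restart the failed component",
    },
]


def match_symptoms(events_xml, symptoms):
    """Return the symptoms whose XPath rule selects at least one event."""
    root = ET.fromstring(events_xml)
    return [s for s in symptoms if root.findall(s["rule"])]


if __name__ == "__main__":
    for symptom in match_symptoms(EVENTS_XML, SYMPTOMS):
        print(symptom["description"], "->", symptom["action"])
```

Because the rules are data rather than code, new error scenarios can be covered by adding catalog entries without touching the matching logic.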
Symptom Editor In LTA
Closing MAPE-K Loop
IBM Problem Determination Assistant (PDA)
Tool to achieve closed AC loop
Components:
  Generic Log Adapter (GLA)
  Symptom Catalog
  Analysis Engine
  Action Processor
  Manager: Notification, configuration, auto-update
Our Project: GUI and Source Code Explained
Use Case Realization of QMgr Crash
[Figure: Management Application based on AC framework]
Components shown: WebSphere MQ; Error Logs; GLA for WMQ; CBE for WMQ erroneous situation; Analysis Engine; Symptom Database for WMQ (Load rules); Notification Router; Correlation Engine (if needed); Action Processor; Management Data (Action: Save CBE); Action: Change Queue Manager Context; Restart Action APIs
AC Centric Technologies:
  Generic Log Adaptor (GLA) / Log Trace Analyzer (LTA) for WMQ
  Runtime platform TPTP
  XPath Correlation Engine (if needed)
Project GUI
List of MQ Processes
Putter Application
Puts msg in a non-full queue
While (Q.Connection not closed)
  If (Q.CurrDepth
Getter Application
Receives msg in a non-empty queue
While (Q.Connection not closed)
  If (Q.CurrDepth > 0)
    Q.Get(msg);
  Else wait();
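The Putter and Getter pair can be imitated without MQ at all. The Python sketch below uses the standard library's bounded `queue.Queue` as a stand-in for an MQ queue and a `threading.Event` as a stand-in for the connection state; none of these names come from the project or from the MQ API. It also differs from the slide's pseudocode in one respect: instead of checking the depth and then getting, it uses a blocking get with a timeout.

```python
import queue
import threading


def putter(q, messages, closed):
    """Put each message once the queue has room; stop if the connection closes."""
    for msg in messages:
        while not closed.is_set():
            try:
                q.put(msg, timeout=0.1)  # waits while the queue is full
                break
            except queue.Full:
                continue


def getter(q, received, expected, closed):
    """Get messages while the connection is open; wait when the queue is empty."""
    while not closed.is_set() and len(received) < expected:
        try:
            received.append(q.get(timeout=0.1))  # waits while the queue is empty
        except queue.Empty:
            continue


def run_demo(messages, max_depth=2):
    """Run one putter and one getter against a bounded in-memory queue."""
    q = queue.Queue(maxsize=max_depth)  # bounded, like a queue with a maximum depth
    closed = threading.Event()          # stands in for "Q.Connection closed"
    received = []
    threads = [
        threading.Thread(target=putter, args=(q, messages, closed)),
        threading.Thread(target=getter, args=(q, received, len(messages), closed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo(["m1", "m2", "m3", "m4", "m5"]))
```

Setting the `closed` event from another thread makes both loops exit, which is roughly what the applications see when the queue manager goes away.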
Induce QMgr Crash: Kill OAM Process (!)
Manually issue the command taskkill /f /im amqzfuma.exe
  forcefully kills the Fuma image
QMgr crashes! All processes are killed
Putter & Getter Stop
Log File Monitor
Call PDA to continuously ping in the background
When generated, log file is parsed into CBE format
Error is matched with the symptom catalog
User is alerted
Recovery action is called
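The monitoring steps above can be sketched as a single polling step in Python. This is an assumption-laden illustration, not PDA's behavior: the functions `read_new_lines` and `poll_once`, the keyword-based catalog, and the `alert` and `recover` callbacks are all hypothetical, and matching is done on raw text rather than on CBE records to keep the example short.

```python
import os


def read_new_lines(path, offset):
    """Return (lines appended since offset, new offset); tolerate a missing file."""
    if not os.path.exists(path):
        return [], offset
    with open(path, "r", encoding="utf-8") as handle:
        handle.seek(offset)
        lines = handle.read().splitlines()
        return lines, handle.tell()


def poll_once(path, offset, catalog, alert, recover):
    """One polling step: read new records, match symptoms, alert, then recover.

    catalog maps a keyword found in a log record to a recovery action name.
    """
    lines, offset = read_new_lines(path, offset)
    for line in lines:
        for keyword, action in catalog.items():
            if keyword in line:
                alert(line)      # the user is alerted
                recover(action)  # the recovery action is called
    return offset
```

A background monitor would call `poll_once` in a loop with a short sleep between calls, feeding each returned offset into the next call so that a record is never handled twice.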
QMgr Restart Action API
After detection of QMgr crash,
  Close all existing connections to the QMgr
  Restart using the command STRMQM
  If restart fails, wait and output error code
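A restart action along these lines might look like the Python sketch below. It is not the project's Action API: the function `restart_qmgr`, the `close_connections` callback, and the retry count and wait time are assumptions (the slide says only to wait and output the error code, so retrying is an interpretation). The one real external piece is the `strmqm <queue manager name>` command, which is treated as successful when it exits with code 0.

```python
import subprocess
import time


def restart_qmgr(name, attempts=3, wait_seconds=5.0,
                 close_connections=lambda: None,
                 run=subprocess.run, sleep=time.sleep):
    """Close connections, then try strmqm; return 0 or the last error code.

    close_connections, run and sleep are injectable so the function can be
    exercised without a real queue manager (all hypothetical hooks).
    """
    close_connections()  # close all existing connections to the QMgr first
    code = None
    for attempt in range(1, attempts + 1):
        code = run(["strmqm", name]).returncode
        if code == 0:
            return 0
        print(f"strmqm attempt {attempt} failed with exit code {code}")
        if attempt < attempts:
            sleep(wait_seconds)  # wait before trying again
    return code
```

Calling `restart_qmgr("QM1")` on a machine with MQ installed would invoke the real command; `QM1` is a placeholder queue manager name.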
Putter & Getter Restart
What Have We Achieved?
For the first time, benefits of autonomic computing are realized on WebSphere MQ
Common MQ errors are successfully overcome in our demonstration
Feasibility is high, since time & space cost is minimum
Value Addition to MQ as a self-managing resource
Future As We See
AC framework extends to all MQ errors; Makes MQ completely Self-Reliant
Manual intervention drastically reduces, cutting labor costs to IBM; Productivity increases
We predict a Paradigm Shift in the MQ product & maintenance team
Thank You!
Managers, Mr. Arun Shivaswamy of WebSphere MQ group & Mr. M R Ananda, AC team at IBM ISL, Bangalore
Team-mates: Neeraj Bisht, IITB, Pawan HN, NITK and Vikram Subramanya, NITK
MQ team, AC team at IBM
IBM, for giving us valuable exposure to industrial research, with some cash coming our way too (!)