All trademarks and registered trademarks are the property of their respective owners. © HelpSystems LLC. All rights reserved.
Stop Feeding IBM i Performance Hogs
How Robot Monitor can help you pen up runaway processes before they cause problems
YOUR HOSTS
Chuck Losinski, Director of Automation Technology, HelpSystems
Kurt Thomas, System Engineer, HelpSystems
Broadcasting live from Eden Prairie, Minnesota, USA and Düsseldorf, Germany
AGENDA
• Resource-hogging scenarios and jobs
• Qualify impact on CPU, memory, and storage
• Identify and address resource hogs automatically
MOST COMMON HOGS
What types are most common?
• CPU hogs
• Main storage hogs
• Temporary storage hogs
All of these impact job run time.
How can you locate them and take action?
• A reactive solution: WRKACTJOB, WRKSYSACT, iNAV
• A proactive solution: Robot Monitor
CHALLENGES
Why are these issues so problematic to an IBM i environment?
Resource hogs:
• Rapid resource consumption
• Need to be found quickly
Resources:
• Limited
• Expensive
Problems:
• User impact
• Demand for more resources
• System failure
UP NEXT...
CPU Hogs
CPU HOGS
What is the impact on your enterprise?
System symptoms:
• CPU is being rapidly consumed
Impact:
• Until the job(s) causing this are identified and the issue resolved, expect:
o Additional resource expenditure
o Impairment of user productivity
o Impact on many systems (POS, web, DB, GUI, credit cards)
The cycle:
• Rapid CPU consumption
• Operators in a race against the clock to identify CPU hogs
• System productivity is maintained only by feeding extra resources
• All of which consumes more resources
• Underlying issue remains until the team runs out of resources or resolves it!
CPU HOGS
QZDASOINIT and QSQSRVR Jobs
What are they?
• DB2 database server jobs
What do they do?
• Let applications read and modify data in DB2 on IBM i
• QZDASOINIT jobs implement ODBC and JDBC
• QSQSRVR jobs implement Server Mode (IBM-flavor ODBC)
Why so problematic?
• The issue is not the QZDASOINIT and QSQSRVR jobs themselves; it is the poorly written SQL code running in them.
CPU HOGS
How to find the maximum number of QZDASOINIT jobs in QUSRWRK
Step 1: Run DSPSBSD SBSD(QUSRWRK)
Step 2: Select Option 10 (Display Prestart Job Entries)
Step 3: Select Option 5 (Display Details on QZDASOINIT)
Result: The display shows the maximum number of jobs and the maximum number of uses
CPU HOGS
How to display Active Prestart Jobs
Step 1: Run DSPACTPJ SBS(QUSRWRK) PGM(QZDASOINIT)
Result: Shows the current number of prestart jobs and how many are still in use
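The configured maximum from the prestart job entry and the current counts from DSPACTPJ can be combined into a simple headroom check. The following is a minimal Python sketch, assuming the three numbers are read off the displays by hand and that the maximum is numeric rather than *NOMAX. The function name, the field names, and the 80% warning level are illustrative assumptions, not part of IBM i or Robot Monitor.

```python
# Hypothetical warning level: flag the entry once this share of the
# configured maximum number of prestart jobs has been started.
WARN_PCT = 80.0


def prestart_headroom(max_jobs: int, current_jobs: int, in_use_jobs: int) -> dict:
    """Summarize prestart job usage against the configured maximum.

    max_jobs     -- "maximum number of jobs" from the prestart job entry
    current_jobs -- current number of prestart jobs (from DSPACTPJ)
    in_use_jobs  -- how many of those jobs are currently in use
    """
    if max_jobs <= 0:
        raise ValueError("max_jobs must be a positive number")
    if not 0 <= in_use_jobs <= current_jobs <= max_jobs:
        raise ValueError("expected 0 <= in_use_jobs <= current_jobs <= max_jobs")

    started_pct = round(100 * current_jobs / max_jobs, 1)
    in_use_pct = round(100 * in_use_jobs / current_jobs, 1) if current_jobs else 0.0
    return {
        "started_pct": started_pct,          # share of the maximum already started
        "in_use_pct": in_use_pct,            # share of started jobs that are busy
        "idle_jobs": current_jobs - in_use_jobs,
        "near_limit": started_pct >= WARN_PCT,
    }
```

For example, `prestart_headroom(200, 50, 40)` describes a pool with a quarter of its maximum started and 10 idle jobs remaining.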
CPU HOGS
How do we find the offending job FAST?
Common example:
• QZDASOINIT jobs
Challenges to identification:
• Jobs all share the same name
• Potentially hundreds of jobs could be contributing to the issue
• The longer this process takes, the more CPU is consumed and the greater
the risks and impact on resources
CPU HOGS
Reactive approach
Step 1: WRKACTJOB followed by manual batch investigation and resolution at
the job level on each LPAR you manage
CPU HOGS
Troubleshooting
Big picture questions:
• Who is running these jobs?
• What proportion of overall CPU are these jobs consuming?
Conclusion:
• Greater insight will provide deeper understanding and context to any issues,
which will result in faster problem resolution.
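The two big-picture questions can be answered from any snapshot of active jobs, however it was captured. The following is a minimal Python sketch, assuming the snapshot is already available as a list of records. The record layout, user names, and CPU figures are made up for illustration and do not come from a real system or from Robot Monitor.

```python
from collections import defaultdict

# Hypothetical snapshot of active jobs: one record per job, with the
# percentage of overall CPU each job is consuming.
SAMPLE_JOBS = [
    {"name": "QZDASOINIT", "current_user": "WEBAPP", "cpu_pct": 30.0},
    {"name": "QZDASOINIT", "current_user": "WEBAPP", "cpu_pct": 12.5},
    {"name": "QZDASOINIT", "current_user": "REPORTS", "cpu_pct": 2.5},
    {"name": "NIGHTLY", "current_user": "BATCH", "cpu_pct": 5.0},
]


def cpu_by_current_user(jobs, job_name):
    """Who is running these jobs? Total CPU per current user, highest first."""
    totals = defaultdict(float)
    for job in jobs:
        if job["name"] == job_name:
            totals[job["current_user"]] += job["cpu_pct"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def total_cpu(jobs, job_name):
    """What proportion of overall CPU do all jobs with this name consume?"""
    return sum(job["cpu_pct"] for job in jobs if job["name"] == job_name)
```

Grouping by current user rather than by job name is the point of the sketch: the jobs all share one name, so the name alone cannot separate the offender from the rest.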
CPU HOGS
Proactive approach
• Step 1: Real-time visibility
• Step 2: Immediate access to jobs for resolution and threshold levels for early detection
• Data definitions: Define by any/all systems + Define by job name
• Thresholds: Custom thresholds and proactive alerts
• Monitored groups: Granular monitoring can extend to subsystem, accounting code, user, current user, job, and function
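The idea of custom thresholds applied to monitored groups can be illustrated generically. The following is a minimal Python sketch of a two-level threshold check. It is not Robot Monitor's implementation or API; the function names, status labels, and threshold values are assumptions chosen for illustration.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Map a measured value to a status using two threshold levels."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def groups_needing_attention(usage_by_group: dict, warning: float, critical: float) -> dict:
    """Return only the monitored groups whose usage crossed a threshold.

    usage_by_group maps a group key (for example a subsystem, user, or
    job name) to its measured CPU percentage.
    """
    statuses = {
        group: classify(value, warning, critical)
        for group, value in usage_by_group.items()
    }
    return {group: status for group, status in statuses.items() if status != "ok"}
```

Calling `groups_needing_attention({"QUSRWRK": 72.0, "QBATCH": 15.0}, warning=50.0, critical=85.0)` would report only the first group, at the warning level, which is the early-detection behavior the slide describes.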
LIVE DEMO
• Identify and manage CPU hogs
UP NEXT...
Main Storage Hogs
MAIN STORAGE HOGS (RAM)
Page faults
System symptoms:
• The number of page faults rapidly increases, a condition known as “thrashing”
Impact:
• Until the job(s) causing this are identified and the issue resolved, expect:
o Number of page faults to rapidly rise
o Jobs to take longer and longer to execute
o Impairment of user productivity
Thrashing cycle:
• Batch job begins to process
• Data loaded/re-loaded into memory
• Interactive job flushes memory records
• Batch job can’t access the records (faulting occurs) and the cycle repeats
MAIN STORAGE HOGS
Reactive approach
Step 1: Access the Work with System Status screen (WRKSYSSTS) to show the number of page faults in each memory pool
MAIN STORAGE HOGS
Troubleshooting
Big picture questions:
• Which jobs are responsible for causing problems in these memory pools?
• Which subsystem(s) are using these memory pools?
Conclusion:
• Greater insight will give more meaningful understanding and context to any issues for faster problem resolution.
MAIN STORAGE HOGS
Proactive approach
Step 1: Real-time visibility. A dedicated NDB bar shows the overall system faults/second.
Step 2: Immediate access to offending jobs for resolution, with threshold levels for early detection.
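A faults-per-second threshold can be sketched in a few lines. The following minimal Python example assumes per-pool fault rates have already been collected, for instance by reading them from the Work with System Status screen. The record layout, pool names, and the threshold value are hypothetical and are not recommendations for any real system.

```python
def pools_over_threshold(pools, max_faults_per_sec):
    """Return (pool, total faults/second) for pools above the threshold.

    Each record carries the database and non-database fault rates for one
    memory pool; the two are added to get the pool's overall fault rate.
    Results are sorted with the worst pool first.
    """
    flagged = []
    for pool in pools:
        total = pool["db_faults"] + pool["non_db_faults"]
        if total > max_faults_per_sec:
            flagged.append((pool["pool"], total))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical readings, in faults per second.
SAMPLE_POOLS = [
    {"pool": "*MACHINE", "db_faults": 0.5, "non_db_faults": 1.5},
    {"pool": "*BASE", "db_faults": 40.0, "non_db_faults": 85.5},
    {"pool": "*INTERACT", "db_faults": 10.0, "non_db_faults": 60.0},
]
```

Once a pool is flagged, the troubleshooting questions still apply: which subsystems use that pool, and which of their jobs are responsible.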
LIVE DEMO
• Identify and manage main storage hogs (thrashing)
UP NEXT...
Disk Hogs
DISK HOGS
Temporary storage
What is temporary storage?
• Storage allocated by the OS in the system ASP (disk). Its contents are cleaned up when an IPL is performed.
Temporary storage? YES!
• Programs loaded into the activation group and the associated variables, heap space (Java as well as other HLL APIs, e.g., malloc, calloc), open data paths, etc.
Temporary storage? NO!
• Objects in QTEMP libraries
DISK HOGS
Temporary storage
System symptoms:
• Overall system DASD is rapidly being consumed as a result of underlying temporary storage being used by a runaway job
Impact:
• Until the job(s) causing this are identified and the issue resolved, expect:
o Additional disk requirement
o Potential system failure if left unchecked
A runaway job consumes temporary storage, which in turn consumes overall DASD.
DISK HOGS
Temporary storage
Common example:
• Java memory leak: a Java job allocates memory to run tasks but never returns it to the OS after finishing them
Challenges to identification:
• Where is storage being used?
• Which jobs are consuming temporary storage?
DISK HOGS
Reactive approach
Step 1: Use the WRKSYSSTS command to determine two values:
• Current temporary storage consumption
• Peak temporary storage consumption
Step 2: Run WRKSYSACT SEQ(*STGNET) to determine which jobs are using the most temporary storage
• Only shows jobs that have had activity since the last collection
• Can also include permanent storage
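Ranking jobs by net storage is simple arithmetic: storage allocated minus storage deallocated over the collection interval. The following minimal Python sketch assumes those two figures are available per job. The record layout, job identifiers, and numbers are made up for illustration, and the sketch does not reproduce the actual WRKSYSACT output format.

```python
def top_net_storage(jobs, limit=5):
    """Rank jobs by net storage (allocated minus deallocated), largest first.

    A job that keeps allocating without releasing, such as one with a
    memory leak, shows a large positive net value.
    """
    ranked = sorted(
        jobs,
        key=lambda job: job["allocated"] - job["deallocated"],
        reverse=True,
    )
    return [(job["job"], job["allocated"] - job["deallocated"]) for job in ranked[:limit]]


# Hypothetical figures for one collection interval, in pages.
SAMPLE_STORAGE = [
    {"job": "111111/QUSER/QZDASOINIT", "allocated": 5000, "deallocated": 4900},
    {"job": "222222/JAVAUSR/QJVACMDSRV", "allocated": 90000, "deallocated": 1000},
    {"job": "333333/BATCH/NIGHTLY", "allocated": 20000, "deallocated": 20000},
]
```

Jobs with no activity in the interval would not appear in the input at all, which matches the limitation noted for the reactive approach.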
DISK HOGS
Proactive approach
Step 1:
• Real-time visibility of the temporary storage being used by each subsystem
Step 2:
• Immediate access to offending jobs for resolution
• Threshold levels for early detection
LIVE DEMO
• Identify and manage temporary storage hogs
RECAP and Q&A
• Resource-hogging scenarios and jobs
– How to identify them
• Qualify impact on CPU, memory, and storage
– How this will affect users and applications
• Identify and address resource hogs automatically
– Thresholds and notification
Integrating Robot Monitor into your infrastructure
Upcoming Performance Monitoring Webinars
• SQL-Based Monitoring
– September 7
• Real-Time Disk Monitoring
– September 28
Thank you for attending!
Contact Information
Website:
www.helpsystems.com/robot
Telephone:
800-328-1000 sales
+1 952-933-0609 support
Presenters: