40
DB2 10 Capturing Tuning and Trending for SQL Workloads - a resource and cost saving approach Roy Boxwell SOFTWARE ENGINEERING GmbH Session Code: V05 15.10.2013, 11:30 – 12:30 | Platform: DB2 z/OS

DB2 10 Capturing Tuning and Trending for SQL Workloads - a

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

DB2 10 Capturing Tuning and Trending for

SQL Workloads - a resource and cost saving approach Roy Boxwell SOFTWARE ENGINEERING GmbH

Session Code: V05 15.10.2013, 11:30 – 12:30 | Platform: DB2 z/OS

Click to edit Master title style

Agenda

1. DB2 10 technology used by SQL WorkloadExpert (WLX)

2. The three components of WLX a) The WLX-Started Task (assembler-written)

b) Workload processing engine

c) GUI front end for Eclipse

3. Real life example: a) Access Path Protection

b) Fast problem detection

c) Useful advice rule system

d) KPI trending

4. Quick solution creation and control

5. Selected use cases

2

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

• How many resources do you spend on capturing DB2 SQL workload and its metrics?

• There seems to be out-of-the-box metrics delivered by DB2, but does it give me all the data I need, when I need it?

• How does the smarter database, how does DB2 10 for z/OS deal with it?...

3

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

• DB2 10 Monitoring Enhancements and Changes: • Statement Level Statistics

• Enhanced messages and traces to capture statement level information

• Statement information in real-time

• STMT_ID – unique statement identifier assigned when statement first inserted into DSC

• Statement type – static or dynamic

• Bind TS – 10 byte TS when stmt was bound, or prepared

• Statement level execution statistics (per execution)

• New Monitor class 29 for statement detail level monitoring

• Monitor Class 29 (overhead is ~1-3%)

• New for statement level detail

4

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

What‘s exactly new: • IFCID 316 was enhanced to externalize the data from the Dynamic

Statement Cache (DSC) when a flushing situation occurs (LRU, RUNSTATs, ALTER, DROP, REVOKE, …) – NO DATA LOSS

• New IFCIDs 400* and 401 additionally EDM pool data – let’s call it the Static Statement Cache

• Memory resident storage of static SQL statements

• Like with the enhanced 316, data is externalized when the EDM pool is full. – NO DATA LOSS

*This IFCID is not really an IFCID but more „switch“ to enable externalization of static SQL metrics

5

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

DSC and EDM provide detailed workload insights:

• SQL text

• Statement ID

• Date/time

• Current status

• Resource consumption

6

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

DB2 10 also provides some additional information from the DSC trace we all know today:

• Wait time accumulation for • Latch requests

• Page latches

• Drain locks

• Drains during waits for claims to be released

• Log writers

7

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

• Date and time in store clock format for Stmt insertion and update (along with internal format)

• Number of times that • a RID list overflowed because of

• storage shortage

• # of RIDs exceeded internal limit(s)

• a RID list append for a hybrid join interrupted

• because of RID pool storage shortage

• # of RIDs exceeded internal limit(s)

• a RID list retrieval failed for multiple IX access. The result of IX AND/OR-ing could not be determined

8

Click to edit Master title style

DB2 10 technology used by SQL WorkloadExpert (WLX)

9

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

10

The WLX-STC(assembler-written)

GUI front end for Eclipse

Workload processing engine

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

The WLX STC:

We run a started task 24x7 to catch all the IFCIDs that DB2 will be throwing and store the data.

Workload processing engine:

We externalize and process the data, such as every 60 min: • customizable (e.g. 30 - 180 minutes)

• allow Ad hoc data refresh triggered via operator command for the started task (MODIFY)

• Capture the SQL Text at trace time

11

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

GUI front end for Eclipse:

We exploit and integrate into Eclipse based GUI front ends • The GUI comes as a Plug-in for

• IBM Rational

• IBM Data Studio

• Eclipse native

• Existing DB2 connections are used to connect to the mainframe

• Interactive dialogs allow complex and powerful analysis

• Export features create PDF reports and allow MS Excel hand over

• Additional plug-ins interface with other SOFTWARE ENGINEERING tools, such as !SQL PerformanceExpert (SPX)

12

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX) • First Topic

• Second Topic • Sub-Topic A

• Secondary Sub-Topic 1

• Secondary Sub-Topic 2

• Sub-Topic B

• Sub-Topic C

• Third Topic

• Fourth Topic

13

Workoad KPIs – left hand side

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

14

Workoad KPIs – right hand side

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

Spider diagram of three application extracts

15

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

Example of our Eclipse embedded SPX plug-in

16

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

SPX rule violations

17

Click to edit Master title style

The three components of SQL WorkloadExpert (WLX)

We enhanced our existing SQL Performance Management tools to interface to the DB2 out-of-the-box data.

Resulting benefits:

• See any executed SQL in a plex-wide report

• Workload/performance warehouse repository that contains *all* executed SQL

• Powerful history and trending analysis

All of this is now available with the smallest overhead ever possible!

18

Click to edit Master title style

Real life example …

19

… Quick solution creation and control

Click to edit Master title style

July Problem…

• Thursday night – Production staging of numerous packages

• Friday – Thread time-outs, deadlocks, bad news! • First thought: Must be caused by a bad package – All packages checked

for bad access paths and everything found was OK

• Second thought: Open Priority two ticket at IBM in case it is a DB2 problem

• JAVA trace showed a long running SQL appearing often

• REORG with inline RUNSTATs the biggest tables used in that SQL

• Reduce number of available servers to stop problem getting worse (internal throttling of transactions)

• Full panic mode now enabled

• Saturday – Call in senior DBA from vacation

20

Click to edit Master title style

July Problem…

• Saturday Morning 09:00 – DBA uses WLX to compare SQL workload from Thursday with Friday – Sees bad guy instantly

• 09:10 – DBA uses BIX to confirm that an access path change has caused the problem – Nothing to do with staging, Nothing to do with the large tables

• 09:15 – DBA creates a new “virtual” index using SPX and re-tests – Access path switches back to old correct method

• 09:30 – DBA creates a new index, RUNSTATS it, everything is fixed and the systems are running sweetly again

• 09:45 – DBA goes back on vacation

21

Click to edit Master title style

July Problem…

• Monday Morning – Investigation by DBA group starts

• 12:00 - DBA group finds that a badly timed RUNSTATs on Thursday night caused the access path change. New index is OK and in fact the old index can now be dropped

• 15:30 – Report written for CIO, Problem closed at IBM

22

Click to edit Master title style

Selected use cases

GUI features – button overview

Example use case drop down box

23

Click to edit Master title style

Selected use cases

24

Example of application workload and SQL text drill down

Click to edit Master title style

Selected use cases

25

Compare view: Select any two SQLs to generate graphs

Click to edit Master title style

Selected use cases

26

Report generation dialog and selection

Click to edit Master title style

Selected use cases

27

Output of the selected reporting

Click to edit Master title style

Selected use cases

Here we have found our own bad guy! STOGROUP SQL

28

Click to edit Master title style

Selected use cases Now we need to see what it is doing…

Aha! This looks like a great candidate for LEFT OUTER JOIN processing (Already in our next RTDX PTF by the way!)

29

Click to edit Master title style

Selected use cases Application Usage figures

Adjusted data

30

Click to edit Master title style

Selected use cases Lots of executions for the *same* SQL going on here…

Why so often? Discussed with development and find it is a „design“ problem… The query could be run earlier and then only a few times a day instead of millions!

31

Click to edit Master title style

Selected use cases Often run BAD SQL

This workload splits into two SQLs

Which have this SQL: Six UNIONs… DBA rewrote down to one SELECT and IN usage.

32

Click to edit Master title style

Selected use cases

■ Use Case 1: Application Workload Analysis

■ Use Case 2: Workload-Change Problem-Detection and Trending

■ Use Case 3: Object Quiet Times for Maintenance (REORG)

■ Use Case 4: AUDIT (Who did What Where and When)

■ Use Case 5: Never Used Objects (Tables, Indexes, and Tablespaces)

■ Use Case 6: Never Executed SQL (Plans, Collections, Packages, and Statements)

■ V 2.0 ■ FTD ■ outside standard delivery

33

Click to edit Master title style

■ Use Case 7: Forecasting of possible performance improvements in dynamic SQL by exchanging literals with parameter markers

■ Use Case 8: Disc Problem Detection – I/O Rates

■ Use Case 9: Bufferpool Analysis – Hit Ratios, VPSEQT-Tuning

■ Use Case 10: Multi-Row Fetch Candidate Detection

■ Use Case 11: SQL KPIs – Background Noise and Exceptions

■ Use Case 12: SELECT Only Table Detection

■ Use Case 13: Delay Detection

■ V 2.0 ■ FTD ■ outside standard delivery

34

Selected use cases

Click to edit Master title style

■ Use Case 14: Deadlock and Timeout, Lock Escalation, Index Page Splits, and BIF Usage

■ Use Case 15: Multi-Snap

■ Use Case 16: SPX (!SQL PerformanceExpert) Link

■ Use Case 17: REORG Detector and Suppressor

■ Use Case 18: Eager vs. Lazy Loader Detection (JPA-Java Persistence API)

■ Use Case 19: Object Usage of Application Including Service Naming by Object

■ Use Case 20: Offline Performance Database

■ V 2.0 ■ FTD ■ outside standard delivery

35

Selected use cases

Click to edit Master title style

■ Use Case 21: Up and Down Scaling SQL Workloads

■ Use Case 22: Same SQL on Multiple Schemas

■ Use Case 23: DSC Flush Analysis

■ Use Case 24: SQL Text Analysis

■ Use Case 25: Application Object Cross-Reference Builder

■ Use Case 26: CLUSTER Index Detection

■ Use Case 27: Object to SQL Statement Cross-Reference

■ Use Case 28: CPU Intensive Statements

■ Use Case 29: Index Maintenance Costs

■ V 2.0 ■ FTD ■ outside standard delivery

36

Selected use cases

Click to edit Master title style

Selected use cases

37

Calendar date/time for from <-> to index analysis

Click to edit Master title style

Selected use cases

38

Result: Green: resource reduction Red: resource increase

Click to edit Master title style

Appendix

• Problem list : • UK70891 – Reset stats when STOP/START MONITOR TRACE

• UK72630 – Incorrect when exec same statement from different threads

• UK73219 – Strange counters - HIPER

• UK73903 – Storage leak leading to abend - HIPER

• UK78414 – Storage overlay leading to abend – HIPER

• UK80337 – Storage overlay leading to abend – HIPER

• UK81878 – Storage leak

• UK83219 – Strange numbers – HIPER

• UK90073 – Difference between IFCID 401 and IFCID 58

• UK93065 – SOS – HIPER

• UK94511 – Fields TB and IX sometimes wrong

• UK95683 – Storage leak leading to abend – HIPER

• UK96376 – DB2 11 Abend with IFCID 400 or 401

• UK97197 – Improve accuracy of IFCID 316 and 401

• Timings: • Externalizing every 5 mins for 24 hours cost 300 secs cpu (No EXPLAIN of course in this scenario!)

39

Roy Boxwell SOFTWARE ENGINEERING GmbH [email protected]

Session V05 DB2 10 Capturing Tuning and Trending for SQL Workloads - a resource and cost saving approach

40