41
Db2 V12 Migration @KBC Gilbert Sieben

Db2 V12 Migration @KBC - GSE Young Professionals V12 Migration @KBC.pdf · • Db2 tries to store the V0 OBDREC info in a system page (new V12 feature and retrofitted to V11) and

  • Upload
    others

  • View
    15

  • Download
    1

Embed Size (px)

Citation preview

Db2 V12 Migration @KBCGilbert Sieben

Internal

Agenda

Company 2

1. Time line

2. Premigration checks

3. Migration to V12

4. Measurements

5. New Features

6. Lessons learned

Internal

Time line

Company 3

• Project of 1 year, 300 Mandays, described in a Project Charter, monthly reporting to steering committee.

• Migration in phases (starting from V11 FFM)

• TREL Nov 2017 : Db2 tools

• TREL Jan 2018 : early code

• TREL May 2018 : V12R1M100 : 1/3 Dev, 15/3 Acpt, 5/5 Prod

• TREL June 2018 : V12R1M500 : 17/5, 3/6 Acpt, 16/6 Prod

• Sandbox3 : to M100 and M500 : todo

1.

Internal

Time line : Db2s to migrate

Company 4

# subsyst. 2-way DS 4-way DS

Sandbox 4 2

DEV 14 2

ACPT 14 4 4

PROD 14 4 4

1.

Internal

Premigration checks

Company 5

• Cfr. Installation & Migration Guide for checklist

• Release incompatibilities : simple TS, explain tables in old format, Packages that access the catalog

• BIND/REBIND Packages/Plans ?

• DBAs were involved

• They got output of premigration checks

• They schedule (re)binds in applicative releases

• All products/tools (>40) that use/access Db2

• were checked for V12 tolerance/compatibility

• were upgraded if needed at TRELs

• Remark : >6 months elapsed, a lot of effort !

2.

Internal

Challenges

Company 6

• Planned outage because of CATMAINT job ?

• In V11 we did, both at CM and FFM

• Window between 01:00 and 03:00 including migration and testing by DBA’s and Business

• In V12 we didn’t

• 20 sec for M100 for our largest catalog

• But… (to be continued…)

3.

Internal

Migration to M100 on DEV

Company 7

• DB2 dumps when executing query in V12R1M100 : Usermod (applied immediatly), APARFIX (with M500), UI55366

• Query : select count(1) from (select NAME,(casewhen EXISTS(select 1 from sysibm.sysdummy1) then '1' else '2' end) from sysibm.systables where NAME = 'X')

• DBM1 creates a 0C4‐00000004 : UI55990

4.1.

Internal

Migration to M100 on ACPT

Company 8

• During migration Catmaint V12R1M100 fails with REASON 00C90101 ERQUAL 5020 (OMG !)

• PMR opened : looks like PI93894 : bad value in the SCA record for table space DSNDB08.D8TS1 before and after Db2 start.

• Fallback to V11 for 2 non-data sharing Db2s (scenarios were in place and tested before !). We didn’t migrate the 3 others.

• Solved via usermod 10 days later : migrate to M100 on non-data sharing Db2s (IBM Dev worked around the clock to deliver AI94553)

• UI55070

• Remark :

• -TERM UTIL needed if catmaint is not restarted (not documented)

• Be sure all utilities are terminated in V11 (no V11 UTIL can be terminated under V12 so fallback to V11 needed)

4.2.

Internal

After migration to M100 on ACPT

Company 9

• Issues with UR :

• UR ? : “We always use UR, also for reading in our programs, to have a minimal impact to online workload regarding locks.”

• Impacted : all UNLOAD utilities with SHRLEVEL CHANGE ISOLATION UR

• Not all records are read (= records altered by 1 subsystem of a Data Sharing group and thus registeredin the Coupling Facility)

• Workaround : mass update UNLOAD jobs with SHRLEVEL CHANGE ISOLATION UR REGISTER YES

• 1100 jobs in PROD

• APAR PI99075 / UI57005

4.3

.

Internal

After migration to M100 on ACPT

Company 10

• S0C1 dump when executing query in V12R1M100 : workaround : add ‘rtrim’ function to query + alter hidden zparm. APARFIX PI95129 applied. Afterwards UI55397

• DIST address space dumps after migrating to V12R1M100 : CASE expression : UI53834 + alter hidden zparm

• DB2 supplied java stored procedures address space loops : workaround : set the NUMTCB to 1

• Resource unavailable for EDM DYNAMIC STATEMENT SPACE

• dynamic statement cache cleanup mechanism is not working anymore

• Increased size to 4G but not sufficient

• workaround: disable the dynamic statement cache (zparm ACPT warehouse)

• Solution : PI91054 / UI57563

4.3.

Internal

Migration to M100 on PROD

Company 11

• Executed on Saturday evening

• Sunday morning : issues with impact during 3 hours

• PMR Sev 1 opened

• lock requests (the vast majority) are issued for Space Map pages

• Db2 tries to store the V0 OBDREC info in a system page (new V12 feature and retrofitted to V11) and that the page set partition does not have enough space to store this info.

• online reorgs of the UTS PBG TS during the morning solved the issue

4.4.

Internal

Migration to M100 on PROD

Company 12

• Monday morning : issues with P-locks, severe impact !

• PMR Sev 1 opened

• 3 online reorgs of the PBG UTS TS during the morning to solve the issue

• 46 online reorgs during the evening + testing by Business

• V12 : UI57257 (TREL1811)

4.4.

Internal

Migration to M100 on PROD

Company 13

• Page p-locks in batch jobs

• Very high impact

• Cause : overflow rows

• Bypass : reorg the involved TS and ALTER TS PCTFREE FOR UPDATE (to reserve space)

• Solution :

• usermod (TREL1806)

• APARfix (TREL1811)

4.4.

Internal

Migration to M100 on PROD

Company 14

• Excessive getpages after initialisation of FTB.

• Very high impact

• Cause : ?

• Not reproducable

• re-enabling FTB is not an option

• Bypass : turn of FTB for this particular subsystem

• Solution :

• UI53014 for other FTB issue

4.4.

Internal

Migration to M100 on PROD

Company 15

• Data inconsistency : 00C90101 : data of 1 index is corrupt

• High impact

• Cause : regression on spacemap page

• Bypass : rebuild IX

• Solution : UI57598 (TREL1811)

4.4.

Internal

Migration to M100 on PROD

Company 16

• Select in batch gets ‐811 but there are no duplicates

• High impact

• Cause : wrong leaf page when FTB enabled

• Bypass : disable FTB (was not an option for us)

• Solution : UI53014 (TREL1806)

4.4.

Internal

Migration to M500 on DEV

Company 17

• No issues

4.5.

Internal

Migration to M500 on ACPT

Company 18

• All load utilities on PBG UTS that result in an extra partition : resource unavailable with resource type 00000000 and no name filled in

• Also called ‘DBD puffing'

• First work around : ALTER TABLE...ADD PARTITION (but generates unneccesary partitions)

• Second workaround : ALTER TS PRIQTY

• Our solution : ALTER ALL PBG and PBR UTS TS SEQ QTY

• 57.000 TS on PROD !

• UI56192

4.6.

Internal

Migration to M500 on PROD

Company 19

• Issues with medium impact :

• problem with reorg job

• Hung remote thread consumes a lot of cpu

• runstats ends with 04E-00C90101

• Db2 12 message DSNI011I PAGE TEMPORARILY UNAVAILABLE

• DBM1 creates a 0C4‐00000004

4.7.

Internal

Problem solving

Company 20

• Follow-on PMRs : each TREL we roll out the most recent maintenance

• On the planning : ‘customer profiling’

• Sending Catalog/Directory and certain queries to IBM

• A workshop with experts from IBM and KBC to explain ourenvironment

• We asked IBM if it is possible to collect Db2 usage info and share that (so, if you get that question, please respond)

4.8.

Internal

Measurements : expectations

Company 21

• System and OLTP :

- 2-3% CPU reduction without Index In-Memory feature

- 5-10% CPU reduction using Index In-Memory feature

• Query performance :

- 0-20% without changing access path

- 10-40% with new access path

• A baseline measurement is done upfront

5.

Measurements : resultso V12R1M100 (TREL1805 on 5/5) and V12R1M500 (TREL1806 on

16/6)

• We didn’t see any effect on average TRX CPU usage

o 26/5, 23/6 and 25/8 : RAC on PROD :

o more effect on average TRX CPU usage

o A very hard interpretation of the numbers because of a lot of issues after M100 and all necessary workarounds and (temporary) maintenance

o 18/08 : TREL1808 on PROD

o Extra maintenance because Performance issues

o Dynamic Statement Cache issues (went to 4Gig and was not cleared)

Measurements : results

Measurements : results

Measurements : Total CPU

Measurements : zIIp

Measurements : DBM1 zIIp

Measurements : real memory

Measurements : Invalid SPROCs

Measurements : RID pool

Measurements : RID pool

o = internal Record Identifier

o Expected increase of 40% because of extension from 5 to 7 bytes.

o Only visible after M500

o With TREL1808 ZPARM MAXRBLK on PROD increased (with60% on PROD)

Measurements : RID pool

Measurements : Internal latchcontention relief

Latch contention classes : in TYPE 100 records

• LC14 buffer manager latch : QVLSLC14

• LC19 log latch : QVLSLC19 : removed ??

• LC23 page latch timer : QVLSLC23

• LC24 EDM latch : QVLSLC24

Internal

LC19 log latch

After M100 : still (but less) LC19 (so not totally ‘removed’)

InternalLC14 buffer manager latch

35

Remark. : decrease after M100 on May 5

Remark : increase LC14 buffer manager latch : QVLSLC14 since M500 ! (no explanation yet)

InternalLC23 page latch timer

InternalLC24 EDM latch

Remark : strong decrease after M100 on May 5 = normal because EDM pool is on other location and therefore handled by Storage Manager

Internal

New features :

Company 38

• We start by using defaults and out-of-the box coding

• New features are subject of study (not in project scope !)

• In-memory bufferpools (PGSTEAL(NONE) option of ALTER BP)

• In-memory index optimization using index Fast Traverse Block (FTB)

• Application related features (study by DBAs):

• SQL performance enhancements

• Implementation only at TREL or RAC

• Still an open question : how to deal with Continuous Delivery

6.

Internal

Summary

Company 39

• Hardest upgrade ever for (just the four of) us

• Had to open 4 SEV1, 12 SEV2 and 7 SEV3 PMRs for a lot of new issues

• We had to convince all management levels to proceed theupgrade to M500

• We were backed up by the management of our department

• A lot of (technical) explaining to do, even to our business

• Great cooperation between system engineers and DBAs, theinvolved development teams, the colleagues of our department

• Good cooperation with IBM

• Issues picked up immediatly

• Got an advocate (Bart) assigned

• Conf call and meeting with Jeff Josten + follow up of status of PMRs

7

Internal

Lessons learned:

Company 40

• Have fallback scenarios ready and tested

• Be current on maintenance

• Allthough some PTF which caused problems were already retrofitted in V11 (would have popped up during our half year maintenance)

• Have a good relationship with ‘advocate Bart’

8.

Questions?

41Company