ZSeries Scalability and Availability improvements Latest hardware and software advancements Fri Sep 26, 2014: 10:15-11:30 Track 2 Speaker: Donald Zeunert

zSeries Scalability and Availability improvements

Latest hardware and software advancements

Fri Sep 26, 2014: 10:15-11:30

Track 2

Speaker: Donald Zeunert (BMC Software)

mailto:[email protected]

https://www.linkedin.com/groups?home=&gid=1741877&trk=anet_ug_hm

Abstract / Summary

• Recent IBM system and subsystem scalability improvements allow significant consolidation.– This save CPU and Memory resources– Avoids creating more server instances to manage

• Bottlenecks - Faster CPUs mean everything else relatively slower– Need to look at what are the bottlenecks and what did we ignore

when it was released

Areas of improvements

• Performance / Scalability– Hardware, Microcode, PR/SM– Software – z/OS, CICS, DB2

• Availability–Software / hardware

IT Budgets – Save $

• Need to look at technology impact on Software License Costs?

• New hardware or software costs money– Does it save

me any?

Relative speed of devicesCPU faster everything else relatively

slower• RAM is now slow

– > CHIP Cache on z196, EC12

• Paging devices are slower – EC12 Flash memory

Topic 1 – Hardware / Microcode Performance

• zEC12 Benefits– Workloads may use fewer MSUs– Warning track– Other exclusives

• CF Sharing - Thin interrupts• Hiperdispatch

– Why do we need ?– Parked engines

z10 vs zEC12 Chip Cache• Each

generation CPs getting– more levels

of chip cache

– More MB at each level

• Chip cache hits = fewer CPU secs

RNI = Ratio of Chip / local sourcing to total sourcing (Local + Remote)

zEC12

Disclaimer – “Your Mileage may vary”

Migrating from gasoline Exotic to electric supercar will improve MPG

But what if you also have an 18-wheeler

zPCR Workload RNI Characteristics

zEC12 – Workloads may use fewer MSUs

– Onlines may get better response, use significantly less CPU secs and MSUs. But does this help my hardware and software bills?• Yes, if onlines are driving your box capacity and 4HRA

– Step 1 – Run or find your last Sub-Capacity Reporting Tool (SCRT) report and determine what hour / shift is the peak(s).

– Step 2 – Determine 4HRA contributors % Batch, CICS, etc.

– Step 3 – Look at zPCR chart for RNI (High?) of your workload

– Step 4 – If workload has high RNI run zCPR model upgrade to zEC2 using your Hardware Capacity SMF 113s Cache stats

RNI by Hour correlated with 4HRA

High RNIat peak

Day shift OLTP High RNI –matches 4HRA

Night shift – 4HRA lower RNI, not as CHIP cache constrained

3:00 AM 1:00PMRNI MSU

Peak 4HRA

zEC12 – Warning Track InterruptMore consistent response time

Old Way New Way - Warning Track InteruptPR/SM CP Stealing

z/OS is not directly involved in the undispatching of a logical processor from physical CP. LCP saved status info is stored off z/OS

EC12 presents z/OS "Warning Track" interrupt when it wants to undispatch a logical processor.This gives z/OS a grace period to undispatch (save status) the workunit.

Pro If undispatched in grace period then z/OS can dispatch it on another logical processor.

Con Any pending work on that LCP would remain undispatched until that LCP is put back on any PCP

If grace period expires before z/OS returns logical CP to PR/SM, PR/SM undispatches the logical CP. Then it works like before.

Warning Track - % Successful and Rate

WTI Metrics are in SMF70s, support to collect in;CMF PTFs BQM0868 (5.8) or BQM0867 (5.9) or RMF APAR OA37803

• High rate of WTIs / second – (typical low weight / overcommited)• Logical removed from Physical successfully > 98% of the time

• making work eligible to be re-dispatched on different logical.• Provides more consistent OLTP response time

zEC12 – Other exclusives

• Storage Class Memory (SCM) – AKA Flash Express– PCIe - 1.4 TB of memory per mirrored card pair

• PR/SM Absolute Capping– Expressed in terms of 1/100ths of a processor (0.01 to 255.0)

• Smoother capping• 4HRA Max Spike(s)

• Data Compression Express (zEDC)– (PCIe) device – Offload software compression

• SMF Logstreams– Get Softinflate (OA41156 ) if machines w/o zEDC

• Numerous other exploiters

CF Thin interrupts – Improved Service times

Type CF Polling Dynamic CF Dispatching Coupling Thin Interrupts

Parm DYNDISP=NO DYNDISP=YES DYNDISP=THIN

Dispatch algorithm

LPAR Time Slicing CF Time based algorithm for CF engine sharing

Event driven Dispatching - CF releases engine if no work left.

Pros

Best performance for dedicated CF, not suitable for shared CF

More effective engine sharing than polling loop. CF now managing slice.

Improved Sync Service times.

Thin interrupt to dispatch processor when new work arrives

Cons

Holds CF processor yields very poor / erratic Sync Service times

Uses whole slice even if no work, can not be interupted / stolen

Requires z/OS 1.12 or higher w/ PTFs and CFCC

microcode level 19

High Sync times = GCP Spin loop = wasted MSUs = contribute 4HRA

Logical / Physical - Guaranteed

SJS* LPAR weight < 10% of 16 CPs so guaranteed < 1.6 CPs, actually max is 2 -> 3These LPARs using about twice guarantee

Using White Space / more than guaranteed share. Impacting 4HRA?

• %MSU is % of 1 CP so work runs 100-%MSU slower than if full CP– VMR CPs are 64%

slower than full due to over commit

– Unless using HiperDispatch with parked CPs

– Warning track helps• If not over commit can’t

exceed # of CPs to LPAR– How much white

space do you need?

Guaranteed Share calculations

Name LP CT Wgt Rel Shr Guar CPs %MSU Over commitVMR 4 500 9.01% 1.44 36% 178%SYSP 3 500 9.01% 1.44 48% 108%VM4 2 500 9.01% 1.44 72% 39%VM5 2 500 9.01% 1.44 72% 39%VM9 2 500 9.01% 1.44 72% 39%DB2A 3 400 7.21% 1.15 38% 160%DB2B 3 400 7.21% 1.15 38% 160%SYSN 3 400 7.21% 1.15 38% 160%SJSE 3 300 5.41% 0.86 29% 247%

SYSM 3 300 5.41% 0.86 29% 247%SJSB 2 300 5.41% 0.86 43% 131%SJSC 3 250 4.50% 0.72 24% 316%SJSD 3 250 4.50% 0.72 24% 316%ESAJ 3 200 3.60% 0.58 19% 420%IMSA 3 150 2.70% 0.43 14% 594%SJSH 2 50 0.90% 0.14 7% 1288%IOC2 1 50 0.90% 0.14 14% 594%Total 5550

HiperDispatch – view of over commit

0.86

1.15

0.72

Guaranteed

Lots of Parked Med/Low = slower effective MSUs and overhead

Performance / Scalabilityimprovements in Software

How does IBM charge for sub-capacity?

• IBM MLC costs 30% of IT budget

• Reducing 4HRA MSUs lowers bills– Paid on peak LPAR

where product(s) run

– Less GCP MSUs• zIIP

– LPAR consolidation

4448 MSUs - CEC utilization of a LPAR w/ DB2 licensed

199 MSUs – DB2 Service class CPU(not complete DB2)But not DB2 peak

Sub-Cap Reporting Tool (SCRT)

DB2 Software Performance / Scalability

• DB2 v10 / 11 – ( $/ MSU higher, but fewer MSUs?) – Max # concurrent threads, – zIIP offload sequential prefetch (batch)

• Watch for zIIP overcommit / zIIP on GCP– IDAA – offload and WLM goal awareness– V11 DDF Enclave classification enhancements

• Also requires z/OS 2.1 WLM• Package Name: 128 characters (instead of 8)• Procedure Name: 128 characters (instead of 18)

DB2 V10 – Seq prefetch zIIP overcommit

• DB2 V10 Sequential prefetch zIIP eligibility can overcommit zIIPs

• Monitor zIIP > 50% potential for GCP

• Monitor DB2 WLM Service classes w/ SMF72s or DBM1 jobs w/ SMF30s for zIIP eligible on GCP

DB2 V10 – 10x more Threads per SSID

• Higher concurrent threads = fewer SSIDs / LPAR

Software Performance / Scalability

• CICS – – TOR CPU Starvation– Threadsafe vs QR, fewer regions = less function shipping

= Save CPU & memory• Convert via 80/20 rule, and whatever is left on QR runs

faster• Target with CICS stats - #context switches, CPU secs

CICS Scalability - L8 running threadsafe

• Non-threadsafe program calls DB2, CICS switches from the QR TCB to an open TCB, and back to after DB2 request

• Threadsafe Program stays on Open TCB until non-threadsafe CICS API call

• Force to OTE / L8 w/o OPENAPI TRUE call– CONCURRENCY(THREADS

AFE) and API=OPENAPIOriginally DB2 was only OPENAPI TRUENow MQ, CICS/DLI, native sockets

Lower QR TCB CPU

• Old Circumventions: Move TORs to a service class with higher importance than AORs– Option 1 - : Exempt all regions from being

managed by response time goals and classify TORs to a service class with higher importance than AORs.

• Disadvantages: • Loose WLM Server management of variable

dispatching priorities, memory protection, etc.• No response time data available in Service class

reports

– Option 2 - Exempt only AORs and move them to a service class with lower importance than the CICS service classes with response time goals.

• Disadvantage: • WLM Server Mgmt lost• WLM SrvCls BTE reports for highest CPU

consumption eliminated.

• Problem from TORs and AORs running at the same dispatch priority.– AORs heavily

consumes CPU.– TORs need to wait

too long to receive work and return results to the caller

• Especially when large # of transactions enter AORs w/o TOR

CICS TOR CPU Starvation / responsiveness

26

• TORs managed towards the goals of the STC’s service class– WLM ensures bookkeeping of transaction

completions for CICS response time for service class

– CICS transactions are managed towards CICS response time goals and the

• AORs Response time goal management unchanged

• Use NEW WLM SrvCls classification option “BOTH” for TORs

• Define a STC service class for TORs which has a higher importance than the AORs response time goals

• Retain “Manage Regions by Goals of Transaction” for AORs.

CICS – Work Manager / Consumer Model

Allows CICS to use similar model to WAS with DB2 / DDF.WLM classification both from APAR OA35428

27

Software Performance / Scalability

• IMS 13 - 117K trans / second , – 4x Max # PSTs, – IMS Connect WLM routing,

• z/OS – – CF "fair queueing" algorithm for queued CF requests

• Selection FIFO prior to APAR (OA41203) z/OS 1.12+• Burst to single structure could negatively affect all structures. • Now allows low volume high priority signaling to get processed

– SMF logstreams, SMF Compress (zEDC), – VSAM RLS based User Catalogs (enqueues < CPU)

• IP - RoCE (Remote Direct Memory Access over Converged Ethernet)

z/OS 2.1 VSAM RLS User Catalog Benchmarks

Non-VSAM Delete PerformanceNonRLS RLS RLS Improve %

Elapsed Min 80.42 8.42 89.51CPU Sec 1269.3 298.7 76.46

CPU measured in the CATALOG, GRS, SMSVSAM, and XCFAS address spaces. Source:Terri Menendez of IBM RLS/VSAM/Catalog R&D

Non-VSAM Define Performance NonRLS RLS RLS Improve %

Elapsed Min 48.84 21.42 56.13CPU Sec 685.6 130.8 76.46

• Help batch elapsed time when high # of temporary cataloged datasets– Reduce batch

window squeeze– Possibly reduce

4HRA MSUs by moving peak

Monitor VSAM RLS for User Catalogs

• Monitor z/OS Health checkers for RLS– Enable RLS HCs– 8 RLS Msgs that go

to MVS console– CF IGWLOCK00

• Monitor RLS – See SHARE RLS

Performance sessions for what to monitor

• True / False contention

• DSN false invalidatesRLS stats in SMF 42s, BP in Subtype 19

Availability improvementsfrom Hardware / Software

Topics - Availability

• z/OS – – V2.1 Serial Coupling Facility structure rebuild

• Processing, re-designed to help improve performance and availability by rebuilding coupling facility structures more quickly and in priority order.

– Previously all structures rebuilt in parallel causing contention– Now critical systems structures recovered 1st and other

structures optionally prioritized by policy– zAware – Analytics – Detects anomalies avoid outages

• MQ ShrQ CF full avoidance – – EC12 Storage Class Memory (SCM) (AKA- Flash Memory)

• May hurt performance as slower than CF memory

Questions

SHARE Sessions with more details

• 15841: EWCP: Project Open and IBM ATS Hot Topics – Monday, August 4, 2014: 1:30 PM-2:30 PM Room 303– Speakers: Kathy Walsh(IBM Corporation)

• 15806: IBM zEnterprise EC12 and BC12 Update – Tuesday, August 5, 2014: 10:00 AM-11:00 AM Room 310 – Speaker: Harv Emery(IBM Corporation)

• 15105: z/OS Parallel Sysplex z/OS 2.1 Update (Annaheim)– https://share.confex.com/share/122/webprogram/Session15105.html

• 14142: Unclog Your Systems with z/OS 2.1 – Something New and Exciting for Catalog (VSAM RLS UCATs) (Boston)– http://

proceedings.share.org/conference/abstract.cfm?abstract_id=26822

https://share.confex.com/share/123/webprogram/Person4583.html



https://share.confex.com/share/122/webprogram/Session15105.html



http://proceedings.share.org/conference/abstract.cfm?abstract_id=26822

http://proceedings.share.org/conference/abstract.cfm?abstract_id=26822

谢谢

Backup Slides

z/OS 1.12 – Servers w/ no active enclaves

• Controlled by new IEAOPT Parameter• ManageNonEnclaveWork = {No |Yes}

– No: (default) Non enclave work is managed based on the most important enclave.• Doesn’t work well when no active enclaves and other work to

do• Example; Significant work unrelated to an enclave:

– Garbage collection for a JVM (WAS) – Yes: (new / recommended) Non-enclave work is managed

towards the goals of the address space external service class • Enclave managed address spaces service class goals /

importance is more important than it used to be.– Verify high enough before switching to Yes

SMF70 - Warning-Track-Interrupt Metrics

SMF RECORD 70 SUBTYPE 1 Field Name OffsetThe number of times PR/SM issued a warning-track interruption to a logical processor and z/OS was able to return the logical processor within the grace period.

SMF70WTS x50

The number of times PR/SM issued a warning-track interruption to a logical processor and z/OS was unable to return the logical processor within the grace period.

SMF70WTU x54

Amount of time in milliseconds that a logical processor was yielded to PR/SM due to warning-track processing.

SMF70WTI x58

Metric of interest % WTI Successful = SMF70WTS / (SMF70WTS + SMF70WTU)

CMF PTFs BQM0868 (5.8) or BQM0867 (5.9)RMF APAR OA37803

SMF 42 – SMSVSAM - VSAM RLS

• Subtype 15: Storage Class Response Time Summary• Subtype 16: Dataset Response Time Summary

– Only generated if DSN Monitoring is enabled• using the V SMS,MONDS command

– Remembered by SMSplex, not available via Parm

• Subtype 17: Coupling Facility Lock Structure Usage• Subtype 18: CF Cache Partition Usage• Subtype 19: Local Buffer Manager LRU Statistics

Documents

ZSeries Scalability and Availability improvements Latest hardware and software advancements Fri Sep 26, 2014: 10:15-11:30 Track 2 Speaker: Donald Zeunert