
8/6/2013


© 2013 IBM Corporation

The World of z/OS Dispatching on the zEC12 Processor

Glenn Anderson, IBM Lab Services and Training

Summer SHARE 2013, Session 14040

What I hope to cover......

What are dispatchable units of work on z/OS

How WLM manages dispatchable units of work

The role of HiperDispatch

Dispatching work to zIIP and zAAP engines

RMF Work Unit Analysis Report


z/OS Dispatchable Units

There are different types of Dispatchable Units (DU's) in z/OS

Preemptible Task (TCB)

Non Preemptible Service Request (SRB)

Preemptible Enclave Service Request (enclave SRB)

Independent – a new transaction

Dependent – extend existing address space

Work-dependent – extend existing independent enclave

z/OS Dispatching Work


What is a WLM Transaction?

A WLM transaction represents a WLM "unit of work"

basic workload entity for which WLM collects a resource usage value

foundation for statistics presented in workload activity report

represents a single subsystem "work request"

Subsystems can implement one of three transaction types

Address Space:

WLM transaction measures all resource used by a subsystem request in a single address space

Used by JES (a batch job), TSO (a TSO command), OMVS (a process), STC (a started task) and ASCH (single APPC program)

Enclave:

Enclave created and destroyed by subsystem for each work request

WLM transaction measures resources used by a single subsystem request across multiple address spaces and systems

Exploited by subsystems - Component Broker(WebSphere), DB2, DDF, IWEB, MQSeries Workflow, LDAP, NETV, TCP

CICS/IMS Transactions

Neither address space nor enclave oriented - special type

WLM transaction measures resource used by a single CICS/IMS transaction program request

The WLM View: Address Spaces, and the transactions inside

[Figure: address spaces classified by STC rules into service class STCHI (Vel = 50%, Imp=1), shown with the transactions inside them: address-space transactions, enclave SRBs, and CICS/IMS transactions.]

• Address-space transactions: A/S is dispatchable unit
• Enclave SRBs: Enclaves are dispatchable units
• CICS/IMS transactions: Only A/S is dispatchable unit


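DDF and Enclave SRBs

[Figure: the ssnmDIST (DDF) address space is classified by STC rules into service class STCHI (Vel = 50%, Imp=1). For incoming requests DDF creates an enclave and schedules an enclave SRB, which makes PC-calls to DBM1. DDF rules classify the enclaves into DDFPROD (DDF production requests) and DDFDEF (DDF default requests); the response time goals shown are RT=85%, 2s at Imp=1 and RT=5s avg at Imp=3. SMF 30 and SMF 72 are marked as the reporting records.]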

WLM Policy Adjustment Algorithm


WLM Dispatching Priority Usage

WLM CPU Management Example


WLM Blocked Workload Support: IEAOPTxx

BLWLINTHD

Specifies threshold time interval for which a blocked address space or enclave must wait before being considered for promotion.

• Minimum is 5 seconds. Maximum is 65535 seconds.
• Default is 20 seconds.

BLWLTRPCT

Percentage of the CPU capacity of the LPAR to be used for promotion

• Specified in units of 0.1%
• Default is 5 (=0.5%)
• Maximum is 200 (=20%)
• Would only be spent when enough units of work exist which need promotion
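The two keywords above can be sanity-checked with a small sketch. This is not an IBM tool: the class name `BlockedWorkloadOpts` and its method are hypothetical, and the lower bound of 0 for BLWLTRPCT is an assumption (the slide only states the default and the maximum). The other limits are taken from the slide.

```python
from dataclasses import dataclass

# Limits as stated on the slide.
BLWLINTHD_MIN, BLWLINTHD_MAX, BLWLINTHD_DEFAULT = 5, 65535, 20  # seconds
BLWLTRPCT_MAX, BLWLTRPCT_DEFAULT = 200, 5                       # units of 0.1%


@dataclass(frozen=True)
class BlockedWorkloadOpts:
    """Hypothetical holder for the two IEAOPTxx blocked-workload keywords."""
    blwlinthd: int = BLWLINTHD_DEFAULT
    blwltrpct: int = BLWLTRPCT_DEFAULT

    def __post_init__(self):
        if not BLWLINTHD_MIN <= self.blwlinthd <= BLWLINTHD_MAX:
            raise ValueError(f"BLWLINTHD out of range: {self.blwlinthd}")
        # Lower bound 0 is an assumption; the slide gives only the maximum.
        if not 0 <= self.blwltrpct <= BLWLTRPCT_MAX:
            raise ValueError(f"BLWLTRPCT out of range: {self.blwltrpct}")

    def promotion_capacity_percent(self) -> float:
        """Convert BLWLTRPCT (units of 0.1%) to a percentage of LPAR capacity."""
        return self.blwltrpct / 10


if __name__ == "__main__":
    opts = BlockedWorkloadOpts()
    print(opts.blwlinthd, "seconds,", opts.promotion_capacity_percent(), "% of LPAR capacity")
```

Keeping the value in its native 0.1% units and converting only for display avoids confusing the coded value 5 with 5%.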

Blocked Workload Support: User Interface: RMF

Extensions of RMF Postprocessor CPU Activity and WLMGL reports with information about blocked workloads and the temporary promotion of their dispatching priority

SMF record 70-1 (CPU activity) and SMF 72-3 (Workload activity)

C P U A C T I V I T Y…

BLOCKED WORKLOAD ANALYSIS

OPT PARAMETERS: BLWLTRPCT (%)   0.5    PROMOTE RATE: DEFINED   50000    WAITERS FOR PROMOTE: AVG   0.001
                BLWLINTHD        60                  USED (%)     95                         PEAK      15


Promoted

CPU time in seconds that transactions in this group were running at a promoted dispatching priority, separated by the reason for the promotion:

BLK  CPU time in seconds consumed while the dispatching priority of work with low importance was temporarily raised to help blocked workloads.

ENQ  CPU time in seconds consumed while the dispatching priority was temporarily raised by enqueue management because the work held a resource that other work needed.

CRM  CPU time in seconds consumed while the dispatching priority was temporarily raised by chronic resource contention management because the work held a resource that other work needed.

LCK  In HiperDispatch mode, the CPU time in seconds consumed while the dispatching priority was temporarily raised to shorten the lock hold time of a local suspend lock held by the work unit.

SUP  CPU time in seconds consumed while the dispatching priority for a work unit was temporarily raised by the z/OS supervisor to a higher dispatching priority than assigned by WLM.

Dispatching in an LPAR Environment

[Figure: two dispatching layers for LPAR1 and LPAR2. Inside each LPAR, the z/OS Dispatcher dispatches work units (TCBs and SRBs) onto logical processors. Underneath, the PR/SM Dispatcher dispatches the logical processors onto the physical processors.]


HiperDispatch Introduction

• Design Objective
  – Keep work as much as possible local to a physical processor to optimize the usage of the processor caches
  – As a result, systems with a high number of physical processors provide much better scalability

• Function: HiperDispatch
  – Interaction between z/OS and the PR/SM Hypervisor to optimize work unit and logical processor placement to physical processors
  – Consists of 2 parts
    • In z/OS (sometimes referred to as Dispatcher Affinity)
    • In PR/SM (sometimes referred to as Vertical CPU Management)


HiperDispatch Mode

PR/SM

– Supplies topology information/updates to z/OS

– Ties high priority logicals to physicals (gives 100% share)

– Distributes remaining share to medium priority logicals

– Distributes any additional service to unparked low priority logicals

z/OS

– Ties tasks to small subsets of logical processors

– Dispatches work to high priority subset of logicals

– Parks low priority processors that are not needed or will not get service


HiperDispatch: z/OS Part

• z/OS obtains the logical to physical processor mapping in HiperDispatch mode
  – Whether a logical processor has high, medium or low share
  – On which book the logical processor is located

• z/OS creates dispatch nodes
  – The idea is to have 4 high share CPs in one node
  – Each node has TCBs and SRBs assigned to the node
  – Optimizes the execution of work units on z/OS

[Figure: hardware books at the bottom; PR/SM provides the logical to physical mapping; within the LPAR, z/OS groups the logical processors into nodes, with some logical processors shown as parked.]


HiperDispatch: z/OS Affinity Dispatching

• Affinity dispatching
  – WLM balances the units of work (TCBs/SRBs) across the nodes to equalize the utilization of nodes
    • And to assure that each node has work of different priorities
  – Balancing takes place every 2 seconds
  – For unbalanced situations in between
    • WLM creates lists of helper nodes for each node
    • These helper nodes can be asked to select work from a given node in case the node is overloaded
    • Helper nodes are sorted to avoid book crossing if possible

• Low share processors
  – WLM parks and un-parks these processors based on demand and utilization of the CEC

[Figure: the WLM Balancer distributes work across Node 1, Node 2 and Node 3. Each node carries a list of helper nodes; the lists shown are 3, 2 and 3, 1 and 2, 1.]
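The idea of sorting helper nodes "to avoid book crossing if possible" can be illustrated with a minimal sketch. It models only the book criterion; the function name `helper_lists` and the node-to-book mapping are hypothetical, and the real WLM ordering very likely weighs other factors that the slide does not describe.

```python
from typing import Dict, List


def helper_lists(node_book: Dict[str, int]) -> Dict[str, List[str]]:
    """For each node, list the other nodes with same-book nodes first.

    node_book maps a node name to the (hypothetical) book number it sits on.
    """
    helpers = {}
    for node, book in node_book.items():
        others = [n for n in node_book if n != node]
        # sorted() is stable; False (same book) sorts before True (other book),
        # so helpers that avoid a book crossing are asked first.
        helpers[node] = sorted(others, key=lambda n: node_book[n] != book)
    return helpers


if __name__ == "__main__":
    # Assumed layout: Node 1 and Node 3 share book 0, Node 2 is on book 1.
    for node, hs in helper_lists({"Node 1": 0, "Node 2": 1, "Node 3": 0}).items():
        print(node, "->", hs)
```

With the assumed layout, Node 1 asks Node 3 before Node 2, because Node 3 is on the same book.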


RMF CPU Activity Report


HiperDispatch and LPAR

                              P A R T I T I O N  D A T A  R E P O R T
                                                                                      PAGE 3
z/OS V1R10                    SYSTEM ID LPAR1      DATE 04/29/2011      INTERVAL 14.59.998
CONVERTED TO z/OS V1R12 RMF                        TIME 19.28.00        CYCLE 1.000 SECONDS

MVS PARTITION NAME                  LPAR1    NUMBER OF PHYSICAL PROCESSORS   55    GROUP NAME   N/A
IMAGE CAPACITY                       3165                               CP   53    LIMIT        N/A
NUMBER OF CONFIGURED PARTITIONS         4                              IIP    2    AVAILABLE    N/A
WAIT COMPLETION                        NO
DISPATCH INTERVAL                 DYNAMIC

---------- PARTITION DATA -----------------  -- LOGICAL PARTITION PROCESSOR DATA --  -- AVERAGE PROCESSOR UTILIZATION PERCENTAGES --
               ----MSU---- -CAPPING--  PROCESSOR-  ----DISPATCH TIME DATA----  LOGICAL PROCESSORS  --- PHYSICAL PROCESSORS ---
NAME        S  WGT  DEF  ACT  DEF  WLM%   NUM  TYPE  EFFECTIVE     TOTAL         EFFECTIVE  TOTAL  LPAR MGMT  EFFECTIVE  TOTAL
LPAR1       A  494    0  582  NO    0.0  32.0  CP    02.17.24.319  02.20.44.154      28.63  29.32       0.44      17.96  18.40
LPAR2       A  446    0  762  NO    0.0  32.0  CP    03.01.28.607  03.04.05.167      37.81  38.35       0.34      23.72  24.06
LPAR3       A   59    0    0  NO    0.0   3.0  CP    00.00.00.000  00.00.00.000       0.00   0.00       0.00       0.00   0.00
LPAR5       A    1    0    0  NO    0.0   1.0  CP    00.00.00.000  00.00.00.000       0.00   0.00       0.00       0.00   0.00
*PHYSICAL*                                                         00.10.58.833                         1.44              1.44
                                                     ------------  ------------                        -----      -----  -----
TOTAL                                                05.18.52.927  05.35.48.155                         2.21      41.68  43.90

Total LPAR weight = 1000
LPAR1 494/1000 = .494 * 53 CPs = 26.18 CPs
LPAR2 446/1000 = .446 * 53 CPs = 23.64 CPs

LPAR1 = 25 VH and 2 VM at 59% share (27 logicals unparked)
LPAR2 = 23 VH and 1 VM at 64% share (24 logicals unparked)
____________
51 logicals unparked
53 physicals

Need to deactivate unused LPARs to reallocate their weight to VH and VM logicals
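The weight arithmetic on this slide can be reproduced with a short sketch. The split rule in `vertical_split` is inferred only from the two rows shown (a fractional share of at least 0.5 becomes one VM, a smaller fraction is pooled with one whole CP and spread over two VMs). It is an assumption that reproduces the slide's numbers, not the full PR/SM algorithm, and the function name is hypothetical.

```python
from math import floor


def vertical_split(weight: int, total_weight: int, physical_cps: int):
    """Return (entitled CPs, VH count, VM count, VM share in %).

    The VH/VM rule is inferred from the two slide examples only.
    """
    entitlement = weight * physical_cps / total_weight
    whole = floor(entitlement)
    frac = entitlement - whole
    if frac >= 0.5 or whole == 0:
        # One vertical medium carries the fractional share.
        vh, vm, vm_share = whole, 1, frac
    else:
        # Give up one vertical high and spread 1 + frac over two mediums.
        vh, vm, vm_share = whole - 1, 2, (1 + frac) / 2
    return entitlement, vh, vm, round(vm_share * 100)


if __name__ == "__main__":
    for name, weight in (("LPAR1", 494), ("LPAR2", 446)):
        cps, vh, vm, share = vertical_split(weight, 1000, 53)
        print(f"{name}: {cps:.2f} CPs = {vh} VH and {vm} VM at {share}% share "
              f"({vh + vm} logicals unparked)")
```

Run against the weights above, the sketch yields the same VH, VM and share values as the slide, which is why deactivating unused LPARs (and so raising the active LPARs' share of the total weight) shifts more logicals into the VH and VM categories.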


Warning Track


Warning Track


Warning Track Statistics

IBM System z Application Assist Processor (zAAP)

Specialty assist processor dedicated exclusively to execution of Java workloads under z/OS® – e.g. WebSphere®, CICS, IMS, DB2

Objective: Enable integration of new Java based Web applications with core z/OS backend database environment for high performance, reliability, availability, security, and lower total cost of ownership

Available on IBM zEnterprise 196 and 114, IBM System z10, IBM System z9 and zSeries z990 and z890

Used by any workload with Java cycles, e.g. WebSphere, CICS, Batch, DB2®,

Executes Java code with no changes to applications

[Figure: flow between a standard processor and a zAAP.
 1. WebSphere starts to execute Java code on a standard logical processor; the JVM requests a switch to zAAP.
 2. The z/OS Dispatcher suspends the JVM task on the z/OS standard logical processor.
 3. The z/OS Dispatcher dispatches the JVM task on a z/OS zAAP logical processor.
 4. The Java application code executes on the zAAP logical processor.
 5. The JVM requests a switch to the standard processor.
 6. The z/OS Dispatcher suspends the JVM task on the z/OS zAAP logical processor.
 7. The z/OS Dispatcher dispatches the JVM task on a z/OS standard logical processor.]


IBM System z Integrated Information Processor (zIIP)

Specialty assist processor dedicated exclusively to execution of any workloads under z/OS®

Available on IBM zEnterprise 196 and 114, IBM System z10, IBM System z9

Requires work to be run as an enclave SRB

Exploiters work with z/OS to determine how much capacity should be redirected to a zIIP

Exploiters must use licensed interface to enable exploitation

[Figure: DDF requests arrive from the network (NET) at DDF and are classified to WLM. For the requests DDF issues IWMECREA, IWMSCHD, IWMSCHD and IWMEDELE, and the work runs as enclave SRBs.]

Current IBM Exploitation of zAAPs and zIIPs

Specialty CP   Eligible             Major Users
zAAP           Any Java Execution   Websphere
                                    CICS
                                    Native apps
                                    XMLSS
zIIP           Enclave SRBs         DRDA over TCPIP
                                    DB2 Parallel Query
                                    DB2 Utilities Load, Reorg, Rebuild
                                    DB2 V9 z/OS remote native SQL procedures
                                    TCPIP - IPSEC
                                    XMLSS
                                    zIIP Assisted HiperSockets Multiple Write
                                    Virtual Tape Facility Mainframe (VTFM) Software
                                    z/OS Global Mirror (XRC), System Data Mover (SDM)
                                    z/OS CIM Server


What is "Needs Help"

Determination that zIIP or zAAP work is being delayed and additional resources should help process the work
– Requires xxPHONORPRIORITY=YES to be set

If help is required:
– The zxxP CP signals waiting zxxP to help
– When all zxxP CPs are busy the zxxP asks for help from the GCP
  • All available specialty engines (of the same type) must be busy before help is asked of the GCPs
  • If the zxxPs need help and all zxxPs are busy, help is obtained from 1 GCP
  • If zxxPs continue to need help, additional CPs may be asked to help
– Help is always provided in dispatch priority order
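One way to read the decision order on this slide is sketched here. The names `Helper` and `choose_helper` are hypothetical, and the sketch covers only who is asked first; it does not model the escalation to additional CPs or the dispatch-priority ordering of the help itself.

```python
from enum import Enum


class Helper(Enum):
    NONE = "no help requested"
    SPECIALTY = "waiting zxxP of the same type"
    GCP = "general purpose CP"


def choose_helper(work_delayed: bool, honor_priority: bool, idle_zxxps: int) -> Helper:
    """Pick who is asked to help delayed zIIP/zAAP work, as read from the slide."""
    # Per the slide, "needs help" requires xxPHONORPRIORITY=YES.
    if not (work_delayed and honor_priority):
        return Helper.NONE
    # Waiting specialty engines of the same type are signalled first.
    if idle_zxxps > 0:
        return Helper.SPECIALTY
    # Only when every zxxP is busy is a GCP asked (one to begin with).
    return Helper.GCP


if __name__ == "__main__":
    print(choose_helper(True, True, idle_zxxps=2).value)
    print(choose_helper(True, True, idle_zxxps=0).value)
    print(choose_helper(True, False, idle_zxxps=0).value)
```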

Specialty CP work running in a WLM Service Class

RMF report is at 1 minute interval

REPORT BY: POLICY=WLMPOL WORKLOAD=BAT_WKL SERVICE CLASS=BATSPEC RESOURCE GROUP=BATMAXRG

TRANSACTIONS    TRANS-TIME HHH.MM.SS.TTT   --DASD I/O--   ---SERVICE----   SERVICE TIMES   ---APPL %---
AVG      0.98   ACTUAL             6.520   SSCHRT  11.5   IOC       8326   CPU    24.7     CP      0.97
MPL      0.98   EXECUTION          6.128   RESP     7.0   CPU     662386   SRB     0.0     AAPCP   0.01
ENDED      10   QUEUED               391   CONN     6.9   MSO          0   RCT     0.0     IIPCP   0.00
END/S    0.17   R/S AFFIN              0   DISC     0.0   SRB        965   IIT     0.0
#SWAPS      0   INELIGIBLE             0   Q+PEND   0.1   TOT     671677   HST     0.0     AAP    40.27
EXCTD       0   CONVERSION             0   IOSQ     0.0   /SEC     11195   AAP    24.2     IIP     0.00
AVG ENC  0.00   STD DEV                0                                   IIP     0.0

GOAL: EXECUTION VELOCITY 35.0%   VELOCITY MIGRATION: I/O MGMT 99.2%   INIT MGMT 92.2%

         RESPONSE TIME   EX     PERF   AVG     ------ USING% -----    --- EXECUTION DELAYS % ---
SYSTEM                   VEL%   INDX   ADRSP   CPU   AAP    IIP   I/O   TOT   CPU

SYSD     --N/A--         99.2   0.4    1.0     0.8   45.9   0.0   3.9   0.4   0.4


Queuing Impacts of Server Busy

Busy %      1 CP      2 CP      3 CP      4 CP
   1      0.1010    0.1000    0.1000    0.1000
  10      0.1111    0.1010    0.1001    0.1000
  20      0.1250    0.1042    0.1010    0.1003
  30      0.1429    0.1099    0.1033    0.1013
  40      0.1667    0.1190    0.1078    0.1038
  50      0.2000    0.1333    0.1158    0.1087
  60      0.2500    0.1563    0.1296    0.1179
  70      0.3333    0.1961    0.1547    0.1357
  80      0.5000    0.2778    0.2079    0.1746
  90      1.0000    0.5263    0.3724    0.2969
  99     10.0000    5.0251    3.3706    2.5448

xxPHonorpriority=NO is a valid method of running Specialty CPs

Arrival rates and Server busy will influence IIPCP/AAPCP time

Can run Specialty CPs very busy IF there are multiple classes of work with different response time objectives
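The slide does not name the queuing model behind the table. The figures are, however, consistent with the mean response time of an M/M/c queue (Erlang C) with a mean service time of 0.1, so the sketch below uses that model as an assumption. The function name `response_time` is illustrative.

```python
from math import factorial


def response_time(servers: int, busy: float, service_time: float = 0.1) -> float:
    """Mean M/M/c response time at per-server utilization `busy` (0 <= busy < 1)."""
    if servers < 1 or not 0 <= busy < 1:
        raise ValueError("need servers >= 1 and 0 <= busy < 1")
    offered = servers * busy  # offered load in Erlangs
    # Erlang C: probability that an arriving request has to queue.
    tail = offered ** servers / factorial(servers) / (1 - busy)
    p_wait = tail / (sum(offered ** k / factorial(k) for k in range(servers)) + tail)
    # Mean queuing delay, then add the service time itself.
    wait = p_wait * service_time / (servers * (1 - busy))
    return service_time + wait


if __name__ == "__main__":
    for pct in (1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99):
        row = "  ".join(f"{response_time(c, pct / 100):8.4f}" for c in (1, 2, 3, 4))
        print(f"{pct:3d}  {row}")
```

The pattern the table illustrates follows directly from the formula: at the same busy percentage, each added server lowers the probability of queuing, so a pool of specialty engines tolerates higher utilization than a single engine.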

zAAPs on zIIPs

When a processor has no zAAPs installed but the LPAR has zIIP(s), treat zAAP-eligible work instead as zIIP-eligible work
– Exception: Under z/VM can define a guest without zAAPs, on a processor with zAAPs, to allow testing

IEASYSxx system parm indicates whether zAAP on zIIP is enabled
– ZZ=NO | YES for z/OS 1.9 and z/OS 1.10
– ZAAPZIIP=NO | YES for z/OS 1.11 (also supports alias of ZZ)
– The enablement state of zAAP on zIIP cannot be changed after IPL

Timing fields within SMF will show offload time under zIIP if a zIIP is installed
– No method to determine what portion of zIIP time originated due to a zAAP request
– Accounting and Capacity planning may need updating to use zIIP field

New operator command in z/OS R12 to display ZIIPZAAP eligibility

New planning White Paper WP101642


DB2 Parallel Query, Enclave SRBs and zIIPs

[Figure, Query CP Parallelism: on Host 1, four dispatchable units (DU), each with its own I/O stream (IO), work against a PARTITIONED TABLESPACE.]

Child tasks have been independent enclave SRBs to be zIIP eligible. Beginning in z/OS R11 the child tasks are now work-dependent enclaves.

[Figure, Sysplex Query Parallelism: Host 1, Host 2 and Host 3 each run four DUs with their own I/O streams against the PARTITIONED TABLESPACE. One host is marked "Complex query originates here".]

Portions of complex query arrive on participant systems, classified under "DB2" rules, and run in enclave SRBs, so zIIP eligible

DB2 Parallelism, WLM, and zIIPs

DB2 Parallelism and zIIPs
– Controlled by a CPU threshold. Once the threshold is met all child tasks are zIIP eligible
– Parents are not zIIP eligible
– Parent and child CPU time contribute to the CPU Threshold
– Can see any kind of work, CICS, IMS, TSO, batch using zIIP resources

DB2 will use new Work-Dependent Enclaves for Child tasks
– APAR OA26104 for releases 1.8 and beyond
– Without new Work Dependent Enclave support parallel enclaves must be classified using subsystem DB2
  • Unclassified work would wind up in SYSOTHER


Work-Dependent Enclaves

Implement a new type of enclave named “Work-Dependent” as an extension of an Independent Enclave. A Work-Dependent enclave becomes part of the Independent Enclave’s transaction but can have its own set of attributes (including zIIP offload percentage).

[Figure: inside the DB2 Address Space, an Independent Enclave (zIIP Offload = X %) creates two Work-Dependent Enclaves (zIIP Offload = Y % and zIIP Offload = Z %). All three are managed as one transaction, represented by the Independent Enclave.]

DDF and Work-dependent Enclaves

[Figure: the ssnmDIST (DDF) address space is classified by STC rules into service class STCHI (Vel = 50%, Imp=1). For incoming requests DDF creates an enclave and schedules an enclave SRB, which makes PC-calls to DBM1. DDF rules classify the enclaves into DDFPROD (DDF production requests) and DDFDEF (DDF default requests); the response time goals shown are RT=85%, 2s at Imp=1 and RT=5s avg at Imp=3. SMF 30 and SMF 72 are marked as the reporting records.]

In cases where DRDA applications create extended duration work threads in DB2, for example through extensive use of held cursors, the zIIP utilization levels can become more variable. DB2 and DDF now may use work-dependent enclaves in this situation to control this variability. See APAR PM28626.


Work-dependent Enclaves in SDSF

Statement of Direction

• zEC12 is planned to be last System z server to offer support for zAAP

• Continued support for zAAP on zIIP

• Remove restriction for zAAP on zIIP when zAAP installed on server
  – APAR OA38829 on z/OS R12 and R13


July 2013 zIIP/zAAP Announcement

• Effective July 23, 2013, IBM has modified the ratio of zIIP/zAAPs to CPs to be 2:1 for a zEC12 and/or zBC12. Customers may purchase up to two zIIP and/or up to two zAAP processors for every general purpose processor they purchase on the server. For the z196 and z114 and earlier servers, the 1:1 ratio will still be enforced; customers may purchase one zIIP and/or one zAAP for every general purpose processor they purchase on the server.
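The ratio rule in the announcement can be written as a one-line check. This is only an illustration of the arithmetic: the function name and the server-name strings are hypothetical, and the limit applies separately to zIIPs and to zAAPs.

```python
# Servers that received the 2:1 ratio in the July 2013 announcement.
TWO_TO_ONE_SERVERS = {"zEC12", "zBC12"}


def max_specialty_engines(server: str, general_purpose_cps: int) -> int:
    """Maximum zIIPs (or, separately, zAAPs) purchasable for the given CP count."""
    ratio = 2 if server in TWO_TO_ONE_SERVERS else 1  # 1:1 on z196, z114 and earlier
    return ratio * general_purpose_cps


if __name__ == "__main__":
    for server in ("zEC12", "zBC12", "z196", "z114"):
        print(server, max_specialty_engines(server, 4))
```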


SYSTEM ADDRESS SPACE AND WORK UNIT ANALYSIS

---------NUMBER OF ADDRESS SPACES--------- -----------------------DISTRIBUTION OF IN-READY WORK UNIT QUEUE-------------

QUEUE TYPES MIN MAX AVG NUMBER OF 0 10 20 30 40 50 60 70 80 90 100

WORK UNITS (%) |....|....|....|....|....|....|....|....|....|....|

IN 550 1,008 594.8

IN READY 4 438 22.7 <= N 55.5 >>>>>>>>>>>>>>>>>>>>>>>>>>>>

OUT READY 0 1 0.0 = N + 1 4.4 >>>

OUT WAIT 0 0 0.0 = N + 2 4.0 >>>

LOGICAL OUT RDY 0 628 10.3 = N + 3 3.7 >>

LOGICAL OUT WAIT 178 634 589.4 <= N + 5 7.1 >>>>

<= N + 10 14.7 >>>>>>>>

ADDRESS SPACE TYPES <= N + 15 8.0 >>>>>

<= N + 20 1.9 >

BATCH 281 284 282.0 <= N + 30 0.2 >

STC 708 763 736.4 <= N + 40 0.0

TSO 97 98 97.9 <= N + 60 0.0

ASCH 0 1 0.0 <= N + 80 0.0

OMVS 43 97 68.0 <= N + 100 0.0

<= N + 120 0.0

---------NUMBER OF WORK UNITS------------- <= N + 150 0.0

CPU TYPES MIN MAX AVG > N + 150 0.0

CP 444 888 555.5 N = NUMBER OF PROCESSORS ONLINE UNPARKED (22.4 ON AVG)

AAP 22 33 28.8

IIP 0 0 0.0

16 Buckets representing the work unit count with regard to the number of online processors

Work unit count on CPU type level

Work Unit Queue Distribution – Monitor I CPU Report
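The 16 buckets of the report can be reproduced with a small classifier. The bucket boundaries are read off the report above; the function names and the sample data are hypothetical, and how RMF samples the queue internally is not shown on the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Upper bounds (work units above N) of the first 15 buckets; the 16th is "> N + 150".
BUCKET_OFFSETS = [0, 1, 2, 3, 5, 10, 15, 20, 30, 40, 60, 80, 100, 120, 150]


def bucket_label(work_units: int, n: int) -> str:
    """Label the report bucket for one sample, N = online unparked processors."""
    excess = work_units - n
    for offset in BUCKET_OFFSETS:
        if excess <= offset:
            if offset == 0:
                return "<= N"
            op = "=" if offset <= 3 else "<="
            return f"{op} N + {offset}"
    return "> N + 150"


def distribution(samples: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of samples per bucket; each sample is (work_units, n)."""
    counts = Counter(bucket_label(w, n) for w, n in samples)
    total = sum(counts.values())
    return {label: 100 * c / total for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical samples: (in-ready work units, processors online and unparked).
    demo = [(18, 22), (22, 22), (23, 22), (26, 22), (31, 22), (200, 22)]
    for label, pct in distribution(demo).items():
        print(f"{label:12s} {pct:5.1f}%")
```

A distribution concentrated in the "<= N" bucket means the in-ready work units usually fit on the available processors; weight in the higher buckets indicates work waiting for a processor.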


What I hope I covered......

What are dispatchable units of work on z/OS

How WLM manages dispatchable units of work

The role of HiperDispatch

Dispatching work to zIIP and zAAP engines

RMF Work Unit Analysis Report
