
Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.

© Copyright IBM Corporation 2014

[pPE531] The Four Dimensions of IBM POWER7 and IBM POWER8 Affinity

Earl Jew ([email protected]), October 2014
Senior IT Management Consultant for IBM Power Servers/IBM Storage
IBM STG Lab Services Power Systems Delivery Practice


ABSTRACT

The Four Dimensions of IBM POWER7 and IBM POWER8 Affinity

The four dimensions of Power7/8 affinity are NUCA, NUMA, thread migration and vCPU time-slice fragmentation. This session describes these dimensions and how they influence the performance of your workloads on Power/AIX systems.

Earl Jew ([email protected]), 310-251-2907 cell
Senior IT Management Consultant - IBM Power Systems and IBM Systems Storage
IBM STG Lab Services and Training - US Power Systems (group/dept)
400 North Brand Blvd., c/o IBM 8th floor, Glendale, CA 91203


Leverage the deep skills and expertise of IBM's technical consultants to implement projects that achieve faster business value

• Cloud design & implementation
• POWER8 provisioning assurance
• Big Data Analytics proof of concept
• Advanced virtualization provisioning
• HA/DR cluster design & deployment
• Security assessment and design
• Performance & database optimization
• Mobile application modernization

How to contact us:

Meet us at the Edge2015 Solutions Center

See demos of PowerVM Provisioning and Automation Tools and PowerSC Tools

email us at [email protected]

Follow us on Twitter at @IBMSLST

Learn more ibm.com/systems/services/labservices

IBM Systems Lab Services


Power7/8: The four dimensions of Affinity

• The Power7/Power8 architecture is very different from the Power5/Power6 architecture. These differences tend to emphasize the effects of the four dimensions of Affinity

• Three dimensions of Affinity concern proximity localization: accessing physically closer content, and migrating content over a shorter distance, expends substantially fewer cycles

• Proximity localization underlies these dimensions of Power7/8 Affinity:

– NUCA (Non-Uniform Cache Access) affinity concerns the incidences of Local/Near/Far access to L2 and L3 cache content – relative to the CPUcore attending an LPAR’s virtual CPU workload

– NUMA (Non-Uniform Memory Access) affinity concerns the incidences of Local/Near/Far access to Main Memory content – relative to the CPUcore attending an LPAR’s virtual CPU workload

– Thread Migration affinity concerns migrating a given thread, along with its L2/L3 cache content, to the same CPU-core (Perfect: S0rd/S1rd), a different CPU-core on the same socket (Local: S3rd), a different socket in the same CEC (Near: S4rd), or a different CEC (Far: S5rd).


Power7/8: The four dimensions of Affinity

• vCPU time-slice fragmentation is the 4th dimension of Affinity that applies to Shared-Pool LPARs, but does not apply to (nor exist in) Dedicated CPU LPARs

• This dimension concerns the time-slice continuity of a CPUcore’s dedicated attention to one LPAR’s virtual CPU, versus the fragmentation of a CPUcore’s attention across the virtual CPUs of many LPARs throughout the PHYP 10ms dispatch cycle


Assumptions about Power7/8 Affinity

• Imagine the productivity/performance of a “perfectly-fed” CPU whereby:
– all instructions are resident in Cache
– all input data are resident in Cache
– all output data are written to ever-empty Cache

However improbable, this is the aim of Power Affinity.

• Most enterprise workloads today are granted sufficient CPU capacity; indeed, more cases have too much CPU capacity than too little

• Performance today is limited less by the traditional CPU bottleneck and more by moving instructions and data fast enough to keep CPUs busy (and not so terribly idle)

• Hence the trend of adding more and faster means of moving instructions and data between CPUcores, as well as to/from DIMM/Flash/SSD/HDD/NAS subsystems

• For this presentation, we will assume all LPARs are Shared Pool LPARs
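The claim that performance now hinges on moving instructions and data, rather than on raw CPU capacity, can be illustrated with a simple cycles-per-instruction (CPI) model. The sketch below is a generic textbook-style model, not a reproduction of the POWER7/8 CPI diagram: effective CPI is taken as a base CPI plus the stall cycles contributed by each place an instruction's data is served from. Every access rate and stall cost in it is a hypothetical placeholder, not a measured POWER7/POWER8 latency.

```python
# Toy CPI-stack model. Every access rate and stall cost below is a
# hypothetical placeholder, not a measured POWER7/POWER8 value.


def effective_cpi(base_cpi: float, sources: dict[str, tuple[float, float]]) -> float:
    """Return base CPI plus stall cycles per instruction.

    `sources` maps a data location to (accesses_per_instruction, stall_cycles),
    where stall_cycles is the extra cost of each access served from there.
    """
    stall = sum(rate * cycles for rate, cycles in sources.values())
    return base_cpi + stall


# Hypothetical well-fed core: data comes from its own caches or local DIMMs.
well_fed = {
    "own L2/L3": (0.05, 20.0),
    "local DIMMs": (0.005, 200.0),
}

# Same total traffic (0.055 accesses/instruction), but with poor affinity
# a larger share is served from farther away.
poorly_fed = {
    "own L2/L3": (0.045, 20.0),
    "local DIMMs": (0.002, 200.0),
    "near (other socket, same CEC)": (0.004, 300.0),
    "far (other CEC)": (0.004, 500.0),
}

if __name__ == "__main__":
    print(f"well-fed CPI:   {effective_cpi(1.0, well_fed):.2f}")
    print(f"poorly-fed CPI: {effective_cpi(1.0, poorly_fed):.2f}")
```

In this toy model, serving the same number of accesses from farther away raises CPI from 3.0 to 5.5 with no change in useful work done.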


Assumptions about Power7/8 Affinity

• Any CPUcore can read/write to the L2/L3/Main Memory of any other CPUcore in the Power7/8 Architecture (only when authorized)

• Per PowerVP, accessing local RAM is faster than accessing the L2/L3 of other cores on the same socket

• A CPUcore can best access threads (of processes) and data residing in its own L2/L3 and local DIMMs, versus the L2/L3 of other cores on the same socket

• Priority Queues abound throughout PHYP/AIX: Each LPAR, CPUcore, virtual CPU, logical/SMT CPU has a Priority Queue for controlling the flow of executable threads-of-instructions

• By the above, a given workload may run variably faster/slower based on how/where/when it manifests on a shared server architecture; factors include LPAR configuration, scale, entitlement, implementation, technology, Power affinity, etc.


Cycles Per Instruction (CPI) (Jeff Stuecheli/IBM Development)


Power7/8: NUCA (Non-Uniform Cache Access)

NUCA (Non-Uniform Cache Access)


Power7/8: NUCA (Non-Uniform Cache Access)

NUCA (Non-Uniform Cache Access)

• Three dimensions of affinity concern proximity localization: accessing physically closer content, and migrating content over a shorter distance, expends substantially fewer cycles

• Proximity localization underlies these dimensions of Power7/8 Affinity:

• NUCA (Non-Uniform Cache Access) affinity concerns the incidences of Local/Near/Far access to L2 and L3 cache content – relative to the CPUcore attending an LPAR’s virtual CPU workload

• When authorized, any CPUcore can access the L2/L3 cache of any other CPUcore within the same Power7/8 Architecture
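The Local/Near/Far vocabulary can be written as a small function of where the requesting CPUcore sits and where the content lives. A minimal sketch, assuming a simplified (CEC, socket, core) coordinate model; the `Location` type and the `"same core"` label are hypothetical names for illustration, not a PowerVM, PowerVP or AIX interface. The mapping follows the slides: another core on the same socket is Local, another socket in the same CEC is Near, and another CEC is Far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical (CEC, socket, core) coordinates of a CPUcore or its cache."""
    cec: int
    socket: int
    core: int


def classify_distance(requester: Location, content: Location) -> str:
    """Classify an access as 'same core', 'local', 'near', or 'far'.

    Checks the coarsest level first, so a different CEC is always 'far'
    regardless of socket or core numbers.
    """
    if requester.cec != content.cec:
        return "far"
    if requester.socket != content.socket:
        return "near"
    if requester.core != content.core:
        return "local"
    return "same core"


if __name__ == "__main__":
    me = Location(cec=0, socket=1, core=3)
    for other in (Location(0, 1, 3), Location(0, 1, 5), Location(0, 2, 0), Location(1, 0, 0)):
        print(other, "->", classify_distance(me, other))
```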


Power7/8: NUCA (Non-Uniform Cache Access) (Tracy Smith/IBM ATSS)

Each CEC is a Coherent Node

Left-most diagram by Jeff Stuecheli/IBM Development


Power7 System Topology (Jeff Stuecheli/IBM Development)


PowerVP is most useful for monitoring NUCA (and NUMA)


Power7/8: NUMA (Non-Uniform Memory Access)

NUMA (Non-Uniform Memory Access)


Power7/8: NUMA (Non-Uniform Memory Access)

NUMA (Non-Uniform Memory Access)

• NUMA (Non-Uniform Memory Access) affinity concerns the incidences of Local/Near/Far access to Main Memory content – relative to the CPUcore attending an LPAR’s virtual CPU workload

• When properly configured:
– every CPU socket has adjacent local DIMMs to serve it with Main Memory
– all DIMMs have an adjacent local CPU socket for which they serve as Main Memory

• When authorized, any CPUcore can access the Main Memory of any other CPU socket within the same Power7/8 Architecture
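The "properly configured" condition above is a two-way check: no socket without local DIMMs, and no DIMMs without a local socket. A minimal sketch, assuming the configuration is available as a hypothetical mapping from socket ID to locally attached memory in GB; how that inventory is gathered (for example from the HMC or PowerVP) is outside this sketch, and the function name is illustrative only.

```python
def numa_config_problems(sockets: set[int], dimm_gb_by_socket: dict[int, int]) -> list[str]:
    """Return NUMA balance problems; an empty list means properly configured.

    sockets: IDs of populated CPU sockets.
    dimm_gb_by_socket: GB of DIMMs attached at each socket position.
    """
    problems = []
    # Every CPU socket should have adjacent local DIMMs.
    for s in sorted(sockets):
        if dimm_gb_by_socket.get(s, 0) <= 0:
            problems.append(f"socket {s} has no local DIMMs")
    # All DIMMs should have an adjacent local CPU socket.
    for s, gb in sorted(dimm_gb_by_socket.items()):
        if gb > 0 and s not in sockets:
            problems.append(f"{gb} GB of DIMMs at socket {s} have no local CPU socket")
    return problems


if __name__ == "__main__":
    print(numa_config_problems({0, 1}, {0: 256, 1: 256}))  # balanced layout
    print(numa_config_problems({0, 1}, {0: 512, 2: 128}))  # unbalanced layout
```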


Power7/8: NUMA (Non-Uniform Memory Access) (Damir Rubic/IBM ATSS)


Power7/8: NUMA (Non-Uniform Memory Access) (Damir Rubic/IBM ATSS)


Power7/8: NUMA (Non-Uniform Memory Access) (Damir Rubic/IBM ATSS)


Power7/8: NUMA (Non-Uniform Memory Access)

mpstat -d 2

System configuration: lcpu=12 ent=2.1 mode=Uncapped

cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  291    2     1   1    0      0    70  88.4   0.0   0.0   0.0   0.0  11.6    0  279  53.2   0.0  46.8
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    1    6     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
  4  299    6     0   0    0      0   105  83.0   0.2   0.0   0.0   0.0  16.8    0  248  42.5   0.0  57.5
  5    0    0     0   0    0      0     0   0.0 100.0   0.0   0.0   0.0   0.0    0    6   0.0   0.0 100.0
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   23     -     -     -
ALL  590    8     1   1    0      0   175  85.6   0.2   0.0   0.0   0.0  14.2    3  582  47.7   0.0  52.3
---------------------------------------------------------------------------------------------------------
  0  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  185  52.9   0.0  47.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  4    0    0     0   0    0      0     0     -     -     -     -     -     -    0   60     -     -     -
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   10     -     -     -
 13    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 14    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 15    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
ALL  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  270  52.9   0.0  47.1
---------------------------------------------------------------------------------------------------------
cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  137    9     0   0    0      0     1  99.9   0.0   0.0   0.0   0.0   0.1    0  110  47.9   0.0  52.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  4  119    2     1   1    0      2    84  85.7   0.0   0.0   0.0   0.0  14.3    0   86  44.4   0.0  55.6
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    1    2     -     -     -
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   11     -     -     -
ALL  256   11     1   1    0      2    85  93.3   0.0   0.0   0.0   0.0   6.7    1  219  46.3   0.0  53.7
---------------------------------------------------------------------------------------------------------
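When reviewing many `mpstat -d` intervals, it can help to pull the ALL rows out programmatically. A minimal sketch, assuming whitespace-separated columns and a header line beginning with `cpu`, as in the sample above; the header is read rather than hard-coded because the column set may differ between AIX levels. `SAMPLE` is an excerpt of the first interval above, `-` entries become `None`, and the function name is hypothetical. The S3hrd/S4hrd/S5hrd columns are read here as local/near/far dispatch percentages, in line with the slide's NUMA framing; confirm that interpretation against the `mpstat` documentation for your AIX level.

```python
from typing import Dict, List, Optional

SAMPLE = """\
System configuration: lcpu=12 ent=2.1 mode=Uncapped

cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  291    2     1   1    0      0    70  88.4   0.0   0.0   0.0   0.0  11.6    0  279  53.2   0.0  46.8
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    1    6     -     -     -
ALL  590    8     1   1    0      0   175  85.6   0.2   0.0   0.0   0.0  14.2    3  582  47.7   0.0  52.3
"""


def parse_mpstat_d_all_rows(text: str) -> List[Dict[str, Optional[float]]]:
    """Return each ALL row of `mpstat -d` output as {column name: value}.

    Column names come from the most recent header line starting with 'cpu'.
    '-' becomes None; every other value is parsed as a float.
    """
    header: List[str] = []
    rows: List[Dict[str, Optional[float]]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            header = fields
        elif fields[0] == "ALL" and len(fields) == len(header):
            rows.append({name: (None if value == "-" else float(value))
                         for name, value in zip(header[1:], fields[1:])})
    return rows


if __name__ == "__main__":
    for row in parse_mpstat_d_all_rows(SAMPLE):
        print(f"ilcs={row['ilcs']:.0f} vlcs={row['vlcs']:.0f} "
              f"S3hrd/S4hrd/S5hrd = {row['S3hrd']}/{row['S4hrd']}/{row['S5hrd']}")
```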


PowerVP is most useful for monitoring NUMA (and NUCA)


Power7/8: Thread Migration

Thread Migration


Power7/8: Thread Migration

Thread Migration

• Thread Migration affinity concerns migrating a given thread, along with its L2/L3 cache content, to the same CPU-core (Perfect: S0rd/S1rd), a different CPU-core on the same socket (Local: S3rd), a different socket in the same CEC (Near: S4rd), or a different CEC (Far: S5rd).

• Priority Queues abound throughout PHYP/AIX: each LPAR, CPUcore, virtual CPU, and logical/SMT CPU has a Priority Queue for controlling the flow of executable threads-of-instructions
– Queues control when and where threads-of-instructions execute, by priority
– Queues control when and where threads-of-instructions migrate between queues
– Most priority values increase over time until they are served
– Any grouping of queues will likely have a global (or master) queue
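The queue behaviors listed above (priority ordering, priorities that grow while waiting, and a global queue alongside local ones) can be illustrated with a toy dispatcher. This is a hypothetical sketch of the general aging technique only; it is not how PHYP or the AIX dispatcher is implemented, and the class name, aging step, and local-wins-ties rule are assumptions. In this sketch a larger number means more urgent.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class AgingQueue:
    """Toy run queue in which a waiting entry gains `step` priority per tick.

    Effective priority at time `now` is base + step * (now - enqueued_at).
    Because step * now is common to every entry in this queue, ordering by
    base - step * enqueued_at gives the same order without rescanning.
    """

    def __init__(self, step: int = 1) -> None:
        self.step = step
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-breaker for equal keys

    def push(self, name: str, base: int, now: int) -> None:
        key = base - self.step * now
        heapq.heappush(self._heap, (-key, next(self._seq), name))  # negate: largest key first

    def peek_priority(self, now: int) -> Optional[int]:
        """Effective priority of the head entry at time `now`, or None if empty."""
        if not self._heap:
            return None
        return -self._heap[0][0] + self.step * now

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


def dispatch(local: AgingQueue, global_q: AgingQueue, now: int) -> Optional[str]:
    """Serve whichever head is more urgent at `now`; the local queue wins ties."""
    lp = local.peek_priority(now)
    gp = global_q.peek_priority(now)
    if lp is None and gp is None:
        return None
    if gp is None or (lp is not None and lp >= gp):
        return local.pop()
    return global_q.pop()


if __name__ == "__main__":
    q = AgingQueue(step=1)
    q.push("old-low", base=1, now=0)    # effective 1 + 10 = 11 at now=10
    q.push("new-high", base=5, now=10)  # effective 5 at now=10
    print(q.pop())  # the aged entry is served first
```

With `step=0` the same two entries come out in plain priority order ("new-high" first), which shows that aging is what lets a long-waiting, low-priority thread eventually be served.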


Power7/8: Thread Migration

mpstat -d 2

System configuration: lcpu=12 ent=2.1 mode=Uncapped

cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  291    2     1   1    0      0    70  88.4   0.0   0.0   0.0   0.0  11.6    0  279  53.2   0.0  46.8
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    1    6     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
  4  299    6     0   0    0      0   105  83.0   0.2   0.0   0.0   0.0  16.8    0  248  42.5   0.0  57.5
  5    0    0     0   0    0      0     0   0.0 100.0   0.0   0.0   0.0   0.0    0    6   0.0   0.0 100.0
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   23     -     -     -
ALL  590    8     1   1    0      0   175  85.6   0.2   0.0   0.0   0.0  14.2    3  582  47.7   0.0  52.3
---------------------------------------------------------------------------------------------------------
  0  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  185  52.9   0.0  47.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  4    0    0     0   0    0      0     0     -     -     -     -     -     -    0   60     -     -     -
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   10     -     -     -
 13    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 14    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 15    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
ALL  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  270  52.9   0.0  47.1
---------------------------------------------------------------------------------------------------------
cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  137    9     0   0    0      0     1  99.9   0.0   0.0   0.0   0.0   0.1    0  110  47.9   0.0  52.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  4  119    2     1   1    0      2    84  85.7   0.0   0.0   0.0   0.0  14.3    0   86  44.4   0.0  55.6
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    1    2     -     -     -
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   11     -     -     -
ALL  256   11     1   1    0      2    85  93.3   0.0   0.0   0.0   0.0   6.7    1  219  46.3   0.0  53.7
---------------------------------------------------------------------------------------------------------
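The slides' mapping from redispatch columns to migration distance can be applied directly to an ALL row. A minimal sketch, assuming the percentages have already been extracted from `mpstat -d`; the values in `all_row` are copied from the first interval's ALL row above, and the function name is hypothetical. The slides do not place S2rd in the Perfect/Local/Near/Far scheme, so it is reported as its own bucket rather than guessed.

```python
def migration_profile(row: dict[str, float]) -> dict[str, float]:
    """Group mpstat -d redispatch percentages using the slides' mapping:
    Perfect = S0rd + S1rd, Local = S3rd, Near = S4rd, Far = S5rd.
    S2rd is not mapped on the slides, so it is kept separate.
    """
    return {
        "perfect": row["S0rd"] + row["S1rd"],
        "S2rd (unmapped)": row["S2rd"],
        "local": row["S3rd"],
        "near": row["S4rd"],
        "far": row["S5rd"],
    }


# First-interval ALL row from the mpstat -d 2 sample above.
all_row = {"S0rd": 85.6, "S1rd": 0.2, "S2rd": 0.0, "S3rd": 0.0, "S4rd": 0.0, "S5rd": 14.2}

if __name__ == "__main__":
    for bucket, pct in migration_profile(all_row).items():
        print(f"{bucket:>16}: {pct:5.1f}%")
```

For this sample, roughly 85.8% of redispatches are Perfect and 14.2% are Far, with nothing Local or Near.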


Power7/8: vCPU time-slice fragmentation

vCPU time-slice fragmentation


Power7/8: vCPU time-slice fragmentation

vCPU time-slice fragmentation

• vCPU time-slice fragmentation is the 4th dimension of Affinity that applies to Shared-Pool LPARs, but does not apply to (nor exist in) Dedicated CPU LPARs

• This dimension concerns the time-slice continuity of a CPUcore’s dedicated attention to one LPAR’s virtual CPU, versus the fragmentation of a CPUcore’s attention across the virtual CPUs of many LPARs throughout the PHYP 10ms dispatch cycle

– The aim: keeping fewer vCPUs so busy that they don’t Fold-down or Cede-away
– The aim: keeping each vCPU sticky and busy on its assigned Home CPUcore
– The aim: keeping each CPUcore sticky and busy with as few vCPUs as possible
– These aims are met when implementing “Tight&Fat” eCPU::vCPU tactics
– This can be monitored by observing AIX: mpstat -d 2 (ilcs and vlcs)
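The “Tight&Fat” aims can be sanity-checked with arithmetic. Assuming the usual PowerVM convention that 1.0 processor unit of entitlement equals 10 ms of CPUcore time per 10 ms dispatch cycle, and assuming (as an idealization) that an LPAR's entitlement is spread evenly over its vCPUs, the guaranteed slice per vCPU shrinks as vCPUs are added. The function name is hypothetical; `ent=2.1` is taken from the `mpstat` sample's system configuration line, and the vCPU counts are illustrative.

```python
DISPATCH_CYCLE_MS = 10.0  # PHYP dispatch cycle cited on the slides


def entitled_ms_per_vcpu(entitlement: float, vcpus: int) -> float:
    """Guaranteed CPUcore milliseconds per vCPU per dispatch cycle,
    assuming entitlement is spread evenly across the LPAR's vCPUs."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    if entitlement > vcpus:
        raise ValueError("entitlement cannot exceed 1.0 processor unit per vCPU")
    return entitlement / vcpus * DISPATCH_CYCLE_MS


if __name__ == "__main__":
    ent = 2.1  # processor units, as in the sample's "ent=2.1"
    for vcpus in (3, 6, 12):
        ms = entitled_ms_per_vcpu(ent, vcpus)
        print(f"{vcpus:>2} vCPUs: {ms:4.2f} ms of each 10 ms cycle per vCPU")
```

Under these assumptions, 2.1 units over 3 vCPUs gives each vCPU about 7 ms per cycle, while the same entitlement over 12 vCPUs leaves about 1.75 ms each: more, thinner slices, which is the fragmentation the slide describes.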


Power7/8: vCPU time-slice fragmentation

• On Power7/8, there are two kinds of “context switching”:
– There is the intra-LPAR context switching of threads on/off logical/SMT CPUs; this is the extremely fast, lightweight switching of threads on chip-level circuitry
  • cs (context switching)
  • ics (involuntary context switching)
– There is the inter-LPAR context switching of the active vCPUs of every SPLPAR on/off CPUcores; all folded-up/active vCPUs are queued and scheduled over the CPUcores of the Shared Pool
  • ilcs (involuntary logical context switching): by preempting the vCPU off the CPUcore; this occurs most when running Uncapped beyond CPU entitlement
  • vlcs (voluntary logical context switching): by freely giving up the CPUcore; this occurs when a vCPU is not busy enough to hold its CPUcore

• vCPU time-slice fragmentation pertains to only the inter-LPAR context switching of active vCPUs of every SPLPAR on/off CPUcores


Power7/8: vCPU time-slice fragmentation

mpstat -d 2

System configuration: lcpu=12 ent=2.1 mode=Uncapped

cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  291    2     1   1    0      0    70  88.4   0.0   0.0   0.0   0.0  11.6    0  279  53.2   0.0  46.8
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    1    6     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
  4  299    6     0   0    0      0   105  83.0   0.2   0.0   0.0   0.0  16.8    0  248  42.5   0.0  57.5
  5    0    0     0   0    0      0     0   0.0 100.0   0.0   0.0   0.0   0.0    0    6   0.0   0.0 100.0
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    1    5     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   23     -     -     -
ALL  590    8     1   1    0      0   175  85.6   0.2   0.0   0.0   0.0  14.2    3  582  47.7   0.0  52.3
---------------------------------------------------------------------------------------------------------
  0  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  185  52.9   0.0  47.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    5     -     -     -
  4    0    0     0   0    0      0     0     -     -     -     -     -     -    0   60     -     -     -
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   10     -     -     -
 13    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 14    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
 15    0    0     0   0    0      0     0     -     -     -     -     -     -    0    0     -     -     -
ALL  303    5     1   1    0      0    31  97.5   0.0   0.0   0.0   0.0   2.5    0  270  52.9   0.0  47.1
---------------------------------------------------------------------------------------------------------
cpu   cs  ics bound  rq push S3pull S3grd  S0rd  S1rd  S2rd  S3rd  S4rd  S5rd ilcs vlcs S3hrd S4hrd S5hrd
  0  137    9     0   0    0      0     1  99.9   0.0   0.0   0.0   0.0   0.1    0  110  47.9   0.0  52.1
  1    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  2    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  3    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  4  119    2     1   1    0      2    84  85.7   0.0   0.0   0.0   0.0  14.3    0   86  44.4   0.0  55.6
  5    0    0     0   0    0      0     0     -     -     -     -     -     -    1    2     -     -     -
  6    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
  7    0    0     0   0    0      0     0     -     -     -     -     -     -    0    2     -     -     -
 12    0    0     0   0    0      0     0     -     -     -     -     -     -    0   11     -     -     -
ALL  256   11     1   1    0      2    85  93.3   0.0   0.0   0.0   0.0   6.7    1  219  46.3   0.0  53.7
---------------------------------------------------------------------------------------------------------


Power7/8: vCPU time-slice fragmentation

• ilcs (involuntary logical context switching): the vCPU is preempted off the CPUcore
– Any nonzero ALL:ilcs means vCPUs with ongoing work are preempted off the CPUcore
– Any nonzero ALL:ilcs means the vCPU workload has no CPUcore on which to execute instructions
– Typically ALL:ilcs is mostly 0 with occasional single- and double-digit pops: perfect!!
– Acceptable: ALL:ilcs up to three digits, especially on 5+ vCPU LPARs
– Four digits of sustained ALL:ilcs is a blend of Acceptable and Not Acceptable
– Five-plus digits of sustained ALL:ilcs warrants a PMR at Sev2: an indicated performance issue

• vlcs (voluntary logical context switching): the vCPU freely gives up the CPUcore
– vCPUs are ceded off the CPUcore because there is no workload to hold them (~1000s/sec)
– Every vCPU has a Home Core with a base assignment of Local SRAD memory
– The longer a vCPU is not “at home”, the more its CPUcore’s L2/L3 cache gets displaced
– Perfection: ALL:vlcs up to three digits with a Steady-Peak workload
– Acceptable: ALL:vlcs at four digits with a Steady-Peak workload
– Reduce vCPUs when ALL:vlcs is sustained at five-plus digits with a Steady-Peak workload
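The ilcs and vlcs rules of thumb above can be encoded as a small triage helper. This sketch only restates the slide's digit-count thresholds; the function names and verdict strings are hypothetical. Deciding what counts as "sustained" is left to the caller (for example, applying the check to consecutive `mpstat -d 2` ALL rows), and the vlcs verdicts assume a Steady-Peak workload, as the slide does.

```python
def digits(count: int) -> int:
    """Number of decimal digits in a non-negative count (0 has one digit)."""
    return len(str(abs(int(count))))


def triage_ilcs(all_ilcs: int) -> str:
    """Restate the slide's ALL:ilcs thresholds."""
    d = digits(all_ilcs)
    if d <= 2:
        return "perfect"
    if d == 3:
        return "acceptable"
    if d == 4:
        return "borderline (blend of acceptable and not acceptable)"
    return "indicated performance issue (PMR Sev2)"


def triage_vlcs(all_vlcs: int) -> str:
    """Restate the slide's ALL:vlcs thresholds for a Steady-Peak workload."""
    d = digits(all_vlcs)
    if d <= 3:
        return "perfection"
    if d == 4:
        return "acceptable"
    return "reduce vCPUs"


if __name__ == "__main__":
    # (ALL:ilcs, ALL:vlcs) from the three mpstat -d 2 intervals shown earlier.
    for ilcs, vlcs in ((3, 582), (0, 270), (1, 219)):
        print(ilcs, vlcs, "->", triage_ilcs(ilcs), "/", triage_vlcs(vlcs))
```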


Power7/8: The four dimensions of Affinity

• The Power7/Power8 architecture is very different from the Power5/Power6 architecture. These differences tend to emphasize the effects of the four dimensions of Affinity

• Three dimensions of Affinity concern proximity localization: accessing physically closer content, and migrating content over a shorter distance, expends substantially fewer cycles

• Proximity localization underlies these dimensions of Power7/8 Affinity:

– NUCA (Non-Uniform Cache Access) affinity concerns the incidences of Local/Near/Far access to L2 and L3 cache content – relative to the CPUcore attending an LPAR’s virtual CPU workload

– NUMA (Non-Uniform Memory Access) affinity concerns the incidences of Local/Near/Far access to Main Memory content – relative to the CPUcore attending an LPAR’s virtual CPU workload

– Thread Migration affinity concerns migrating a given thread, along with its L2/L3 cache content, to the same CPU-core (Perfect: S0rd/S1rd), a different CPU-core on the same socket (Local: S3rd), a different socket in the same CEC (Near: S4rd), or a different CEC (Far: S5rd).


Power7/8: The four dimensions of Affinity

• vCPU time-slice fragmentation is the 4th dimension of Affinity that applies to Shared-Pool LPARs, but does not apply to (nor exist in) Dedicated CPU LPARs

• This dimension concerns the time-slice continuity of a CPUcore’s dedicated attention to one LPAR’s virtual CPU, versus the fragmentation of a CPUcore’s attention across the virtual CPUs of many LPARs throughout the PHYP 10ms dispatch cycle


© Copyright IBM Corporation 2014

The Four Dimensions of IBM POWER7 and IBM POWER8 Affinity

Thank you
Q&A

Earl Jew ([email protected]) 310-251-2907


Trademarks and notes

IBM Corporation 2014

IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Other company, product, and service names may be trademarks or service marks of others. References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

IBM and IBM Credit LLC do not, nor intend to, offer or provide accounting, tax or legal advice to clients. Clients should consult with their own financial, tax and legal advisors. Any tax or accounting treatment decisions made by or on behalf of the client are the sole responsibility of the customer.

IBM Global Financing offerings are provided through IBM Credit LLC in the United States, IBM Canada Ltd. in Canada, and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates and availability are based on a client’s credit rating, financing terms, offering type, equipment type and options, and may vary by country. Some offerings are not available in certain countries. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.