Architecting and Deploying IBM Power Enterprise Systems

  • Copyright IBM Corporation 2015

    Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.

    Chandan Chopra

    Architecting and Deploying IBM

    Power Enterprise Systems

    [email protected]

    Power Systems Solution Architect, IBM Systems Lab Services

  • Agenda

    Power Systems Portfolio

    Power8 Enterprise Systems Architecture

    Deployment Guidelines

    Solution Guidelines

    Q&A

  • Power Systems Portfolio & Power8 Enterprise Systems Architecture

  • Power Systems Portfolio


  • Power E870 and E880 Servers

    Increased performance and scale

    System Control Unit (midplane)

    Active Memory Mirroring

    8 PCIe3 adapter slots per node

    PCIe Gen3 I/O drawers

    Power Enterprise Pool

    PowerVM Enterprise included

    Enterprise RAS, even for a 1-node system

    24x7 warranty

  • Power8 Enterprise Family

    E850: 16 - 48 cores
    3.72 GHz (12-core), 3.35 GHz (10-core), 3.02 GHz (8-core)
    128 GB - 2 TB memory*, 7 - 51 PCI adapters

    E870: 8 - 80 cores, 1 - 2 nodes
    4.19 GHz (10-core), 4.02 GHz (8-core)
    256 GB - 8 TB memory, 8 - 96 PCI adapters

    E880: 8 - 192 cores, 1 - 4 nodes
    4.02 GHz (12-core), 4.35 GHz (8-core)
    256 GB - 16 TB memory, 8 - 192 PCI adapters

    * Statement of direction to 4 TB. Statements of direction represent plans only and are subject to change without notice.

  • Power8 Enterprise System Structure


  • Power8 System Control Unit


    Improves availability of all E870 and E880 configurations

  • System Node PCIe slots

    Eight low-profile (LP) adapter slots

    Used for PCIe adapters (Gen1, Gen2, or Gen3 LP adapters), or to connect to a PCIe Gen3 I/O Expansion Drawer

    Slots use a new low-profile blind swap cassette (BSC). The server comes fully populated with BSCs; there is no special feature code associated with the BSC.

  • PCIe Gen3

    Though these cards physically look the same and fit in the same slots, Gen3 cards/slots have up to 2X more bandwidth than Gen2 cards/slots, and up to 4X more bandwidth than Gen1 cards/slots.

    More virtualization, more consolidation (saving PCI slots and I/O drawers), more ports per adapter

    [Chart: peak and sustained bandwidth of Gen1, Gen2, and Gen3 x8 adapters, in GB/sec]

    A Gen1 x8 PCIe adapter has a theoretical maximum (peak) bandwidth of 4 GB/sec. A Gen2 x8 adapter has a peak bandwidth of 8 GB/sec. A Gen3 x8 adapter has a peak bandwidth of 16 GB/sec.
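    These peak figures follow from the per-lane transfer rate and link encoding of each PCIe generation. A minimal sketch of that arithmetic (standard PCIe math, not an IBM sizing tool):

    ```python
    # Sketch of the PCIe peak-bandwidth arithmetic behind the figures above.
    # Gen1/Gen2 use 8b/10b link encoding; Gen3 uses 128b/130b.
    GENERATIONS = {
        "Gen1": (2.5, 8 / 10),    # (transfer rate in GT/s per lane, encoding efficiency)
        "Gen2": (5.0, 8 / 10),
        "Gen3": (8.0, 128 / 130),
    }

    def peak_bandwidth_gbs(gen: str, lanes: int = 8) -> float:
        """Peak bidirectional bandwidth in GB/s for an xN slot."""
        rate_gt, efficiency = GENERATIONS[gen]
        per_lane_one_way = rate_gt * efficiency / 8   # GB/s in one direction
        return per_lane_one_way * lanes * 2           # both directions combined

    for gen in GENERATIONS:
        print(f"{gen} x8: ~{peak_bandwidth_gbs(gen):.0f} GB/sec peak")
    # Gen1 x8: ~4 GB/sec, Gen2 x8: ~8 GB/sec, Gen3 x8: ~16 GB/sec
    ```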

  • PCIe Gen3 I/O Expansion Drawer

    Feature #EMX0

    Two fan-out modules per drawer; each fan-out module has 6 PCIe Gen3 slots and attaches to one system node PCIe slot

    12 PCIe Gen3 slots per drawer

    4U drawer

    Full-high PCIe slots

    Hot-plug PCIe slots

    Modules are not hot-plug

  • Single Root I/O Virtualization (SR-IOV)

    Direct Ethernet virtualization

    Lower CPU overhead

    Better throughput

    QoS capable

    Up to 64* virtual functions (example: 4-port PCIe3 10Gb FCoE adapter)

    * Note: The number of virtual functions available per adapter or port is adapter dependent.

    SR-IOV mode supported slots by model:

    E850        All internal slots
    E870        All internal slots
    E880        All internal slots
    I/O Drawer  Slots C1 and C4 of the 6-slot fan-out module

    SR-IOV software support:

    AIX      AIX 6.1 TL9 SP5 and APAR IV68443 or later
             AIX 7.1 TL3 SP5 and APAR IV68444 or later
             AIX 7.1 TL2 SP7 or later (planned availability 3Q 2015)
             AIX 6.1 TL8 SP7 or later (planned 3Q 2015)

    IBM i    IBM i 7.1 TR10 or later
             IBM i 7.2 TR2 or later
             Both require either VIOS or the adapter in SR-IOV mode

    Red Hat  Red Hat Enterprise Linux 6.6 or later
             Red Hat Enterprise Linux 7.1, big endian, or later
             Red Hat Enterprise Linux 7.1, little endian, or later

    SUSE     SUSE Linux Enterprise Server 12 or later

    Ubuntu   Ubuntu 15.04 or later

    PowerVM  Firmware 830 (available June 2015) and HMC V8.830

  • EXP24S SFF Gen2-bay Drawer

    24 2.5-inch hot-swap SAS or SSD disks

    Ordered as 1, 2, or 4 sets of disks*

    Redundant power

    * Applies to orders for AIX, Linux, and VIOS; IBM i is ordered as 1 set

  • Enterprise System Deployment Guidelines

  • Hardware areas to discuss


    POWER Processors and levels of cache

    Does processor speed (frequency) matter?

    Multi-Core Multi-Node Systems

    How many nodes (books/enclosures)?

    Should I use more than the minimum?

    How many should I have installed vs. active, and why?

    Memory

    How much do I need? Should I fill the memory card slots?

    Memory access (Local, Near, and Far NUMA)

    I/O

    How many drawers on a loop?

    Do card slots matter?

    Adapter placement across drawers and nodes, for potentially higher availability and performance

  • Processor Designs

                    POWER6          POWER7           POWER7+          POWER8
    Technology      65nm            45nm             32nm             22nm
    Die size        341 mm2         567 mm2          567 mm2          675 mm2
    Transistors     790 M           1.2 B            ~2.4 B           ~5 B
    Cores           2               8                8                12
    Frequencies     4+ GHz          3 - 4+ GHz       3 - 4+ GHz       3 - 4.35 GHz
    L2 cache        4 MB / core     256 KB / core    256 KB / core    512 KB / core
    L3 cache        32 MB           32 MB            80 MB            96 MB
    L4 cache        -               -                -                128 MB
    DRAM channels   8 DDR2          16 DDR3          16 DDR3          32 DDR3/4
    I/O             Proprietary GX  Proprietary GX+  Proprietary GX+  Integrated PCIe
    Architecture    In order        Out of order     Out of order     Out of order
    Threads/core    2               4                4                8

  • Simultaneous Multithreading


  • Simultaneous Multithreading

    SMT1: the largest unit of execution work

    SMT2: a smaller unit of work, but provides a greater amount of execution work per cycle

    SMT4: a smaller unit of work, but provides a greater amount of execution work per cycle

    SMT8: the smallest unit of work, but provides the maximum amount of execution work per cycle

    Can dynamically change modes as required: SMT1 / SMT2 / SMT4 / SMT8

    [Chart: relative single-core throughput of POWER7 SMT1 vs. POWER8 in SMT1, SMT2, SMT4, and SMT8 modes]

  • Power Sizing: Throughput and Response time

    Higher SMT boosts capacity by allowing the core to continue executing instructions during cache-miss delays, and by using execution resources not used by other tasks; overall throughput increases.

    A task executes fastest when alone. The task dispatcher of a dedicated-processor partition spreads tasks first over available cores. As the task count increases, task speed decreases: tasks individually execute more slowly, but all are executing.

    Response time consideration: consider setting the partition limit to four threads (POWER7 mode) on POWER8 for a big improvement in task execution speed.

  • Power Sizing: rPerf and CPW

    [Chart: core-to-core performance, 8-core POWER6 550 (5.0 GHz) vs. POWER7 750 (3.3 GHz)]

    [Chart: socket-to-socket performance, 1-socket POWER6 570 (5.0 GHz) vs. POWER7 780 (3.86 GHz)]

    POWER7 and POWER8 provide significant gains in CPW and rPerf ratings: an impressive core-to-core increase in capacity and an outstanding socket-to-socket increase in capacity.

    CPW and rPerf are OLTP database workload ratings used to represent capacity.

  • Power Sizing: rPerf and CPW

    rPerf:

    E880   32-core 4.35 GHz     716.0
           64-core 4.35 GHz   1,432.5
          128-core 4.35 GHz   2,865.0
           48-core 4.02 GHz     976.4
           96-core 4.02 GHz   1,952.9
          192-core 4.02 GHz   3,905.8

    E870   32-core 4.02 GHz     674.5
           64-core 4.02 GHz   1,349.0
           40-core 4.19 GHz     856.0
           80-core 4.19 GHz   1,711.9

    CPW:

    E880   32-core 4.35 GHz     381,000
           64-core 4.35 GHz     755,000
          128-core 4.35 GHz   1,523,000
           48-core 4.02 GHz     518,000
           96-core 4.02 GHz   1,034,000
          192-core 4.02 GHz   2,069,000

    E870   32-core 4.02 GHz     359,000
           64-core 4.02 GHz     711,000
           40-core 4.19 GHz     460,000
           80-core 4.19 GHz     911,000

  • Power Sizing: rPerf and CPW

    What if I had a workload that needed 70,000 CPW?

    9117-MMD 12-core (4.2 GHz) = 90,000 CPW, and 90,000 / 12 cores = 7,500 CPW per core

    9119-MME 40-core (4.19 GHz) = 460,000 CPW, and 460,000 / 40 cores = 11,500 CPW per core

    In this example, CPW on POWER7 is 7,500 per core running SMT4; CPW on POWER8 is 11,500 per core running SMT8; and CPW on POWER8 running SMT4 is about 9,200 per core (460,000 x 0.8 / 40 = 9,200 CPW).

    Based on the CPW math:

    POWER7 (SMT4):  70,000 CPW / 7,500 per core   = 9.33 cores
    POWER8 (SMT8):  70,000 CPW / 11,500 per core  = 6.08 cores
    POWER8 (SMT4):  70,000 CPW / 9,200 per core   = 7.6 cores

    The POWER8 system might very well provide the CPW capacity. However, remember response time vs. throughput: you might get the transactions through, but at increased response times and longer batch runtimes.

    Use WLE to size.
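    The core-count arithmetic above is simple enough to capture in a few lines. A sketch (the per-core ratings and the ~20% SMT4 derating factor are taken straight from this example, not from a sizing tool; use WLE for real sizing):

    ```python
    # Sketch of the CPW sizing arithmetic from the example above.
    required_cpw = 70_000

    cpw_per_core = {
        "POWER7 (SMT4)": 90_000 / 12,         # 9117-MMD 12-core rating
        "POWER8 (SMT8)": 460_000 / 40,        # 9119-MME 40-core rating
        "POWER8 (SMT4)": 460_000 * 0.8 / 40,  # SMT8 rating derated ~20% for SMT4
    }

    for system, per_core in cpw_per_core.items():
        cores = required_cpw / per_core
        print(f"{system}: {cores:.2f} cores at {per_core:,.0f} CPW per core")
    # POWER7 (SMT4): 9.33 cores; POWER8 (SMT8): ~6.09; POWER8 (SMT4): ~7.61
    ```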

  • Best Practice #1

    Consider appropriate rPerf and CPW ratings when selecting a POWER8 system. Remember these are ratings of capacity, not speed.

    You can migrate a workload to a lower-frequency system with at least the same or better CPW and/or rPerf rating, but not when per-thread performance (speed) is critical.

    Start with about 3/4 of the POWER7 core count if speed is the requirement.

    Consider using SMT4 (POWER7 mode) when speed is a major concern on POWER8 systems.

    Consider dedicated or dedicated-donate processors for partitions that are business critical.

    Understand the number of cores' worth of capacity and performance you need on POWER8 compared to POWER7 or POWER6.

    Use performance sizing tools.

    If speed (response time and batch run time) is the priority for the workload, consider using higher-frequency POWER8 processors.

  • Power Sizing: Tools

    IBM Systems Energy Estimator (SEE)

    Estimates energy use for Power Systems

    Integration points: WLE, SPT, e-Config

    SEE drives 550 energy estimates per week

    IBM Systems Workload Estimator (WLE)

    Strategic sizing tool that recommends the best IBM system to satisfy overall workload and virtualization requirements

    - Power Systems, System x, PureFlex System - AIX, IBM i, Linux, Windows
    - PowerVM (partitions and VIOS), System x virtualization
    - Customizable storage (internal, SAN, SSD)

    Considers existing customer data for sizing upgrades, migrations, and consolidations

    Sizes new workloads via 300 WLE plug-ins

    Flexible interface for IBM/ISVs to build plug-ins

    Free strategic sizing tool for IBM Sales, ISVs/BPs, and customers

    http://www-947.ibm.com/systems/support/tools/estimator/

    http://www-947.ibm.com/systems/support/tools/estimator/energy/

  • Multi-core Multi-node Systems

    Multi-core: smaller die geometry, more transistors, more processor cores per chip, more threads per core, more functions on chip

    Use of SMP (Symmetric Multi-Processing) to scale across more cores

    Multi-core, multi-node Power Systems: 870, 880, 770, 780, 795

    NUMA (Non-Uniform Memory Access): a concept used to further drive up the performance capacity of a system

    What is Multi-Node: http://www-03.ibm.com/systems/resources/pwrsysperf_WhatIsMulticoreP7.pdf

  • Power 870, 880, 770, 780, and 795 Scale by Adding Nodes

    These systems differ from the non-Enterprise Power Systems: they scale additionally by adding enclosures/books/nodes

    Each additional node adds cores, memory, and I/O (bandwidth)

    Adding nodes can improve RAS characteristics

    On the 770/780, adding a second enclosure adds a second clock and FSP (the 795 always has a second clock and FSP in the system frame)

    The 870 and 880 always have dual clocks and dual FSPs in the system control unit

    Additional I/O multipathing in case of node failure/maintenance

    Adding nodes can improve performance

    Extra capacity is controlled with CUoD activation codes (memory and processor On Demand)

    If more cores and memory are installed than active, the hypervisor has more options for partition placement for best processor and memory affinity

  • 64-way 770 to POWER8 Upgrade for Best Performance

    A 64-way 770 needs four enclosures (nodes) and has memory in all four nodes

    A 48-way E880 needs only one node, with all its memory in that one node

    Should I use one system node? Would it be better to use two nodes?

    Additional nodes provide better RAS and give the hypervisor better placement options, which can provide better performance

  • NUMA - Non-Uniform Memory Access

    NUMA is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor, or memory shared between processors).

    Why would we design something like this?

    1. The key to the answer is bandwidth.

    2. The bandwidth available for accessing memory scales up linearly with the number of chips.

    3. More rapid access to local memory plus scalable bus bandwidth is largely what a NUMA-based system delivers.

  • Memory: POWER8 Processor Planar Memory Layout

    8 CDIMMs per SCM

    Each CDIMM adds memory bandwidth

    Each CDIMM adds L4 cache

  • Memory: POWER8 Memory CDIMM Rules

    8 CDIMM slots per SCM (2 feature codes per SCM)

    Minimum of one memory feature code of any size (four identical CDIMMs) per SCM

    Optional second memory feature (four identical CDIMMs) per SCM

    The 2nd memory feature code must be the same capacity as the 1st memory feature code (sketched below)
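    A minimal validity check sketching these plugging rules; the helper below is hypothetical, with features abstracted to the capacity, in GB, of their CDIMM quad:

    ```python
    # Hypothetical checker for the CDIMM rules above: one or two memory
    # features per SCM, and a second feature must match the first's capacity.
    def valid_scm_memory(features: list) -> bool:
        """features: capacities (GB) of the memory features on one SCM."""
        if not 1 <= len(features) <= 2:
            return False
        return len(set(features)) == 1   # 2nd feature same capacity as 1st

    print(valid_scm_memory([256]))        # True: one feature of any size
    print(valid_scm_memory([256, 256]))   # True: matching second feature
    print(valid_scm_memory([256, 512]))   # False: capacities must match
    ```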

  • Memory: E870/E880 Memory Bandwidth

    Up to 1 TB per socket (with two 512 GB features, i.e., eight 128 GB CDIMMs)

  • Best Practice #2 (Memory Configuration)

    Understand your LPAR definitions (processors and memory).

    Avoid having chips without DIMMs.

    Attempt to fill every chip's DIMM slots, activating memory as needed.

    The hypervisor tends to avoid activating cores without local memory.

    POWER8 Performance Best Practices http://www14.software.ibm.com/webapp/set2/sas/f/best/home.html

  • Affinity

    Affinity is a measurement of the proximity a thread has to a physical resource; performance is optimal when data crossing affinity domains is minimized.

    Examples of resources include L2/L3 cache, memory, core, chip, and system node.

    Cache affinity: threads in different domains need to communicate with each other, or cache needs to move with thread(s) migrating across domains.

    Memory affinity: threads need to access data held in a different memory bank, not associated with the same chip or node.

    Think about your biggest partition's cores and memory: could it fit on one node once the hypervisor's memory usage is added?

  • Power8 Cache

    L1: 96 KB per core

    L2: 512 KB per core (large working sets; single-thread sensitive; multi-threaded)

    L3: 96 MB per SCM (virtualization; shared data)

    L4: 16 MB off-chip on each memory card (write burst traffic; 55% lower-latency reads; mixed reads and writes)

  • Where does your application access data?

    Approximate cost to access data, in processor cycles:

    L1 cache          3
    L2 cache          12
    L3 cache          28
    L4 cache          180
    Local memory      320
    Remote memory     500
    Distant memory    800
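    To see why affinity matters, weight these latencies by where accesses actually land. A sketch (the cycle counts are the table's; the hit-rate mixes below are purely hypothetical):

    ```python
    # Weighted average access latency under two hypothetical hit-rate mixes.
    LATENCY_CYCLES = {"L1": 3, "L2": 12, "L3": 28, "L4": 180,
                      "local": 320, "remote": 500, "distant": 800}

    def avg_cycles(hit_rates: dict) -> float:
        """Average cycles per access for a given hit-rate distribution."""
        return sum(LATENCY_CYCLES[lvl] * rate for lvl, rate in hit_rates.items())

    good_affinity = {"L1": 0.90, "L2": 0.05, "L3": 0.03, "local": 0.02}
    poor_affinity = {"L1": 0.90, "L2": 0.05, "L3": 0.03, "distant": 0.02}

    print(avg_cycles(good_affinity))   # ~10.5 cycles per access
    print(avg_cycles(poor_affinity))   # ~20.1 cycles: 2% distant nearly doubles it
    ```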

  • Best Practice #3 (Partition Placement for Affinity)

    Define dedicated partitions first.

    Within the shared pool, define large partitions first.

    After the initial LPAR definitions, IPL the system.

    At a full-system (not partition) IPL, the hypervisor will allocate resources for best affinity on the given configuration.

    At a deep IPL (system power cycle), the hypervisor will use the previous partition allocation table to place partitions for best performance.

    Consider use of DPO and PowerVP.

    Help the hypervisor cleanly place partitions when they are first defined and activated.

  • Dynamic Platform Optimizer (DPO)

    Designed to reduce the complexity and time required for clients to manage and tune their systems

    DPO optimizes processor and memory affinity in virtualized, consolidated environments

    A process first runs to assess the level of affinity by partition

    The user then selects partitions for system optimization

    The system and workloads continue to run during the optimization process

    The system adjusts workload placement in the background to optimize performance, without requiring additional interaction

    Available at no additional charge for Power 770, 780, 795, 870, and 880 systems with firmware level 760 or later

    DPO operations can be automated using the HMC

  • Best Practice #4

    Think about the nodal resources as you define partition resources.

    [Diagram: four nodes, each containing cores and their local DIMMs]

    Ideally, partitions shouldn't span a chip or book/drawer boundary.

    Be aware of the number of cores per chip and chips per book/drawer.

  • Best Practice #5

    Don't under-commit entitlement.

    Every virtual processor has a preferred node ID: the set of cores close to where its memory resides. Too little entitlement results in too many VCPUs contending for a node's cores, which reduces system capacity when it is needed most. Set VCPUs to entitlement rounded up (see the sketch below), and don't over-commit the shared-processor pool with virtual processors.
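    A literal reading of that rounding rule, as a sketch:

    ```python
    import math

    # Recommended virtual processors: entitlement rounded up, so VPs stay
    # close to entitlement and don't over-commit the shared pool.
    def recommended_vps(entitlement: float) -> int:
        return math.ceil(entitlement)

    print(recommended_vps(3.2))   # 4 VPs for 3.2 cores of entitlement
    print(recommended_vps(6.0))   # 6 VPs: no extra VPs beyond entitlement
    ```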

  • Best Practice #6

    Update firmware to the latest level.

    The hypervisor has had numerous performance enhancements:

    - Favor performance over energy savings
    - Home-node re-dispatch
    - Dynamic Platform Optimizer added
    - New PowerVP licensed program product

    [Diagram: partition X, Y, and Z memory and processors placed by the hypervisor, with free LMBs]

  • PCIe Adapter Placement Rules and Priorities

    Rules for E870 and E880:

    All slots are x16, with buses direct from the processor modules, and must be used to install high-performance PCIe adapters

    The adapter priority for these slots is the PCIe3 Optical Cable Adapter (FC EJ07), then SAS adapters (FC EJ0M, EJ11), followed by any other high-performance low-profile adapter

    Refer to the slot priority table for all supported adapters for optimal placement: https://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8eab/p8eab_87x_88x_slot_details.htm

    All slots support Single Root I/O Virtualization (SR-IOV) capable adapters

    Verify whether the adapter is supported for your system. I/O placement can be planned and validated using the System Planning Tool (SPT)

  • PCIe I/O Drawer per E870/E880 Node

    2x more drawers, plus more flexibility

    0, 1, 2, 3, or 4 PCIe Gen3 I/O drawers in 2015 (max 8 fan-out modules per node)

    Requires firmware level 8.3, available June 2015

  • PCIe I/O Drawer per E870/E880 Node

    0, ½, 1, 1½, 2, 2½, 3, 3½, or 4 PCIe Gen3 I/O drawers in 2015 (max 8 fan-out modules per node)

    Requires firmware level 8.3, available June 2015

    For even more flexibility, you can choose half drawers; thus any of the drawers could have a single 6-slot fan-out module. The slot arithmetic is sketched below.
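    A sketch of the slot counting implied here, assuming each fan-out module occupies one of a node's 8 PCIe slots and provides 6 drawer slots (the 8 - 192 adapter range on the family slide falls out of this):

    ```python
    # Slot counting for E870/E880 nodes with PCIe Gen3 I/O drawers.
    SLOTS_PER_NODE = 8      # internal low-profile PCIe slots per system node
    SLOTS_PER_FANOUT = 6    # slots in each I/O drawer fan-out module

    def total_pcie_slots(nodes: int, fanouts_per_node: int) -> int:
        assert 0 <= fanouts_per_node <= SLOTS_PER_NODE   # max 8 fan-outs/node
        internal_free = SLOTS_PER_NODE - fanouts_per_node
        return nodes * (internal_free + fanouts_per_node * SLOTS_PER_FANOUT)

    print(total_pcie_slots(1, 0))   # 8: one node, no drawers
    print(total_pcie_slots(4, 8))   # 192: four nodes, 4 full drawers each
    ```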

  • Supported PCIe I/O Drawer Cabling Examples

    Notes: With two system nodes, it is a good practice (but not required) to attach the two fan-out modules in one I/O drawer to different system nodes. Combined with placing redundant PCIe adapters in different fan-out modules, system availability is enhanced.

    The PCIe I/O drawer can be in the same rack as the system nodes or a different one. If large numbers of I/O cables are attached to PCIe adapters, it is convenient to have the I/O drawer in a different rack for ease of cable management.

    The system control unit is not shown, for visual simplicity.

    Note: each single blue/green/etc. line in the figure depicts two physical AOC cables.

  • Supported PCIe I/O Drawer Cabling More Examples


  • System Planning Tool


    www.ibm.com/systems/support/tools/systemplanningtool/

  • Enterprise System Solution Guidelines

    SMT Guidance
    Active Memory Mirroring Guidance
    SRIOV Guidance
    Power Saving Guidance
    Enterprise Pools

  • Review: Power6 vs Power7/Power8 SMT Utilization


  • Power6 vs Power7/Power8 Dispatch



  • Migrations: Dispatching and SMT Guidance

    When migrating from POWER7 to POWER8, expect the following:

    Dispatch behavior remains the same

    Physical CPU consumption will look similar, based on VPs

    When migrating from POWER5/POWER6 to POWER8, expect the following:

    Dispatch behavior will be different (scaled and raw)

    Physical CPU consumption will look higher on POWER8

    Too low a VP count can limit the dynamic scalability of the workload

    Too high a VP count can result in:

    Higher physical CPU usage for heavily loaded partitions (raw throughput mode, the default)

    VP folding for less loaded partitions

  • Power8 SMT Default: Why SMT4?

    A partition that runs AIX 6.1 on POWER8 will only support POWER6, POWER6+, or POWER7 mode, which limits the partition to SMT4

    A partition that runs AIX 7.1 on POWER8 will support POWER6, POWER6+, POWER7, or POWER8 mode, which scales partitions to SMT8

    AIX chose to keep SMT4 as the default on POWER8:

    Most workloads will be fine with SMT4 or SMT8

    Applications with scalability issues will not be able to leverage SMT8

    Many workloads do not run at the 80% utilization levels needed to make use of SMT8 threads

    SMT4 is the best of all worlds for now, but there are now more options to exploit SMT


  • Power8 SMT: Should I use SMT8?

    Any PoC or benchmark where we are going to drive to 80% utilization and want to use all the capacity: OLTP DB, large WAS servers, etc. will see benefit

    Environments where you have a fair idea of SMT behavior: if utilization is high and increasing SMT threads has improved performance

    It is easy and free to test SMT4 and SMT8 modes; no reboot is required

    For new applications, review the software stack:

    If the application space is well known on AIX, SMT8 should not be a problem

    If the application is new to AIX, it should be tested for scaling issues

  • Scaled Throughput Guidance


  • Active Memory Mirroring - Hypervisor Mirroring Standard on E870 and E880 Systems

    Eliminates platform outages due to uncorrectable errors in memory

    Maintains two identical copies of the system hypervisor in memory at all times

    Both copies are simultaneously updated with any changes

    In the event of a memory failure on the primary copy, the second copy is automatically invoked and a notification is sent to IBM via the Electronic Service Agent (ESA)

  • AMM Guidance

    Hypervisor memory mirroring defaults to enabled; be aware of this when sizing system memory. Plan on the hypervisor taking about 8% of each node's memory, and about 16% with hypervisor mirroring enabled (a sizing sketch follows below).

    Remember:

    Hypervisor data that is mirrored:

    Hardware page tables (HPTs), managed by the hypervisor on behalf of partitions to track the state of the memory pages assigned to each partition

    Translation control entries (TCEs), managed by the hypervisor on behalf of partitions to communicate partition I/O buffers to I/O devices

    Hypervisor code (the instructions that make up the hypervisor kernel)

    Memory used by the hypervisor to maintain partition configuration, I/O states, virtual I/O information, partition state, and so on

    Hypervisor data that is not mirrored:

    Memory used to hold the contents of a platform dump while waiting for offload to the HMC/OS

    Partition data that is not mirrored:

    Desired memory configured for individual partitions is not mirrored

    Switch off the I/O Adapter Enhanced Capacity feature unless you are running Linux with dedicated physical adapters. I/O Adapter Enhanced Capacity is reserved memory; with hypervisor memory mirroring enabled, it is doubled, and reserved memory can grow excessively high on Power8 Enterprise systems.
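    A rough planning sketch using those ~8%/~16% figures (a rule of thumb from this page, not a configurator calculation):

    ```python
    # Estimate memory left for partitions after hypervisor overhead.
    def usable_node_memory_gb(node_gb: float, mirroring: bool = True) -> float:
        overhead = 0.16 if mirroring else 0.08   # planning estimates above
        return node_gb * (1 - overhead)

    print(usable_node_memory_gb(1024))          # ~860 GB with AMM enabled
    print(usable_node_memory_gb(1024, False))   # ~942 GB without mirroring
    ```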

  • SRIOV Guidance

    Link Aggregation (LACP) will not function properly with multiple logical ports using the same physical port.

    Etherchannel is not recommended for an SR-IOV configuration: SR-IOV logical ports may go down while the physical link remains up, and the switch does not recognize a logical port going down, so it will continue to send traffic on the physical port.

    Use Link Aggregation (LACP) with one logical port per physical port; this provides greater bandwidth than a single link, with failover.

    Best practice: assign 100% capacity to each SR-IOV logical port in the Link Aggregation Group, to prevent accidental assignment of another SR-IOV logical port to the same physical port.

  • SRIOV Guidance (LPM Options with SRIOV)

    Multiple-VIOS configuration

    Use current virtual Ethernet support, with logical ports as the Shared Ethernet Adapter (SEA) physical connections to the network

    Reduced adapter and port requirements

    Does not receive the performance benefits provided by SR-IOV direct access

  • SRIOV Guidance (LPM Options with SRIOV)

    Active-backup configuration

    Configure the SR-IOV logical port as the active connection and a virtual Ethernet adapter as the backup

    Prior to migration, use a dynamic LPAR operation to remove the SR-IOV logical port; the virtual Ethernet becomes the active connection

    Migrate the partition

    On the target system, configure an SR-IOV logical port as the active connection

  • VIOS, AIX, Linux and HMC Guidance

    The minimum level of AIX 6.1 or 7.1 supported on the E870 and E880 depends on whether the partition has 100% virtualized (via VIOS) resources or not

    The minimum level of VIOS:

    VIOS 2.2.3.4 with ifix IV63331

    VIOS 2.2.3.51 with APARs IV68443 and IV68444

    Fix Level Recommendation Tool (FLRT): https://www14.software.ibm.com/webapp/set2/flrt/home

    For LPM fix recommendations, use the FLRT LPM report

  • Power Saving and Favor Performance

    Power Saver Mode: a predetermined reduction in processor frequency

    Dynamic Power Saver Mode: processor frequency varies based on processor usage; frequency can be increased (favor performance) or reduced (energy saving)

    If performance is favored over energy saving, consider enabling Favor Performance mode in ASMI

  • Power Enterprise Pools

    Flexibility, ease of operations, and price performance

    Enhanced availability and cloud characteristics

    For the POWER7+ 770, POWER7+ 780, Power 795, Power E870, and Power E880

  • Power Enterprise Pools

    New mobile activations for both processor and memory

    Mobile activations can be used for systems within the same pool:

    One pool type for Power E880, POWER7+ 780, and Power 795 systems

    One pool type for Power E870 and POWER7+ 770 systems

    Activations can be moved at any time by the user, without contacting IBM; this is done using the HMC

    Movement of activations is instant, dynamic, and non-disruptive

    Many Power Systems software entitlements are also mobile

    Power Enterprise Pools enable you to move processor and memory activations within a defined pool of systems, at your convenience.

  • Power Enterprise Pools Example

    Monday 8:00 am

    Sys A, 64-core E880, 4.35 GHz:  10 static, 40 mobile, 14 dark
    Sys B, 96-core 795, 3.7 GHz:    30 static, 40 mobile, 26 dark
    Sys C, 96-core 780, 3.7 GHz:    16 static, 20 mobile, 60 dark
    Sys D, 128-core 795, 4.0 GHz:   40 static, 60 mobile, 28 dark

    Pool totals: 96 static, 160 mobile, 128 dark

  • Power Enterprise Pools Example

    Monday 8:01 am

    Sys A, 64-core E880, 4.35 GHz:  10 static, 0 mobile, 54 dark
    Sys B, 96-core 795, 3.7 GHz:    30 static, 55 mobile, 11 dark
    Sys C, 96-core 780, 3.7 GHz:    16 static, 45 mobile, 35 dark
    Sys D, 128-core 795, 4.0 GHz:   40 static, 60 mobile, 28 dark

    Pool totals: 96 static, 160 mobile, 128 dark
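    The bookkeeping in these two snapshots is easy to model. A sketch (the helper is illustrative and the activation counts are this example's; note the pool totals never change):

    ```python
    # Mobile activation bookkeeping for the Monday 8:00 -> 8:01 am example.
    pool = {  # system: [static, mobile, dark] core activations
        "Sys A": [10, 40, 14], "Sys B": [30, 40, 26],
        "Sys C": [16, 20, 60], "Sys D": [40, 60, 28],
    }

    def move_mobile(src: str, dst: str, cores: int) -> None:
        """Deactivate mobile cores on src (they go dark), activate on dst."""
        assert pool[src][1] >= cores and pool[dst][2] >= cores
        pool[src][1] -= cores; pool[src][2] += cores
        pool[dst][1] += cores; pool[dst][2] -= cores

    move_mobile("Sys A", "Sys B", 15)
    move_mobile("Sys A", "Sys C", 25)
    print(pool)  # Sys A: 0 mobile/54 dark; Sys B: 55 mobile; Sys C: 45 mobile
    print(sum(s[1] for s in pool.values()))  # still 160 mobile in the pool
    ```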

  • Power Enterprise Pools Guidance

    PLAN      Review the Power Enterprise Pools offering and plan the implementation
    DEFINE    Define participating systems by serial number within a pool
    SIGN      Sign the Power Enterprise Pools contract and addendum
    REQUEST   Submit the addendum to IBM and request a pool ID
    ORDER     Order mobile enablement, processor, and memory activations
    INSTALL   Install new firmware for participating systems and the HMC
    DOWNLOAD  Download the configuration file to the HMC from the IBM web site
    USE       Assign activations to systems

  • Summary (1 of 2)

    Identify the Power Enterprise systems best suited to your needs

    Perform sizing based on throughput and response time considerations

    For response-time-critical workloads, higher-frequency POWER8 processors will give more benefit

    Understand SMT behavior on POWER8 systems, evaluate it, and apply it accordingly

    For maximum memory bandwidth, populate all memory DIMM slots

    For optimum cache and memory affinity, plan for partition placement within processor nodes

    Additional drawers may help you get better performance; plan for scalability and performance

    Apply the latest firmware level, and review the minimum supported OS, VIOS, and HMC levels for using the various capabilities of POWER8 Enterprise systems

    Plan I/O adapter placement based on slot priorities

  • Summary (2 of 2)

    AMM can be leveraged for higher reliability on Enterprise systems; disable I/O Adapter Enhanced Capacity to avoid excessive hypervisor memory usage

    SRIOV can be considered based on solution requirements

    Leverage tools like SPT, WLE, and SEE for planning

    DPO and PowerVP can help in managing partition affinity on Enterprise systems

    Power Enterprise Pools help provide additional availability across a pool of systems

  • PowerCare Service

    Select one PowerCare service option with each Power E870 or E880.

    A PowerCare Services engagement offer is included, at no additional charge, with the purchase of each Power E870 or E880 system.

    Power E870 engagement options include:
    - Enterprise Systems Optimization
    - Power Systems Availability
    - Cloud Enablement
    - Power Integrated Facility for Linux (IFL)

    Power E880 PowerCare engagement options include:
    - Enterprise Systems Optimization
    - Power Systems Availability
    - Cloud Enablement
    - Security
    - Power Integrated Facility for Linux (IFL)
    - Tivoli Monitoring Enablement
    - Mobile Enablement with Worklight
    - Private Technical Training

    For more information contact IBM Lab Services [email protected]

  • Thank You

  • References


    Power systems best practices

    http://www14.software.ibm.com/webapp/set2/sas/f/best/home.html

    E870, E880 Redbook

    https://www.redbooks.ibm.com/redbooks.nsf/RedbookAbstracts/redp5137.html?Open

    IBM System Planning Tool

    www.ibm.com/systems/support/tools/systemplanningtool/

    Fix Level Recommendation Tool

    https://www14.software.ibm.com/webapp/set2/flrt/home

    PCIe Slot priority table for all supported adapters for optimal placement

    https://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8eab/p8eab_87x_88x_slot_details.htm

    Dynamic Platform Optimizer

    https://www-01.ibm.com/support/knowledgecenter/POWER7/p7hat/iphatdpoovw.htm?cp=POWER7%2F1-8-2-5-3-5-0

  • References


    AIX Performance website https://www.ibm.com/developerworks/wikis/display/WikiPtype/Performance+Monitoring+Documentation

    https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/rperff

    System Performance Reports http://www.ibm.com/systems/power/hardware/reports/system_perf.html

    IBM Benchmark Index http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html

    Benchmarking blog https://www.ibm.com/developerworks/mydeveloperworks/blogs/benchmarking

    Workload Estimator http://www.ibm.com/systems/support/tools/estimator/

    Americas Lab Services http://www-03.ibm.com/systems/services/labservices/