Webinar OpenVMS i2ServerPerformance


  • 8/8/2019 Webinar OpenVMS i2ServerPerformance

    © 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

    Rafiq Ahamed K, Technical Expert, OpenVMS

    3rd December 2010

    OpenVMS V8.4 Performance on New i2 Server


    Agenda

    New i2 Server Quick Introduction

    Performance of new i2 Servers

    V8.4 Performance Features and Improvements

    OpenVMS Guest Performance Summary

    Q & A


    The performance results shared in this session are from an engineering test environment; they do not represent any specific customer workload. Your mileage may vary.


    New i2 Server Quick Introduction


    Platform Evolution

    All CPUs share the same memory controller (MC); possible memory controller bottleneck

    Added more front-side buses (FSB) for scalability

    Low-latency, high-bandwidth, linearly scalable QPI fabric

    Source: Intel Corporation


    The BL8x0c i2 Product Family

    BL860c i2 BL870c i2 BL890c i2


    Introducing Next Generation i2 Servers

    Scale more simply: scale up, out and within

    Simplified scalability with the industry's first 2-, 4- and 8-socket OpenVMS server blades; now even small systems use NUMA

    Combines multiple blades into a single, scalable system

    Processor: 2s/8c x 2 = 4s/16c; 4s/16c x 2 = 8s/32c

    BL860c i2, BL870c i2 and BL890c i2 are NUMA-aware servers


    Introducing New Integrity Server Blades: BL860c i2, BL870c i2 and BL890c i2

                             BL860c i2                     BL870c i2                     BL890c i2
    Processor                Intel Itanium processor 9300 series (quad-core and dual-core*), all models
    Processors/Cores         Up to 2 processors/8 cores    Up to 4 processors/16 cores   Up to 8 processors/32 cores
                             Up to 2 processors/4 cores*
    Chipset                  Intel Boxboro chipset (I/O Hub), all models
    Memory                   Industry-standard DDR3 technology, all models
                             24 DIMM slots                 48 DIMM slots                 96 DIMM slots
                             Max: 192 GB (w/ 8 GB)         Max: 384 GB (w/ 8 GB)         Max: 768 GB (w/ 8 GB)
                             Max: 384 GB (w/ 16 GB*)       Max: 768 GB (w/ 16 GB*)       Max: 1.5 TB (w/ 16 GB*)
    Internal Storage         2 hot-plug SFF SAS HDDs       4 hot-plug SFF SAS HDDs       8 hot-plug SFF SAS HDDs
                             HW RAID 0/1 controller (standard), all models
    Networking (integrated)  4 x 10 GbE (Flex-10) NICs     8 x 10 GbE (Flex-10) NICs     16 x 10 GbE (Flex-10) NICs
    Mezzanine Slots          3 PCIe slots                  6 PCIe slots                  12 PCIe slots
    Management               Integrity Integrated Lights-Out 3 (iLO 3) Advanced Pack (standard), all models
    Density                  8 server blades in c7000      4 server blades in c7000      2 server blades in c7000
                             4 server blades in c3000      2 server blades in c3000      1 server blade in c3000

    * Future support
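The memory maximums quoted for the three blades follow from the number of DIMM slots multiplied by the DIMM size. A minimal Python sketch of that arithmetic is shown here; the slot counts and DIMM sizes come from the slide, while the dictionary and function names are illustrative only.

```python
# DIMM slot counts per blade model, as listed on the slide.
DIMM_SLOTS = {"BL860c i2": 24, "BL870c i2": 48, "BL890c i2": 96}


def max_memory_gb(model: str, dimm_gb: int) -> int:
    """Return the maximum memory in GB when every slot holds a DIMM of dimm_gb."""
    return DIMM_SLOTS[model] * dimm_gb


if __name__ == "__main__":
    for model in DIMM_SLOTS:
        # 16 GB DIMMs are marked as future support on the slide.
        for dimm_gb in (8, 16):
            total = max_memory_gb(model, dimm_gb)
            print(f"{model}: {total} GB with {dimm_gb} GB DIMMs")
```

With 96 slots and 16 GB DIMMs the result is 1536 GB, which is the 1.5 TB figure quoted for the BL890c i2.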


    BL8x0c i2 Blade Architecture

    Intra-blade: 19.2 GB/s; inter-blade: 57.6 GB/s

    Memory: 28.8 GB/s peak per processor module

    QPI, IOH to processors: 38.4 GB/s


    Performance Features of i2 Blades

    Memory
    Dual integrated memory controllers (IMC) with 4 SMI channels; peak memory bandwidth up to 34 GB/s (6x)
    Capability to support up to 1 TB of memory per IMC
    1 MB directory cache per IMC
    Directory-based cache coherency reduces snoop traffic and contention
    DDR3: higher throughput (800 MT/s), lower power, faster response time, increased capacity per DIMM (16 GB)

    SMI
    Intel Scalable Memory Interconnect (Intel SMI) connects to the Intel 7500 Scalable Memory Buffer (SMB) to support larger physical memory with DDR3 RDIMMs
    SMB supports different sizes and types of DIMM

    Processor
    Enhanced thread-level parallelism (TLP) [8T/P]
    Instruction-level parallelism (ILP) keeps threads from stalling the pipeline
    Data TLB support for 8K and 16K pages
    Intel Turbo Boost Technology: performance on demand
    Intel VT-i2 is introduced

    QPI
    New Intel QuickPath Interconnect technology replaces the front-side bus with a point-to-point interconnect
    4 full-width Intel QuickPath Interconnect links and 2 half-width links per processor
    Peak processor-to-processor and processor-to-I/O communications up to 96 GB/s (9x)
    Glueless system designs up to eight sockets, without the FSB limitations

    IO
    PCIe Gen 2 supports 5 GB/sec
    Flex-10 dual-ported 10 GbE NICs help with bandwidth partitioning

    [Block diagram: two Itanium 9300 (Tukwila-MC) processors linked by QPI, each attached through memory buffers (MB) to DDR3 DIMMs; an Intel 7500 IOH (Boxboro-MC) and an Intel ICH10 connect PCIe Gen2 and Gen1 devices]


    NUMA in i2 Servers

    [Figure: Socket 0 to Socket 3 on Blade 1 and Blade 2, connected by the Scalable Blade Link]

    Each socket/processor has its own (local) memory

    Each processor can access another processor's (remote) memory

    In a one-blade or two-blade server, every access is at most one hop
    Example: BL860c i2, BL870c i2

    In a four-blade server, the maximum is two hops
    Example: BL890c i2

    The new i2 servers come with 5 different memory configurations, helping customers profile their application needs accordingly

    Details are in the white paper "Why Scalable Blades - HP Integrity Server Blades": http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA1-1295ENW.pdf
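The "at most one hop" and "maximum two hops" statements describe the longest shortest path between any two sockets. The Python sketch below shows one way to compute that figure from a link map with a breadth-first search. The slides do not give the actual QPI wiring, so both topologies here are assumptions: the four-socket case is modelled as fully connected, and the eight-socket case uses a made-up layout (each socket linked to its neighbours one and two positions away on a ring) that is not the BL890c i2 design.

```python
from collections import deque


def max_hops(links: dict[int, set[int]]) -> int:
    """Return the largest shortest-path hop count between any two sockets."""
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in links[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst


def fully_connected(n: int) -> dict[int, set[int]]:
    """Assumed topology: every socket has a direct link to every other socket."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring_with_skips(n: int) -> dict[int, set[int]]:
    """Hypothetical topology: links to sockets 1 and 2 positions away on a ring."""
    return {i: {(i + d) % n for d in (1, 2, -1, -2)} for i in range(n)}


if __name__ == "__main__":
    print("2 sockets:", max_hops(fully_connected(2)), "hop(s)")
    print("4 sockets:", max_hops(fully_connected(4)), "hop(s)")
    print("8 sockets (hypothetical wiring):", max_hops(ring_with_skips(8)), "hop(s)")
```

The point of the sketch is that hop count depends on the link map rather than on socket count alone, which is why remote memory on a larger system can cost an extra hop.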

    Key Characteristics: Intel Itanium processor 9100 vs. Intel Itanium processor 9300

    Cores
      9100: 2
      9300: 4

    Total On-Die Cache
      9100: 27.5 MB
      9300: 30 MB

    Software Threads per Core
      9100: 2
      9300: 2 (w/ enhanced thread management)

    System Interconnect (bandwidth per processor for a 2-socket system)
      9100: Front Side Bus; peak bandwidth per processor: 5 GB/s
      9300: Intel QuickPath Interconnect Technology; peak bandwidth: 48 GB/s (up to 9x improvement); enhanced RAS; enables common IOHs with next-generation Intel Xeon processors

    Memory Interconnect (bandwidth per processor for a 2-socket system)
      9100: Front Side Bus; peak bandwidth per processor: 5 GB/s
      9300: Dual integrated memory controllers; peak bandwidth: 34 GB/s (up to 6x improvement)

    Memory Capacity (4-socket system)
      9100: 128-384 GB
      9300: 1 TB (using 16 GB RDIMMs), up to 8x improvement

    Partitioning and Virtualization
      9100: Intel VT-i
      9300: Intel VT-i2

    Energy Efficiency
      9100: Demand Based Switching (DBS)
      9300: Enhanced DBS (voltage modulation in addition to frequency); Intel Turbo Boost Technology; advanced CPU and memory thermal management

    SMP Scalability
      9100: 64-bit virtual addressability; 50-bit physical addressability; home snoop coherency
      9300: 64-bit virtual addressability; 50-bit physical addressability; directory coherency for better performance in large SMP configurations; up to 8-socket glueless systems (higher scalability with OEM chipsets)

    Source: Intel Corporation
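The "up to Nx" figures in the comparison above can be checked against the quoted peak numbers. The Python sketch below computes the ratios; the input values come from the slide, the constant and function names are illustrative, and treating 1 TB as 1024 GB and using the low end of the 128-384 GB range are assumptions made for this calculation.

```python
def improvement(new: float, old: float) -> float:
    """Return how many times larger new is than old."""
    return new / old


# Peak figures quoted on the slide (GB/s per processor, 2-socket system).
FSB_GBPS = 5.0
QPI_GBPS = 48.0
IMC_GBPS = 34.0

# Memory capacity for a 4-socket system, in GB.
# Assumptions: low end of the 9100 range, and 1 TB counted as 1024 GB.
OLD_CAPACITY_GB = 128.0
NEW_CAPACITY_GB = 1024.0

if __name__ == "__main__":
    print(f"System interconnect: {improvement(QPI_GBPS, FSB_GBPS):.1f}x")
    print(f"Memory interconnect: {improvement(IMC_GBPS, FSB_GBPS):.1f}x")
    print(f"Memory capacity:     {improvement(NEW_CAPACITY_GB, OLD_CAPACITY_GB):.1f}x")
```

The ratios come out at 9.6, 6.8 and 8.0, so the slide's 9x and 6x figures appear to be rounded down.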


    Performance of new i2 Servers


    Positioning the New vs. Current Servers

    [Figure: OpenVMS server positioning by core count. Montvale-based Integrity servers: BL860c, rx2660, rx3600, rx6600, rx7640, BL870c, rx8640, Superdome. Integrity servers based on Blade Scale Architecture: rx2800 i2, BL860c i2, BL870c i2, BL890c i2, Superdome 2 (8s), Superdome 2 (32s). Legend: Primary, Secondary.]

    Up to 2x performance improvement per socket


    Performance Highlights

    OpenVMS running on BL8x0c i2 servers

    BL8x0c i2 servers architected for high performance

    Architecture provides increased number and faster cores/socket

    Superior memory and interconnect technology

    Memory intensive applications benefit from low latency and high bandwidth architecture

    Higher IO bandwidth and throughput resulting from new IO architecture

    More headroom for CPU, memory and IO intensive workloads with improved response time

    Up to 2x performance improvement with i2 servers running OpenVMS

    Our tests have shown up to 2x improvement with Java, some database and web server applications

    Oracle has shown up to 3x improvement


    Apache Performance

    Apache Bench Tests on OpenVMS 8.4

    [Charts comparing BL860c (1.59 GHz/9.0 MB) with BL860c-i2 (1.73 GHz/6.0 MB), annotated "2x Improvement": Time Taken (sec, less is better); Throughput (Req/sec, more is better); Bandwidth (KB/sec, more is better)]

    Configuration Details

    The tests were run on OpenVMS 8.4 with Apache 2.1-1 with ECO2 and Apache Bench 2.0.40-dev

    Time Taken should be less; Req/sec and KB/sec should be more

    BL860c-i2 delivered 2x the performance of BL860c


    Java Workload Tests

    Native Java Tests on OpenVMS 8.4 (more is better)

    [Two charts titled "Java Workload": operation rate versus number of threads for rx6600 (1.59 GHz/12.0 MB) and BL870c i2 (1.60 GHz/5.0 MB), annotated "2x Improvement"]

    Java workloads scale up better on i2 servers

    Java workloads are highly CPU and memory intensive


    Oracle 10gR2 on new i2 Server

    [Chart: TPM versus number of users (16, 32, 48) for rx7640 (1.60 GHz/12.0 MB) and BL890c i2 (1.60 GHz/6.0 MB), annotated "3x Improvement"]

    Oracle Swingbench tests were run with the same tuning configuration

    rx7640 and BL890c i2 are NUMA-based systems; mostly NUMA, RAD enabled and Hyper-Threading disabled, 6 RAID 5 EVA8100 volumes

    Oracle was run in Shared Server mode

    BL890c i2 server consistently shows 3x improvement for the same number of users


    Oracle Tests Resource Usage

    [Chart: Oracle TPM for same CPU usage, rx7640 (1.60 GHz/12.0 MB) versus BL890c i2 (1.60 GHz/6.0 MB), annotated "3x Increase"]

    BL890c i2 is able to drive 3x improvement for the same CPU usage


    Integer Tests

    CPU Ratings

    These numbers are per processor/socket

    As the frequency increases, we see an increase in rating

    CPU-bound applications should benefit (database queries), specifically integer-computation-bound applications

    [Chart: integer ratings per processor (more is better) for 9300 - BL8x0c-i2 (1.73 GHz/6.0 MB), 9300 - BL8x0c-i2 (1.60 GHz/6.0 MB), 9300 - BL8x0c-i2 (1.33 GHz/4.0 MB), 9000 - BL860c (1.59 GHz/9.0 MB) and 9100 - rx7640 (1.60 GHz/12.0 MB)]

    1.73 GHz and 1.6 GHz 9300 series processors show 2-2.3x performance improvement


    Floating Point Computation Tests

    CPU Ratings

    These numbers are per Core (within a processor/socket)

    Intel Itanium 9300-series processors have a new high-precision floating-point architecture

    Fast response to complex operations; scientific, automation and robotic applications should benefit

    [Chart: floating-point (FP) ratings per processor (more is better) for 9300 - BL8x0c-i2 (1.73 GHz/6.0 MB), 9300 - BL8x0c-i2 (1.60 GHz/6.0 MB), 9300 - BL8x0c-i2 (1.33 GHz/4.0 MB), 9000 - BL860c (1.59 GHz/9.0 MB) and 9100 - rx7640 (1.60 GHz/12.0 MB)]

    1.73 GHz and 1.6 GHz 9300 series processors show 2.1-2.3x performance improvement


    V8.4 Performance Preview and Features


    HP Delivers Continuous Performance Improvements

    OpenVMS 8.4 delivers 10-15% improvement

    [Chart: OpenVMS 8.3, OpenVMS 8.3-1H1, OpenVMS 8.4]

    Significant performance enhancements incorporated in each release

    8.4 Performance Enhancements

    RAD support for IA64
    Shadowing features
    Compiler changes
    Faster cache flushing
    DLM enhancements
    Exception handling changes
    SMP enhancements
    RTL changes
    RMS MBC > 127


    V8.4 Performance Features..

    Resource Affinity Domain (RAD) support for IA64

    Packet Processing Engine (PPE) Support for TCP/IP

    Automatic Dynamic Processor Resilience (DPR)

    Compression support for BACKUP

    RMS SET MBC count support for 255 blocks

    Asynchronous Virtual IO (AVIO) support for Guest OpenVMS running on HPVM
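As a rough illustration of the RMS multiblock count (MBC) item in this list, the Python sketch below compares the previous limit of 127 blocks with the new limit of 255, using the 512-byte OpenVMS disk block. It is a simplified model that assumes one transfer moves exactly MBC blocks and ignores read-ahead, caching and multibuffering; the function names and the 10,000-block file size are illustrative.

```python
import math

BLOCK_BYTES = 512  # size of one OpenVMS disk block


def buffer_bytes(mbc: int) -> int:
    """Return the size in bytes of one transfer of mbc blocks."""
    return mbc * BLOCK_BYTES


def transfers_needed(file_blocks: int, mbc: int) -> int:
    """Return how many transfers of mbc blocks cover a file of file_blocks."""
    return math.ceil(file_blocks / mbc)


if __name__ == "__main__":
    file_blocks = 10_000  # hypothetical file size, for illustration only
    for mbc in (127, 255):
        print(
            f"MBC={mbc}: {buffer_bytes(mbc)} bytes per transfer, "
            f"{transfers_needed(file_blocks, mbc)} transfers"
        )
```

Under these assumptions, doubling the multiblock count roughly halves the number of transfers needed for sequential access to the same file.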

    OpenVMS V8.4


    V8.4 Performance Enhancements..

    Shadow feature improvements to WriteBitMap, MiniCopy, MiniMerge and SPLIT_READ_LBN

    Core OS improvements

    Dedicated lock manager using pre-fetch

    PE Driver Optimizations

    Exception Handling Optimizations

    Deferred SCHED AST Queuing

    Changes to avoid MMG SPL contention

    Optimizations in Global Section Deletion and Creation Algorithms

    Enabling IMS up calls for multithreaded applications

    Introduced Paged Pool Look Aside List (LAL)

    SYSMAN IO AUTO Performance Improvements (Fibre Only)

    RTL Changes to optimize strcmp() and memcmp()

    Support for new high speed USB connectivity

    Compiler improvements

    Miscellaneous Improvements


    OpenVMS Guest Performance Summary


    OpenVMS Guest Performance

    The native CPU and virtual CPU integer and floating point tests have the same rating

    The memory access speed and throughput are similar to the native host on the same hardware

    We do see a 20-40% application penalty on Java workloads
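To put the 20-40% figure in concrete terms, the Python sketch below estimates the range of guest throughput for a given native throughput. It assumes the penalty is a straight reduction in operation rate, which the slide does not spell out; the function name and the native rate in the usage example are hypothetical.

```python
def guest_rate_range(
    native_rate: float,
    low_penalty: float = 0.20,
    high_penalty: float = 0.40,
) -> tuple[float, float]:
    """Return (worst, best) expected guest rates for a given native rate.

    Assumes the penalty is a proportional reduction in throughput.
    """
    worst = native_rate * (1.0 - high_penalty)
    best = native_rate * (1.0 - low_penalty)
    return worst, best


if __name__ == "__main__":
    native = 100_000.0  # hypothetical native operation rate
    worst, best = guest_rate_range(native)
    print(f"Expected guest rate: {worst:.0f} to {best:.0f} operations")
```

A range like this can help decide whether a Java workload is a reasonable candidate for consolidation onto a guest.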

    Value Proposition of OpenVMS Guest on HPVM

    Hardware consolidation for applications which don't rely on performance

    Scenario: monolithic & distributed application development & testing
    Qualification on multiple OS versions
    Development & testing on multiple configurations

    Benefits
    Cheaper: fewer test boxes
    Faster: ready to boot or ready to use


    Finally.


    Performance Enhanced with new i2 Blades running OpenVMS V8.4

    Up to 2x faster performance with built-in resiliency and less power consumption

    [Figure: 2- & 4-socket dual-core Integrity servers alongside Integrity server blades based on Blade Scale Architecture]

    2.3x integer & floating point tests

    Up to 2x application performance

    Per-socket performance increases


    References and Contacts

    T4 & Friends was used across many of the benchmarks: http://h71000.www7.hp.com/openvms/products/t4/

    Please send any feedback on performance to [email protected]

    OpenVMS 8.4 Documentation http://h71000.www7.hp.com/doc/os84_index.html

    OpenVMS 8.4 New Features Documentation http://h71000.www7.hp.com/doc/84final/6679/6679pro.html

    Feedback [email protected]

    Business Manager Vivasvan Shastri ([email protected])


    Supported NUMA Configurations