Cell/B.E. processor-based systems and software offerings: IBM BladeCenter® QS22 and SDK 3.0. IBM Systems and Technology Group. © 2008 IBM Corporation. IBM CONFIDENTIAL.

Grice-QS22


  • Cell/B.E. processor-based systems and software offerings: IBM BladeCenter QS22 and SDK 3.0

    IBM Systems and Technology Group

    2008 IBM Corporation. IBM CONFIDENTIAL

  • The challenge today

    For many years, organizations have relied on performance gains from increasing clock speeds of traditional microprocessor architectures

    This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations

    High performance computing (HPC) applications need a fundamentally new technology and approach to the system-level architecture to achieve the desired level of performance.

  • Cell Broadband Engine (Cell/B.E.) Technology

    IBM, Sony, Toshiba Alliance formed in 2000

    March, 2001 STI Design Center opened in Austin, TX

    April, 2004 - Single Cell BE operational

    July, 2004 - 2-way SMP operational

    February, 2005 - first technical disclosures at ISSCC

    May, 2005 - first public demonstration of Cell/B.E. processor-based system at E3

    August, 2005 - published technical details of Cell/B.E. architecture

    November, 2005 - published open source SDK & Cell/B.E. simulator

    August, 2006 - introduced the very first Cell/B.E. processor-based server to the market

    For a higher level of absolute performance and efficiency

  • IBM commitment to innovation

    2006: Create initial platforms for experimentation (BladeCenter QS20)

    2007: Produce systems for early adoption and solution enablement (BladeCenter QS21)

    2008: Produce robust, production-ready systems for targeted industry applications (IBM BladeCenter QS22)

    IBM BladeCenter QS22 with the PowerXCell 8i processor: extraordinary double precision floating point performance; large memory capability; ready for the most demanding production applications

    IBM SDK for Multicore Acceleration 3.0

  • Cell Broadband Engine Architecture (CBEA) Technology Roadmap

    [Roadmap chart, 2006 to 2010. Legend: Concept, Committed.]

    Cost reduction: Cell/B.E. (1+8) at 90nm SOI; Cell/B.E. (1+8) at 65nm SOI; Cell/B.E. (1+8) at 45nm SOI

    Performance enhancements/scaling: IBM PowerXCell 8i (1+8 eDP SPE) at 65nm SOI; IBM PowerXCell 32ii at 45nm SOI

    Compatible code and security base across entire line

    All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.

  • IBM PowerXCell 8i processor benefits

    Sets a new performance standard: accelerates computationally intense workloads such as analytics, multimedia and vector processing; efficient computation per watt

    Designed for flexibility: wide variety of application domains; Cell can cover a wide range of application space with its capabilities in floating point operations, integer operations, data streaming / throughput support, and real-time support; exploits C/C++, Fortran programming models

    Enhanced security capability: virtual trusted computing environment for security

    The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture and combines a general-purpose Power Architecture core of modest performance with eight enhanced synergistic processing elements optimized for extreme double precision and single precision computational performance

    PowerXCell 8i processor: 65 nm; 9 cores, 10 threads; 230.4 GFlops peak (SP) at 3.2GHz; 108.8 GFlops peak (DP) at 3.2GHz; up to 25 GB/s memory bandwidth; up to 75 GB/s I/O bandwidth; 92 Watts @ 3.2GHz; top frequency >4GHz (observed in lab)
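    The peak figures quoted for the PowerXCell 8i can be checked with simple arithmetic. The sketch below is illustrative only: the slide gives just the totals, so the per-cycle breakdown (SIMD lane counts, one fused multiply-add per lane per cycle, and the PPE contribution) is an assumption introduced here, not something stated in the deck.

```python
# Peak floating-point arithmetic for one PowerXCell 8i at 3.2 GHz.
# The per-cycle figures are assumptions for illustration; the slide gives
# only the totals (230.4 SP and 108.8 DP GFlops).

CLOCK_GHZ = 3.2
NUM_SPES = 8
FLOPS_PER_FMA = 2  # one fused multiply-add counts as two flops

# Assumed SIMD lanes per SPE: a 128-bit register holds 4 SP or 2 DP values.
SPE_SP_LANES, SPE_DP_LANES = 4, 2
# Assumed PPE contribution: 4-wide SP vector unit, scalar DP FMA.
PPE_SP_LANES, PPE_DP_LANES = 4, 1


def peak_gflops(spe_lanes: int, ppe_lanes: int) -> float:
    """Peak GFlops = clock (GHz) * flops issued per cycle across all cores."""
    flops_per_cycle = FLOPS_PER_FMA * (NUM_SPES * spe_lanes + ppe_lanes)
    return CLOCK_GHZ * flops_per_cycle


sp_peak = peak_gflops(SPE_SP_LANES, PPE_SP_LANES)
dp_peak = peak_gflops(SPE_DP_LANES, PPE_DP_LANES)

print(f"SP peak: {sp_peak:.1f} GFlops")
print(f"DP peak: {dp_peak:.1f} GFlops")
```

    Under these assumptions the totals match the 230.4 (SP) and 108.8 (DP) GFlops on the slide, with the eight SPEs supplying most of the throughput.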

  • Intel's x86 Quad Core processors are Dual Chip Modules (DCMs): two of these processors stacked vertically and packaged together

    PowerXCell 8i uses the space and power and delivers more than 2.3x the GFlops of a traditional architecture

    On any traditional processor, the ratio of cores to cache, prediction, and related items illustrated here remains at ~50% of the chip area

    Example Server Dual Core: 349 mm2, 3.4 GHz @ 150W, 2 cores, ~27.2 SP GFlops, 1.3b transistors @ 65nm

    Example Desktop Quad Core: 214 mm2, 3 GHz @ 130W, 4 cores, ~96 SP GFlops, 820m transistors @ 45nm

    PowerXCell 8i Nine Core: 109 mm2, 3.2 GHz @ 75W, 9 cores, ~230 SP GFlops, 250m transistors @ 65nm
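    The three chips can be compared on throughput per watt and per unit of die area. The following is a minimal sketch using only the numbers on this slide; it assumes the quad-core figure of "214 mm" is an area in mm2, and it reads the "more than 2.3x" claim as a comparison against the quad-core part, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    area_mm2: float
    watts: float
    sp_gflops: float

    @property
    def gflops_per_watt(self) -> float:
        return self.sp_gflops / self.watts

    @property
    def gflops_per_mm2(self) -> float:
        return self.sp_gflops / self.area_mm2


# Figures copied from the slide (approximate peak SP GFlops).
CHIPS = [
    Chip("Example Server Dual Core", 349, 150, 27.2),
    Chip("Example Desktop Quad Core", 214, 130, 96),
    Chip("PowerXCell 8i Nine Core", 109, 75, 230),
]

for chip in CHIPS:
    print(f"{chip.name}: {chip.gflops_per_watt:.2f} GFlops/W, "
          f"{chip.gflops_per_mm2:.2f} GFlops/mm2")

# Ratio of PowerXCell 8i peak to the faster of the two traditional examples.
best_traditional = max(c.sp_gflops for c in CHIPS[:2])
speedup = CHIPS[2].sp_gflops / best_traditional
print(f"Peak SP ratio vs. best traditional example: {speedup:.2f}x")
```

    These are peak ratings, so the ratios describe headroom rather than delivered application performance.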

  • BladeCenter QS22 PowerXCell 8i Core Electronics

    Two 3.2GHz PowerXCell 8i processors: SP 460 GFlops peak per blade; DP 217 GFlops peak per blade; up to 32GB DDR2 800MHz; standard blade form factor; supports BladeCenter H chassis

    Integrated features: dual 1Gb Ethernet (BCM5704); serial/console port, 4x USB on PCI

    Optional: pair of 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501); 4x SDR InfiniBand adapter (32R1760); SAS expansion card (39Y9190); 8GB Flash Drive (43W3934)

    [Block diagram of the QS22 blade. Labels: two PowerXCell 8i processors, each with DDR2 memory; Rambus FlexIO; two IBM South Bridge chips, each with DDR2; PCI-E x16, PCI-E x8, PCI-X, HSC*, PCI legacy connector; 2x 1GbE (GbE to BC mid plane); 4x USB 2.0 (USB to BC mid plane); 2 UART, SPI; Flash, RTC and NVRAM; Flash Drive; optional IB x4 HCA, 2 port (IB-4x to BC-H high speed fabric/mid plane); HSDC.]

    *The HSC interface is not enabled on the standard products. This interface can be enabled on custom system implementations for clients by working with the Cell services organization in IBM Industry Systems.

  • Performance highlights

    Performance is an order of magnitude better than general purpose processors (GPP) for media and certain applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability

    Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP

    Each Synergistic Processor Element (SPE) is able to perform mostly the same as a GPP running at the same frequency

    Key performance advantage comes from its eight de-coupled SPE engines with dedicated resources including large register files and DMA channels

    Accelerates targeted applications with extraordinary processing capabilities: floating-point operations; integer operations; data streaming / throughput support; real-time support

    Open architecture allows for optimization at compiler and application level: performance gains from tuning compilers and applications can be significant; tools/simulators are provided to assist in performance optimization efforts

  • IBM BladeCenter QS22

    QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads

    QS22 is OPEN: based on Power Architecture and running Linux OS

    QS22 is EASY to deploy and to integrate into the existing IT infrastructure and/or workloads: co-exists with and complements all other blade server offerings (Intel, AMD, POWER); ready to scale out and deploy in production environments

    QS22 is GREEN: more than 1.7 SP (or 0.8 DP) GFLOPS per watt.

    Premier blade for HPC workloads
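    The per-blade peak and the GFLOPS-per-watt claim can be tied together with a short calculation. This is a sketch under stated assumptions: the deck does not give a blade power figure, so the wattage below is only what the efficiency claims would imply if they were computed from peak throughput. The per-chip peaks (230.4 SP, 108.8 DP) are taken from the processor slide.

```python
# Per-chip peak figures from the PowerXCell 8i slide.
CHIP_SP_GFLOPS = 230.4
CHIP_DP_GFLOPS = 108.8
CHIPS_PER_BLADE = 2

blade_sp = CHIPS_PER_BLADE * CHIP_SP_GFLOPS
blade_dp = CHIPS_PER_BLADE * CHIP_DP_GFLOPS

# Efficiency floor claimed on the slide ("more than ... GFLOPS per watt").
SP_GFLOPS_PER_WATT = 1.7
DP_GFLOPS_PER_WATT = 0.8

# Assumption: efficiency = peak GFlops / blade power. A floor on efficiency
# then implies a ceiling on blade power.
max_watts_from_sp = blade_sp / SP_GFLOPS_PER_WATT
max_watts_from_dp = blade_dp / DP_GFLOPS_PER_WATT

print(f"Blade peak: {blade_sp:.1f} SP / {blade_dp:.1f} DP GFlops")
print(f"Implied blade power ceiling: {max_watts_from_sp:.0f} W (SP), "
      f"{max_watts_from_dp:.0f} W (DP)")
```

    The two ceilings agree closely, which suggests both efficiency figures were derived from the same power number; that reading is an inference, not a statement from the deck.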

  • IBM SDK for Multicore Acceleration and related tools

    The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture

    Libraries and frameworks: Accelerated Library Framework (ALF); Data Communication and Synchronization (DaCS); Basic Linear Algebra Subroutines (BLAS); standardized SIMD math libraries

    Tools: GNU tool chain; performance tools; Eclipse-based IDE; simulator

    IBM XL C/C++ compiler*: optimized compiler for use in creating Cell/B.E. optimized applications. Offers improved performance, automatic overlay support, and SPE code generation

    *XLC compiler is a complementary product to SDK

    [Diagram legend: a marker denotes software components included in the SDK for Multicore Acceleration]

  • IBM SDK for Multicore Acceleration value

    Designed to be highly reliable, simple to acquire and easy to use: complete, integrated kit; production-ready tools from IBM; IBM warranty and support

    Based on industry standards to ease the transition to the Cell/B.E.: Eclipse-based Integrated Development Environment; standard, base libraries; third-party libraries can be plugged in

    Designed to make it easy to port and optimize applications for the QS21 and QS22: enhancements to enable new features in QS22; performance tuning tools to help optimize algorithms without re-writing the entire application; tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform

  • Cell Programming Approaches are fully customizable!

    1. Native Programming: compilers, intrinsics, DMA, etc.

    2. Assisted Programming: libraries, frameworks

    3. Case Tools / Complete Hardware Abstraction: user tool-driven

    Toward 1: increasing programmer control over Cell/B.E. resources

    Toward 3: decreasing programmer attention to architectural details

  • Workloads ideal for PowerXCell 8i and QS22

    Industries: Digital Media; Financial Services Sector; Home Media / Consumer Electronics; Information Based Medicine; Digital Video Surveillance; Aerospace and Defense; Electronic Design Automation; Chemicals & Petroleum

    Market & Solution Specific Assets

    Processing of data: real-time analytics; information synthesis; analysis; unstructured data; multimodal search; data transforms; pattern matching

    Presentation of data: image/video creation/mgt; visualization; imaging

    Extreme stream computation and bandwidth requirements

    PowerXCell 8i is suited for applications which demand extraordinary floating point performance

  • Public sector HPC solutions

    IBM components: IBM BladeCenter QS21 & QS22; IBM SDK for Multicore Acceleration; IBM Cell/B.E. math libraries; IBM hybrid computing solution (custom offering); PXCAB

    ISV applications: development tools from RapidMind, Gedae, Wind River, etc.; a growing number of university and government research labs with external collaborative missions are exercising existing and emerging science codes

    The solution is designed to offer: petaflop scalability and reliability; lower power and space footprint; lower total cost of ownership

    Performance advantages: science codes such as SPaSM, VPIC, Milagro and Sweep3D accelerated up to 4-9X over a single AMD Opteron core (Source: LANL - www.lanl.gov/roadrunner)

    Enable government labs, agencies, and academic research centers to run high performance codes faster, less expensively, and with lower power consumption than existing computing architectures

    *See Notes on Benchmarks, charts 46 and 47

  • Aerospace & defense solutions

    IBM components: IBM BladeCenter QS21 & QS22; IBM SDK for Multicore Acceleration; IBM Cell/B.E. math libraries; IBM hybrid computing solution (custom offering); PXCAB

    ISV applications: Gedae stream, image and signal programming environment; RapidMind development tools; Wind River VxWorks RTOS and WorkBench tools

    Performance advantages: FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*; double precision matrix multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*

    Enhance competitiveness, demonstrate innovation and capture significant government contracts through dramatic performance improvements in real time signal and image processing

    "As a time-served radar architect, I can say that Cell/Gedae is something of a dream and should rightly impact the new design market - it is an opportunity that the DoD should not fail to grasp."

    - John Roulston, SCImus Solutions, March 2007

    *See Notes on Benchmarks, charts 46 and 47

  • Digital content creation solutions

    IBM components: IBM BladeCenter QS21 & QS22; IBM SDK for Multicore Acceleration; IBM Cell/B.E. math libraries; IBM hybrid computing solution (custom offering); PXCAB; IBM iRT scalable real-time ray tracer

    ISV applications: RapidMind development tools

    The solution is designed to offer: rapid turnaround of digital assets; more realistic simulation; an open and flexible solution based on standards; scalability and reliability

    Performance advantages: 1080p ray-traced images computed in milliseconds*; 1080p ambient occlusion images computed in seconds*

    IBM solutions enable Media and Entertainment companies to produce the next generation of animated feature films, games, and advertising content

    *See Notes on Benchmarks, charts 46 and 47

  • Digital video surveillance solutions

    IBM components: IBM BladeCenter QS21/QS22; IBM Total Storage; IBM DVS ADK

    ISV applications: codec libraries; video distribution software

    The solution is designed to offer: H.264 encoding; encoders for analog cameras; transcoding to save storage and network costs; decoding acceleration to reduce workstation costs and improve robustness; better management and scalability; network-based surveillance; compute density - with two processors per blade, 14 blades to a chassis, and two chassis to a rack, it is possible to have as many as 672 H.264 encoders in the rack

    Performance advantage: one Cell/B.E. processor running at 3.2 GHz can encode 12 channels of standard definition video at 30 fps to H.264 (main profile, including CABAC) [1]

    [1] Source: IBM Research benchmark

    Solutions deliver hardware and enablement for high-density, highly scalable encoding, transcoding, and compositing for digital video surveillance

    [Diagram labels: PTZ and coax cameras; 16 camera inputs per aggregation unit; IBM BladeCenter-H with 14 card slots for IBM BladeCenter QS21/QS22 blades; IBM Total Storage; 672 encoders in a rack]

    *See Notes on Benchmarks, charts 46 and 47
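    The 672-encoder figure follows from the density numbers on this slide. The sketch below multiplies them out; it assumes each processor hosts one encoder per channel it can encode (12 per processor), which is how the numbers appear to combine but is not spelled out in the deck.

```python
PROCESSORS_PER_BLADE = 2
BLADES_PER_CHASSIS = 14
CHASSIS_PER_RACK = 2
# Assumption: one H.264 encoder per standard-definition channel, using the
# 12 channels per processor quoted from the IBM Research benchmark.
CHANNELS_PER_PROCESSOR = 12


def encoders_per_rack(processors_per_blade: int,
                      blades_per_chassis: int,
                      chassis_per_rack: int,
                      channels_per_processor: int) -> int:
    """Total encoder channels in a rack at the stated density."""
    processors = processors_per_blade * blades_per_chassis * chassis_per_rack
    return processors * channels_per_processor


total = encoders_per_rack(PROCESSORS_PER_BLADE, BLADES_PER_CHASSIS,
                          CHASSIS_PER_RACK, CHANNELS_PER_PROCESSOR)
print(f"H.264 encoders per rack: {total}")
```

    With 56 processors per rack at 12 channels each, the total is 672, matching the slide.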

  • EDA solutions

    IBM components: Cell/B.E. hybrid cluster; IBM BladeCenter QS21; IBM System x / IBM BladeCenter; IBM Cluster 1350 integrated cluster; storage: DS4000, N series, DCS9550

    ISV applications: Mentor Graphics Calibre nmOPC and OPCVerify

    The solution is designed to offer:

    Significant run time acceleration: leverages Cell/B.E. strengths to offer significant speed-up when compared to existing solutions in the market, reducing design turnaround time

    Scalability and reliability: blade form factor improves scalability, compute density and reliability

    Accelerate computational lithography workloads to address turnaround time challenges and at the same time reduce total cost of the computing infrastructure

  • Financial market analytics solutions

    IBM components: IBM BladeCenter QS22; IBM SDK for Multicore Acceleration; Dynamic Application Virtualization; Cell/B.E. math libraries

    ISV applications: NAG - Math & Stat Software; Platform Symphony - Grid Computing Environment; Encirq Event Processing Platform

    The solution is designed to offer:

    Flexibility and scalability: IBM BladeCenter QS22 integrates with other BladeCenter products; IBM SDK, DAV, third party applications for ease of adoption within existing infrastructure

    Technical Services with skilled programming expertise and subject matter experts

    Power, space and cooling advantages

    Performance advantage: Collateralized Debt Obligation (CDO) - 7.5X faster than 2.8 GHz 4-core Harpertown*; 650 million European options/sec using Monte Carlo simulations on QS22 blade*

    Enable financial market professionals to perform highly complex analytics with the required speed and accuracy to support trade execution and improve their firm's competitive position

    *See Notes on Benchmarks, charts 46 and 47
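    For readers unfamiliar with the workload behind the European options figure, the sketch below shows a basic Monte Carlo pricer in plain Python. It is illustrative only: the deck does not describe the model or code used in the benchmark, so the geometric Brownian motion model, the function names and the parameter values are all assumptions made here. A closed-form Black-Scholes price is included as a sanity check.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Monte Carlo estimate of a European call price.

    Assumes geometric Brownian motion under the risk-neutral measure,
    so only the terminal price is simulated (one normal draw per path).
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form reference price used to sanity-check the estimate."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


# Hypothetical contract parameters chosen for illustration.
estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {estimate:.3f}  closed form: {reference:.3f}")
```

    Each path is independent of the others, which is why this kind of workload tends to map well onto many parallel SIMD cores.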

  • Medical imaging solutions

    IBM components: IBM BladeCenter QS21 & QS22; IBM SDK for Multicore Acceleration; IBM Cell/B.E. math libraries; IBM hybrid computing solution (custom offering); PXCAB

    ISV applications: advanced image and text analytics; high-performance image compression

    The solution is designed to offer: 3D image reconstruction, registration, volume rendering, segmentation; on-demand compression/decompression

    Performance advantage: 16x improvement on MRI image reconstruction over Opteron system; 11x improvement on CT image reconstruction over 3.0GHz Xeon system; 48x improvement on image registration over 3GHz Pentium 4; 200x shear-warp volume visualization over TI TMS320C80 processor; 40:1 CT study data compression (Source for all above: Mayo Clinic - http://www.mayoclinic.org/news2007-rst/3996.html)*

    Improve the efficiency, productivity, and quality of patient care through dramatic performance improvements in the transmission and analysis of medical images

    *See Notes on Benchmarks, charts 46 and 47

  • Seismic solutions

    IBM components: IBM BladeCenter QS22; IBM SDK for Multicore Acceleration; IBM Cell/B.E. math libraries; IBM hybrid computing solution (custom offering); PXCAB

    Standard math, vector math, FFT, BLAS, MPI and tridiagonal solver

    ISV applications: Simudyne; customers' own proprietary code

    The solution is designed to offer: high-performance, highly accurate rendering of geologic structures; cost effective HPC environment that has significant performance increases; scalability and reliability

    Performance advantages: FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*; double precision matrix multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*

    Improve the speed and accuracy of geologic visualization to reduce the cost of evaluating potential oil and gas targets

    *See Notes on Benchmarks, charts 46 and 47

  • QS22 summary

    The QS22 is based on the new PowerXCell 8i processor built on an enhanced version of the Cell Broadband Engine Architecture

    The QS22 offers the capabilities you need for your most demanding computational requirements

    Offers extraordinary double precision and single precision floating point performance

    Supports up to 32GB of processor memory

    IBM is working with ISVs and customers to accelerate workloads on the QS22 in targeted application areas

    The QS22 is extremely efficient, offering more than 1.7 SP (or 0.8 DP) GFLOPS per watt of energy

    BladeCenter QS22 is Right, Open, Easy and Green

    Premier blade for HPC workloads

  • IBM SDK for Multicore Acceleration summary

    Designed to be highly reliable, simple to acquire and easy to use: complete, integrated kit; production-ready tools from IBM; IBM warranty and support; RHEL 5.2 Enterprise support

    Based on industry standards to ease the transition to the Cell/B.E. architecture: Eclipse-based Integrated Development Environment; standard, base libraries; third-party libraries can be plugged in

    Designed to make it easy to port and optimize applications for the QS22: performance tuning tools to help optimize algorithms without re-writing the entire application; tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform

  • Cell/B.E. architecture reaches wide and deep from consumer products to high performance computing

    [Diagram: products arranged from Consumer through Business and Enterprise to High Performance Computing, with increasing support for scale and datacenter]

    Products shown: SCE PS3 (Cell/B.E. + GPU); Toshiba SpursEngine (SPUs + Host); Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O); Mercury 1u Dual Cell; PowerXCell 8i PCI card (Cell/B.E. + Host); IBM BladeServer (2 Cell/B.E. or PowerXCell 8i); Mini-Roadrunner; Roadrunner (16,000 PowerXCell 8i + AMD); Custom

    Common OSs, infrastructure, tools, libraries, code: the SAME SPE code runs from end to end