smpdr3.13-xw0001.pdf 1 May 2009 Education XW0001 Servicing IBM System x Servers – Part II Study Guide XW0001 Release 3.13 May, 2009

Servicing IBM Systems x Servers II - Study Guide


Page 1: Servicing IBM Systems x Servers II - Study Guide

smpdr3.13-xw0001.pdf 1 May 2009

Education

XW0001 Servicing IBM System x Servers – Part II

Study Guide

XW0001 Release 3.13

May, 2009


© International Business Machines Corporation, 2009 All rights reserved.

IBM System x Service and Support Education
IBM Systems, Department EYGA, Building 203, Post Office Box 12195, Research Triangle Park, North Carolina 27709-2195

IBM reserves the right to change specifications or other product information without notice. This publication could include technical inaccuracies or typographical errors. References herein to IBM products and services do not imply that IBM intends to make them available in other countries. IBM provides this publication as is, without warranty of any kind —either expressed or implied—including the implied warranties of merchantability or fitness for a particular purpose. Some jurisdictions do not allow disclaimer of expressed or implied warranties. Therefore, this disclaimer may not apply to you.

Data on competitive products is obtained from publicly obtained information and is subject to change without notice. Please contact the manufacturer for the most recent information.

The following terms are trademarks or registered trademarks of IBM Corporation in the United States, other countries or both: Active Memory, Active PCI, AT, BladeCenter, the e-business logo, EasyServ, Enterprise X-Architecture, EtherJet, HelpCenter, HelpWare, IBM RXE-100 Remote Expansion Enclosure, IBM XA-32, IBM XA-64, IntelliStation, LANClient Control Manager, Memory ProteXion, NetBAY3, Netfinity, Netfinity Manager, Predictive Failure Analysis, RXE Expansion Port, SecureWay, ServeRAID, ServerProven, ServicePac, SMART Reaction, SMP Expansion Module, SMP Expansion Port, UM Services, Universal Manageability, Update Connector, Wake on LAN, XceL4 Server Accelerator Cache, XpandOnDemand scalability.

IBM Corporation Subsidiaries: Lotus, Lotus Notes, Domino, and SmartSuite are trademarks of Lotus Development Corporation. Tivoli and Planet Tivoli are trademarks of Tivoli Systems, Inc.

LLC, Adobe, and PostScript are trademarks of Adobe Systems, Inc. Intel Celeron, LANDesk®, MMX, Pentium II, Pentium III, Pentium 4, SpeedStep, and Xeon are trademarks or registered trademarks of Intel Corporation. Linux is a trademark of Linus Torvalds. Microsoft Windows® and Windows NT® are trademarks or registered trademarks of Microsoft Corporation. Other company, product, and service names may be trademarks or service marks of others.

For more information, visit: www.ibm.com/legal/copytrade/phtml


Preface

This publication is primarily intended for use by students enrolled in the course ‘Servicing System x Servers – Part II – xw0001’.

This document represents a training technique developed for and used by IBM and is not for sale. Portions of this document, such as foils, charts, and quizzes, may be copied and distributed if required to conduct a class properly. The instructor should exercise good judgment on handouts of this type. The complete document may not be copied for or sold to non-IBM personnel.

Please write your name and address below to personalize your copy.

Issued to: ____________________________________________________

Address: ____________________________________________________________________________________________________________________________________________________________

Current release date: May 2009
Current release level: 3.13
Test numbers for this guide are: xw0001r313

The information contained within this publication is current as of the date of the latest revision and is subject to change at any time without notice.

Please forward all comments and suggestions regarding the course material, format, and content to your local IBM System x Service and Support Education country coordinator or contact.


Table of Contents

Preface

Table of Contents

Introduction to the Study Guide

Topic 1 – Objectives and Agenda

Topic 2 – High-performance System x Server Family Overview

Topic 3 – RAID Adapters and Enclosures

Topic 4 – High-performance Technologies Review

Topic 5 – Working With Scalable Systems

Topic 6 – Dynamic System Analysis

Topic 7 – Problem Solving

Topic 8 – Support References

Introduction to the Study Guide

Purpose

The purpose of this guide is to:

Provide you with the necessary documentation to support the learning experience so that you can successfully fulfill the objectives defined for this course. This guide contains a number of lessons based on the instructor's presentation material, supplemental student notes within each lesson, and appropriate additional material (in the form of appendices) as is required by the course learning objectives.

Limitations

1. No computer game playing or copying of games is allowed in class.
2. Do not copy recordable media on any of the systems in the lab. Adhering to this rule will keep viruses from spreading and ensure that the media that have been created especially for your systems are retained.
3. Do not remove any materials from the classroom other than those given to you by the instructor.
4. Do not remove the covers from your computer. If you encounter any problems with your system, please speak with your instructor.


XW0001 - Servicing IBM System x Servers – Part II

Topic 1 – Objectives and Agenda

Welcome!


Topic Objectives

• By the end of this topic, you will be able to:
- Describe the overall course objectives
- Explain the course prerequisites
- Understand the course agenda

Before we begin, we need to establish some basics for the course. You need to understand what the course objectives are so you can be sure you are taking the right class. You need to understand what we expect of you by way of previous knowledge. We also need to explain the course agenda so you will know what is about to happen.


Course Objectives

• By the end of this course, you will be able to:
- Identify the serviceability features of System x high-performance servers
- Describe the advanced technologies used in System x servers and their service implications
- Describe the management characteristics of System x servers
- Perform a series of setup, configuration and troubleshooting tasks on System x servers and associated peripherals

This course concentrates on problem determination and the tools that can be used to troubleshoot IBM System x servers. Before you start the practical exercises, however, we will discuss some of the key technologies used in IBM System x servers.

The lab exercises revolve around best practice in dealing with IBM System x server problems. This combination of remote lab and paper exercises will enable you to become familiar with the high end of the System x server range and how to perform service on these servers. You will also see some of the fault-tolerant and redundant features of the servers and practice working with servers that have suffered a component failure but are still running the NOS.


Introductions

• Your instructor is …
- Your instructor will now introduce herself/himself

• You are …
- Your instructor will ask you to introduce yourself
  • Tell the class what you do (not what your job title is)
  • Tell the class how you got into this role
  • Tell the class anything else you wish to share

We will be together for some time. It will be useful for us all to get to know each other.


Course Prerequisites

• To make the most of this class, you should have completed the following education prior to attending this course:
- Strongly recommended (mandatory in some locations)
  • A+ Certification
  • Server+ Certification
- Required
  • Servicing IBM xSeries Servers – Part I (XW2001 R300)

In some locations, if you work with IBM System x server products, you are required to be A+ and Server+ certified. Even if this is not mandatory where you are, we strongly recommend that you are A+ certified and Server+ certified.

Servicing IBM System x Servers – Part I is REQUIRED prior to attending this class. XW2001R300 is a self-paced, CD-ROM course. If you have not completed this training prior to attending, you will not get the most from this class, and this may affect your ability to pass the end-of-class mastery test.


IBM System x Curriculum

•Worldwide field support training roadmap for System x warranty authorization

Level 1 – Industry certification (entry point)
- Server+ Certification (compulsory in some locations, highly recommended everywhere else)

Level 2 – Approved warranty service provider (high volume System x servers)
- XW2001 – Core knowledge (self-study)
- XW2xxx – Service update CD-ROMs

Note: Approved service providers may stop here if not providing warranty service on BladeCenter or high performance System x products.

Level 3 – Approved warranty service provider (high performance System x servers)
- XW0001 – High perf skill (hands-on)
- XW2xxx – Service update CD-ROMs

This chart identifies the position of this course in the IBM System x server service curriculum. This course is a mandatory module towards warranty approval for high-performance System x servers. Service update CD-ROMs are issued periodically to inform service technicians about new products that are announced.


Key Exit Skills

• Key practical exit skills include being able to:
- Identify the physical components of the IBM System x servers covered here and how to work with them using the support documentation
- Use the features of multiple IBM System x Service Processor (SP) technologies
- Configure and bring up a scaled, multi-node IBM System x server partition

• In addition, you will be able to:
- Recognize and use server-specific support tools
- Install a NOS on selected IBM System x servers
- Understand how IBM System x servers work with the Microsoft Windows Network Operating System (NOS) and how they behave under certain failure conditions

This course provides practical experience through ‘hands-on’ exercises. This is a list of the key exit skills that you should be able to perform after completing this course.


Lesson Topics in This Course

• Lesson topics
- Topic 1: Objectives and Agenda
- Topic 2: High-performance System x Server Family Overview
- Topic 3: RAID Adapters and Enclosures
- Topic 4: High-performance Technologies Review
- Topic 5: Working With Scalable Systems
- Topic 6: Dynamic System Analysis
- Topic 7: Problem Solving
- Topic 8: Support References

• Lab exercises
- Details follow in the next list of lab exercises

• Test

Here are the lesson topics in this guide.


Lesson Topics in This Course

• Lab exercises
- Lab 1 – Locations, Removals, Flash Update and Diagnostics
- Lab 2 – Remote Desktop Connection & BIOS Setup
- Lab 3 – Utilizing RCM & Virtual Console software
- Lab 4 – Preboot DSA / Diagnostics
- Lab 5 – Updating with IBM UpdateXpress Service Packs
- Lab 6 – ServeRAID Mgr & Spanned Arrays
- Lab 7 – MegaRAID Storage Manager
- Lab 8 – Utilizing BMC and DSA to gather the Facts
- Lab 9a – Scale System x460
- Lab 9b – Scale a multi-node x3950 M2

Here are the lesson topics in the associated lab guide.


Summary – Topic 1

• This topic has covered the following:
- Described the course objectives
- Explained the course prerequisites
- Established the course agenda and key exit skills

We have now outlined the contents and scope of this course. The next topic is a review of the product and how the components fit together.


Topic 2 – High-performance System x Server Family Overview

We will provide an overview of the IBM System x and xSeries high performance products.


Topic Objectives

• By the end of this topic, you will be able to:
- Describe the System x high-performance server family of products
- List the non-scalable and scalable models
- Identify the system management components of IBM System x high-performance servers
- Describe the server security software features

This topic describes high-performance System x server family offerings and some common options.


Non-Scalable IBM System x3755 Overview

• 2-4 processors, using AMD quad core Opteron processors with HyperTransport link
• Eight DIMM slots per processor card
• Dual Broadcom 5708c Gigabit Ethernet
• Optional redundant power and cooling
• Standard DVD drive
• 4 - 3.5 in. HS SAS HDD Bays
• 4 PCI Express, 2 PCI-X and 1 HTX I/O slots
• IPMI 2.0 BMC w/optional RSA II Slimline refresh
• SAS Chipset supporting RAID 0, 1 or 10
• Optional RAID 5 upgrade
• RoHS Compliant
• Server Enablement Suite
• Support for Windows, Linux, VMWare and Netware

The x3755 is a low-cost, high-end AMD dual-core Opteron-based server. The system supports up to four Opteron revision F processors. Each processor supports up to 8 DIMM slots, and with 4 GB DIMMs the system supports up to 128 GB of memory. The I/O slot mixture is 4 PCI-E slots, 2 PCI-X slots and 1 HTX slot. The x3755 is a RoHS-compliant system.

Tip: Processors must be installed in order 1 through 4.

Tip: The processor complex uses a passthru card. Older systems may have had only one processor installed; current models ship with two processors as standard. If there is no processor/memory card in processor slot 2, there is no path to the ServerWorks HT2100 B PCI-E bridge unless the passthru card is fitted, so the passthru card must be present in processor slot 2 whenever no processor is installed there. If processor 2 is present, no passthru card is used and none is shipped. However, if processors 1, 2 and 3 are installed, a passthru card must be installed in processor socket 4.

The Baseboard Management Controller (BMC-H8) is a system environmental monitor and controller. It performs low-level system monitoring and LED control functions, using multiple I2C bus connections to communicate out-of-band with other onboard devices. The optional RSA II Slimline Refresh systems management adapter adds advanced service processor alert notification and remote connectivity.
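The passthru card rule and the memory arithmetic in the notes above can be checked with a short sketch. This is an illustration only, not an IBM tool: the function names are hypothetical, and it assumes processors are populated in order 1 through 4 as the tip requires.

```python
from typing import Optional

DIMM_SLOTS_PER_PROCESSOR_CARD = 8  # from the x3755 overview
MAX_PROCESSORS = 4


def passthru_slot(processors: int) -> Optional[int]:
    """Return the processor slot that needs a passthru card, or None.

    Assumes processors are installed in order 1..N (hypothetical helper).
    """
    if not 1 <= processors <= MAX_PROCESSORS:
        raise ValueError("the x3755 takes 1 to 4 processors")
    # Per the notes: one processor needs the card in slot 2,
    # three processors need it in slot 4, two or four need none.
    return processors + 1 if processors % 2 == 1 else None


def max_memory_gb(processors: int, dimm_gb: int = 4) -> int:
    """Maximum memory with every DIMM slot on every processor card filled."""
    return processors * DIMM_SLOTS_PER_PROCESSOR_CARD * dimm_gb


for n in range(1, MAX_PROCESSORS + 1):
    print(n, "processor(s): passthru slot", passthru_slot(n),
          "| max memory", max_memory_gb(n), "GB")
```

With four processors and 4 GB DIMMs the sketch gives 4 x 8 x 4 = 128 GB, matching the figure quoted in the notes.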


System x3800, x3850 and x3950 Product Overview

• 3U or 7U tower or rack
• XA-64e™ Enterprise X-Architecture chipset
• 1-way to 4-way, Intel Xeon MP (up to 32-way on x3950)
• Intel Xeon MP (support for EM64T processors)
• PC2-3200 DDR2 SDRAM
  - 2-way interleaving
• Disk support
  - DVD-ROM standard
  - Up to twelve 3.5” Serial Attached SCSI (SAS) hot-swap disks (x3800)
  - Up to six 2.5” SAS hot-swap disks (x3850 and x3950)
• ServeRAID 8i (optional) RAID 0/1/5
• Up to two 1300W (x3850 and x3950), three 770W (x3800) redundant hot-swap power supplies
• Active Memory
  - ChipKill, Memory ProteXion and Memory Mirroring
• Remote Supervisor Adapter II slim line (optional on x3800 and x3850, standard on x3950)
• Broadcom 5704 dual port ethernet
• Active PCI-X 2.0 Slots, 64-bit/266MHz
• 3-year, next business day warranty

X3850 and x3950

x3800

The IBM Enterprise X-Architecture range of servers offers 3U and 7U rack model servers for high-volume network transaction processing. These high-performance, symmetric multiprocessing (SMP) servers are ideally suited for networking environments that require superior microprocessor performance, input/output (I/O) flexibility, and high manageability.

EM64T is a 64-bit extension technology enhancement to the Intel IA-32 architecture. It is compatible with legacy IA-32 software while enabling new software to access a larger memory address space. EM64T introduces a new operating mode which includes two sub-modes:

(1) Compatibility mode enables a 64-bit operating system to run most existing legacy 32-bit software unmodified.
(2) 64-bit mode enables a 64-bit operating system to run applications written specifically to access 64-bit address space.

The System x3800 and x3850 offer the Remote Supervisor Adapter II slim line (RSA II) as an option. This new adapter significantly enhances the tools available to the service technician for detecting and correcting problems with the server. It supports a web interface to the error logs, dramatically simplifying the troubleshooting task without disturbing the workings of the host server. If the RSA II is connected to an Ethernet network, the system error logs can be viewed and manipulated from a ThinkPad connected to the Service Processor through the LAN (using a web browser).


x3850/3950 M2 Product Overview

• XA64e 4th generation chipset
• Four processor sockets
  - Intel Xeon dual-core, quad-core and six-core processors
• Up to four memory cards
  - Up to 8 DIMMs per memory card
  - PC2-5300 DDR II
• Disk support
  - DVD-ROM standard
  - Integrated LSI 1078 SAS RAID controller (supports RAID 0 & 1)
  - Up to four 2.5” SAS hot-swap disks
  - Optional ServeRAID MR10
• Two standard 1440 watt redundant, hot-swap power supplies
• Seven I/O slots:
  - Seven PCI-E x8 slots
  - Two Active/Hot-Swap
• Active Memory
  - ChipKill, Memory ProteXion & Memory Mirroring
• Dual embedded Broadcom 5709 ethernet
• Remote Supervisor Adapter II standard
• Chassis scalability supported
• One or three year, next business day warranty

The IBM System x3850 M2 and x3950 M2 are high-performance, four-socket servers featuring fourth-generation Enterprise X-Architecture (EXA). The x3850 M2 is non-scalable as shipped. The x3950 M2 contains advanced technology that combines scalable SMP power, PCI-E expansion, high availability, scalability, and substantial internal data storage capacity.

This slide summarizes some of the features of the IBM System x3850 M2 and x3950 M2.

The x3850 M2 supports scaling with the installation of the IBM ScaleXpander Option kit. It will become an x3950 M2 when scaled.

A multi-node configuration interconnects multiple servers. Each multi-node configuration can have one or more scalable partitions. Each scalable partition supports an independent operating system installation. The scalable partition uses a single, contiguous memory space and provides access to all associated adapters and hard disk drives. PCI slot numbering starts with the primary node and continues with the secondary nodes, in numeric order of the logical node ID.
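The slot numbering order described above can be pictured with a minimal sketch. It assumes seven slots per node (the figure from the product overview) and assumes that numbering is contiguous and starts at 1; the source states only the node order, so treat the exact numbers as hypothetical.

```python
SLOTS_PER_NODE = 7  # seven PCI-E x8 slots per chassis, per the overview


def number_pci_slots(primary_id, secondary_ids, slots_per_node=SLOTS_PER_NODE):
    """Map each partition-wide slot number to (logical node ID, local slot).

    Assumption: numbering is contiguous and starts at 1 on the primary node.
    """
    # Primary node first, then secondaries in numeric order of logical node ID.
    node_order = [primary_id] + sorted(secondary_ids)
    numbering = {}
    global_slot = 1
    for node in node_order:
        for local_slot in range(1, slots_per_node + 1):
            numbering[global_slot] = (node, local_slot)
            global_slot += 1
    return numbering


slots = number_pci_slots(primary_id=0, secondary_ids=[2, 1])
for number in (1, 7, 8, 15):
    print("slot", number, "->", slots[number])
```

Under these assumptions, a three-node partition exposes 21 slots, with the first secondary node starting at slot 8.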

The scalability discussion is continued later in this course.


Enterprise X-Architecture Overall Design

•Scalable systems use the IBM Enterprise X-Architecture (EXA) and IBM XA-64e fourth-generation chipset

Note. Third generation EXA chipsets use system memory for L4 cache

Scalable systems need a sophisticated chipset to enable processors and memory to be shared across multiple chassis under a single OS. This diagram shows the overall schematic of the EXA chipset. The processor and memory bus can be extended with the use of scalability cables, effectively joining the processors, memory and I/O into a single hardware set.


Architecture – x3800, x3850, x3950, x3950E

Not present on x3800 or x3850

The x3850, x3950 and x3950E use the third generation of the IBM XA-64e chipset. The architecture consists of the following components:

- One to four Xeon MP processors
- One Memory and I/O Controller (MIOC)
- Two PCI bridges

Each memory port out of the memory controller has a peak throughput of 5.33 GBps. DIMMs are installed in matched pairs (two-way interleaving) to ensure that the memory port is fully utilized. Peak throughput for each PC2-3200 DDR2 DIMM is 2.67 GBps. (The DIMMs are run at 333 MHz to remain in sync with the throughput of the front-side bus.)

In addition, there are four memory ports; spreading installed DIMMs across all four memory ports can improve performance, because the four independent memory ports (memory cards) provide simultaneous/concurrent access to memory. With four memory cards installed (and DIMMs in each card), peak memory bandwidth is 21.33 GBps.

The memory controller routes all traffic from the four memory ports, two CPU ports and the two PCI bridge ports. The memory controller also has embedded DRAM, which in the x366, x3800 and x3850 holds a snoop filter lookup table. This filter ensures that snoop requests for cache lines go to the appropriate CPU bus and not both of them, thereby improving performance.

One PCI bridge supplies four of the six 64-bit 266 MHz PCI-X slots on four independent PCI-X buses. The other PCI bridge supplies the other two PCI-X slots (also 64-bit, 266 MHz), plus all the onboard PCI devices.

This illustration details the interconnect and board components.

CPLD = ‘Complex Programmable Logic Device’.
BMC = ‘Baseboard Management Controller’.
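The bandwidth figures quoted above follow from simple arithmetic, sketched here. The 8-byte (64-bit) DIMM data path is an assumption that the notes do not state explicitly; the 333 MHz rate, the two-way interleaving and the four memory ports come from the text.

```python
DIMM_RATE_MHZ = 1000 / 3   # DIMMs run at 333 MHz (333.33 million transfers/s)
DIMM_WIDTH_BYTES = 8       # assumption: standard 64-bit DIMM data path
DIMMS_PER_PORT = 2         # matched pairs, two-way interleaving
MEMORY_PORTS = 4           # one port per memory card


def dimm_gbps() -> float:
    """Peak throughput of one DIMM in GBps."""
    return DIMM_RATE_MHZ * DIMM_WIDTH_BYTES / 1000


def port_gbps() -> float:
    """Peak throughput of one memory port with an interleaved DIMM pair."""
    return dimm_gbps() * DIMMS_PER_PORT


def system_gbps(cards: int = MEMORY_PORTS) -> float:
    """Peak memory bandwidth with the given number of populated memory cards."""
    return port_gbps() * cards


print(f"per DIMM: {dimm_gbps():.2f} GBps")
print(f"per port: {port_gbps():.2f} GBps")
print(f"4 cards:  {system_gbps():.2f} GBps")
```

The results round to 2.67, 5.33 and 21.33 GBps, which is why spreading DIMMs across all four memory cards matters for performance.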


Architecture – x3850M2, x3950M2

The x3850 M2 and x3950 M2 use the fourth generation of the IBM XA-64e chipset. The architecture consists of the following components:

- One to four Xeon dual-core or quad-core processors
- One ‘Hurricane 4’ Memory and I/O Controller (MIOC)
- Eight high-speed memory buffers
- Two PCI Express bridges
- One South bridge

PCI bridge 1 supplies four of the seven PCI Express x8 slots on four independent PCI Express buses. PCI bridge 2 supplies the other three PCI Express x8 slots plus the onboard SAS devices, including the optional ServeRAID-MR10k. A separate South bridge supplies all the other onboard PCI devices, such as the USB ports, onboard Ethernet and the standard RSA II.

As this is a multi-board system (processor board, I/O board and RSA II adapter), hardware replacements require careful thought to ensure a working system when a board is replaced. Code is located on all major boards in the system, and this code must be matched for release levels to ensure proper operation.

The components represented by the black boxes require a BIOS/firmware update after parts replacement:

- CPU card: BIOS, BMC, and FPGA (Field Programmable Gate Array) code. The FPGA is very similar to the CPLD in previous systems.
- TPM (Trusted Platform Module)
- I/O card: SAS, Ethernet, FPGA, and DSA (Diagnostics) code
- RSA II adapter
- Broadcom 5709 Ethernet controller
- ServeRAID-MR10k SAS/SATA controller (if present)
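As a memory aid, the code carried by the two main boards can be kept in a small lookup, sketched below. Only the CPU card and I/O card entries come from the notes; the helper name is hypothetical, and the actual release levels to match must come from the current support documentation.

```python
# Code types held on each board, as listed in the notes above.
CODE_BY_BOARD = {
    "CPU card": ["BIOS", "BMC", "FPGA"],
    "I/O card": ["SAS", "Ethernet", "FPGA", "DSA"],
}


def codes_to_verify(replaced_boards):
    """Return the sorted, de-duplicated code types to check after a replacement.

    Hypothetical helper: it only lists what to verify, it does not flash anything.
    """
    codes = set()
    for board in replaced_boards:
        # A KeyError here means the board is not covered by this sketch.
        codes.update(CODE_BY_BOARD[board])
    return sorted(codes)


print(codes_to_verify(["I/O card"]))
print(codes_to_verify(["CPU card", "I/O card"]))
```

FPGA code appears on both boards, which is one reason a single board swap can leave the system with mismatched levels.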


Processor Architecture “Single to Dual Core”

Up to this point, we have sometimes referred to these x86 platforms as 2-socket, 4-socket, 8-socket or 16-socket configurations. Historically, we have been used to referring to these systems as n-way systems. Given current trends in microprocessors, the term n-way or n-CPU could become misleading if not used in the proper context.

The dual-core processors in the x3950 are the first Intel processors to offer multiple cores. Dual-core processors are a concept similar to a two-way system, except that the two cores are integrated into one silicon die. This brings the benefits of two-way SMP with less power consumption and faster data throughput between the two cores. To keep power consumption down, the resulting core frequency is lower, but the additional processing capacity means an overall gain in performance.

In addition to the two cores, the dual-core processor has separate L1 instruction and data caches for each core, as well as separate execution units (integer, floating point, and so on), registers, issue ports, and pipelines for each core. A dual-core processor achieves more parallelism than Hyper-Threading Technology, because these resources are not shared between the two cores. Estimates are that there is a 1.2 to 1.5 times improvement when comparing the dual-core Xeon MP with the current single-core Xeon MP.

With double the number of cores for the same number of sockets, it is even more important that the memory subsystem is able to meet the demand for data throughput. The 21 GB/sec peak throughput of the X3 Architecture of the x3950 with four memory cards is well-suited to dual-core processors.

For additional information, refer to the IBM Redbook Virtualization on the IBM System x3950 Server, Publication # SG 24-790-00.


Processor Architecture “Dual Core to Quad Core”

The dual-core processors are a concept similar to a two-way SMP system, except that the two processors, or cores, are integrated into one silicon die. This brings the benefits of two-way SMP with less power consumption and faster data throughput between the two cores. To keep power consumption down, the resulting core frequency is lower, but the additional processing capacity means an overall gain in performance.

The quad-core processors add two more cores onto the same die. Hyper-Threading Technology is not supported. Each core has separate L1 instruction and data caches, as well as separate execution units (integer, floating point, and so on), registers, issue ports, and pipelines.

A multi-core processor achieves more parallelism than Hyper-Threading Technology, because these resources are not shared between the cores.

With double and quadruple the number of cores for the same number of sockets, it is even more important that the memory subsystem is able to meet the demand for data throughput. The 34.1 GBps peak throughput of the x3850 M2 and x3950 M2 eX4 Architecture with four memory cards is well-suited to dual-core and quad-core processors.

1066 MHz front-side bus

The Xeon MP uses two 266 MHz clocks, out of phase with each other by 90°, and uses both edges of each clock to transmit data. A quad-pumped 266 MHz bus therefore results in a 1066 MHz front-side bus. The bus is eight bytes wide, which means it has an effective burst throughput of 8.53 GBps. This can have a substantial impact, especially on TCP/IP-based LAN traffic.
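The front-side bus figures above can be reproduced with a few lines of arithmetic. This sketch takes the nominal 266 MHz clock as 266.67 MHz (800/3), which is an assumption made so that the rounded results match the quoted 1066 MHz and 8.53 GBps.

```python
BASE_CLOCK_MHZ = 800 / 3   # nominal "266 MHz" clock, taken as 266.67 MHz
CLOCKS = 2                 # two clocks, 90 degrees out of phase
EDGES_PER_CLOCK = 2        # data is sent on both edges of each clock
BUS_WIDTH_BYTES = 8        # the bus is eight bytes wide


def fsb_transfer_rate_mhz() -> float:
    """Effective transfer rate of the quad-pumped bus, in millions of transfers/s."""
    return BASE_CLOCK_MHZ * CLOCKS * EDGES_PER_CLOCK


def fsb_throughput_gbps() -> float:
    """Effective burst throughput in GBps."""
    return fsb_transfer_rate_mhz() * BUS_WIDTH_BYTES / 1000


print(f"effective rate: {fsb_transfer_rate_mhz():.1f} MHz")
print(f"burst throughput: {fsb_throughput_gbps():.2f} GBps")
```

Two clocks times two edges gives four transfers per base clock cycle, which is where the term quad-pumped comes from.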


Manageability Features

• All of the following are supported:
- Active PCI-X and PCI-Express x8 half-length slots (system specific)
- Predictive failure analysis (PFA) on processors, memory, disks, fans, and power supplies
- Integrated Baseboard Management Controller (BMC)
- RSA II or RSA II slim-line (standard on some models, optional on others)
- Light Path Diagnostics

This group of servers offers a high degree of redundancy, fault tolerance and manageability hardware. The BMC and RSA II are discussed in detail later.


Server Security Software (SSS) Trusted Platform Module (TPM)

• Server Security Software
- SSS is a set of software tools that allow a user access to the basic cryptographic key and identity capabilities of a TPM. Those familiar with the Client Security Software (CSS) will find a strong family resemblance.

• Trusted Platform Module
- TPM is a hardware chip used to improve the security and trustworthiness of a computer. Prior versions of the technology, such as the IBM 4758, offered higher levels of security at much higher cost. The TPM chip brings security to the mass market.

Trusted Platform Module (TPM) Management is a new feature offered by Microsoft Windows. This feature will be available after the release of the Windows Server 2008 Network Operating System. The feature set includes the TPM Management console and an API called TPM Base Services (TBS). This architecture provides an infrastructure that allows Windows®-based applications to use and share the TPM.

The TPM has the ability to create cryptographic keys and encrypt them so that they can be decrypted only by the TPM. This process, often called "wrapping" or "binding" a key, can help protect the key from disclosure.

The TPM can also seal and unseal data generated outside of the TPM. With this sealed key and software like Microsoft Windows BitLocker™ Drive Encryption, you can lock data until specific hardware or software conditions are met.

With a TPM, private portions of key pairs are kept separate from the memory controlled by the operating system.


Summary - Topic 2

• This topic has enabled you to:
- Describe the System x high-performance server family of products
- List the non-scalable and scalable models
- Identify the system management components of IBM System x high-performance servers
- Describe the server security software features

This topic provided an overview of the models in this course.


Topic 3 – RAID Adapters and Enclosures

Here, we will discuss IBM RAID adapters and enclosures commonly associated with System x servers.


Topic Objectives

• By the end of this topic, you will be able to:
- Describe the RAID levels offered by the IBM ServeRAID adapter family
- List the ServeRAID adapter family of products
- Describe the IBM storage enclosures commonly found in the System x server environment

This topic describes the RAID levels, the ServeRAID adapter family and the IBM storage enclosures.

RAID Terminology

•Array
•A group of physical disks
•Logical Drive
•Who has control?

An array is a grouping of physical disks.

A logical drive is a term given to part or all of an array. An array can contain multiple logical drives. Logical drives are recognized by the OS as ‘physical’ disks.

RAID 0 (Striping)

•Data distributed evenly across all disks
•No redundancy or error correction
•Fastest performance for multiple concurrent requests

Figure: RAID 0 layout across four disks.

          Disk 1     Disk 2     Disk 3     Disk 4
Stripe 1  Block 1    Block 2    Block 3    Block 4
Stripe 2  Block 5    Block 6    Block 7    Block 8
Stripe 3  Block 9    Block 10   Block 11   Block 12
....
Stripe x  Block n-3  Block n-2  Block n-1  Block n

RAID-0 stripes (or spreads) data across multiple disk drives without parity protection in order to maximize DASD performance. Performance is improved with larger files because reads and writes are overlapped across all disks.

An additional benefit of RAID-0 is "drive spanning". With data spread across multiple drives in the array, the logical drive size is the sum of the individual drive capacities.

RAID-0 is the only level of RAID that does not provide any type of fault tolerance. In other words, the failure of one drive will cause the entire disk subsystem to fail.
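The block placement shown in the RAID 0 figure can be sketched in a few lines of Python. This is an illustration only, not ServeRAID firmware logic: the function names are hypothetical, blocks and disks are numbered from 1 as on the slide, and the real stripe unit size is ignored.

```python
def raid0_locate(block, num_disks):
    """Map a 1-based logical block number to (disk, stripe), both 1-based.

    Hypothetical helper: consecutive blocks go to consecutive disks,
    and a new stripe starts once every disk has received one block.
    """
    if block < 1 or num_disks < 1:
        raise ValueError("block and num_disks must be positive")
    disk = (block - 1) % num_disks + 1
    stripe = (block - 1) // num_disks + 1
    return disk, stripe


def raid0_capacity(drive_sizes_gb):
    """Drive spanning: the logical drive size is the sum of the drive sizes."""
    return sum(drive_sizes_gb)
```

With four disks, `raid0_locate(6, 4)` places Block 6 on Disk 2 in Stripe 2, which matches the figure.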

RAID 1 (Transparent Mirroring)

•Data written simultaneously to two identical disks
•Faster than a single disk for reads
•Reliability cost is 100% of protected drives

Figure: Disk 1 (Data) and Disk 2 (Mirrored Data); panel label: Disk Duplexing.

RAID-1 is either disk mirroring or disk duplexing. Disk mirroring duplicates the data from one disk onto a second disk using a single controller. Disk duplexing is the same as mirroring in all respects, except that the disks are attached to separate controllers. With duplexing, the server can tolerate the loss of one disk controller or one disk without losing the availability of the disk subsystem or the customer's data.

Since each disk is attached to a separate controller, performance and throughput may be further improved. NetWare splits seeks, reading half from the data drive and half from the mirrored drive.

RAID 1 Enhanced

•Allows disk mirroring with odd number of disks
•Stripes data and mirrored data across ALL disks
•Approximates RAID-0 performance

Figure: RAID 1E layout across three disks.

                 Disk 1    Disk 2    Disk 3
Data Stripe      Data 1    Data 2    Data 3
Mirrored Stripe  Mirror 3  Mirror 1  Mirror 2
Data Stripe      Data 4    Data 5    Data 6
Mirrored Stripe  Mirror 6  Mirror 4  Mirror 5

RAID 1E offers an enhanced version of RAID-1 that combines mirroring with data striping. The first stripe is for data and the second is for the mirrored data, offset by one drive. This allows for improved performance and increased flexibility in configuring mirroring for more than two drives.
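The "offset by one drive" rule can be made concrete with a short Python sketch. It is a hedged illustration that reproduces the labels from the slide's three-disk figure; the function name is hypothetical and the code does not claim to describe the controller's internal layout.

```python
def raid1e_layout(num_disks, data_stripes):
    """Return rows of labels, one list per stripe.

    Each data stripe is followed by a mirrored stripe in which every
    copy is shifted one drive to the right, wrapping to the first disk.
    """
    rows = []
    for s in range(data_stripes):
        base = s * num_disks
        data = [f"Data {base + d + 1}" for d in range(num_disks)]
        # Disk d holds the mirror of the block stored on disk d-1.
        mirror = [f"Mirror {base + (d - 1) % num_disks + 1}"
                  for d in range(num_disks)]
        rows.append(data)
        rows.append(mirror)
    return rows
```

For three disks, the first mirrored stripe comes out as Mirror 3, Mirror 1, Mirror 2, so no disk holds both a block and its own copy. That is why the scheme still works with an odd number of disks.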

RAID 5 (Data Striping with Parity)

•Stripes data and parity information, sectors at a time, across all disks
•Parity information is also striped across all disks
•Requires a minimum of three disks
•If any one disk fails, the data can still be accessed

Figure: RAID 5 layout across four disks (Disk 1 to Disk 4), stripes 1 to x. Each stripe holds three data blocks and one checksum block:

Stripe 1  Block 1, Block 2, Block 3      Checksum of blocks 1-3
Stripe 2  Block 4, Block 5, Block 6      Checksum of blocks 4-6
Stripe 3  Block 7, Block 8, Block 9      Checksum of blocks 7-9
....
Stripe x  Block n-2, Block n-1, Block n  Checksum of blocks n-2 to n

RAID-5 spreads both the data and the parity (checksum) information evenly across the disks, one block at a time, to ensure maximum read performance when accessing large files and to improve array performance in a transaction processing environment. This removes the bottleneck of storing all of the parity data on one drive.

•High transaction rate (good for random transactions)
•Drives operate independently (don't need to be in sync)
•Better server performance than RAID 2, 3 and 4
•Low reliability cost: capacity of 1 drive per array

The equivalent of one drive per array is used for the parity data, regardless of the size of the array. The capacity left for data storage is always N - 1 drives.
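RAID 5 parity is conventionally the bitwise XOR of the data blocks in a stripe, which is why any one missing block can be recomputed from the others. The Python sketch below shows that property on tiny byte strings. The function name and sample data are illustrative assumptions, and real controllers work on much larger stripe units.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks.

    Used both to compute the parity block for a stripe and to rebuild
    a lost block from the surviving blocks plus the parity block.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))
```

For a stripe holding `b"AAAA"`, `b"BBBB"` and `b"CCCC"`, the parity is `xor_blocks` of all three. If the disk holding `b"BBBB"` fails, XOR-ing the two surviving data blocks with the parity block returns `b"BBBB"`.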

RAID 5 Enhanced

•Stripes data, sectors at a time, across all disks with an additional stripe for parity information and hot-spare space

•Requires a minimum of four disks
•If any one disk fails, the data and parity information will be redistributed on the remaining drives (Logical Drive Migration)

•Capacity of n - 2 (n = number of disks)

Figure: RAID 5E layout, stripes 1 to x. Data blocks (Data 1 to Data 12), parity blocks and hot-spare (HSP) space are distributed across the disks.

RAID 5E is firmware-specific. You can think of RAID 5E as RAID 5 with a built-in spare drive. Reading from, and writing to, four disk drives is more efficient than three disk drives and therefore improves performance.

Additionally, the spare drive is actually part of the RAID 5E array. With such a configuration, you cannot share the spare drive with other arrays. If you want a spare drive for any other array, you must have another spare drive for those arrays.

Like RAID 5, RAID 5E stripes data and parity across all of the drives in the array. When an array is assigned RAID 5E, the capacity of the logical drive is reduced by the capacity of two physical drives in the array (that is, one for parity and one for the spare).

RAID 5E is a good choice because it offers both data protection and increased throughput, in addition to the built-in spare drive. RAID 5E gives you better utilization of the array's physical capacity than RAID 1, but RAID 1 offers better performance.

RAID 5E was superseded by RAID 5EE, which leaves room for the hot spare (HSP) in every stripe. Most prefer the RAID 5EE implementation.

RAID 6 Block striping with double distributed parity

•RAID 6 reserves the equivalent of two disks in the array for parity information and stores two separately calculated checksums on different disks

•Can survive the loss of two disks before data loss occurs
•Block striping with double distributed parity
•Two separate parity checksums to survive two disk failures

Figure: RAID 6 layout, stripes 1 to x. Data blocks (labels A0-A3, B0-B3, C2-C3, D1-D2 survive from the figure) and two sets of parity blocks (P0-P3 and PA-PD) are distributed across the disks.

RAID 6 is a newly emerging RAID level that has been designed to address modern data storage needs. As RAID arrays increase in size and complexity, the ability to survive more than one disk failure becomes more important to avoid catastrophic data loss.

RAID 6 provides block striping with double distributed parity. It reserves the equivalent of two disks in the array for parity information and stores two separately calculated checksums on different disks, so that the array can survive the loss of two disks before data loss occurs.
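The capacity rules given for the RAID levels in this topic can be collected into one small Python helper. This is a sketch under stated assumptions: the function name is hypothetical, all drives are taken to be the same size, and RAID 1E is treated like RAID 1 (half of the raw capacity), which follows from mirroring but is not a figure the slides state.

```python
def usable_drive_equivalents(level, num_drives):
    """Return how many drives' worth of capacity is left for data.

    RAID 0 loses nothing, RAID 1 costs 100% of the protected drives,
    RAID 5 uses one drive for parity, RAID 5E uses one for parity and
    one for the spare, and RAID 6 uses two drives for parity.
    """
    level = level.upper()
    if level in ("1", "1E"):
        return num_drives / 2  # 1E at 50% is an assumption, see lead-in
    overhead = {"0": 0, "5": 1, "5E": 2, "6": 2}
    if level not in overhead:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives <= overhead[level]:
        raise ValueError("not enough drives for this RAID level")
    return num_drives - overhead[level]
```

For example, four 300 GB drives give `usable_drive_equivalents("5", 4) * 300`, or 900 GB, in RAID 5, but only 600 GB in RAID 5E.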

SCSI ServeRAID Adapters

•ServeRAID 4 family
•Ultra160 SCSI with one, two or four channels
-RAID levels 0, 1, 1e, 5, 5e, 00, 10, 1e0, 50
-Support for up to 56 disks

•ServeRAID 5i, 6i
•Zero channel RAID adapter (works with onboard SCSI controller)
-Uses full ServeRAID software stack
-Has BIOS, firmware, device drivers, and utilities
-RAID levels 0, 1, 1e, 5, 00, 10, 1e0, and 50

•ServeRAID 6m
•Ultra320 SCSI with two channels
-RAID levels 0, 1, 1e, 5ee, 00, 10, 1e0 and 50

The NOS device drivers are model specific.
• The ServeRAID 4 adapter family shares the characteristics listed here. It comes in several different flavors (4L/4Lx, 4m, and 4H).
• The ServeRAID 5i and 6i adapters have no internal or external SCSI connectors. They use the server's onboard SCSI controller but enhance the basic features to provide support for additional RAID levels.
• The ServeRAID 6m is a dual-channel Ultra320 SCSI controller.

SATA and SAS ServeRAID Adapters

•ServeRAID 7t
•1.5 Gbps per port serial ATA (SATA) controller
-RAID levels 0, 1, 5, 10
-Up to four SATA disks on four separate ports

•ServeRAID 7k
- The option is shipped as a special memory DIMM with a battery attached (for battery-backup purposes)
- Memory is 256 MB, 133 MHz (PC2100) DDR1 memory
- RAID levels 0, 1, 5, 10

The ServeRAID 7t is designed for smaller servers that require RAID support with SATA disks. A maximum of four disks can be connected to the ServeRAID 7t.

It is unlikely that you will see a ServeRAID 7t in a high-end server, as the controller does not support the SCSI or SAS backplanes that are common in high-end models. However, a customer may choose to add such an adapter to a system that can support non-hot-swap disks.

The battery of the 7k adapter provides up to 33 hours of backup.

SATA and SAS ServeRAID Adapters

•ServeRAID 8i
•3.0 Gbps per port serial attached SCSI (SAS) controller
-RAID levels 0, 1, 5, 5ee, 6, 10, 1e0, 50, 60
-Up to eight SAS ports

•ServeRAID 8k
- This option is shipped as a special memory DIMM with a battery attached via wires (for battery-backup purposes)
- The DIMM is installed in a special DIMM socket in supported servers
- Battery is connected to the DIMM by wires and is typically mounted on the server chassis

The ServeRAID 8i and 8k were introduced to support the third generation Enterprise X-Architecture servers, as they are built around SAS disk subsystems.

The ServeRAID 8k option:
- Is shipped as a special memory DIMM with a battery attached via wires (for battery-backup purposes)
- The DIMM is installed in a special DIMM socket in supported servers
- Five DRAM chips on the DIMM
- "Adaptec ATB-200" on battery side
- Write-back cache memory is 256 MB, 533 MHz DDR2 unbuffered memory
- Battery is connected to the DIMM by wires and is typically mounted on the server chassis

ServeRAID 10 (MR10i, MR10k, MR10M)

•IBM ServeRAID-MR10i/10is SAS/SATA Controller

•IBM ServeRAID-MR10k SAS/SATA Controller

•IBM ServeRAID-MR10M SAS/SATA Controller

LSI 1078 RAID Adapter (MR10i/is, MR10k, MR10E)

• Eight-port SAS RAID adapter, two SAS connectors, 3 Gb/s throughput per port (full duplex)
• RAID levels 0, 1, 5, 6, 10, 50 and 60 with greater than 2 TB array support
• x8 PCI Express host interface
• Battery-backed 256 MB DDRII 667 MHz SDRAM DIMM module
• The 10is offers encryption/security
• Protects data in cache up to 72 hours during power loss or MegaRAID controller failure
• Allows system administrators to replace a failed adapter, while maintaining the data protected on the DIMM module for up to 72 hours
• iTBBU support
• 122 device support
• RoHS and WEEE compliant

EXP3000 Storage Enclosure

•Entry level disk storage
-2 U rack mount enclosure with 12 easily accessible bays
•Support for dual-port and hot-swappable SAS disks at 10,000 and 15,000 rpm speeds and SATA disks at 7,200 rpm
•3 Gbps Serial Attached SCSI (SAS) host interface technology
•Easy to deploy and manage with the DS3000 Storage Manager
•Combination of 12 SAS or SATA 3.5" drives per enclosure
•Scalable to 3.6 TB of storage capacity with 300 GB hot-swappable SAS disks or 12.0 TB with 1.0 TB hot-swappable SATA disks in the first enclosure
•Expandable by attaching up to three EXP3000s, a total of 14.4 TB of storage capacity with 300 GB SAS or up to 48.0 TB with 1.0 TB SATA
•Telco model supports -48V dc power supplies
•NEBS and ETSI compliance for AC and DC models

DS3000 Family

•Host-side features
-DS3200
  •SAS host-side connection
-DS3300
  •iSCSI host-side connection
-DS3400
  •Fibre Channel host-side connection
-One or two controllers available on all models
  •Two controllers provide host-side cable redundancy

•Disk-side features
-All models have SAS disk-side connections
  - SATA disks also supported
-Extension through up to three EXP3000 expansion units
  •Up to 48 disks per system
-One or two controllers
  •Two controllers provide disk path redundancy

The DS3000 family of storage servers provides flexible connection for external, managed storage. SAS, iSCSI and FC models are available. All disks can be SAS or SATA.

The host requires the appropriate host bus adapter for the chosen model (SAS adapter for DS3200, iSCSI adapter (Ethernet) for DS3300 and FC HBA for DS3400).

DS3200 Rear View

This picture shows the rear view of the DS3200 chassis with dual power supplies and ESMs.

DS3300 Rear View

Components covered

iSCSI Ports

This picture shows the rear of the DS3300.

DS3400 Rear View

This picture ends the series showing the rear view of the DS3400.

Summary – Topic 3

•This topic has enabled you to:
-Describe the RAID levels offered by the IBM ServeRAID Adapter Family
-List the common ServeRAID adapter family of products
-Describe some of the IBM storage enclosures commonly found in the System x server environment

This topic gave an overview of the IBM RAID levels and the currently offered ServeRAID adapters. It also discussed the storage solutions that IBM System x offers.

Topic 4 – High-performance Technologies Review

The prerequisite to this course introduced the design principles of IBM System x and xSeries servers and how to service them. This topic looks more closely at what these design principles mean in practice when servicing a System x or xSeries server.

Topic Objectives

•By the end of this topic, you will be able to:
-Describe the advanced technologies used in high-performance System x servers
-Describe the system management capabilities of high-performance System x servers

All System x servers support some of the more advanced technologies that IBM has designed and developed. This topic discusses these technologies and describes the implications of working with them in the field.

Processor Technologies

The industry standard server (Intel processor-based) takes many forms. There are a number of processor types in common use today. This section discusses some of the features of the Intel processor family and reviews some of the service implications when working with processor problems.

Processor Types

•Intel processors
•Dual processor capable
-Xeon DP
•Quad processor capable
-Xeon MP, EM64T (32/64-bit) and IBM XA64e 4th generation chipset
•Six Core processor capable
-Xeon Processor 7400 series 2.66 GHz/1066 MHz front side bus

•IBM Enterprise X-Architecture chipsets
•Enables scaling of Xeon MP and Itanium II systems beyond 4-way capability – up to 96 cores in a single ‘server’ (multi-node)

•AMD Processors
-AMD Opteron family of processors

The Intel processor family has several offerings in common use today. High-performance System x servers use all of the processors in the chart above.

It should be noted that, although the servers discussed in this course are multi-processor capable, not all servers you see in the field will actually have multiple processors installed. In many cases, the base server ships with one processor, with spare slots or sockets for additional processors as the customer’s needs grow.

The x3950 M2 provides an uncomplicated, cost-effective and highly flexible solution. With the ability to scale up to a maximum of 96 cores using Intel® six-core processors, while maintaining balanced performance between processors, memory and I/O, the x3950 M2 can easily accommodate business expansion and the resulting need for additional application space. The flexibility of the configurations allows a minimum of two CPUs to be populated per chassis, giving additional access to memory and I/O to address an organization’s specific application requirements. This allows for the creation of a 12-core, 32-DIMM server utilizing only two processor sockets for processor licensing-constrained applications, which can be scaled to a 48-core, 128-DIMM server utilizing only eight processors.

For servers equipped with AMD processors, IBM uses the Opteron multi-core parts.
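The x3950 M2 scaling figures quoted above can be checked with simple arithmetic. The Python sketch below is illustrative only. The 32 DIMMs per chassis comes from the 12-core, 32-DIMM example, while the four-chassis, four-sockets-per-chassis maximum is an inference from the 96-core and 128-DIMM figures rather than something the text states directly. The names are hypothetical.

```python
CORES_PER_PROCESSOR = 6   # Intel six-core processors, per the text
DIMMS_PER_CHASSIS = 32    # from the 12-core, 32-DIMM example


def x3950_m2_config(chassis, processors_per_chassis):
    """Return (total cores, total DIMM slots) for a multi-node configuration."""
    if chassis < 1 or processors_per_chassis < 1:
        raise ValueError("chassis and processors_per_chassis must be positive")
    cores = chassis * processors_per_chassis * CORES_PER_PROCESSOR
    dimms = chassis * DIMMS_PER_CHASSIS
    return cores, dimms
```

One chassis with two processors gives 12 cores and 32 DIMMs. Four chassis with two processors each (eight processors) give 48 cores and 128 DIMMs. Four chassis with four processors each reach the 96-core maximum.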

Processor and VRM Failures

•The SP detects the failure and handles the error
-SP will re-start the server
•SP holds the failed processor in reset to allow POST to complete
•Server resumes on remaining processors if possible
-Performance is degraded but users have access to the server’s resources

Help is available if a processor or VRM fails, as the Service Processor will log the event. When the SP detects a failed processor or VRM, it handles the error and attempts to make the server functional. The SP will deal with this situation by attempting to re-boot the server on any surviving processors.

Replacing a Failed Processor or VRM

•Service implications
-System restarts if there are good processors/VRMs remaining
-Processor slot may need to be manually re-enabled upon repair
- Following replacement of the failed component, run Setup (F1) to check processor slot status
-The PDSG will advise the correct part for VRMs (slot or system board)

•When you replace a failed processor or VRM:
-Check system-specific configuration requirements in the PDSG
- If necessary, re-enable the processor socket/slot and test the new processor
-Reboot the server and ensure POST message reflects all processors are active

A VRM failure may give the appearance that a processor has failed. The system event log should capture the specific details of the failure and enable you to identify whether it was the VRM or the processor that had the error.

If it is the VRM, it could be in one of several places in the server. Some System x and xSeries models have VRMs built into the system board, some have VRM slots and some have both. Your knowledge of the System x and xSeries model will help you to identify the exact location. Light Path Diagnostics will usually indicate the failing part. If no light is visible, or if you need to verify the failure, check the HMM/PDG for additional information.

Once the failed part has been identified and replaced, it is important that you test the system to make sure that the associated processor is functioning normally. Upon replacement of the component, the initialization of the processor slot may or may not be automatically detected by BIOS. It may be necessary to manually enable the processor slot before the system is restored to full functionality. Check the HMM/PDG for the correct procedure.

Memory Technologies

This section looks at IBM’s memory protection technologies and describes how these memory technologies change the behavior of a server and how you service it when memory faults occur.

Error Checking and Correcting (ECC) Memory

•Additional bits on a memory DIMM store checksum data to verify memory contents (72 bits vs. 64)
-During each write, a new checksum is calculated and stored in the additional bits on the DIMM
-During a read, the checksum is compared with the data bits and verifies data as valid and/or corrects single bit errors

•ECC memory is limited to Single Error Correct/Double Error Detect (SEC/DED)
-Memory in ‘critical’ mode must be replaced as only one ECC action is available at a time

Non-server systems traditionally use 64-bit (non-parity) memory, but the absolute minimum memory quality requirement for servers is ECC (72 bits). This memory type is standard across the System x and xSeries server range.

Due to the nature of memory configurations in modern servers, a single bit error is still the most common type of error. ECC offers the ability to detect and correct any single bit error and works well in most situations for most general purpose server requirements.

From a service perspective, if ECC is correcting a persistent error, the DIMM ultimately needs to be replaced. Unless the server has encountered a second, uncorrectable error, it is likely that the server will still be running. You may need to schedule a suitable time to replace the failing DIMM.
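The SEC/DED behavior described above can be demonstrated with an extended Hamming code, a textbook construction that protects 64 data bits with 8 check bits for a 72-bit word. The study guide does not say which code the memory controller actually uses, so the Python sketch below is an assumption chosen for illustration, and the function names are hypothetical.

```python
def ecc_encode(data_bits):
    """Encode a list of 0/1 data bits as an extended Hamming codeword.

    Index 0 holds the overall parity bit, power-of-two indexes hold
    check bits, and every other index holds a data bit.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:     # number of Hamming check bits
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):         # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code[1:]) % 2     # overall parity enables double-error detect
    return code


def ecc_decode(code):
    """Return (status, word); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(code)
    syndrome = 0
    for pos in range(1, len(word)):
        if word[pos]:
            syndrome ^= pos         # XOR of the positions of all set bits
    odd_parity = sum(word) % 2
    if syndrome == 0 and not odd_parity:
        return "ok", word
    if odd_parity and syndrome < len(word):
        word[syndrome] ^= 1         # single-bit error: syndrome is its position
        return "corrected", word
    return "uncorrectable", word    # for example, a double-bit error
```

Encoding 64 data bits yields a 72-element word. Flipping any one bit and decoding returns the original word with status `corrected`, while flipping two bits returns `uncorrectable`, which corresponds to the case where the DIMM must be replaced.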

ChipKill Memory

•ChipKill memory provides a higher level of error checking and correcting capabilities
- Uses standard ECC DIMMs
- Corrects up to 4-bit memory errors

•IBM patented technology performs “on-the-fly” correction
-Improves reliability 600 times over ‘standard’ ECC memory
-Especially important for business-critical applications where large amounts of memory are installed
  •1 in 5 servers with more than 1 GB of memory may have multi-bit errors each year
  •Large “Database Servers” can take many extra hours to recover from a system failure (for example, time to re-initialize the database)

Where standard ECC protection is not enough, many high-end IBM System x and xSeries servers now offer ChipKill support. This technology extends the basic ECC capabilities to be able to support the loss of an entire DRAM device on a DIMM, the equivalent of 4 bits of bad data.

Very large databases can take several hours to resynchronize, rebuild or restart following a shutdown, so a customer should factor this into service availability planning when deciding on a suitable memory technology for their server.

As with ECC, a memory system that has invoked a ChipKill event is likely to be running. You will not be able to simply take out the bad DIMM and replace it without scheduling a suitable time with the customer.

Hot Spare Memory

•Extends memory availability beyond ChipKill capabilities
-Some memory DIMMs are reserved for the hot-spare function
  •Total available memory will be reduced if hot-spare memory is enabled
-UNLESS MEMORY IS MIRRORED, memory DIMMs are NOT hot removable!
  •System must be brought down to replace bad memory
-Function needs to be enabled in BIOS (customer choice)
  •Function is supported on the System x 3800

Hot-spare memory reserves a bank of memory to ‘cover’ the user memory in the server.
• The extra/hot-spare memory is idle until it is needed.
• The Service Processor monitors memory performance and tracks errors.
• Before the ECC threshold is reached, the failing memory is copied to the hot-spare DIMMs during the refresh cycle, and the questionable memory is switched off.
• In order for this to work, the failure must be correctable by ECC or ChipKill correction algorithms and the memory swapped by the controller before a fatal error halts/crashes the NOS.

Traditionally, for hot-spare memory to work, all memory in all banks must be identical.

Memory ProteXion

•Combination of technologies for ultimate memory reliability
-Spare bits on each DIMM can be used to move data from failed bits (up to 2 bits) on DRAMs to good spares
-Memory can also be mirrored for ultimate protection
  •Mirroring is supported on x3950, x3950E, x3850M2 and x3950M2

Memory ProteXion is the term given to the memory system of a number of System x and xSeries servers that are based around the IBM Enterprise X-Architecture chipsets.

• The memory configuration provides for spare bits on each DIMM. If a bit of memory goes bad, it will be moved by the memory controller to a new location on the DIMM. (Routinely, ECC has taken 8 extra bits out of 72 to provide ECC protection, but recent innovations at IBM have found a way to do that with only 6, leaving two spare bits per 72 pin memory DIMM.)

• Memory can also be mirrored. In a mirrored configuration, half of the memory is reserved for the copy so the total maximum possible memory is reduced by half.

• As with all memory failures, the bad DIMM must ultimately be replaced to avoid further failures stopping the NOS. But ONLY in a mirrored memory configuration, will you be allowed to “hot replace” a failed memory DIMM.

• If you are unable to ‘hot’ replace a failed DIMM, you are likely to need to schedule downtime on high performance System x and xSeries servers as they are built to survive even serious memory faults.

Memory Mirroring

•Requires two memory ‘ports’ and supporting hardware
-Memory controller and BIOS work together to create an exact duplicate of one port in the other
-Mirrored memory systems enable ‘hot replace’ of defective DIMMs

Here are some basic rules for memory mirroring to work.

When a DIMM Fails in a Server

•SP detects the failure and handles the error
-SP re-starts the server if memory is not mirrored
- SP will hold the failed DIMM or bank of DIMMs in reset to allow POST to complete
- Server resumes on remaining DIMMs if possible
- Performance is degraded but users will be able to use the server’s resources

•Service implications
-If memory is mirrored, the server will be running and it may be possible to remove the failed DIMM without stopping the NOS
-If memory is not mirrored, the system may have restarted itself if there was good memory remaining
- If so, it will be necessary to shut down the server to make repairs
-Memory slot or bank may need to be manually re-enabled
- Following replacement of the failed component, run Setup (F1) to check memory slot status

When the SP detects a failed DIMM, it handles the error and attempts to make the server functional.

• If memory is mirrored, the hardware will have switched off the port containing the bad DIMM. In this case, you may be able to remove the failed DIMM without stopping the NOS. The procedures for removing a failed DIMM in a mirrored configuration are contained in the HMM or PDG.
• In a system without mirrored memory, you will need to shut down the server to replace a failed DIMM. Upon replacement of the component, the initialization of the memory slot may or may not be automatically detected by BIOS.

It may be necessary to manually enable the DIMM slot or a bank of DIMMs before the system is restored to full functionality.
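The service decision described above depends on a single question: is the memory mirrored? The Python sketch below lays out the resulting steps as an ordered checklist. It is a hypothetical aid for study purposes, not an IBM tool, and the HMM or PDG for the specific model remains the authority for the actual procedure.

```python
def dimm_replacement_plan(memory_mirrored):
    """Return the ordered service steps for replacing a failed DIMM."""
    steps = []
    if memory_mirrored:
        # The hardware has switched off the port containing the bad DIMM.
        steps.append("Follow the HMM/PDG hot-replace procedure; "
                     "the NOS may be able to keep running")
    else:
        steps.append("Schedule downtime and shut down the server")
    steps.append("Replace the failed DIMM")
    steps.append("Run Setup (F1) and check the memory slot status")
    steps.append("Manually re-enable the DIMM slot or bank "
                 "if BIOS did not detect it automatically")
    return steps
```

Calling `dimm_replacement_plan(False)` starts with the shutdown step, while `dimm_replacement_plan(True)` starts with the hot-replace procedure. Both end with the Setup check and, if needed, the manual re-enable.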

Active PCI, PCI-X and PCI-Express

Active PCI, PCI-X and PCI-Express were developed to add the ability to hot add, remove and replace adapters and controllers to a system without the need to shut down the NOS. This section describes the technology and how to work with it.

Active Slot Implementation

•All 4-way and some 2-way servers have ‘Active’ slots
-Additional hardware supports sensing and power control
-The OS needs to be able to support Active slots
  •Device drivers are aware of the power state of the hardware
•Individual adapters need to be supported by intelligent drivers
-This is typically supplied by the adapter manufacturer
-In a redundant adapter configuration, failed adapters can be removed and replaced without shutting down the OS

While not exclusive to high-end servers, Active PCI technology is common to all high-end System x and xSeries servers.

• Active PCI (and Active PCI-X) enables the option to potentially add, remove and replace adapters while the NOS is running. Device drivers are needed to support both the technology and any adapters that will make use of the technology. Where two adapters are coupled together in a redundant configuration, for example two network adapters, a failure can be fixed without stopping the NOS.

Active PCI requirements

Hardware
• Interlock Switch
• 2 LEDs per Active PCI slot
  - Power
  - Attention

Software
• Device Driver
  - Adapter manufacturer
• System Driver
  - Machine manufacturer
• System Service
  - Operating System manufacturer

Servicing a Server with Active Slots

• If an adapter fails in an Active bus, it can be removed without stopping the OS
- Procedures vary according to the OS and adapter
• Simply removing power from a failed adapter may crash the server, even though the adapter has failed
- Adapter-specific procedures must be followed
- When a working adapter is reinserted, additional steps may be needed to activate it

If you are called to a server which has Active slots enabled and working, you will need to consult with the customer before attempting to replace a failed adapter.
• Any customer who adopts this technology will be reluctant to let you stop the NOS to replace the failed adapter, and you may be required to replace the adapter ‘hot’.
• Procedures vary from NOS to NOS and from adapter to adapter. In general, however, the NOS is informed that an adapter is about to be removed, and the Active PCI/PCI-X switch card is used to remove power from the slot prior to removal.

When you have completed the repair and fitted the replacement adapter, the NOS may need to be told that the repair is complete.

Service Processors

Here, we look at the system management hardware (Service Processors) you will find in high-performance System x and xSeries servers.

Service Processor Types

• Baseboard Management Controller (BMC)
- IPMI-compliant service processor
• Stores event log and other system information
• Accessible via an Ethernet connection to the (shared) eth0 port of the host
- Accessible via <F1> Setup or <F2> Diagnostics but requires the OS to be stopped
• Remote Supervisor Adapter II (RSA-II)
- Powerful, Ethernet graphical (Web) management tool
• Standard on x3950, x3950E, x3850 M2 and x3950 M2
• Optional on x3755, x3800 and x3850

Service Processors (SP) are often divided into two groups.

Basic Service Processor (BMC)
- Runs on the 5 V “continuously-on” power, and is used to power the server on and off
- Monitors the I2C bus for sensor activity, and stores logs and information about events
- Responds to issues and errors (light path diagnostics, fans, reboots)
- Provides limited information access to the machine while powered off (if the machine is plugged in)

Advanced Service Processor (RSA2)
– Runs on the 5 V “continuously-on” power and monitors/collects information from the BMC
– Can be programmed to page out support personnel when a problem occurs
– Powerful web interface for easy remote management
– Remote video, remote control, push down code features

Baseboard Management Controller (BMC)

• Independent microcontroller used to perform low-level system monitoring and control functions.
• BMC functions:
- Initial system check out at AC on
- BMC event log maintenance
- System power state tracking
- System initialization
- System software state tracking
- System event state monitoring
- System fan speed control
• IPMI BMC event log messages
- Contain information, warning and error messages

The Intelligent Platform Management Interface (IPMI)-compliant Baseboard Management Controller is for system health monitoring and management. The BMC maintains a system event log (SEL) that can be accessed during POST via the F1 key sequence and while the host OS is running if it is configured for remote access.

Remote Supervisor Adapter (RSA)

• RSA, RSA II and RSA II Slimline
• Management independent of server or OS status
- Full remote control of hardware and OS (only direct via LAN)
- Remote power control
- Remote flash update
- User administration and security
• Additional features of RSA II
- Default fixed IP address (192.168.70.125)
- Dongle provides RS-485 and serial ports
- Host video is provided by the RSA II
- RSA II Slimline does not provide host video
• Management tool access
- IBM Director, Telnet, ANSI terminal, Web browser

Remote Supervisor Adapters (RSA) are full-featured management adapters that provide both in-band and out-of-band management capabilities, including full remote control.

Through the RSA and RSA II, you can interrogate and manage logs, control and monitor the power state of the host server, apply flash updates to the host and any attached I/O expansion enclosures, and take full remote control of the host console while the NOS is running.

RSAs support the following:
• Web-based management: A small web server embedded in the adapter lets you connect through the dedicated LAN port and use an HTML-based interface to configure and monitor the server.
• Remote graphic console redirection: When connecting through the dedicated LAN port, the card can capture video data and perform a complete console redirection with text, graphics, keyboard and mouse support.
• DNS/DHCP support: In addition to static IP configuration, the RSA supports DHCP and DNS. Placing the card on a network with a DHCP server configures it automatically, avoiding the need to run configuration routines through the management software.
• NT blue screen capture: The most recent OS failure screen can be captured, avoiding the need to restart the server to reproduce the error.
• Attach event log to e-mail alerts: The event log can be sent as an e-mail attachment to administrators to notify them of any problem that affected the server.
• DB-9 connector (RSA only): The card has a standard DB-9 connector, making cabling easier.
• Externally visible LEDs: Power and error LEDs are on the rear bezel, removing the need to lift the covers in order to check the status of the card.

RSA II Adapter Features / Layout

1. Status LEDs (Heartbeat & Power - heartbeat blinking, power solid during normal operation)

2. Pinhole Reset (Service Processor Software Reset)

3. Mini-USB Connector (Host OS Comm. / Remote Disk,Mouse,Keyboard)

4. External Power Supply

5. RJ45 Ethernet Connector (Web Interface)

6. DB15 VGA Video Connector (Host Video)

7. Video Compression Memory

8. Non-Serviceable Clock Battery

9. Video Compression Chip

10. Remote Floppy,Mouse,Keyboard Chip

11. ATI Radeon 7000VE (a.k.a RV-100) (Video)

12. PCI Connector (System Video)

13. Ethernet PHY

14. Flash Memory (Service Processor)

15. PowerPC CPU (Service Processor)

16. Video Memory (System Video)

17. CPU Memory (Service Processor)

18. Real-time Clock

The RSA2 adapter replaced the RSA adapter starting in 2003 and currently comes in several slightly different versions. The photos above show some of the features of the full RSA II adapter (for example, it is mounted on its own video card). The RSA2 SlimLine adapter mounts on an existing video adapter in many of the newer System x servers. There is also an RSA2 SlimLine Refresh 1 and an RSA2-EXA adapter. The essential differences between these versions are described on the following website: http://www.redbooks.ibm.com/abstracts/tips0146.html

The RSA II adapter is a complex, half-size adapter that needs to be flashed for the supported server in which it is installed. Depending on the level of code installed in the RSA II, the adapter can be reset with a paper clip using either a 5-5-10 second sequence (5 seconds pushed, 5 seconds not pushed, 10 seconds pushed) or a straight 10-second push.

A reset returns the adapter to factory defaults and causes it to reboot. The adapter then tries for two (2) minutes to obtain a DHCP address before falling back to 192.168.70.125 if it cannot find a DHCP server.
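The fallback behaviour after a reset can be pictured as a small decision function. This is only an illustrative sketch, not RSA II firmware code: the function name `address_after_reset`, its parameters, and the way a DHCP answer is modelled are assumptions. Only the two-minute window and the 192.168.70.125 default come from the text above.

```python
from typing import Optional

DHCP_TIMEOUT_SECONDS = 120            # "two (2) minutes" from the text
DEFAULT_STATIC_IP = "192.168.70.125"  # factory default address from the text


def address_after_reset(dhcp_offer: Optional[str], seconds_waited: int) -> Optional[str]:
    """Return the address the adapter would be expected to use.

    dhcp_offer is the address handed out by a DHCP server, or None if no
    server has answered yet (a hypothetical model, not firmware behaviour).
    Returns None while the adapter would still be waiting for DHCP.
    """
    if dhcp_offer is not None and seconds_waited <= DHCP_TIMEOUT_SECONDS:
        return dhcp_offer
    if seconds_waited >= DHCP_TIMEOUT_SECONDS:
        return DEFAULT_STATIC_IP
    return None
```

In practice this means that, after a reset on a network without DHCP, it is worth waiting at least two minutes before trying to reach the adapter on the default address.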

RSA II Web Interface

In this picture you can see an example of the interface that will be presented to the user when connecting an RSA II through a Web browser.

Event Log

• If a failure occurs in the system:
• A fault LED on the operator panel card is illuminated
• Event log information can be viewed through the RSA Web interface

Through the RSA, you can interrogate and manage event logs to assist in problem isolation and repair.
Note: you can access the RSA II event logs even if the host is in standby power mode.

Flash Updates

• The RSA-II has upgradeable BIOS and firmware
- The software can be installed from within a supported operating system
- Microsoft and Linux executable files are available for download
- A remote flash image file is also available
- Updates are applied via the RSA II web interface
1. Browse for downloaded file
2. Select to Update

The IBM Remote Supervisor Adapter II has three different update package options: a Windows update package, a Linux update package, and a Zip file package. (e.g. sample web link is

The Windows and Linux update packages can be installed from one of these NOSs. (e.g. provided that the NOS driver for the RSA2 is installed)

The Zip file package is used to update the RSA2 adapter from the Web interface. The package consists of a readme, a change history and the Zip file containing the following PKT files.
• PAETBRUS.PKT is traditionally the name of the Boot ROM file
• PAETMNUS.PKT is traditionally the name of the Main Application file

If access to the server is possible, these components can be updated with the use of flash images. Images can be downloaded from the IBM support Web site and used to make the necessary diskettes.
If access to the server is not possible, or if the RSA2 is under management through a Web browser, updates can also be applied via the Web browser connection. In this case, the update images are different but can still be downloaded from the IBM support Web site.

Console Redirection

• Both text and graphics redirection are available
- Hardware connection requirements:
• Text redirection is available through the serial port as well as the Ethernet connection
• Graphical redirection works through the Ethernet connection only
- Software requirements for remote POST screens, remote Setup and remote Diagnostics:
• Terminal program or IBM Director or a Web browser
• Supported Java engine
- During boot, the RSA2 adapter can be loaded with a diskette or CD image file, and the server can boot from this image file.

Console redirection can be very useful for diagnosing problems where access to the server console is required.
Using a variety of connection methods and software interfaces, the RSA gives full remote control capabilities.
Depending on the level of access to the hardware, you can perform almost any task that you could perform while actually standing at the server itself. If the RSA Ethernet port is connected to the customer LAN, you can even take control of the server from another location (in theory, anywhere in the world), provided you know the IP address of the adapter and have the necessary security permissions to access the interface.

This facility is very powerful and must be used with extreme care. Also, accessing a server console in this way should only be undertaken with the permission of the customer.

Another very important RSA2 feature is “Remote Disk”, which lets the server access a file (a diskette or CD image file) through the RSA2 adapter. The image is first loaded onto the RSA2 adapter. Then, when the server is rebooted, its boot sequence can be altered (e.g. by pressing F12) to boot from the image. The server then boots from the remote file as if it were an attached diskette drive or CD-ROM drive. This can be used to flash the various server hardware components remotely.

SP Functions Comparison

Feature / Function                 BMC              RSA II

Monitoring
  POST, Loader, O/S Timeouts       Yes              Yes
  PFA on system components         Yes              Yes
  Optional Power Source            No               Yes
  Interface with Light-Path        Yes              Yes
  Environmental Monitors           Yes              Yes
  Capture Windows Blue Screens     No               Yes

Alerting
  SNMP Traps                       No               Yes*
  SMTP Email                       No               Yes
  Alert to pager                   Yes              Yes
  SNMP via PPP                     No               No
  Automatic Server Restart         Yes              Yes

Connectivity
  10/100 Ethernet                  Yes (shared)     Yes
  DHCP support                     No               Yes
  DNS support                      No               Yes
  PPP                              No               Yes
  Shared serial support            No               Yes

Management/configuration
  ANSI-based Management            Yes (via SoL)    Yes
  Director-based Management        Yes              Yes
  Telnet-based Management          Yes              Yes
  Web-based Management             No               Yes
  Remote BIOS Update               No               Yes**
  Remote Control                   No               Yes
  Remote POST / Diagnostics        No               Yes
  View Status Logs                 Yes              Yes
  View Vital Product Data          Yes              Yes

Here is an ‘at a glance’ comparison of the monitoring and alerting capabilities of the Service Processors found in System x servers.
*Only SNMPv1 traps supported.
**Direct flashing of BIOS/Diags firmware is not supported (can be done using the remote disk feature instead).

Summary

• This topic has enabled you to:
- Describe the advanced technologies used in high-performance System x servers
- Describe the system management capabilities of high-performance System x servers

This topic discussed the technologies incorporated into high-performance System x servers.

Topic 5 – Working With Scalable Systems

This topic discusses the service implications of working with scalable systems.

Topic Objectives

• By the end of this topic, you will be able to:
- Define the terms scalable, node, complex and partition
- Describe the data and system management cabling of a scaled system
- Describe how to use the RSA II Web interface to configure a scalable partition on an x3950, x3950E and x3950 M2

When servicing multi-node systems, it is important to understand the relationship between nodes in the partition and how the partition is wired together.

Scalability Terminology

• Scalable
- A system that is able to join with another computing resource to act as a single, larger ‘server’
• Node
- A single computing resource (a server)
- Capable of operating alone or joined (scaled)
• Complex
- Two or more nodes
- Joined together physically
• Partition
- A complex that is running a single instance of an OS

The term ‘scalable’ is used to describe a device that has the ability to operate in a joined fashion, along with another computing device, to appear as a single, large ‘server’.
A node is the smallest unit of a scaled system. A node can operate standalone, as well as in a complex.
A complex is a collection of nodes, physically joined together to form a large computing resource.
A partition is a complex that is running a single instance of an OS across all processors and memory in the complex.
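The relationship between these terms can be sketched as a small data model. This is purely illustrative: the class and field names (`Node`, `Complex`, `Partition`, `complex_`, `os_name`) are hypothetical and do not correspond to any IBM tool or API, and the example figures are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A single server that can run alone or be scaled with others."""
    name: str
    processors: int
    memory_gb: int


@dataclass
class Complex:
    """Two or more nodes that are physically joined together."""
    nodes: List[Node] = field(default_factory=list)

    def is_valid(self) -> bool:
        # A complex needs at least two nodes.
        return len(self.nodes) >= 2


@dataclass
class Partition:
    """A complex running one OS instance across all processors and memory."""
    complex_: Complex
    os_name: str

    def total_processors(self) -> int:
        return sum(n.processors for n in self.complex_.nodes)

    def total_memory_gb(self) -> int:
        return sum(n.memory_gb for n in self.complex_.nodes)


# Example: two 4-way nodes merged into one 8-way partition.
two_node = Complex([Node("node1", 4, 16), Node("node2", 4, 16)])
partition = Partition(two_node, "example-os")
```

The point of the model is that the OS sees only the partition totals, while service actions are still carried out on individual nodes.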

Scalability Schematics – 8-way and 16-way

[Figure: cabling schematics. Panels: x3950 / x3950E 8-way configuration (two x3950/x460/MXE nodes); x3950 M2 (two nodes, upper SMP module); x3950 16-way configuration (four x3950/460/MXE nodes). Each node is shown with scalability Ports 1, 2 and 3, an RSA and a BMC.]

Here are the cabling schematics for 8-way and 16-way operation across all the supported scalable systems.
When scaling the x3950, x3950E, xSeries 460 or MXE 460 to an 8-way partition, or any of the above systems to a 16-way partition, the RSAs play a key part in creating and maintaining the partition, as they hold the partition data and maintain communications between all nodes in the partition.
The data flows across the scalability cables. Each node contains a scalability controller (part of the XA chipset) that is effectively a high-speed switch. Each node above is directly connected to every other node, so much of the switching technology embedded in the controller is not used.
Note that Ethernet hubs are used in all but the simplest partitions, as there are many devices that need to connect to a common management LAN in order for scaling to work while still providing real-time access to management processors and functions.

Scalability Schematic – 32-way

[Figure: xSeries 460 32-way configuration. Eight x460/MXE nodes, each shown with scalability Ports 1, 2 and 3, an RSA and a BMC.]

Here is the cabling schematic for a 32-way x3950, x3950E or xSeries 460/MXE 460 partition.
As you can see from the scalability cabling in this schematic, each node is directly connected to three other nodes in the partition. This time, each node acts as a router to the nodes that are not directly connected, fully exploiting the switching capabilities of the scalability controllers in the nodes. Without the ability to maintain routing tables, it would not be possible to scale eight nodes together.
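Why routing becomes necessary with eight nodes and three ports each can be shown with a short breadth-first search. The wiring used here is an assumption chosen only for illustration: nodes are numbered 0 to 7 and each is linked to the three nodes whose number differs in one bit (a cube). The actual cable layout is the one in the schematic, not this sketch.

```python
from collections import deque
from typing import Dict, List


def cube_links() -> Dict[int, List[int]]:
    """Hypothetical wiring: 8 nodes, each cabled to 3 others (a cube)."""
    return {n: [n ^ (1 << bit) for bit in range(3)] for n in range(8)}


def hops_from(source: int, links: Dict[int, List[int]]) -> Dict[int, int]:
    """Breadth-first search: cable hops from source to every node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return hops


links = cube_links()
hops = hops_from(0, links)
direct = sorted(n for n, h in hops.items() if h == 1)   # reached over one cable
routed = sorted(n for n, h in hops.items() if h > 1)    # reached via another node
```

With three ports, a node can reach only three of its seven peers directly. In this assumed wiring the remaining four are reached through intermediate nodes, which is the routing role the text describes.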

Scalability Requirements

• All RSAs in the partition must be connected
- An Ethernet hub is required for more than two nodes
• All scalability cables must be fitted
• BIOS and firmware levels must match across all nodes
• Previous partition information must be deleted
- Stale partition descriptor data may cause nodes to fail to merge

Here are the basic rules that will allow multiple nodes to merge into a partition.
Before a partition can merge, however, parameters must be set to identify all nodes in the partition.
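These rules lend themselves to a pre-merge checklist. The sketch below is a hypothetical helper, not an IBM utility: the `NodeState` record, its fields and the version strings are invented for illustration, and the values would have to be gathered by hand or from the RSA Web interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    """Hypothetical snapshot of one node before a merge attempt."""
    name: str
    bios_level: str
    firmware_level: str
    rsa_connected: bool
    cables_fitted: bool
    stale_partition_data: bool


def merge_blockers(nodes: List[NodeState], hub_present: bool) -> List[str]:
    """Return the reasons why the nodes might fail to merge."""
    problems: List[str] = []
    if len(nodes) > 2 and not hub_present:
        problems.append("Ethernet hub required for more than two nodes")
    if len({n.bios_level for n in nodes}) > 1:
        problems.append("BIOS levels do not match")
    if len({n.firmware_level for n in nodes}) > 1:
        problems.append("firmware levels do not match")
    for n in nodes:
        if not n.rsa_connected:
            problems.append(f"{n.name}: RSA not connected")
        if not n.cables_fitted:
            problems.append(f"{n.name}: scalability cables not fitted")
        if n.stale_partition_data:
            problems.append(f"{n.name}: previous partition information not deleted")
    return problems


nodes = [
    NodeState("node1", "1.10", "2.05", True, True, False),
    NodeState("node2", "1.08", "2.05", True, True, True),
]
report = merge_blockers(nodes, hub_present=False)
```

In this example the report lists the BIOS mismatch and the stale partition data on node2, both of which would need to be resolved before the merge is attempted.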

Partitioning Overview

• System x and xSeries servers use static partitioning
- A less complicated hardware implementation that can operate with current commodity operating systems
• Static partitioning (SPAR) is a model of partitioning that enables reconfiguration of a multi-node complex along nodal boundaries after shutdown and restart of the affected partitions, rather than the entire complex.
• The key feature of SPAR is the ability to independently manage and service individual partitions through software without having to shut down, physically power down, power up, and restart unaffected partitions

Static partitions are those that require a reboot to change the configuration.
This is a simplified model that fits well with existing OSes, which rely on hardware to ‘mask’ the fact that they are running on processors and memory from several physical nodes.

Configuration

• To create a complex:
- Flash BIOS, BMC and RSA of all nodes to same levels
- Gather IP addresses (static or dynamic) for all RSAs
• SP networks must either have static IP addresses or have DHCP leases to maintain consistent IP addressing
- The IP addresses that are assigned to the RSAs must not change once nodes are scaled and running
- This is true for static IP addresses and DHCP leases
- For x3950, x3950E, xSeries 460 and MXE 460:
• Define the RSA and BMC IP addresses in <F1> Advanced Setup
• Define the partition details in the RSA II Web interface
- Partition tables still exist and are stored on each local RSA
- Partition tables still exist and are stored on each local BMC

Before attempting to create a complex, ensure that BIOS, BMC and RSA firmware match across all nodes and that RSA clocks match. By doing this, if a failure occurs, the information written to the event logs will correlate. The configuration of a complex is performed in one of two places, depending on the node type. For older systems, the configuration is created and stored via the <F1> Setup program. On newer systems, all configuration tasks are performed through the RSA II Web interface.

Here are the instructions to create a complex using <F1> Setup.

Scalable Partitioning Using <F1> Setup

• Boot the chassis that will be the primary node
• Enter <F1> Setup
- Select <Advanced Setup> on the main menu and <Static Partition Information>
- In <Secondary Host Name>, enter the IP address of the secondary node RSA
- Navigate to <Save Static Partition Information> and press <Enter>
- Navigate to <Start Options> on the main menu
- Set <Boot Fail Count> to <Disabled>
- Power down the primary node and remove AC
• Boot the chassis that will be the secondary node
• Enter <F1> Setup
- Navigate to <Start Options> on the main menu
- Set <Boot Fail Count> to <Disabled>
- Do not enter any information regarding the primary node TCP/IP address
- Power down the secondary node and remove AC

Scalable Partitioning Using the RSA II Web Interface Sub Menus

• The following tasks can be performed:
- Obtain status
- Create a partition
- Control a partition
- Delete a partition

Status
View current and new scalable partition data in the graphical user interface provided by the RSA-2 Scalable Partitioning Web interface. This menu is automatically displayed after each task below (create, control and delete) has completed.

Create Partition Task
Create new scalable partitions with the RSA-2 Scalable Partitioning Web interface.

Control Partition Task
Control new and current scalable partitions with the RSA-2 Scalable Partitioning Web interface. Controls are:
1. Moving new partition to current partition. The new partition is a staging area for current partitions. New partitions can be created while current partitions are running.
2. Starting current partitions
3. Stopping current partition

Delete Partition Task
Selections are:
1. Delete Partition Settings on all ASM's members of New Scalable Partition.
2. Delete Partition Settings on all ASM's members of Current Scalable Partition.
3. Delete Partition Settings only for this (local) ASM Member of Current Scalable Partition.

Level 4 Cache Considerations

• Partitions are supported by L4 cache to speed communications across the processor buses
- On earlier scalable systems (the xSeries 440, 445, and 455), L4 cache is separate from main memory
- On the x3950, x3950E, xSeries 460 and MXE 460, the scalability chip has an integrated L4 Scalability Memory Cache (SMC) which utilizes main memory
• When BIOS reports available memory per node to the O/S, it must first subtract the scalability cache size (256MB)

On first and second generation scalable systems, the L4 cache was physically separate from main memory, so all main memory is available to the OS.
On third generation scalable systems, the cache controller utilizes host memory for the cache. The customer will notice a difference between reported memory (that which is available to the OS) and physically installed memory.
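The size of that difference is simple arithmetic. The sketch below assumes the only deduction is the 256 MB scalability cache per node stated on the slide; any other reservations made by BIOS or the OS are ignored, and the function name and example memory sizes are hypothetical.

```python
SCALABILITY_CACHE_MB = 256  # per-node figure from the slide


def reported_memory_mb(installed_mb_per_node):
    """Memory reported to the OS after the per-node cache is subtracted."""
    return sum(installed - SCALABILITY_CACHE_MB for installed in installed_mb_per_node)


# Example: two nodes with 16 GB (16384 MB) installed in each.
installed = [16384, 16384]
reported = reported_memory_mb(installed)
difference = sum(installed) - reported
```

For this two-node example, 512 MB of the 32768 MB installed would not be reported to the OS, which is expected behaviour rather than a memory fault.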

Scaled System Management Considerations

• When a partition is merged:
- Diagnostics (<F2>) is only available at node level
- Light Path applies to each individual chassis, not the complex
- The Service Processor must be functional and the SP LAN connected to others in the group
- The OS does not know physical boundaries and sees the complex as one system
- Event logs map memory and PCI bus information to a chassis in addition to a slot
- Processor speeds must be the same within and across all chassis
- Multi-chassis configuration code on the RSA II is only available if the Scalability Cartridge Assembly is detected in an x3950, x3950E, xSeries 460 or MXE 460

Here are some things to remember when working with partitions and scaled systems.

Scalability Port Test from Diagnostics

Scalability ports can be tested using System Diagnostics <F2>, under the Basic menu option from each chassis.
The new Scalability Port Test is an interactive diagnostic test that requires the user to follow the text on the screen.

x3950 M2 Scalability Overview

• The x3950 M2 can be scaled to create a multi-node partition that is running a single instance of an OS

The term ‘scalable’ is used to describe a device that has the ability to operate in a joined fashion, along with another computing device, to appear as a single, large ‘server’.
A node is the smallest unit of a scaled system. A node can operate standalone, as well as in a complex.
A complex is a collection of nodes, physically joined together to form a large computing resource.
A partition is a complex that is running a single instance of an OS across all processors and memory in the complex.

x3950 M2 Scalability

• Scalability configurations supported are 2, 3 and 4 nodes
• Port cabling same as x3950
- New cables (deep plug with iPass connectors)
• Scalability key required to enable scalability
- Key plugs into the processor board at the J14 connector
• 4 GB minimum is required for successful boot
- One processor and two DIMMs minimum in each node
• Only USB keyboard and mouse are supported to boot standalone
- Press the remind button to initiate a standalone boot, as USB devices are not initialized at the start of the merge process

A configuration can have one or more scalable partitions. Each scalable partition supports an independent operating system installation. The scalable partition uses a single, contiguous memory space and provides access to all associated adapters and hard disk drives. PCI slot numbering starts with the primary node and continues with the secondary nodes, in numeric order of the logical node IDs.

Before you create scalable partitions, read the following information:
• Make sure that all nodes in the multi-node configuration contain the following software and hardware:
– The current level of BIOS code, SAS BIOS code, service processor firmware, BMC firmware, and FPGA firmware. Note: To check for the latest firmware levels and to download firmware updates, go to http://www.ibm.com/systems/support/.
– Microprocessors that are the same cache size and type, and the same clock speed.
• Make sure that each node contains the following hardware:
– A minimum of one microprocessor and one memory card with one pair of DIMMs. Note: The nodes can vary in the number of microprocessors and the amount of memory each contains, above the minimum.
– A ScaleXpander key on the microprocessor board to enable multi-node operation
• Make sure that the primary node contains a minimum of 4 GB of memory

The Scalability installation Option documentation is available
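The hardware prerequisites listed above can be turned into a quick sanity check before cabling. This is a hypothetical sketch only: the `M2Node` record and its fields are invented, the first list entry is assumed to be the primary node, and firmware and processor-matching checks are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class M2Node:
    """Hypothetical inventory record for one x3950 M2 node."""
    name: str
    microprocessors: int
    dimms: int
    memory_gb: int
    has_scalexpander_key: bool


def m2_config_problems(nodes: List[M2Node]) -> List[str]:
    """Check the minimums from the notes; nodes[0] is assumed to be primary."""
    problems: List[str] = []
    if not 2 <= len(nodes) <= 4:
        problems.append("supported configurations are 2, 3 or 4 nodes")
    for node in nodes:
        if node.microprocessors < 1:
            problems.append(f"{node.name}: at least one microprocessor required")
        if node.dimms < 2:
            problems.append(f"{node.name}: at least one pair of DIMMs required")
        if not node.has_scalexpander_key:
            problems.append(f"{node.name}: ScaleXpander key missing")
    if nodes and nodes[0].memory_gb < 4:
        problems.append(f"{nodes[0].name}: primary node needs at least 4 GB of memory")
    return problems


good = [M2Node("primary", 4, 8, 32, True), M2Node("secondary", 2, 4, 16, True)]
bad = [M2Node("primary", 1, 2, 2, True), M2Node("secondary", 1, 2, 8, False)]
```

Running the check on `bad` flags the missing key on the secondary node and the undersized primary node, while `good` returns an empty list.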

System x3850 M2: Non-Scalable

System x3950 M2: Scalable

ScaleXpander Option Kit

Scalability icon lights up when active

Chassis Scalability requires ScaleXpander Option Kit

Closer look

The x3850 M2 can be upgraded to a x3950 M2 with the ScaleXpander Option Kit

Notes: The IBM® ScaleXpander Option Kit can be used to upgrade the x3850 M2 for scalability. The kit can be used to interconnect the SMP Expansion Ports of two or more servers to form multi-node configurations. With the ScaleXpander Option Kit, the non-scalable x3850 M2 transforms into a scalable x3950 M2. This scalable configuration supports up to 16 sockets and 96 processor cores.

ScaleXpander Option Key

•The ScaleXpander Option Kit is installed in a slot near the front of the systemboard

•During POST, the BMC reads VPD on the chip to verify the system can scale

•Each chassis to be scaled requires the kit to be installed

ScaleXpander Option Key

Notes: In order to merge chassis, the ScaleXpander Option Kit needs to be installed in a slot near the front of the system board. During POST, the BMC will read VPD on the chip to verify the system can scale. Each chassis to be scaled requires the kit to be installed.

Processor Board Scalability Connectors

This slide shows the processor board connections. The scalability key, which is required to enable scalability, plugs into the processor board at the J14 connector. Three connectors on the rear of the system are used to connect the physical systems together. A management network consisting of the RSA and BMC from each of the systems to be scaled is required.

Rear view scalability connections and cable

Scalability Cable

Notes: This slide shows the scalability cable and SMP connectors on the rear of the x3950 M2.

Scalability cables

Scalability cable release lever

The cabling information is for multi-node configurations that consist of two or (when supported) three servers, for up to 12-socket operation. A node is a server that is interconnected with other servers or nodes through the SMP Expansion Ports to share system resources.

Two-node configuration
A two-node configuration requires two 3.0 m (9.8-foot) ScaleXpander cables.
• Attach the scalability cables from port 1 to port 1, and from port 2 to port 2

Rear view scalability cables connected

Notes: This slide shows the deep-plug scalability cables installed into the SMP ports on the rear of the x3950 M2. Note the location of the scalability release levers.

Two Node Scalability Cable Layout

Two-node configuration

A two-node configuration requires two 3.0 m (9.8-foot) ScaleXpander cables. To cable a two-node configuration for up to eight-socket operation, complete the following steps:

1. Label each end of each ScaleXpander cable according to where it will be connected to each server.
2. Connect the ScaleXpander cables to node 1:
a. Connect one end of a ScaleXpander cable to port 1 on node 1; then, route the cable through the node 1 wire-form clips on the cable-management arm.
b. Connect one end of a ScaleXpander cable to port 2 on node 1; then, route the cable through the node 1 wire-form clips on the cable-management arm.
3. Connect the ScaleXpander cables to node 2:
a. Locate the ScaleXpander cable that is connected to port 1 on node 1; then, connect the opposite end of the cable to port 1 of node 2. Next, route the cable through the node 2 wire-form clips on the cable-management arm.
b. Locate the ScaleXpander cable that is connected to port 2 on node 1; then, connect the opposite end of the cable to port 2 of node 2. Next, route the cable through the node 2 wire-form clips on the cable-management arm.
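The two-node cabling rule (port 1 to port 1, port 2 to port 2) can be written down as a small planning aid, for example to print cable labels before going to the rack. This is only an illustrative sketch: the function names and the (node, port) tuple layout are hypothetical and are not part of any IBM tool.

```python
def two_node_cable_plan(ports=(1, 2)):
    """Return one ((node, port), (node, port)) pair per ScaleXpander cable.

    Hypothetical helper: each cable joins the same-numbered port on
    node 1 and node 2, as described in the cabling steps.
    """
    return [((1, port), (2, port)) for port in ports]


def is_straight_through(plan):
    """True when every cable connects identically numbered ports."""
    return all(a_port == b_port for (_, a_port), (_, b_port) in plan)


plan = two_node_cable_plan()
for (node_a, port_a), (node_b, port_b) in plan:
    print(f"Label: node {node_a} port {port_a} <-> node {node_b} port {port_b}")
```

A plan in which port 1 on node 1 goes to port 2 on node 2 would fail the `is_straight_through` check, which mirrors the kind of incorrect cabling the labeling step is meant to prevent.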

Three-node or four-node configuration

A three-node configuration requires three 3.0 m (9.8-foot) ScaleXpander cables. For detailed instructions and the cable layout for a three-node configuration (up to 12-socket operation), refer to the IBM System x3850 M2 and System x3950 M2 Type 7141 Problem Determination and Service Guide.

x3950 and x3950 M2 Scalability Comparison

•x3950: RSA managed partitioning
-Complex descriptor and partition descriptor
-Partitioning done across Ethernet
-No topology awareness
•Manual system discovery required to set up RSA IP addresses
•No cable status or debug reporting
-Each RSA only partition “aware”
•Controls only the local partition
•Control only available from primary system
-Partition deletion for updates

•x3950 M2: BMC managed partitioning
-Complex descriptor
•Describes complex and partition per system
-Partitioning done across scalability cables
-Aware of entire complex topology
•Cable problems and sophisticated debugging available
-Each system complex “aware”
•Control all partitions from one page
•Controls all partitions or individual systems
•Aware of all system states
-Preserve partition with standalone

Unlike previous scalable systems, the IBM System x3950 M2 BMC manages the scalable partitioning (rather than the RSA).

New Architecture

•BMC automatically discovers scalable systems
•New systems are discovered by checking all the ports
- System topology (all cable connections) discovered
- Systems identified by UUID in complex descriptor
•Changes in the scalability cables are discovered
- Remote changes available to neighbors
•Complex Descriptor Data Structure filled out
•Partition Control (Power On, Off, Reset, ...)
•Partition Create – manually or default partition
•Partition Delete, Reset to Defaults
•Standalone for debug purposes
•Clients (RSA, BIOS, ...) read structure and process the information
- BIOS uses it to set up systems to merge
- RSA uses it to present graphical image of the complex
•External components can create and control partitions through nine available scalability commands

The new architecture uses an RSA connected to one of the nodes to act as a Web-based scalable complex management console from which partitions can be created and controlled. Cable topology and scalable port status will also be available from this complex management console. Partition creation and control may be performed from an RSA or IPMI client; partition management will be handled within each BMC.

The new architecture will perform automatic node topology discovery using the FPGA and BMC, so that every node will be able to communicate with every other node using the scalable management bus. The previous architecture required the user to set up in advance the Ethernet IP addresses of all the RSAs before partitions could be created, and further required partition creation be performed from the boot node of each partition. The new architecture has removed all of these cumbersome requirements, making it possible to connect systems out of the box and go directly to partition creation.

Partition creation is now streamlined to a single RSA Web page where partition configuration data can be distributed to target member BMCs and stored in NVRAM. A test for pre-existing partitions is performed and their status is checked to ensure that the partition is powered off prior to reconfiguration. Partition IDs are utilized by the FPGA to enable uniform behavior by all nodes in a partition during power and reset operations. Partition-wide platform options such as mirroring are also distributed so that each BIOS can have consistent settings in advance of partition merging during the system boot phase.

BMC Role

•Performs Auto Discovery
•Discovers all the systems in the complex
•Discovers the connections on all the ports
•Aware of the complete topology
•Updates necessary registers for related components (BIOS, FPGA, RSA)

•Retains and maintains all Complex Information
•Stored data structure
•Clients send commands and BMC keeps the data consistent
•Performs data manipulation per partition

•Routes all the information
•Controls local and remote systems and partitions
•Keeps data structures consistent on all systems in complex

The BMC will be used for creation and storage of complex descriptors as well as for automatically generating a default partition based on complex topology (all complex nodes will be partition members). The BMC will also control the static partitioning states. Partition creation and configuration can be performed using the RSA II Web interface, or by a user application aware of RSA II dot commands or OEM IPMI commands through the BMC. The BMC will provide an automatically generated partition for users who do not care to manually create partition(s), once the user identifies the primary (boot) node in the partition. The automatically generated partition descriptor will only support all complex nodes being members of the same partition. A CLI interface to dot commands or OEM IPMI commands will be supported, allowing scripting tools to generate partition(s) based on the user's needs.

Note: RSA is still required for partition definition.
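To make the default-partition behavior concrete, here is a minimal sketch of a complex descriptor as plain data. It assumes only what the notes state: nodes are identified by UUID, and the automatically generated partition contains every node in the complex once a primary node is chosen. The class names, field names, and the partition ID value are hypothetical; the real BMC data structure is not documented in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    partition_id: int
    primary_uuid: str        # the primary (boot) node chosen by the user
    member_uuids: list       # every node that belongs to the partition


@dataclass
class ComplexDescriptor:
    node_uuids: list                          # nodes found by auto discovery
    partitions: list = field(default_factory=list)


def create_default_partition(descriptor, primary_uuid):
    """Build the automatic partition: all complex nodes are members."""
    if primary_uuid not in descriptor.node_uuids:
        raise ValueError("primary node is not part of the complex")
    partition = Partition(
        partition_id=1,  # hypothetical ID, chosen for illustration
        primary_uuid=primary_uuid,
        member_uuids=list(descriptor.node_uuids),
    )
    descriptor.partitions = [partition]
    return partition


descriptor = ComplexDescriptor(node_uuids=["uuid-a", "uuid-b"])
print(create_default_partition(descriptor, "uuid-a"))
```

A manually created partition would differ only in that `member_uuids` could be a subset of the discovered nodes.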

RSA II Role

•Unlike previous scalable systems, the role of the RSA has changed
-Reads scalable complex information from the BMC
-Displays scalable complex topology to user including:
•Incorrect cabling displayed and noted
•Port problems displayed and noted
•Non-scaled systems displayed and noted
•Provides partition and system
-As in previous versions, partition and system state are displayed

In multi-node integration, the partition configuration is written once through the RSA II to the BMC and then to the FPGA interface. The FPGA interface allows for routing partition configuration to each partition member’s BMC and FPGA interface (Virtual ICMB, Intelligent Chassis Management Bus). This complex configuration will be stored in each local node’s BMC NVRAM. The partition configurations are contained in the complex configuration.

During complex or partition configuration, the BMC will use only one buffer for all data, no longer holding two buffers (active and candidate) like previous scalable systems.

The data structure of the complex descriptor will be stored in each local node’s BMC NVRAM. The data structure will have a version check to ensure consistency. This data structure of the complex descriptor will be shared between all the user applications creating and controlling static partitioning.

Note: RSA is still required for partition definition.

RSA II Interface (Create Partition)

To create a scalable partition, complete the following steps:
1. Connect the ScaleXpander cables.
2. Connect all nodes to an ac power source and make sure that they are not running an operating system.
Note: If the nodes are part of an existing partition, all nodes must be in Standby mode, which means that the nodes are part of the partition but operate independently.
•Click Force under Standalone Boot on the Scalable Complex Management page to enable the Standby mode.
3. Connect and log in to the Remote Supervisor Adapter II Web interface.
4. In the navigation pane, click Manage Partition(s) under Scalable Partitioning. Use the Scalable Complex Management page to create, delete, control, and view scalable partitions.
5. Select the primary node; then, automatically or manually create a scalable partition:
•Click Auto under Partition Configure to automatically create a single partition that uses all nodes in the multi-node configuration
•Click Create under Partition Configure to manually assign nodes to the partition

See the Remote Supervisor Adapter II SlimLine and Remote Supervisor Adapter II User’s Guide for more information; then, continue with the procedure to create a scalable partition.

Scalable Complex Management page

To create a scalable partition, complete the following steps:
1. Connect the ScaleXpander cables.
2. Connect all nodes to an ac power source and make sure that they are not running an operating system.
Note: If the nodes are part of an existing partition, all nodes must be in Standby mode, which means that the nodes are part of the partition but operate independently. Click Force under Standalone Boot on the Scalable Complex Management page to enable the Standby mode.
3. Connect and log in to the Remote Supervisor Adapter II Web interface. See the Remote Supervisor Adapter II SlimLine and Remote Supervisor Adapter II User’s Guide for more information; then, continue with the procedure to create a scalable partition.
4. In the navigation pane, click Manage Partition(s) under Scalable Partitioning. Use the Scalable Complex Management page to create, delete, control, and view scalable partitions. A page similar to the one in the following illustration is displayed.

RSA II Interface ( Partition Started )

Select the primary node; then, automatically or manually create a scalable partition:
1. Click Auto under Partition Configure to automatically create a single partition that uses all nodes in the multi-node configuration.
2. Click Create under Partition Configure to manually assign nodes to the partition.
Note: Click Redraw to reorder the sequence in which the nodes appear in the diagram on the page. You can, for example, reorder the diagram to reflect the order in which the nodes are installed in a rack. The nodes are reordered according to the ScaleXpander cabling, with the node that you select in the top position.

Partition Information Partition ID 1

Click Partition ID to define operation of the partition and view information about the partition. A page similar to the one in the following illustration is displayed.

The following non-selectable fields display information about the partition:
1. The Partition Count field displays the number of nodes in the partition.
2. The Partition Validity field displays the following status: Valid (which indicates the configuration is correct).
3. The Partition field displays one of the following statuses:
– Stopped: The partition is inactive, and the nodes can be reassigned to a partition.
– Started: The partition is active.
– Resetting: The configuration is resetting.
– Unknown: The partition contains unidentified port or chassis IDs.

a) In the Partition merge timeout minutes field, select the number of minutes POST waits for the scalable nodes to merge resources. The default value is 6 minutes. Allow at least 8 seconds for each GB of memory in the scalable partition.
b) In the On merge failure, attempt partial merge? field, select whether POST should attempt a partial merge if one error is detected during full merge. Yes is the default value.
c) In the Memory Mirroring? field, select whether memory mirroring is enabled in all nodes in the partition. Yes is the default value.
d) Click Save.
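The merge timeout guidance above (at least 8 seconds per GB of memory, with a 6-minute default) is simple arithmetic that is easy to get wrong at the console. The sketch below works it through; the function names are hypothetical, and rounding up to whole minutes is an assumption based on the field being expressed in minutes.

```python
SECONDS_PER_GB = 8          # from the guidance: at least 8 seconds per GB
DEFAULT_TIMEOUT_MINUTES = 6  # default value of the merge timeout field


def min_merge_timeout_minutes(total_memory_gb):
    """Smallest whole number of minutes that covers 8 seconds per GB."""
    seconds = SECONDS_PER_GB * total_memory_gb
    return (seconds + 59) // 60  # integer ceiling division


def default_is_sufficient(total_memory_gb):
    """True when the 6-minute default already meets the guidance."""
    return min_merge_timeout_minutes(total_memory_gb) <= DEFAULT_TIMEOUT_MINUTES


for gb in (32, 45, 64, 256):
    print(gb, "GB ->", min_merge_timeout_minutes(gb), "min,",
          "default OK" if default_is_sufficient(gb) else "raise the timeout")
```

For example, a partition with 64 GB of memory needs 512 seconds, which rounds up to 9 minutes, so the 6-minute default would be too short under this guidance.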

Chassis Merge

•In order to merge chassis:
-All secondary nodes must contain the same core count as the primary
•Can have different speeds, but not different core count

Notes: In order to merge chassis, all secondary nodes must contain the same core count as the primary node. They can have different speeds, but not a different core count. The screen shows that the chassis number 2 processors do not match the primary, and the error message appears.

Chassis Merge

•In order to merge chassis:
-Each chassis must have at least 4 GB of memory installed

Notes: In addition, in order to merge chassis, all chassis must have at least 4 GB of memory installed. The screen shows the error message if this condition is not met.
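The merge prerequisites covered in this topic (a ScaleXpander key in every chassis, matching core counts, and at least 4 GB of memory per chassis) can be collected into a pre-visit checklist. The following is a sketch only: the `Chassis` record and `merge_problems` function are hypothetical names, and the messages are not the text of the actual POST or RSA II error screens.

```python
from dataclasses import dataclass

MIN_MEMORY_GB = 4  # each chassis must have at least 4 GB installed


@dataclass
class Chassis:
    name: str
    core_count: int
    memory_gb: int
    has_scalexpander_key: bool = True


def merge_problems(primary, secondaries):
    """Return a list of reasons the chassis would not be expected to merge."""
    problems = []
    for node in [primary, *secondaries]:
        if not node.has_scalexpander_key:
            problems.append(f"{node.name}: ScaleXpander key not installed")
        if node.memory_gb < MIN_MEMORY_GB:
            problems.append(f"{node.name}: less than {MIN_MEMORY_GB} GB of memory")
    for node in secondaries:
        # Processor speed may differ; core count may not.
        if node.core_count != primary.core_count:
            problems.append(f"{node.name}: core count differs from primary")
    return problems


primary = Chassis("chassis 1", core_count=16, memory_gb=32)
secondary = Chassis("chassis 2", core_count=8, memory_gb=2)
for problem in merge_problems(primary, [secondary]):
    print(problem)
```

An empty list means none of the three documented conditions was violated; it does not guarantee a successful merge.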

Boot Standalone

•In order to boot standalone:
-Cannot press ESC key to bypass merge as USB support is not available at merge
•Press Blue Remind button or reconfigure partition information via RSA II interface to force standalone

Notes: Any chassis can boot into standalone mode, and there are several ways to do so. Because USB support is not available at merge time, you cannot press the ESC key to bypass the merge. Instead, press the blue Remind button, or reconfigure the partition information through the RSA II interface to force standalone.

Boot Standalone

• Once merged, press ESC to force a reboot to standalone mode

Notes: Another way to boot into standalone mode is to wait until the chassis merge, then press the ESC key to force a reboot to standalone mode.

Boot Standalone

Notes: This is a sample Scalable Complex Management screen showing how you would modify the settings to boot into standalone.

Partition Information Manage

The following non-selectable fields display information about the partition:
1. The Partition Count field displays the number of nodes in the partition.
2. The Partition Validity field displays the following status: Valid (which indicates the configuration is correct).
3. The Partition field displays one of the following statuses:
– Stopped: The partition is inactive, and the nodes can be reassigned to a partition.
– Started: The partition is active.
– Resetting: The configuration is resetting.
– Unknown: The partition contains unidentified port or chassis IDs.

• In the Partition merge timeout minutes field, select the number of minutes POST waits for the scalable nodes to merge resources. The default value is 6 minutes. Allow at least 8 seconds for each GB of memory in the scalable partition.
• In the On merge failure, attempt partial merge? field, select whether POST should attempt a partial merge if one error is detected during full merge. Yes is the default value.
• In the Memory Mirroring? field, select whether memory mirroring is enabled in all nodes in the partition. Yes is the default value.
• Click Save.

BIOS Changes

Processors listed by Node

Notes: One of the changes in the System x3950 M2 BIOS screens is that you can now see all the processors in a multi-node complex.

Summary

•This topic has enabled you to:
-Define the terms scalable, node, complex and partition
-Describe the data and system management cabling of a scaled system
-Describe how to use the RSA II Web interface to configure a scalable partition on an x3950, x3950 E and x3950 M2

This topic discussed scalability.

Topic 6 – Dynamic System Analysis

This topic discusses Dynamic System Analysis (DSA) and how it can be used to provide service on high-performance System x servers.

Objectives

•By the end of this topic, you will be able to:
-Describe the functions of Dynamic System Analysis (DSA)
-List the data gathering capabilities of DSA
-Describe the DSA package offerings and installation requirements of each
-Describe how Preboot DSA operates on high-performance System x servers

This topic discusses the significant aspects of DSA and what you need to know in order to use it to solve problems.

Dynamic System Analysis (DSA) Overview

•DSA is an information collection and analysis tool
•Used to aid in the diagnosis of system problems
•Creates a merged log that includes events from the OS, from the service processor event logs and from any devices that store event or error information
-DSA also collects product data from the hardware that is installed in the system where it is available
•The information is collected into a compressed XML file. The file can be sent to IBM Support to assist in finding and resolving problems. In addition, DSA provides a local viewer and can display the contents of the XML file in a Web browser.

Here is a summary of the main characteristics of DSA.
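The idea of a merged log can be illustrated with a few lines of Python: events from separate sources are interleaved by timestamp so that related OS and service processor events appear next to each other. This is a conceptual sketch, not DSA's implementation; the (timestamp, source, message) tuple layout and the sample events are invented for illustration.

```python
import heapq
from datetime import datetime


def merge_event_logs(*logs):
    """Merge logs that are each already sorted by time into one timeline."""
    return list(heapq.merge(*logs, key=lambda event: event[0]))


# Invented sample events: (timestamp, source, message)
os_log = [
    (datetime(2009, 5, 1, 10, 0), "OS", "disk warning"),
    (datetime(2009, 5, 1, 10, 5), "OS", "service stopped"),
]
sp_log = [
    (datetime(2009, 5, 1, 10, 2), "BMC", "fan speed low"),
]

for timestamp, source, message in merge_event_logs(os_log, sp_log):
    print(timestamp.isoformat(), source, message)
```

Reading the events in one timeline is what makes it possible to see, for example, that a hardware event preceded an operating system error.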

DSA Data Collection

•DSA collects and analyzes the following:

Dynamic System Analysis (DSA) is a collection of probes that hunt the system for information. It can plug itself into drivers and firmware to pull logs, and then interprets the information into a usable format. The IPMI and RSA drivers must be installed before DSA is used. If there is no RSA present, DSA is able to pull information from the BMC as long as the IPMI mapping layer and driver are installed.

DSA Packages

•DSA Portable Edition
-Runs from the command prompt on a supported system without altering any system files or system settings. It collects system information in sensitive customer environments with only temporary use of system resources.

•DSA Installable Edition
-Provides a permanent installation of DSA onto a system. This installation shares a similar command prompt interface with the portable edition. With DSA Installable Edition, you can also get an UpdateXpress comparison analysis.

•DSA Bootable Edition
-Bootable Edition executes and starts the collection process. The DSA collection process is completed, and an interactive menu is displayed.

•Preboot DSA
-A blend of the diagnostic routines behind the F2 option and the DSA data gathering capabilities

There are several editions of IBM DSA.

The portable edition runs on a supported system without altering any system files or system settings. No files are installed on the system under investigation.

The installable edition installs directly on the system. This edition can be run directly from the console of the system under investigation.

DSA is supported on Windows and Linux operating systems. The readme file lists the specific information regarding NOS support and installation instructions for the different NOSes.

Running DSA with the default options creates an XML file that can be sent to IBM support. The XML file is stored locally on the system under investigation. Command line switches are used to run DSA in a way that creates the HTML files needed to read the results locally.

Preboot Diagnostics (DSA) is installed on an internal USB key in some IBM high-performance servers. Preboot DSA is activated by pressing F2 at the BIOS prompt screen, which is the same procedure used to enter Diagnostics on older systems.

DSA versions are available for download from the IBM support Web site.

Note: Linux Portable and Installable versions are for Linux / VMware. VMware ESX 3.0 users should run the Red Hat 3, 32-bit version of DSA.

Portable and Installable DSA Prerequisites

•DSA will run without any additional software but may not include all of the available logs without the installation of device drivers

•To read a BMC SEL, the system must have the following device drivers installed and running:
-IPMI Device Driver
-IPMI Mapping Layer
-Note: The installation sequence of these drivers is critical. They MUST be installed in the order shown above

•To read the RSA event log, the RSA driver must be installed
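The driver prerequisites can be summarized as a lookup from installed drivers to the logs DSA would be expected to read. The sketch below is illustrative only: the function name is hypothetical, and treating a wrong IPMI installation order as "SEL not readable" is an assumption drawn from the note that the order is critical, not documented DSA behavior.

```python
IPMI_DRIVER = "IPMI Device Driver"
IPMI_MAPPING = "IPMI Mapping Layer"
RSA_DRIVER = "RSA driver"


def readable_logs(install_order):
    """Given drivers in the order they were installed, list readable logs."""
    logs = []
    if IPMI_DRIVER in install_order and IPMI_MAPPING in install_order:
        # Assumption: the device driver must precede the mapping layer.
        if install_order.index(IPMI_DRIVER) < install_order.index(IPMI_MAPPING):
            logs.append("BMC SEL")
    if RSA_DRIVER in install_order:
        logs.append("RSA event log")
    return logs


print(readable_logs([IPMI_DRIVER, IPMI_MAPPING, RSA_DRIVER]))
print(readable_logs([RSA_DRIVER]))
```

With no drivers installed the list is empty, which matches the statement that DSA still runs but may not include all of the available logs.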

DSA Comparison Features

•DSA has the ability to compare a report for a system against known firmware and driver levels that are available from IBM

•This feature compares DSA outputs for firmware and device drivers with those found on the UpdateXpress CD-ROM set

•To run the comparison tool, the relevant UpdateXpress CD-ROM must be in the system CD-ROM drive

•DSA also has a difference checker
•Compares two DSA outputs
•Highlights changes
-Firmware versions
-Device driver levels
-Installed applications and new hardware configurations

DSA has the ability to compare code levels against a set of code levels on an UpdateXpress CD-ROM. This can be useful if code mismatches are suspected to be the cause of problems.

DSA can also compare two DSA reports to track changes between two points in time. The difference checker will highlight any significant changes to the system environment.
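Conceptually, the difference checker compares two snapshots of component levels and reports what changed. The sketch below shows that idea with plain dictionaries; it is not how DSA stores or compares its reports, and the component names and version strings are invented examples.

```python
def diff_reports(old, new):
    """Return {component: (old_level, new_level)} for every change.

    A level of None means the component is absent from that report.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Invented example data: component name -> level
report_before = {"BIOS": "1.03", "BMC firmware": "2.10", "NIC driver": "4.1"}
report_after = {"BIOS": "1.05", "BMC firmware": "2.10", "RAID driver": "9.0"}

for component, (before, after) in diff_reports(report_before, report_after).items():
    print(f"{component}: {before} -> {after}")
```

The same comparison works against a reference set of levels, which is the idea behind checking a report against an UpdateXpress CD-ROM.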

Preboot DSA

•Incorporated in x3850 M2 and x3950 M2
-Accessed by pressing F2 at boot time

•Options to run diagnostics or enter into DSA data gathering

Preboot DSA is integrated into the System x3850 M2 and x3950 M2. It is accessed via the F2 key sequence when the IBM splash screen loads. Preboot DSA can be accessed if the system reaches state 4 – completion of POST.

Preboot DSA - Capabilities

• System Data Collection Providers
- System Overview
• Mfr, version, prod name, serial no, uuid, critical details
- Network Settings
• Hostname, physical network port info, global settings
- Hardware Inventory
• Processor, memory, disk info, monitor info, system card info, devices – scsi, usb, optical, other
- PCI Information - Devices, bridges, slots
- Firmware/VPD - Network, SP, BIOS, other vpd
- SP Configurations
• Settings – general, TCP/IP, SNMP, dial-out, dial-in
- LSI Controller
• Controller info, physical & logical drive info
- System Management
• Data, logs, Light Path – LED settings
- BIST results – RSA, IPMI
- Event logs – ASM, IPMI
- Merged devices
- Memory diagnostics log
- DSA Error log

• Diagnostic Tests
- Memory Test
• runs in standalone mode
- BMC I2C Test
- Check Point Panel Test
- Optical Test
• Read Error Test
• Self Test
• Verify Media Installed
- RSA Restart Test
- TPM Test
- Ethernet Test
• Control Registers
• EEPROM
• Internal Memory
• Interrupt
• LEDs
• MAC Loopback
• PHY Loopback
• MII Registers
- Stress Tests
• CPU Stress Test
• Memory Stress Test
- HDD Test

Here is a summary of the capabilities of Preboot DSA.

Initiating a Preboot DSA Session

•Entering the Preboot DSA environment can take several minutes

Preboot DSA can take up to 10 minutes to load.

Memory Tests

•Quick Memory Test main menu selection screen

Notes: By default, you will be taken to the Memory Test main menu screen. The tests and options available are:
• Quick Memory test
• Full Memory test
• Change Options
To exit the memory test and enter DSA from here, select ‘Quit to DSA’.

Entering Preboot DSA

•Select Quit to DSA to enter DSA main menu selection screen

By default, you will be taken to the diagnostic menu screen. To run DSA from here, select ‘Quit to DSA’.

Preboot DSA Command Line

•Preboot DSA offers several options in a command line menu system

Preboot DSA offers a command menu where you can make a selection:
• GUI - takes you to a graphical environment
• CMD - offers various commands as options
• COPY - copies DSA results to removable media
• EXIT - exits the program
• HELP - is also available

Preboot DSA command menu:
1. COLLECT – collects system information
2. VIEW – displays the collected data on a local console in a text viewer
3. ENUMTESTS – lists the available tests
4. EXECTEST – menu used to select a test to execute
5. GETEXTENDEDRESULTS – retrieves and displays diagnostic results
6. TRANSFER – sends the collected data to IBM support
7. QUIT – exits Preboot DSA

• The copy command will be used most by customers and the field community to capture all the logs to a USB key and then have those logs emailed to IBM support for analysis
• In the lab session of this course you will run this command to capture the logs and then analyze the data

Preboot DSA Graphical Interface

•Select GUI to enter the Graphical User Menu

The Preboot DSA graphical interface offers clickable items for system diagnostics, information gathering and help, as well as an exit button.

DSA Diagnostic Tests

Graphical Diagnostics

•Select ‘Diagnostics’ from the main menu to load the diagnostic tests page

From this page, you can select and run a variety of diagnostic tests on system hardware.

System Information

•Select ‘System Information’ from the main menu to run DSA

•DSA collects a wide range of information from hardware components

Preboot DSA provides the following data in System Information:
•System configuration
•Installed applications and hot fixes
•Device drivers and system services
•Network interfaces and settings
•Hardware inventory including PCI information
•Vital Product Data and BIOS and firmware information
•Drive health information
•LSI, RAID controller configuration
•Event logs for ServeRAID controller and service processors

Scaled System Information Gathering

•The primary node’s Preboot Diagnostic (DSA) gathers and displays the systems that are in the scaled partition.

In a scaled system configuration, Preboot Diagnostic (DSA) on the primary node gathers system information for all the scaled systems in the partition.

Two Node Graphical Diagnostics

•In a scaled configuration, the primary node’s Preboot Diagnostic tests the systems that are scaled.

Two node test

The Preboot Diagnostic on the primary node will test the scaled systems. Pay close attention to the Ethernet test in the screen shot above.

DSA Automated Report Submission

•Preboot DSA has an option to send data to IBM
-This option requires the following:
•Eth0 must be wired to the client’s network on an active port
•DHCP lease to Eth0 from an active DHCP server
•Client’s approval and network permissions to use the FTP protocol to transfer the files to IBM support (possible firewall issues)

Preboot DSA can automatically transmit the DSA data to IBM support for analysis. Here is a list of requirements that MUST be met in order for this process to be successful.

Summary

•This topic has enabled you to:
-Describe the functions of Dynamic System Analysis (DSA)
-List the data gathering capabilities of DSA
-Describe the DSA package offerings and installation requirements of each
-Describe how Preboot DSA operates on high-performance System x servers

Almost all System x and xSeries servers support some of the more advanced technologies that IBM has designed and developed. This topic discusses these service processor technologies and describes the implications of working with them in the field.

Topic 7 – Problem Solving

This topic discusses how to solve problems on the System x3850 M2 and x3950 M2.

Topic Objectives

•By the end of this topic, you will be able to:
-Identify the tools available for problem solving
-Identify the sequence in which to use the tools
-Identify when the tools can be used and what you can expect to get from the tools

This topic deals with information gathering and analysis. Without information, you cannot understand what is wrong and you cannot apply solutions.

Service and Support Tools

•The following tools are available for the 3950 M2:
-PDSG
-Light Path Diagnostics
-Beep codes
-POST codes
-BMC
-RSA*
-Preboot DSA*
-Adapter BIOS messages
-DSA installable/portable*

* Indicates the main focus for our gathering process

The list of tools available for this system is quite large. The most important aspect of using the tools is recognizing which ones deserve your trust. You also need to understand when and how to use them. The following pages in this topic explain all those interactions.

The Six System States

•The six system states are used as the basis for problem analysis and repair
-Each state offers new information gathering and analysis tools
-Each state builds on the last state for tool availability

System State | Data Gathering | Data Analysis
1. There is no AC power | Visual | PDSG/HMM
2. There is AC power but no DC output | BMC, RSA, Light path | SvcCon, SMBridge, RSA event log
3. There is AC and DC power but the system fails to complete POST | Checkpoint codes, F1 and F2 (possibly), Beep codes, Adapter BIOS msgs (Adaptec, LSI, etc.) | PDSG, RETAIN tips, IBM support Web site, F2 Preboot Diagnostics (DSA)
4. There is AC and DC power, the system completes POST but the NOS fails to start loading | ServeRAID Manager, MegaRAID Storage Manager, F2 Preboot Diagnostics (DSA) | PDSG, RETAIN tips, F2 Preboot Diagnostics (DSA)
5. There is AC and DC power, the system completes POST but the NOS fails to complete loading | NOS boot messages, ‘Blue screen’, ‘Safe’ mode | NOS vendor messages
6. There is AC and DC power, the system completes POST and the NOS completes loading but stops during operation | DSA, NOS event logs | DSA

All IBM System x servers start in a uniform manner, and all have a common set of interfaces that indicate how far the server has progressed through the power-up sequence.

This chart lists, for each system state, the possible information gathering tools and the possible information analysis tools.

All servers are supported by documentation, which forms part of the tool set for both information gathering and information analysis. For example, a Problem Determination and Service Guide (PDSG) contains lists of errors that may occur during POST (information gathering) and also the probable causes of each error (information analysis).

Each state in the chart builds on the previous one. For example, in system state two the most important tool is the RSA, but the BMC and Light Path are also available, along with the PDSG and visual symptoms carried over from state one. In every state you can therefore still rely on the data gathering tools and resources of the earlier states.

It is important to stress that not all information sources are available in all system states. This page summarizes which tools are available and when.
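The way each state builds on the previous one can be sketched in a few lines of Python. This is an illustration rather than an IBM tool: the enum and function names are hypothetical, the tool lists follow one reading of the chart (the split between gathering and analysis tools for states 5 and 6 is an assumption), and the classifier assumes a fault has already been reported.

```python
from enum import IntEnum


class SystemState(IntEnum):
    """The six system states, numbered as in the chart."""
    NO_AC = 1
    AC_NO_DC = 2
    POST_FAILS = 3
    NOS_FAILS_TO_START = 4
    NOS_FAILS_TO_FINISH_LOADING = 5
    NOS_STOPS_IN_OPERATION = 6


# Data gathering tools that first appear in each state (one reading of the chart).
NEW_GATHERING_TOOLS = {
    SystemState.NO_AC: ["Visual"],
    SystemState.AC_NO_DC: ["BMC", "RSA", "Light path"],
    SystemState.POST_FAILS: ["Checkpoint codes", "F1 and F2 (possibly)",
                             "Beep codes", "Adapter BIOS messages"],
    SystemState.NOS_FAILS_TO_START: ["ServeRAID Manager", "MegaRAID Storage Manager",
                                     "F2 Preboot Diagnostics (DSA)"],
    SystemState.NOS_FAILS_TO_FINISH_LOADING: ["NOS boot messages", "'Blue screen'",
                                              "'Safe' mode"],
    SystemState.NOS_STOPS_IN_OPERATION: ["DSA", "NOS event logs"],
}


def classify(ac_power, dc_power, post_completed, nos_started_loading, nos_finished_loading):
    """Return the state of a faulty system, checking symptoms in power-up order."""
    if not ac_power:
        return SystemState.NO_AC
    if not dc_power:
        return SystemState.AC_NO_DC
    if not post_completed:
        return SystemState.POST_FAILS
    if not nos_started_loading:
        return SystemState.NOS_FAILS_TO_START
    if not nos_finished_loading:
        return SystemState.NOS_FAILS_TO_FINISH_LOADING
    # Everything came up, so the reported problem occurs during operation.
    return SystemState.NOS_STOPS_IN_OPERATION


def gathering_tools(state):
    """Tools for this state plus every earlier state, since each state builds on the last."""
    tools = []
    for earlier in SystemState:
        if earlier <= state:
            tools.extend(NEW_GATHERING_TOOLS[earlier])
    return tools


if __name__ == "__main__":
    state = classify(True, True, False, False, False)
    print(state.name, gathering_tools(state))
```

For state two, `gathering_tools` returns the visual check from state one followed by the BMC, RSA and Light Path, which matches the example given in the notes.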

Service and Support Tools

What To Expect On A Service Engagement
-Customer
•Depending on system state, RSA logs, Preboot DSA logs, DSA logs
-SSR
•Preboot DSA and/or RSA logs and diagnostic results
-Remote Support Agent
•Analysis of DSA/RSA logs for problem isolation and FRU action plans from the analysis by RSA
•Analysis of driver levels, firmware versions, installed service packs
•Analysis of installed components to meet ServerProven requirements

Here is a summary of what you can expect to see when engaged on a service call with this system. Note: Available data sources will depend on the system state.

RSA

•RSA II is standard in all x3950, x3850 M2 and x3950 M2 systems

•Light Path Diagnostics is driven by the BMC; the BMC reports to the RSA

•RSA is the primary hardware tool as it interprets the BMC logs into action plans

•If RSA does not report a problem that the BMC sees, then that is a defect which needs to be addressed
-It is still important to view and capture all the sources of data input and then compare that input

-If everything is working as designed, the RSA will have the source of the problem and the plan of action

The RSA adapter is alive from system state 1 to system state 6 and can be logged into without any interruption to the customer or OS environment. As you will see in the following pages, every version of DSA, from Preboot to installable, captures the RSA logs and data into its own logs to report findings. The RSA in this system is similar to all previous systems. Logon and information capture is the same as before.

Data Gathering Sources

•Three data gathering sources available:
-RSA, with the ability to log on from state 1-6 and view and save logs for transmittal
-Preboot DSA, with the ability to gather data from system state 3-5
•Captures RSA data plus more
-DSA for system state six
•In all instances of DSA, the RSA data, BMC data plus more is collected if the IPMI drivers are installed
•DSA installable also captures all driver versions, applications info, services running and OS logs

For DSA installable and portable, the customer must install the IPMI drivers before running DSA so that the RSA data is captured.
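The state ranges on this page lend themselves to a simple lookup. The Python sketch below is illustrative only, with hypothetical names; it encodes the ranges stated above (RSA in states 1-6, Preboot DSA in states 3-5, DSA in state 6) and adds a reminder about the IPMI drivers.

```python
# Which system states each data gathering source covers, as stated on this page.
SOURCE_STATES = {
    "RSA": range(1, 7),          # states 1-6
    "Preboot DSA": range(3, 6),  # states 3-5
    "DSA": range(6, 7),          # state 6 (installable/portable)
}


def available_sources(state):
    """Return the data gathering sources usable in the given system state."""
    if state not in range(1, 7):
        raise ValueError("system state must be between 1 and 6")
    return [name for name, states in SOURCE_STATES.items() if state in states]


def gathering_plan(state, ipmi_drivers_installed):
    """Return (sources, notes); notes flag a missing prerequisite for DSA."""
    sources = available_sources(state)
    notes = []
    if "DSA" in sources and not ipmi_drivers_installed:
        notes.append("Install the IPMI drivers before running DSA, "
                     "or RSA and BMC data will be missing from the report")
    return sources, notes


if __name__ == "__main__":
    for state in range(1, 7):
        print(state, gathering_plan(state, ipmi_drivers_installed=False))
```

The lookup makes the gap explicit: in states 1 and 2 the RSA is the only one of the three sources available, which is why its logs should always be captured first.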

Concerns and Issues

•Preboot DSA has an option to send data to IBM
-This option REQUIRES the following:
•Eth0 must be wired to the client’s network on an active port
•DHCP lease to Eth0 from an active DHCP server
•Client’s approval and network permissions to use the FTP protocol to transfer the files to IBM support (possible firewall issues)

Preboot DSA can automatically transmit the DSA data to IBM support for analysis. Here is a list of requirements that MUST be met in order for this process to be successful.

SvcCon and SMBridge

•Both tools are available for the x460 and x3950 M2
-The only use for these tools is for a client or SSR to clear the BMC log without having to change the system state

Although the BMC gathers an event log, the system has an RSA II as standard, so the RSA event log is the preferred log. However, the system information light will be illuminated if the BMC log reaches 75% full. Following any service activity, use either SvcCon or SMBridge to clear the BMC log in readiness for any future problems and log reporting.
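The 75% rule is a simple threshold, sketched below in Python. The function name and the log capacity used in the example are hypothetical; the real capacity depends on the BMC and is not stated in this guide.

```python
def information_light_expected(entries_used, capacity):
    """Return True once the BMC log has reached 75% of its capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Integer arithmetic avoids rounding surprises at exactly 75%.
    return entries_used * 4 >= capacity * 3


if __name__ == "__main__":
    capacity = 512  # hypothetical number of log entries, for illustration only
    for used in (100, 383, 384, 512):
        print(used, information_light_expected(used, capacity))
```

If the light is on and no hardware fault is found, a nearly full BMC log is a likely cause, and clearing the log after capturing it should turn the light off.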

CP Codes on Light Path Card

•The client will now see the CP (checkpoint) codes from the Light Path Diagnostic panel
-CP codes are not documented in the PDSG
•Explain to the client that this is a service only display used only by support personnel
-The BMC records CP codes and the RSA displays them in the log
•This only occurs if the system is connected to AC for a minimum of two minutes before the power on button is pressed (to give the BMC/RSA2 time to boot/communicate)

When a system is connected to AC for the first time, the BMC takes up to two minutes to initialize internally. Until this is complete, the BMC cannot communicate with the RSA, and the RSA will not be able to capture any power-on failures or CP codes.
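The two-minute rule can be expressed as a small timing check. The Python sketch below is illustrative, with hypothetical function names; it assumes the time at which AC was connected has been noted by the person on site.

```python
from datetime import datetime, timedelta

# Wait stated in this guide before pressing the power-on button.
BMC_INIT_TIME = timedelta(minutes=2)


def bmc_ready(ac_connected_at, now):
    """Return True once at least two minutes have passed since AC was connected."""
    return now - ac_connected_at >= BMC_INIT_TIME


def seconds_remaining(ac_connected_at, now):
    """Return how many seconds are left to wait, or 0.0 if the wait is over."""
    remaining = BMC_INIT_TIME - (now - ac_connected_at)
    return max(0.0, remaining.total_seconds())


if __name__ == "__main__":
    connected = datetime.now() - timedelta(seconds=45)
    now = datetime.now()
    print("Ready:", bmc_ready(connected, now))
    print("Seconds left:", round(seconds_remaining(connected, now)))
```

Powering on before the wait is over does not damage anything, but any power-on failure or CP code from that attempt may be missing from the RSA log.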

RETAIN

•As with any new product announcement, it is extremely important to search/query RETAIN for any tips that match the symptoms displayed
-In some cases, not all features are available from the initial product release but are added to the system after the product GA (General Availability) date
•Published capabilities are contained in the announcement letters
•Review RETAIN for those features that are not enabled yet

New products, as they are released, may not have all of their possible features available on the GA date. The “announcement letter” for the product lists all of the features that are supported at GA, as well as a prediction of when new features will be forthcoming.

The RETAIN tip database contains up-to-date information on the status of new features in the product.

Topic Summary

•This topic has enabled you to:
-Identify the tools available for problem solving
-Identify the sequence in which to use the tools
-Identify when the tools can be used and what you can expect to get from the tools

This topic has identified the support tools available on the System x3850 M2 and x3950 M2.

Topic 8 – Support References

This topic discusses where to go for help once this course is finished.

Topic Objectives

•At the end of this topic, you will be able to:
-Identify documentation resources available to support the servers discussed in this class
-Identify the support web sites for the units and what they offer

Support information can take many forms. Here, we will discuss the key information sources for these systems and how to access them.

Documentation

•System documentation (User’s guide, installation guide, etc.)
-Useful for confirming shipping group contents (missing parts, etc.) and initial customer setup
•Problem Determination and Service Guide (PDSG)
-Available electronically (Adobe Acrobat PDF format) from the IBM support web site or on the Service Update CD-ROM
-Primary support document for diagnostics and troubleshooting

The system documentation, which ships with every new system, may also prove useful for verifying the basic setup of the server or I/O expansion drawer. As many of the components of modern servers are customer replaceable units (CRUs) as well as FRUs, some setup instructions are contained in the system manuals. If you are called to a newly installed server, you will want to verify that the customer has, in fact, correctly installed everything.

The Problem Determination and Service Guide (PDSG), formerly known as the Hardware Maintenance Manual (HMM), is the primary reference document for the systems covered in this course. All PDSG/HMMs are now available electronically in Adobe Acrobat Portable Document Format (PDF). The PDSG contains all the disassembly and reassembly steps, beep codes and error descriptions to assist you in isolating a failed FRU or FRUs.

You will need Adobe Acrobat Reader version 4 or higher to view the contents properly as this is the minimum supported revision of the reader.

Server Support Web Site

•Central Support site for all products
• http://www.ibm.com/jct01004c/systems/support/supportsite.wss/brandmain?brandind=5000008

IBM has launched a new central support site for all products. The address is listed above. Web addresses change from time to time, but IBM normally links older web addresses to the new address for at least several months after the old site closes. If you bookmark this site in your browser, be sure to maintain your bookmarks as site addresses change. The navigation bar on the left provides the main topics available on the web site.

Software and Device Drivers

•Central site for downloading software files
• http://www.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-4JTS2T&brandind=5000020

The Software and Device Drivers - IBM System x page provides easy, quick access to the wide range of firmware updates, as well as the software/device drivers for supported operating systems, for each System x server, BladeCenter or Storage Enclosure.

If you are an authorized servicer, there is also a dealer support site, with a nice collection of some of the more popular links for each product. (e.g. https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=SERV-OPTN&brandind=5000008#x460 )

ServerProven Web Site

•Reference site for device compatibility
• http://www.ibm.com/jct09002c/isv/eserver/serverproven/index.html

Whilst IBM extensively tests third party hardware and software and, in many cases, approves them for use with System x servers, not all devices or combinations of devices are tested/supported. If you are working with a server which contains third party devices, you can check for compatibility here. You may also find assistance here that is not contained in the primary documentation and that can help you to isolate a fault.

System x Support Repository

•Server Support site with information and photographs
•https://www.ibm.com/systems/support/reflib/

This site is the central repository for a collection of information and photographs of many IBM System x, BladeCenter, eServer, and xSeries Servers intended for support personnel.

(Note: This site and the subsequent one do NOT cover the full list of IBM products. They were largely put together from the documents for which the education group provided updated training materials.)

IBM Server - BIOS Simulators

•Reference site for BIOS Simulators
•https://www.ibm.com/systems/support/reflib/simulators/

Support personnel often do not have immediate physical access to the machine that someone is asking for help with. These pages contain one of the ship-level BIOS files with a simulator that shows how many of the System x, BladeCenter, eServer and xSeries machines can be configured.

The simulator shows screens similar to the ones that the customer would use to configure their server after pressing F1 during the system boot. (Note: The BIOS level may be different between the simulator version and the one installed on the customer’s machine.) We have also included an Options simulator for the BladeCenter management module and numerous adapters.

(Note: as of this writing, several servers are still missing from the entire support matrix.)

Configuration Tools Website

•This site contains links to COG, xRef and other helpful configuration tools

•http://www.ibm.com/systems/x/hardware/configtools.html

This Web site contains links to, and descriptions of, several configuration tools. (Note: While these pages are intended for pre-sale support, they are often useful for Business Partners and in a service/post-sale environment.)
• The COG contains general information about IBM products and supported options for currently shipping equipment (updated each month)
• The xRef documents provide a brief technical overview of each of the servers in the System x/BladeCenter, IntelliStation, and withdrawn systems (past servers are removed from the originals and made available in the withdrawn systems xRef)
• Other configuration tools deal with BladeCenter interoperability, rack configuration, and power/equipment sizing.

Summary

•This topic has enabled you to:
-Identify documentation resources available to support the servers discussed in this class
-Identify the support web sites for the units and what they offer

This topic has discussed several helpful support sites for configuring, maintaining, and troubleshooting IBM servers.

Course Summary

•This course has enabled you to:
-Identify the serviceability features of System x high-performance servers

-Describe the advanced technologies used in System x servers and their service implications

-Describe the management characteristics of System x servers

-Perform a series of setup, configuration and troubleshooting tasks on System x servers and associated peripherals

This course is now complete. Thank you for attending. System x and BladeCenter Service and Support Education hopes you have enjoyed it and found it both interesting and valuable to your job.

If you have any comments or suggestions regarding this education, please let your instructor know and s/he will pass them on to the education development teams. We ALWAYS act on comments and suggestions as we constantly seek to improve our education offerings.