EMC / CLARiiON Troubleshooting
Strictly Confidential
General Array Troubleshooting & Navisphere, Section Five
Copyright © 2004 EMC Corporation. All rights reserved. Revision A02

Section Five - General Troubleshooting / Navisphere

NOTICE: This document contains sensitive technical information which is for use solely by EMC employees and authorized service partners of EMC Corporation. Any use, duplication or distribution outside the Corporation is strictly prohibited.

Architectural differences

To begin any troubleshooting process, you must understand the product you are working on. This section provides the fundamentals needed for that understanding.

FC4700 SP-based hard drive

Each storage processor (SP) has a 6 GB hard drive. This storage area is not visible to users and never holds end-user data; service personnel can access it via SymmRemote. The drive is not a field replaceable unit (FRU): if it fails, the entire SP must be replaced. This follows the policy CLARiiON has always maintained: if any component of the SP fails, the entire SP is replaced. The SP's operating system resides on this drive, together with the services and layered drivers that make up the FC4700 software stack.

Figure: FC4700 SP, showing the IDE drive.

PSM – Persistent Storage Manager

The Persistent Storage Manager (PSM) is a hidden LUN that records on disk the configuration information specific to the CLARiiON's environment. The PSM LUN is what allows a replacement SP to come up running the correct software, with the correct information on hosts, LUNs, storage groups, and so on.
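To make the idea concrete, here is a minimal Python sketch, assuming the PSM can be modelled as a single key/value store shared by both SPs. The class and method names (`PersistentStore`, `StorageProcessor`, `set_config`, `get_config`) are hypothetical illustrations, not CLARiiON interfaces. The point is only that configuration lives in the shared store rather than on the SP, so a replacement SP recovers it simply by attaching to the same store.

```python
"""Toy model of a shared Persistent Storage Manager (PSM) record store.

Hypothetical names throughout; this is not CLARiiON code.
"""
from dataclasses import dataclass, field


@dataclass
class PersistentStore:
    """Stands in for the hidden PSM LUN that both SPs read and write."""
    records: dict = field(default_factory=dict)


@dataclass
class StorageProcessor:
    name: str
    psm: PersistentStore

    def set_config(self, key, value):
        # Every configuration change is written to the shared store,
        # so the peer SP sees it without a separate sync step.
        self.psm.records[key] = value

    def get_config(self, key):
        return self.psm.records.get(key)


psm = PersistentStore()
spa = StorageProcessor("SPA", psm)
spb = StorageProcessor("SPB", psm)

spa.set_config("storage_group:finance", ["LUN 0", "LUN 1"])
print(spb.get_config("storage_group:finance"))  # the peer sees the change

# Replace SPB: the new SP holds no local state, only a handle to the PSM.
spb = StorageProcessor("SPB (replacement)", psm)
print(spb.get_config("storage_group:finance"))  # the config survives the swap
```

The model deliberately ignores locking and RAID protection; it only shows why keeping the records off the SP makes the SP itself replaceable.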

Both SPs access a single PSM, so their environmental records are always in sync. If one SP needs to be replaced, the new one finds the unique environmental information on the PSM. If one SP receives new configuration information, that data is written to the PSM and the other SP immediately updates itself.

The PSM is created when the array is initialized via Navisphere, and it currently occupies 512 MB. When you manage an array that does not have a PSM, the Navisphere client software (Navisphere Manager or NaviCLI) warns that the array is in an uninitialized state and lets you perform the initialization. Once the PSM has been created, destroying it will result in the loss of all host information on the array.

This is why the installer must determine the types of RAID groups that the end user will employ. For example, if the PSM LUN is bound on a five-drive RAID 5 group (the default PSM setting), the customer will be forced to use those disks as RAID 5. Assuming 18 GB drives, that is approximately 90 GB of raw storage that the customer will most likely want to use, of which the PSM occupies less than 1%. Make sure the customer can use the RAID type selected for the PSM LUN. Note also that the LUNs sharing the PSM RAID group should NOT be subject to heavy I/O, for performance reasons.

The PSM is also used by the non-disruptive upgrade (NDU) process to store the new and previous versions of the driver software during installation. Because the installation takes place within the array, it avoids the problems caused by a host failure or a lost connection during installation. After installation, the drivers are "run from cache" on the SP-based hard drive for better response time to the OS. The software packages for the current and previous versions of each component are stored in the PSM. That way, when an SP with an arbitrary software revision and an arbitrary set of layered drivers is inserted, the array software can install the currently valid set on that SP.

What host information is stored in the PSM?

The security provided by the FC4700's PSM has been featured prominently as an important step forward: it moves critical host configuration data off the host's agent.config file and onto a RAID-protected hidden LUN on the array. Hosts can be taken offline and brought back online and still regain access to their storage. But just what is being stored there?

• Drive mapping
The drive letter (Windows) or device name (UNIX) that the OS has assigned to a particular LUN is noted by the host agent and pushed to the array. This information is determined dynamically by the host agent and reported to any clients. This is why a user must manage the host agents for hosts attached to FC4700s in order to get this mapping information.

• Host information
The host agent reports the hostname, OS, ATF version, Agent version, and IP address.

• Privileged users
Only users listed in the host's agent.config file may manage that host. This is the primary host-based security available in the Navisphere environment.

• Polling rates

• All AccessLogix host information
Initiator records (associating a host name with an HBA WWN) and storage group mapping. The association between HBA and hostname is collected by the array agent and stored in the PSM. AccessLogix uses this association to ensure that the host(s) assigned to a particular storage group see only that storage group, and that unauthorized hosts cannot see into other groups.

What array data is stored in the PSM?

In addition to host information, the PSM stores the following array agent information:

• All AccessLogix information
Storage groups, the current default storage group, the physical array (private/public LUNs), and the user-defined name of the array.

• SP IP address
One of the first steps in an FC4700 installation is to use a serial connection to gain PPP access to the SP. You can then set the IP address and complete the installation from a remote management station connected via the LAN.

• Privileged users (array)
Users authorized on the SP. At initialization, anyone able to access the SP may configure it. After the first privileged user is entered, the SP becomes secure and allows only users on the privileged users list to modify the configuration.

• ALPA
A prerequisite for remote mirroring is that the Arbitrated Loop Physical Address (AL_PA) of each SP must be unique.

Vault – private space layout

The first nine drives in the DPE have space set aside for de-staging the cache in the event of a component failure in the write caching subsystem. This provides an orderly way to protect the user data held in memory. These drives are configured into a nine-drive RAID 3 group.

Note: A faulted condition in the DPE automatically disables write caching.

CLARiiONs have Standby Power Supplies (SPS), designed to maintain power to the DPE long enough for the data stored in memory to be written securely to disk (the vault drives) before the system powers off.
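The nine-drive RAID 3 layout is what lets the vault survive the loss of a single drive: data is striped across eight drives, and a dedicated ninth drive holds XOR parity. The Python sketch below illustrates only that principle. The byte-level striping, the function names (`stripe`, `rebuild`, `reassemble`) and the sample buffer are illustrative assumptions, not how FLARE actually lays out vault data.

```python
"""Toy RAID 3 parity over a nine-drive vault group (8 data + 1 parity).

Illustrative only: the stripe unit and layout are assumptions.
"""
from functools import reduce

DATA_DRIVES = 8  # nine drives in the vault group: 8 data + 1 dedicated parity


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(buffer: bytes) -> list:
    """Spread a cache buffer byte-by-byte across the data drives, then add parity."""
    padded = buffer + bytes(-len(buffer) % DATA_DRIVES)
    data = [padded[i::DATA_DRIVES] for i in range(DATA_DRIVES)]
    parity = reduce(xor_bytes, data)
    return data + [parity]


def rebuild(strips: list, missing: int) -> bytes:
    """Recover one lost strip by XOR-ing all surviving strips (data + parity)."""
    survivors = [s for i, s in enumerate(strips) if i != missing]
    return reduce(xor_bytes, survivors)


def reassemble(strips: list, length: int) -> bytes:
    """Interleave the data strips back into the original byte order."""
    data = strips[:DATA_DRIVES]
    out = bytes(b for group in zip(*data) for b in group)
    return out[:length]


dirty_cache = b"write-cache pages awaiting de-stage"
strips = stripe(dirty_cache)
lost = strips[3]
strips[3] = None                          # simulate losing vault drive 3
strips[3] = rebuild(strips, missing=3)    # XOR of the other eight drives
assert strips[3] == lost
assert reassemble(strips, len(dirty_cache)) == dirty_cache
```

Because the parity strip is the XOR of all data strips, the XOR of any eight surviving strips equals the missing one. This holds whichever single drive (data or parity) is lost.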

Database drives

The database drives hold the information that the Core operating system running on the storage processor needs in order to track array-specific data on:

• LUNs
• RAID groups
• The SP's PROM code and BIOS
• The chassis ID of the array

The space used by the database is trivial, no larger than 1 MB. It is triple mirrored across the first three drives in the DPE.

Figure: FC4700 private space layout.

Figure: CX Series fibre boot. The SP boots from fibre; note that there is no on-board disk.

The CX600 uses a fibre-boot storage processor. The boot image resides on Fibre Channel drives in the first DAE2 chassis, also referred to as the DAE2 O/S. The PSM LUN functions in a similar fashion to the FC4700's, but in the CX600 it is integrated and hidden, so unlike on the FC4700, no PSM LUN configuration is required. The differences in how the two platforms use the PSM are not discussed in this document.

The partitioning of the disk drives in this first chassis is listed below. Note that the size and usage of the partitions changed slightly between pre-Release 11 software and Release 11 onward. Keep this in mind: once you have committed to the new software, there is no going back.

• Data Directory Boot Service – 2 MB, all disks in the array. Fixed space for the boot service.
• Data Directory – 2 MB, all disks in the array. Each disk contains a data directory that maintains a map of the database entries for that disk.
• 'Flare' Database – 28.3 MB, all disks in the array. The traditional database is triple mirrored on drives 0, 1 and 2. On the other drives this area is used for the FRU signature, clean/dirty flags, HW/FRU verify, etc., plus a large 'reserved for future use' area.
• External Database – 35 MB, drives 0, 1 and 2. Contains persistent information outside the purview of 'Flare', such as the BIOS code image, PROM code image, Chameleon kernel software, Chameleon volume manager, and Chameleon file system database.
• NT Boot Partitions – 2826.2 MB, drives 0, 1, 2 and 3. Each SP has a mirrored NT boot partition: SPA uses drives 0 and 2, SPB uses drives 1 and 3.
• Reserved Space – 300 MB. Set aside for future NT growth.
• PSM – 1024 MB, drives 0, 1 and 2. Triple-mirrored private LUN for storing persistent SP data.
• Vault – 2176 MB, drives 0 through 4. RAID 4+1 area used for vaulting cache data in a power-fail emergency.
• Core Dump Partition – 1 GB, disk 4. Reserved for Chameleon II NAS software core dumps.

Total private space, drives 0 – 4 = 6393.5 MB
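As a sanity check on these figures: drives 0 through 2 carry every partition listed except the NAS core dump, and their sizes add up to exactly the stated 6393.5 MB. That suggests the total is a per-drive reservation rather than a sum across all five drives. This reading is an inference from the arithmetic, and it assumes the 300 MB reserved space sits on each of these drives (the list does not say which drives hold it). The Python sketch below simply reproduces the sum.

```python
# Per-drive private-space sizes (MB) on drives 0-2, as listed above.
# Assumption: the 300 MB reserved space is present on each of these drives.
PARTITIONS_MB = {
    "Data Directory Boot Service": 2.0,
    "Data Directory": 2.0,
    "'Flare' Database": 28.3,
    "External Database": 35.0,
    "NT Boot Partition": 2826.2,
    "Reserved Space": 300.0,
    "PSM": 1024.0,
    "Vault": 2176.0,
}

# round() hides binary floating-point noise from the decimal sizes
total = round(sum(PARTITIONS_MB.values()), 1)
print(f"Private space per drive (drives 0-2): {total} MB")  # 6393.5 MB
```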

Figure: CX Series Private Space – Release 10 & Prior (drives 0-4; not drawn to scale). Shows the Data Directory Boot Service (2 MB/disk), 'Flare' DB (28.3 MB/disk), External DB (35 MB/disk), PSM (1024 MB), Data Directory (2 MB/disk) and FRU Signature (28.3 MB/disk) areas, the SPA and SPB NT Boot primary and secondary partitions (2826.2 MB each), the 1 GB NAS Core Dump Area, the Vault Area (2176 MB), the Reserve Area, and the user space above them.

Figure: CX Series Private Space – Release 11, after the Utility Partition NDU and first Utility Partition boot (drive 0 through the end of the array; not drawn to scale). Shows the same areas as Release 10, with the PSM at 1024 MB/disk, the Vault Area at 2176 MB (3200 MB with Release 12) and a Reserve Area of 100 MB/disk, plus a 1 GB Image Repository and the SPA and SPB Utility primary and secondary partitions (200 MB each).

What is the difference between an SPE (CX600) and a DPE (FC4700)?

FC4700 – The OS is based on NT and resides on an onboard IDE drive. The PSM is a hidden LUN on a RAID group selected during the initialization process. Note that this RAID group is out on the Fibre Channel drives, separate from the SP's IDE drive. If an SP is replaced, a process called newSP runs and lets the new SP get its software packages from the PSM. Both SPs have access to this single LUN and always keep their environmental records in sync. The PSM can exist on as few as two drives and as many as 10 drives. As noted before, the vault area is on the first nine drives of the DPE chassis, and the database drives are a triple mirror on the first three drives.

Figure: FC4700 Array – back view (SPS, DPE, DAE).

CX600, CX400 and CX200 – The CX series arrays use fibre-boot SPs. The NT boot image resides on Fibre Channel drives in the first DAE. This first DAE, on bus 0, is also known as the DAE OS chassis. The PSM and vault areas are also part of a private area reserved on the first five disk drives. The database area is still a triple mirror, but it is now part of the private area. See the CX Series private space layouts earlier in this section for which disks contain each of these areas.

Figure: CX600 Array – back view.

CX600 improvements over FC4700

The CX600 array provides enhanced storage processors, each consisting of a motherboard with two Pentium 4 processors and a minimum of 2 GB of cache memory. An additional 2 GB of memory is available as an option, but it is not field-upgradeable: to get the additional 2 GB of cache when an SP is replaced, the DIMMs must be ordered separately. The SAN personality card provides four fibre optic connections. Other items of interest include:

• Drives per storage system – 240
• Drives in the cache vault – 5
• Maximum LUN counts – see Primus article emc70491 for details
• Maximum LUN size – 2 TB
• Maximum RAID groups – 240
• The array boots from the first DAE2 chassis (the DAE2 O/S), which contains a factory-bound PSM LUN

Figure labels (CX600 back view): SPS, SPE, DAE2 O/S

Navisphere Block diagram and data flow

The Navisphere block diagram is a reminder of your previous CLARiiON training. It shows how the legacy arrays were managed over Fibre Channel. Starting with the FC4700, management moved from being host-based to being array-based: the Navisphere Agent was moved down into the array, and management takes place over the IP network. A host agent remains; it is used to register the host HBAs with the array and to provide file system information for the LUN listing within Navisphere.

Figure: Navisphere management paths. On pre-FC4700 storage systems, Navisphere Manager sends commands over IP to the host agent, which reaches the Core OS via in-line fibre. On FC4700 and CX-series arrays, the array agent talks to the Core OS inside the SP stack (SP A and SP B), and Navisphere Manager reaches it over IP.

Figure: Navisphere 6 software components. Browser clients on NT, Windows 2000, Solaris and Linux connect across the intranet to the Management Server in the CLARiiON array, shown alongside components labelled Directory, Security, Persistence, Legacy, Analyzer and Futures; the array serves the Management, SnapView, and MirrorView GUI.

The illustration above shows how the software components of Navisphere 6 interact. The cloud represents the client's subnet (not the Internet yet, as firewall and security issues are still being worked through). Each circle represents one of the four operating systems on which you can open a browser, connect to the array's IP address, and manage the array with Navisphere 6. The green box is the array; it contains the providers that process the various calls made by the client browser, for example changes to security. The array also contains modules for future support.

In the example, the user on the NT browser issues a request to make a change on the array. The command travels over the blue arrow (the LAN) to the Management Server, which routes the call to the correct provider. The CLARiiON provider then translates the call for the array agent, which passes the command to the Core software.

CIMOM Architecture (also known as the Management Server)

The CIMOM consists of several layers: a web server that provides HTTP access, an encoding layer that translates CIM/XML, and the CIM object manager itself. The providers collect data and feed it into the CIMOM, and they also execute methods.

RAID++ Provider

The Raid++ provider is at the core of the Navisphere CIMOM architecture. It is responsible for handling all RAID-specific get/set operations.
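The layering can be pictured as a dispatcher sitting in front of a set of providers. This Python sketch uses JSON in place of CIM/XML purely to stay self-contained. The request shape, the `ManagementServer` and `RaidProvider` classes and their method names are illustrative assumptions, not the Navisphere or CIM API.

```python
"""Sketch of a CIMOM-style dispatcher routing requests to providers.

The request format, method names and return values are hypothetical.
"""
import json


class RaidProvider:
    """Stand-in for the Raid++ provider: RAID-specific get/set operations."""

    def __init__(self):
        self.luns = {"0": {"raid_type": "RAID5", "state": "Bound"}}

    def handle(self, method, params):
        if method == "get_lun":
            return self.luns[params["lun"]]
        if method == "set_lun_name":
            self.luns[params["lun"]]["name"] = params["name"]
            return {"status": "ok"}
        raise ValueError(f"unsupported method {method!r}")


class ManagementServer:
    """Decodes an incoming request and routes it to the named provider."""

    def __init__(self):
        self.providers = {"raid": RaidProvider()}

    def dispatch(self, body: str) -> str:
        request = json.loads(body)   # stands in for the CIM/XML encoding layer
        provider = self.providers[request["provider"]]
        result = provider.handle(request["method"], request.get("params", {}))
        return json.dumps(result)    # encode the reply for the client


server = ManagementServer()
reply = server.dispatch('{"provider": "raid", "method": "get_lun", "params": {"lun": "0"}}')
print(reply)
```

Adding another provider (directory, security, event monitor) would mean registering one more entry in `providers`, which is the property that makes the layered design extensible.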

Directory Provider

This provider is responsible for caching the list of arrays found on the selected subnets. It periodically pings the arrays in the list to verify their state, and it maintains the heartbeat connection to all arrays within the management domain. One array (or more) is designated as the "directory provider master" to minimize heartbeat pings on the network.
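Here is a minimal Python sketch of that behaviour. It assumes the provider keeps one cache entry per array and marks an array unreachable after a few consecutive missed pings. The `ping` callable, the three-miss tolerance, the `is_master` flag and the 192.0.2.x documentation addresses are illustrative assumptions; how Navisphere actually designates the directory provider master is not modelled.

```python
"""Sketch of a directory cache that tracks array reachability.

The ping callable, field names and state strings are hypothetical.
"""
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ArrayEntry:
    ip: str
    state: str = "unknown"
    missed_pings: int = 0


class DirectoryCache:
    def __init__(self, ips, is_master: bool, max_missed: int = 3):
        self.entries: Dict[str, ArrayEntry] = {ip: ArrayEntry(ip) for ip in ips}
        self.is_master = is_master      # only the master pings, to limit traffic
        self.max_missed = max_missed    # assumed tolerance before marking unreachable

    def poll(self, ping: Callable[[str], bool]) -> None:
        if not self.is_master:
            return
        for entry in self.entries.values():
            if ping(entry.ip):
                entry.state, entry.missed_pings = "reachable", 0
            else:
                entry.missed_pings += 1
                if entry.missed_pings >= self.max_missed:
                    entry.state = "unreachable"


up = {"192.0.2.10"}   # pretend only this array answers
cache = DirectoryCache(["192.0.2.10", "192.0.2.11"], is_master=True)
for _ in range(3):
    cache.poll(lambda ip: ip in up)
print({ip: e.state for ip, e in cache.entries.items()})
```

Tolerating a few missed pings before changing state keeps a single dropped packet from flapping an array's status in the GUI.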

Event Monitor Provider

In a centralized notification model, one of the arrays is designated to process critical events and forward them via the following mechanisms:

• Modem, pager or email
• Launching an executable

Security Provider

This provider is responsible for authenticating users to the array and, based on each user's security ID, allowing that user to make requests on objects within the CIMOM.
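A security provider ultimately has to answer "may this user change this object?". One concrete rule appears earlier in this section as the SP privileged-users behaviour: at initialization anyone may configure the SP, but after the first privileged user is entered, only listed users may. The Python sketch below models just that rule. The class name and the `user@host` strings are illustrative assumptions, and authentication itself is out of scope.

```python
"""Sketch of the SP privileged-user rule described earlier in this section.

Rule modelled: while the privileged-user list is empty, anyone who can reach
the SP may change the configuration; once the first entry exists, only listed
users may. Names are hypothetical.
"""


class PrivilegedUsers:
    def __init__(self):
        self._users = set()

    def may_modify(self, user: str) -> bool:
        # An empty list means the SP has not been secured yet.
        return not self._users or user in self._users

    def add(self, requester: str, new_user: str) -> None:
        if not self.may_modify(requester):
            raise PermissionError(f"{requester} is not a privileged user")
        self._users.add(new_user)


acl = PrivilegedUsers()
print(acl.may_modify("anyone@host1"))   # True: SP not yet secured
acl.add("anyone@host1", "admin@mgmt")
print(acl.may_modify("anyone@host1"))   # False: the list is now enforced
print(acl.may_modify("admin@mgmt"))     # True
```

The sketch also shows why the first privileged user should be entered early in an installation: until then, the SP accepts configuration changes from anyone who can reach it.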

Admin Provider

This provider is responsible for managing all configuration aspects of the Navisphere Manager 6.x infrastructure, including the web server, the CIMOM, and the providers.

Boot issues (array)

To troubleshoot a boot issue effectively, you must understand the basics of the boot process, from power-up through the operating system boot sequence to the point where the array is ready to process host I/O. The storage processor (SP) has an operating system and other software components that replace FLARE as the sole base code. Under this base operating system reside layered drivers, which are software components that provide storage-oriented functionality. As a result, the SP goes through a boot process very similar to a standard NT server boot sequence.

What follows is a description of the boot process from power-up. You will be able to see the BIOS portion of the boot, but not the actual NT process; portions of the NT boot sequence are visible in the SP event log.

Local or FC disk booting

When Windows NT boots, the BIOS finds a disk based on a search pattern stored in NVRAM. The disk is assumed to be partitioned with a FAT or NTFS file system on the first partition. The root directory contains a file called "boot.ini", which is read to determine which partition to actually boot from. The kernel is then loaded and the file system in that partition is mounted. That partition must contain a paging file, along with other files. A "normal" NT Workstation install takes 200-300 MB of disk space.

Before we ever reach the NT boot sequence, we must first look at the BIOS boot sequence. Here is a power-up of SPA; the messages seen from a HyperTerminal connection are similar to the following:

Phoenix ServerBIOS 3 Release 6.0
Copyright 1985-2001 Phoenix Technologies Ltd. All Rights Reserved
Copyright 1999-2002 by EMC Corporation, All Rights Reserved.
EMC BIOS Release 3.26
CPU = 2 Intel(R) XEON(TM) CPU 2.00GHz
637K System RAM Passed
173M Extended RAM Passed
Press <F2> to enter SETUP
Hard Disk 0 : None
Hard Disk 1 : None
Hard Disk 2 : None
Hard Disk 3 : None
Press Any Key to Continue

PhoenixBIOS Setup Utility
CPU Type : Intel(R) XEON(TM)
System ROMs : E9D9 - FFFF
CPU Speed : 2000 MHz
BIOS Date : 05/22/03
System Memory : 640 KB
COM Ports : 03F8 02F8 0300 0308
Extended Memory : 2096128 KB
LPT Ports : 03BC
Shadow Ram : 384 KB

Display Type : EGA \ VGA
Cache Ram : 512 KB
PS/2 Mouse : Not Installed
Hard Disk 0 : None
Hard Disk 1 : None
Hard Disk 2 : None
Hard Disk 3 : None

Copyright (c) EMC Corporation , 2003   <- This is the start of FLARE
Disk Array Subsystem Controller
Model: CX600
DiagName: Extended POST
DiagRev: Rev. 02.99
Build Date: Tue Jul 22 14:45:46 2003
StartTime: 10/20/2003 21:16:18
SaSerialNo: LKE00022706003

FLARE post testing, hit ESC at any point here to enter debug mode.
AabcdeBCDabEabcdFGHabIabcJabcKabcLabcMabcNabOabPabQabRabSabTabUabVabWabXYZ
Initializing back end FIBRE...
PCI Config Reg: 2.4.1 0x0157
FCDMTL 0 [2.4.1] Dual Mode Fibre init - OSW DB PTR 0x20000000
FCDMTL 0 [2.4.1] Cached memory - 0xF77B9 bytes @ 0x200006B0
FCDMTL 0 [2.4.1] Noncached memory - 0xC037F bytes @ 0x200F7E69 (0x200F7E69 phys)
FCDMTL 0 [2.4.1] DVM Initialized
FCDMTL 0 [2.4.1] IMQ base ptr = 20170000; IMQ length = 8000
Dualmode fibre init completed
FCDMTL 0 [2.4.1] TPM Notify: st=0xA000000, flg=0x4, cmd=0x1
FCDMTL 0 [2.4.1] TPM Hndle API Event: cntx=0x200004C4, evnt=0x4002, info=0x0
FCDMTL 0 [2.4.1] TPM Lnk Up: state=0xA000000, flg=0x84
Link Event: 0x00030005
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: EF
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: E4
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: E1
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: E8
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: E2
Device Event (0xE4): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE1): 0x00030012, tach_ptr: 0x08491854
Device Event (0xEF): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE8): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE2): 0x00030012, tach_ptr: 0x08491854
DL waited 1s for discovery
Target 0 is online
Target 1 is online
Target 2 is online
Target 3 is online
Target 4 is online
Relocating Data Directory Boot Service (DDBS)...
Autoflash POST? POST/DIAG image located at sector LBA 0x00012048
Autoflash BIOS? BIOS image located at sector LBA 0x00011048
EndTime: 10/20/2003 21:16:52

int13 - RESET (1)   <- The system BIOS uses int13 to read the master boot record (MBR) into memory, then transfers execution of the startup process to the MBR. After the MBR loads a copy of the active partition's boot sector into memory, the boot sector code starts the operating system.
DDBS: MDB read from both disks.
DDBS: Chassis and disk WWN seeds match.
DDBS: First disk is valid for boot.
DDBS: Second disk is valid for boot.
NT FLARE image (0x00400007) located at sector LBA 0x0002284B
Disk Set: 0 2   <- Found boot location
Total Sectors: 0x005821A1   <- Boot disk drive 0_0_0 information
Relative Sectors: 0x0000003F
Calculated mirror drive geometry: Sectors: 63 Heads: 240 Cylinders: 382 Capacity: 5775840 sectors
Total Sectors: 0x005821A1   <- Boot disk drive 0_0_2 information
Relative Sectors: 0x0000003F
Calculated mirror drive geometry: Sectors: 63 Heads: 240 Cylinders: 382 Capacity: 5775840 sectors
int13 - READ PARAMETERS (19)
int13 - READ PARAMETERS (22)
int13 - DRIVE TYPE (59)
int13 - READ PARAMETERS (60)
int13 - DRIVE TYPE (61)
Error : Invalid Drive ID - 0x81
int13 - CHECK EXTENSIONS PRESENT (63)
int13 - GET DRIVE PARAMETERS (Extended) (64)
int13 - READ PARAMETERS (65)
int13 - READ PARAMETERS (67)
int13 - READ PARAMETERS (1224)
int13 - READ PARAMETERS (1263)
int13 - READ PARAMETERS (1299)
int13 - READ PARAMETERS (1334)
int13 - READ PARAMETERS (1372)
int13 - READ PARAMETERS (1515)
int13 - READ PARAMETERS (1548)
int13 - READ PARAMETERS (1582)
int13 - READ PARAMETERS (1640)
int13 - READ PARAMETERS (1672)
int13 - READ PARAMETERS (1744)   <- The NT load continues and is handed over to the HBA driver. The number shown will not be the same in all cases.

What follows is the unseen portion of the NT boot process.
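The geometry lines are internally consistent, which gives a quick way to confirm that the BIOS read sane values from both boot mirrors. Cylinders × heads × sectors per track gives the reported capacity, and the same figure equals Total Sectors plus Relative Sectors. The Python sketch below just redoes that arithmetic. Reading Relative Sectors as the hidden sectors in front of the partition follows the usual MBR convention and is an assumption about this particular output; the 512-byte sector size is also assumed.

```python
# Values copied from the DDBS boot output above (drives 0_0_0 and 0_0_2).
total_sectors = 0x005821A1      # "Total Sectors"
relative_sectors = 0x0000003F   # "Relative Sectors" (assumed: offset of the partition start)
sectors_per_track, heads, cylinders = 63, 240, 382

chs_capacity = cylinders * heads * sectors_per_track
print(chs_capacity)                       # 5775840
print(total_sectors + relative_sectors)   # 5775840
print(chs_capacity * 512 / 2**20)         # size in MiB, assuming 512-byte sectors
```

If the two mirrors ever reported different geometry, or the sums stopped agreeing, that would point at one of the boot drives rather than at the NT load that follows.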

Two layers of software interact during the NT boot process. The first layer to start is the kernel layer. In simplified terms, this is what occurs:

Due to NT's design, the miniport drivers come up first, but they do NOT expose themselves to the fabric until instructed to by user-space software. This allows the WWN, which depends on the array SSN, to be set first. The reboot driver then checks its reboot count (in the registry). If the counter is >= 3, a failure is reported to the Service Control Manager and no other drivers are started, because they depend on the reboot driver. This prevents a "bad" component from causing a reboot loop. The drivers that depend on the miniport driver start next: scsitarg, CMI and SMIScd. Scsitarg reads the miniport WWNs from the registry and sets them on the miniports, which can then be enabled. Scsitarg does not yet allow I/O (it returns 'busy'). These are followed by the drivers that depend on CMI: disktarg, MPS, DLS and Flare. User-space processes are started next:

• PPP, eventlog, etc. are started as part of the OS.
• KTCons (tracing) is an anomaly: it is not controlled by the Governor.
• K10Governor starts. It has a list of processes which:

  o check the miniport WWNs against the array SSN
  o check the installed software and make sure it is all working
  o tell scsitarg to "drop the gate" and allow I/O
  o start the external admin services (Navisphere)
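The kernel-side start order described above is a dependency graph, so any topological order of it is a legal start sequence. The Python sketch below encodes a simplified version of those dependencies. The table is an assumption distilled from the prose, not the SP registry's actual service settings. The sketch also shows why a reboot-driver failure stops everything downstream of it.

```python
"""Derive a valid kernel-driver start order from a dependency table.

The table is a simplification of this section's description; it is not
read from a real SP registry.
"""
from graphlib import TopologicalSorter  # Python 3.9+

# driver -> drivers it must wait for (hypothetical simplification)
DEPENDS_ON = {
    "miniport": set(),
    "reboot": set(),
    "scsitarg": {"miniport", "reboot"},
    "cmi": {"miniport", "reboot"},
    "smiscd": {"miniport", "reboot"},
    "disktarg": {"cmi"},
    "mps": {"cmi"},
    "dls": {"cmi"},
    "flare": {"cmi"},
}


def start_order(depends_on):
    """Return one dependency-respecting start order."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_by_reboot_driver(depends_on):
    """Drivers that cannot start if the reboot driver reports a failure."""
    blocked, frontier = set(), {"reboot"}
    while frontier:
        frontier = {d for d, deps in depends_on.items() if deps & frontier} - blocked
        blocked |= frontier
    return blocked


print(start_order(DEPENDS_ON))
print(sorted(blocked_by_reboot_driver(DEPENDS_ON)))
```

`static_order()` may return ready drivers in any relative order, so only the dependency constraints, not the exact sequence, should be relied on.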

K10Governor (NT service, autostart) and the processes it starts:

• NDUApp: 1) Create the DeviceMap object; rebuild the map report if successful. If not, set the registry flag to IOInhibited.
• NewSP: 1) Check the installed software against the PSM. 2) Check the ArraySSN and use it to generate the miniport WWNs. 3) Reboot, if 1) or 2) require it.
• DumpMgr: 1) Look for a dump; copy and report it.
• NDUMon: 1) Check the reboot count in the registry; reset it, or set "degraded". 2) Check the "IOInhibited" flag (set by NDUApp); if OK, tell the host-side software to allow I/O, otherwise log the failure. 3) Wait for NDU requests.
• K10_DGSSP: 1) Get all events from NVRAM and log them in the NT event log. 2) Clear the NVRAM event log. 3) Poll at high frequency for new events.
• MessageDispatcher: 1) Ping on the MPS channel and wait for the peer to respond. 2) After the handshake, keep pinging to detect peer death, and set a named event.
• Navisphere: 1) Poll the array. The redirector reads "Degraded Mode State" from the registry.

Registry (set by the RebootDriver): RebootCount, Inhibited Mode State, Degraded Mode State
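The MessageDispatcher's peer check, like the CMI heartbeat recorded in the SP event log later in this section, is a heartbeat with a timeout. The Python sketch below borrows the CMI figures from that log (heartbeat every 10 ticks, peer timeout 100 ticks, one tick = 0.1 s) to illustrate the general technique. It is not the actual MessageDispatcher or CMI implementation; the class and method names and the ">= timeout" comparison are assumptions.

```python
"""Sketch of peer-SP failure detection with a heartbeat and a timeout.

Intervals come from the CMI event-log lines in this section; everything
else is illustrative.
"""
TICK_SECONDS = 0.1
HEARTBEAT_TICKS = 10       # "Heartbeat interval is 10 1/10-second ticks"
PEER_TIMEOUT_TICKS = 100   # "Peer SP timeout interval is 100 1/10-second ticks"


class PeerMonitor:
    def __init__(self, now_ticks: int = 0):
        self.last_heard = now_ticks
        self.peer_dead = False

    def on_heartbeat(self, now_ticks: int) -> None:
        self.last_heard = now_ticks

    def check(self, now_ticks: int) -> bool:
        """Return True once the peer has been silent for the full timeout."""
        if now_ticks - self.last_heard >= PEER_TIMEOUT_TICKS:
            self.peer_dead = True   # where a named event would be signalled
        return self.peer_dead


monitor = PeerMonitor()
for t in range(0, 50, HEARTBEAT_TICKS):   # the peer answers for 5 seconds...
    monitor.on_heartbeat(t)
print(monitor.check(120))   # False: last heartbeat at tick 40, only 80 ticks ago
print(monitor.check(140))   # True: 100 ticks (10 s) of silence
```

A timeout of ten heartbeat intervals tolerates several lost heartbeats before the peer is declared dead, trading detection speed for resistance to false alarms.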

The basic boot sequence

• The REBOOT driver begins and checks the reboot count.
• The NT EVENT log starts.
• SPID checks the ID of the SP that is booting.
• The NTMIRROR driver begins.
• The DLS driver (distributed lock service) begins.
• The DLU driver (disk logical unit) begins.
• SCSITARG starts and claims the ports for the FE (front end) and CMI.
• The CMISCD driver begins.
• The CMI driver establishes contact with its peer.
• The BE (back end) starts.
• The PSM driver begins.
• DISKTARG begins.
• SCSITARG activates the TDD (target disk driver), allowing FLARE to communicate with NT.
• SAFETYNET starts, and it then starts the K10Governor.
• NEWSP begins and runs the NDU sync process.
• NDUAPP begins.
• DUMPMANAGER begins.
• NDUMON begins and, if all is okay, unquiesces the front end (allows hosts to log in). If there is a problem, it skips the unquiesce. This starts the reboot count, and three more reboots will be attempted. On the fourth reboot, the SP comes up in degraded mode with no drivers started.
• NDUMON also checks the PSM for the ndu-cache-settings.
• MESSAGE DISPATCHER begins.
• SCSITARG starts the FE (front end) if it has received a good status from NDUMON.
• LOCKWATCH begins.
• KTCONSERVICE starts.
• The K10GOVERNOR process count is checked.
• K10_DGSSP begins.
• The NAVISPHERE AGENT (SP agent) starts.
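The reboot-count logic is what turns a crash loop into a stable, diagnosable degraded state. The Python sketch below is a minimal model under stated assumptions. It applies the reboot driver's ">= 3" rule to the stored count before incrementing it, treats NDUMon's reset as happening only after a healthy boot, and uses hypothetical names. Exact counter semantics on real SPs may differ.

```python
"""Sketch of the reboot-count guard that stops an SP reboot loop.

Assumed model, not SP code: the count persists across boots (the registry on
a real SP), the reboot driver increments it at every boot, NDUMon resets it
after a boot that unquiesces successfully, and a boot whose stored count had
already reached 3 comes up degraded with no drivers started.
"""
THRESHOLD = 3


class RebootGuard:
    def __init__(self):
        self.count = 0           # stands in for the RebootCount registry value

    def boot(self, healthy: bool) -> str:
        degraded = self.count >= THRESHOLD    # reboot driver's check
        self.count += 1                       # "Current (incremented) reboot count is N"
        if degraded:
            return "degraded"
        if healthy:
            self.count = 0                    # NDUMon resets the count after unquiesce
            return "running"
        return "failed, rebooting"


guard = RebootGuard()
print([guard.boot(healthy=False) for _ in range(4)])
# ['failed, rebooting', 'failed, rebooting', 'failed, rebooting', 'degraded']
```

In this model, a single healthy boot anywhere in the sequence resets the count, so only consecutive failures lead to degraded mode.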

What you will see in the SP event log

SP Shutting down
Timestamp (1776)The Event log service was stopped. EventLog
SP Starting up
Timestamp (71200002)Compiled at Aug 19 2003. Reboot
Timestamp (1779)Microsoft (R) Windows NT (R) 4.0 1381 Service Pack 5 Uniprocessor Free. EventLog
Timestamp (1775)The Event log service was started. EventLog
Timestamp (71200006)Current (incremented) reboot count is 1. Reboot
Timestamp (71200007)Found package Base02.05.1.40.5.008. Reboot
Timestamp (71200003)DriverEntry() returned 0. Reboot
Timestamp (71190002)My SP ID is 0x3f23209060010650:0, signature is 0xca4c0. spid
Timestamp (71320002)Compiled on Aug 19 2003. SMBus
Timestamp (71320003)DriverEntry() returned 0. SMBus
Timestamp (7124000f)NT Mirror Driver Compiled on Aug 19 2003 12:21:43 Free (Retail) Build 02_05_08. ntmirror
Timestamp (71240014)Creating root partition \Device\Harddisk0\Partition0 P=0 S=2. ntmirror

Timestamp (71240016)Internal information only. Unit State: ENABLED P=READY (3) S=READY (3). ntmirror
Timestamp (71240014)Creating root partition \Device\Utility\UtilityPartition1 P=1 S=3. ntmirror
Timestamp (71240016)Internal information only. Unit State: ENABLED P=READY (3) S=READY (3). ntmirror
Timestamp (71240010)DriverEntry() exiting with status 0. ntmirror
Timestamp (71110002)Compiled on Aug 19 2003 at 11:50:13, Free (Retail) Build. dls
Timestamp (71110003)DriverEntry() returned 0. dls
Timestamp (71120002)Compiled on Aug 19 2003 at 11:50:24, Free (Retail) Build. dlu
Timestamp (71120003)DriverEntry() returned at 0. dlu
Timestamp (71170000)ScsiTarg (TCD) starting. scsitarg
Timestamp (71170002)TCD0 claimed LogPort 1 for FE. scsitarg
Timestamp (71170002)TCD1 claimed LogPort 0 for FE. scsitarg
Timestamp (71170002)TCD2 claimed LogPort 3 for CMI. scsitarg
Timestamp (71170002)TCD3 claimed LogPort 2 for CMI. scsitarg
Timestamp (71230002)Compiled on Aug 19 2003 at 11:48:36, Free (Retail) Build. cmiscd
Timestamp (71170003)CMI linked with ScsiTarg. scsitarg
Timestamp (71230003)DriverEntry() returned 0. cmiscd
Timestamp (3) User configuration data for parameter COM1 overriding firmware configuration data. serial
Timestamp (3) User configuration data for parameter COM2 overriding firmware configuration data. serial
Timestamp (3) User configuration data for parameter COM3 overriding firmware configuration data. serial
Timestamp (3) User configuration data for parameter COM4 overriding firmware configuration data. serial
Timestamp (71180002)Calling DriverEntry(). cmi
Timestamp (71180003)My SP ID is 3f23209060010650:0. cmi
Timestamp (71180004)Heartbeat interval is 10 1/10-second ticks. cmi
Timestamp (71180005)Peer SP timeout interval is 100 1/10-second ticks. cmi
Timestamp (71180006)Remote SP timeout interval is 100 1/10-second ticks. cmi
Timestamp (71180009)CMI Transport Device 0: 0 gate(s) found. cmi
Timestamp (71180009)CMI Transport Device 1: 0 gate(s) found. cmi
Timestamp SP A (63f) Resume PROM information was read successfully. [0x00] 0 403
Timestamp Enclosure 0 SPS A (698) Battery Testing In Progress [0x00] 0 80
Timestamp (71150005)Read and processed default Persistent Container \Device\CLARiiON_PSM psm
Timestamp (71150003)DriverEntry() returned 0. psm
Timestamp (71160000)DiskTarg (TDD) starting. disktarg
Timestamp (71170003)TDD linked with ScsiTarg. scsitarg
Timestamp (71170004)TDD activated with ScsiTarg. scsitarg
Timestamp (12f530)Safety net starting SafetyNet
Timestamp (12f530)Starting K10Governor SafetyNet
Timestamp (1b72)The following boot-start or system-start driver(s) failed to load: atapi Hpt366 Service Control Manager
Timestamp (41000000)Starting service: K10Governor K10Governor
Timestamp (41000001)K10Monitor process started, executable = K10Monitor. K10Governor
Timestamp (71510000)Informational message. File: newSP.cpp Line: 998 Details: Starting. newSP
Timestamp (76000001)newSP inhibits I/O. newSP
Timestamp (71510000)Informational message. File: K10NDUAdmin.cpp Line: 496 Details: Processing sync NDU
Timestamp (71510000)Informational message. File: K10NDUAdmin.cpp Line: 509 Details: Completed sync NDU
Timestamp (71510000)Informational message. File: newSP.cpp Line: 1302 Details: Normal Exit. newSP
Timestamp (41000002)Starting NduApp
Timestamp (40000001)NduApp normal exit. NduApp
Timestamp (41000100)DumpManager started DumpManager
Timestamp (41000101)No new dump found DumpManager
Timestamp (71510000)Informational message. File: NDUmon.cpp Line: 1142 Details: NDUMon starting ndumon
Timestamp (71510000)Informational message. File: NDUmon.cpp Line: 1201 Details: SP Unquiesce succeeded ndumon

Timestamp (71510000)Informational message. File: NDUmon.cpp Line: 1263 Details: PSM file ndu-cache-settings does not exist, skipping cache restoration ndumon
Timestamp (40000001)Message Dispatcher has started MessageDispatcher
Timestamp (71170009)Fibre Channel loop up on logical port 1 scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 0 scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 1 scsitarg
Timestamp (71170008)Fibre Channel loop down on logical port 1. scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 1 scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 1 scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 0 scsitarg
Timestamp (71170008)Fibre Channel loop down on logical port 0. scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 0 scsitarg
Timestamp (71170009)Fibre Channel loop up on logical port 0 scsitarg
Timestamp (41000300)LockWatch started LockWatch
Timestamp (71214000)ktconsService log: Waiting for signal from the Governor to take ktrace dump. ktconsService
Timestamp (41000001)All processes started, process count = 11. K10Governor
Timestamp (76000100)K10_DGSSP Starting K10_DGSSP
Timestamp (1) Navisphere Agent, version 6.5.0.3.7, has started Navisphere Agent
Timestamp (2000)Application Starting Up
Timestamp (4700)'10.5.43.206' was managed successfully.
Timestamp (4700)'10.5.43.207' was managed successfully.
Timestamp (4700)'10.5.43.192' was managed successfully.
Timestamp Enclosure 0 SPS A (637) SPS Recharging [0x00] 0 0

Note: There may be other events unrelated to the boot process displayed. The above list is only a sample representation.
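When reviewing a long SP event log, it can help to tally Fibre Channel loop transitions per logical port rather than reading every line. The Python sketch below assumes the log has been saved as plain text with one event per line, in the "Timestamp (code)message source" shape shown above; parse_event and loop_transitions are hypothetical helpers, not part of any EMC tool. Entries without a numeric event code (SP, enclosure and SPS lines) are skipped.

```python
import re
from collections import Counter

# Assumed line shape (one event per line, as in the sample above):
#   "Timestamp (<hex event code>)<message> <source>"
EVENT_RE = re.compile(r"^Timestamp \((?P<code>[0-9A-Fa-f]+)\)\s*(?P<text>.*)$")
LOOP_RE = re.compile(
    r"Fibre Channel loop (?P<state>up|down) on logical port (?P<port>\d+)"
)


def parse_event(line):
    """Return (event_code, text) for a coded event line, else None."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    return match.group("code"), match.group("text")


def loop_transitions(lines):
    """Count Fibre Channel loop-up and loop-down events per logical port."""
    ups, downs = Counter(), Counter()
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue  # SP/enclosure/SPS entries carry no numeric event code
        match = LOOP_RE.search(event[1])
        if match:
            port = int(match.group("port"))
            if match.group("state") == "down":
                downs[port] += 1
            else:
                ups[port] += 1
    return ups, downs


sample = [
    "Timestamp (71170009)Fibre Channel loop up on logical port 1 scsitarg",
    "Timestamp (71170008)Fibre Channel loop down on logical port 1. scsitarg",
    "Timestamp (71170009)Fibre Channel loop up on logical port 0 scsitarg",
    "Timestamp Enclosure 0 SPS A (637) SPS Recharging [0x00] 0 0",
]
ups, downs = loop_transitions(sample)
print("loop up:", dict(ups), "loop down:", dict(downs))
```

The same loop can be pointed at any other message pattern (for example, driver load failures) by swapping LOOP_RE.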
What you will see in the ktrace_usr file (see sp_collect files):
> !ktrace -T -r user
rtc_freq 799860000
ti_slot 135; ti_size 4096; ti_cirbuf 0x80baf000; ti_altbuf 0x80baf000
Boot 2003/10/21 07:37:17.187 stamp 0039ccd6a9
DATE: 2003/10/21 07:38:02.565
07:38:02.565 0 81ec72e0 NDU: Found package Navisphere
07:38:02.600 34309 81ec72e0 NDU: Found package SANCopyUI
07:38:02.637 37410 81ec72e0 NDU: Found package SnapCloneProvider
07:38:02.679 42148 81ec72e0 NDU: Found package SnapViewUI
07:38:02.742 62861 81ec72e0 NDU: Exit ToC::Mirror() 0
07:38:02.747 5044 81ec72e0 NDU: Clearing Autorevert Flag
07:38:02.903 155419 81ec72e0 NDU: Synchronizing ToC
07:38:02.904 1754 81ec72e0 NDU: Synchronize complete
07:38:02.905 1093 81ec72e0 NDU: Dropping lock
07:38:02.918 12728 81ec72e0 NDU: SP::sync no reboot required
07:38:02.918 37 81ec72e0 newSP: newSP sync complete
07:38:02.919 367 81ec72e0 newSP: Calling TerminateThread to cancel HangTimer : 3c
07:38:02.919 35 81ec72e0 newSP: Hang Timer canceled
07:38:02.920 919 81ec72e0 newSP: newSP normal exit.
07:38:03.992 1072054 81ec0020 NduApp: FlareData mutex count inc 1
07:38:04.005 13712 81ec0020 NduApp: FlareData mutex count dec 0
07:38:04.006 208 81ec0020 NduApp: FlareData mutex count inc 1
07:38:04.006 57 81ec0020 NduApp: Wait on mutex
07:38:04.006 37 81ec0020 NduApp: Got Devmap mutex
07:38:04.107 100937 81ec0020 NduApp: release mutex
07:38:04.107 41 81ec0020 NduApp: FlareData mutex count dec 0
07:38:07.394 3287056 81ea7480 ndumon: NDUMon starting
07:38:07.396 2646 81ea7480 ndumon: Degraded mode 0
07:38:07.396 128 81ea7480 ndumon: IO Inhibit 0
07:38:07.397 102 81ea7480 ndumon: Disk is partitioned correctly
07:38:07.397 30 81ea7480 ndumon: Checking NDU status
07:38:07.409 12648 81ea7480 ndumon: Scheduling peer sync
07:38:07.409 184 81ea7480 ndumon: Clearing SafeRevision
07:38:07.410 252 81ea7480 ndumon: Unquiescing I/O

07:38:07.413 3566 81ea5020 ndumon: DelaySyncPeer waiting
07:38:07.420 7072 81ea7480 ndumon: Pre-unquiesce device map build
07:38:07.421 601 81ea7480 ndumon: FlareData mutex count inc 1
07:38:07.435 13834 81ea7480 ndumon: FlareData mutex count dec 0
07:38:07.435 205 81ea7480 ndumon: FlareData mutex count inc 1
07:38:07.435 58 81ea7480 ndumon: Wait on mutex
07:38:07.435 37 81ea7480 ndumon: Got Devmap mutex
07:38:07.458 22842 81ea7480 ndumon: release mutex
07:38:07.458 41 81ea7480 ndumon: FlareData mutex count dec 0
07:38:07.461 2763 81ea7480 ndumon: Unquiesce of K10AggDrvAdmin
07:38:07.463 2623 81ea7480 ndumon: Hostside unquiesce
07:38:07.463 163 81ea7480 ndumon: HostAdmin quiesce opcode 0
07:38:07.522 58147 81ea7480 ndumon: SP Unquiesce succeeded
07:38:07.560 38220 81ea7480 ndumon: PSM File OPEN FAILED 0x00000002 ndu-cache-settings
07:38:07.560 77 81ea7480 ndumon: PSM file ndu-cache-settings does not exist, skipping cache restoration 2
07:38:07.561 1310 81ea7480 ndumon: No post command pending
07:38:07.561 30 81ea7480 ndumon: Creating locks
07:38:07.562 820 81ea7480 ndumon: Creating server thread
07:38:07.562 154 81ea7480 ndumon: Wait for Termination Event
07:38:07.563 818 81ea5780 NDU: Starting server loop
07:38:07.874 310963 81ea8d40 MessageDispatch: #THREADI: Entering Run
07:39:07.406 59531896 81ea5020 ndumon: Acquiring operation lock
07:39:07.422 15844 81ea5020 ndumon: Releasing operation lock
07:39:07.422 482 81ea5020 ndumon: Synchronizing SP times
07:39:09.719 2296380 81ea5020 NDU: peer returned 0
07:39:09.719 456 81ea8960 MessageDispatch: #CXN (outg): SendPacket failed: 0x0000006d0
07:39:09.728 9131 81ea5020 NDU: Time difference of 15 seconds is within threshold of 60 seconds
07:39:09.728 89 81ea5020 ndumon: DelaySyncPeer running
07:39:09.728 31 81ea5020 ndumon: Acquiring peer sync lock
07:39:12.280 2551995 81ea5020 NDU: peer returned 0
07:39:12.280 37 81ea5020 ndumon: DSP sync peer returned 0
07:39:12.280 31 81ea5020 ndumon: Releasing peer sync lock
07:39:12.281 211 81ea8960 MessageDispatch: #CXN (outg): SendPacket failed: 0x0000006d1
07:39:12.281 679 81ea5020 ndumon: DelaySyncPeer quitting
08:29:22.700 -1284548591 81db3020 NaviCimom: PSM File OPEN FAILED 0x00000002 PersistenceProviderTOC

Note: There may be other events unrelated to the boot process displayed. The above list is only a sample representation.
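The second column of a ktrace line appears to be the time since the previous entry in microseconds: for example, 07:38:07.874 to 07:39:07.406 is 59.532 seconds, and the logged value is 59531896. The negative value on the last line matches the roughly 50-minute gap once 2^32 is added, which suggests a wrapped signed 32-bit counter. That is an inference from this sample, not documented behavior. The Python sketch below (parse_ktrace_line and wall_clock_gap are hypothetical names) splits a line into its fields and applies that correction.

```python
import re
from datetime import datetime

# Assumed ktrace line layout, inferred from the sample above:
#   <HH:MM:SS.mmm> <delta> <thread id> <component>: <message>
KTRACE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+(?P<delta>-?\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<component>[^:]+):\s*(?P<message>.*)$"
)


def parse_ktrace_line(line):
    """Split one ktrace line into fields; returns None if it does not match."""
    match = KTRACE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    delta = int(entry["delta"])
    # Assumption: the delta is a signed 32-bit microsecond count, so a
    # negative value means it wrapped; adding 2**32 recovers the real gap.
    if delta < 0:
        delta += 2**32
    entry["delta_us"] = delta
    return entry


def wall_clock_gap(t1, t2):
    """Seconds between two HH:MM:SS.mmm stamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()


prev = parse_ktrace_line("07:39:12.281 679 81ea5020 ndumon: DelaySyncPeer quitting")
last = parse_ktrace_line(
    "08:29:22.700 -1284548591 81db3020 NaviCimom: PSM File OPEN FAILED "
    "0x00000002 PersistenceProviderTOC"
)
print(last["component"], last["delta_us"] / 1e6, wall_clock_gap(prev["time"], last["time"]))
```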

A few hints for troubleshooting either an FC4700 or a CX-series array:

FC4700 - Watch the VGA port output, or attach to the serial port via HyperTerminal. Watch for failures in BIOS or POST.

DPE power-up and initialization: when AC power is initially applied to a DPE, the disk drives power up and spin up in a specified sequence. The maximum delay is 48 seconds for the last drive to start spinning in a DPE, and 84 seconds for the last drive to start spinning in a DAE. The same delays occur when you insert a drive while a DPE is powered up. Status lights on the DPE and its CRUs indicate error conditions. These lights are visible from outside the DPE; some are visible from the front and some from the back. The check status light is located behind the SP fan pack. It is partially visible from the front if you look between the slats on the front panel. If you have difficulty seeing these lights, remove the fan pack cover as described in the manuals.

LIGHT (QUANTITY, COLOR): MEANING
Enclosure Address (2, Green): ON – indicates enclosure address 0.
Disk Active (1 per disk module, Green):
  OFF – the module slot is empty or contains a filler.
  FLASHING (mostly off) – the drive is powered up but not spinning; this is a normal part of the spin-up sequence and occurs during the spin-up delay of a disk drive slot.
  FLASHING (at a constant rate) – the disk drive is spinning up or spinning down normally.
  ON – the drive is spinning but not handling any I/O activity (the ready state).
  FLASHING (mostly on) – the disk drive is spinning and handling I/O activity.
Disk Check (1 per disk slot, Amber): ON – the disk module is faulty, or as an indication to remove the disk module.
DPE Active (1, Green): ON – the DPE is powered up.
DPE Check (1, Amber): ON – any fault condition exists. If the fault is not obvious from another fault light on the front, look at the back of the DPE.
SP Fan Pack Check (1, Amber): ON – the SP fan pack is faulty; not visible with the fan pack cover on.
SP Active (1 per SP, Green): ON – the SP is operating normally; FLASHING when firmware is being loaded.
SP Check (1 per SP, Amber): ON – an SP fault condition exists.
LAN Link (1 per SP, Green): ON – there is a valid Ethernet connection.
LAN Activity (1 per SP, Amber): BLINKING – during Ethernet activity.
LCC Active (1 per LCC, Green): ON – the LCC is powered up.
LCC Check (1 per LCC, Amber): ON – either the LCC or the FCAL connection is faulty.
Power Supply Active (1 per supply, Green): ON – the power supply is operating.
Power Supply Check (1 per supply, Amber): ON – the power supply is faulty or is not receiving AC line voltage.
Cooling Check (1 per supply, Amber): FLASHING – multiple fans in the drive fan pack are faulty, or the drive fan pack is removed. The DPE powers down the SPs and disk drives when the fault persists for more than about two minutes.
Drive Fan Pack Check (1 per fan pack, Yellow): ON – a fan in the drive fan pack is faulty.
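The spin-up timings and the Disk Active patterns above can be combined into a quick check while watching an FC4700 enclosure power up. This Python sketch is illustrative only: the pattern labels ("flashing-mostly-off" and so on) are invented for the example, and treating a drive that has not started spinning after the quoted maximum delay as worth a closer look is an inference from the text, not a documented rule.

```python
# Maximum delays quoted above for the last drive to start spinning.
MAX_SPIN_UP_SECONDS = {"DPE": 48, "DAE": 84}

# Disk Active light patterns from the table above; the keys are labels
# invented for this sketch, not names used by any EMC tool.
DISK_ACTIVE_MEANING = {
    "off": "slot is empty or contains a filler",
    "flashing-mostly-off": "powered up but not spinning (spin-up delay)",
    "flashing-constant": "spinning up or spinning down normally",
    "on": "spinning, no I/O activity (ready)",
    "flashing-mostly-on": "spinning and handling I/O",
}


def describe_disk_active(pattern):
    """Translate an observed Disk Active pattern into its documented meaning."""
    return DISK_ACTIVE_MEANING.get(pattern, "pattern not in the table")


def spin_up_overdue(enclosure, seconds_since_power_on, pattern):
    """True if a drive still shows 'powered up but not spinning' after the
    maximum delay quoted for its enclosure type."""
    not_spinning_yet = pattern == "flashing-mostly-off"
    return not_spinning_yet and seconds_since_power_on > MAX_SPIN_UP_SECONDS[enclosure]


print(describe_disk_active("on"))
print(spin_up_overdue("DPE", 60, "flashing-mostly-off"))
```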

If the DPE Check light is on, you should look at the other Check lights to determine which CRU(s) are faulty. If the Check light for a CRU remains on, replace the CRU as soon as possible. If a CRU fails in a DPE, the DPE’s high availability will be compromised until you replace the faulty CRU. The write cache function (if any) will be disabled.

CX Series - Watch the serial port output via HyperTerminal, since the VGA connection is no longer available. If the other SP is available, check its logs for back-end issues. Remember that the CX uses the back-end Fibre Channel to boot the SP, so do not replace an SP for a boot problem without direction. Also consider the cables from the SP to the DAEOS; the cable could be bad (check for the presence of Amphenol-type cables).

CX600 Status Indications

CX600 Storage Processor (SP) Status Lights

LIGHT (QUANTITY, COLOR): MEANING
BE 1, BE 0, AUX 0, AUX 1 Link LEDs (1 per port, Green): ON – indicates auxiliary or back-end activity.
LAN Link (1 per LAN port, Green): ON – there is a valid Ethernet connection.
LAN Activity (1 per LAN port, Amber): FLASHES – indicates LAN activity.
Power (1 per SP, Green): ON – indicates +12 volt power.
Fault (1 per SP, Amber): flashing indications:
  once every 4 seconds – BIOS activity
  once per second – POST activity
  four times per second – booting
  steady – indicates a fault condition
Link LEDs 0, 1, 2, 3 (1 per port, Green): ON – indicates I/O with the host.
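Because the CX600 Fault LED encodes the boot phase in its flash rate, an observed rate can be mapped back to the nearest documented one. This is a small Python sketch with a hypothetical function name; matching on a log scale is a design choice for the example, because the documented rates differ by factors of four. It does not replace watching the serial port output.

```python
import math

# CX600 SP Fault LED flash rates from the table above, in flashes per second.
FAULT_LED_PHASES = [
    (0.25, "BIOS activity"),  # once every 4 seconds
    (1.0, "POST activity"),   # once per second
    (4.0, "booting"),         # four times per second
]


def fault_led_phase(flashes_per_second=0.0, steady=False):
    """Map an observed Fault LED behavior to the phase it indicates."""
    if steady:
        return "fault condition"
    if flashes_per_second <= 0:
        return "not flashing"
    # Choose the documented rate nearest the observation on a log scale.
    _, phase = min(
        FAULT_LED_PHASES,
        key=lambda entry: abs(math.log(flashes_per_second / entry[0])),
    )
    return phase


print(fault_led_phase(1.1))
print(fault_led_phase(steady=True))
```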

CX600 Power Supply Status Lights

LIGHT (QUANTITY, COLOR): MEANING
Power Supply Active (1 per supply, Green): ON – the power supply is operating.
Power Supply Fault (1 per supply, Amber): ON – the power supply is faulty, or one of the two is not receiving AC line voltage. FLASHING – the system has been shut down due to a multiple fan fault or ambient over-temperature.
SPS Active (1 per SPS, Green): ON – the SPS is ready and operating normally. FLASHES when the SPS is recharging.
SPS On Battery (1 per SPS, Amber): ON – the AC power line is no longer available and the SPS is supplying DC output power from its battery.
SPS Replace Battery (1 per SPS, Amber): ON – the SPS battery pack can no longer support loads. Replace the SPS as soon as possible.
SPS Fault (1 per SPS, Amber): ON – the SPS has an internal fault. Replace the SPS as soon as possible.

CX600 Status Lights

LIGHT (QUANTITY, COLOR): MEANING
CX600 Power OK (1, Green): ON – the SPE is powered up.
CX600 System Fault (1, Amber): ON – any fault condition exists. If the fault is not obvious from another fault light on the front, look at the back.

If the System Fault LED is on, look at the other status LEDs to identify the faulty FRU(s). If the status LED for a FRU remains on, replace the FRU as soon as possible. If a FRU fails in a CX600 SPE, the write cache function is disabled and high availability is compromised until you replace the faulty FRU. Each fan module includes one amber cooling check (fan fault) LED that indicates a faulty module. These lights are visible with the front bezel removed.

CX400 and CX200 Status Indications
See the following manuals:
CX400-Series Hardware Reference 014003049-Axx
CX200-Series Initialization Guide 014003117-Axx

NOTE: For any boot or power-up issue, do not assume that re-imaging the system is the correct step to take. Consider all other possibilities before re-imaging the base operating system.

Utility Partition
The Utility Partition, available starting with Release 11 code, is a tool used to re-image SPs, reset SPs to a "factory fresh" state, and perform conversions. To enter the utility menu, attach a serial cable to the storage processor and open a HyperTerminal connection. Reboot the storage processor, and when you see the FLARE POST testing (ABC…..), press the ESC key. FLARE will stop with an error; at that point, type DB_key. A diagnostic menu will then appear. See EMC document CLAR-PSP-078 "Recovering a Boot Image on a CX System Using Recovery Drives or the CLARiiON Utility Partition" for more detailed information.

Diagnostic Menu
1) Reset Controller
2) Display Warnings/Errors
3) DDBS Service Sub-Menu
4) FCC Boot Sub-Menu

DDBS Service Sub-Menu
1) Drive Slot ID Check
2) Utility Partition Boot
0) Exit

Which Back End Loop?
0 - BE Loop 0
1 - BE Loop 1
Enter number (0-1) [0]: 0
Initializing back end FIBRE...
PCI Config Reg: 2.4.1 0x0157
FCDMTL 0 [2.4.1] Dual Mode Fibre init - OSW DB PTR 0x20000000
FCDMTL 0 [2.4.1] Cached memory - 0xF77B9 bytes @ 0x200006B0
FCDMTL 0 [2.4.1] Noncached memory - 0xC037F bytes @ 0x200F7E69 (0x200F7E69 phys)
FCDMTL 0 [2.4.1] DVM Initialized
FCDMTL 0 [2.4.1] IMQ base ptr = 20170000; IMQ length = 8000
Dualmode fibre init completed
FCDMTL 0 [2.4.1] TPM Notify: st=0xA000000, flg=0x4, cmd=0x1
FCDMTL 0 [2.4.1] TPM Hndle API Event: cntx=0x200004C4, evnt=0x4002, info=0x0
FCDMTL 0 [2.4.1] TPM Lnk Up: state=0xA000000, flg=0x84
Link Event: 0x00030005
"FCDMTL 0 [2.4.1] DVM address IDs will be shown"
"Device Events will be shown"
"FCDMTL 0 [2.4.1] DVM address IDs will be shown"
"Device Events will be shown"
"Targets found will be shown and their state"
Relocating Data Directory Boot Service (DDBS)...
Drive Slot Check Report for Back End Loop 0
-------------------------------------------
LOOP: 0
Summary:
Total Disks in the Correct Slots: 30
Total Disks in the WRONG Slots: 0
Total Slots Checked: 30

DDBS Service Sub-Menu
1) Drive Slot ID Check
2) Utility Partition Boot
0) Exit
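If the Drive Slot Check output is captured from the HyperTerminal session, its summary can be checked mechanically. The Python sketch below assumes the report text has been saved as a string; parse_slot_check is a hypothetical helper. Treating any nonzero "WRONG Slots" count as something to investigate before going further is a reasonable reading of the report, not a rule stated in this section.

```python
import re

# Summary lines of the DDBS "Drive Slot Check Report" shown above.
SUMMARY_PATTERNS = {
    "correct": r"Total Disks in the Correct Slots:\s*(\d+)",
    "wrong": r"Total Disks in the WRONG Slots:\s*(\d+)",
    "checked": r"Total Slots Checked:\s*(\d+)",
}


def parse_slot_check(report):
    """Return the summary counts from a captured Drive Slot Check Report."""
    counts = {}
    for name, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, report)
        if match is None:
            raise ValueError(f"summary line not found: {name}")
        counts[name] = int(match.group(1))
    return counts


report = """Drive Slot Check Report for Back End Loop 0
LOOP: 0
Summary:
Total Disks in the Correct Slots: 30
Total Disks in the WRONG Slots: 0
Total Slots Checked: 30"""

counts = parse_slot_check(report)
if counts["wrong"]:
    print(f"{counts['wrong']} disk(s) reported in the wrong slot: investigate first")
else:
    print("no disks reported in the wrong slots")
```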

int13 - RESET (1)
Initializing back end FIBRE...
PCI Config Reg: 2.4.1 0x0157
FCDMTL 1 [2.4.1] Dual Mode Fibre init - OSW DB PTR 0x20000000

FCDMTL 1 [2.4.1] Cached memory - 0xF77B9 bytes @ 0x200006B0
FCDMTL 1 [2.4.1] Noncached memory - 0xC037F bytes @ 0x200F7E69 (0x200F7E69 phys)
FCDMTL 1 [2.4.1] DVM Initialized
FCDMTL 1 [2.4.1] IMQ base ptr = 20170000; IMQ length = 8000
Dualmode fibre init completed
FCDMTL 1 [2.4.1] TPM Notify: st=0xA000000, flg=0x4, cmd=0x1
FCDMTL 1 [2.4.1] TPM Hndle API Event: cntx=0x200004C4, evnt=0x4002, info=0x0
FCDMTL 1 [2.4.1] TPM Lnk Up: state=0xA000000, flg=0x84
Link Event: 0x00030005
FCDMTL 1 [2.4.1] DVM Duplicate address id already in list: EF
FCDMTL 1 [2.4.1] DVM Duplicate address id already in list: E2
FCDMTL 1 [2.4.1] DVM Duplicate address id already in list: E4
FCDMTL 1 [2.4.1] DVM Duplicate address id already in list: E8
FCDMTL 1 [2.4.1] DVM Duplicate address id already in list: E1
Device Event (0xEF): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE2): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE4): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE8): 0x00030012, tach_ptr: 0x08491854
Device Event (0xE1): 0x00030012, tach_ptr: 0x08491854
DL waited 1s for discovery
Target 0 is online
Target 1 is online
Target 2 is online
Target 3 is online
Target 4 is online
Relocating Data Directory Boot Service (DDBS)...
DDBS: MDB read from both disks.
DDBS: Chassis and disk WWN seeds match.
DDBS: First disk is valid for boot.
DDBS: Second disk is valid for boot.
NT Utility image (0x0040000F) located at sector LBA 0x00BE804C
Disk Set: 1 3
Total Sectors: 0x0005FF61
Relative Sectors: 0x0000003F
Calculated mirror drive geometry: Sectors: 63 Heads: 240 Cylinders: 26 Capacity: 393120 sectors
Total Sectors: 0x0005FF61
Relative Sectors: 0x0000003F
Calculated mirror drive geometry: Sectors: 63 Heads: 240 Cylinders: 26 Capacity: 393120 sectors
int13 - READ PARAMETERS (19)
int13 - READ PARAMETERS (22)
int13 - DRIVE TYPE (57)
int13 - READ PARAMETERS (58)
int13 - DRIVE TYPE (59)
Error : Invalid Drive ID - 0x81   (this is normal)
int13 - CHECK EXTENSIONS PRESENT (61)
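The geometry lines above are internally consistent, which can be verified by hand: the reported capacity is sectors x heads x cylinders (63 x 240 x 26 = 393120), and the relative (starting) sector plus the total sectors (0x3F + 0x5FF61 = 63 + 393057) lands exactly on that capacity, so the utility image partition fills the calculated mirror geometry. The short Python check below reads Total Sectors and Relative Sectors as the usual MBR partition-table fields; that reading is an interpretation, not something the console output states.

```python
# Values taken from the Utility Partition boot output above.
SECTORS_PER_TRACK = 63
HEADS = 240
CYLINDERS = 26
TOTAL_SECTORS = 0x0005FF61     # partition length in sectors (MBR "total sectors")
RELATIVE_SECTORS = 0x0000003F  # sectors before the partition (MBR "relative sectors")

chs_capacity = SECTORS_PER_TRACK * HEADS * CYLINDERS
partition_end = RELATIVE_SECTORS + TOTAL_SECTORS

print(chs_capacity)   # 393120
print(partition_end)  # 393120
print(TOTAL_SECTORS)  # 393057
```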

int13 - GET DRIVE PARAMETERS (Extended) (62)
int13 - READ PARAMETERS (63)
int13 - READ PARAMETERS (65)
int13 - READ PARAMETERS (1130)
int13 - READ PARAMETERS (1168)
int13 - READ PARAMETERS (1202)
int13 - READ PARAMETERS (1236)
int13 - READ PARAMETERS (1270)
int13 - READ PARAMETERS (1327)
int13 - READ PARAMETERS (1359)
int13 - READ PARAMETERS (1426)
int13 - READ PARAMETERS (1457)
int13 - READ PARAMETERS (1474)
CLARiiON Utility Toolkit
(c) EMC Corporation 2001-2003 All Rights Reserved
DiagName: UtilityToolkit
DiagRev: 1.04.03
StartTime: 10/12/03 21:32:38
SPID.......................... Running
FCDMTL........................ Running
NTMIRROR...................... Running
ASIDC......................... Running
ASIRAMDISK.................... Running
ICA........................... Running
Connecting to ICA............. Success
SP Type....................... CX600
SP ID......................... A
Checking Disk 4............... Present
Searching for Image Repository Found Volume
Sizing Image Repository....... 1024 MB
Checking Image Repository..... Done
Sizing RAM Disk............... 381 MB
Checking LAN Port State....... Not Started
Checking LAN Port Config...... Not Found
Starting FTP Server........... Success
Loading Plugins............... Done
Finding incompatible images... Done
=========================================================
!!! WARNING !!!
=========================================================
Installing a Release 11 (02.04.X.XX.X.XXX) or earlier Recovery Image or Conversion Image on an array running Release 12 (02.05.X.XX.X.XXX) or higher Core Array Software will result in permanent, unrecoverable loss of configuration information and customer data.
The following images have been automatically removed from this array's Image Repository to prevent accidental installation:
SAN_Image-02.04.0.60.5.001.mif (SAN Image 02.04.0.60.5.001)
Have you read and understood the warning above? [y/N] : y   (Note that N is the default.)
Checking for Upgrade Wizard...Not Found
EndTime: 10/12/03 21:32:51
Press the Enter key to continue
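The toolkit removes dangerous images automatically, but the same rule can be applied before an image is ever copied toward the array. The Python sketch below encodes the version mapping from the warning (02.04 = Release 11, 02.05 = Release 12); release_key and image_is_unsafe are hypothetical names, and the array version in the example is an invented value. The warning applies to Recovery and Conversion images specifically.

```python
import re

# Version prefixes from the warning above:
#   02.04.X.XX.X.XXX = Release 11, 02.05.X.XX.X.XXX = Release 12.
RELEASE_12 = (2, 5)


def release_key(text):
    """Extract the leading (major, minor) pair from a version or image name."""
    match = re.search(r"(\d+)\.(\d+)(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def image_is_unsafe(image, array_version):
    """True when a Release 11 (02.04) or earlier image meets a Release 12
    (02.05) or later array, the combination the warning says destroys data."""
    return release_key(image) < RELEASE_12 <= release_key(array_version)


# The array version below is a hypothetical example value.
print(image_is_unsafe("SAN_Image-02.04.0.60.5.001.mif", "02.05.0.00.0.000"))
```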

=========================================================
CLARiiON Utility Toolkit Main Menu
=========================================================
1) About the Utility Toolkit
2) Reset Storage Processor
3) Wizard Sub-Menu
4) Image Repository Sub-Menu
5) Plugin Sub-Menu
6) Enable LAN Service Port
7) Enable Engineering Mode
8) Install Images
Enter Option:

=========================================================
CLARiiON Utility Toolkit Image Repository Menu
=========================================================
1) Back to the Main Menu
2) List Image Repository Contents
3) Delete Files from the Image Repository
4) Copy Files from the RAM Disk to the Image Repository
5) Copy Files from the Image Repository to the RAM Disk
Enter Option:

FCC Boot Sub-Menu
1) Restore Def Port Settings
2) Display Port Settings
3) BE0 FCC Boot
4) BE1 FCC Boot
5) AUX0 FCC Boot
6) AUX1 FCC Boot
0) Exit

PORT SETTINGS
Port B/E WWN Port WWN Primary Num B/E WWN Node Name::Bus:Dev:Func WWN Secondary Port Settings
-------------------------------------------------------------------------------------------------------------------
000 00000000:00000000 BE0 FCC::02:04:01 00000000:00000000 2Gb, ENA, WWN
    00000000:00000000 00000000:00000000 00000000:00000000 00000000:00000000
001 00000000:00000000 BE1 FCC::02:04:00 00000000:00000000 2Gb, ENA, WWN
    00000000:00000000 00000000:00000000 00000000:00000000 00000000:00000000
002 00000000:00000000 AUX0 FCC::01:04:01 00000000:00000000 2Gb, ENA, WWN
    00000000:00000000 00000000:00000000 00000000:00000000 00000000:00000000
003 00000000:00000000 AUX1 FCC::01:06:01 00000000:00000000 2Gb, ENA, WWN
    00000000:00000000 00000000:00000000 00000000:00000000 00000000:00000000

Diagnostic Menu
1) Reset Controller
2) Display Warnings/Errors
3) DDBS Service Sub-Menu
4) FCC Boot Sub-Menu
Requesting System Reset
Copyright 1985-2001 Phoenix Technologies Ltd. All Rights Reserved
Copyright 1999-2002 by EMC Corporation, All Rights Reserved.
EMC BIOS Release 3.26
CPU = 2 Intel(R) XEON(TM) CPU 2.00GHz
637K System RAM Passed
... (power-up messages continue) ...

Unmanaged SPs
Is the customer data still accessible from the hosts?

If yes, DO NOT restart the K10 Governor or reboot the SP in any way. Instead, try the navicli getagent command to see whether the CIMOM is the source of the problem. Try pinging the SP on the customer network. If those fail, try to establish a PPP connection to the serial port.

If the customer cannot access data on the fabric then the SP may in fact be hung.

Try the NMI switch to see if a dump can be obtained, and connect to the serial port via HyperTerminal to watch for a reboot. If the NMI gets no response, a reset is in order on the FC4700; on a CX series, you will need to reseat the SP. In either case, collect all logs to try to determine the cause.

If this is the first instance of a hang, keep all information about this failure on hand.

If a second hang of the same type occurs, an SP replacement may be in order.
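The unmanaged-SP decision above hinges on one question (can hosts still reach their data?), so it can be written down as a small checklist function. This Python sketch is a hypothetical aid, not an EMC tool: SPObservation, next_steps and the 60-second timeout are invented for illustration, and run_getagent assumes the Navisphere CLI (navicli) is installed on the workstation and on PATH.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class SPObservation:
    host_data_accessible: bool  # can hosts still reach their LUNs?
    model: str                  # "FC4700" or "CX"
    prior_hangs_same_type: int = 0


def getagent_command(sp_address):
    """Build the navicli getagent command line for one SP."""
    return ["navicli", "-h", sp_address, "getagent"]


def run_getagent(sp_address, timeout=60):
    """Run getagent; assumes navicli is installed and on PATH."""
    return subprocess.run(getagent_command(sp_address), capture_output=True,
                          text=True, timeout=timeout)


def next_steps(obs):
    """Return the actions suggested above for an unmanaged or hung SP."""
    if obs.host_data_accessible:
        return [
            "Do NOT restart the K10 Governor or reboot the SP",
            "Run navicli getagent to see whether the CIMOM is the problem",
            "Ping the SP on the customer network",
            "If those fail, establish a PPP connection to the serial port",
        ]
    steps = [
        "Press the NMI switch to try to obtain a dump",
        "Watch the serial port (HyperTerminal) for a reboot",
        "If the NMI gets no response, "
        + ("reset the SP" if obs.model == "FC4700" else "reseat the SP"),
        "Collect all logs to try to determine the cause",
    ]
    if obs.prior_hangs_same_type >= 1:
        steps.append("Second hang of this type: SP replacement may be in order")
    else:
        steps.append("First hang: keep all information about this failure")
    return steps


for step in next_steps(SPObservation(host_data_accessible=False, model="CX")):
    print(step)
```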

SP Failures
FC4700
Hangs
• Unmanaged (later)
• Hard hang: no response to any attempt to communicate (Navisphere, ping, PPP, NMI)

Misconceptions about SP failures
• NDU failures that occur when starting with no faults
• Unmanaged SP (almost always)
• Panics (especially when layered products are involved)

Real SP failures
• Memory errors
• IDE faults and panics. A message that the internal drive is corrupt is not an IDE failure, and as such is not an SP failure but an NT issue.
• Boot failures (watch the VGA and serial ports during power-up to determine the fault)

CX Series
Hangs
• Unmanaged (later)
• Hard hang: no response to any attempt to communicate (Navisphere, ping, PPP, NMI)

NMI switch is located at the dot being pointed to by the arrow.

Misconceptions about SP failures
• NDU failures that occur when starting with no faults
• Unmanaged SP (almost always)
• Panics (especially when layered products are involved)
• SP will not reboot (watch the power-up via the serial port for actual SP failures that would require an SP replacement)

Real SP failures
• Memory errors
• See above for the limited boot failures caused by a failed SP.

Almost every instance of a boot failure is caused by a back-end failure or by a misguided troubleshooting step.

Replacing SPs (Don't)
FC4700
All software must be loaded onto the new SP by the ndumon process. This can, and will, take several reboots. All logs and troubleshooting information go with the SP that was replaced; save them in case this information is needed.

If an SP is replaced, the SP that was inserted can NEVER be put back into stock. It MUST be returned to the repair center to be re-imaged.

CX Series
All software and logs remain with the image on the NT drives for the SP.

SPs that were replaced and then removed can be returned to stock.

Panics
Check the panic against the list of known panics to see whether a fix has been identified. Check Primus and DIMs (internal to EMC only) for any instances of the same panic, to see whether a solution exists or whether the dump needs to be submitted for further collateral information.

NOTE: Always refer to the latest available document.

The goal of this Support Procedure is to reduce the number of CLARiiON Storage Processors replaced unnecessarily. There are numerous failure modes in a CLARiiON array that appear to indicate a faulty Storage Processor. Many of these failure modes relate to software faults or to other components in the array and may incorrectly appear to be Storage Processor failures. Proper service action requires careful diagnosis before replacing a Storage Processor. If you have any question about whether a Storage Processor should be replaced, contact the Call Center. This document covers the replacement of an SP in an FC4700 and in a CX-series array, and notes where the two differ. The table at the end of this document lists several resources that offer direction when deciding whether an SP replacement is necessary.

When is it not OK to remove or replace a Storage Processor (SP)?

• If the SP considered for replacement is servicing active I/O to a LUN
• If the other SP does not appear healthy
• If there is a second problem on the array. Navisphere should indicate no problem other than a single SP failure.
• If an SP panics and the event log entry indicates an "Internal SW Error". Never replace an SP just because it has had a panic.
• Simply because of infrequent single-bit ECC errors (see Primus case emc65498)
• If the SP has been replaced recently for the same or a similar symptom; consider whether a second replacement is appropriate
• Because it fails to boot. See clar-psp-093 to determine whether the problem is with the SP or with the NT image the SP is trying to boot.

Identifying a BAD SP
Unmanaged SP ("U" over a single SP icon) - Regardless of what Navisphere indicates about the Storage Processor, always verify whether the Storage Processor is still handling I/O from attached servers to their LUNs. Navisphere can show an SP as Unmanaged whether it is running or has stopped, so you must determine whether the SP is running I/O or is simply not currently being managed via Navisphere.
1. Monitor I/O activity of LUNs owned by that SP from the other SP.
2. Are any LUNs trespassed to the "other" SP?
3. Has PowerPath failed over because the server cannot use the SP?
4. Can the SP LAN address be pinged from a server on the same subnet as the array?
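The "not OK to replace" conditions above are easy to overlook under time pressure, so they can be kept as an explicit checklist. The Python sketch below is illustrative only; ReplacementContext and reasons_not_to_replace are hypothetical names. An empty result means none of the listed blockers apply, which is necessary but not sufficient: the Call Center should still be contacted when in doubt.

```python
from dataclasses import dataclass


@dataclass
class ReplacementContext:
    servicing_active_io: bool = False
    peer_sp_healthy: bool = True
    other_problem_on_array: bool = False
    panic_internal_sw_error: bool = False
    only_infrequent_single_bit_ecc: bool = False
    recently_replaced_same_symptom: bool = False
    fails_to_boot: bool = False


def reasons_not_to_replace(ctx):
    """List every condition above that argues against replacing the SP now."""
    checks = [
        (ctx.servicing_active_io, "SP is servicing active I/O to a LUN"),
        (not ctx.peer_sp_healthy, "the other SP does not appear healthy"),
        (ctx.other_problem_on_array, "there is a second problem on the array"),
        (ctx.panic_internal_sw_error,
         "a panic logged as Internal SW Error is not grounds by itself"),
        (ctx.only_infrequent_single_bit_ecc,
         "only infrequent single-bit ECC errors (Primus emc65498)"),
        (ctx.recently_replaced_same_symptom,
         "recently replaced for the same symptom; reconsider"),
        (ctx.fails_to_boot,
         "boot failure: use clar-psp-093 to separate SP from NT image problems"),
    ]
    return [reason for blocked, reason in checks if blocked]


print(reasons_not_to_replace(ReplacementContext(peer_sp_healthy=False)))
```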

If investigation proves that the SP is running I/O but Navisphere cannot manage it, there are options other than replacing the SP. Refer to Primus emc52543 in addition to this information.

• If the SP responds to a ping, the OS is running on the SP.

• If a navicli command of any type addressed to the SP LAN address yields a positive response, the Navisphere Agent on the array is running, but the Management Server on the SP may not be. Restart the Management Server from the SP's /setup page over the SP serial connection (example: 192.168.1.1/setup).

• If the SP does not answer a ping but I/O is running through the SP, check the cable where the SP connects to the LAN.

• If the SP answers a ping but does not respond to a navicli command, the OS on the SP is running but the SP agent is not. Call EMC/CLARiiON Support.

• If the above steps do not work, a reboot of the SP may be required. See below for SP reboot directions.
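The checks above can be condensed into a small decision function. This is a sketch only: the function name, the `Action` labels, and the choice to let a successful navicli response take priority over the ping result are assumptions made for illustration. It does not contact the array; the three booleans must come from the manual checks described above.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the next step on an SP shown as Unmanaged ("U")."""
    RESTART_MGMT_SERVER = "Agent responds; restart the Management Server from the SP /setup page"
    CALL_SUPPORT = "OS is running but the SP agent is not; call EMC/CLARiiON Support"
    CHECK_LAN_CABLE = "SP is running I/O but does not answer a ping; check its LAN cable"
    CONSIDER_RESTART = "Not resolved by the checks above; an SP restart may be required"


def diagnose_unmanaged_sp(io_running: bool, answers_ping: bool, answers_navicli: bool) -> Action:
    """Map the results of the manual checks to a suggested next step.

    io_running:      LUNs owned by this SP still show I/O (observed from the peer SP,
                     no trespasses, no PowerPath failover).
    answers_ping:    the SP LAN address answers a ping from a host on the same subnet.
    answers_navicli: any navicli command addressed to the SP LAN address succeeds.
    """
    if answers_navicli:
        # The Navisphere Agent answers, so the Management Server is the likely culprit.
        return Action.RESTART_MGMT_SERVER
    if answers_ping:
        # The OS is up (ping works) but the agent does not answer navicli.
        return Action.CALL_SUPPORT
    if io_running:
        # Host I/O is flowing but the SP is unreachable on the LAN.
        return Action.CHECK_LAN_CABLE
    # Assumption: no I/O and no LAN response falls through to the reboot guidance.
    return Action.CONSIDER_RESTART


if __name__ == "__main__":
    print(diagnose_unmanaged_sp(io_running=True, answers_ping=True, answers_navicli=False).value)
```

The last branch is the least grounded: the checklist does not explicitly cover an SP that neither runs I/O nor answers on the LAN, so mapping it to the restart guidance is a judgment call.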

When is restarting a Storage Processor recommended before replacement?
If an SP appears to be hung, it is advisable to attempt to retrieve a panic dump from the SP.
• FC4700 - The FC4700 has two buttons: an NMI button, which causes a panic dump and reboot, and a Reset button accessible through the air dam. The Reset button causes only a reboot.
• CX array - The CX Series SPs have a RESET button accessible through the air dam. This is actually an NMI button which, when pushed, causes a panic dump and a reboot. It could take up to 45 minutes for the SP to respond.
If a Storage Processor is non-responsive after the use of the switches noted above, always try to restart it before replacing it.
• FC4700 & CX array - Never attempt to restart an SP by cycling power or disconnecting power cords; a cache dirty (data loss) condition may result.
• FC4700 - Do not simply reseat an FC4700 SP to induce a reboot. The FC4700 SP has an IDE drive that is VERY susceptible to damage if the SP is physically removed from its slot and reseated to cause a reboot. For FC4700 Storage Processors, always use the NMI button to attempt a restart. It could take up to 45 minutes for the SP to respond. If the NMI restart does not work, try the Reset button before replacing the SP.
• CX array - A CX Series SP can also be removed and reinserted to force a complete power-up and reboot of that SP.
When removing an FC4700 SP that is being replaced, take special care not to damage the IDE drive. Press the Reset switch, wait approximately 10 seconds, then remove the SP. Waiting 10 seconds gives the heads of the IDE drive time to land properly before the jerky motion of SP removal. If you wait too long (1 minute) after pressing Reset, the drive heads will be out of the landing area and may be damaged if the SP is moved violently.
• Never reuse an SP from another FC4700. Once an SP is inserted and a boot sequence has begun, the FC4700 SP has taken on properties of the array it was plugged into, and any future use in another array is completely unpredictable. An SP that is inserted into a running array must remain in THAT array or be returned to the factory to be re-imaged. See Primus case emc71665.

Commonly used tools

ktcons (K10 trace console; K10 is a codename)
This tool is used to examine the KTRACE buffers (engineering-level information) on an SP. Information relating directly to Flare can be obtained and engineering-level commands can be performed. It is accessed and run through a SymmRemote session directly into the SP, and it can be run either remotely against the SP or locally while connected to the storage processor. Caution must be taken when using this tool. Use it only under the direction of Technical Support.

C:\>ktcons
Remote IP address is required
USAGE: ktcons -h [-i <invocationType>] [-p tcpPort] [-r remoteHost][-d <bitMask>|s ] [-s <a|d>] [-n]
where:
  -h: Display Help // display this text
  -i: Invocation Type {l|L|r|R|s|S} // local/remote/service
  -t: tcpPort // unused port number
  -r: Remote_HostName // name or IP address
  -f: sourceFileName // initial source file name
  -d: Debug Level 1,2,4, s // init/data transfer/timing mask bits;
      // s - Signal ktcons to take a dump of ktrace buffer
  -s: Service {a|d} // add/delete service
  -q: Queue Mode // ktcons starts and runs in queue mode.
      // When signaled by K10Governor, it takes a dump of
      // ktrace buffer.
  -n: // Do not Reconnect, when connection is lost

if -i omitted and running as ktconsService.exe -- run as service (-s)
if -i omitted and running as ktcons.exe -- run as remote observer (-r)
if -il or -iL specified, run on target as server and observer
-t is TCP/IP port number used by KtCon.
   Server gets default from registry
   Observer gets value from command line or uses KTCONS_DEF_TCP_PORT
-r is TCP/IP address of the server. No default
-s valid if run as ktconsService.exe and add/deletes it as a service.
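The mode-selection rules in the help text can be restated as a small function, which makes the defaults explicit: without -i, the executable name decides the mode. This is a sketch for reading the help text, not part of ktcons. The function name and mode labels are hypothetical, and treating -ir/-is as the same modes as the default remote-observer and service cases is an assumption based on the "local/remote/service" comment.

```python
import ntpath
from typing import Optional

# Invocation letters from the ktcons help text: l/L local, r/R remote, s/S service.
_MODES = {
    "l": "server and observer on the target",
    "r": "remote observer",
    "s": "service",
}


def ktcons_mode(exe_path: str, invocation: Optional[str] = None) -> str:
    """Return the run mode described by the ktcons help text.

    exe_path:   path of the executable being run (ktcons.exe or ktconsService.exe).
    invocation: the letter given with -i (e.g. "L" for -iL), or None if -i is omitted.
    """
    if invocation is not None:
        key = invocation.lower()
        if key not in _MODES:
            raise ValueError(f"-i expects one of l, L, r, R, s, S; got {invocation!r}")
        return _MODES[key]
    # -i omitted: the mode depends on which executable is running.
    name = ntpath.basename(exe_path).lower()  # ntpath handles Windows paths on any OS
    if name == "ktconsservice.exe":
        return _MODES["s"]
    if name == "ktcons.exe":
        return _MODES["r"]
    raise ValueError(f"unrecognized executable name: {exe_path!r}")


if __name__ == "__main__":
    print(ktcons_mode(r"C:\tools\ktcons.exe"))
```

Running plain ktcons.exe therefore defaults to remote observer, which needs -r; that is consistent with the "Remote IP address is required" message in the transcript above.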

psmtool (persistent storage manager tool)
Used for accessing information in the PSM data areas. Information relating directly to Flare can be obtained and engineering-level commands can be performed. It is accessed and run through a SymmRemote session directly into the SP. The basic commands are list, show, and del. Caution must be taken when using this tool. Use it only under the direction of Technical Support.

C:\>psmtool
Usage: psmtool op ...
    put file dataArea
    get dataArea file
    del dataArea
    list
    show dataArea
    status
    enum
    layout
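Note that put and get take their operands in opposite order (put file dataArea versus get dataArea file), which is easy to get backwards. The sketch below builds a psmtool argument list from named operands so the documented order is always used. It only builds the list; it does not run psmtool. The function name is hypothetical, and the data-area and file names in the example are placeholders, not real PSM data areas.

```python
from typing import Dict, List, Tuple

# Operand names for each psmtool operation, transcribed from the usage text above.
PSMTOOL_OPS: Dict[str, Tuple[str, ...]] = {
    "put": ("file", "dataArea"),
    "get": ("dataArea", "file"),
    "del": ("dataArea",),
    "list": (),
    "show": ("dataArea",),
    "status": (),
    "enum": (),
    "layout": (),
}


def psmtool_argv(op: str, **operands: str) -> List[str]:
    """Build a psmtool argument list with operands in the documented order.

    Operands are passed by name (file=..., dataArea=...) so the reversed
    order of put and get cannot be mixed up.
    """
    if op not in PSMTOOL_OPS:
        raise ValueError(f"unknown psmtool op: {op!r}")
    expected = PSMTOOL_OPS[op]
    if set(operands) != set(expected):
        raise ValueError(f"{op} expects operands {expected}, got {tuple(operands)}")
    return ["psmtool", op] + [operands[name] for name in expected]


if __name__ == "__main__":
    # "SOME_AREA" and "area_backup.bin" are placeholders.
    print(psmtool_argv("get", dataArea="SOME_AREA", file="area_backup.bin"))
    print(psmtool_argv("put", file="area_backup.bin", dataArea="SOME_AREA"))
```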

flarecons (flare console)
This is an internal SP tool, available only to EMC personnel, that provides engineering access to the fcli (flare CLI) prompt. You can obtain information relating directly to Flare and perform engineering-level commands. It is accessed and run through a SymmRemote session directly into the SP. Technical Support will provide the command to enter flarecons when needed. This tool is used primarily for clearing resume PROMs, performing functions on the vault LUN, and similar tasks. Caution must be taken when using this tool. Use it only under the direction of Technical Support.

fcli> ?
Notes: command full name/abbreviation - summary
clearlog/cl - Destroy contents of RAID storage controller's error log
access/acc - access -m [1 | 2]
eccerr/ecc - eccerr <-mb> [-bit [all | value]]
lrucmd/lru - lrucmd <-r | -w> [offset] [value]
getlog/l - returns specified portions of the storage processor unsolicited log
getwwn/gw - get current World Wide Name Seed
getdropevtcnt/gdec - get drop event messages count
getprominfo/gp - Displays the resume Prom information for a particular Device
lccupgrade/lcc - Controls and monitors the upgrading of the LCC firmware.
lccdebugcmd/ld - Issue a LCC Debug command to simulate faults on the specified enclosure
help/? - list all available commands with summary
lustat/ls - Logical Unit Status -- summary info for all LU's
setcache/c - modify cache configuration and state information
setdate/da - set the Storage Processor date and time
setdisk/di - Set disk configuration parameters
seterr/e - set/display periodic error reporting
setunit/u - sets unit parameters not associated with cache
spstat/sp - Show summary of various statistics/revisions
trespass/tr - trespass
zero_disk/zd - Initiate/abort factory-zeroing of disks

admintool
An SP-resident utility for handling LUs. It is used primarily for clearing dirty cache. Use it only at the direction of Technical Support.

C:\>admintool
== Main Menu ==
0: Exit
1: Test
2: Recovery
Selection[0]: 1

== Test Menu ==
0: Exit
1: Dump DeviceMap
2: Build DeviceMap
3: Test PSM
4: Dump TransactLog
5: Compare luns of StorageCentric and Flare (N/A)
6: List Raid Groups
7: CMI enumerate arrays
8: List physical arrays
Selection[0]:

== Main Menu ==
0: Exit
1: Test
2: Recovery
Selection[0]: 2

== Recovery Menu ==
0: Exit
1: Clear TransactLog
2: Fix up transaction
3: Test Layered Driver
4: Scrub lu
5: SP Control
6: Fix DeviceMap (N/A)
7: Clear CacheDirty LU
8: Make Flare LUN Public
9: Execute Work List
Selection[0]: 5

Less commonly used tools

ktr (used for obtaining performance information)
How to enable and disable host-traffic tracing in SPs

1. Log in to the SP either directly or using Symm-Remote Client.
2. Bring up a DOS command window.
3. Create a separate directory for your trace files. From the root of drive C:, enter
   mkdir tracefiles
   to create C:\tracefiles.
4. To enable tracing, start rba by typing rba.
5. At the rba prompt, enter the following to create a trace file named "mytrace.ktr":
   rba> -o \??\C:\tracefiles\mytrace.ktr -r traffic
   This opens a trace file at the named path for tracing host traffic. Note carefully the \??\ at the start of the path: the internal software needs it in order to find the root directory. Also note the -r traffic at the end of the command. Every command you give rba to control host-traffic tracing should include -r traffic.
6. Tracing is now enabled for this SP. From this point onward, any host-originated I/O done through this SP results in trace records being written to the internal buffers for this file. Each internal buffer is 1 megabyte long, enough for 32,768 trace records. When a buffer is full, it is physically written to the file.
7. You can now quit rba by issuing a "q" command:
   rba> q
   You can re-enter rba by typing rba again. Exiting and re-entering rba has no effect on tracing. Once enabled, tracing continues until you explicitly disable it as described below.
8. Ending the trace and closing the trace file usually requires two commands:
   rba> -f -r traffic
   rba> -c -r traffic
   The first command flushes the current (that is, final) one-megabyte buffer; the second actually closes the file. If you do not need the final records, you can skip the first command.
9. You now have a completed file named mytrace.ktr, but it is on the SP's disk. To copy it to a host computer, use "File Transfer" from the FILE menu of Symm-Remote Client.
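The rba commands and buffer sizes in the steps above can be captured in a few helpers. This is a sketch, assuming "1 megabyte" means 1,048,576 bytes, which gives 1,048,576 / 32,768 = 32 bytes per trace record. The helper names are hypothetical; they only produce the text to type at the rba> prompt and do not drive rba. The buffer split assumes a buffer is written to the file only when full, as step 6 describes.

```python
from typing import List, Tuple

NT_PREFIX = "\\??\\"           # the \??\ prefix rba needs to find the root directory
BUFFER_BYTES = 1024 * 1024     # assumption: "1 megabyte" means 2**20 bytes
RECORDS_PER_BUFFER = 32_768    # trace records per buffer, from step 6
BYTES_PER_RECORD = BUFFER_BYTES // RECORDS_PER_BUFFER  # implied record size: 32 bytes


def rba_open(path: str) -> str:
    """Text to type at the rba> prompt to open a host-traffic trace file (step 5)."""
    if not path.startswith(NT_PREFIX):
        path = NT_PREFIX + path
    return f"-o {path} -r traffic"


def rba_close(keep_final_records: bool = True) -> List[str]:
    """Text to type at the rba> prompt to end tracing (step 8).

    The flush (-f) is only needed if the final, partly filled buffer matters.
    """
    cmds = ["-f -r traffic"] if keep_final_records else []
    cmds.append("-c -r traffic")
    return cmds


def buffers_written(records: int) -> Tuple[int, int]:
    """Return (full buffers already written to the file,
    records still in the final buffer that only a flush will write)."""
    return divmod(records, RECORDS_PER_BUFFER)


if __name__ == "__main__":
    print(rba_open(r"C:\tracefiles\mytrace.ktr"))
    for cmd in rba_close():
        print(cmd)
    print(BYTES_PER_RECORD, buffers_written(100_000))
```

For example, 100,000 records fill three buffers (98,304 records) and leave 1,696 records in memory, which are lost if the file is closed without the -f flush.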

Once the file has been copied as in step 9 above, you can use the "ktrcutil" utility to examine its contents. This utility is available from the Performance Engineering group. It can also extract trace records and convert them to the "traditional" trace file format, producing a file that can be used with the existing Excel Trace Tools.

luntool
An SP-resident utility that operates on LUs and Admin libs. It supports the list, add, and remove commands. As with many of the internal SP utilities and tools, use it only at the direction of Technical Support.

hostconfcli
Used to perform various configuration operations.

C:\>hostconfcli
Host Configuration CLI menu

CX Series - Jul 27 2003
0 - Exit.
1 - System Options Menu.
2 - Port Menu.
3 - XLU Menu.
4 - Virtual Array Menu.
5 - Initiator Menu.
6 - Engineering Menu.
7 - Statistics Menu.
8 - HostConfCLI Display Options Menu.

Selection (0 - 8) [0] in decimal:

hfon/hfoff
Setting this to "hands free off" (hfoff) causes the SP to boot without the drivers. Because the setting survives a power cycle, you must set it back to hfon after completing your work.

flarestart.bat
Used to start the drivers after the SP has come up in hfoff mode.

getspids
C:\>getspids
K10 -- User-space Message Passing Service (UMps) [Checked (Debug) Build]
Compiled: May 15 2003 01:17:26
Array 0                                                    % Success   Sec/IO
-------------------------------------------------------------------------
* 9203608060010650:0 [0009c773] 9203608060010650:1 [0009b5c5]   100.00%   0.00017
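When getspids output is collected into a log, a small parser can pull out the SP identifiers and the % Success and Sec/IO columns. This sketch infers the row layout from the single sample above, so treat it as an assumption: the field names `array_id`, `sp_index`, and `tag` are hypothetical, and the meanings of the bracketed hex value and the leading "*" are not documented here.

```python
import re
from typing import List, NamedTuple, Tuple


class SpId(NamedTuple):
    array_id: str   # hypothetical name for the 16-digit value before the colon
    sp_index: int   # digit after the colon (0 and 1 in the sample)
    tag: str        # bracketed hex value; its meaning is not documented here


_SP_RE = re.compile(r"([0-9A-Fa-f]{16}):(\d+)\s+\[([0-9A-Fa-f]+)\]")
_STATS_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+(\d+(?:\.\d+)?)")


def parse_getspids_row(row: str) -> Tuple[List[SpId], float, float]:
    """Split one getspids data row into SP entries, % Success, and Sec/IO."""
    sps = [SpId(a, int(i), t) for a, i, t in _SP_RE.findall(row)]
    stats = _STATS_RE.search(row)
    if not sps or stats is None:
        raise ValueError(f"row does not look like getspids output: {row!r}")
    return sps, float(stats.group(1)), float(stats.group(2))


if __name__ == "__main__":
    sample = "* 9203608060010650:0 [0009c773] 9203608060010650:1 [0009b5c5] 100.00% 0.00017"
    sps, success_pct, sec_per_io = parse_getspids_row(sample)
    for sp in sps:
        print(sp.array_id, sp.sp_index, sp.tag)
    print(success_pct, sec_per_io)
```

A % Success below 100 in this column would be the obvious thing to flag when scanning many rows, although this document does not state what threshold matters.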

END OF SECTION FIVE