CERC SATA Best Practices Reference Guide

Authored By: Worldwide Services Team

March 2007 rev A00

____________________

Information in this document is subject to change without notice.

© Copyright 2007 Dell Inc. All rights reserved.

Reproduction in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Trademarks used in this text: Dell, the DELL logo, PowerEdge, PowerVault, Precision, and OpenManage are trademarks of Dell Inc.; Microsoft, Windows, Windows NT, and Windows Server are either trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries; Red Hat, Red Hat Enterprise Linux, and Red Hat Linux are registered trademarks of Red Hat, Inc. in the United States and other countries; Novell and NetWare are registered trademarks of Novell, Inc., in the United States and other countries.

Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.

TABLE OF CONTENTS

OBJECTIVE AND SCOPE

SECTION 1: INTRODUCTION
    CERC SATA 1.5/6ch
    CERC SATA 1.5/2s
    Unsupported SATA Solutions

SECTION 2: OVERVIEW OF STEPS ENSURING RAID BEST PRACTICES
    Maintenance of Arrays
    Recovery of Arrays
    Upgrading and Reconfiguring Arrays

SECTION 3: MAINTENANCE OF ARRAYS
    Utilities and Applications Used for Array Maintenance
    Definition of Drive and Array Status as Reported in the Controller BIOS
    RAID Support in Linux
    Consistency Check of RAID Arrays – CERC SATA 1.5/6ch
    Background Consistency Check of RAID Arrays – CERC SATA 1.5/6ch
    Consistency Check of RAID Arrays – CERC SATA 1.5/2s
    Backup and Recovery of Data

SECTION 4: RECOVERY OF ARRAYS
    Capture System Logs and Details Surrounding Array Failures to Assist in the Recovery
    Write Down the Circumstances or the Exact Steps Performed Preceding the Failure
    Understand Possible Causes of Drive Array Failure
    Common BIOS Messages
    Simple Troubleshooting Steps When a Failure Is Discovered
    Recovering from Arrays in a Degraded State
    Recovering from Arrays in a Failed State
    CERC SATA 1.5/6ch Array Restoration - <CTRL><R> Enable/Restore RAID
    CERC SATA 1.5/2s Array Restoration
    Double Fault Scenario
    Rebuilding
    Known Hard Drive Replacement Issues

SECTION 5: UPGRADING AND RECONFIGURING ARRAYS
    Array Reconstruction
    Capacity Expansion
    RAID Level Migration
    Past Known Issues

SECTION 6: PERFORMANCE

APPENDIX: SATA BEST PRACTICES
    CERC SATA 1.5/6ch Controller Specifications
    Minimum System Requirements
    CERC SATA 1.5/2s Controller Specifications
    Setting Up Automated Scheduling of Consistency Checks on Windows Systems

OBJECTIVE AND SCOPE

This document contains best practices for the routine maintenance of systems that use the CERC SATA 1.5/6ch and CERC SATA 1.5/2s controllers for their RAID needs. It is not intended to address or recommend the type or size of arrays for specific applications. These maintenance best practices are recommended for all Dell™ Enterprise users to avoid failures, downtime, and data loss. They help ensure a better user experience by maintaining the integrity of data and minimizing the cost of downtime. The document covers the following practices:

• Maintenance of arrays

• Recovery of arrays

• Upgrading and reconfiguring arrays

SECTION 1: INTRODUCTION

The goal of a Redundant Array of Independent Disks (RAID) is to obtain better performance, better reliability, or both from a combination of disk drives than a non-RAID configuration provides. Serial ATA (SATA) disks are based on a low-cost technology that replaces Parallel ATA (PATA) disk drives in value servers. SATA incorporates significant technical enhancements over traditional ATA that make it well suited to RAID implementations. Along with several configuration benefits, SATA improves data transmission through a point-to-point topology, which eliminates bus sharing and allows up to the full 1.5 Gb/s of bandwidth to each drive. The SATA standard also specifies a power connector that differs from the 4-pin connector used by PATA drives; its larger number of pins is used to supply three different voltages if required: 3.3 V, 5 V, and 12 V. Some SATA solutions (but not PATA) also support hot-swapping. Dell offers two cost-effective SATA RAID solutions: the CERC SATA 1.5/6ch and the CERC SATA 1.5/2s.

CERC SATA 1.5/6ch

The CERC SATA 1.5/6ch is a six-port Serial ATA I/O processor-based RAID controller that supports advanced RAID technology features. The controller’s RAID features include:

• Optimized Disk Utilization – Enables use of the full capacity of all the drives, even if the drive sizes vary.

• Online Capacity/Volume Expansion – Enables capacity expansion of the RAID array during system operation.

• Online RAID Level Migration – Enables migration between RAID levels without rebuilding the array from scratch.

• Multiple Arrays – Enables the user to create multiple arrays from a single set of drives.

• SATA Disk Hot Plug – The PowerVault 745N storage solution with a CERC SATA 1.5/6ch supports hot plug hard drives. Hot plug hard drives can be added and removed without shutting down the system.

The CERC SATA 1.5/6ch supports RAID levels 0, 1, 5, and 10, as well as simple volume configurations. It also supports automatic failover, which allows the controller to automatically rebuild an array when a failed drive is replaced with a new drive. This feature applies only to fault-tolerant arrays.
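As a rough illustration of how Optimized Disk Utilization and Multiple Arrays work together, the following Python sketch computes the usable capacity of each supported RAID level and the space left over on each member drive. It assumes that an array takes an equal-size segment from every member (bounded by the smallest member), that RAID 1 uses exactly two drives, and that metadata overhead can be ignored. `usable_capacity_gb` and `leftover_gb` are hypothetical helpers, not part of any Dell or Adaptec tool.

```python
def usable_capacity_gb(level: str, drive_sizes_gb: list[int]) -> int:
    """Usable capacity of one array built from equal-size segments (hypothetical helper)."""
    n = len(drive_sizes_gb)
    if n == 0:
        raise ValueError("at least one drive is required")
    segment = min(drive_sizes_gb)  # the smallest member bounds the segment size
    if level == "0":
        if n < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return n * segment  # striping, no redundancy
    if level == "1":
        if n != 2:
            raise ValueError("RAID 1 is assumed to use exactly two drives")
        return segment  # one copy is usable, the other is the mirror
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (n - 1) * segment  # one segment's worth of capacity holds parity
    if level == "10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return (n // 2) * segment  # striped mirrors
    raise ValueError(f"unsupported RAID level: {level}")


def leftover_gb(drive_sizes_gb: list[int]) -> list[int]:
    """Space left on each member after its segment is allocated."""
    segment = min(drive_sizes_gb)
    return [size - segment for size in drive_sizes_gb]


if __name__ == "__main__":
    drives = [80, 80, 120, 160]
    print(usable_capacity_gb("5", drives))  # 240
    print(leftover_gb(drives))              # [0, 0, 40, 80]
```

In this example, the 40 GB and 80 GB left on the two larger drives could host a second array (for example, a RAID 1 using 40 GB from each), which is the kind of space the Multiple Arrays feature is meant to recover.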

The CERC SATA 1.5/6ch card is offered with PowerEdge systems 700, 750, 800, 1800, 830, 850, and SC1420; PowerVault system 745N; and Precision Workstation systems 470 and 670. Figure 1 is a product image of the CERC SATA 1.5/6ch card:

Figure 1: CERC SATA 1.5/6ch Product Image

For more controller specifications and supported operating systems, please refer to Appendix A in this document.

CERC SATA 1.5/2s

The CERC SATA 1.5/2s supports two SATA disk drives and is an integrated, software-based RAID implementation. It can be a cost-effective alternative when the more advanced capabilities of a hardware implementation are not needed. The CERC SATA 1.5/2s supports RAID levels 0 and 1 and up to two individually configured drives. No other RAID levels (for example, 5, 10, or 50) are supported by the CERC SATA 1.5/2s. Supported systems include the PowerEdge SC420, SC1420, SC1425, 800, and 850, and Precision Workstation systems 470 and 670. The CERC SATA 1.5/2s does not support hot plugging on any of these systems. For more controller specifications and supported operating systems, please refer to Appendix B in this document.

Unsupported SATA Solutions

The CERC SATA 1.5/2s cannot coexist with the CERC SATA 1.5/6ch controller; if both are enabled, boot issues might occur. The CERC SATA 1.5/2s must be disabled in the BIOS (System Setup) when the CERC SATA 1.5/6ch is used, or the CERC SATA 1.5/6ch must be removed when the CERC SATA 1.5/2s is used. With the CERC SATA 1.5/2s disabled in the BIOS, the attached drives must be managed by the CERC SATA 1.5/6ch. Multiple CERC SATA 1.5/6ch cards in a single system are also unsupported. Migrating or upgrading from the CERC SATA 1.5/2s to the CERC SATA 1.5/6ch is not supported. In addition, migrating the CERC SATA 1.5/2s from non-RAID mode (“RAID off”) to RAID mode (“RAID on”) is not supported.
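The coexistence rules above can be expressed as a simple pre-deployment check. This is a minimal Python sketch under the assumption that an inventory script can report how many CERC SATA 1.5/6ch cards are installed and whether the CERC SATA 1.5/2s is enabled in System Setup; `SataRaidSetup` and `unsupported_reasons` are hypothetical names, not a Dell interface.

```python
from dataclasses import dataclass


@dataclass
class SataRaidSetup:
    """Hypothetical description of a system's SATA RAID hardware."""
    cerc_6ch_cards: int      # CERC SATA 1.5/6ch add-in cards installed
    cerc_2s_enabled: bool    # CERC SATA 1.5/2s enabled in System Setup


def unsupported_reasons(setup: SataRaidSetup) -> list[str]:
    """Return the coexistence rules the setup violates (an empty list means supported)."""
    reasons = []
    if setup.cerc_6ch_cards > 1:
        reasons.append("multiple CERC SATA 1.5/6ch cards in one system are unsupported")
    if setup.cerc_6ch_cards >= 1 and setup.cerc_2s_enabled:
        reasons.append(
            "CERC SATA 1.5/2s and CERC SATA 1.5/6ch cannot coexist: "
            "disable the 1.5/2s in System Setup or remove the 1.5/6ch"
        )
    return reasons


if __name__ == "__main__":
    print(unsupported_reasons(SataRaidSetup(cerc_6ch_cards=1, cerc_2s_enabled=True)))
```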

SECTION 2: OVERVIEW OF STEPS ENSURING RAID BEST PRACTICES

The following is an overview of the steps that can be taken to ensure RAID best practices.

Maintenance of Arrays

• Run regular consistency checks on the system.

• Perform all recommended driver, firmware, and Storage Management Application updates.

• Monitor System Event Logs and Array Manager Event Logs.

• Establish Best Practices for Backup and Recovery of data.

• Ensure that properly qualified SATA cables are used and that they are not excessively bent.

Recovery of Arrays

• Capture system logs and details surrounding array failures to assist in the recovery.

• Write down the exact steps or circumstances that led the system into the failed state.

Upgrading and Reconfiguring Arrays

• Before any array expansion operation, back up all critical data in case the array reconstruction fails.

• Follow the proper procedure when increasing array size, depending on whether the array is expanded by increasing hard-drive size or by increasing the number of drives.

SECTION 3: MAINTENANCE OF ARRAYS

Utilities and Applications Used for Array Maintenance

This section describes the main utilities and applications used to maintain arrays.

BIOS RAID Configuration Utility

The Adaptec BIOS RAID Configuration Utility is an embedded BIOS utility that includes the following:

• CERC Array Configuration Utility – Used to create, configure, and manage arrays. Also used to initialize logical drives and rescan hard drives.

• SATASelect – Used to change device and controller settings.

• Disk Utilities – Used to format or verify media.

Note: The CERC SATA 1.5/2s BIOS RAID Configuration Utility consists only of the Array Configuration Utility and the Disk Utilities. It does not contain a SATASelect utility.

To run the utility, press <Ctrl><A> when prompted by the following message during system startup: "Press <Ctrl><A> for BIOS RAID Configuration Utility".

CERC Array Configuration Utility

The CERC Array Configuration Utility enables the management, creation, and deletion of arrays. It also supports hard-drive initialization and rescan, and hot-spare assignment. The Array Configuration Utility can be used to create a bootable array for the system. It is recommended that the system be configured to boot from an array instead of a single disk, in order to take advantage of the redundancy and performance features of arrays.

Some key points to take note of:

• During array creation, an option is available to enable read and write caching for the array. When caching is enabled (the default setting), maximum performance is seen. However, there is a potential for data loss or corruption during a power failure. Caching should be enabled to optimize performance unless the user data is highly sensitive or the user’s application performs completely random reads.

• During array creation, three options are provided: Build, Clear, and Quick Init. The Build operation is a background initialization of a redundant array; the array is accessible throughout. The Clear operation is a foreground initialization of a fault-tolerant array that zeros out all blocks of the array; the array is not accessible until the Clear task is complete. With the Quick Init operation, an array is available immediately with no ongoing background controller activity. For a RAID 5 array, write performance is impacted until a Verify with Fix is run on the array.

• Before deleting an array, back up the data on it. Deleted arrays cannot be restored, and all the data on the array will be lost.

• The CERC SATA 1.5/6ch has a Disk Initialization option, which overwrites the partition table on the disk and makes any data on the disk inaccessible. If the drive is used in an array, the array may become unusable. A drive that is part of the boot array should not be initialized. (The boot array is the lowest-numbered array, normally 00.)

• The CERC SATA 1.5/2s has a Configure Drives option. If an installed disk does not appear in the disk selection list for creating a new array, or if it appears grayed out, it must be configured before it can be used as part of an array. If a drive is configured but not made part of a RAID 0 or RAID 1 array, it functions as a simple volume. Configuring a single drive overwrites the partition table on the disk and makes any data on the disk inaccessible. If the drive is used in an array, the array may become unusable.
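The rule that boot-array members must never be initialized can be checked before running Disk Initialization. The Python sketch below is a minimal illustration that assumes an inventory mapping array numbers to member drive IDs is available; `initialize_risk` and the port-style drive IDs are hypothetical.

```python
def initialize_risk(drive: str, arrays: dict[int, set[str]]) -> str:
    """Classify the risk of running Disk Initialization on `drive`.

    `arrays` maps a (hypothetical) array number to its member drive IDs.
    Returns "refuse", "warn", or "ok".
    """
    if arrays:
        boot_array = min(arrays)  # the boot array is the lowest-numbered array, normally 00
        if drive in arrays[boot_array]:
            return "refuse"  # never initialize a member of the boot array
        if any(drive in members for members in arrays.values()):
            return "warn"    # the array using this drive may become unusable
    return "ok"              # the drive is not used by any array


if __name__ == "__main__":
    arrays = {0: {"port0", "port1"}, 1: {"port2", "port3", "port4"}}
    for drive in ("port0", "port3", "port5"):
        print(drive, initialize_risk(drive, arrays))
```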

SATA Select Utility

The SATA Select utility allows the device and controller settings to be changed without opening the system or handling the card. With this utility, the Channel Interface Definitions and Device Configuration Options can be modified.

Disk Utilities

With the disk utilities, a low-level format or a verify operation can be performed on the hard disks through the Format Disk or Verify Disk Media options. Format Disk is a low-level format of the hard drive that writes zeros to the entire disk. SATA drives are formatted at the factory and do not need to be reformatted. Formatting destroys all the data on the drive, so a fully tested backup of all data that is to be recovered should be available before Format Disk is performed. Verify Disk Media scans the media of a disk drive for defects, and any recoverable defects are remapped.

BIOS Event Logs

The BIOS-based event log stores all firmware events (configuration changes, array creation, boot activity, etc.). The event log has a fixed size; once it is full, the oldest events are discarded as new events are stored. The log is also volatile, so it is cleared on every system reboot. BIOS event logs are available only for the CERC SATA 1.5/6ch, not for the CERC SATA 1.5/2s.
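The event log behaves like a fixed-size ring buffer that is wiped on reboot. The Python sketch below models that behavior with `collections.deque(maxlen=...)`; the class name and the capacity value are assumptions for illustration, since the actual log size is not stated here. The practical consequence is that log contents should be captured before the system is rebooted after a failure.

```python
from collections import deque


class BiosEventLog:
    """Hypothetical model of the CERC SATA 1.5/6ch BIOS event log."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest entry when a full log receives a new one.
        self._events = deque(maxlen=capacity)

    def record(self, event: str) -> None:
        self._events.append(event)

    def reboot(self) -> None:
        # The log is volatile, so a reboot empties it.
        self._events.clear()

    def entries(self) -> list[str]:
        return list(self._events)


if __name__ == "__main__":
    log = BiosEventLog(capacity=3)  # illustrative size only
    for event in ["array created", "boot", "drive missing", "rebuild started"]:
        log.record(event)
    print(log.entries())  # the oldest event, "array created", has been discarded
```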

Definition of Drive and Array Status as Reported in the Controller BIOS

Drive Status

Optimal - An array member disk with this status is in the optimal state and there are no errors detected. For drives that do not belong to an array, this status indicates that they are ready for use.

Rebuilding - A drive with this status is currently in the rebuilding process.

Unable to Access Drive - A drive obtains this status when the controller card is unable to detect any physical connection between the controller and the hard drive. This could occur due to hardware errors on the drive, loose connections between the controller port and the hard drive, or accidental unseating of the drive.

Missing member - A drive obtains this status from the previous Unable to Access Drive status after a rescan or a system reboot is done, during which the bus is rescanned and the configuration is updated to reflect the missing drive.

Grayed out - A drive with this status is one that used to be part of a logical array, and is recognized as a previous member of that array, but is not currently incorporated as a member of the degraded or failed array.

Array Status

Optimal - An array with this status is optimal and ready for use.

Degraded - An array with this status is no longer fault tolerant.

Building/Verifying - An array with this status is currently building the mirror for a RAID 1 array or calculating the parity for a RAID 5 array.

Rebuilding - An array with this status is currently rebuilding.

Failed - An array assumes the failed status when two or more hard drives fail and the data is lost.

Impacted - An array obtains this status when its performance becomes impacted. This could happen when:

• The two mirrors of a RAID 1 are not identical.

• There are parity inconsistencies in a RAID 5 array.

• A building (scrubbing) process is aborted before the array becomes optimal.
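The steady-state array statuses above follow from how many member drives a RAID level can lose. This Python sketch is a simplified, hypothetical model: it ignores the transient Building/Verifying and Rebuilding states, omits RAID 10 (whose tolerance depends on which mirror pairs lose drives), and treats a non-redundant RAID 0 as failed after a single drive loss, since it has no redundancy to fall back on.

```python
# Number of member-drive failures each level survives (RAID 10 omitted on purpose).
FAULT_TOLERANCE = {"0": 0, "1": 1, "5": 1}


def array_status(level: str, failed_members: int, consistent: bool = True) -> str:
    """Return the steady-state BIOS array status (hypothetical model)."""
    tolerance = FAULT_TOLERANCE[level]
    if failed_members > tolerance:
        return "Failed"      # more failures than redundancy can cover: data is lost
    if failed_members > 0:
        return "Degraded"    # still online, but no longer fault tolerant
    if tolerance > 0 and not consistent:
        return "Impacted"    # mirrors differ or parity is inconsistent
    return "Optimal"


if __name__ == "__main__":
    for level, failed in (("5", 0), ("5", 1), ("5", 2), ("0", 1)):
        print(f"RAID {level}, {failed} failed:", array_status(level, failed))
```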

DOS Flash Utility

The DOS Flash Utility (applicable to the CERC SATA 1.5/6ch only) is used to update the flash EEPROM components on one or more RAID controllers. The utility can also be used to verify a controller's current flash contents against the flash images in a specified file or to save a controller's current flash contents to a file.

The CERC SATA 1.5/6ch controller uses nonvolatile flash to store on-board software, such as BIOS, microprocessor kernel, and monitor. Whenever it becomes necessary to update any of those components, you can update your controller's flash components using this utility. The utility updates the controller's flash by reading flash image data from a supplied User Flash Image (UFI) file and writing it to the controller's flash components. A UFI file contains all of a controller's flash images, as well as information about each image. It also includes general controller information, such as controller type, to ensure that the utility uses the correct UFI file when updating the controller's flash.

The utility performs the following primary functions:

• Update - Updates all the flash components on a controller with the flash image data from a UFI file.

• Save - Reads the contents of a controller's flash components and saves the data to a UFI file. This enables you to later restore a controller's flash to its previous contents should the need arise.

• Verify - Reads the contents of a controller's flash components and compares it to the contents of the specified flash image file.

• Version - Displays version information about a controller's flash components.

• List - Lists all the supported controllers detected in your system.
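A cautious update sequence can combine the utility's functions: Save the current flash so it can be restored, Update, then Verify and check the Version. The Python sketch below only plans that sequence; it does not drive the DOS Flash Utility. `UfiFile` and `flash_plan` are hypothetical, the UFI header is reduced to two fields for illustration, and the controller-type comparison mirrors the check the utility itself performs.

```python
from dataclasses import dataclass


@dataclass
class UfiFile:
    """Hypothetical summary of a UFI file's controller information."""
    controller_type: str
    version: str


def flash_plan(detected_controller: str, ufi: UfiFile, backup_saved: bool) -> list[str]:
    """Order the utility's functions for a cautious update (hypothetical helper)."""
    if ufi.controller_type != detected_controller:
        # A UFI file for a different controller type must never be flashed.
        raise ValueError(f"UFI file is for {ufi.controller_type}, not {detected_controller}")
    steps = [] if backup_saved else ["Save"]  # keep a copy of the current flash first
    steps += ["Update", "Verify", "Version"]  # flash, compare with the file, confirm versions
    return steps


if __name__ == "__main__":
    ufi = UfiFile(controller_type="CERC SATA 1.5/6ch", version="4.1.0.7417")
    print(flash_plan("CERC SATA 1.5/6ch", ufi, backup_saved=False))
```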

RAID Storage Manager

The RAID Storage Manager (RSM) is a storage management solution used in SC-class servers and Dell Precision™ workstations. It is an Adaptec utility used to manage only Adaptec-based controllers and can be run under Windows or Linux. The following options can be configured in RSM:

• Spanned volumes and RAID volumes

• Read and write caching

• Array capacity

• Stripe size

• Array initialization settings

• Array rebuild rate

The array options include Creating, Migrating, Deleting, Rebuilding and Verifying an Array, and Preparing an array for Windows.

Some key points to take note of:

• When performing a RAID level migration, interrupting this process may result in data loss. Partitioning or formatting the new array will result in complete data loss.

• Deleting an array destroys all data on the array. Deleting an array in which the operating system resides will destroy the operating system and the system will no longer boot. RSM will not allow the deletion of an array in which the operating system resides. The partition must first be deleted, or the array will need to be deleted from the controller BIOS.

• If multiple drives fail in separate disk groups, replace each defunct drive. If multiple physical drives fail simultaneously within the same disk group, contact your service representative.

OpenManage Storage Management

Dell™ OpenManage™ Storage Management (OMSM) provides storage management information in an integrated graphical view. Storage Management provides RAID storage management that is integrated with Server Administrator, with drop-down menus and wizards for executing storage management and configuration tasks such as the following:

• Create Virtual Disk

• Reconfigure Virtual Disk

• Maintain Integrity of Redundant Virtual Disks

• Assign Hot Spares

• Rebuild a Failed Array Disk

• Restore Dead Segments

With OMSM, the Rebuild rate, Background Initialization rate, and Check Consistency rate can all be set. Foreign configurations can also be imported.

OpenManage Array Manager

Array Manager (AM) is a storage management application that allows the configuration and management of local and remote storage attached to a server while the server is online and continuing to process requests. AM retrieves information about storage devices attached to a server, including controllers and array disks, and information on the storage system’s logical components, such as virtual disks and volumes. AM consists of the Console (client), the Managed System (server), and the Array Manager Utilities, and can be used to create, configure, reconfigure, format, and delete virtual disks, check consistency, and assign hot spares. AM is supported on Microsoft® Windows® and Novell® NetWare®; it is not supported on Red Hat® Linux® operating systems. AM versions 3.5 and later support the CERC SATA 1.5/6ch controller, and AM versions 3.6 and later support the CERC SATA 1.5/2s controller. Versions of Dell OpenManage Server Administrator (OMSA) earlier than 4.4 used AM as the RAID management utility; OMSA versions 4.4 and later use OMSM.

Note: The functionality of the storage management applications is limited on the CERC SATA 1.5/2s. No reconfiguration of the subsystem can be done. However, the applications are still useful for obtaining array status, starting consistency checks, and forcing rebuilds if they do not start automatically. Table 1 compares the features of RAID Storage Manager, Array Manager, and OpenManage Storage Management (OMSM):

Feature | RAID Storage Manager | Array Manager | OMSM
Remote RAID Management | No | Yes | Yes
Alarm Functionality | Yes | Yes | Yes
Adaptec Support | Yes | Yes | Yes
AMI/LSI Support | No | Yes | Yes
Force Online Option | No | Yes | Yes
Automatic Rebuild | Yes | Yes | Yes

Table 1: Feature Comparison Chart

RAID Support in Linux

For systems that use the CERC SATA 1.5/6ch with a Linux operating system, the Command Line Interface (CLI) can be used to manage controller components in addition to RSM or OMSM. CLI commands can be used in Linux shell scripts to automate testing or to create arrays in a production environment. For more information on RAID support in Linux, please refer to the CERC SATA 1.5/6ch or CERC SATA 1.5/2s user guides found on support.dell.com.

Understanding Drive and Array Status

Tables 2 and 3 show the possible array and drive statuses for the CERC SATA 1.5/6ch as reported by the controller BIOS and the three main storage management applications.

BIOS RAID Utility | AM | OMSM | RSM
Optimal | Ready | Ready | Optimal
Degraded | Failed Redundancy | Degraded | Degraded
Building/Verifying | Resynching, Not Redundant | Resynching | Verifying
Rebuilding | Rebuilding | Regenerating | Rebuilding
Failed | Failed | Failed | Failed
Impacted | Failed Redundancy | Failed Redundancy | Impacted

Table 2: CERC SATA 1.5/6ch Array Status
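Monitoring scripts that collect status from more than one application need a common vocabulary. The Python sketch below transcribes Table 2 into a lookup and maps a tool-specific status back to the BIOS terms; the names `ARRAY_STATUS_6CH` and `bios_equivalents` are hypothetical. Note that the reverse mapping is not always unique: Array Manager reports both Degraded and Impacted arrays as "Failed Redundancy", so a script would need the controller BIOS or another tool to tell them apart.

```python
# BIOS status -> (Array Manager, OMSM, RSM), transcribed from Table 2.
ARRAY_STATUS_6CH = {
    "Optimal": ("Ready", "Ready", "Optimal"),
    "Degraded": ("Failed Redundancy", "Degraded", "Degraded"),
    "Building/Verifying": ("Resynching, Not Redundant", "Resynching", "Verifying"),
    "Rebuilding": ("Rebuilding", "Regenerating", "Rebuilding"),
    "Failed": ("Failed", "Failed", "Failed"),
    "Impacted": ("Failed Redundancy", "Failed Redundancy", "Impacted"),
}
TOOLS = ("AM", "OMSM", "RSM")


def bios_equivalents(tool: str, reported: str) -> list[str]:
    """List the BIOS statuses that `tool` reports as `reported` (may be several)."""
    column = TOOLS.index(tool)  # raises ValueError for an unknown tool name
    return [bios for bios, names in ARRAY_STATUS_6CH.items() if names[column] == reported]


if __name__ == "__main__":
    print(bios_equivalents("AM", "Failed Redundancy"))  # ambiguous: two BIOS statuses
```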

BIOS RAID Utility | AM | OMSM | RSM
Unable to Access Drive | Offline | Offline | (Drive Disappears)
Missing Member | (Drive Disappears) | (Drive Disappears) | (Drive Disappears)
(Grayed Out) – Degraded | Degraded | Offline | Optimal
Drive is displayed as part of an array (Whitened) | Ready | Online | Optimal
(Grayed Out) – Rebuilding | Ready | Online | Rebuilding

Table 3: CERC SATA 1.5/6ch Drive Status

Similarly, Tables 4 and 5 show the possible array and drive statuses for the CERC SATA 1.5/2s as reported by the controller BIOS and the three main storage management applications.

BIOS RAID Utility | AM | OMSM | RSM
Optimal | Ready | Ready | Optimal
Degraded | Failed Redundancy | Failed Redundancy | Degraded
Building (Initial Build) | Resynching, Not Redundant | Resynching | Verifying
Building (Rebuild) | Rebuilding | Regenerating | Rebuilding
Failed | Failed (BSOD if OS array fails) | Failed (BSOD if OS array fails) | Failed (BSOD if OS array fails)

Table 4: CERC SATA 1.5/2s Array Status

BIOS RAID Utility | AM | OMSM | RSM
Unable to Access Drive | Offline | Offline | (Drive Disappears)
Missing Member | (Drive Disappears) | (Drive Disappears) | (Drive Disappears)
(Grayed Out) – Degraded | Degraded | Offline | Optimal
Drive is displayed as part of an array (Whitened) | Ready | Online | Optimal
(Grayed Out) – Rebuilding | Ready | Online | Rebuilding

Table 5: CERC SATA 1.5/2s Drive Status

Consistency Check of RAID Arrays – CERC SATA 1.5/6ch

RAID arrays are used mainly to protect critical data through redundancy, in the form of either parity calculations or simple mirroring. Hard drive media quality has improved over time, even as drive sizes continue to increase. Hard drives, however, are not expected to be completely flawless, and normal wear on a drive may lead to an increase in media or “grown” defects over time. These bad blocks must be remapped to another location on the drive. If a bad block is detected during a normal write operation, the controller marks that block as bad and the block is added to the “grown defects list” in the drive’s NVRAM. The write operation is considered incomplete until the data is successfully written to the remapped location. If a bad block is detected during a normal read operation, the controller reconstructs the missing data and remaps it to a new location.

A double fault scenario is one in which the controller detects a bad block on a drive in a RAID array and then detects a second bad block on another drive in the same data stripe. This scenario can also occur while a degraded logical drive is being rebuilt, when the controller encounters a bad block on one of the remaining good drives in the array. The result is a rebuild failure and potential data loss.

For the CERC SATA 1.5/6ch, two types of consistency checks are offered: the consistency check and the background consistency check. The consistency check (CC) is used to restore the consistency of redundant arrays after unexpected events, such as a power loss. For RAID 5 arrays, it recalculates and restores parity if needed. For RAID 1 arrays, it restores the mirror. If media errors are encountered, data recovery is initiated as in a background consistency check (BCC). A CC can be initiated from any of the storage management applications: RSM, AM, or OMSM.
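To make the parity mechanics concrete, the Python sketch below models one RAID 5 stripe using XOR parity, which is the standard RAID 5 parity scheme. It shows a consistency check with an optional fix, regeneration of a single unreadable block from its peers, and why a double fault (two unreadable blocks in the same stripe) cannot be recovered. The block sizes and function names are illustrative assumptions and do not describe the controller's firmware. For RAID 1, the equivalent check is simply a comparison of the two mirror copies.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(data: list[bytes], parity: bytes, fix: bool = False) -> tuple[bool, bytes]:
    """Consistency check of one stripe; with fix=True, return recalculated parity."""
    expected = xor_blocks(data)
    if expected == parity:
        return True, parity
    return False, (expected if fix else parity)


def reconstruct(data: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Regenerate at most one unreadable block (None) from its peers and parity."""
    missing = [i for i, block in enumerate(data) if block is None]
    if len(missing) > 1:
        raise ValueError("double fault: two unreadable blocks in the same stripe")
    present = [block for block in data if block is not None]
    if not missing:
        return present
    rebuilt = list(data)
    rebuilt[missing[0]] = xor_blocks(present + [parity])  # XOR of peers and parity
    return rebuilt


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(stripe)
    print(check_stripe(stripe, parity))                    # (True, parity)
    print(reconstruct([stripe[0], None, stripe[2]], parity) == stripe)
```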

Regular consistency checks reduce the risk of double fault scenarios. To avoid downtime and to ensure data integrity, it is recommended that consistency checks be included as part of the routine maintenance of all RAID systems. For more information on scheduling consistency checks in Windows, refer to Appendix C. To run a consistency check from each of the storage management applications, perform the following steps:

OpenManage Array Manager:
1. Open the Array Manager Console.
2. Under Arrays -> PERC Subsystem -> CERC SATA 1.5/6ch Controller, right-click the required virtual disk.
3. Click Check Consistency to start the consistency check.
4. To view the event logs, click the Events tab.

Dell OpenManage Storage Management:
1. Open Dell OpenManage Storage Management.
2. Under System -> Storage -> CERC SATA 1.5/6ch, click Virtual Disks to view the required array.
3. Under Tasks, scroll down and select Check Consistency. Click Execute to start the consistency check.

RAID Storage Manager:
1. Open RAID Storage Manager.
2. Click RAID Controller CERC SATA 1.5/6ch.
3. Under Actions, click Enable background consistency check.

Background Consistency Check of RAID Arrays – CERC SATA 1.5/6ch

Background consistency check (BCC) is a method used by the CERC SATA 1.5/6ch controller to detect hard drive media errors and recover data. It can be enabled in the controller BIOS to run in the background alongside other activity, so that media errors are detected and fixed proactively and efficiently. When a hard drive media error is detected, the controller recovers the lost data by regenerating the correct data from the peer disks and relocating it. Background consistency check runs only on redundant arrays. The feature was introduced in CERC SATA 1.5/6ch firmware version 4.1.0.7417. It is disabled by default and must be enabled manually in the controller BIOS. The typical performance impact when this feature is enabled is about 1-4%; in the worst case, it can be approximately 10% for random writes. Performance numbers may also vary depending on the configuration. To enable or disable background consistency check in the CERC SATA 1.5/6ch BIOS:

1. Press <Ctrl><A> to enter the Adaptec RAID Configuration Utility.
2. From the main Options menu, choose SATASelect Utility.
3. On the next Options menu, choose Controller Configuration.
4. Scroll down to Array Background Consistency Check. Press <Enter> to select the option and choose either Disabled or Enabled.
5. Save the changes upon exit.
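The regeneration step relies on the redundancy of the array. The sketch below is a simplified model, not the controller's firmware: it assumes a RAID 5 stripe protected by a single XOR parity block and ignores parity rotation and on-disk layout, which this guide does not describe. The function names are hypothetical. It shows how one unreadable block is rebuilt from its peers, and why two unreadable blocks in the same stripe (a double fault) cannot be.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by their XOR parity (simplified RAID 5 stripe)."""
    return data_blocks + [xor_blocks(data_blocks)]


def regenerate(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild a single unreadable block (None) from the surviving peer blocks.

    Raises ValueError when more than one block is unreadable, because a single
    XOR parity block can recover at most one missing block per stripe.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)  # nothing to repair
    if len(missing) > 1:
        raise ValueError(f"double fault: blocks {missing} unreadable")
    peers = [block for block in stripe if block is not None]
    repaired = list(stripe)
    # XOR of all blocks in a stripe is zero, so the peers XOR to the lost block.
    repaired[missing[0]] = xor_blocks(peers)
    return repaired


# Usage: a three-drive stripe (two data blocks + parity) with a media error on drive 1.
stripe = make_stripe([b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"])
damaged = list(stripe)
damaged[1] = None
print(regenerate(damaged) == stripe)  # True
```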

Consistency Check of RAID Arrays – CERC SATA 1.5/2s

For the CERC SATA 1.5/2s, a consistency check can be run in the same way as for the CERC SATA 1.5/6ch. However, the CERC SATA 1.5/2s has no background consistency check option. The CERC SATA 1.5/2s BIOS provides a Verify command that is similar to a consistency check. If a data mismatch is found while a RAID array is being built, an option to verify the drives becomes available. The Verify option is available only if the array is optimal; if the array has failed, it must be rebuilt. To verify the drives, press <Ctrl><S>. A prompt asks whether the utility should automatically fix any errors, and a verification complete message appears when the verification finishes. Note also that the Verify command cannot be run on the CERC SATA 1.5/2s while another operation, such as a rebuild or initialization, is queued. If the Verify command is run while another activity is in progress, the utility returns to the Manage Arrays section without completing the verify process.

Scheduling Consistency Checks

It is recommended that a consistency check be performed on each RAID logical volume at least once a month. Regular checks increase the chance of detecting media defects (bad blocks), remapping them, and recalculating the parity on the data stripes. They also reduce the probability of encountering a double fault scenario during a rebuild, which can cause unplanned downtime. Refer to Appendix C for instructions on setting up automated scheduling of consistency checks on Windows systems.
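Keeping track of which volumes are due for a check can be scripted. The sketch below is hypothetical: it does not query the controller, it assumes the last-check dates have been collected by other means (for example, from the storage management event log), and it approximates "at least once a month" as a 30-day interval.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumption: "at least once a month" approximated as 30 days.
MAX_INTERVAL = timedelta(days=30)


def overdue_volumes(last_checked: Dict[str, date], today: date) -> List[str]:
    """Return the logical volumes whose last consistency check is older than MAX_INTERVAL."""
    return sorted(
        name for name, checked in last_checked.items()
        if today - checked > MAX_INTERVAL
    )


# Usage with hypothetical volume names and dates.
history = {
    "Virtual Disk 0": date(2007, 2, 1),
    "Virtual Disk 1": date(2007, 3, 10),
}
print(overdue_volumes(history, today=date(2007, 3, 20)))  # ['Virtual Disk 0']
```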

Upgrading Firmware, Drivers and Storage Management Utilities Concurrently to the Latest Versions

The latest RAID controller firmware and drivers for both the CERC SATA 1.5/6ch and the CERC SATA 1.5/2s are available at support.dell.com. Keeping them current helps ensure maximum performance, reliability, and functionality of the RAID controllers. Upgrading the firmware and drivers together with the latest versions of the Storage Management utilities ensures correct functionality at all levels and makes all features available. The driver should be updated before the firmware.

Monitoring System Event Logs and Storage Management Utility Event Logs

System event logs inform the user about events that may affect the physical security and availability of their data. On a Windows-based OS with Array Manager installed, the application's help file provides a complete list of event types. The system event logs should be checked regularly for warnings and error messages. The event logs of all storage management applications should also be monitored regularly for media errors, including corrected media errors. Corrected media errors are normal, but an excessive number of them within a short period may indicate a drive that should be proactively replaced during a maintenance cycle. For Novell NetWare servers, these event logs are also available through the Windows console. The events are displayed in the Array Manager event log, as shown in Figure 2.

Figure 2: Array Manager Event Log
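What counts as "an excessive number of corrected media errors within a short period" is a local policy decision. As a hypothetical sketch, assuming corrected media error events have been exported as (timestamp, drive ID) pairs, a sliding window can flag drives that cross a threshold. The 7-day window and threshold of 5 are placeholder values, not Dell recommendations.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical policy values; this guide does not define "excessive".
WINDOW = timedelta(days=7)
THRESHOLD = 5


def drives_to_watch(events: Iterable[Tuple[datetime, str]]) -> List[str]:
    """Flag drives with THRESHOLD or more corrected media errors inside any WINDOW."""
    per_drive: Dict[str, List[datetime]] = defaultdict(list)
    for stamp, drive in events:
        per_drive[drive].append(stamp)

    flagged = []
    for drive, stamps in per_drive.items():
        stamps.sort()
        start = 0
        # Slide a time window over the sorted timestamps.
        for end, stamp in enumerate(stamps):
            while stamp - stamps[start] > WINDOW:
                start += 1
            if end - start + 1 >= THRESHOLD:
                flagged.append(drive)
                break
    return sorted(flagged)


base = datetime(2007, 3, 1)
log = [(base + timedelta(hours=h), "0:2") for h in range(0, 10, 2)]  # five errors in 8 hours
log.append((base, "0:1"))  # a single isolated error
print(drives_to_watch(log))  # ['0:2']
```

Flagging a drive here only marks it as a candidate for proactive replacement during the next maintenance cycle; it is not a diagnosis.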

The BIOS-based event log can also be monitored. It stores all firmware events, such as configuration changes, array creation, and boot activity. The log has a fixed size; once it is full, the oldest events are discarded as new events are recorded. The log is also volatile and is cleared at every system restart. To access the event log:

1. Press <Ctrl><A> to access the BIOS when prompted.

2. From the BIOS RAID Configuration utility menu, press <Ctrl><P>. The Controller Service menu appears.

3. Select Controller Log Information, and then press <Enter>. The current log is displayed.
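The retention behavior described above matches a fixed-size ring buffer. This toy model uses an arbitrary capacity and event strings, because the real log size and entry format are not documented here. It shows why the log should be read before the system is restarted: the oldest entries drop out when the buffer is full, and everything is lost on reboot.

```python
from collections import deque
from typing import List


class ControllerLog:
    """Toy model of a fixed-size, volatile event log (capacity is an assumption)."""

    def __init__(self, capacity: int = 4) -> None:
        self._events = deque(maxlen=capacity)  # oldest entry drops out when full

    def record(self, event: str) -> None:
        self._events.append(event)

    def restart(self) -> None:
        """A system restart clears the volatile log."""
        self._events.clear()

    def entries(self) -> List[str]:
        return list(self._events)


log = ControllerLog(capacity=3)
for event in ["array created", "boot", "config changed", "drive inserted"]:
    log.record(event)
print(log.entries())  # ['boot', 'config changed', 'drive inserted']
log.restart()
print(log.entries())  # []
```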

Backup and Recovery of Data

It is highly recommended that a comprehensive backup and recovery strategy be implemented to protect all data. The strategy should be reviewed and tested regularly to ensure that it remains suitable and efficient. During a backup, normal system performance may be reduced because of the increased workload.

Hot Spare Assignments

A hot spare is a drive that is reserved to replace a failed drive in a redundant array. In the event of a drive failure, the hot spare replaces the failed drive and the array is rebuilt automatically. Until it becomes an array member as a result of a failure, a hot spare can be unassigned using a management utility.

Note: For the rebuild to complete successfully, a hot spare must be the same size as or larger than the smallest drive in the array.

The CERC SATA 1.5/2s controller BIOS has an Add/Delete Hotspares option. However, because the CERC SATA 1.5/2s supports only two-drive configurations, no hot spare can be assigned while the array is in an optimal state. This option can be used only for a degraded array, when there are problems starting a rebuild. The CERC SATA 1.5/6ch supports two types of hot spares:

• Global: Protects any array that the spare drive has sufficient capacity to protect
• Dedicated: Protects only the array to which it has been assigned

Global Hot Spares

When a drive in an array fails, a global hot spare with enough capacity is automatically used to store the data contained on the failed drive. The system’s behavior after a failure depends on the size of the spare relative to the drive it is replacing.

• If the global hot spare is larger than the drive it is replacing by 100MB or more, the spare replaces the failed drive but remains a global hot spare. The unused portion is available for use in the event of future failures.

• If the global hot spare is the same size as the drive it is replacing, or less than 100MB larger, it becomes a member of the array that contained the failed drive and is no longer marked as a global hot spare.

Note: For a RAID 10 array, the system can use the same global hot spare to replace two failed drives in the same array if the global hot spare is at least twice the size of the failed drives. This is not recommended because redundancy is reduced: a single physical drive would then hold the data of two array members. When assigning a global hot spare in a system with a RAID 10 array, use a spare that is the same size as the members of the array.

Dedicated Hot Spares

When a drive in an array containing a dedicated hot spare fails, the spare is automatically used to store the data contained on the failed drive if the spare has enough capacity. The spare becomes a member of the array and will no longer be identified as a hot spare. If the spare is larger than the drive it is replacing, the extra portion will remain unused.
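The sizing rules for global and dedicated hot spares can be summarized in a small decision function. This is a sketch under stated assumptions: sizes are compared in MB against the capacity of the failed drive (the separate note above compares against the smallest drive in the array), the function and enum names are hypothetical, and the RAID 10 case of one spare covering two failures is not modeled.

```python
from enum import Enum


class SpareOutcome(Enum):
    CANNOT_REPLACE = "spare is smaller than the failed drive"
    STAYS_GLOBAL = "replaces the failed drive and remains a global hot spare"
    BECOMES_MEMBER = "becomes an array member and is no longer a hot spare"


# 100MB threshold from this guide; both sizes are assumed to use the same MB unit.
LEFTOVER_THRESHOLD_MB = 100


def spare_outcome(spare_mb: int, failed_mb: int, dedicated: bool) -> SpareOutcome:
    """Model what happens when a hot spare takes over for a failed drive of failed_mb."""
    if spare_mb < failed_mb:
        return SpareOutcome.CANNOT_REPLACE
    if dedicated:
        # A dedicated spare always joins the array; any extra capacity stays unused.
        return SpareOutcome.BECOMES_MEMBER
    if spare_mb - failed_mb >= LEFTOVER_THRESHOLD_MB:
        # The unused portion stays available for future failures.
        return SpareOutcome.STAYS_GLOBAL
    return SpareOutcome.BECOMES_MEMBER


print(spare_outcome(250_000, 80_000, dedicated=False).name)  # STAYS_GLOBAL
print(spare_outcome(80_050, 80_000, dedicated=False).name)   # BECOMES_MEMBER
print(spare_outcome(250_000, 80_000, dedicated=True).name)   # BECOMES_MEMBER
print(spare_outcome(40_000, 80_000, dedicated=False).name)   # CANNOT_REPLACE
```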

Automatic Failover

The automatic failover feature allows the controller to rebuild an array automatically when a failed drive is replaced with a new drive. This feature applies only to fault-tolerant arrays. To ensure that automatic failover is enabled in the CERC SATA 1.5/6ch controller BIOS, perform the following steps:

1. At the BIOS screen, press <Ctrl><A> when prompted to enter the Adaptec RAID Configuration Utility.
2. From the Options menu, select SATASelect Utility, and then select Controller Configuration.
3. Verify that Automatic Failover is enabled. If it is disabled, press <Enter> and select Enabled.
4. Press <Esc> to exit and choose Yes to save the changes.

Cabling Practices

The following general cabling best practices should be followed:

• Ensure that properly qualified cables are being used
• Ensure that the SATA cables are properly connected to the controller or SATA ports and SATA hard drives and that there are no loose connections
• Ensure that the cables are not excessively bent
• Ensure that the cable lengths are appropriate for the installation
• Examine the cables for cuts or exposed shielding

SECTION 4: RECOVERY OF ARRAYS

To avoid loss of data integrity or to aid in the recovery of lost arrays, perform the following simple steps.

Capture System Logs and Details Surrounding Array Failures to Assist in the Recovery

In the Windows® OS environment, use of the Dell™ Server E-Support Tool (DSET) is recommended. This tool captures all the system description and configuration data needed in a debug or recovery effort. DSET is a small, non-intrusive tool that does not require a system reboot to provide basic functionality. Immediately after installation, DSET can collect information about Windows® drivers, services, network settings, and so on. It also collects basic information about the system's storage, such as active drives and RAID containers, as well as extended hardware information such as processors, memory, PCI cards, the ESM log, BIOS/firmware versions, and system health (fan/voltage levels). If Array Manager 2.5 or later is installed, DSET also gathers Dell-specific storage information such as CERC and PERC controllers and their firmware, arrays/containers, logical disk signatures, enclosures, and installed physical hard drives. DSET always collects Windows NT®/2000 information such as services, tasks, drivers, and event logs. The ESM log of supported systems can be cleared so that the amber hardware warning light on Dell PowerEdge™ systems is properly reset once an event has been remedied.

Note: The latest version of DSET can be found at ftp://dropbox.us.dell.com/dropbox2/ips/DSET/

For the Linux operating system, the “Getconfig” tool can be used to gather issue information when possible. Getconfig is a utility that pulls hardware configuration data from various sources on a Linux system. Collecting this data manually is very labor intensive; Getconfig fully automates the process.

Write Down the Circumstances or the Exact Steps Performed Preceding the Failure

As far as possible, record all the steps that occurred prior to the failure, including any changes made to the operating system, system software or hardware, or the CERC driver or firmware. Being able to backtrack and understand the activities that may have contributed to a failure helps in recovering a failed array and can also help correct the conditions that contributed to the failure.

Understand Possible Causes of Drive Array Failure

While the top priority during an array failure is to bring the system back online, it is also important to determine the root cause of the problem. Failure to determine the root cause may lead to further outages and data loss.

Cables – All cables should be approved for the particular application. Inspect the pins and reseat all external and internal SATA cables attached to the system. If damage is found on the cable pins, also inspect the mating (female) connector for resulting damage.

Power – Each system in the rack should be protected by an approved Uninterruptible Power Supply (UPS), and each server should be verified to have the proper amount of power. Low voltage or power spikes can knock an array offline.

Firmware and Driver – A mismatch between firmware and driver can result in random controller lockups, hangs, or blue screens (BSODs). The firmware and driver should be kept at the current approved levels, which can be found on support.dell.com. When updating both, update the driver before the firmware: the driver is always backward compatible with older firmware, whereas newer firmware with an older driver usually results in abnormal behavior.

Defective Hard Drive – In some cases, a drive can cause noise on the SATA bus and knock an array offline. In that case, diagnostics should fail the drive, unless the drive produces enough noise to render the bus inoperable. Review the system logs for drive-reported errors. Drives can also be reseated to ensure good connections.

Common BIOS Messages

The following are common CERC SATA 1.5/6ch and CERC SATA 1.5/2s messages that appear in the controller BIOS:

The following message indicates Port Roaming:

The following drives have moved to different Ch:Id:Lun 0:3:0 -> 0:4:0

The following message is seen when the firmware detects that a new drive has been inserted since the last boot:

New devices detected at the following SATA Ports: Port#4

The following message appears when an array is in a degraded state or rebuilding:

The following Arrays have Missing or Rebuilding or Failed members and are degraded: Array#4

The following message appears when an array is missing more than one drive and has failed:

The Following Arrays have missing required members and cannot be configured: Array#4

The following message appears when there is a SMART error:

Port#   Vendor Product Info   Rev#   SMART Error
--------------------------------------------------
0       ST340014 AS           8.05   Y

The following message appears when the firmware has a problem with one of the attached drives. Either the firmware cannot prepare the drive during boot, or a firmware kernel crash has occurred:

“Fatal Error: Controller monitor failed. Controller not started. Press any key to continue.”
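When BIOS messages are transcribed into a support log, a small lookup can tag them consistently. This hypothetical helper matches the message prefixes quoted above; the exact on-screen wording may differ between firmware versions, and the separator between the two Ch:Id:Lun addresses in the port roaming message is matched loosely because its exact character is not reliably reproduced in this guide.

```python
import re
from typing import Optional, Tuple

# Message prefixes as quoted in this guide, paired with their meaning.
BIOS_MESSAGES = [
    ("The following drives have moved to different Ch:Id:Lun", "port roaming"),
    ("New devices detected at the following SATA Ports", "new drive since last boot"),
    ("The following Arrays have Missing or Rebuilding or Failed members", "array degraded or rebuilding"),
    ("The Following Arrays have missing required members", "array failed"),
    ("Fatal Error: Controller monitor failed", "controller not started"),
]


def classify(line: str) -> Optional[str]:
    """Return the meaning of a transcribed BIOS message, or None if it is not recognized."""
    text = line.strip().strip('"“”').lower()
    for prefix, meaning in BIOS_MESSAGES:
        if text.startswith(prefix.lower()):
            return meaning
    return None


def parse_roaming(line: str) -> Optional[Tuple[str, str]]:
    """Extract the old and new Ch:Id:Lun addresses from a port roaming message."""
    # Any (or no) separator token between the two addresses is accepted.
    match = re.search(r"(\d+:\d+:\d+)\s*\S*\s*(\d+:\d+:\d+)", line)
    return (match.group(1), match.group(2)) if match else None


print(classify("New devices detected at the following SATA Ports: Port#4"))
print(parse_roaming("The following drives have moved to different Ch:Id:Lun 0:3:0 -> 0:4:0"))
```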

Simple Troubleshooting Steps When a Failure Is Discovered

The following simple troubleshooting steps can be performed before calling technical support for assistance. These steps will also assist the technical support representative.

1. Capture controller logs.
2. Record the RAID configuration.
3. If the array is degraded or failed, note the IDs of the offline or grayed-out drives.
4. Shut down the system and disconnect/reconnect all SATA power and data cables.
5. Check the cable ends for bent pins or other problems.
6. Move the CERC SATA 1.5/6ch controller, if present, to another PCI slot.
7. Boot the system and check the array and drive status after performing a rescan operation.

The CERC SATA 1.5/6ch and CERC SATA 1.5/2s user guides available on support.dell.com can be used to find more troubleshooting tips and guidelines.

Recovering from Arrays in a Degraded State

An array is considered to be in a degraded state when it loses its redundancy. This can be caused by a drive dropping offline or failing, or by a drive becoming degraded (still online but grayed out). When an array is in a degraded state, a rebuild can be attempted.

For the CERC SATA 1.5/6ch, the Automatic Failover feature allows the controller to rebuild an array automatically when a failed drive is replaced. This feature applies only to fault-tolerant arrays and is enabled by default. If it is disabled, the array must be rebuilt manually. The drive should be replaced while the system is turned off (except on the PV745N, which supports hot plugging). A hot spare is a disk that is not used for data storage but is reserved as a replacement for one of the other drives in the array in the event of a failure.

If the rebuild does not start automatically, storage management utilities such as Array Manager or OMSM can be used to perform a rescan and trigger the rebuild. If the rebuild still does not start, attempt a manual rebuild. In the controller BIOS, press <CTRL><S> to assign the newly inserted drive as a dedicated hot spare, or <CTRL><G> to assign it as a global hot spare, for the array to be rebuilt. Once the hot spare is assigned, the rebuild should start. Storage management utilities can also be used to assign a drive as a hot spare. If an error message appears while assigning the hot spare, first initialize the newly inserted hard drive to erase the old configuration data from its previous use.

Note: Initializing a hard drive destroys all the data on that drive.

Recovering from Arrays in a Failed State

Note: The following recovery methods can be used to recover data, if possible.

CERC SATA 1.5/6ch Array Restoration - <CTRL><R> Enable/Restore RAID

<CTRL><R> is a feature of the CERC SATA 1.5/6ch BIOS RAID Configuration Utility. When a redundant array fails, this option can be used to recover access to some or all of the data on the failed array.

<CTRL><R> can be used only when the array status is failed, and all the original array member disks must be present in the system. Drives grayed out under Array Members can be considered original members of the failed array; <CTRL><R> can incorporate these drives back into the array. If the array is in a degraded state, assign a hot spare instead to initiate a rebuild and restore the array to optimal status. <CTRL><R> cannot be used to recover RAID 0 arrays.

Note: <CTRL><R> cannot guarantee the consistency of the data, so the integrity of the data must be verified before it is used. Use this option only as an attempt to recover data; the data may still be lost permanently.

To run an Enable/Restore RAID (<CTRL><R>) operation, perform the following steps:

1. At the BIOS screen, press <CTRL><A> to enter the BIOS RAID Configuration Utility.
2. From the Options menu, select Array Configuration Utility, and then select Manage Arrays.
3. Choose the desired array under List of Arrays and press <CTRL><R> to Enable/Restore RAID. When the warning message appears, type Y to continue.
4. Back up as much data as possible from the recovered array.

The Enable/Restore RAID function is also available in Array Manager and OMSM and is referred to as Restore Dead Disk Segments. If the OS is up and running, this option can be used to force the drives online.

CERC SATA 1.5/2s Array Restoration

Because the CERC SATA 1.5/2s supports only two drives, if both drives are grayed out, the following steps can be used to attempt to recover the data:

1. Delete the array. When asked whether to delete the boot sectors, select NONE.
2. Re-create an array of the same size as the one that failed.
3. Check whether the system can boot to the OS.
4. If the system boots to the OS, back up all required data. Then perform a Verify Disk Media (in the BIOS) or a Dell Diagnostics hard drive long DST test on the originally problematic drives.
5. If the test passes, re-create the array, perform an Array Verify, and restore the data from the backup.
6. If the test fails, replace the hard drives that fail the hard drive diagnostics.

Note: The preceding restoration process is not a design feature of the CERC SATA 1.5/2s and it cannot be guaranteed to help restore a system in a failed state.

Double Fault Scenario

A double fault scenario occurs when an array is in a degraded state and a bad block is detected on another drive that is part of the degraded array. This scenario can occur while a rebuild of the degraded array is in progress. A double fault scenario may result in data loss if data is present on the affected stripe.

To determine if a double fault scenario has happened under Array Manager or OMSM:

If a rebuild fails, first check the array disks. If a drive other than the one that was replaced or re-inserted to fix the original issue appears “Degraded” or “Offline”, this indicates a double fault scenario. If the drive that was re-inserted or replaced appears “Degraded” or “Offline”, follow the steps described in the Recovering from Arrays in a Degraded State section.

Alternatively, check the ID of the hard drive that reported the medium error in the Events log. If the ID is that of an existing drive in the array (one that was not replaced or re-inserted), this indicates a double fault scenario. The medium error and rebuild failure messages appear as follows:

Error 544 Virtual Disk (RAID5 0) rebuild failed
Error 691 Medium Error: ID (0:00) Medium Error - Bad Block Replacement Possible.

Note: When a rebuild fails due to a double fault scenario, it is advisable to back up all critical data, re-create the array, and restore the data. To avoid this scenario in the future, schedule consistency checks on a regular basis.

To determine if a double fault scenario has happened under the controller BIOS:

If a rebuild fails, check the Array Members under Array Properties. If one of the existing array disks (one that was not replaced or re-inserted) appears grayed out or missing, this indicates a double fault scenario.
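The two Array Manager/OMSM checks above can be expressed as a small decision function. This is a hypothetical sketch: it assumes the drive states and the drive IDs from medium error events have already been read from the utility's display and Events log (it does not query the controller), and the ID strings simply follow the "0:00" style of the example message.

```python
from typing import Dict, Iterable, Set

PROBLEM_STATES = {"Degraded", "Offline"}


def is_double_fault(status: Dict[str, str], replaced: Set[str],
                    medium_error_ids: Iterable[str] = ()) -> bool:
    """Apply the two checks described above after a rebuild has failed.

    status: drive ID -> state shown by the management utility.
    replaced: IDs of drives replaced or re-inserted to fix the original issue.
    medium_error_ids: drive IDs reported in medium error events.
    """
    # Check 1: an original (not replaced) drive now shows Degraded or Offline.
    if any(state in PROBLEM_STATES and drive not in replaced
           for drive, state in status.items()):
        return True
    # Check 2: a medium error was logged against an existing, non-replaced array drive.
    return any(drive in status and drive not in replaced for drive in medium_error_ids)


members = {"0:00": "Online", "0:01": "Online", "0:02": "Online"}
print(is_double_fault(members, replaced={"0:02"}, medium_error_ids=["0:00"]))  # True
print(is_double_fault({**members, "0:02": "Offline"}, replaced={"0:02"}))     # False
```

A False result for a failed rebuild points to the degraded-state recovery steps rather than to a double fault.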

Rebuilding

If a rebuild fails due to a double fault scenario, the rebuild will not start again even if a drive is assigned as a hot spare. This is expected behavior. In a double fault scenario, the firmware cannot regenerate the stripe because of the bad block on the second drive, so rebuilding onto the replacement drive is disabled: the rebuild does not start at all, even if the drive is replaced again with another drive.

In some cases, a rebuild might not start on the new or original drive even when there is no double fault scenario. To recover from this situation, initialize the drive and/or assign the replaced or original drive as a hot spare to start the rebuild process.

Known Hard Drive Replacement Issues

1. On a CERC SATA 1.5/6ch, once a rebuild fails on the replaced drive, the rebuild will not start again.

2. On a CERC SATA 1.5/6ch, once a rebuild fails on a virtual disk, the rebuild may not restart. A rebuild can fail for the following reasons:

i. The replaced drive is bad. In this case, run hard drive diagnostics on the replaced drive to verify if the drive is truly bad before replacing it again.

ii. One of the existing drives in the degraded volume has a bad block (double fault scenario). In this case, the rebuild fails ONLY on the virtual disk affected by the bad block; it continues and completes on the rest of the virtual disks.

a. If the user reinserts the same drive or replaces it with a new one, the rebuild does not restart on the virtual disks that failed earlier (due to the double fault scenario); ONLY those virtual disks are affected.

b. Array Manager logs the following error in the AM log: Perc2Pro 544 CERC SATA1.5/6ch Controller 0 , Virtual Disk (OS 0) rebuild failed

This problem can be avoided by updating the CERC SATA 1.5/6ch firmware to version 4.1.0.7417 or later. It is also recommended that regular consistency checks be scheduled to avoid running into rebuild failure issues.

3. A RAID 1 rebuild may not start and may generate a stop error on a CERC SATA 1.5/2s system.

If a drive fails in a RAID 1 array and the rebuild option is selected within OMSS 1.0, the rebuild may not start or may generate a “stop error”. To work around this, restart the server and select the Configure Drives option in the controller BIOS. Select the new disk, then select the Add/Delete Hotspare option, and then select the new disk again. After the system reboots, the virtual disk rebuilds automatically when the operating system starts. This procedure can also be performed when the drive is first replaced, in which case no operating system boot is required. Later OMSS versions fix this issue. No hardware replacement should be required.

SECTION 5: UPGRADING AND RECONFIGURING ARRAYS

Array Reconstruction

The CERC SATA 1.5/6ch supports array Capacity Expansion (CE) and RAID Level Migration (RLM). The process of rebuilding the new array created by a CE or RLM operation is called Array Reconstruction. For instructions on Array Reconstruction, refer to the latest OpenManage Array Manager User Guide or OpenManage Storage Management User Guide (available at support.dell.com). Redundancy is maintained during reconstruction when both the initial and final RAID levels are redundant. If a disk fails during this process, the reconstruction must continue and finish before the degraded array can be rebuilt.

Note: Virtual disks or arrays larger than 2 terabytes (TB) cannot be created on any Dell CERC controller. The SATA specification itself does not impose this limit; the 2 TB limitation on CERC SATA controllers is due to BIOS, driver, and Application Programming Interface (API) restrictions. Dell currently has no plans to support virtual disks larger than 2 TB on the CERC SATA 1.5/6ch or the CERC SATA 1.5/2s. Because of this limitation, some CE or RLM operations that would exceed 2 TB may not work.
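When planning a CE or RLM operation, it can help to estimate the resulting virtual disk size against the 2 TB limit before starting. The sketch below uses the standard usable-capacity formulas for RAID 0, 1, 5, and 10 with every member sized to the smallest drive. It is a planning aid under stated assumptions (decimal units, 2 TB taken as 2,000 GB because the exact byte value is not given here, metadata overhead ignored), not a Dell tool, and the function names are hypothetical. It also flags drives of 1 TB or larger, which the Capacity Expansion notes state the CERC SATA 1.5/6ch does not support.

```python
from typing import List

# Assumptions: decimal GB, 2 TB limit taken as 2000 GB, metadata overhead ignored.
VD_LIMIT_GB = 2000
MAX_DRIVE_GB = 1000  # drives of 1 TB or larger are not supported on the CERC SATA 1.5/6ch


def usable_gb(level: str, drives_gb: List[int]) -> int:
    """Approximate usable capacity; each member is coerced to the smallest drive."""
    n, smallest = len(drives_gb), min(drives_gb)
    if level == "RAID 0":
        return n * smallest
    if level == "RAID 1":
        return smallest  # two-drive mirror
    if level == "RAID 5":
        return (n - 1) * smallest  # one drive's worth of parity
    if level == "RAID 10":
        return (n // 2) * smallest  # striped mirrored pairs
    raise ValueError(f"unsupported RAID level: {level}")


def check_plan(level: str, drives_gb: List[int]) -> List[str]:
    """List reasons a planned CE/RLM target may not work on the CERC SATA 1.5/6ch."""
    problems = []
    if any(size >= MAX_DRIVE_GB for size in drives_gb):
        problems.append("drive of 1 TB or larger")
    if usable_gb(level, drives_gb) > VD_LIMIT_GB:
        problems.append("virtual disk would exceed 2 TB")
    return problems


print(usable_gb("RAID 5", [250, 250, 250, 250]))  # 750
print(usable_gb("RAID 5", [250, 250, 400]))       # 500 (the 400 GB drive is coerced)
print(check_plan("RAID 0", [500] * 6))            # ['virtual disk would exceed 2 TB']
```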

Capacity Expansion

Capacity Expansion involves adding a physical disk member to an existing RAID array and expanding the logical drive to use the additional capacity. CE also allows the logical drive to be expanded into unused space on the existing drives, without inserting a new drive. Windows Server 2003, Windows 2000, and NetWare also support Online Capacity Expansion (OCE): once an array expansion completes, the additional capacity can be used without restarting the system. This feature is not available on the CERC SATA 1.5/2s.

There are two basic ways to expand an array. The first is to increase the hard drive size; for example, upgrading all 80GB hard drives to 250GB or 400GB hard drives. The second is to increase the total number of hard drives in the array; for example, adding a drive to a three-drive RAID 5 to make a four-drive RAID 5.

Note: Before any array expansion operation, back up all critical data in case the array reconstruction fails.

Note: Array expansion cannot be done through the BIOS. A storage management application such as Array Manager or OMSM is required.

Note: The CERC SATA 1.5/6ch does not support hard drives of 1 TB or larger.

Array expansion via increasing hard drive size: This type of array expansion includes making existing unused hard drive space available, as well as replacing the existing drives with larger-capacity drives. To replace the existing drives with larger-capacity drives, perform the following steps:

RAID 0 Array:

1. Back up all data, replace the existing drives with the new drives, re-create the array, and restore the data on the new array from backup.

RAID 1 Array:

1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Perform a Capacity Expansion.

Note: This series of steps may be very time consuming depending on the size of the drives and the system utilization by applications, due to the dual rebuild cycles required. Alternatively, the following steps can be performed to reduce the number of rebuild cycles:

1. Back up all data.
2. Delete the existing array and create a new RAID 1 array.
3. Restore the data on the new array from backup.

RAID 5 Array:

1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Repeat this process until all the drives are replaced.
7. Perform a Capacity Expansion.

Note: This series of steps may be very time consuming depending on the number of drives in the RAID 5 array, the system utilization by applications, and the size of the drives, due to the multiple rebuild cycles required. Alternatively, the following steps can be performed to reduce the number of rebuild cycles.

1. Back up all data.
2. Delete the existing array and create a new RAID 5 array.
3. Restore the data on the new array from backup.

RAID 10 Array:

1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Repeat this process until all the drives are replaced.
7. Perform a Capacity Expansion.

Note: This series of steps may be very time consuming depending on the size of the drives and system utilization by applications, due to the multiple rebuild cycles required. Alternatively, the following steps can be performed to reduce the number of rebuild cycles.

1. Back up all data.
2. Delete the existing array and create a new RAID 10 array.
3. Restore the data on the new array from backup.

Note: When a failed drive is replaced with a drive that is larger than the rest of the drives in the array, the size of the virtual disk does not increase, because of disk coercion. However, the leftover space on the new drive is available for use by other virtual disks in the system.

Array expansion via increasing the number of hard drives:

RAID 0 Array:

1. Back up all data.
2. Select the additional drives to be added.
3. Re-create the RAID 0 array and restore the data on the new array from backup.

RAID 1 Array:

No additional drives can be added to a RAID 1 array because, by definition, it is formed using only two drives.

RAID 5 Array:

1. Back up all data (Recommended).
2. Select the additional drives to be added.
3. Perform an Array Reconfiguration operation.

RAID Level Migration

Online RAID level migration is an advanced RAID technology feature of the CERC SATA 1.5/6ch. This feature allows RAID levels to be changed without rebuilding the array from scratch. This feature is not available on the CERC SATA 1.5/2s.

The CERC SATA 1.5/6ch supports modifying existing arrays by expansion, migration from one array type to another, and changing the stripe size. These migration scenarios are described in Table 6.

Current Array Type    New Array Type
RAID 0                RAID 5 or 10
RAID 1                RAID 0, 5, or 10
RAID 5                RAID 0 or 10
RAID 10               RAID 0 or 5

Table 6: Array Migration Possibilities in the CERC SATA 1.5/6ch

RLM can migrate an array from a lower-redundancy RAID level to a higher one, or from a higher-redundancy level to a lower one, in both cases without taking the array offline. Both types of RLM must migrate to an array with a capacity greater than or equal to that of the original array, which can be achieved by combining the RLM operation with a CE operation. Figures 3, 4, and 5 illustrate an RLM from a two-drive RAID 1 array to a four-drive RAID 5 array.

Figure 3: Selecting the Array Disks to be added to the Virtual Disk

Figure 4: Selecting the Attributes for the Reconfigured Virtual Disk

Figure 5: New and Old Virtual Disk Configuration Information
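Table 6 and the capacity rule for RLM can be combined into a quick pre-check. A minimal sketch, assuming usable capacities are computed separately and passed in GB; the function name and the 80 GB drive sizes are hypothetical, and a True result only means the migration is listed as supported, not that it will succeed.

```python
from typing import Dict, Set

# Table 6: RAID level migrations supported by the CERC SATA 1.5/6ch.
MIGRATIONS: Dict[str, Set[str]] = {
    "RAID 0": {"RAID 5", "RAID 10"},
    "RAID 1": {"RAID 0", "RAID 5", "RAID 10"},
    "RAID 5": {"RAID 0", "RAID 10"},
    "RAID 10": {"RAID 0", "RAID 5"},
}


def can_migrate(src: str, dst: str, src_gb: int, dst_gb: int) -> bool:
    """True if Table 6 allows src -> dst and the new array is at least as large."""
    return dst in MIGRATIONS.get(src, set()) and dst_gb >= src_gb


# Hypothetical 80 GB drives: RAID 1 on two drives (80 GB usable)
# migrated to RAID 5 on four drives (3 x 80 GB = 240 GB usable).
print(can_migrate("RAID 1", "RAID 5", 80, 240))   # True
print(can_migrate("RAID 5", "RAID 0", 240, 160))  # False: capacity would shrink
print(can_migrate("RAID 5", "RAID 1", 240, 240))  # False: not listed in Table 6
```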

Past Known Issues

1. There was an issue with the CERC SATA 1.5/6ch controller (firmware version 4.1.0.7401 and all earlier versions) in which attempting to morph an array larger than 1.09 terabytes (TB) could result in data loss. Creating a new array larger than 1 TB was not affected. Morphing encompasses RLM, online capacity expansion (OCE), shrinking, stripe size migration, and similar operations. The limitation applied to array size only, NOT to the number of arrays. The workaround was to create a new array rather than modify the existing one. This issue has been fixed in all firmware versions later than 4.1.0.7401.


2. A second issue involved the CERC SATA 1.5/6ch causing Windows Server to crash when morphing a RAID 5 array. This happened when the hard drive sequence of the morph destination did not match the hard drive sequence of the source on the same set of hard drives. This issue has been fixed in the latest firmware version, 4.1.0.7417.

Note: To avoid the issues listed above, and other issues related to array morphing, update the CERC SATA 1.5/6ch firmware to the latest version, 4.1.0.7417, available at support.dell.com.
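Before morphing an array, the firmware version can be compared against the two issues above. The sketch below is a hypothetical pre-check, not a Dell utility: it assumes the firmware version is available as a dotted string such as "4.1.0.7401" (how you obtain it is outside this sketch) and compares versions numerically, field by field. `morph_warnings` and `parse_version` are illustrative names.

```python
# Sketch: flag CERC SATA 1.5/6ch firmware affected by the past morphing issues.
# Version strings are compared numerically, field by field, as integer tuples.

LARGE_ARRAY_BUG_UP_TO = (4, 1, 0, 7401)  # issue 1: this version and earlier
LATEST = (4, 1, 0, 7417)                 # issue 2 fixed; latest per this guide


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "4.1.0.7401" into (4, 1, 0, 7401)."""
    return tuple(int(part) for part in text.strip().split("."))


def morph_warnings(firmware: str, array_tb: float) -> list[str]:
    """Warnings to review before morphing (RLM, OCE, shrinking, and so on)."""
    version = parse_version(firmware)
    warnings = []
    if version <= LARGE_ARRAY_BUG_UP_TO and array_tb > 1.09:
        warnings.append("Morphing an array larger than 1.09 TB may cause data "
                        "loss; create a new array instead or update firmware.")
    if version < LATEST:
        warnings.append("Firmware is older than 4.1.0.7417; "
                        "update it from support.dell.com.")
    return warnings


print(morph_warnings("4.1.0.7401", 1.5))  # two warnings
print(morph_warnings("4.1.0.7417", 1.5))  # []
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "4.1.0.999" sorting after "4.1.0.7401".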

SECTION 6: PERFORMANCE

The CERC SATA 1.5/2s is a software-based RAID implementation and has no internal cache memory. The software RAID engine is integrated within the driver and runs within the OS environment. Driver-based RAID depends entirely on the system processor and memory to execute RAID operations, so it may affect system performance in environments with high CPU utilization.

There is also a known issue with Windows Server 2003: if a system is running the native Microsoft driver, atapi.sys, instead of the CERC SATA 1.5/2s driver, the system may run in Programmable Input/Output (PIO) mode, which can lead to poor performance. For all CERC SATA 1.5/2s RAID implementations, it is recommended that the latest aarich.sys driver, available at support.dell.com, be used.

The CERC SATA 1.5/6ch is a hardware RAID implementation, in which dedicated hardware with embedded firmware controls the RAID operations. The performance of a hardware RAID solution depends on the processing power of the controller’s I/O processor and on its cache size, unlike software RAID, whose performance depends directly on server CPU performance and load.

Cache is fast-access memory that serves as intermediate storage for data being read from or written to the drives. Two caches can affect performance: the hard drive cache and the controller cache. The hard drive cache is enabled by default; with it enabled, write commands can see a performance gain of up to 40%. The CERC SATA 1.5/6ch has 64 MB of fixed ECC SDRAM controller cache which, when enabled, can significantly improve sequential and random write performance.

The I/O throughput of the CERC SATA 1.5/6ch is determined mostly by the performance of the attached hard drives, the processing power of the onboard I/O processor, and the onboard cache size. The host CPU speed and memory size have little effect on RAID storage subsystem throughput. However, if applications consume so much system memory that free memory becomes scarce, RAID subsystem operations could be affected. The following are the main SATA configuration options that may affect the performance of the CERC SATA 1.5/6ch.

• Write Cache (Default: ENABLED) – When Write Cache is enabled, performance is maximized. Caching should usually be enabled to optimize performance, unless the data is highly sensitive, or unless an application performs completely random reads, which is unlikely.

Note: When Write Cache is enabled, there is a potential for data loss or corruption during a power failure. A UPS solution is recommended to ensure fault tolerance.

• DMA (Default: ENABLED) – When enabled, Direct Memory Access (DMA) mode is used for the drive, providing maximum performance.


• Allow Read Ahead (Default: ENABLED) – When enabled, the drive’s read ahead cache algorithm is used, providing maximum performance under most circumstances.

• Stripe Size (Default: 64 KB) – The default stripe size gives the best overall performance in most network environments.

• Array Background Consistency Check (Default: DISABLED) – When enabled, consistency checking processes reduce performance. For RAID 5, the performance reduction is significant.

Write Cache

The write cache policy for the CERC SATA 1.5/6ch is usually set when a virtual disk is created. The policy can be changed later using an array management utility such as Array Manager or OMSM.

The write cache policy cannot be changed with Array Manager 3.6 or earlier. Array Manager 3.7 provides a way, via a registry change, to enable the write cache on the CERC SATA 1.5/6ch controller. This registry change allows an Array Manager user to perform a Change Policy virtual disk command and select the Write Cache Enabled Always setting, on CERC SATA 1.5/6ch controllers only, without recreating their virtual disks. Before making any registry changes, it is recommended that all critical data be backed up.

The Write Cache Enabled Always setting can lead to loss of cached data: data in the write cache is lost if the server loses power. This setting should be used only when the system has UPS battery backup, and even then there is no guarantee that cached data will not be lost during a power failure. Select this setting only on virtual disks that contain non-critical data, or data whose potential loss would not be catastrophic.

Note: This functionality was added in Array Manager 3.7 and does not work with earlier versions. Array Manager 3.7 can only be installed as part of an OMSA 4.3 or later installation.


Appendix: SATA BEST PRACTICES

CERC SATA 1.5/6ch Controller Specifications

Minimum System Requirements

Server or workstation with one universal PCI slot and a motherboard and BIOS that complies with the PCI Local Bus Specification, Revision 2.2 and provides large memory-mapped address ranges.

Controller Specifications

Component                   Description
Computer bus                32- or 64-bit PCI local bus
On-board processors         Intel 80302 Intelligent I/O Processor; three Silicon Image SiI3512 dual SATA 1.0 controllers with command queuing
Cache memory                64 MB, fixed ECC SDRAM
Data safety                 Audible alarm
Device protocol             SATA 1.0 and SATA II
RAID levels                 RAID 0, RAID 1, RAID 10, RAID 5, and Simple volume
Container (array) support   Up to 64 containers per controller; 64 partitions maximum per container
PCI bus                     64-bit, 66 MHz (32-bit, 33 MHz-compatible)
SATA channels               Six internal channels
Device support              Up to six SATA devices per controller (1 per channel); supports a RAID container as a boot device

Supported Operating Systems

• Red Hat Linux Advanced Server 3
• Red Hat Linux Advanced Server 2.1
• Red Hat Linux Professional (depending on version)
• Windows Server® 2003 (32-bit) Standard Edition
• Windows 2003 Enterprise Server (32-bit)
• Small Business Server 2003 (32-bit)
• Windows 2003 Web Server (32-bit)
• Windows 2000 Server
• Windows 2000 Advanced Server
• Windows 2000 Small Business Server
• Novell NetWare, versions 5.1 and 6.5
• Novell NetWare Small Business Suite


CERC SATA 1.5/2s Controller Specifications

Controller Specifications

Component            Description
SATA controller      Integrated ICH5R (SC1420/SC1425/PE1800) and ICH6R (SC420/PE800)
Cache memory         None
Device protocol      SATA 1.0
RAID levels          RAID 0, RAID 1, up to two single configured drives
Container support    One container; one logical drive
SATA channels        Two SATA ports
Device support       One SATA drive per port, maximum two HDDs; supports a logical drive as boot device
SMART support        Yes

Supported Operating Systems

• Windows 2003 Server (32-bit) Standard Edition
• Windows 2003 Enterprise Server (32-bit)
• Small Business Server 2003 (32-bit)
• Windows 2003 Web Server (32-bit)
• Windows 2000 Server
• Windows 2000 Advanced Server
• Windows 2000 Small Business Server
• Novell NetWare, versions 5.1 and 6.5


Setting Up Automated Scheduling of Consistency Checks on Windows Systems

1. On Windows systems with Array Manager installed, you can use Scheduled Tasks, found on the menu under the Accessories folder. Double-click Add Scheduled Task to start the Scheduled Task Wizard.

2. Click Next.

3. Click Browse and locate the file amcli.exe. The AMCLI executable is located in the Array Manager installation directory.

4. Select the file and click OK.


5. Click Next.

6. Enter a name for this task and select how often it should run. At a minimum, it is recommended that the task run once a month.


7. Click Next.

8. Select the time at which the consistency check should run. Because the check affects system performance, schedule it for a low-traffic period.

9. Click Next.

10. Enter the user name and password of the account under which the task will run, so that the task can execute correctly.


11. Click Next.

12. Select the Open advanced properties for this task when I click Finish check box.
13. Click Finish. The properties dialog for the new task opens.


14. In the Run text box, enter the command and its parameters. For example, the following command schedules a consistency check on virtual disk 1:

"C:\PathName\amcli.exe" /c1

where PathName is the path to the AMCLI executable. This is the command the scheduler executes each time the task runs.

15. To run a consistency check, use amcli /cn, where the c option indicates a consistency check and n is the number of the virtual disk as displayed in the Array Manager tree view.
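As an alternative to clicking through the wizard, the same monthly task could be created from the command line. The sketch below is a hedged illustration, not part of the Dell procedure: it assumes schtasks.exe is present (it ships with Windows Server 2003 but not Windows 2000), keeps the PathName placeholder from step 14, and leaves out the account options (/ru, /rp) from step 10. The /st time format differs between Windows versions, so confirm it with schtasks /create /? before running. The helper names, task name, and start time are hypothetical.

```python
# Sketch: build the AMCLI consistency-check command (step 14) and an
# equivalent schtasks.exe argument list for a monthly task.
# Assumes schtasks.exe is available; verify options with "schtasks /create /?".

AMCLI = r"C:\PathName\amcli.exe"  # placeholder: use the Array Manager install path


def amcli_check_command(vd_number: int) -> str:
    """Consistency check on virtual disk n: quoted AMCLI path followed by /cn."""
    return f'"{AMCLI}" /c{vd_number}'


def schtasks_args(vd_number: int, task_name: str, start_time: str) -> list[str]:
    """Arguments for a monthly scheduled task that runs the consistency check."""
    return [
        "schtasks", "/create",
        "/tn", task_name,
        "/tr", amcli_check_command(vd_number),
        "/sc", "monthly",     # at least once a month, per step 6
        "/st", start_time,    # a low-traffic time, per step 8
    ]


args = schtasks_args(1, "CERC consistency check VD1", "02:00:00")
print(amcli_check_command(1))  # "C:\PathName\amcli.exe" /c1
# On the server itself (not run here):
#   import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list lets Python's Windows command-line quoting escape the embedded quotes around the AMCLI path, which is the form schtasks expects for a /tr value containing a quoted path.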