
Hi Siddique,

As discussed, please go through the below doc, compare it with your procedure, check with seniors, and ensure that the problem is fully resolved. Below is the customer escalated mail for your info.

Document Audience: SPECTRUM

Document ID: 208671

Old Document ID: (formerly 73132)

Title: Solaris[TM] Volume Manager software: Replacing Disks (Solaris[TM] 9 Operating System and above)

Copyright Notice: Copyright © 2009 Sun Microsystems, Inc. All Rights Reserved

Update Date: Thu May 01 00:00:00 MDT 2008

Solution Type: Technical Instruction

Solution 208671: Solaris[TM] Volume Manager software: Replacing Disks (Solaris[TM] 9 Operating System and above)

Related Categories

Home>Product>Storage>Storage Management Software

Description

Beginning with the Solaris[TM] 9 Operating System, Solaris[TM] Volume Manager (VM) software uses a new feature called Device-ID (DevID). This feature identifies each disk not only by its c#t#d# name, but by a unique ID which is generated by the disk's WWN or serial number. Solaris Volume Manager (VM) relies on the Solaris OS to supply it with each disk's correct DevID.

When a disk fails and is replaced, a specific procedure is required for disks to make sure that Solaris OS is updated with the new disk's DevID.

If this procedure is not followed exactly, the errors below may be seen:

Jun 22 18:22:57 host1 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager

As a result, Solaris OS will not update the DevID until the next reboot, meaning that although a NEW disk is in the system, the DevID being reported by Solaris OS to the Solaris VM software is still the OLD disk's DevID.

For Example:

If the DevID of c0t1d0 was "SSEAGATE_ST318203_LR7943" and it is replaced with a new disk (whose DevID would be "SFUJITSU_MAG3182_005268"), Solaris OS will still report that the c0t1d0 disk has the DevID of "SSEAGATE_ST318203_LR7943" until the host is rebooted.
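As a quick, informal check of what each layer currently reports for a disk (using the example disk c0t1d0 from above; exact output varies by Solaris release), the following commands can be run:

iostat -En c0t1d0 (shows the vendor, product and serial number that Solaris OS sees for the disk)

metastat | grep c0t1d0 (the matching Device Relocation Information line shows the DevID recorded by SVM, provided the disk is part of an SVM configuration)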

Although it is possible to replace the disk without running through this procedure, the next system reboot will cause the Solaris VM software to fail the new disk because the DevID of the disk will have changed, and Solaris VM will not have any knowledge of that new DevID.

To replace a disk, certain commands must be used to unconfigure the disk that is to be replaced, as well as configure the new disk. This will cause an update of the Solaris OS device framework, such that the new disk's DevID will be inserted and the old one removed.

This information applies to disks marked as "failing", as well as disks that have already failed. The commands to remove/clear/replace metadevice entities deal initially with the SVM name-placeholders (d10, d1, d30, etc), and not the actual device names. A "failing" disk is one that still responds to inquiries, but has experienced errors that could indicate a future full-failure of the disk. A "failed" disk has already experienced such a failure. The replacement procedures for either remain essentially the same.

Steps to Follow

PROCEDURE FOR REPLACING MIRRORED DISKS

Given all of the above, the following set of commands should work in all cases (though depending on the system configuration, some commands may not be necessary):

To replace a Solaris VM-controlled disk which is part of a mirror, the following steps must be followed:

1. Run 'metadetach' to detach all submirrors on the failing disk from their respective mirrors:

metadetach -f <mirror> <submirror>

Note: If the "-f" option is not used, the following message will be returned:

"Attempt an operation on a submirror that has erred component".

Then run 'metaclear' on those submirror devices:

metaclear <submirror>

Verify there are no existing metadevices left on the disk by running:

metastat -p | grep c#t#d#

2. If there are any replicas on this disk, remove them using:

metadb -d c#t#d#s#

Verify there are no existing replicas left on the disk, by running:

metadb | grep c#t#d#

3. If there are any open filesystems on this disk (not under Solaris VM control), unmount them. If the disk or a slice on the disk is being used as a dump device, move it temporarily to another disk. You can check the existing dump device by running "dumpadm" and change the current dump device using "dumpadm -d <dump-device>". If this is not done, the disk will fail to unconfigure.
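For illustration, assuming the dump device currently sits on the failing disk c0t2d0 and that c0t0d0s1 is a suitable temporary slice on a healthy disk (both names are only examples):

dumpadm (displays the current dump device and savecore settings)

dumpadm -d /dev/dsk/c0t0d0s1 (temporarily moves the dump device to a slice on another disk)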

4. Run the 'cfgadm' command to remove the failed disk.

cfgadm -c unconfigure c#::dsk/c#t#d#

NOTE: Use the "cfgadm -al" command to obtain the variable "c#::dsk/c#t#d#".The variable will be listed under the 'Ap_Id' column from the "cfgadm -al" command's output.

NOTE: If the message "Hardware specific failure: failed to unconfigure SCSI device: I/O error" appears, check to make sure that you cleared all replicas and metadevices from the disk, and that the disk is not being accessed.

NOTE: To replace internal FC-AL disks, follow Technical Instruction < Solution: 214845 >

5. Insert and configure the new disk.

cfgadm -c configure c#::dsk/c#t#d#

cfgadm -al (to confirm that disk is configured properly)

6. Run 'format' or 'prtvtoc' to put the desired partition table on the new disk (one common approach is sketched below).
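One common way to do this, assuming the replacement disk has the same geometry as the surviving mirror half (here c0t0d0, with c0t2d0 as the new disk, both taken from Example 1 below), is to copy the surviving disk's VTOC with 'prtvtoc' and 'fmthard':

prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t2d0s2

(Slice 2 is the conventional whole-disk slice; the new disk must already carry a valid label, which 'format' can write if needed.)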

7. If necessary, recreate any replicas on the new disk:

metadb -a c#t#d#s#

8. Recreate each metadevice to be used as a submirror, then use 'metattach' to attach those submirrors to the mirrors and start the resync.

NOTE: If the submirror was something other than a simple one-slice concat device, the metainit command will be different than shown here.

metainit <submirror> 1 1 <c#t#d#s#>

metattach <mirror> <submirror>
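As the note above says, the metainit invocation depends on the submirror layout. Two illustrative sketches, using made-up slice names, might look like this:

metainit d20 2 1 c0t2d0s0 1 c0t2d0s3 (a concat of two single-slice stripes)

metainit d20 1 2 c0t2d0s0 c0t3d0s0 -i 32k (a two-way stripe with a 32k interlace)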

9. Run 'metadevadm' on the disk, which will update the New DevID.

metadevadm -u c#t#d#

NOTE: If you get the message "Open of /dev/dsk/c#t#d#s0 failed", it can safely be ignored (this is a known bug pending a fix).

NOTE: 'metadevadm -u' is usually unnecessary for this replacement procedure, since the DevID information is completely removed from the SVM database by 'metadetach', 'metaclear' and 'metadb -d' in steps 1 and 2. On the other hand, 'metadevadm -u' is necessary if the failed disk is replaced by using 'metareplace -e', as described in http://docs.sun.com/app/docs/doc/817-2530/6mi6gg8de?a=view

PROCEDURE FOR REPLACING DISKS IN A RAID-5 META-DEVICE

Note: If a disk is used in BOTH a mirror and a RAID5, don't use the following procedure. Instead, follow the instructions for the MIRRORED devices (above). This is because the RAID5 array, once healed, is treated as a single disk for mirroring purposes.

To replace an SVM-controlled disk which is part of a RAID5 meta-device, the following steps must be followed.

1. If there are any open filesystems on this disk (not under SVM control), unmount them. If the disk or a slice on the disk is being used as a dump device, move it temporarily to another disk. You can check the existing dump device by running "dumpadm" and change the current dump device using "dumpadm -d <dump-device>". If this is not done, the disk will fail to unconfigure.

2. If there are any replicas on this disk, remove them using:

metadb -d c#t#d#s#

Verify there are no existing replicas left on the disk by running:

metadb | grep c#t#d#

3. Run the 'cfgadm' command to remove the failed disk.

cfgadm -c unconfigure c#::dsk/c#t#d#

NOTE: To replace internal FC-AL disks, follow Technical Instruction < Solution: 214845 >

4. Insert and configure the new disk.

cfgadm -c configure c#::dsk/c#t#d#

cfgadm -al (just to confirm that disk is configured properly)

5. Run 'format' or 'prtvtoc' to put the desired partition table on the new disk (one common approach is sketched below).
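As in the mirror procedure, one option (assuming the new disk matches the geometry of a surviving RAID5 member such as c0t1d0 from Example 2 below) is to copy that member's VTOC:

prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t2d0s2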

6. If necessary, recreate any replicas on the new disk:

metadb -a c#t#d#s#

7. Run 'metareplace' to enable and resync the new disk.

metareplace -e <raid5-metadevice> c#t#d#s#

8. Run 'metadevadm' on the disk, which will update the New DevID.

metadevadm -u c#t#d#

Note: Due to CR 4808079, a disk can show up as "unavailable" in the metastat command after running Step 7. To resolve this, run "metastat -i". After running this command, the device should show a metastat status of "Okay".
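A minimal illustration of this check, using the d3 RAID5 device from Example 2 below:

metastat -i (re-checks and updates the status of mirrors, RAID5 volumes and hot spares)

metastat d3 (the replaced component should now report a state of Okay)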

EXAMPLES

The following two examples illustrate the commands and sample outputs of the above procedures.

Example 1: Replacing a Mirrored Disk

In this example, a Netra[TM] t 1400 server has only one SCSI controller, with 4 disks. SVM is used to mirror both the root and the swap devices between c0t0d0 and c0t2d0. The disk c0t2d0 is failing and needs to be replaced.

Here is the 'format' display before the submirror disk replacement:

format

AVAILABLE DISK SELECTIONS:

0. c0t0d0

/pci@1f,4000/scsi@3/sd@0,0

1. c0t1d0

/pci@1f,4000/scsi@3/sd@1,0

2. c0t2d0

/pci@1f,4000/scsi@3/sd@2,0

3. c0t3d0

/pci@1f,4000/scsi@3/sd@3,0

Here is the 'cfgadm' display for controller c0:

cfgadm -al

Ap_Id Type Receptacle Occupant Condition

c0 scsi-bus connected configured unknown

c0::dsk/c0t0d0 disk connected configured unknown

c0::dsk/c0t1d0 disk connected configured unknown

c0::dsk/c0t2d0 disk connected configured unknown

c0::dsk/c0t3d0 disk connected configured unknown

Here is the output of the 'metadb' command, showing the locations of the SVM database replicas. There is one on each disk.

metadb

flags first blk block count

a u 16 8192 /dev/dsk/c0t0d0s7

a u 16 8192 /dev/dsk/c0t1d0s7

a u 16 8192 /dev/dsk/c0t2d0s7

a u 16 8192 /dev/dsk/c0t3d0s7

Here is the SVM configuration before the submirror disk replacement. Note: The DevID information is at the bottom.

metastat

d0: Mirror

Submirror 0: d10

State: Okay

Submirror 1: d20

State: Needs maintenance

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0

State: Okay

Size: 6295232 blocks (3.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s0 0 No Okay Yes

d20: Submirror of d0

State: Needs maintenance

Invoke: metareplace d20 c0t2d0s0

Size: 6295232 blocks (3.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t2d0s0 0 No Maintenance Yes

d1: Mirror

Submirror 0: d11

State: Okay

Submirror 1: d21

State: Needs maintenance

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1

State: Okay

Size: 2101552 blocks (1.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s1 0 No Okay Yes

d21: Submirror of d1

State: Needs maintenance

Invoke: metareplace d21 c0t2d0s1

Size: 2101552 blocks (1.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t2d0s1 0 No Maintenance Yes

Device Relocation Information:

Device Reloc Device ID

c0t2d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____

c0t0d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3

Since c0t2d0 is the drive that needs to be replaced, use 'metadetach' and 'metaclear' to detach and remove the bad submirrors from that disk.

metadetach -f d0 d20

d0: submirror d20 is detached

metadetach -f d1 d21

d1: submirror d21 is detached

metaclear d20

d20: Concat/Stripe is cleared

metaclear d21

d21: Concat/Stripe is cleared

Here is the 'metastat' output after detaching and removing d20 and d21:

d0: Mirror

Submirror 0: d10

State: Okay

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0

State: Okay

Size: 6295232 blocks (3.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s0 0 No Okay Yes

d1: Mirror

Submirror 0: d11

State: Okay

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1

State: Okay

Size: 2101552 blocks (1.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s1 0 No Okay Yes

Device Relocation Information:

Device Reloc Device ID

c0t2d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____

c0t0d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3

Since there is a database replica on the disk to be removed, remove it using:

metadb -d c0t2d0s7

and then remove the failed disk from the system using:

cfgadm -c unconfigure c0::dsk/c0t2d0

After the disk has been physically replaced, use the 'cfgadm' command to configure the new disk:

cfgadm -c configure c0::dsk/c0t2d0

and then confirm that the new disk has been configured:

cfgadm -al

Ap_Id Type Receptacle Occupant Condition

c0 scsi-bus connected configured unknown

c0::dsk/c0t0d0 disk connected configured unknown

c0::dsk/c0t1d0 disk connected configured unknown

c0::dsk/c0t2d0 disk connected configured unknown

c0::dsk/c0t3d0 disk connected configured unknown

Then run 'format' to put the appropriate partition table onto the disk.

format

[ the steps to create a valid partition table have been left

out for brevity ]

Run 'metadb' to recreate the replica that we removed from the disk:

metadb -a c0t2d0s7

and run 'metainit' to recreate the metadevices that were previously removed and 'metattach' to reattach them to their respective mirrors.

metainit d20 1 1 c0t2d0s0

d20: Concat/Stripe is setup

metainit d21 1 1 c0t2d0s1

d21: Concat/Stripe is setup

metattach d0 d20

d0: submirror d20 is attached

metattach d1 d21

d1: submirror d21 is attached

Running a 'metastat' command will now show the NEW DeviceID for disk c0t2d0:

metastat

d0: Mirror

Submirror 0: d10

State: Okay

Submirror 1: d20

State: Okay

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0

State: Okay

Size: 6295232 blocks (3.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s0 0 No Okay Yes

d20: Submirror of d0

State: Okay

Size: 6295232 blocks (3.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t2d0s0 0 No Okay Yes

d1: Mirror

Submirror 0: d11

State: Okay

Submirror 1: d21

State: Okay

Pass: 1

Read option: roundrobin (default)

Write option: parallel (default)

Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1

State: Okay

Size: 2101552 blocks (1.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t0d0s1 0 No Okay Yes

d21: Submirror of d1

State: Okay

Size: 2101552 blocks (1.0 GB)

Stripe 0:

Device Start Block Dbase State Reloc Hot Spare

c0t2d0s1 0 No Okay Yes

Device Relocation Information:

Device Reloc Device ID

c0t0d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____

c0t2d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

After the new disk is attached to the mirror disk, it will be resynchronized.

Run 'metadevadm' to update the SVM database with the new DevID information. Here we see that the old DevID and the new DevID are the same, since the DevID has already been updated by removing and recreating all metadevices and SVM database replicas on c0t2d0:

metadevadm -u c0t2d0

Updating Solaris Volume Manager device relocation information for c0t2d0

Old device reloc information:

id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

New device reloc information:

id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

Once the resynchronization process is completed, the mirror disk will be back to fully redundant mode.

Example 2: Replacing a Disk used only in RAID5 metadevice(s)

In this example, a Netra[TM] t 1400 server has only one SCSI controller with 4 disks. A RAID5 SVM configuration is set up across three disks - c0t1d0, c0t2d0 and c0t3d0. The disk c0t2d0 is failing and needs to be replaced.

Here is the 'format' display before the disk replacement:

format

AVAILABLE DISK SELECTIONS:

0. c0t0d0

/pci@1f,4000/scsi@3/sd@0,0

1. c0t1d0

/pci@1f,4000/scsi@3/sd@1,0

2. c0t2d0

/pci@1f,4000/scsi@3/sd@2,0

3. c0t3d0

/pci@1f,4000/scsi@3/sd@3,0

Here is the 'cfgadm' display for controller c0:

cfgadm -al

Ap_Id Type Receptacle Occupant Condition

c0 scsi-bus connected configured unknown

c0::dsk/c0t0d0 disk connected configured unknown

c0::dsk/c0t1d0 disk connected configured unknown

c0::dsk/c0t2d0 disk connected configured unknown

c0::dsk/c0t3d0 disk connected configured unknown

Here is the output of the 'metadb' command, showing the locations of the SVM database replicas. There is one on each disk.

metadb

flags first blk block count

a u 16 8192 /dev/dsk/c0t0d0s7

a u 16 8192 /dev/dsk/c0t1d0s7

a u 16 8192 /dev/dsk/c0t2d0s7

a u 16 8192 /dev/dsk/c0t3d0s7

Here is the SVM configuration before the disk replacement. Note: The DevID information is at the bottom.

metastat

d3: RAID

State: Needs Maintenance

Invoke: metareplace d3 c0t2d0s5

Interlace: 32 blocks

Size: 2077992 blocks (1014 MB)

Original device:

Size: 2081984 blocks (1016 MB)

Device Start Block Dbase State Reloc Hot Spare

c0t1d0s5 9754 No Okay Yes

c0t2d0s5 9754 No Maintenance Yes

c0t3d0s5 9754 No Okay Yes

Device Relocation Information:

Device Reloc Device ID

c0t0d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3

c0t1d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____

c0t2d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____

c0t3d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526842____

Since c0t2d0 is the drive that needs to be replaced, and since the only other thing on this disk is the SVM replica, remove the existing replica on disk c0t2d0 using:

metadb -d c0t2d0s7

and use the 'cfgadm' command to remove the failed disk from the system:

cfgadm -c unconfigure c0::dsk/c0t2d0

After the disk has been physically replaced, we use 'cfgadm' to configure the new disk:

cfgadm -c configure c0::dsk/c0t2d0

and then confirm that the new disk has been configured:

cfgadm -al

Ap_Id Type Receptacle Occupant Condition

c0 scsi-bus connected configured unknown

c0::dsk/c0t0d0 disk connected configured unknown

c0::dsk/c0t1d0 disk connected configured unknown

c0::dsk/c0t2d0 disk connected configured unknown

c0::dsk/c0t3d0 disk connected configured unknown

Then, run 'format' to put the appropriate partition table onto the disk.

format

[ the steps to create a valid partition table have been left

out for brevity ]

Run 'metadb' to recreate the replica that we removed from the disk:

metadb -a c0t2d0s7

Run 'metareplace' to add the new disk into the RAID5 device, and for a resync to occur:

metareplace -e d3 c0t2d0s5

Run 'metadevadm' to update the SVM database with the new DevID information. Here, the old DevID and the new DevID can be seen:

metadevadm -u c0t2d0

Old device reloc information:

id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____

New device reloc information:

id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

Running a 'metastat' command will now show the NEW DeviceID for disk c0t2d0:

metastat

d3: RAID

State: Okay

Interlace: 32 blocks

Size: 2077992 blocks (1014 MB)

Original device:

Size: 2081984 blocks (1016 MB)

Device Start Block Dbase State Reloc Hot Spare

c0t1d0s5 9754 No Okay Yes

c0t2d0s5 9754 No Okay Yes

c0t3d0s5 9754 No Okay Yes

Device Relocation Information:

Device Reloc Device ID

c0t1d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____

c0t2d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

c0t3d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526842____

Product: Solstice DiskSuite 4.2.1

Solaris Volume Manager Software

Keywords: metadevadm, replace disk, svm, mirrored disk

Previously Published As: 73132

Product UUID: a9c6eca6-2bd5-11d6-9284-e4d8d3a761a9 | Solstice DiskSuite 4.2.1

18628121-c84b-11d7-9de1-080020a9ed93 | Solaris Volume Manager Software