Guide to Resolving Drive Problems

IMPORTANT: Include the letters ADIC somewhere in the title of any bridge case raised for an ADIC/GRAU drive problem. We need this in order to provide stats for management meetings with the people from ADIC. Bridge cases raised for drive problems on any of the NetBackup master or media servers (except for the 3 CPR2 boxes, caam02, lsam02 and cdam02, for which we handle the callout) MUST go directly to CWTPOPS, who arrange callout regardless of which site is affected.

Two commands will tell you whether a tape is stuck in a drive. The first is vmoprcmd:

ipcscs0 $ vmoprcmd

PENDING REQUESTS

<NONE>

DRIVE STATUS

Drv Type    Control   User  Label  RVSN    EVSN    Ready  Wr.Enbl.  ReqId
  0 hcart2  DOWN-TLM  -     No     -       -
  1 hcart2  TLM       root  Yes    YIP265  YIP265  Yes    Yes       0

ADDITIONAL DRIVE STATUS

Drv DriveName  Multihost  Assigned  Comment
  0 Drive0     No         -         Automated Notification
  1 Drive1     No         ipcscs0

As you can see, drive index (Drv) 0 is ‘DOWN’ as far as NetBackup is concerned. This means that no tape mounting will occur on it until it is ‘UP’.

There is a tape in drive index 1: YIP265. RVSN is the internal label and EVSN is the external label of the cartridge; always refer to the external ID.

The Control column will indicate whether you are dealing with an ADIC/GRAU library (TLM) or an IBM library (TLH). If you ever see AVR under this column, then immediate attention is required, as the library has dropped into pause mode and no drives will be available for backups.
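As a rough illustration of reading the Control column, the sketch below pulls the control value for each drive out of text shaped like the vmoprcmd sample above and labels it. The function names (drive_controls, describe) and the assumption that every data row starts with a numeric drive index are illustrative only, and real vmoprcmd output may be laid out differently.

```python
# Sample text in the shape of the vmoprcmd output shown in this guide.
SAMPLE = """\
DRIVE STATUS

Drv Type Control User Label RVSN EVSN Ready Wr.Enbl. ReqId
0 hcart2 DOWN-TLM - No - -
1 hcart2 TLM root Yes YIP265 YIP265 Yes Yes 0

ADDITIONAL DRIVE STATUS

Drv DriveName Multihost Assigned Comment
0 Drive0 No - Automated Notification
1 Drive1 No ipcscs0
"""

# Control suffixes named in this guide and the library family they indicate.
LIBRARY_BY_CONTROL = {"TLM": "ADIC/GRAU", "TLH": "IBM"}


def drive_controls(vmoprcmd_output):
    """Return {drive index: Control value} from the DRIVE STATUS section."""
    controls = {}
    in_section = False
    for line in vmoprcmd_output.splitlines():
        stripped = line.strip()
        if stripped == "ADDITIONAL DRIVE STATUS":
            break
        if stripped == "DRIVE STATUS":
            in_section = True
            continue
        fields = stripped.split()
        # Assumption: data rows begin with a numeric index; the header does not.
        if in_section and len(fields) >= 3 and fields[0].isdigit():
            controls[int(fields[0])] = fields[2]
    return controls


def describe(control):
    """Turn a Control value into a short human-readable label."""
    if "AVR" in control:
        return "URGENT: library in pause mode"
    library = LIBRARY_BY_CONTROL.get(control.split("-")[-1], "unknown")
    state = "DOWN" if control.startswith("DOWN") else "UP"
    return f"{state} ({library} library)"


if __name__ == "__main__":
    for index, control in sorted(drive_controls(SAMPLE).items()):
        print(index, control, describe(control))
```

In practice the text would come from running vmoprcmd on the media server rather than from an embedded sample.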

The second command to verify the status of the drives depends on whether it is an IBM library or an ADIC/GRAU library:

IBM – use mtlib -l /dev/lmcp0 -qL. The output is too detailed to copy into this document; however, the first couple of lines should look like this:

Library Data:
   operational state..........Automated Operational State

If the library is not working, it will state ‘Intervention Required’ alongside operational state.
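A minimal sketch of that check is shown here, assuming the state always appears on a line of the form "operational state.....<state>" as in the excerpt above. The helper names are hypothetical, and the exact wording of a failing line has not been captured from a real library.

```python
import re

# First lines of mtlib -l /dev/lmcp0 -qL output, as quoted in this guide.
SAMPLE = """\
Library Data:
   operational state..........Automated Operational State
"""


def operational_state(mtlib_output):
    """Return the text after 'operational state....', or None if not found."""
    for line in mtlib_output.splitlines():
        match = re.search(r"operational state\.+\s*(.*)", line)
        if match:
            return match.group(1).strip()
    return None


def needs_intervention(mtlib_output):
    """True when the reported state mentions 'Intervention Required'."""
    state = operational_state(mtlib_output)
    return state is not None and "Intervention Required" in state


if __name__ == "__main__":
    print(operational_state(SAMPLE))
    print(needs_intervention(SAMPLE))
```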

ADIC/GRAU – use dasadmin commands.

ipcscs0 $ dasadmin ld ipcscs0

listd for client: ipcscs0 successful
drive: DRIVE23 amu drive: 23 st: UP type: N sysid: client: ipcscs0 volser: YIP265 cleaning 0 clean_count: 15
drive: DRIVE24 amu drive: 24 st: UP type: N sysid: client: ipcscs0 volser: WIP217 cleaning 0 clean_count: 5

Compare the output above with the output from the vmoprcmd command. dasadmin shows that both YIP265 and WIP217 are ‘physically’ mounted, because it interrogates the library directly. To get a true picture, it is therefore usually best to trust the output from dasadmin.

Here, WIP217 is mounted in drive 24 according to dasadmin but not according to vmoprcmd, so there is an obvious problem with this tape.

This is the most common problem – a tape that is known to dasadmin but not known to vmoprcmd. Usually this means that the tape has become stuck on the spooler and will need to be dismounted.
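The cross-check described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the embedded samples copy the outputs quoted in this guide, and the parsing assumes that an empty drive shows nothing between "volser:" and "cleaning", which has not been verified against a real library.

```python
import re

VMOPRCMD = """\
DRIVE STATUS

Drv Type Control User Label RVSN EVSN Ready Wr.Enbl. ReqId
0 hcart2 DOWN-TLM - No - -
1 hcart2 TLM root Yes YIP265 YIP265 Yes Yes 0

ADDITIONAL DRIVE STATUS
"""

DASADMIN_LD = """\
listd for client: ipcscs0 successful
drive: DRIVE23 amu drive: 23 st: UP type: N sysid: client: ipcscs0 volser: YIP265 cleaning 0 clean_count: 15
drive: DRIVE24 amu drive: 24 st: UP type: N sysid: client: ipcscs0 volser: WIP217 cleaning 0 clean_count: 5
"""


def netbackup_evsns(vmoprcmd_output):
    """Return the set of external labels (EVSN) that NetBackup reports as mounted."""
    evsns = set()
    evsn_col = None
    for line in vmoprcmd_output.splitlines():
        if line.strip() == "ADDITIONAL DRIVE STATUS":
            break
        fields = line.split()
        if fields and fields[0] == "Drv" and "EVSN" in fields:
            evsn_col = fields.index("EVSN")  # locate the column from the header
        elif evsn_col is not None and len(fields) > evsn_col and fields[0].isdigit():
            if fields[evsn_col] != "-":
                evsns.add(fields[evsn_col])
    return evsns


def library_volsers(ld_output):
    """Return {DAS drive name: volser} for drives the library says hold a tape."""
    # The lookbehind skips the 'amu drive: NN' field inside each record.
    pattern = re.compile(r"(?<!amu )drive:\s*(\S+).*?volser:(.*?)cleaning", re.S)
    return {
        drive: volser.strip()
        for drive, volser in pattern.findall(ld_output)
        if volser.strip()
    }


def unknown_to_netbackup(vmoprcmd_output, ld_output):
    """Tapes mounted according to dasadmin but not according to vmoprcmd."""
    known = netbackup_evsns(vmoprcmd_output)
    return {d: v for d, v in library_volsers(ld_output).items() if v not in known}


if __name__ == "__main__":
    print(unknown_to_netbackup(VMOPRCMD, DASADMIN_LD))
```

Any tape this reports is a candidate for the dismount procedure that follows.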

The first command to try is:

dasadmin dismount -t 3590 AAA123

where AAA123 is the tape ID. The -t 3590 option is necessary where the default tape type is anything but 3590, so it is useful to get into the habit of using it.

If the tape had managed to rewind itself before it became stuck, this command will work. Otherwise, a message such as ‘the drive did not eject a volume’ will eventually be returned, and you will have to arrange an engineer callout.

If the dasadmin dismount command does not work, you could contact the onsite Ops team and ask them to power cycle the drive at fault.

However,

If a callout is required, include as much information as possible about the library and drive number. To identify the library, use the NetBackup Interactive Reporting page at http://byadsm03.nat.bt.com and search through the library information, which lists each library and the servers that it supports.

You should already know the offending drive from the output of the dasadmin ld command.

Tpconfig command

This command shows the drives on the system.

$ tpconfig -d

Index DriveName   DrivePath       Type    Shared Status
***** *********   **********      ****    ****** ******
  0   TLMDrive9   /dev/rmt/13cbn  hcart   No     UP
        TLM(4) DAS Drive Name=DRIVE9
  1   TLMDrive11  /dev/rmt/1cbn   hcart   No     UP
        TLM(4) DAS Drive Name=DRIVE11
  2   IBMDrive19  /dev/rmt/5cbn   hcart2  No     UP
        TLH(1) IBM Device Name=003590E1A18
  3   Drive0      /dev/rmt/0cbn   8mm     No     UP
        TLM(2) DAS Drive Name=DRIVE999
  4   IBMDrive20  /dev/rmt/8cbn   hcart2  No     UP
        TLH(1) IBM Device Name=003590E1A19

Currently defined robotics are:
  TLH(1) library name = 3494b, volume database host = ipadsm01
  TLM(2) DAS server = amu2, volume database host = ipadsm01
  TLM(4) DAS server = amu, volume database host = ipadsm01
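Because dasadmin reports drives by their DAS name while mt needs a device path, it can help to map one to the other from the tpconfig -d listing. The sketch below assumes each drive appears as one row followed by a line naming its robot and library drive name, as in the listing above. The function names and that two-line layout are assumptions, not documented tpconfig behaviour.

```python
import re

# A shortened sample in the shape of the tpconfig -d listing in this guide.
SAMPLE = """\
Index DriveName   DrivePath       Type    Shared Status
***** *********   **********      ****    ****** ******
  0   TLMDrive9   /dev/rmt/13cbn  hcart   No     UP
        TLM(4) DAS Drive Name=DRIVE9
  4   IBMDrive20  /dev/rmt/8cbn   hcart2  No     UP
        TLH(1) IBM Device Name=003590E1A19

Currently defined robotics are:
  TLH(1) library name = 3494b, volume database host = ipadsm01
"""


def parse_tpconfig(output):
    """Return a list of drive records parsed from tpconfig -d style text."""
    drives = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].isdigit():
            drives.append({
                "index": int(fields[0]),
                "name": fields[1],
                "path": fields[2],
                "type": fields[3],
                "shared": fields[4],
                "status": fields[5],
            })
            continue
        # Robot line, e.g. 'TLM(4) DAS Drive Name=DRIVE9'; belongs to the last drive.
        robot = re.search(r"(TL\w)\((\d+)\).*Name=(\S+)", line)
        if robot and drives:
            drives[-1].update(
                robot=robot.group(1),
                robot_number=int(robot.group(2)),
                library_name=robot.group(3),
            )
    return drives


def path_for_library_drive(drives, library_name):
    """Look up the device path for a DAS drive name or IBM device name."""
    for drive in drives:
        if drive.get("library_name") == library_name:
            return drive["path"]
    return None


if __name__ == "__main__":
    print(path_for_library_drive(parse_tpconfig(SAMPLE), "DRIVE9"))
```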

If a tape is stuck in a drive, run the following commands (as root).

First check the drive status, using the device path shown by tpconfig -d:

mt -f <device path under /dev/rmt/> status

# mt -f /dev/rmt/8cbn status
/dev/rmt/8cbn: Device busy
#

If the tape is stuck, you need to unload it:

# mt -f /dev/rmt/8cbn rewoffl

If this does not work, ask site Ops to power cycle the drive. NOTE: the drive MUST NOT be an LTO drive.

If this does not work, call out an engineer to fix the drive.

For reference, the mt subcommands used above:

offline, rewoffl
    Rewind the tape and, if appropriate, take the drive unit off-line by unloading the tape. It cycles through all four tapes.

status
    Print status information about the tape unit.
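The escalation order for a stuck tape can be written down as a small helper, sketched here under two assumptions that are not stated in this guide: that a "Device busy" reply from mt status is the sign to proceed, and that for an LTO drive the power-cycle step is simply skipped in favour of an engineer callout. The function names are hypothetical.

```python
def looks_busy(mt_status_output):
    """True when mt status replied 'Device busy', as in the example above."""
    return "Device busy" in mt_status_output


def escalation_steps(device_path, is_lto):
    """Return the ordered steps to try for a tape stuck in the given drive."""
    steps = [
        f"mt -f {device_path} status",
        f"mt -f {device_path} rewoffl",
    ]
    # Assumption: LTO drives must not be power cycled, so that step is omitted.
    if not is_lto:
        steps.append("ask site Ops to power cycle the drive")
    steps.append("call out an engineer to fix the drive")
    return steps


if __name__ == "__main__":
    if looks_busy("/dev/rmt/8cbn: Device busy"):
        for step in escalation_steps("/dev/rmt/8cbn", is_lto=False):
            print(step)
```

Each step is attempted only if the previous one did not free the tape.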

Glossary

Below is a list of useful commands for identifying drive problems:

vmoprcmd – shows what tapes are in use by backups (this is how NetBackup sees the library)

dasadmin ld hostname – shows which tapes are mounted; this should ALWAYS be cross-referenced with vmoprcmd.

dasadmin dismount -t 3590 ABC123 – will dismount any tape that has been rewound

dasadmin eject -c -t 3590 ABC123 E01 – will eject a tape from the library. This will NOT dismount a tape from a drive; it simply takes the tape from its slot in the silo (library) and places it in the ‘eject area’, from which the relevant media group can retrieve the tape. Note that you must always use the -c flag with this command, as this clears the slot for use by another tape.

dasadmin insert I01 – will take all tapes from the ‘input hopper’ and insert them into free slots in the silo (library)

mtlib -l /dev/lmcp0 -qL – will show the status of an IBM library. Note that drive failures are relatively infrequent on the IBM libraries, which is why we have ‘skirted’ around the commands for it.

mtlib -l /dev/lmcp0 -qV -V ABC123 – will display the status of a tape known to the IBM library

Robtest

We also have access to the robtest function, which gives you sole control of the robot. It must therefore be used with care: for as long as you are in robtest, no further tape mounting can be done.

You can use robtest to dismount tapes, check for mounted tapes, and generally do everything that you can do from the command line, but with more privileges.

To enter robtest, simply type robtest at the command line. This will give you a prompt listing all the available robot types (e.g. TLM0; this is useful if you are unsure which libraries a server is connected to).

The main commands used with robtest are:

libstat – shows the status of the library

drstat – shows all of the drives and what is presently mounted

cancel <request_id> - Cancel server request
client <new_client_name> - Print client status or change DAS client name
dm <volser> - Dismount volser
eject <area> <vol> [<vol>] - Eject volsers in range to eject area
f <drivenum> - Force dismount (SCSI unload may be required)
list - Print outstanding requests
init - Initialize the DAS ACI library
insert <area> <type> - Insert volsers from insert area
inventory - Inventory (audit) the AML
m <volser> <drivenum> - Mount volser
query [<volser>] [<volser>] - Print volsers in range
types - Print list of media types
version - Print software versions
view <volser> - Print volser database entry

Always remember to log out of robtest cleanly; otherwise backups will be affected.
