
ibm.com/redbooks

Front cover

SAN Volume Controller Best Practices and Performance Guidelines

Jon Tate
Katja Gebuhr
Alex Howell
Nik Kjeldsen

Read about best practices learned from the field

Learn about SVC performance advantages

Fine-tune your SVC


International Technical Support Organization

SAN Volume Controller Best Practices and Performance Guidelines

December 2008

SG24-7521-01


© Copyright International Business Machines Corporation 2008. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Second Edition (December 2008)

This edition applies to Version 4, Release 3, Modification 0 of the IBM System Storage SAN Volume Controller.

Note: Before using this information and the product it supports, read the information in “Notices” on page xi.


Contents

Notices
Trademarks

Preface
The team that wrote this book
Become a published author
Comments welcome

Summary of changes
December 2008, Second Edition

Chapter 1. SAN fabric
  1.1 SVC SAN topology
    1.1.1 Redundancy
    1.1.2 Topology basics
    1.1.3 ISL oversubscription
    1.1.4 Single switch SVC SANs
    1.1.5 Basic core-edge topology
    1.1.6 Four-SAN core-edge topology
    1.1.7 Common topology issues
  1.2 SAN switches
    1.2.1 Selecting SAN switch models
    1.2.2 Switch port layout for large edge SAN switches
    1.2.3 Switch port layout for director-class SAN switches
    1.2.4 IBM System Storage/Brocade b-type SANs
    1.2.5 IBM System Storage/Cisco SANs
    1.2.6 SAN routing and duplicate WWNNs
  1.3 Zoning
    1.3.1 Types of zoning
    1.3.2 Pre-zoning tips and shortcuts
    1.3.3 SVC intra-cluster zone
    1.3.4 SVC storage zones
    1.3.5 SVC host zones
    1.3.6 Sample standard SVC zoning configuration
    1.3.7 Zoning with multiple SVC clusters
    1.3.8 Split storage subsystem configurations
  1.4 Switch Domain IDs
  1.5 Distance extension for mirroring
    1.5.1 Optical multiplexors
    1.5.2 Long-distance SFPs/XFPs
    1.5.3 Fibre Channel: IP conversion
  1.6 Tape and disk traffic sharing the SAN
  1.7 Switch interoperability
  1.8 TotalStorage Productivity Center for Fabric

Chapter 2. SAN Volume Controller cluster
  2.1 Advantages of virtualization
    2.1.1 How does the SVC fit into your environment
  2.2 Scalability of SVC clusters
    2.2.1 Advantage of multi-cluster as opposed to single cluster
    2.2.2 Performance expectations by adding an SVC
    2.2.3 Growing or splitting SVC clusters
  2.3 SVC performance scenarios
  2.4 Cluster upgrade

Chapter 3. SVC Console
  3.1 SVC Console installation
    3.1.1 Software only installation option
    3.1.2 Combined software and hardware installation option
    3.1.3 SVC cluster software and SVC Console compatibility
    3.1.4 IP connectivity considerations
  3.2 Using the SVC Console
    3.2.1 SSH connection limitations
    3.2.2 Managing multiple SVC clusters using a single SVC Console
    3.2.3 Managing an SVC cluster using multiple SVC Consoles
    3.2.4 SSH key management
    3.2.5 Administration roles
    3.2.6 Audit logging
    3.2.7 IBM Support remote access to the SVC Console
    3.2.8 SVC Console to SVC cluster connection problems
    3.2.9 Managing IDs and passwords
    3.2.10 Saving the SVC configuration
    3.2.11 Restoring the SVC cluster configuration

Chapter 4. Storage controller
  4.1 Controller affinity and preferred path
    4.1.1 ADT for DS4000
    4.1.2 Ensuring path balance prior to MDisk discovery
  4.2 Pathing considerations for EMC Symmetrix/DMX and HDS
  4.3 LUN ID to MDisk translation
    4.3.1 ESS
    4.3.2 DS6000 and DS8000
  4.4 MDisk to VDisk mapping
  4.5 Mapping physical LBAs to VDisk extents
    4.5.1 Investigating a medium error using lsvdisklba
    4.5.2 Investigating Space-Efficient VDisk allocation using lsmdisklba
  4.6 Medium error logging
    4.6.1 Host-encountered media errors
    4.6.2 SVC-encountered medium errors
  4.7 Selecting array and cache parameters
    4.7.1 DS4000 array width
    4.7.2 Segment size
    4.7.3 DS8000
  4.8 Considerations for controller configuration
    4.8.1 Balancing workload across DS4000 controllers
    4.8.2 Balancing workload across DS8000 controllers
    4.8.3 DS8000 ranks to extent pools mapping
    4.8.4 Mixing array sizes within an MDG
    4.8.5 Determining the number of controller ports for ESS/DS8000
    4.8.6 Determining the number of controller ports for DS4000
  4.9 LUN masking
  4.10 WWPN to physical port translation
  4.11 Using TPC to identify storage controller boundaries
  4.12 Using TPC to measure storage controller performance
    4.12.1 Normal operating ranges for various statistics
    4.12.2 Establish a performance baseline
    4.12.3 Performance metric guidelines
    4.12.4 Storage controller back end

Chapter 5. MDisks
  5.1 Back-end queue depth
  5.2 MDisk transfer size
    5.2.1 Host I/O
    5.2.2 FlashCopy I/O
    5.2.3 Coalescing writes
  5.3 Selecting LUN attributes for MDisks
  5.4 Tiered storage
  5.5 Adding MDisks to existing MDGs
    5.5.1 Adding MDisks for capacity
    5.5.2 Checking access to new MDisks
    5.5.3 Persistent reserve
    5.5.4 Renaming MDisks
  5.6 Restriping (balancing) extents across an MDG
    5.6.1 Installing prerequisites and the SVCTools package
    5.6.2 Running the extent balancing script
  5.7 Removing MDisks from existing MDGs
    5.7.1 Migrating extents from the MDisk to be deleted
    5.7.2 Verifying an MDisk's identity before removal
  5.8 Remapping managed MDisks
  5.9 Controlling extent allocation order for VDisk creation
  5.10 Moving an MDisk between SVC clusters

Chapter 6. Managed disk groups
  6.1 Availability considerations for MDGs
    6.1.1 Performance considerations
    6.1.2 Selecting the MDisk Group
  6.2 Selecting the number of LUNs per array
    6.2.1 Performance comparison of one LUN compared to two LUNs per array
  6.3 Selecting the number of arrays per MDG
  6.4 Striping compared to sequential type
  6.5 SVC cache partitioning
  6.6 SVC quorum disk considerations
  6.7 Selecting storage subsystems

Chapter 7. VDisks
  7.1 New features in SVC Version 4.3.0
    7.1.1 Real and virtual capacities
    7.1.2 Space allocation
    7.1.3 Space-Efficient VDisk performance
    7.1.4 Testing an application with Space-Efficient VDisk
    7.1.5 What is VDisk mirroring
    7.1.6 Creating or adding a mirrored VDisk
    7.1.7 Availability of mirrored VDisks
    7.1.8 Mirroring between controllers
  7.2 Creating VDisks
    7.2.1 Selecting the MDisk Group
    7.2.2 Changing the preferred node within an I/O Group
    7.2.3 Moving a VDisk to another I/O Group
  7.3 VDisk migration
    7.3.1 Migrating with VDisk mirroring
    7.3.2 Migrating across MDGs
    7.3.3 Image type to striped type migration
    7.3.4 Migrating to image type VDisk
    7.3.5 Preferred paths to a VDisk
    7.3.6 Governing of VDisks
  7.4 Cache-disabled VDisks
    7.4.1 Underlying controller remote copy with SVC cache-disabled VDisks
    7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
    7.4.3 Changing cache mode of VDisks
  7.5 VDisk performance
    7.5.1 VDisk performance
  7.6 The effect of load on storage controllers

Chapter 8. Copy services
  8.1 SAN Volume Controller Advanced Copy Services functions
    8.1.1 Setting up FlashCopy services
    8.1.2 Steps to making a FlashCopy VDisk with application data integrity
    8.1.3 Making multiple related FlashCopy VDisks with data integrity
    8.1.4 Creating multiple identical copies of a VDisk
    8.1.5 Creating a FlashCopy mapping with the incremental flag
    8.1.6 Space-Efficient FlashCopy (SEFC)
    8.1.7 Using FlashCopy with your backup application
    8.1.8 Using FlashCopy for data migration
    8.1.9 Summary of FlashCopy rules
  8.2 Metro Mirror and Global Mirror
    8.2.1 Using both Metro Mirror and Global Mirror between two clusters
    8.2.2 Performing three-way copy service functions
    8.2.3 Using native controller Advanced Copy Services functions
    8.2.4 Configuration requirements for long distance links
    8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships
    8.2.6 Global Mirror guidelines
    8.2.7 Migrating a Metro Mirror relationship to Global Mirror
    8.2.8 Recovering from suspended Metro Mirror or Global Mirror relationships
    8.2.9 Diagnosing and fixing 1920 errors
    8.2.10 Using Metro Mirror or Global Mirror with FlashCopy
    8.2.11 Using TPC to monitor Global Mirror performance
    8.2.12 Summary of Metro Mirror and Global Mirror rules

Chapter 9. Hosts
  9.1 Configuration recommendations
    9.1.1 The number of paths
    9.1.2 Host ports
    9.1.3 Port masking
    9.1.4 Host to I/O Group mapping
    9.1.5 VDisk size as opposed to quantity
    9.1.6 Host VDisk mapping
    9.1.7 Server adapter layout
    9.1.8 Availability as opposed to error isolation
  9.2 Host pathing
    9.2.1 Preferred path algorithm
    9.2.2 Path selection
    9.2.3 Path management
    9.2.4 Dynamic reconfiguration
    9.2.5 VDisk migration between I/O Groups
  9.3 I/O queues
    9.3.1 Queue depths
  9.4 Multipathing software
  9.5 Host clustering and reserves
    9.5.1 AIX
    9.5.2 SDD compared to SDDPCM
    9.5.3 Virtual I/O server
    9.5.4 Windows
    9.5.5 Linux
    9.5.6 Solaris
    9.5.7 VMware
  9.6 Mirroring considerations
    9.6.1 Host-based mirroring
  9.7 Monitoring
    9.7.1 Automated path monitoring
    9.7.2 Load measurement and stress tools

Chapter 10. Applications
  10.1 Application workloads
    10.1.1 Transaction-based workloads
    10.1.2 Throughput-based workloads
    10.1.3 Storage subsystem considerations
    10.1.4 Host considerations
  10.2 Application considerations
    10.2.1 Transaction environments
    10.2.2 Throughput environments
  10.3 Data layout overview
    10.3.1 Layers of volume abstraction
    10.3.2 Storage administrator and AIX LVM administrator roles
    10.3.3 General data layout recommendations
    10.3.4 Database strip size considerations (throughput workload)
    10.3.5 LVM volume groups and logical volumes
  10.4 When the application does its own balancing of I/Os
    10.4.1 DB2 I/O characteristics and data structures
    10.4.2 DB2 data layout example
    10.4.3 SVC striped VDisk recommendation
  10.5 Data layout with the AIX virtual I/O (VIO) server
    10.5.1 Overview
    10.5.2 Data layout strategies
  10.6 VDisk size
  10.7 Failure boundaries

Chapter 11. Monitoring
  11.1 Configuring TPC to analyze the SVC
  11.2 Using TPC to verify the fabric topology
    11.2.1 SVC node port connectivity
    11.2.2 Ensuring that all SVC ports are online
    11.2.3 Verifying SVC port zones
    11.2.4 Verifying paths to storage
    11.2.5 Verifying host paths to the SVC
  11.3 Analyzing performance data using TPC
    11.3.1 Setting up TPC to collect performance information
    11.3.2 Viewing TPC-collected information
    11.3.3 Cluster, I/O Group, and node reports
    11.3.4 Managed Disk Group, Managed Disk, and Volume reports
    11.3.5 Using TPC to alert on performance constraints
    11.3.6 Monitoring MDisk performance for mirrored VDisks
  11.4 Monitoring the SVC error log with e-mail notifications
    11.4.1 Verifying a correct SVC e-mail configuration

Chapter 12. Maintenance
  12.1 Configuration and change tracking
    12.1.1 SAN
    12.1.2 SVC
    12.1.3 Storage
    12.1.4 General inventory
    12.1.5 Change tickets and tracking
    12.1.6 Configuration archiving
  12.2 Standard operating procedures
  12.3 Code upgrades
    12.3.1 Upgrade code levels
    12.3.2 Upgrade frequency
    12.3.3 Upgrade sequence
    12.3.4 Preparing for upgrades
    12.3.5 SVC upgrade
    12.3.6 Host code upgrades
    12.3.7 Storage controller upgrades
  12.4 SAN hardware changes
    12.4.1 Cross-referencing the SDD adapter number with the WWPN
    12.4.2 Changes that result in the modification of the destination FCID
    12.4.3 Switch replacement with a like switch
    12.4.4 Switch replacement or upgrade with a different kind of switch
    12.4.5 HBA replacement
  12.5 Naming convention
    12.5.1 Hosts, zones, and SVC ports
    12.5.2 Controllers
    12.5.3 MDisks
    12.5.4 VDisks
    12.5.5 MDGs

Chapter 13. Cabling, power, cooling, scripting, support, and classes
  13.1 Cabling
    13.1.1 General cabling advice
    13.1.2 Long distance optical links
    13.1.3 Labeling
    13.1.4 Cable management
    13.1.5 Cable routing and support
    13.1.6 Cable length
    13.1.7 Cable installation
  13.2 Power
    13.2.1 Bundled uninterruptible power supply units
    13.2.2 Power switch
    13.2.3 Power feeds
  13.3 Cooling
  13.4 SVC scripting
    13.4.1 Standard changes
  13.5 IBM Support Notifications Service
  13.6 SVC Support Web site
  13.7 SVC-related publications and classes
    13.7.1 IBM Redbooks publications
    13.7.2 Courses

Chapter 14. Troubleshooting and diagnostics
  14.1 Common problems
    14.1.1 Host problems
    14.1.2 SVC problems
    14.1.3 SAN problems
    14.1.4 Storage subsystem problems
  14.2 Collecting data and isolating the problem
    14.2.1 Host data collection
    14.2.2 SVC data collection
    14.2.3 SAN data collection
    14.2.4 Storage subsystem data collection
  14.3 Recovering from problems
    14.3.1 Solving host problems
    14.3.2 Solving SVC problems
    14.3.3 Solving SAN problems
    14.3.4 Solving back-end storage problems
  14.4 Livedump

Chapter 15. SVC 4.3 performance highlights
  15.1 SVC and continual performance enhancements
  15.2 SVC 4.3 code improvements
  15.3 Performance increase when upgrading to 8G4 nodes
    15.3.1 Performance scaling of I/O Groups

Related publications
  IBM Redbooks publications
  Other resources
  Referenced Web sites
  How to get IBM Redbooks publications
  Help from IBM

Index


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.


Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

1350™, AIX®, alphaWorks®, Chipkill™, DB2®, DS4000™, DS6000™, DS8000™, Enterprise Storage Server®, FlashCopy®, GPFS™, HACMP™, IBM®, Redbooks®, Redbooks (logo)®, ServeRAID™, System p®, System Storage™, System x™, System z®, Tivoli Enterprise Console®, Tivoli®, TotalStorage®

The following terms are trademarks of other companies:

Disk Magic, and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries, or both.

NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States.

VMware, the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Active Directory, Internet Explorer, Microsoft, Visio, Windows NT, Windows Server, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface

This IBM® Redbooks® publication captures several of the best practices based on field experience and describes the performance gains that can be achieved by implementing the IBM System Storage™ SAN Volume Controller.

This book is intended for extremely experienced storage, SAN, and SVC administrators and technicians.

Readers are expected to have an advanced knowledge of the SAN Volume Controller (SVC) and SAN environment, and we recommend these books as background reading:

• IBM System Storage SAN Volume Controller, SG24-6423
• Introduction to Storage Area Networks, SG24-5470
• Using the SVC for Business Continuity, SG24-7371

The team that wrote this book

This book was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Jon Tate is a Project Manager for IBM System Storage SAN Solutions at the International Technical Support Organization, San Jose Center. Before joining the ITSO in 1999, he worked in the IBM Technical Support Center, providing Level 2 support for IBM storage products. Jon has 23 years of experience in storage software and management, services, and support, and is both an IBM Certified IT Specialist and an IBM SAN Certified Specialist. Jon also serves as the UK Chair of the Storage Networking Industry Association.

Katja Gebuhr is a Support Center Representative for IBM Germany in Mainz. She joined IBM in 2003 for an apprenticeship as an IT-System Business Professional and started working for the DASD Front End SAN Support in 2006. Katja provides Level 1 Hardware and Software support for SAN Volume Controller and SAN products for IMT Germany and CEMAAS.

Alex Howell is a Software Engineer in the SAN Volume Controller development team, based at IBM Hursley, UK. He has worked on SVC since the release of Version 1.1.0 in 2003, when he joined IBM as a graduate. His roles have included test engineer, developer, and development team lead. He is a development lab advocate for several SVC clients, and he has led a beta program piloting new function.

Nik Kjeldsen is an IT Specialist at IBM Global Technology Services, Copenhagen, Denmark. With a background in data networks, he is currently a Technical Solution Architect working with the design and implementation of Enterprise Storage infrastructure. Nikolaj has seven years of experience in the IT field and holds a Master’s degree in Telecommunication Engineering from the Technical University of Denmark.


Figure 0-1 Authors (L-R): Katja, Alex, Nik, and Jon

We extend our thanks to the following people for their contributions to this project.

There are many people that contributed to this book. In particular, we thank the development and PFE teams in Hursley, England. Matt Smith was also instrumental in moving any issues along and ensuring that they maintained a high profile. Barry Whyte was instrumental in steering us in the correct direction and for providing support throughout the life of the residency.

The authors of the first edition of this book were:

Deon George
Thorsten Hoss
Ronda Hruby
Ian MacQuarrie
Barry Mellish
Peter Mescher

We also want to thank the following people for their contributions:

Trevor Boardman
Carlos Fuente
Gary Jarman
Colin Jewell
Andrew Martin
Paul Merrison
Steve Randle
Bill Scales
Matt Smith
Barry Whyte
IBM Hursley

Tom Jahn
IBM Germany

Peter Mescher
IBM Raleigh


Paulo Neto
IBM Portugal

Bill Wiegand
IBM Advanced Technical Support

Mark Balstead
IBM Tucson

Dan Braden
IBM Dallas

Lloyd Dean
IBM Philadelphia

Dorothy Faurot
IBM Raleigh

Marci Nagel
IBM Rochester

Bruce McNutt
IBM Tucson

Glen Routley
IBM Australia

Dan C Rumney
IBM New York

Chris Saul
IBM San Jose

Brian Smith
IBM San Jose

Sharon Wang
IBM Chicago

Deanna Polm
Sangam Racherla
IBM ITSO

Become a published author

Join us for a two- to six-week residency program. Help write a book dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You will have the opportunity to team with IBM technical professionals, IBM Business Partners, and Clients.

Your efforts will help increase product acceptance and client satisfaction. As a bonus, you will develop a network of contacts in IBM development labs, and increase your productivity and marketability.


Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

• Use the online Contact us review IBM Redbooks publications form found at:

ibm.com/redbooks

• Send your comments in an e-mail to:

[email protected]

• Mail your comments to:

IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Summary of changes

This section describes the technical changes made in this edition of the book and in previous editions. This edition might also include minor corrections and editorial changes that are not identified.

Summary of Changes
for SG24-7521-01
for SAN Volume Controller Best Practices and Performance Guidelines
as created or updated on December 7, 2008.

December 2008, Second Edition

This revision reflects the addition, deletion, or modification of new and changed information described below.

New information
New material:

• Space-Efficient VDisks
• SVC Console
• VDisk Mirroring


Chapter 1. SAN fabric

The IBM Storage Area Network (SAN) Volume Controller (SVC) has unique SAN fabric configuration requirements that differ from what you might be used to in your storage infrastructure. A quality SAN configuration can help you achieve a stable, reliable, and scalable SVC installation; conversely, a poor SAN environment can make your SVC experience considerably less pleasant. This chapter provides you with information to tackle this topic.

As you read this chapter, remember that this is a “best practices” book based on field experiences. Although there will be many possible (and supported) SAN configurations that do not meet the recommendations found in this chapter, we think they are not ideal configurations.


Note: As with any of the information in this book, you must check the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156, and IBM System Storage SAN Volume Controller Restrictions, S1003283, for limitations, caveats, updates, and so on that are specific to your environment. Do not rely on this book as the last word in SVC SAN design. Also, anyone planning for an SVC installation must be knowledgeable about general SAN design principles.

You must refer to the IBM System Storage SAN Volume Controller Support Web page for updated documentation before implementing your solution. The Web site is:

http://www.ibm.com/storage/support/2145

Note: All document citations in this book refer to the 4.3 versions of the SVC product documents. If you use a different version, refer to the correct edition of the documents.


1.1 SVC SAN topology

The topology requirements for the SVC do not differ much from those of any other storage device. What makes the SVC unique here is that it can be configured with a large number of hosts, which can cause interesting issues with SAN scalability. Also, because the SVC often serves so many hosts, an issue caused by poor SAN design can quickly cascade into a catastrophe.

1.1.1 Redundancy

One of the fundamental SVC SAN requirements is to create two (or more) entirely separate SANs that are not connected to each other over Fibre Channel in any way. The easiest way is to construct two SANs that are mirror images of each other.

Technically, the SVC supports using just a single SAN (appropriately zoned) to connect the entire SVC. However, we do not recommend this design in any production environment. In our experience, we also do not recommend this design in “development” environments either, because a stable development platform is important to programmers, and an extended outage in the development environment can cause an expensive business impact. For a dedicated storage test platform, however, it might be acceptable.

Redundancy through Cisco VSANs or Brocade Traffic Isolation Zones

Simply put, using any logical separation in a single SAN fabric to provide SAN redundancy is unacceptable for a production environment. While VSANs and Traffic Isolation Zones can provide a measure of port isolation, they are no substitute for true hardware redundancy. All SAN switches have been known to suffer from hardware or fatal software failures.

1.1.2 Topology basics

No matter the size of your SVC installation, there are a few best practices that you need to apply to your topology design:

- All SVC node ports in a cluster must be connected to the same SAN switches as all of the storage devices with which the SVC cluster is expected to communicate. Conversely, storage traffic and inter-node traffic must never transit an ISL, except during migration scenarios.

- High-bandwidth-utilization servers (such as tape backup servers) must also be on the same SAN switches as the SVC node ports. Putting them on a separate switch can cause unexpected SAN congestion problems, and putting a high-bandwidth server on an edge switch is a waste of ISL capacity.

- If at all possible, plan for the maximum size configuration that you ever expect your SVC installation to reach. As you will see in later parts of this chapter, the design of the SAN can change radically for larger numbers of hosts. Modifying the SAN later to accommodate a larger-than-expected number of hosts either produces a poorly designed SAN or is difficult, expensive, and disruptive to your business. This does not mean that you need to purchase all of the SAN hardware initially, just that you need to lay out the SAN with the maximum size in mind.

- Always deploy at least one "extra" ISL per switch. Not doing so exposes you to consequences ranging from complete path loss (bad) to fabric congestion (even worse).

- The SVC does not permit more than three hops between the SVC cluster and the hosts, which is typically not a problem.

Note: Due to the nature of Fibre Channel, it is extremely important to avoid inter-switch link (ISL) congestion. While Fibre Channel (and the SVC) can, under most circumstances, handle a host or storage array that has become overloaded, the mechanisms in Fibre Channel for dealing with congestion in the fabric itself are not effective. The problems caused by fabric congestion can range anywhere from dramatically slow response time all the way to storage access loss. These issues are common with all high-bandwidth SAN devices and are inherent to Fibre Channel; they are not unique to the SVC.

When an Ethernet network becomes congested, the Ethernet switches simply discard frames for which there is no room. When a Fibre Channel network becomes congested, the Fibre Channel switches instead stop accepting additional frames until the congestion clears, in addition to occasionally dropping frames. This congestion quickly moves "upstream" in the fabric and clogs the end devices (such as the SVC) from communicating anywhere. This behavior is referred to as head-of-line blocking, and while modern SAN switches internally have a non-blocking architecture, head-of-line blocking still exists as a SAN fabric problem. Head-of-line blocking can result in your SVC nodes being unable to communicate with your storage subsystems or mirror their write caches, just because you have a single congested link leading to an edge switch.

1.1.3 ISL oversubscription

The IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156, specifies a suggested maximum host port to ISL ratio of 7:1. With modern 4 or 8 Gbps SAN switches, this ratio implies an average bandwidth (in one direction) per host port of approximately 57 MBps (at 4 Gbps). If you do not expect most of your hosts to reach anywhere near that value, it is possible to request an exception to the ISL oversubscription rule, known as a Request for Price Quotation (RPQ), from your IBM marketing representative. Before requesting an exception, however, consider the following factors (a rough sizing sketch follows at the end of this section):

- You must take peak loads into consideration, not average loads. For instance, while a database server might only use 20 MBps during regular production workloads, it might perform a backup at far higher data rates.

- Congestion to one switch in a large fabric can cause performance issues throughout the entire fabric, including traffic between SVC nodes and storage subsystems, even if they are not directly attached to the congested switch. The reasons for these issues are inherent to Fibre Channel flow control mechanisms, which are simply not designed to handle fabric congestion. Therefore, any estimates for required bandwidth prior to implementation must have a safety factor built into the estimate.

- On top of the safety factor for traffic expansion, implement a spare ISL or ISL trunk, as stated in 1.1.2, "Topology basics" on page 2. You still need to be able to avoid congestion if an ISL fails due to issues, such as a SAN switch line card or port blade failure.

- Exceeding the "standard" 7:1 oversubscription ratio requires you to implement fabric bandwidth threshold alerts. Anytime that one of your ISLs exceeds 70%, you need to schedule fabric changes to distribute the load further.

- You also need to consider the bandwidth consequences of a complete fabric outage. While a complete fabric outage is a fairly rare event, insufficient bandwidth can turn a single-SAN outage into a total access loss event.

- Take the bandwidth of the links into account. It is common to have ISLs run faster than host ports, which obviously reduces the number of required ISLs.


The RPQ process involves a review of your proposed SAN design to ensure that it is reasonable for your proposed environment.
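To illustrate the arithmetic behind the 7:1 guideline, the following Python sketch (our own illustration, not an IBM-provided tool; the port counts, link speeds, and safety factor are example assumptions, not mandated values) estimates how many ISLs or ISL trunk members an edge switch needs, including a peak-load safety factor and the spare ISL recommended in 1.1.2, "Topology basics" on page 2:

import math

# Rough ISL sizing sketch based on the suggested 7:1 host port to ISL ratio.
# All inputs are hypothetical; substitute the values for your own environment.

def isls_needed(host_ports, host_port_gbps=4, isl_gbps=4,
                max_ratio=7.0, peak_safety_factor=1.5, spare_isls=1):
    # A faster ISL can carry the traffic of proportionally more host ports,
    # so scale the allowed ratio by the ISL to host port speed difference.
    effective_ratio = max_ratio * (isl_gbps / float(host_port_gbps))
    base = host_ports * peak_safety_factor / effective_ratio
    return int(math.ceil(base)) + spare_isls

# Example: an edge switch with 56 host ports at 4 Gbps and 8 Gbps ISLs.
print(isls_needed(56, host_port_gbps=4, isl_gbps=8))   # -> 7 ISLs

At the standard 7:1 ratio, each 4 Gbps host port is assumed to average roughly 400 MBps divided by 7, or approximately 57 MBps, in one direction, which is the figure quoted above.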

1.1.4 Single switch SVC SANs

The most basic SVC topology consists of nothing more than a single switch per SAN, which can be anything from a 16-port 1U switch for a small installation of just a few hosts and storage devices all the way up to a director with hundreds of ports. This design obviously has the advantage of simplicity, and it is a sufficient architecture for small to medium SVC installations.

It is preferable to use a multi-slot director-class single switch over setting up a core-edge fabric made up solely of lower end switches.

As stated in 1.1.2, “Topology basics” on page 2, keep the maximum planned size of the installation in mind if you decide to use this architecture. If you run too low on ports, expansion can be difficult.

1.1.5 Basic core-edge topology

The core-edge topology is easily recognized by most SAN architects, as illustrated in Figure 1-1 on page 5. It consists of a switch in the center (usually, a director-class switch), which is surrounded by other switches. The core switch contains all SVC ports, storage ports, and high-bandwidth hosts. It is connected via ISLs to the edge switches.

The edge switches can be of any size. If they are multi-slot directors, they are usually fitted with at least a few oversubscribed line cards/port blades, because the vast majority of hosts do not ever require line-speed bandwidth, or anything close to it. Note that ISLs must not be on oversubscribed ports.


Figure 1-1 Core-edge topology

1.1.6 Four-SAN core-edge topology

For installations where even a core-edge fabric made up of multi-slot director-class SAN switches is insufficient, the SVC cluster can be attached to four SAN fabrics instead of the normal two SAN fabrics. This design is especially useful for large, multi-cluster installations. As with a regular core-edge, the edge switches can be of any size, and multiple ISL links should be installed per switch.

As you can see in Figure 1-2 on page 6, we have attached the SVC cluster to each of four independent fabrics. The storage subsystem used also connects to all four SAN fabrics, even though this design is not required.



Figure 1-2 Four-SAN core-edge topology

While certain clients have chosen to simplify management by connecting the SANs together into pairs with a single ISL link, we do not recommend this design. With only a single ISL connecting fabrics together, a small zoning mistake can quickly lead to severe SAN congestion.

Using the SVC as a SAN bridge: With the ability to connect an SVC cluster to four SAN fabrics, it is possible to use the SVC as a bridge between two SAN environments (with two fabrics in each environment). This configuration can be useful for sharing resources between the SAN environments without merging them. Another use is if you have devices with different SAN requirements present in your installation.

When using the SVC as a SAN bridge, pay special attention to any restrictions and requirements that might apply to your installation.


1.1.7 Common topology issues

In this section, we describe common topology problems that we have encountered.

Accidentally accessing storage over ISLs

One common topology mistake that we have encountered in the field is to have SVC paths from the same node to the same storage subsystem on multiple core switches that are linked together (refer to Figure 1-3). This problem is commonly encountered in environments where the SVC is not the only device accessing the storage subsystems.

Figure 1-3 Spread out disk paths

If you have this type of topology, it is extremely important to zone the SVC so that it will only see paths to the storage subsystems on the same SAN switch as the SVC nodes. Implementing a storage subsystem host port mask might also be feasible here.

Because of the way that the SVC load balances traffic between the SVC nodes and MDisks, the amount of traffic that transits your ISLs will be unpredictable and vary significantly. If you have the capability, you might want to use either Cisco Virtual SANs (VSANs) or Brocade Traffic Isolation to help enforce the separation.

Note: This type of topology means you must have more restrictive zoning than what is detailed in 1.3.6, "Sample standard SVC zoning configuration" on page 16.

Accessing storage subsystems over an ISL on purpose

This practice is explicitly advised against in the SVC configuration guidelines, because the consequences of SAN congestion to your storage subsystem connections can be quite severe. Only use this configuration in SAN migration scenarios, and when doing so, closely monitor the performance of the SAN.

SVC I/O Group switch splitting

Clients often want to attach another I/O Group to an existing SVC cluster to increase the capacity of the SVC cluster, but they lack the switch ports to do so. If this situation happens to you, there are two options:

- Completely overhaul the SAN during a complicated and painful redesign.

- Add a new core switch, and inter-switch link the new I/O Group and the new switch back to the original, as illustrated in Figure 1-4.

Figure 1-4 Proper I/O Group splitting


This design is a valid configuration, but you must take certain precautions:

- As stated in "Accidentally accessing storage over ISLs" on page 7, zone the SAN and apply Logical Unit Number (LUN) masking on the storage subsystems so that you do not access the storage subsystems over the ISLs. This design means that your storage subsystems will need connections to both the old and new SAN switches.

- Have two dedicated ISLs between the two switches on each SAN with no data traffic traveling over them. The reason for this design is that if this link ever becomes congested or lost, you might experience problems with your SVC cluster if there are also issues at the same time on the other SAN. If you can, set a 5% traffic threshold alert on the ISLs so that you know if a zoning mistake has allowed any data traffic over the links.

Note: It is not a best practice to use this configuration to perform mirroring between I/O Groups within the same cluster. Also, you must never split the two nodes in an I/O Group between different SAN switches within the same SAN fabric.

1.2 SAN switches

In this section, we discuss several considerations when you select the Fibre Channel (FC) SAN switches for use with your SVC installation. It is important to understand the features offered by the various vendors and associated models in order to meet design and performance goals.

1.2.1 Selecting SAN switch models

In general, there are two “classes” of SAN switches: fabric switches and directors. While normally based on the same software code and Application Specific Integrated Circuit (ASIC) hardware platforms, there are differences in performance and availability. Directors feature a slotted design and have component redundancy on all active components in the switch chassis (for instance, dual-redundant switch controllers). A SAN fabric switch (or just a SAN switch) normally has a fixed port layout in a non-slotted chassis (there are exceptions to this rule though, such as the IBM/Cisco MDS9200 series, which features a slotted design). Regarding component redundancy, both fabric switches and directors are normally equipped with redundant, hot-swappable environmental components (power supply units and fans).

In the past, over-subscription on the SAN switch ports had to be taken into account when selecting a SAN switch model. Over-subscription here refers to a situation in which the combined maximum port bandwidth of all switch ports is higher than what the switch internally can switch. For directors, this number can vary for different line card/port blade options, where a high port-count module might have a higher over-subscription rate than a low port-count module, because the capacity toward the switch backplane is fixed. With the latest generation SAN switches (both fabric switches and directors), this issue has become less important due to increased capacity in the internal switching. This situation is true for both switches with an internal crossbar architecture and switches realized by an internal core/edge ASIC lineup.

For modern SAN switches (both fabric switches and directors), processing latency from ingress to egress port is extremely low and is normally negligible.

When selecting the switch model, try to take the future SAN size into consideration. It is generally better to initially get a director with only a few port modules instead of having to implement multiple smaller switches. Having a high port-density director instead of a number of smaller switches also saves ISL capacity and therefore ports used for inter-switch connectivity.

IBM sells and supports SAN switches from both of the major SAN vendors, listed in the following product portfolios:

- IBM System Storage b-type/Brocade SAN portfolio
- IBM System Storage/Cisco SAN portfolio

1.2.2 Switch port layout for large edge SAN switches

While users of smaller, non-bladed, SAN fabric switches generally do not need to concern themselves with which ports go where, users of multi-slot directors must pay careful attention to where the ISLs are located in the switch. Generally, the ISLs (or ISL trunks) must be on separate port modules within the switch to ensure redundancy. The hosts must be spread out evenly among the remaining line cards in the switch. Remember to locate high-bandwidth hosts on the core switches directly.

1.2.3 Switch port layout for director-class SAN switches

Each SAN switch vendor has a selection of line cards/port blades available for their multi-slot director-class SAN switch models. Some of these options are over-subscribed, and some of them have full bandwidth available for the attached devices. For your core switches, we suggest only using line cards/port blades where the full line speed that you expect to use will be available. You need to contact your switch vendor for full line card/port blade option details.

Your SVC ports, storage ports, ISLs, and high-bandwidth hosts need to be spread out evenly among your line cards in order to help prevent the failure of any one line card from causing undue impact to performance or availability.

1.2.4 IBM System Storage/Brocade b-type SANs

These are several of the features that we have found useful.

Fabric Watch

The Fabric Watch feature found in newer IBM/Brocade-based SAN switches can be useful because the SVC relies on a properly functioning SAN. This is a licensed feature, but it comes pre-bundled with most IBM/Brocade SAN switches. With Fabric Watch, you can pre-configure thresholds on certain switch properties, which, when triggered, produce an alert. These attributes include:

- Switch port events, such as link resets
- Switch port errors (link quality)
- Component failures

Another useful feature included with Fabric Watch is Port Fencing, which can exclude a switch port if the port is misbehaving.

Fibre Channel Routing/MetaSANs

To enhance SAN scalability beyond a single Fibre Channel (FC) fabric, Fibre Channel Routing (FCR) for IBM/Brocade SANs can be useful. This hierarchical network approach allows separate FC fabrics to be connected without merging them. This approach can also be useful for limiting the fault domains in the SAN environment. With the latest generation of IBM/Brocade SAN switches, FCR is an optionally licensed feature. With older generations, special hardware is needed.

For more information about the IBM System Storage b-type/Brocade products, refer to the following IBM Redbooks publications:

- Implementing an IBM/Brocade SAN, SG24-6116

- IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation, SG24-7544

1.2.5 IBM System Storage/Cisco SANs

We have found the following features to be useful.

Port Channels

To ease the required planning efforts for future SAN expansions, ISLs/Port Channels can be made up of any combination of ports in the switch, which means that it is not necessary to reserve special ports for future expansions when provisioning ISLs. Instead, you can use any free port in the switch for expanding the capacity of an ISL/Port Channel.

Cisco VSANs

VSANs and inter-VSAN routing (IVR) enable port/traffic isolation in the fabric. This port/traffic isolation can be useful, for instance, for fault isolation and scalability.

It is possible to use Cisco VSANs, combined with inter-VSAN routes, to isolate the hosts from the storage arrays. This arrangement provides little benefit for a great deal of added configuration complexity. However, VSANs with inter-VSAN routes can be useful for fabric migrations from non-Cisco vendors onto Cisco fabrics, or other short-term situations. VSANs can also be useful if you have hosts that access the storage directly, along with virtualizing part of the storage with the SVC. (In this instance, it is best to use separate storage ports for the SVC and the hosts. We do not advise using inter-VSAN routes to enable port sharing.)

1.2.6 SAN routing and duplicate WWNNs

The SVC has a built-in service feature that attempts to detect if two SVC nodes are on the same FC fabric with the same worldwide node name (WWNN). When this situation is detected, the SVC will restart and turn off its FC ports to prevent data corruption. This feature can be triggered erroneously if an SVC port from fabric A is zoned through a SAN router so that an SVC port from the same node in fabric B can log into the fabric A port.

To prevent this situation from happening, whenever you implement advanced SAN FCR functions, be careful to ensure that the routing configuration is correct.

1.3 Zoning

Because the SVC differs from traditional storage devices, properly zoning it into your SAN fabric is a common source of misunderstanding and errors. Despite this, zoning the SVC into your SAN fabric is not particularly complicated.

Note: Errors caused by improper SVC zoning are often fairly difficult to isolate, so create your zoning configuration carefully.


Here are the basic SVC zoning steps:

1. Create SVC intra-cluster zone.
2. Create SVC cluster.
3. Create SVC → Back-end storage subsystem zones.
4. Assign back-end storage to the SVC.
5. Create host → SVC zones.
6. Create host definitions on the SVC.

The zoning scheme that we describe next is slightly more restrictive than the zoning described in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156. The Configuration Guide is a statement of what is supported, but this publication is a statement of our understanding of the best way to set up zoning, even if other ways are possible and supported.

1.3.1 Types of zoning

Modern SAN switches have three types of zoning available: port zoning, worldwide node name (WWNN) zoning, and worldwide port name (WWPN) zoning. The preferred method is to use only WWPN zoning.

There is a common misconception that WWPN zoning provides poorer security than port zoning, which is not the case. Modern SAN switches enforce the zoning configuration directly in the switch hardware, and port binding functions can be used to enforce that a given WWPN must be connected to a particular SAN switch port.

There are multiple reasons not to use WWNN zoning. For hosts, it is a particularly bad idea, because the WWNN is often based on the WWPN of only one of the HBAs. If you have to replace that HBA, the WWNN of the host will change on both fabrics, which will result in access loss. It also makes troubleshooting more difficult, because you have no consolidated list of which ports are supposed to be in which zone, and therefore, it is difficult to tell whether a port is missing.

Special note for IBM/Brocade SAN Webtools users

If you use the Brocade Webtools Graphical User Interface (GUI) to configure zoning, you must take special care not to use WWNNs. When looking at the "tree" of available worldwide names (WWNs), the WWNN is always presented one level higher than the WWPNs. Refer to Figure 1-5 on page 13 for an example. Make sure that you use a WWPN, not the WWNN.

Note: Avoid using a zoning configuration with port and worldwide name zoning intermixed.


Figure 1-5 IBM/Brocade Webtools zoning

1.3.2 Pre-zoning tips and shortcuts

Now, we describe several tips and shortcuts for the SVC zoning.

Naming convention and zoning scheme

It is important to have a defined naming convention and zoning scheme when creating and maintaining an SVC zoning configuration. Failing to have a defined naming convention and zoning scheme can make your zoning configuration extremely difficult to understand and maintain.

Remember that different environments have different requirements, which means that the level of detail in the zoning scheme will vary among environments of different sizes. It is important to have an easily understandable scheme with an appropriate level of detail and then to be consistent whenever making changes to the environment.

Refer to 12.5, “Naming convention” on page 259 for suggestions for an SVC naming convention.


Aliases

We strongly recommend that you use zoning aliases when creating your SVC zones if they are available on your particular type of SAN switch. Zoning aliases make your zoning easier to configure and understand and reduce the possibility of errors.

One approach is to include multiple members in one alias, because zoning aliases can normally contain multiple members (just like zones). We recommend that you create aliases for:

- One alias that holds all of the SVC node ports on each fabric

- One alias for each storage subsystem (or controller blade, in the case of DS4x00 units)

- One alias for each I/O Group port pair (that is, it needs to contain port 2 of the first node in the I/O Group and port 2 of the second node in the I/O Group)

Host aliases can be omitted in smaller environments, as in our lab environment.

1.3.3 SVC intra-cluster zone

This zone needs to contain every SVC node port on the SAN fabric. While it will overlap with the storage zones that you will create soon, it is handy to have this zone as a “fail-safe,” in case you ever make a mistake with your storage zones.

1.3.4 SVC storage zones

You need to avoid zoning different vendor storage subsystems together; the ports from the storage subsystem need to be split evenly across the dual fabrics. Each controller might have its own recommended best practice.

DS4x00 and FAStT storage controllers

Each DS4x00 and FAStT storage subsystem controller consists of two separate blades. It is a best practice that these two blades are not in the same zone if you have attached them to the same SAN. There might be a similar best practice suggestion from non-IBM storage vendors; contact them for details.

1.3.5 SVC host zones

There must be a single zone for each host port. This zone must contain the host port, and one port from each SVC node that the host will need to access. While there are two ports from each node per SAN fabric in a usual dual-fabric configuration, make sure that the host only accesses one of them. Refer to Figure 1-6 on page 15.

This configuration provides four paths to each VDisk, which is the number of paths per VDisk for which IBM Subsystem Device Driver (SDD) multipathing software and the SVC have been tuned.


Figure 1-6 Typical host → SVC zoning

The IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156, discusses putting many hosts into a single zone as a supported configuration under certain circumstances. While this design usually works just fine, instability in one of your hosts can trigger all sorts of impossible-to-diagnose problems in the other hosts in the zone. For this reason, you need to have only a single host in each zone (single initiator zones).

It is a supported configuration to have eight paths to each VDisk, but this design provides no performance benefit (indeed, under certain circumstances, it can even reduce performance), and it does not improve reliability or availability by any significant degree.

Hosts with four (or more) HBAs

If you have four host bus adapters (HBAs) in your host instead of two HBAs, it takes a little more planning. Because eight paths are not an optimum number, you must instead configure your SVC Host Definitions (and zoning) as though the single host is two separate hosts. During VDisk assignment, you alternate which of the "pseudo-hosts" each VDisk is assigned to.

The reason that we do not just assign one HBA to each of the paths is that, for any specific VDisk, one node solely serves as a backup node (a preferred node scheme is used). The load is never going to be balanced for that particular VDisk. It is better to load balance by I/O Group instead, and let the VDisks be automatically assigned to nodes.
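As an illustration of the pseudo-host approach (the host name, slot numbers, and WWPNs below are invented for this sketch and simply follow the zone naming style used in 1.3.6, "Sample standard SVC zoning configuration" on page 16), a four-HBA Windows host "Fred" can be zoned on SAN "A" as two pseudo-hosts, each seeing a different port pair of I/O Group 0:

WinFredA_Slot3:
21:00:00:e0:8b:0f:aa:01
SVC_Group0_Port1

WinFredB_Slot5:
21:00:00:e0:8b:0f:aa:02
SVC_Group0_Port3

The two remaining HBAs are zoned in the same way on SAN "B", and you then alternate VDisk mappings between the "WinFredA" and "WinFredB" host definitions on the SVC.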


1.3.6 Sample standard SVC zoning configuration

This section contains a sample “standard” zoning configuration for an SVC cluster. Our sample setup has two I/O Groups, two storage subsystems, and eight hosts. (Refer to Figure 1-7.)

Obviously, the zoning configuration must be duplicated on both SAN fabrics; we will show the zoning for the SAN named “A.”

Figure 1-7 Example SVC SAN

For the sake of brevity, we only discuss SAN “A” in our example.

Aliases

Unfortunately, you cannot nest aliases, so several of these WWPNs appear in multiple aliases. Also, do not be concerned if none of your WWPNs looks like the example; we made a few of them up when writing this book.

Note that certain switch vendors (for example, McDATA) do not allow multiple-member aliases, but you can still create single-member aliases. While creating single-member aliases does not reduce the size of your zoning configuration, it still makes it easier to read than a mass of raw WWPNs.

For the alias names, we have appended “SAN_A” on the end where necessary to distinguish that these alias names are the ports on SAN “A”. This system helps if you ever have to perform troubleshooting on both SAN fabrics at one time.


SVC cluster alias

As a side note, the SVC has an extremely predictable WWPN structure, which helps make the zoning easier to "read." It always starts with 50:05:07:68 (refer to Example 1-1) and ends with two octets that distinguish for you which node is which. The first digit of the third octet from the end is the port number on the node.

The cluster alias that we create will be used for the intra-cluster zone, for all back-end storage zones, and also in any zones that you need for remote mirroring with another SVC cluster (which will not be discussed in this example).

Example 1-1 SVC cluster alias

SVC_Cluster_SAN_A:
50:05:07:68:01:10:37:e5
50:05:07:68:01:30:37:e5
50:05:07:68:01:10:37:dc
50:05:07:68:01:30:37:dc
50:05:07:68:01:10:1d:1c
50:05:07:68:01:30:1d:1c
50:05:07:68:01:10:27:e2
50:05:07:68:01:30:27:e2
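As a small illustration of this structure (a sketch of our own, not an IBM-provided tool), the following Python fragment pulls the port number and node identifier out of an SVC WWPN exactly as described above:

# Decode the structure of an SVC node port WWPN (illustrative sketch only).

def decode_svc_wwpn(wwpn):
    octets = wwpn.lower().split(":")
    if octets[:4] != ["50", "05", "07", "68"]:
        raise ValueError("not an SVC WWPN: %s" % wwpn)
    port = int(octets[-3][0])         # first digit of the third octet from the end
    node_id = ":".join(octets[-2:])   # the last two octets distinguish the node
    return port, node_id

for wwpn in ("50:05:07:68:01:10:37:e5", "50:05:07:68:01:30:37:e5"):
    port, node = decode_svc_wwpn(wwpn)
    print("%s -> node %s, port %d" % (wwpn, node, port))

Running this against the cluster alias above prints port 1 and port 3 for node 37:e5, which matches the port pair aliases that follow.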

SVC I/O Group "port pair" aliases

These are the basic "building blocks" of our host zones. Because the best practices that we have described specify that each HBA is only supposed to see a single port on each node, these are the aliases that will be included in the host zones. To have an equal load on each SVC node port, you need to roughly alternate between the ports when creating your host zones. Refer to Example 1-2.

Example 1-2 I/O Group port pair aliases

SVC_Group0_Port1:
50:05:07:68:01:10:37:e5
50:05:07:68:01:10:37:dc

SVC_Group0_Port3:
50:05:07:68:01:30:37:e5
50:05:07:68:01:30:37:dc

SVC_Group1_Port1:
50:05:07:68:01:10:1d:1c
50:05:07:68:01:10:27:e2

SVC_Group1_Port3:
50:05:07:68:01:30:1d:1c
50:05:07:68:01:30:27:e2

Storage subsystem aliases

The first two aliases here are similar to what you might see with an IBM System Storage DS4800 storage subsystem with four back-end ports per controller blade. We have created different aliases for each blade in order to isolate the two controllers from each other, which is a best practice suggested by DS4x00 development.


Because the IBM System Storage DS8000™ has no concept of separate controllers (at least, not from the viewpoint of a SAN), we put all the ports on the storage subsystem into a single alias. Refer to Example 1-3.

Example 1-3 Storage aliases

DS4k_23K45_Blade_A_SAN_A:
20:04:00:a0:b8:17:44:32
20:04:00:a0:b8:17:44:33

DS4k_23K45_Blade_B_SAN_A:
20:05:00:a0:b8:17:44:32
20:05:00:a0:b8:17:44:33

DS8k_34912_SAN_A:
50:05:00:63:02:ac:01:47
50:05:00:63:02:bd:01:37
50:05:00:63:02:7f:01:8d
50:05:00:63:02:2a:01:fc

Zones

Remember when naming your zones that they cannot have the same names as aliases.

Here is our sample zone set, utilizing the aliases that we have just defined.

SVC intra-cluster zone

This zone is simple; it only contains a single alias (which happens to contain all of the SVC node ports). And yes, this zone does overlap with every single storage zone. Nevertheless, it is good to have it as a fail-safe, given the dire consequences that will occur if your cluster nodes ever completely lose contact with one another over the SAN. Refer to Example 1-4.

Example 1-4 SVC cluster zone

SVC_Cluster_Zone_SAN_A:
SVC_Cluster_SAN_A

SVC → Storage zones

As we have mentioned earlier, we put each of the storage controllers (and, in the case of the DS4x00 controllers, each blade) into a separate zone. Refer to Example 1-5.

Example 1-5 SVC → Storage zones

SVC_DS4k_23K45_Zone_Blade_A_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_A_SAN_A

SVC_DS4k_23K45_Zone_Blade_B_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_B_SAN_A

SVC_DS8k_34912_Zone_SAN_A:
SVC_Cluster_SAN_A
DS8k_34912_SAN_A


SVC → Host zones

We have not created aliases for each host, because each host is only going to appear in a single zone. While there will be a "raw" WWPN in the zones, an alias is unnecessary, because it will be obvious where the WWPN belongs.

Notice that all of the zones refer to the slot number of the host, rather than “SAN_A.” If you are trying to diagnose a problem (or replace an HBA), it is extremely important to know on which HBA you need to work.

For System p® hosts, we have also appended the HBA number (FCS) into the zone name, which makes device management easier. While it is possible to get this information out of SDD, it is nice to have it in the zoning configuration.

We alternate the hosts between the SVC node port pairs and between the SVC I/O Groups for load balancing. While we are simply alternating in our example, you might want to balance the load based on the observed load on ports and I/O Groups. Refer to Example 1-6.

Example 1-6 SVC → Host zones

WinPeter_Slot3:
21:00:00:e0:8b:05:41:bc
SVC_Group0_Port1

WinBarry_Slot7:
21:00:00:e0:8b:05:37:ab
SVC_Group0_Port3

WinJon_Slot1:
21:00:00:e0:8b:05:28:f9
SVC_Group1_Port1

WinIan_Slot2:
21:00:00:e0:8b:05:1a:6f
SVC_Group1_Port3

AIXRonda_Slot6_fcs1:
10:00:00:00:c9:32:a8:00
SVC_Group0_Port1

AIXThorsten_Slot2_fcs0:
10:00:00:00:c9:32:bf:c7
SVC_Group0_Port3

AIXDeon_Slot9_fcs3:
10:00:00:00:c9:32:c9:6f
SVC_Group1_Port1

AIXFoo_Slot1_fcs2:
10:00:00:00:c9:32:a8:67
SVC_Group1_Port3

1.3.7 Zoning with multiple SVC clusters

Unless two clusters participate in a mirroring relationship, all zoning must be configured so that the two clusters do not share a zone. If a single host requires access to two different clusters, create two zones with each zone to a separate cluster. The back-end storage zones must also be separate, even if the two clusters share a storage subsystem.

1.3.8 Split storage subsystem configurations

There might be situations where a storage subsystem is used both for SVC attachment and direct-attach hosts. In this case, it is important that you pay close attention during the LUN masking process on the storage subsystem. Assigning the same storage subsystem LUN to both a host and the SVC will almost certainly result in swift data corruption. If you perform a migration into or out of the SVC, make sure that the LUN is removed from one place at the exact same time that it is added to another place.

1.4 Switch Domain IDs

All switch Domain IDs must be unique between both fabrics, and the name of the switch needs to incorporate the Domain ID. Having a domain ID that is totally unique makes troubleshooting problems much easier in situations where an error message contains the FCID of the port with a problem.
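To see why this helps, consider that the first byte of a 24-bit Fibre Channel ID (FCID) is the Domain ID of the switch where the port logged in. The following sketch (our own illustration; the switch names and the FCID value are made up) maps an FCID from an error message straight back to a switch, which only works reliably if every Domain ID is unique across both fabrics:

# Map an FCID from an error message to a switch name, assuming unique Domain IDs.
# An FCID is Domain ID (bits 23-16), Area (bits 15-8), and Port (bits 7-0).

switch_names = {33: "core_A_dom33", 34: "edge_A_dom34"}   # hypothetical inventory

def domain_of(fcid):
    return (int(fcid, 16) >> 16) & 0xFF

fcid = "0x211b00"   # example FCID taken from a log entry
print(switch_names.get(domain_of(fcid), "unknown domain"))   # -> core_A_dom33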

1.5 Distance extension for mirroring

To implement remote mirroring over a distance, you have several choices:

- Optical multiplexors, such as DWDM or CWDM devices
- Long-distance small form-factor pluggable transceivers (SFPs) and XFPs
- Fibre Channel → IP conversion boxes

Of those options, the optical varieties of distance extension are the “gold standard.” IP distance extension introduces additional complexity, is less reliable, and has performance limitations. However, we do recognize that optical distance extension is impractical in many cases due to cost or unavailability.

1.5.1 Optical multiplexors

Optical multiplexors can extend your SAN up to hundreds of kilometers (or miles) at extremely high speeds, and for this reason, they are the preferred method for long distance expansion. When deploying optical multiplexing, make sure that the optical multiplexor has been certified to work with your SAN switch model. The SVC has no allegiance to a particular model of optical multiplexor.

If you use multiplexor-based distance extension, closely monitor your physical link error counts in your switches. Optical communication devices are high-precision units. When they shift out of calibration, you start to see errors in your frames.

Note: Distance extension must only be utilized for links between SVC clusters. It must not be used for intra-cluster links. Technically, it is supported for relatively short distances, such as a few kilometers (or miles), but refer to the IBM System Storage SAN Volume Controller Restrictions, S1003283, for details explaining why this arrangement is not recommended.


1.5.2 Long-distance SFPs/XFPs

Long-distance optical transceivers have the advantage of extreme simplicity. No expensive equipment is required, and there are only a few configuration steps to perform. However, ensure that you only use transceivers designed for your particular SAN switch. Each switch vendor only supports a specific set of small form-factor pluggable transceivers (SFPs/XFPs), so it is unlikely that Cisco SFPs will work in a Brocade switch.

1.5.3 Fibre Channel: IP conversion

Fibre Channel IP conversion is by far the most common and least expensive form of distance extension. It is also a form of distance extension that is complicated to configure, and relatively subtle errors can have severe performance implications.

With Internet Protocol (IP)-based distance extension, it is imperative that you dedicate bandwidth to your Fibre Channel (FC) → IP traffic if the link is shared with other IP traffic. Do not assume that because the link between two sites is “low traffic” or “only used for e-mail” that this type of traffic will always be the case. Fibre Channel is far more sensitive to congestion than most IP applications. You do not want a spyware problem or a spam attack on an IP network to disrupt your SVC.

Also, when communicating with your organization’s networking architects, make sure to distinguish between megabytes per second as opposed to megabits. In the storage world, bandwidth is usually specified in megabytes per second (MBps, MB/s, or MB/sec), while network engineers specify bandwidth in megabits (Mbps, Mbit/s, or Mb/sec). If you fail to specify megabytes, you can end up with an impressive-sounding 155 Mb/sec OC-3 link, which is only going to supply a tiny 15 MBps or so to your SVC. With the suggested safety margins included, this is not an extremely fast link at all.
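The following sketch (our own illustration; the efficiency factor is an assumption that roughly accounts for protocol overhead and the safety margins discussed above, not a precise figure) shows the kind of conversion to perform before agreeing on a link with your network team:

# Convert a link speed quoted in megabits per second into an approximate usable
# MBps figure for FC over IP planning. The efficiency factor is an assumption.

def usable_mbytes_per_sec(link_megabits, efficiency=0.75):
    return link_megabits / 8.0 * efficiency

for name, megabits in (("OC-3", 155), ("OC-12", 622), ("1 GbE", 1000)):
    print("%-6s %4d Mb/sec -> roughly %5.1f MBps usable" %
          (name, megabits, usable_mbytes_per_sec(megabits)))

For the OC-3 link mentioned above, this works out to roughly 15 MBps of usable bandwidth.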

Exact details of the configuration of these boxes are beyond the scope of this book; however, the configuration of these units for the SVC is no different than for any other storage device.

1.6 Tape and disk traffic sharing the SAN

If you have free ports on your core switch, there is no problem with putting tape devices (and their associated backup servers) on the SVC SAN; however, you must not put tape and disk traffic on the same Fibre Channel host bus adapter (HBA).

Do not put tape ports and backup servers on different switches. Modern tape devices have high bandwidth requirements, and splitting tape and backup traffic across switches can quickly lead to SAN congestion over the ISL between the switches.

1.7 Switch interoperability

The SVC is rather flexible as far as switch vendors are concerned. The most important requirement is that all of the node connections on a particular SVC cluster must all go to switches of a single vendor. This requirement means that you must not have several nodes or node ports plugged into vendor A, and several nodes or node ports plugged into vendor B.

While the SVC supports certain combinations of SANs made up of switches from multiple vendors in the same SAN, in practice, we do not particularly recommend this approach. Despite years of effort, interoperability among switch vendors is less than ideal, because the Fibre Channel standards are not rigorously enforced. Interoperability problems between switch vendors are notoriously difficult and disruptive to isolate, and it can take a long time to obtain a fix. For these reasons, we suggest only running multiple switch vendors in the same SAN long enough to migrate from one vendor to another vendor, if this setup is possible with your hardware.

It is acceptable to run a mixed-vendor SAN if you have gained agreement from both switch vendors that they will fully support attachment with each other. In general, Brocade will interoperate with McDATA under special circumstances. Contact your IBM marketing representative for details (“McDATA” here refers to the switch products sold by the McDATA Corporation prior to their acquisition by Brocade Communications Systems. Much of that product line is still for sale at this time). QLogic/BladeCenter FCSM will work with Cisco.

We do not advise interoperating Cisco with Brocade at this time, except during fabric migrations, and only then if you have a back-out plan in place. We also do not advise that you connect the QLogic/BladeCenter FCSM to Brocade or McDATA.

When you have SAN fabrics with multiple vendors, pay special attention to any particular requirements. For instance, observe from which switch in the fabric the zoning must be performed.

1.8 TotalStorage Productivity Center for Fabric

TotalStorage® Productivity Center (TPC) for Fabric can be used to create, administer, and monitor your SAN fabrics. There is nothing special that you need to do to use it to administer an SVC SAN fabric as opposed to any other SAN fabric. We discuss information about TPC for Fabric in Chapter 11, “Monitoring” on page 221.

For further information, consult the TPC IBM Redbooks publication, IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194, or contact your IBM marketing representative.


Chapter 2. SAN Volume Controller cluster

In this chapter, we discuss the advantages of virtualization and the optimal time to use virtualization in your environment. Furthermore, we describe the scalability options for the IBM System Storage SAN Volume Controller (SVC) and when to grow or split an SVC cluster.


2.1 Advantages of virtualization

The IBM System Storage SAN Volume Controller (SVC), which is shown in Figure 2-1, enables a single point of control for disparate, heterogeneous storage resources. The SVC enables you to put capacity from various heterogeneous storage subsystem arrays into one pool of capacity for better utilization and more flexible access. This design helps the administrator to control and manage this capacity from a single common interface instead of managing several independent disk systems and interfaces. Furthermore, the SVC can improve the performance of your storage subsystem array by introducing 8 GB of cache memory in each node, mirrored within a node pair.

SVC virtualization provides users with the ability to move data non-disruptively from one storage subsystem to another storage subsystem. It also introduces advanced copy functions that are usable over heterogeneous storage subsystems. For many users, who are offering storage to other clients, it is also extremely attractive because you can create a “tiered” storage environment.

Figure 2-1 SVC 8G4 model

2.1.1 How does the SVC fit into your environment

Here is a short list of the SVC features:

- Combines capacity into a single pool

- Manages all types of storage in a common way from a common point

- Provisions capacity to applications more easily

- Improves performance through caching and striping data across multiple arrays

- Creates tiered storage arrays

- Provides advanced copy services over heterogeneous storage arrays

- Removes or reduces the physical boundaries or storage controller limits associated with any vendor storage controllers

- Brings common storage controller functions into the Storage Area Network (SAN), so that all storage controllers can be used and can benefit from these functions

2.2 Scalability of SVC clusters

The SAN Volume Controller is highly scalable, and it can be expanded up to eight nodes in one cluster. An I/O Group is formed by combining a redundant pair of SVC nodes (System x™ server-based). Each server includes a four-port 4 Gbps-capable host bus adapter (HBA), which is designed to allow the SVC to connect and operate at up to 4 Gbps SAN fabric speed. Each I/O Group contains 8 GB of mirrored cache memory. Highly available I/O Groups are the basic configuration element of an SVC cluster. Adding I/O Groups to the cluster is designed to linearly increase cluster performance and bandwidth. An entry level SVC configuration contains a single I/O Group. The SVC can scale out to support four I/O Groups, and the SVC can scale up to support 1 024 host servers. For every cluster, the SVC supports up to 8 192 virtual disks (VDisks). This configuration flexibility means that SVC configurations can start small with an attractive price to suit smaller clients or pilot projects and yet can grow to manage extremely large storage environments.

2.2.1 Advantage of multi-cluster as opposed to single cluster

Growing or adding new I/O Groups to an SVC cluster is a decision that has to be made when either a configuration limit is reached or when the I/O load reaches a point where a new I/O Group is needed. The saturation point for the configuration that we tested was reached at approximately 70 000 I/Os per second (IOPS) for the current SVC hardware (8G4 nodes on an x3550) and SVC Version 4.x (refer to Table 2-2 on page 29).

To determine the number of I/O Groups and monitor the CPU performance of each node, you can also use TotalStorage Productivity Center (TPC). The CPU performance is related to I/O performance. When the CPUs become consistently 70% busy, you must consider either:

- Adding more nodes to the cluster and moving part of the workload onto the new nodes

- Moving several VDisks to another I/O Group, if the other I/O Group is not busy

To see how busy your CPUs are, you can use the TPC performance report, by selecting CPU Utilization as shown in Figure 2-2 on page 26.
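As a simple illustration of the 70% guideline (a sketch of our own; the node names and utilization samples are invented, and in practice you would take the figures from a TPC CPU Utilization report), the following fragment flags nodes that are consistently above the threshold:

# Flag SVC nodes whose CPU utilization is consistently above the 70% guideline.
# The samples are invented; in practice, export them from a TPC performance report.

samples = {
    "node1": [62, 71, 74, 73, 76, 72],
    "node2": [40, 45, 38, 52, 47, 44],
}

THRESHOLD = 70       # percent busy
CONSISTENTLY = 0.8   # fraction of samples that must be at or above the threshold

for node, cpu in sorted(samples.items()):
    busy_fraction = sum(1 for c in cpu if c >= THRESHOLD) / float(len(cpu))
    if busy_fraction >= CONSISTENTLY:
        print("%s: consider adding an I/O Group or moving VDisks" % node)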

Several of the activities that affect CPU utilization are:

- VDisk activity: The preferred node is responsible for I/Os for the VDisk and coordinates sending the I/Os to the alternate node. While both systems will exhibit similar CPU utilization, the preferred node is a little busier. To be precise, a preferred node is always responsible for the destaging of writes for VDisks that it owns. Therefore, skewing preferred ownership of VDisks toward one node in the I/O Group will lead to more destaging, and therefore, more work on that node.

- Cache management: The purpose of the cache component is to improve performance of read and write commands by holding part of the read or write data in SVC memory. The cache component must keep the caches on both nodes consistent, because the nodes in a caching pair have physically separate memories.

- FlashCopy® activity: Each node (of the FlashCopy source) maintains a copy of the bitmap; CPU utilization is similar.

- Mirror Copy activity: The preferred node is responsible for coordinating copy information to the target and also ensuring that the I/O Group is up-to-date with the copy progress information or change block information. As soon as Global Mirror is enabled, there is an additional 10% overhead on I/O work due to the buffering and general I/O overhead of performing asynchronous Peer-to-Peer Remote Copy (PPRC).


Figure 2-2 TPC Performance Report: Storage Subsystem Performance by Node

After you reach the performance or configuration maximum for an I/O Group, you can add additional performance or capacity by attaching another I/O Group to the SVC cluster.

Table 2-1 on page 27 shows the current maximum limits for one SVC I/O Group.

Table 2-1 Maximum configurations for an I/O Group

Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
I/O Groups | Four | Each containing two nodes
VDisks per I/O Group | 2048 | Includes managed-mode and image-mode VDisks
Host IDs per I/O Group | 256 (Cisco, Brocade, or McDATA); 64 (QLogic®) | N/A
Host ports per I/O Group | 512 (Cisco, Brocade, or McDATA); 128 (QLogic) | N/A
Metro/Global Mirror VDisks per I/O Group | 1024 TB | There is a per I/O Group limit of 1024 TB on the quantity of Primary and Secondary VDisk address space, which can participate in Metro/Global Mirror relationships. This maximum configuration will consume all 512 MB of bitmap space for the I/O Group and allow no FlashCopy bitmap space. The default is 40 TB.
FlashCopy VDisks per I/O Group | 1024 TB | This limit is a per I/O Group limit on the quantity of FlashCopy mappings using bitmap space from a given I/O Group. This maximum configuration will consume all 512 MB of bitmap space for the I/O Group and allow no Metro Mirror or Global Mirror bitmap space. The default is 40 TB.

2.2.2 Performance expectations by adding an SVC

As shown in 2.2.1, “Advantage of multi-cluster as opposed to single cluster” on page 25, there are limits that will cause the addition of a new I/O Group to the existing SVC cluster.

In Figure 2-3 on page 28, you can see the performance improvements by adding a new I/O Group to your SVC cluster. A single SVC cluster can reach a performance of more than 70 000 IOPS, given that the total response time will not pass five milliseconds. If this limit is close to being exceeded, you will need to add a second I/O Group to the cluster.

With the newly added I/O Group, the SVC cluster can now manage more than 130 000 IOPS. An SVC cluster itself can be scaled up to an eight node cluster with which we will reach a total I/O rate of more than 250 000 IOPS.



Figure 2-3 Performance increase by adding I/O Groups

Looking at Figure 2-3, you can see that the response time over throughput can be scaled nearly linearly by adding SVC nodes (I/O Groups) to the cluster.
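As a rough planning aid (a sketch of our own, using the approximate 70 000 IOPS per I/O Group figure quoted earlier for 8G4 nodes; your own saturation point depends on the node hardware, cache hit rates, workload profile, and response time targets), you can estimate when another I/O Group, or another cluster, is needed:

import math

# Estimate the I/O Groups needed for a target workload, keeping headroom below
# the approximate per I/O Group saturation point quoted in this chapter.

def io_groups_needed(target_iops, iops_per_io_group=70000, headroom=0.70):
    return int(math.ceil(target_iops / (iops_per_io_group * headroom)))

groups = io_groups_needed(120000)
print(groups)                      # -> 3 I/O Groups for a 120 000 IOPS workload
if groups > 4:
    print("More than four I/O Groups: plan for an additional SVC cluster.")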

2.2.3 Growing or splitting SVC clusters

Growing an SVC cluster can be done concurrently, and the SVC cluster can grow up to the current maximum of eight SVC nodes per cluster in four I/O Groups. Table 2-2 on page 29 contains an extract of the total SVC cluster configuration limits.


Table 2-2 Maximum SVC cluster limits

Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
MDisks | 4 096 | The maximum number of logical units that can be managed by SVC. This number includes disks that have not been configured into Managed Disk Groups.
Virtual disks (VDisks) per cluster | 8 192 | Includes managed-mode VDisks and image-mode VDisks. The maximum requires an 8 node cluster.
Total storage manageable by SVC | 8 PB | If maximum extent size of 2048 MB is used
Host IDs per cluster | 1 024 (Cisco, Brocade, and McDATA fabrics); 155 (CNT); 256 (QLogic) | A Host ID is a collection of worldwide port names (WWPNs) that represents a host. This Host ID is used to associate SCSI LUNs with VDisks.
Host ports per cluster | 2048 (Cisco, Brocade, and McDATA fabrics); 310 (CNT); 512 (QLogic) | N/A

If you exceed one of the current maximum configuration limits for the fully deployed SVC cluster, you then scale out by adding a new SVC cluster and distributing the workload to it.

Because the current maximum configuration limits can change, use the following link to get a complete table of the current SVC restrictions:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283

Splitting an SVC cluster or having a secondary SVC cluster provides you with the ability to implement a disaster recovery option in the environment. Having two SVC clusters in two locations allows work to continue even if one site is down. With the SVC Advanced Copy functions, you can copy data from the local primary environment to a remote secondary site.

The maximum configuration limits apply here as well.

Another advantage of having two clusters is that the SVC Advanced Copy functions license is based on:

- The total amount of storage (in gigabytes) that is virtualized
- The Metro Mirror and Global Mirror or FlashCopy capacity in use

In each case, the number of terabytes (TBs) to order for Metro Mirror and Global Mirror is the total number of source TBs and target TBs participating in the copy operations.

Growing the SVC cluster

Before adding a new I/O Group to the existing SVC cluster, you must make changes. It is important to adjust the zoning so that the new SVC node pair can join the existing SVC cluster. It is also necessary to adjust the zoning for each SVC node in the cluster to be able to see the same subsystem storage arrays.

After you make the zoning changes, you can add the new nodes into the SVC cluster. You can use the guide for adding nodes to an SVC cluster in IBM System Storage SAN Volume Controller, SG24-6423-06.

Splitting the SVC cluster

Splitting the SVC cluster might become a necessity if the maximum number of eight SVC nodes is reached, and you have a requirement to grow the environment beyond the maximum number of I/Os that a cluster can support, the maximum number of attachable subsystem storage controllers, or any other maximum mentioned in the V4.3.0 IBM System Storage SAN Volume Controller restrictions at:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283

Instead of having one SVC cluster host all I/O operations, hosts, and subsystem storage attachments, the goal here is to create a second SVC cluster so that we equally distribute all of the workload over the two SVC clusters.

There are a number of approaches that you can take for splitting an SVC cluster. The first, and probably the easiest, way is to create a new SVC cluster, attach storage subsystems and hosts to it, and start putting workload on this new SVC cluster.

The next options are more intensive, and they involve performing more steps:

- Create a new SVC cluster and start moving workload onto it. To move the workload from an existing SVC cluster to a new SVC cluster, you can use the Advanced Copy features, such as Metro Mirror and Global Mirror. We describe this scenario in Chapter 8, "Copy services" on page 151.

- You can use the VDisk "managed mode to image mode" migration to move workload from one SVC cluster to the new SVC cluster. Migrate a VDisk from managed mode to image mode, reassign the disk (logical unit number (LUN) masking) from your storage subsystem point of view, introduce the disk to your new SVC cluster, and use the image mode to managed mode migration. We describe this scenario in Chapter 7, "VDisks" on page 119.

From a user perspective, the first option is the easiest way to expand your cluster workload. The second and third options are more difficult, involve more steps, and require more preparation in advance. The third option is the choice that involves the longest outage to the host systems, and therefore, we do not prefer the third choice.

There is only one good reason that we can think of to reduce the existing SVC cluster by a certain number of I/O Groups: if more bandwidth is required on the secondary SVC cluster and if there is spare bandwidth available on the primary cluster.

Note: This move involves an outage from the host system point of view, because the worldwide port name (WWPN) from the subsystem (SVC I/O Group) does change.

Note: This scenario also involves an outage to your host systems and to the I/O to the VDisk.

Adding or upgrading SVC node hardware
If you have a cluster of six or fewer nodes of older hardware, and you have purchased new hardware, you can choose to either start a new cluster for the new hardware or add the new hardware to the old cluster. Both configurations are supported.

While both options are practical, we recommend that you add the new hardware to your existing cluster. This recommendation holds only if, in the short term, you are not scaling the environment beyond the capabilities of this cluster.

By utilizing the existing cluster, you maintain the benefit of managing just one cluster. Also, if you are using mirror copy services to the remote site, you might be able to continue to do so without having to add SVC nodes at the remote site.

You have a couple of choices to upgrade an existing cluster’s hardware. The choices depend on the size of the existing cluster.

If your cluster has up to six nodes, you have these options available:

- Add the new hardware to the cluster, migrate VDisks to the new nodes, and then retire the older hardware when it is no longer managing any VDisks.

This method requires a brief outage to the hosts to change the I/O Group for each VDisk (a CLI sketch of this change follows this list).

- Swap out one node in each I/O Group at a time and replace it with the new hardware. We recommend that you engage an IBM service support representative (IBM SSR) to help you with this process.

You can perform this swap without an outage to the hosts.
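
Moving a VDisk to a different I/O Group is a single CLI command, sketched below with placeholder names. Remember that the hosts must stop I/O to the VDisk and rediscover their paths after the change:

  svctask chvdisk -iogrp io_grp1 HostA_vd01

You can confirm the I/O Group assignment before and after the change with svcinfo lsvdisk HostA_vd01.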

If your cluster has eight nodes, the options are similar:

- Swap out a node in each I/O Group one at a time and replace it with the new hardware. We recommend that you engage an IBM SSR to help you with this process.

You can perform this swap without an outage to the hosts. Swap nodes in one I/O Group at a time; do not change all I/O Groups in a multi-I/O Group cluster at one time.

- Move the VDisks to another I/O Group so that all VDisks are on three of the four I/O Groups. You can then remove the remaining I/O Group with no VDisks from the cluster and add the new hardware to the cluster.

As each pair of new nodes is added, VDisks can then be moved to the new nodes, leaving another old I/O Group pair that can be removed. After all the old pairs are removed, the last two new nodes can be added, and if required, VDisks can be moved onto them.

Unfortunately, this method requires several outages to the host, because VDisks are moved between I/O Groups. This method might not be practical unless you need to implement the new hardware over an extended period of time, and the first option is not practical for your environment.

- You can mix the previous two options.

New SVC hardware provides considerable performance benefits with each release, and there have been substantial performance improvements since the first hardware release.

Depending on the age of your existing SVC hardware, the performance requirements might be met by only six or fewer nodes of the new hardware.

If this situation applies, you might be able to use a mix of the previous two options. For example, use an IBM SSR to help you upgrade one or two I/O Groups, and then move the VDisks from the remaining I/O Groups onto the new hardware.

For more details about replacing nodes non-disruptively or expanding an existing SVC cluster, refer to IBM System Storage SAN Volume Controller, SG24-6423-05.

2.3 SVC performance scenarios

In this section, we describe five test scenarios. These scenarios compare a DS4500 that is directly attached to a Windows® host with the same configuration after the SVC is introduced in the data path, and they show the resulting performance improvement. These scenarios also show you the performance results during a VDisk migration from an image mode to a striped VDisk. In the last test, we examined the impact of a node failure on the I/O throughput.

We performed these tests in the following environment:

- Operating system: Microsoft® Windows 2008 Enterprise Edition
- Storage: 64 GB LUN/DS4500
- SAN: Dual fabric, 2005-B5K, firmware V6.1.0c
- I/O application: I/O Meter:
  – 70% read
  – 30% write
  – 32 KB
  – 100% sequential
  – Queue depth: 8

As we have already explained, the test scenarios (each running for 40 minutes) are:

- Test 1: Storage subsystem direct-attached to host

- Test 2: SVC in the path and a 64 GB image mode VDisk/cache-enabled

- Test 3: SVC in the path and a 64 GB VDisk during a migration

- Test 4: SVC in the path and a 64 GB striped VDisk

- Test 5: SVC node failure

The overview shown in Figure 2-4 on page 33 is not intended to provide absolute numbers or to show the best performance that you are ever likely to get. The test sequence that we have chosen reflects the typical introduction of an SVC cluster in a client environment, going from a natively attached storage environment to a virtualized storage attachment environment.

Figure 2-4 on page 33 shows the total data rate in MBps while the 64 GB disk was managed by the SVC (tests 2, 3, 4, and 5). Test 3 and test 4 show a spike at the beginning of each test. By introducing the SVC in the data path, we introduced a caching appliance. Therefore, host I/O no longer goes directly to the subsystem; it is first cached and then flushed down to the subsystem.

Figure 2-4 SVC node total data rate

During test 5, we disabled all of the ports for node 1 on the switches. Afterward, but still during the test, we enabled the switch ports again. SVC node 1 joined the cluster with a cleared cache, and therefore, you see the spike at the end of the test.

In this section, we show you the value of the SVC cluster in our environment. For this purpose, we only compare the direct-attached storage with a striped VDisk (test 1 and test 4).

Figure 2-5 on page 34 shows the values for the total traffic: the read MBps and the write MBps. Similar to the I/O rate, we saw a 12% improvement for the I/O traffic.

Figure 2-5 Native MBps compared to SVC-attached storage

For both parameters, the I/O rate and the MBps (Figure 2-5), we saw a performance improvement by using the SVC.

2.4 Cluster upgrade

The SVC cluster is designed to perform a concurrent code update. Although it is a concurrent code update for the SVC, it is disruptive to upgrade certain other parts in a client environment, such as updating the multipathing driver. Before applying the SVC code update, the administrator needs to review the following Web page to ensure the compatibility between the SVC code and the SVC Console GUI. The SAN Volume Controller and SVC Console GUI Compatibility Web site is:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888

Furthermore, certain concurrent upgrade paths are only available through an intermediate level. Refer to the following Web page for more information, SAN Volume Controller Concurrent Compatibility and Code Cross-Reference:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1001707

Even though the SVC code update is concurrent, we recommend that you perform several steps in advance:

- Before applying a code update, ensure that there are no open problems in your SVC, SAN, or storage subsystems. Use the “Run maintenance procedure” on the SVC and fix the open problems first. For more information, refer to 14.3.2, “Solving SVC problems” on page 284.

- It is also extremely important to check your host dual pathing. Make sure that, from the host’s point of view, all paths are available (a quick check for SDD hosts is sketched after this list). Missing paths can lead to I/O problems during the SVC code update. Refer to Chapter 9, “Hosts” on page 175 for more information about hosts.

- It is wise to schedule the SVC code update for a time of low I/O activity.

- Upgrade the Master Console GUI first.

- Allow the SVC code update to finish before making any other changes in your environment.

- Allow at least one hour to perform the code update for a single SVC I/O Group and 30 minutes for each additional I/O Group. In a worst case scenario, an update can take up to two hours, for example, when the SVC code update also updates the BIOS, SP, and the SVC service card.
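
For hosts that use SDD, a quick pre-upgrade path check can look similar to the following sketch (other multipathing drivers have equivalent, but differently named, commands):

  datapath query adapter
  datapath query device

Confirm that every device shows the expected number of paths and that no path is in a DEAD or CLOSE state before you start the code update.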

New features are not available until all nodes in the cluster are at the same level. Features that depend on a remote cluster (Metro Mirror or Global Mirror) might not be available until the remote cluster is at the same level as well.

Important: The Concurrent Code Upgrade (CCU) might appear to stop for a long time (up to an hour) if it is upgrading a low level BIOS. Never power off during a CCU unless you have been instructed to power off by IBM service personnel. If the upgrade encounters a problem and fails, the upgrade will be backed out.

Chapter 3. SVC Console

In this chapter, we describe important areas of the IBM System Storage SAN Volume Controller (SVC) Console. The SVC Console is a Graphical User Interface (GUI) application installed on a server running a server version of the Microsoft Windows operating system.

3.1 SVC Console installation

The SVC Console is mandatory for installing and managing an SVC cluster. Currently, the SVC Console is available as a software only solution, or the SVC Console can be a combined software and hardware solution that can be ordered together with an SVC cluster. Common to both options is that they communicate with the SVC cluster using an IP/Ethernet network connection and therefore require an IP address and an Ethernet port that can communicate with the SVC cluster. The SVC Console also serves as the SVC data source for use with IBM TotalStorage Productivity Center (TPC).

3.1.1 Software only installation option

The SVC Console software is available for installation on a client-provided server running one of the following operating systems:

- Microsoft Windows 2000 Server
- Microsoft Windows Server® 2003 Standard Edition
- Microsoft Windows Server 2003 Enterprise Edition

You access the SVC Console application by using a Web browser. Therefore, ensure that Microsoft Windows Internet Explorer® Version 7.0 (or Version 6.1 with Service Pack 1, for Microsoft Windows 2000 Server) is installed on the server.

Secure Shell (SSH) connectivity with the SVC cluster uses the PuTTY SSH suite. The PuTTY installation package comes bundled with the SVC Console software, and you must install it prior to installing the SVC Console software.

While not a requirement, we recommend that adequate antivirus software is installed on the server together with software for monitoring the server health status. Whenever service packs or critical updates for the operating system become available, we recommend that they are applied.

To successfully install and run the SVC Console software, the server must have adequate system performance. We suggest a minimum hardware configuration of:

- Single Intel® Xeon dual-core processor, minimum 2.1 GHz (or equivalent)
- 4 GB DDR memory
- 70 GB primary hard disk drive capacity using a disk mirror (for fault tolerance)
- 100 Mbps Ethernet connection

To minimize the risk of conflicting applications, performance problems, and so on, we recommend that the server is not assigned any other roles except for serving as the SVC Console server. We also do not recommend that you set up the server to be a member of any Microsoft Windows Active Directory® domain.

Note: Only x86 (32-bit) versions of these operating systems are supported. Do not use x64 (64-bit) variants.

Requirements: If you want to use Internet Protocol (IP) Version 6 (IPv6) communication with your SVC cluster, you must run Windows 2003 Server and your PuTTY version must be at least 0.60.

3.1.2 Combined software and hardware installation option

If you choose to order an SVC Console server (feature code 2805-MC2) together with an SVC cluster, you will receive the System Storage Productivity Center (SSPC). SSPC is an integrated hardware and software solution that provides a single management console for managing IBM Storage Area Network (SAN) Volume Controller, IBM DS8000, and other components of your data storage infrastructure.

The SSPC server has the following initial hardware configuration:

- 1x quad-core Intel Xeon® processor E531, 1.60 GHz, 8 MB L2 cache
- 4x 1 GB PC2-5300 ECC DDR2 Chipkill™ memory
- 2x primary hard disk drives: 146 GB 15k RPM SAS drives, ServeRAID™ 8k RAID 1 array
- 2x integrated 10/100/1000 Mbps Ethernet connections
- Microsoft Windows Server 2003 Enterprise Edition

If you plan to install and use TPC for Replication or plan to manage a large number of components using the SSPC server, we recommend that you order the SSPC server with the additional Performance Upgrade kit (feature code 1800). With this kit installed, both the processor capacity and memory capacity are doubled compared to the initial configuration.

When using SSPC, the SVC Console software is already installed on the SSPC server, as well as PuTTY. For a detailed guide to the SSPC, we recommend that you refer to the IBM System Storage Productivity Center Software Installation and User’s Guide, SC23-8823.

The SSPC server does not ship with antivirus software installed. We recommend that you install antivirus software. Also, you need to apply service packs and critical updates to the operating system when they become available.

Do not use the SSPC server for any roles except roles related to SSPC, and we do not recommend joining the server to a Microsoft Windows Active Directory domain.

3.1.3 SVC cluster software and SVC Console compatibility

In order to allow seamless operation between the SVC cluster software and the SVC Console software, it is of paramount importance that software levels match between the two. Before adding an SVC cluster to an SVC Console, or before upgrading the SVC cluster software on an existing SVC cluster, you must ensure that the software levels are compatible.

To check the current SVC cluster software level, connect to the SVC cluster using SSH and then issue the svcinfo lscluster command, which is shown in Example 3-1 on page 40.

Note: The SSPC option replaces the dedicated Master Console server (feature code 4001), which is being discontinued. The Master Console is still supported and will run the latest code levels of the SVC Console software.

Note: If you want to use IPv6 communication with your SSPC and SVC cluster, ensure that your PuTTY version is at least 0.60.

Example 3-1 Checking the SVC cluster software version (lines removed for clarity)

IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
cluster_IP_address 9.43.86.117
cluster_service_IP_address 9.43.86.118
code_level 4.3.0.0 (build 8.16.0806230000)
IBM_2145:itsosvccl1:admin>

You can locate the SVC Console version on the Welcome window (Figure 3-1), which displays after you log in to the SVC Console.

Figure 3-1 Display SVC Console version

After you obtain the software versions, locate the appropriate SVC Console version. For an overview of SAN Volume Controller and SVC Console compatibility, refer to the Web site, which is shown in Figure 3-2.

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888

Figure 3-2 SVC cluster software to SVC Console compatibility matrix

3.1.4 IP connectivity considerations

Management of an SVC cluster relies on IP communication, including both access to the SVC command line interface (CLI) and communication between the SVC Console GUI application and the SVC cluster. Error reporting and performance data from the SVC cluster are also transferred using IP communications through services, such as e-mail notification and Simple Network Management Protocol (SNMP) traps.

The SVC cluster supports both IP Version 4 (IPv4) and IP Version 6 (IPv6) connectivity and attaches to the physical network infrastructure using one 10/100 Mbps Ethernet connection per node. All nodes in an SVC cluster share the same two IP addresses (cluster address and service IP address). The cluster IP address dynamically follows the current config node, whereas the service IP address only becomes active when a node is put into service mode using the front panel. At this point, the service IP address becomes active for the node entering service mode, and it remains active until service mode is ended.

It is imperative that all node Ethernet interfaces can access the IP networks where the SVC Console and other management stations reside, because the IP addresses for an SVC cluster are not statically assigned to any specific node in the SVC cluster. While everything will work with only the current config node having the correct access, access to the SVC cluster might be disrupted if the config node role switches to another node in the SVC cluster.

Therefore, in order to allow seamless operations in failover and other state changing situations, observe the following IP/Ethernet recommendations:

- All nodes in an SVC cluster must be connected to the same layer 2 Ethernet segment. If Virtual LAN (VLAN) technology is implemented, all nodes must be on the same VLAN.

- If an IP gateway is configured for the SVC cluster, it must not filter traffic based on Ethernet Media Access Control (MAC) addresses.

- There can be no active packet filters or shapers for traffic to and from the SVC cluster.

- No static (sticky) Address Resolution Protocol (ARP) caching can be active for the IP gateway connecting to the SVC cluster. When the SVC cluster IP addresses shift from one node to another node, the corresponding ARP entry needs to be updated with the new MAC address information (a quick way to check this from a management station is sketched after this list).
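
If you suspect a stale ARP entry after the cluster IP address has moved to another node, a quick check from a Windows management station can look like the following sketch (9.43.86.117 stands in for your cluster IP address):

  arp -a 9.43.86.117
  arp -d 9.43.86.117
  ping 9.43.86.117

If the MAC address reported by arp -a does not match the current configuration node, clearing the entry with arp -d and repeating the ping normally restores management access.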

3.2 Using the SVC Console

The SVC Console is used as a platform for configuration, management, and service activity on the SAN Volume Controller. You can obtain basic instructions for setting up and using the SVC Console in your environment in the IBM System Storage SAN Volume Controller V4.3.0 Installation and Configuration Guide, S7002156, and IBM System SAN Volume Controller V4.3, SG24-6423-06.

Note: The SVC cluster state information is exchanged between nodes through the node Fibre Channel interface. Thus, if the IP/Ethernet network connectivity fails, the SVC cluster will remain fully operational. Only management is disrupted.

3.2.1 SSH connection limitations

To limit resource consumption for management, each SVC cluster can host only a limited number of Secure Shell (SSH) connections. The SVC cluster supports no more than 10 concurrent SSH connections per user ID for a maximum of 20 concurrent connections per cluster (10 for the admin user and 10 for the service user). If this number is exceeded, the SVC cluster will not accept any additional incoming SSH connections. Included in this count are all of the SSH connections, such as interactive sessions, Common Information Model Object Manager (CIMOM) applications (such as the SVC Console), and host automation tools, such as HACMP™-XD.

There is also a limit on the number of SSH connections that can be opened per second. The current limitation is 15 SSH connections per second.

If the maximum connection limit is reached and you cannot determine which clients have open connections to the cluster, the SVC cluster code has incorporated options to help you recover from this state.

A cluster error code (2500) is logged by the SVC cluster when the maximum connection limit is reached. If there is no other error on the SVC cluster with a higher priority than this error, message 2500 will be displayed on the SVC cluster front panel. Figure 3-3 shows this error message in the error log.

Figure 3-3 Error code 2500 “SSH Session limit reached”

If you get this error:

1. If you still have access to an SVC Console GUI session for the cluster, you can use the Service and Maintenance menu to start the “Run Maintenance Procedures” task to fix this error. This option allows you to reset all active connections, which terminates all SSH sessions and clears the login count.

2. If you have no access to the SVC cluster using the SVC Console GUI, there is now a direct maintenance link in the drop-down menu of the View cluster panel of the SVC Console. Using this link, you can get directly to the Service and Maintenance procedures. The following panels guide you to access and use this maintenance feature. Figure 3-4 on page 43 shows you how to launch this procedure.

Note: We recommend that you close SSH connections when they are no longer required. Use the exit command to terminate an interactive SSH session.

Figure 3-4 Launch Maintenance Procedures from the panel to view the cluster

When analyzing the error code 2500, a window similar to the example in Figure 3-5 on page 44 will appear. From this window, you can identify which user has reached the 10 concurrent connections limit, which in this case is the admin user.

Note that the service user has only logged in four times and therefore still has six connections left. From this window, the originating IP address of a given SSH connection is also displayed, which can be useful to determine which user opened the connection.

Remember that if the connection originated from a different IP subnet than where the SVC cluster resides, it might be a gateway device IP address that is displayed, which is the case with the IP address of 9.146.185.99 in Figure 3-5 on page 44. If you are unable to close any SSH connections from the originator side, you can force the closure of all SSH connections from the maintenance procedure panel by clicking Close All SSH Connections.

Figure 3-5 SSH connection limit exceeded

You can read more information about the current SSH limitations and how to fix related problems at:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STCFKTH&context=STCFKTW&dc=DB500&uid=ssg1S1002896&loc=en_US&cs=utf-8&lang=en

3.2.2 Managing multiple SVC clusters using a single SVC Console

If you have more than a single SVC cluster in your environment, a single SVC Console instance can be used to manage multiple SVC clusters. Simply add the additional SVC clusters to the SVC Console by using the Clusters pane of the SVC Console GUI. Figure 3-6 on page 45 shows how to add an additional SVC cluster to an SVC Console, which already manages three SVC clusters.

Figure 3-6 Adding an additional SVC cluster to SVC Console

A single SVC Console can manage a maximum of four SVC clusters. As more testing is done, and more powerful hardware and software become available, this limit might change. For current information, contact your IBM marketing representative or refer to the SVC support site on the Internet:

http://www.ibm.com/storage/support/2145

One challenge of using a single SVC Console to manage multiple SVC clusters arises when one cluster is not operational, for example, when the cluster shows a “No Contact” state. In this case, ease of access to the other clusters is affected by a two-minute timeout during the launch of SVC menus, because the GUI checks the status of the managed clusters. This timeout appears while the SVC Console GUI is trying to access the “missing” SVC cluster.

3.2.3 Managing an SVC cluster using multiple SVC Consoles

In certain environments, it is important to have redundant management tools for the storage infrastructure, which you can have with the SVC Console.

Important: All SVC clusters to be managed by a given SVC Console must have the matching public key file installed, because an SVC Console instance can only load a single SSH certificate (the icat.ppk private SSH key) at a time.

Note: The SVC Console is the management tool for the SVC cluster. Even if the SVC Console fails, the SVC cluster still remains operational.

The advantages of using more than one SVC Console include:

- Redundancy: If one SVC Console fails, you can use another SVC Console to continue managing the SVC clusters.

- Manageability from multiple locations: If you have two or more physical locations with SVC clusters installed, have an SVC Console in each location to allow you to manage the local clusters even if connectivity to the other sites is lost. It is a best practice to have an SVC Console installed per physical location with an SVC cluster.

- Managing multiple SVC cluster code level versions: For certain environments, it might be necessary to have multiple versions of the SVC Console GUI application running, because multiple versions of the SVC cluster code are in use.

SSH connection limitations
The SSH connection limit of a maximum of 10 connections per user ID applies across all SVC Consoles combined. Each SVC Console uses one SSH connection for each GUI session that is launched.

3.2.4 SSH key management

It is extremely important that the SSH key pairs are managed properly, because management communication with an SVC cluster relies on key-based SSH communications. Lost keys can lead to situations where an SVC cluster cannot be managed.

PuTTYgen is used for generating the SSH key pairs. A PuTTY-generated SSH key pair is required to successfully install an SVC cluster. This specific key pair allows the SVC Console software to communicate with the SVC cluster using the plink.exe PuTTY component. The private key part must be named icat.ppk, and icat.ppk must exist in the C:\Program Files\IBM\svcconsole\cimom directory of the SVC Console server. The public key part is uploaded to the SVC cluster during the initial setup.

As more users are added to an SVC cluster, more key pairs become active, because user separation on the SVC cluster is performed by using different SSH key pairs. After uploading the public key to the SVC cluster, there is no restriction on naming or on where to store the private key for the key pairs (other than the SVC Console key pair, which must be named icat.ppk). However, to increase manageability, we recommend the following actions for SSH key pairs that are used with an SVC cluster:

- Store the public key of the SVC Console key pair as icat.pub in the same directory as the icat.ppk key, which is C:\Program Files\IBM\svcconsole\cimom.

- Always store the public part and private part of an SSH key pair together.

- Name the public key and private key accordingly to allow easy matching.

For more information about SSH keys and how to use them to access the SVC cluster through the SVC Console GUI, or the SVC CLI, refer to IBM System SAN Volume Controller V4.3, SG24-6423-06.

Important: It is essential to continuously maintain a valid backup of all of the SSH key pairs for an SVC cluster. You must store this backup in a safe and known location (definitely not on the SVC Console server), and the backup must be validated for integrity on a regular basis.

3.2.5 Administration roles

You can use role-based security to restrict the administrative abilities of a user at both an SVC Console level and an SVC cluster level.

When you use role-based security at the SVC Console, the view that is presented when opening a GUI session for an SVC cluster is adjusted to reflect the user role. For instance, a user with the Monitor role (Figure 3-7) cannot create a new MDisk group, but a user with the Administrator role (Figure 3-8 on page 48) can create a new MDisk group.

Figure 3-7 MDisk group actions available to SVC Console user with Monitor role

Figure 3-8 MDisk group actions available to SVC Console user with Administrator role

Implementing role-based security at the SVC cluster level implies that different key pairs are used for the SSH communication. When establishing an SSH session with the SVC cluster, available SVC CLI commands will be determined by the role that is associated with the SSH key that established the session.

When implementing role-based security at the SVC cluster level, it is important to understand that when you use SSH key pairs with no associated password, anyone with access to the correct key can gain administrative rights on the SVC cluster. If a user with restricted rights can access the private key part of an SSH key pair that has administrative rights on the SVC cluster (such as the icat.ppk key used by the SVC Console), that user can elevate his or her rights. To prevent this situation, it is important that users can only access the SSH keys to which they are entitled. Furthermore, PuTTYgen supports associating a password with generated SSH key pairs at creation time. In conjunction with access control to SSH keys, associating a password with user-specific SSH key pairs is the recommended approach.

For more information about role-based security on the SVC and the commands that each user role can use, refer to IBM System SAN Volume Controller V4.3, SG24-6423-06, and IBM System Storage SAN Volume Controller Command-Line Interface User’s Guide, S7002157.

Note: The SSH key pair used with the SVC Console software cannot have a password associated with it.

3.2.6 Audit logging

Audit logging is a useful and important tool for administrators. At a certain point in time, the administrators might have to prove or validate actions that they have performed on the hosts, storage subsystems, SAN switches, and, in particular, the SVC. An audit log for the SVC keeps track of action commands that are issued through a Secure Shell (SSH) session or through the SVC Console.

The SVC audit logging facility is always turned on.

To create a new audit log file, you must use the CLI to issue the command as shown in Example 3-2.

Example 3-2 Create a new audit log file

IBM_2145:ITSOCL1:admin>svctask dumpauditlog
IBM_2145:ITSOCL1:admin>

The audit log file name is generated automatically in the following format:

auditlog_<firstseq>_<lastseq>_<timestamp>_<clusterid>

where

<firstseq> is the audit log sequence number of the first entry in the log

<lastseq> is the audit sequence number of the last entry in the log

<timestamp> is the time stamp of the last entry in the audit log being dumped

<clusterid> is the cluster ID at the time the dump was created

You can retrieve the audit log file that is created by using either the SVC Console GUI or Secure Copy Protocol (SCP).
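
For example, a dumped audit log file can be copied off the cluster with the PuTTY pscp utility. This is a sketch only; the file name is a placeholder, and we assume that audit log dumps reside in the /dumps/audit directory on the configuration node:

  pscp -i "C:\Program Files\IBM\svcconsole\cimom\icat.ppk" admin@<cluster IP>:/dumps/audit/auditlog_0_482_081201_0000020060406FCA C:\svc\auditlogs\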

The audit log provides the following information:

- The identity of the user who issued the action command

- The name of the action command

- The time stamp of when the action command was issued by the configuration node

- The parameters that were issued with the action command

Important: Any user with access to the file system on the SVC Console server (in general, all users who can interactively log in to the operating system) can retrieve the icat.ppk SSH key and thereby gain administrative access to the SVC cluster. To prevent this general access, we recommend that the SVC Console GUI is accessed through a Web browser from another host. Only allow experienced Microsoft Windows Server professionals to implement additional file level access control in the operating system.

Note: The audit log file names cannot be changed.

Note: Certain commands are not logged in the audit log dump.

This list shows the commands that are not documented in the audit log:

- svctask dumpconfig

- svctask cpdumps

- svctask cleardumps

- svctask finderr

- svctask dumperrlog

- svctask dumpinternallog

- svcservicetask dumperrlog

- svcservicetask finderr

The audit log also tracks commands that failed.

We recommend that audit log data is collected on a regular basis and stored in a safe location. This procedure must take into account any regulations regarding information systems auditing.

3.2.7 IBM Support remote access to the SVC Console

The preferred method of IBM Support to remotely connect to an SVC cluster or the SVC Console is through the use of Assist on Site (AOS). The client is required to provide a workstation that is accessible from the outside and that can also access the SVC IP/Ethernet network. AOS provides multiple levels of access and interactions, which can be selected by the client, including:

- Chat
- Shared screen view
- Shared control
- The capability for the client to end the session at any time
- The option for the client to log the session locally

The client can allow IBM Support to control the AOS workstation while the client watches, or alternatively, the client can follow directions from IBM Support, which observes the client’s actions.

For further information regarding AOS, go to:

http://www-1.ibm.com/support/assistonsite/

3.2.8 SVC Console to SVC cluster connection problems

After adding a new SVC cluster to the SVC Console GUI, you might experience a “No Contact” availability status for the SVC cluster as shown in Figure 3-9 on page 51.

Figure 3-9 Cluster with availability status of No Contact

There are two possible problems that might cause an SVC cluster status of “No Contact”:

The SVC Console code level does not match the SVC cluster code level (for example, SVC Console code V2.1.0.x with SVC cluster code 4.2.0). To fix this problem, you need to install the corresponding SVC Console code that was mentioned in 3.1.3, “SVC cluster software and SVC Console compatibility” on page 39.

The CIMOM cannot execute the plink.exe command (PuTTY component). To test the connection, open a command prompt (cmd.exe) and go to the PuTTY installation directory. Common installation directories are C:\Support Utils\Putty and C:\Program Files\Putty. Execute the following command from this directory:

plink.exe admin@clusterIP -ssh -2 -i "c:\Program files\IBM\svcconsole\cimom\icat.ppk"

This command is shown in Example 3-3.

Example 3-3 Command execution

C:\Program Files\PuTTY>plink.exe admin@9.43.86.117 -ssh -2 -i "c:\Program files\IBM\svcconsole\cimom\icat.ppk"
Using username "admin".
Last login: Sun Jul 27 11:18:48 2008 from 9.43.86.115
IBM_2145:itsosvccl1:admin>

In Example 3-3, we executed the command, and the connection was established. If the command fails, there are a few things to check:

- The location of the PuTTY executable does not match the SSHCLI path in the setupcmdline.bat used when installing the SVC Console software.

- The icat.ppk key needs to be in the C:\Program Files\IBM\svcconsole\cimom directory.

- The icat.ppk file found in the C:\Program Files\IBM\svcconsole\cimom directory needs to match the public key uploaded to the SVC cluster.

- The CIMOM can execute the plink.exe command, but the SVC cluster does not exist, it is offline, or the network is down. Check that the SVC cluster is up and running (check the front panel of the SVC nodes and use the arrow keys on the node to determine whether the Ethernet port on the configuration node is active). Also, check that the IP address of the cluster matches the IP address that you have entered in the SVC Console. Then, check the IP/Ethernet settings on the SVC Console server and issue a ping to the SVC cluster IP address (a short sketch of these checks follows this list). If the ping command fails, check your IP/Ethernet network.
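
A minimal reachability check from the SVC Console server can look like this sketch (the cluster IP address shown is a placeholder):

  ping 9.43.86.117
  plink.exe admin@9.43.86.117 -ssh -2 -i "c:\Program files\IBM\svcconsole\cimom\icat.ppk" svcinfo lscluster -delim :

If the ping succeeds but the plink command fails, the problem is more likely to be with the SSH keys or the SVC Console configuration than with the IP network.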

If the SVC cluster still reports “No Contact” after you have performed all of these actions on the SVC cluster, contact IBM Support.

3.2.9 Managing IDs and passwords

There are a number of user IDs and passwords needed for managing the SVC Console, the SVC cluster, the SVC CLI (SSH), TotalStorage Productivity Center (TPC) CIMOM, and SVC service mode. It is essential that all of these credentials are carefully tracked and stored in a safe and known location.

The important user IDs and passwords are:

- SVC Console: Login and password

- SVC Console server: Login and password to operating system

- SVC Cluster: Login and password

- SVC Service mode: Login and password

- SVC CLI (SSH): Private and public key

- TPC CIMOM: User and password (same as SVC Console)

Failing to remember a user ID, a password, or an SSH key can lead to not being able to manage parts of an SVC installation. Certain user IDs, passwords, or SSH keys can be recovered or changed, but several of them are fixed and cannot be recovered:

- SVC Console server: You cannot access the SVC Console server. Password recovery depends on the operating system. The administrator will need to recover the lost or forgotten user ID and password.

- SVC Cluster: You cannot access the cluster through the SVC Console without this password. Allow the password reset option during the cluster creation. If the password reset is not enabled, issue the svctask setpwdreset SVC CLI command to view and change the status of the password reset feature for the SAN Volume Controller front panel. Refer to Example 3-4 on page 53.

- SVC Service mode: You cannot access the SVC cluster when it is in service mode. Reset the password in the SVC Console GUI using the Maintain Cluster Passwords feature.

- SVC CLI (PuTTY): You cannot access the SVC cluster through the CLI. Create a new private and public key pair.

- SVC Console: You cannot access the SVC cluster through the SVC Console GUI. Remove and reinstall the SVC Console GUI. Use the default user and password and change the user ID and password during the first logon.

- TPC CIMOM: Same user and password as the SVC Console.

When creating a cluster, be sure to select the option Allow password reset from front panel as shown in Figure 3-10 on page 53. You see this option during the initial cluster creation. For additional information, refer to IBM System SAN Volume Controller V4.3, SG24-6423-06.

Figure 3-10 Select the password reset policy

This option allows access to the cluster if the admin password is lost. If the password reset feature was not enabled during the cluster creation, use the svctask setpwdreset -enable CLI command to enable it. Example 3-4 shows how to determine the current status (a zero indicates that the password reset feature is disabled) and afterwards how to enable it (a one indicates that the password reset feature is enabled).

Example 3-4 Enable password reset by using CLI

IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [0]
IBM_2145:itsosvccl1:admin>svctask setpwdreset -enable
IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [1]

3.2.10 Saving the SVC configuration

The SVC configuration will be backed up every day at 01:00 a.m. depending on the time zone. There is no way to change the backup schedule on the SVC. In addition to the automated configuration backup, it is possible to create a new backup by user intervention. You can either run the backup command on the SVC CLI or issue a configuration backup from the SVC Console GUI.

The SVC cluster maintains two copies of the configuration file:

- svc.config.backup.xml
- svc.config.backup.bak

These backup files contain information about the current SVC cluster configuration, such as:

- Code level
- Name and IP address
- MDisks
- Managed Disk Groups (MDGs)
- VDisks
- Hosts
- Storage controllers

If the SVC cluster has experienced a major problem and IBM Support has to rebuild the configuration structure, the svc.config.backup.xml file is necessary.

Before making major changes on your SVC cluster, such as SVC cluster code upgrades, storage controller changes, or SAN changes, we recommend that you create a new backup of the SVC configuration.

Note: The configuration backup does not include any data from any MDisks. The configuration backup only saves SVC cluster configuration data.

Creating a new configuration backup using the SVC CLI
To create a configuration backup file from the SVC CLI, open an SSH connection and run the command svcconfig backup, as shown in Example 3-5.

Example 3-5 Running the SVC configuration backup (lines removed for clarity)

IBM_2145:itsosvccl1:admin>svcconfig backup
......
CMMVC6130W Inter-cluster partnership fully_configured will not be restored
..
CMMVC6112W controller controller0 has a default name
...
CMMVC6112W mdisk mdisk1 has a default name
................
CMMVC6136W No SSH key file svc.config.admin.admin.key
CMMVC6136W No SSH key file svc.config.test.admin.key
......................................
CMMVC6155I SVCCONFIG processing completed successfully

After the backup file is created, it can be retrieved from the SVC cluster using SSH Secure Copy.
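
For example, with the PuTTY pscp utility (a sketch only; we assume the backup files are written to the /tmp directory on the configuration node, and the target folder is arbitrary):

  pscp -i "C:\Program Files\IBM\svcconsole\cimom\icat.ppk" admin@<cluster IP>:/tmp/svc.config.backup.xml C:\svc\configbackup\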

Creating a configuration backup using the SVC Console GUI
To create a configuration backup file from the SVC Console GUI, you must open the Service and Maintenance panel and run the Backup Configuration task as shown in Figure 3-11.

Figure 3-11 Backing up the SVC configuration

As in the case with the SVC CLI, a new svc.config.backup.xml_Node-1 file will appear in the List Dumps section.

Automated configuration backup
We recommend that you periodically copy the configuration backup files off of the SVC cluster and store them in a safe location. There is a guide that explains how to set up a manual or scheduled task for the SVC Console server at:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&q1=pageant&uid=ssg1S1002175&loc=en_US&cs=utf-8&lang=en

3.2.11 Restoring the SVC cluster configuration

Do not attempt to restore the SVC configuration on your own. Call IBM Support and have them help you restore the configuration. Make sure that all other components are working as expected. For more information about common errors, refer to Chapter 14, “Troubleshooting and diagnostics” on page 269.

If you are unsure about what to do, call IBM Support and let them help you collect the necessary data.

Chapter 4. Storage controller

In this chapter, we discuss the following topics:

- Controller affinity and preferred path

- Pathing considerations for EMC Symmetrix/DMX and HDS

- Logical unit number (LUN) ID to MDisk translation

- MDisk to VDisk mapping

- Mapping physical logical block addresses (LBAs) to extents

- Media error logging

- Selecting array and cache parameters

- Considerations for controller configuration

- LUN masking

- Worldwide port name (WWPN) to physical port translation

- Using TotalStorage Productivity Center (TPC) to identify storage controller boundaries

- Using TPC to measure storage controller performance

4.1 Controller affinity and preferred path

In this section, we describe the architectural differences between common storage subsystems in terms of controller “affinity” (also referred to as preferred controller) and “preferred path.” In this context, affinity refers to the controller in a dual-controller subsystem that has been assigned access to the back-end storage for a specific LUN under nominal conditions (that is to say, both controllers are active). Preferred path refers to the host side connections that are physically connected to the controller that has the assigned affinity for the corresponding LUN being accessed.

All storage subsystems that incorporate a dual-controller architecture for hardware redundancy employ the concept of “affinity.” For example, if a subsystem has 100 LUNs, 50 of them have an affinity to controller 0, and 50 of them have an affinity to controller 1. This means that only one controller is serving any specific LUN at any specific instance in time; however, the aggregate workload for all LUNs is evenly spread across both controllers. This relationship exists during normal operation; however, each controller is capable of controlling all 100 LUNs in the event of a controller failure.

For the DS4000™ and DS6000™, preferred path is important, because the Fibre Channel cards are integrated into the controller. This architecture allows “dynamic” multipathing and “active/standby” pathing through Fibre Channel cards that are attached to the same controller (the SVC does not support dynamic multipathing), plus an alternate set of paths configured to the other controller, which are used if the corresponding controller fails.

For example, if each controller is attached to hosts through two Fibre Channel ports, 50 LUNs will use the two Fibre Channel ports in controller 0, and 50 LUNs will use the two Fibre Channel ports in controller 1. If either controller fails, the multipathing driver will fail the 50 LUNs associated with the failed controller over to the other controller and all 100 LUNs will use the two ports in the remaining controller. The DS4000 differs from the DS6000 and DS8000, because it has the capability to transfer ownership of LUNs at the LUN level as opposed to the controller level.

For the DS8000 and the Enterprise Storage Server® (ESS), the concept of preferred path is not used, because Fibre Channel cards are outboard of the controllers, and therefore, all Fibre Channel ports are available to access all LUNs regardless of cluster affinity. While cluster affinity still exists, the network between the outboard Fibre Channel ports and the controllers performs the appropriate controller “routing” as opposed to the DS4000 and DS6000 where controller routing is performed by the multipathing driver in the host, such as with IBM Subsystem Device Driver (SDD) and Redundant Disk Array Controller (RDAC).

4.1.1 ADT for DS4000

The DS4000 has a feature called Auto Logical Drive Transfer (ADT). This feature allows logical drive level failover as opposed to controller level failover. When you enable this option, the DS4000 moves LUN ownership between controllers according to the path used by the host.

For the SVC, the ADT feature is enabled by default when you select the “IBM TS SAN VCE” host type when you configure the DS4000.

Note: It is important that you select the “IBM TS SAN VCE” host type when configuring the DS4000 for SVC attachment in order to allow the SVC to properly manage the back-end paths. If the host type is incorrect, SVC will report a 1625 (“incorrect controller configuration”) error.

Refer to Chapter 14, “Troubleshooting and diagnostics” on page 269 for information regarding checking the back-end paths to storage controllers.

4.1.2 Ensuring path balance prior to MDisk discovery

It is important that LUNs are properly balanced across storage controllers prior to performing MDisk discovery. Failing to properly balance LUNs across storage controllers in advance can result in a suboptimal pathing configuration to the back-end disks, which can cause a performance degradation. Ensure that storage subsystems have all controllers online and that all LUNs have been distributed to their preferred controller (local affinity) prior to performing MDisk discovery. Pathing can always be rebalanced later, however, often not until after lengthy problem isolation has taken place.

If you discover that the LUNs are not evenly distributed across the dual controllers in a DS4000, you can dynamically change the LUN affinity. However, the SVC will move them back to the original controller, and the DS4000 will generate an error indicating that the LUN is no longer on its preferred controller. To correct this situation, you need to run the SVC command svctask detectmdisk or use the GUI option “Discover MDisks.” SVC will query the DS4000 again and access the LUNs through the new preferred controller configuration.
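
A short sketch of the rediscovery and verification steps follows; the controller ID is a placeholder:

  svctask detectmdisk
  svcinfo lscontroller 0
  svcinfo lsmdisk -filtervalue status=degraded

The detailed lscontroller view lists the path counts per controller port, which you can use to confirm that the MDisk paths are balanced again, and the filtered lsmdisk view confirms that no MDisks remain degraded.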

4.2 Pathing considerations for EMC Symmetrix/DMX and HDS

There are certain storage controller types that present a unique worldwide node name (WWNN) and worldwide port name (WWPN) for each port. This action can cause problems when attached to the SVC, because the SVC enforces a WWNN maximum of four per storage controller.

Because of this behavior, you must be sure to group the ports if you want to connect more than four target ports to an SVC. Refer to the IBM System Storage SAN Volume Controller Software Installation and Configuration Guide Version 4.3.0, SC23-6628-02, for instructions.

4.3 LUN ID to MDisk translation

The “Controller LUN Number” for MDisks is returned from the storage controllers in the “Report LUNs Data.” The following sections show how to decode the LUN ID from the report LUNs data for storage controllers ESS, DS6000, and DS8000.

4.3.1 ESS

The ESS uses 14 bits to represent the LUN ID, which the ESS Storage Specialist displays in hexadecimal (that is, it is in the range 0x0000 to 0x3FFF). To convert this 14-bit value to the SVC “Controller LUN Number”:

- Add 0x4000 to the LUN ID
- Append ‘00000000’

For example, LUN ID 1723 on an ESS corresponds to SVC controller LUN 572300000000.

4.3.2 DS6000 and DS8000

The DS6000 and DS8000 use 16 bits to represent the LUN ID, which decodes as:

40XX40YY0000, where XXYY is the 16-bit LUN ID

The LUN ID will only uniquely identify LUNs within the same storage controller. If multiple storage devices are attached to the same SVC cluster, the LUN ID needs to be combined with the WWNN attribute in order to uniquely identify LUNs within the SVC cluster. The SVC does not contain an attribute to identify the storage controller serial number; however, the Controller Name field can be used for this purpose and will simplify the LUN ID to MDisk translation.

The Controller Name field is populated with a default value at the time that the storage controller is initially configured to the SVC cluster. You must modify this field by using the SVC console selections: Work with Managed Disk → Disk Storage Controller → Rename a Disk Controller System.
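
The same rename can also be performed from the CLI. In this sketch, the controller ID and the serial number used in the new name are placeholders that follow the naming convention recommended in the Best Practice note below:

  svctask chcontroller -name DS8K75ABCDE 0

Verify the result with svcinfo lscontroller.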

Figure 4-1 shows LUN ID fields that are displayed from the DS8000 Storage Manager. LUN ID 1105, for example, appears as 401140050000 in the Controller LUN Number field on the SVC, which is shown in Figure 4-2 on page 61.

Figure 4-1 DS8K Storage Manager GUI

Best Practice: Include the storage controller serial number in the naming convention for the Controller Name field. For example, use DS8kABCDE for serial number 75-ABCDE.

Figure 4-2 MDisk details

From the MDisk details panel in Figure 4-2, the Controller LUN Number field is 4011400500000000, which translates to LUN ID 0x1105 (represented in Hex).

We can also identify the storage controller from the Controller Name as DS8K7598654, which had been manually assigned.

4.4 MDisk to VDisk mapping

There are instances where it is necessary to map an MDisk back to VDisks in order to determine the potential impact that a failing MDisk might have on attached hosts.

You can use the lsmdiskextent CLI command to obtain this information.

The lsmdiskextent output in Example 4-1 on page 62 shows a list of VDisk IDs that have extents allocated to mdisk14 along with the number of extents. The GUI also has a drop-down option to perform the same function for VDisks and MDisks.

Note: The command line interface (CLI) references the Controller LUN Number as ctrl_LUN_#.

Example 4-1 The lsmdiskextent command

IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk14
id number_of_extents copy_id
5  16 0
3  16 0
6  16 0
8  13 1
9  23 0
8  25 0

4.5 Mapping physical LBAs to VDisk extents

SVC 4.3 provides new functionality, which makes it easy to find the VDisk extent to which a physical MDisk LBA maps and to find the physical MDisk LBA to which the VDisk extent maps. There are a number of situations where this functionality might be useful:

- If a storage controller reports a medium error on a logical drive, but SVC has not yet taken MDisks offline, you might want to establish which VDisks will be affected by the medium error.

- When investigating application interaction with Space-Efficient VDisks (SEV), it can be useful to find out whether a given VDisk LBA has been allocated or not. If an LBA has been allocated when it has not intentionally been written to, it is possible that the application is not designed to work well with SEV.

The two new commands are svcinfo lsmdisklba and svcinfo lsvdisklba. Their output varies depending on the type of VDisk (for example, Space-Efficient as opposed to fully allocated) and type of MDisk (for example, quorum as opposed to non-quorum). For full details, refer to the SVC 4.3 Software Installation and Configuration Guide, SC23-6628-02.

4.5.1 Investigating a medium error using lsvdisklba

Assume that a medium error has been reported by the storage controller, at LBA 0x00172001 of MDisk 6. Example 4-2 shows the command that we use to discover which VDisk will be affected by this error.

Example 4-2 Using lsvdisklba to investigate the effect of an MDisk medium error

IBM_2145:itsosvccl1:admin>svcinfo lsvdisklba -mdisk 6 -lba 0x00172001
vdisk_id vdisk_name copy_id type LBA vdisk_start vdisk_end mdisk_start mdisk_end
0 diomede0 0 allocated 0x00102001 0x00100000 0x0010FFFF 0x00170000 0x0017FFFF

This output shows:

- This LBA maps to LBA 0x00102001 of VDisk 0.

- The LBA is within the extent that runs from 0x00100000 to 0x0010FFFF on the VDisk and from 0x00170000 to 0x0017FFFF on the MDisk (so, the extent size of this Managed Disk Group (MDG) is 32 MB).

So, if the host performs I/O to this LBA, the MDisk goes offline.

4.5.2 Investigating Space-Efficient VDisk allocation using lsmdisklba

After using an application to perform I/O to a Space-Efficient VDisk, you might want to check which extents have been allocated real capacity. You can do this with the svcinfo lsmdisklba command.

Example 4-3 shows the difference in output between an allocated and an unallocated part of a VDisk.

Example 4-3 Using lsmdisklba to check whether an extent has been allocated

IBM_2145:itsosvccl1:admin>svcinfo lsmdisklba -vdisk 0 -lba 0x0
copy_id mdisk_id mdisk_name type LBA mdisk_start mdisk_end vdisk_start vdisk_end
0 6 mdisk6 allocated 0x00050000 0x00050000 0x0005FFFF 0x00000000 0x0000FFFF

IBM_2145:itsosvccl1:admin>svcinfo lsmdisklba -vdisk 14 -lba 0x0
copy_id mdisk_id mdisk_name type LBA mdisk_start mdisk_end vdisk_start vdisk_end
0 unallocated 0x00000000 0x0000003F

VDisk 0 is a fully allocated VDisk, so the MDisk LBA information is displayed as in Example 4-2 on page 62.

VDisk 14 is a Space-Efficient VDisk to which the host has not yet performed any I/O; all of its extents are unallocated. Therefore, the only information shown by lsmdisklba is that it is unallocated and that this Space-Efficient grain starts at LBA 0x00 and ends at 0x3F (the grain size is 32 KB).

4.6 Medium error logging

Medium errors on back-end MDisks can be encountered by Host I/O and by SVC background functions, such as VDisk migration and FlashCopy. In this section, we describe the detailed sense data for medium errors presented to the host and the SVC.

4.6.1 Host-encountered media errors

Data checks encountered on a VDisk from a host read request will return check condition status with Key/Code/Qualifier = 030000.

Example 4-4 on page 64 shows an example of the detailed sense data returned to an AIX® host for an unrecoverable medium error.


Example 4-4 Sense data

LABEL:           SC_DISK_ERR2
IDENTIFIER:      B6267342

Date/Time:       Thu Aug 5 10:49:35 2008
Sequence Number: 4334
Machine Id:      00C91D3B4C00
Node Id:         testnode
Class:           H
Type:            PERM
Resource Name:   hdisk34
Resource Class:  disk
Resource Type:   2145
Location:        U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000
VPD:
        Manufacturer................IBM
        Machine Type and Model......2145
        ROS Level and ID............0000
        Device Specific.(Z0)........0000043268101002
        Device Specific.(Z1)........0200604
        Serial Number...............60050768018100FF78000000000000F6

SENSE DATA0A00 2800 001C ED00 0000 0104 0000 0000 0000 0000 0000 0000 0102 0000 F000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

From the sense byte decode:

� Byte 2 = SCSI Op Code (28 = 10-Byte Read)

� Bytes 4 - 7 = LBA (Logical Block Address for VDisk)

� Byte 30 = Key

� Byte 40 = Code

� Byte 41 = Qualifier
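If you decode sense data like this regularly, the byte offsets above can be extracted programmatically. The following Python sketch is illustrative only and is not part of AIX or the SVC; it assumes the sense data is supplied as the whitespace-separated hex string shown in Example 4-4 and that the byte offsets match the list above.

# Hypothetical helper: decode the fields listed above from an AIX errpt-style
# SENSE DATA hex dump (as in Example 4-4). Offsets are taken from the list above.
def decode_vdisk_sense(hex_dump: str) -> dict:
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    return {
        "scsi_op_code": raw[2],                        # 0x28 = 10-byte Read
        "vdisk_lba": int.from_bytes(raw[4:8], "big"),  # bytes 4 - 7
        "key": raw[30],
        "code": raw[40],
        "qualifier": raw[41],
    }

sense = ("0A00 2800 001C ED00 0000 0104 0000 0000 0000 0000 0000 0000 "
         "0102 0000 F000 0300 0000 0000 0000 0000 0000")
fields = decode_vdisk_sense(sense)
print(hex(fields["scsi_op_code"]), hex(fields["vdisk_lba"]),
      fields["key"], fields["code"], fields["qualifier"])
# 0x28 0x1ced00 3 0 0  -> Key/Code/Qualifier = 03/00/00 (medium error)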

4.6.2 SVC-encountered medium errors

Medium errors encountered by VDisk migration, FlashCopy, or VDisk Mirroring on the source disk are logically transferred to the corresponding destination disk for a maximum of 32 medium errors. If the 32 medium error limit is reached, the associated copy operation will terminate. Attempts to read destination error sites will result in medium errors just as though attempts were made to read the source media site.

Data checks encountered by SVC background functions are reported in the SVC error log as 1320 errors. The detailed sense data for these errors indicates a check condition status with Key/Code/Qualifier = 03110B.

Example 4-5 shows an example of an SVC error log entry for an unrecoverable media error.


Example 4-5 Error log entry

Error Log Entry 1965
Node Identifier       : Node7
Object Type           : mdisk
Object ID             : 48
Sequence Number       : 7073
Root Sequence Number  : 7073
First Error Timestamp : Thu Jul 24 17:44:13 2008 : Epoch + 1219599853
Last Error Timestamp  : Thu Jul 24 17:44:13 2008 : Epoch + 1219599853
Error Count           : 21

Error ID : 10025 : Amedia error has occurred during I/O to a Managed Disk Error Code : 1320 : Disk I/O medium error Status Flag : FIXED Type Flag : TRANSIENT ERROR 40 11 40 02 00 00 00 00 00 00 00 02 28 00 58 59 6D 80 00 00 40 00 00 00 00 00 00 00 00 00 80 00 04 02 00 02 00 00 00 00 00 01 0A 00 00 80 00 00 02 03 11 0B 80 6D 59 58 00 00 00 00 08 00 C0 AA 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 00 00 04 00 00 00 10 00 02 01

Where the sense byte decodes as:

� Byte 12 = SCSI Op Code (28 = 10-Byte Read)

� Bytes 14 - 17 = LBA (Logical Block Address for MDisk)

� Bytes 49 - 51 = Key/Code/Qualifier

Important: Attempting to locate medium errors on MDisks by scanning VDisks with host applications, such as dd, or using SVC background functions, such as VDisk migrations and FlashCopy, can cause the Managed Disk Group (MDG) to go offline as a result of error handling behavior in current levels of SVC microcode. This behavior will change in future levels of SVC microcode. Check with support prior to attempting to locate medium errors by any of these means.

Notes:

� Medium errors encountered on VDisks will log error code 1320 “Disk I/O medium error.”

� If more than 32 medium errors are found while data is being copied from one VDisk to another VDisk, the copy operation will terminate and log error code 1610 “Too many medium errors on Managed Disk.”


4.7 Selecting array and cache parameters

In this section, we describe the optimum array and cache parameters.

4.7.1 DS4000 array width

With Redundant Array of Independent Disks 5 (RAID 5) arrays, determining the number of physical drives to put into an array always presents a compromise. Striping across a larger number of drives can improve performance for transaction-based workloads, but it can also have a negative effect on sequential workloads. A common mistake when selecting array width is to focus only on the capability of a single array to handle various workloads; you must also consider the aggregate throughput requirements of the entire storage server. A large number of physical disks in an array can create a workload imbalance between the controllers, because only one controller of the DS4000 actively accesses a specific array.

When selecting array width, you must also consider its effect on rebuild time and availability.

A larger number of disks in an array increases the rebuild time for disk failures, which can have a negative effect on performance. Additionally, more disks in an array increase the probability of having a second drive fail within the same array prior to the rebuild completion of an initial drive failure, which is an inherent exposure to the RAID 5 architecture.

4.7.2 Segment size

With direct-attached hosts, device data partitions are often aligned to physical drive boundaries within the storage controller. For the SVC, this alignment is less critical because of the caching that the SVC provides and because there is less variation in the I/O profile that it uses to access back-end disks.

Because the maximum destage size for the SVC is 32 KB, it is impossible to achieve full stride writes for random workloads. For the SVC, the only opportunity for full stride writes occurs with large sequential workloads, and in that case, the larger the segment size is, the better. Larger segment sizes can adversely affect random I/O, however. The SVC and controller cache do a good job of hiding the RAID 5 write penalty for random I/O, and therefore, larger segment sizes can be accommodated. The primary consideration for selecting segment size is to ensure that a single host I/O will fit within a single segment to prevent accessing multiple physical drives.

Testing has shown that the best compromise for handling all workloads is to use a segment size of 256 KB.
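To see why full stride writes are only achievable for large sequential workloads, consider the arithmetic. The following Python sketch assumes an 8+p RAID 5 array with the recommended 256 KB segment size; the values are for illustration only.

# Sketch of the full stride write arithmetic for an 8+p RAID 5 array with the
# recommended 256 KB segment size (assumed values, for illustration only).
segment_kb = 256
data_drives = 8                       # 8+p array
stride_kb = segment_kb * data_drives  # data needed for one full stride write

svc_max_destage_kb = 32               # SVC cache destage is at most 32 KB

print(stride_kb)                        # 2048 KB per full stride
print(stride_kb // svc_max_destage_kb)  # 64 destages would be needed per stride
# A single random destage (<= 32 KB) can never cover a 2 MB stride, so full
# stride writes only occur when the controller coalesces large sequential
# streams; a single host I/O of 256 KB or less also stays on one drive.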

Best practice: For the DS4000, we recommend array widths of 4+p and 8+p.

Best practice: We recommend a segment size of 256 KB as the best compromise for all workloads.


Cache block size

The DS4000 uses a 4 KB cache block size by default; however, it can be changed to 16 KB.

For the earlier models of DS4000 using the 2 Gb Fibre Channel (FC) adapters, the 4 KB block size performs better for random I/O, and the 16 KB block size performs better for sequential I/O. However, because most workloads contain a mix of random and sequential I/O, the default value has proven to be the best choice. For the higher performing DS4700 and DS4800, the 4 KB block size advantage for random I/O has become harder to see. Because most client workloads involve at least some sequential workload, the best overall choice for these models is the 16 KB block size.

Best practice:

� For the DS4000, leave the cache block size at the default value of 4 KB.

� For the DS4700 and DS4800 models, set the cache block size to 16 KB.

Table 4-1 is a summary of the recommended SVC and DS4000 values.

Table 4-1 Recommended SVC values

Models          Attribute              Value
SVC             Extent size (MB)       256
SVC             Managed mode           Striped
DS4000          Segment size (KB)      256
DS4000          Cache block size (KB)  4 KB (default)
DS4700/DS4800   Cache block size (KB)  16 KB
DS4000          Cache flush control    80/80 (default)
DS4000          Readahead              1
DS4000          RAID 5                 4+p, 8+p

4.7.3 DS8000

For the DS8000, you cannot tune the array and cache parameters. The arrays are either 6+p or 7+p, depending on whether the array site contains a spare, and the segment size (the contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes. Caching for the DS8000 is done on a 64 KB track boundary.

4.8 Considerations for controller configuration

In this section, we discuss controller configuration considerations.

4.8.1 Balancing workload across DS4000 controllers

A best practice when creating arrays is to spread the disks across multiple controllers, as well as alternating slots, within the enclosures. This practice improves the availability of the array by protecting against enclosure failures that affect multiple members within the array, as well as improving performance by distributing the disks within an array across drive loops. You spread the disks across multiple controllers, as well as alternating slots, within the enclosures by using the manual method for array creation.

Figure 4-3 shows a Storage Manager view of a 2+p array that is configured across enclosures. Here, we can see that each of the three disks is in a separate physical enclosure and that slot positions alternate from enclosure to enclosure.

Figure 4-3 Storage Manager

4.8.2 Balancing workload across DS8000 controllers

When configuring storage on the IBM System Storage DS8000 disk storage subsystem, it is important to ensure that ranks on a device adapter (DA) pair are evenly balanced between odd and even extent pools. Failing to do this can result in a considerable performance degradation due to uneven device adapter loading.

The DS8000 assigns server (controller) affinity to ranks when they are added to an extent pool. Ranks that belong to an even-numbered extent pool have an affinity to server0, and ranks that belong to an odd-numbered extent pool have an affinity to server1.

Figure 4-4 on page 69 shows an example of a configuration that will result in a 50% reduction in available bandwidth. Notice how arrays on each of the DA pairs are only being accessed by one of the adapters. In this case, all ranks on DA pair 0 have been added to even-numbered extent pools, which means that they all have an affinity to server0, and therefore, the adapter in server1 is sitting idle. Because this condition is true for all four DA pairs, only half of the adapters are actively performing work. This condition can also occur on a subset of the configured DA pairs.


Figure 4-4 DA pair reduced bandwidth configuration

Example 4-6 shows what this invalid configuration looks like from the CLI output of the lsarray and lsrank commands. The important thing to notice here is that arrays residing on the same DA pair contain the same group number (0 or 1), meaning that they have affinity to the same DS8000 server (server0 is represented by group0 and server1 is represented by group1).

As an example of this situation, arrays A0 and A4 can be considered. They are both attached to DA pair 0, and in this example, both arrays are added to an even-numbered extent pool (P0 and P4). Doing so means that both ranks have affinity to server0 (represented by group0), leaving the DA in server1 idle.

Example 4-6 Command output

dscli> lsarray -l
Date/Time: Aug 8, 2008 8:54:58 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
Array State  Data   RAID type arsite Rank DA Pair DDMcap(10^9B) diskclass
===================================================================================
A0    Assign Normal 5 (6+P+S) S1     R0   0       146.0         ENT
A1    Assign Normal 5 (6+P+S) S9     R1   1       146.0         ENT
A2    Assign Normal 5 (6+P+S) S17    R2   2       146.0         ENT
A3    Assign Normal 5 (6+P+S) S25    R3   3       146.0         ENT
A4    Assign Normal 5 (6+P+S) S2     R4   0       146.0         ENT
A5    Assign Normal 5 (6+P+S) S10    R5   1       146.0         ENT
A6    Assign Normal 5 (6+P+S) S18    R6   2       146.0         ENT
A7    Assign Normal 5 (6+P+S) S26    R7   3       146.0         ENT

dscli> lsrank -l
Date/Time: Aug 8, 2008 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 0     Normal Normal    A4    5        P4        extpool4   fb      779  779
R5 1     Normal Normal    A5    5        P5        extpool5   fb      779  779
R6 0     Normal Normal    A6    5        P6        extpool6   fb      779  779
R7 1     Normal Normal    A7    5        P7        extpool7   fb      779  779

Figure 4-5 shows an example of a correct configuration that balances the workload across all four DA pairs.

Figure 4-5 DA pair correct configuration

Example 4-7 shows what this correct configuration looks like from the CLI output of the lsrank command. The configuration from the lsarray output remains unchanged. Notice that arrays residing on the same DA pair are now split between groups 0 and 1. Looking at arrays A0 and A4 once again now shows that they have different affinities (A0 to group0, A4 group1). To achieve this correct configuration, what has been changed compared to Example 4-6 on page 69 is that array A4 now belongs to an odd-numbered extent pool (P5).

Example 4-7 Command output

dscli> lsrank -l
Date/Time: Aug 9, 2008 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 1     Normal Normal    A4    5        P5        extpool5   fb      779  779
R5 0     Normal Normal    A5    5        P4        extpool4   fb      779  779
R6 1     Normal Normal    A6    5        P7        extpool7   fb      779  779
R7 0     Normal Normal    A7    5        P6        extpool6   fb      779  779
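This kind of imbalance can be caught by cross-checking lsarray and lsrank output before VDisks are created. The following Python sketch is a hypothetical helper, not a DSCLI function: it takes the array-to-DA pair mapping from lsarray and the array-to-group mapping from lsrank (here, the values from Example 4-6) and flags any DA pair whose ranks all have affinity to the same server.

# Hypothetical check: flag DS8000 DA pairs whose ranks all belong to the same
# server (group), using values copied from lsarray -l and lsrank -l output.
from collections import defaultdict

# array -> DA pair (from lsarray -l) and array -> group (from lsrank -l, Example 4-6)
da_pair = {"A0": 0, "A1": 1, "A2": 2, "A3": 3, "A4": 0, "A5": 1, "A6": 2, "A7": 3}
group   = {"A0": 0, "A1": 1, "A2": 0, "A3": 1, "A4": 0, "A5": 1, "A6": 0, "A7": 1}

groups_per_da = defaultdict(set)
for array, da in da_pair.items():
    groups_per_da[da].add(group[array])

for da in sorted(groups_per_da):
    if len(groups_per_da[da]) == 1:
        print(f"DA pair {da}: all ranks on server{groups_per_da[da].pop()} - unbalanced")
    else:
        print(f"DA pair {da}: balanced across both servers")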


4.8.3 DS8000 ranks to extent pools mapping

When configuring the DS8000, two different approaches for the rank to extent pools mapping exist:

� One rank per extent pool

� Multiple ranks per extent pool using DS8000 Storage Pool Striping (SPS)

The most common approach is to map one rank to one extent pool, which provides good control for volume creation, because it ensures that all volume allocation from the selected extent pool will come from the same rank.

The SPS feature became available with the R3 microcode release for the DS8000 series and effectively means that a single DS8000 volume can be striped across all the ranks in an extent pool (therefore, the functionality is often referred to as “extent pool striping”). So, if a given extent pool includes more than one rank, a volume can be allocated using free space from several ranks (which also means that SPS can only be enabled at volume creation; no reallocation is possible).

The SPS feature requires that your DS8000 layout has been well thought-out from the beginning to utilize all resources in the DS8000. If this is not done, SPS might cause severe performance problems (for example, if configuring a heavily loaded extent pool with multiple ranks from the same DA pair). Because the SVC itself stripes across MDisks, the SPS feature is not as relevant here as when accessing the DS8000 directly.

Best practice: Configure one rank per extent pool if using DS8000 R1 or R2 microcode versions.

If using DS8000 R3 or later microcode versions, only configure Storage Pool Striping after contacting IBM to have your design verified.

4.8.4 Mixing array sizes within an MDG

Mixing array sizes within an MDG is generally not a concern. Testing has shown no measurable performance differences between selecting all 6+p arrays or all 7+p arrays as opposed to mixing 6+p and 7+p arrays. In fact, mixing array sizes can actually help balance workload, because it places more data on the ranks that have the extra performance capability provided by the eighth disk. There is one small exposure in the case where an insufficient number of the larger arrays is available to handle access to the higher capacity. To avoid this situation, ensure that the smaller capacity arrays do not represent more than 50% of the total number of arrays within the MDG.

Best practice: When mixing 6+p arrays and 7+p arrays in the same MDG, avoid having smaller capacity arrays comprise more than 50% of the arrays.

4.8.5 Determining the number of controller ports for ESS/DS8000

Configure a minimum of eight controller ports to the SVC per controller regardless of the number of nodes in the cluster. Configure 16 controller ports for large controller configurations where more than 48 ranks are being presented to the SVC cluster.

Additionally, we recommend that no more than two ports of each of the DS8000’s 4-port adapters are used.



Table 4-2 shows the recommended number of ESS/DS8000 ports and adapters based on rank count.

Table 4-2 Recommended number of ports and adapters

Ranks    Ports   Adapters
2 - 48   8       4 - 8
> 48     16      8 - 16

The ESS and DS8000 populate Fibre Channel (FC) adapters across two to eight I/O enclosures, depending on configuration. Each I/O enclosure represents a separate hardware domain.

Ensure that adapters configured to different SAN networks do not share the same I/O enclosure as part of our goal of keeping redundant SAN networks isolated from each other.

Best practices that we recommend:

� Configure a minimum of eight ports per DS8000.

� Configure 16 ports per DS8000 when > 48 ranks are presented to the SVC cluster.

� Configure a maximum of two ports per four port DS8000 adapter.

� Configure adapters across redundant SAN networks from different I/O enclosures.

4.8.6 Determining the number of controller ports for DS4000

The DS4000 must be configured with two ports per controller for a total of four ports per DS4000.

4.9 LUN masking

For a given storage controller, all SVC nodes must see the same set of LUNs from all target ports that have logged into the SVC nodes. If target ports are visible to the nodes that do not have the same set of LUNs assigned, SVC treats this situation as an error condition and generates error code 1625.

Validating the LUN masking from the storage controller and then confirming the correct path count from within the SVC are critical.

Example 4-8 shows four LUNs being presented from a DS8000 storage controller to a 4-node SVC cluster.

The DS8000 performs LUN masking based on volume group. Example 4-8 shows showvolgrp output for volume group V0, which contains four LUNs.

Example 4-8 The showvolgrp command output

dscli> showvolgrp -dev IBM.2107-75ALNN1 V0
Date/Time: August 15, 2008 10:12:33 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name SVCVG0
ID   V0
Type SCSI Mask
Vols 1000 1001 1004 1005

Example 4-9 shows lshostconnect output from the DS8000. Here, you can see that all 16 ports of the 4-node cluster are assigned to the same volume group (V0) and, therefore, have been assigned to the same four LUNs.

Example 4-9 The lshostconnect command output

dscli> lshostconnect -dev IBM.2107-75ALNN1
Date/Time: August 14, 2008 11:51:31 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name         ID   WWPN             HostType Profile               portgrp volgrpID ESSIOport
===============================================================================
svcnode      0000 5005076801302B3E SVC      San Volume Controller 0       V0       all
svcnode      0001 5005076801302B22 SVC      San Volume Controller 0       V0       all
svcnode      0002 5005076801202D95 SVC      San Volume Controller 0       V0       all
svcnode      0003 5005076801402D95 SVC      San Volume Controller 0       V0       all
svcnode      0004 5005076801202BF1 SVC      San Volume Controller 0       V0       all
svcnode      0005 5005076801402BF1 SVC      San Volume Controller 0       V0       all
svcnode      0006 5005076801202B3E SVC      San Volume Controller 0       V0       all
svcnode      0007 5005076801402B3E SVC      San Volume Controller 0       V0       all
svcnode      0008 5005076801202B22 SVC      San Volume Controller 0       V0       all
svcnode      0009 5005076801402B22 SVC      San Volume Controller 0       V0       all
svcnode      000A 5005076801102D95 SVC      San Volume Controller 0       V0       all
svcnode      000B 5005076801302D95 SVC      San Volume Controller 0       V0       all
svcnode      000C 5005076801102BF1 SVC      San Volume Controller 0       V0       all
svcnode      000D 5005076801302BF1 SVC      San Volume Controller 0       V0       all
svcnode      000E 5005076801102B3E SVC      San Volume Controller 0       V0       all
svcnode      000F 5005076801102B22 SVC      San Volume Controller 0       V0       all
fd11asys     0010 210100E08BA5A4BA VMWare   VMWare                0       V1       all
fd11asys     0011 210000E08B85A4BA VMWare   VMWare                0       V1       all
mdms024_fcs0 0012 10000000C946AB14 pSeries  IBM pSeries - AIX     0       V2       all
mdms024_fcs1 0013 10000000C94A0B97 pSeries  IBM pSeries - AIX     0       V2       all
parker_fcs0  0014 10000000C93134B3 pSeries  IBM pSeries - AIX     0       V3       all
parker_fcs1  0015 10000000C93139D9 pSeries  IBM pSeries - AIX     0       V3       all

Additionally, you can see from the lshostconnect output that only the SVC WWPNs are assigned to V0.

Next, we show you how the SVC sees these LUNs if the zoning is properly configured.

The Managed Disk Link Count represents the total number of MDisks presented to the SVC cluster.

Figure 4-6 on page 74 shows the storage controller general details. To display this panel, we selected Work with Managed Disks → Disk Controller Systems → View General Details.

In this case, we can see that the Managed Disk Link Count is 4, which is correct for our example.

Important: Data corruption can occur if LUNs are assigned to both SVC nodes and non-SVC nodes, that is, direct-attached hosts.


Figure 4-6 Viewing General Details

Figure 4-7 shows the storage controller port details. To get to this panel, we selected Work with Managed Disks → Disk Controller Systems → View General Details → Ports.

Figure 4-7 Viewing Port Details

Here, a path represents a connection from a single node to a single LUN. Because we have four nodes and four LUNs in this example configuration, we expect to see a total of 16 paths with all paths evenly distributed across the available storage ports. We have validated that this configuration is correct, because we see eight paths on one WWPN and eight paths on the other WWPN for a total of 16 paths.
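The expected path count is simply the number of nodes multiplied by the number of LUNs, split evenly across the controller ports. The following Python sketch expresses that check; the per-WWPN counts are assumed to have been read from the port details panel, and the WWPN names are placeholders.

# Sketch: validate the SVC-to-controller path count for this example
# (4 nodes x 4 LUNs = 16 paths). The WWPN keys below are placeholders.
nodes, luns = 4, 4
paths_per_wwpn = {"controller_wwpn_1": 8, "controller_wwpn_2": 8}

expected_total = nodes * luns
assert sum(paths_per_wwpn.values()) == expected_total, "missing or extra paths"

expected_per_port = expected_total // len(paths_per_wwpn)
for wwpn, count in paths_per_wwpn.items():
    status = "OK" if count == expected_per_port else "check zoning and LUN masking"
    print(wwpn, count, status)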

4.10 WWPN to physical port translation

Storage controller WWPNs can be translated to physical ports on the controllers for isolation and debugging purposes. Additionally, you can use this information for validating redundancy across hardware boundaries.

In Example 4-10, we show the WWPN to physical port translations for the ESS.

Example 4-10 ESS

WWPN format for ESS = 5005076300XXNNNN

XX = adapter location within storage controller
NNNN = unique identifier for storage controller

Bay  R1-B1 R1-B1 R1-B1 R1-B1 R1-B2 R1-B2 R1-B2 R1-B2
Slot H1    H2    H3    H4    H1    H2    H3    H4
XX   C4    C3    C2    C1    CC    CB    CA    C9

Bay  R1-B3 R1-B3 R1-B3 R1-B3 R1-B4 R1-B4 R1-B4 R1-B4
Slot H1    H2    H3    H4    H1    H2    H3    H4
XX   C8    C7    C6    C5    D0    CF    CE    CD

In Example 4-11, we show the WWPN to physical port translations for the DS8000.

Example 4-11 DS8000

WWPN format for DS8000 = 5005076303XXYNNN

XX = adapter location within storage controller
Y = port number within 4-port adapter
NNN = unique identifier for storage controller

IO Bay B1          B2          B3          B4
Slot   S1 S2 S4 S5 S1 S2 S4 S5 S1 S2 S4 S5 S1 S2 S4 S5
XX     00 01 03 04 08 09 0B 0C 10 11 13 14 18 19 1B 1C

IO Bay B5          B6          B7          B8
Slot   S1 S2 S4 S5 S1 S2 S4 S5 S1 S2 S4 S5 S1 S2 S4 S5
XX     20 21 23 24 28 29 2B 2C 30 31 33 34 38 39 3B 3C

Port P1 P2 P3 P4
Y    0  4  8  C
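Because the translation is a fixed lookup, it can be scripted. The following Python sketch is a hypothetical decoder (not an IBM tool); it assumes the 5005076303XXYNNN layout shown above and uses the XX and Y tables from Example 4-11.

# Hypothetical decoder for DS8000 WWPNs, based on the XX and Y tables in
# Example 4-11 (format assumed to be 5005076303XXYNNN).
XX_TO_BAY_SLOT = {}
for bay, xx_codes in enumerate([["00", "01", "03", "04"], ["08", "09", "0B", "0C"],
                                ["10", "11", "13", "14"], ["18", "19", "1B", "1C"],
                                ["20", "21", "23", "24"], ["28", "29", "2B", "2C"],
                                ["30", "31", "33", "34"], ["38", "39", "3B", "3C"]], start=1):
    for slot, xx in zip(("S1", "S2", "S4", "S5"), xx_codes):
        XX_TO_BAY_SLOT[xx] = (f"B{bay}", slot)

Y_TO_PORT = {"0": "P1", "4": "P2", "8": "P3", "C": "P4"}

def decode_ds8000_wwpn(wwpn: str) -> dict:
    """Split a DS8000 WWPN into I/O bay, slot, port, and unit identifier."""
    wwpn = wwpn.upper()
    xx, y, nnn = wwpn[10:12], wwpn[12], wwpn[13:16]
    bay, slot = XX_TO_BAY_SLOT[xx]
    return {"bay": bay, "slot": slot, "port": Y_TO_PORT[y], "unit_id": nnn}

print(decode_ds8000_wwpn("5005076303084ABC"))  # hypothetical WWPN
# {'bay': 'B2', 'slot': 'S1', 'port': 'P2', 'unit_id': 'ABC'}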

4.11 Using TPC to identify storage controller boundaries

It is often desirable to map the virtualization layer to determine which VDisks and hosts are utilizing resources for a specific hardware boundary on the storage controller, for example, when a specific hardware component, such as a disk drive, is failing, and the administrator is interested in performing an application level risk assessment. Information learned from this type of analysis can lead to actions taken to mitigate risks, such as scheduling application downtime, performing VDisk migrations, and initiating FlashCopy. TPC allows the mapping of the virtualization layer to occur quickly, and using TPC eliminates mistakes that can be made by using a manual approach.

Figure 4-8 shows how a failing disk on a storage controller can be mapped to the MDisk that is being used by an SVC cluster. To display this panel, click Physical Disk → RAID5 Array → Logical Volume → MDisk.

Figure 4-8 Mapping MDisk

Figure 4-9 on page 77 completes the end-to-end view by mapping the MDisk through the SVC to the attached host. Click MDisk → MDGroup → VDisk → host disk.


Figure 4-9 Host mapping

4.12 Using TPC to measure storage controller performance

In this section, we provide a brief introduction to performance monitoring for the SVC back-end disk. When talking about storage controller performance, the back-end I/O rate refers to the rate of I/O between the storage controller cache and the storage arrays. In an SVC environment, back-end I/O is also used to refer to the rate of I/O between the SVC nodes and the controllers. Both rates are considered when monitoring storage controller performance.

The two most important metrics when measuring I/O subsystem performance are response time in milliseconds and throughput in I/Os per second (IOPS):

� Response time in non-SVC environments is measured from when the host issues a command to when the storage controller reports that the command has completed. With the SVC, we not only have to consider response time from the host to the SVC nodes, but also from the SVC nodes to the storage controllers.

� Throughput, however, can be measured at a variety of points along the data path, and the SVC adds additional points where throughput is of interest and measurements can be obtained.

TPC offers many disk performance reporting options that support the SVC environment well and also support the storage controller back end for a variety of storage controller types. The most relevant storage components where performance metrics can be collected when monitoring storage controller performance include:

� Subsystem

� Controller

� Array

� MDisk

� MDG

� Port

Note: In SVC environments, the SVC nodes interact with the storage controllers in the same way as a host. Therefore, the performance rules and guidelines that we discuss in this section are also applicable to non-SVC environments. References to MDisks are analogous with host-attached LUNs in a non-SVC environment.


4.12.1 Normal operating ranges for various statistics

While the exact figures seen depend on both the type of equipment and the workload, certain assumptions can be made about the normal range of figures that will be achievable. If TPC reports results outside of this range, it is likely to indicate a problem, such as overloading or component failure:

� Throughput for storage volumes can range from 1 IOPS to more than 1 000 IOPS based mostly on the nature of the application. The I/O rates for an MDisk approach 1 000 IOPS when that MDisk is encountering extremely good controller cache behavior; otherwise, such high I/O rates are impossible. If the SVC is issuing large I/Os (for example, on a FlashCopy with large grain size), the IOPS figure will be lower for a given data transfer rate.

� A 10 millisecond response time is generally considered to be getting high; however, it might be perfectly acceptable depending on the application behavior and requirements. For example, many online transaction processing (OLTP) environments require response times in the 5 to 8 millisecond range, while batch applications with large sequential transfers are operating nominally in the 15 - 30 millisecond range.

� Nominal service times for disks today are 5 - 7 milliseconds; however, when a disk is at 50% utilization, ordinary queuing adds a wait time roughly equal to the service time, so a 10 - 14 millisecond response time is a reasonable goal in most environments.

� High controller cache hit ratios allow the back-end arrays to run at a higher utilization. A 70% array utilization produces high array response times; however, when averaged with cache hits, they produce acceptable average response times. High SVC read hit ratios can have the same effect on array utilization in that they will allow higher MDisk utilizations and, therefore, higher array response times.

� Poor cache hit ratios require good back-end response times.

� Front-end response times typically must be in the 5 - 15 millisecond range.

� Back-end response times to arrays can usually operate in the 20 - 25 millisecond range up to 60 milliseconds unless the cache hit ratio is low.

4.12.2 Establish a performance baseline

I/O rate often grows over time, and as I/O rates increase, response times also increase. It is important to establish a good performance baseline so that the growth effects of the I/O workload can be monitored and trends identified that can be used to predict when additional storage performance and capacity will be required.

4.12.3 Performance metric guidelines

Best practices that we recommend:

� Derive the best (as a general rule) metrics for any system from current and historical data taken from specific configurations and workloads that are meeting application and user requirements.

� Collect new sets of metrics after configuration changes are made to the storage controller configuration or the MDG configuration, such as adding or removing MDisks.

� Keep a historical record of performance metrics.

Several performance metric guidelines are:


� Small block reads (4 KB to 8 KB) must have average response times in the 2 - 15 millisecond range.

� Small block writes must have response times near 1 millisecond, because these small block writes are all cache hits. High response times with small block writes often indicate nonvolatile storage (NVS) full conditions.

� With large block reads and writes (32 KB or greater), response times are insignificant as long as throughput objectives are met.

� Read hit percentage can vary from 0% to near 100%. Anything lower than 50% is considered low; however, many database applications can run under 30%. Cache hit ratios are mostly dependent on application design. Larger cache always helps and allows back-end arrays to be driven at a higher utilization.

� Storage controller back-end read response times must not exceed 25 milliseconds unless the cache read hit ratio is near 99%.

� Storage controller back-end write response times can be high due to the RAID 5 and RAID 10 write penalties; however, they must not exceed 60 milliseconds.

� Array throughput above 700 - 800 IOPS can start impacting front-end performance.

� Port response times must be less than 2 milliseconds for most I/O; however, they can reach as high as 5 milliseconds with large transfer sizes.
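These guidelines can be turned into a simple rule check against collected metrics. The following Python sketch encodes a few of the thresholds from the list above; the sample metric values and the exact flagging cut-offs (for example, treating small block writes well above 1 ms as suspicious) are assumptions for illustration, not an official tool.

# Sketch: check a set of collected metrics against the guideline thresholds
# listed above (sample metric values are hypothetical).
def check_metrics(m):
    findings = []
    if m["small_read_ms"] > 15:
        findings.append("small block read response time above 15 ms")
    if m["small_write_ms"] > 2:  # guideline is near 1 ms; well above that suggests NVS full
        findings.append("small block write response time high: possible NVS full condition")
    if m["backend_read_ms"] > 25 and m["read_hit_pct"] < 99:
        findings.append("back-end read response time above 25 ms")
    if m["backend_write_ms"] > 60:
        findings.append("back-end write response time above 60 ms")
    if m["array_iops"] > 700:
        findings.append("array throughput above 700 - 800 IOPS: may impact front-end performance")
    return findings or ["within guideline ranges"]

sample = {"small_read_ms": 9, "small_write_ms": 0.8, "backend_read_ms": 31,
          "read_hit_pct": 72, "backend_write_ms": 40, "array_iops": 950}
for finding in check_metrics(sample):
    print(finding)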

Figure 4-10 is a TPC graph showing aggregate throughput for several ESS arrays. In this case, all arrays have throughput lower than 700 IOPS.

Figure 4-10 Overall I/O rate for ESS subsystems


4.12.4 Storage controller back end

The back-end I/O rate is the rate of I/O between storage subsystem cache and the storage arrays. Write activity to the back-end disk comes from cache and is normally an asynchronous operation that destages data from cache to disk in order to free space in NVS.

One of the more common conditions that can impact overall performance is array overdriving. TPC allows metrics to be collected and graphed for arrays, either individually or as a group. Figure 4-11 is a TPC graph showing response times for all ESS arrays that are being monitored. This graph shows that certain arrays are regularly peaking over 200 ms, which indicates overloading.

Figure 4-11 ESS back-end response times showing overloading

Array response times depend on many factors, including disk RPM and the array configuration. However, in all cases, when the I/O rate nears or exceeds 1 000 IOPS, the array is extremely busy.

Table 4-3 shows the upper limit for several disk speeds and array widths. Remember that while these I/O rates can be achieved, they imply considerable queuing delays and high response times.

Table 4-3 Maximum IOPS for different DDM speeds

DDM speed          Single drive (IOPS)   6+P array (IOPS)   7+P array (IOPS)
10 K               150 - 175             900 - 1050         1050 - 1225
15 K               200 - 225             1200 - 1350        1400 - 1575
7.2 K (near-line)  85 - 110              510 - 660          595 - 770


These numbers can vary significantly depending on cache hit ratios, block size, and service time.

Rule: 1 000 IOPS indicates an extremely busy array and can impact front-end response times.
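The ranges in Table 4-3 are consistent with multiplying the single-drive IOPS range by the number of data drives in the array (six or seven). The following Python sketch reproduces the table on that basis; it is an observation about the table, not an official sizing formula.

# Sketch: reproduce the Table 4-3 ranges by scaling single-drive IOPS by the
# number of data drives in the array (an observation about the table only).
single_drive_iops = {"10K": (150, 175), "15K": (200, 225), "7.2K near-line": (85, 110)}

for ddm, (low, high) in single_drive_iops.items():
    for data_drives, label in ((6, "6+P"), (7, "7+P")):
        print(f"{ddm:15s} {label}: {low * data_drives} - {high * data_drives} IOPS")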


Chapter 5. MDisks

In this chapter, we discuss various MDisk attributes, as well as provide an overview of the process of adding and removing MDisks from existing Managed Disk Groups (MDGs).

In this chapter, we discuss the following topics:

� Back-end queue depth

� MDisk transfer size

� Selecting logical unit number (LUN) attributes for MDisks

� Tiered storage

� Adding MDisks to existing MDGs

� Restriping (balancing) extents across an MDG

� Remapping managed MDisks

� Controlling extent allocation order for VDisk creation


5.1 Back-end queue depth

SVC submits I/O to the back-end (MDisk) storage in the same fashion as any direct-attached host. For direct-attached storage, the queue depth is tunable at the host and is often optimized based on specific storage type as well as various other parameters, such as the number of initiators. For the SVC, the queue depth is also tuned; however, the optimal value used is calculated internally.

Note that the exact algorithm used to calculate queue depth is subject to change. Do not rely upon the following details staying the same. However, this summary is true of SVC 4.3.0.

There are two parts to the algorithm: a per MDisk limit and a per controller port limit.

Q = ((P x C) / N) / M

If Q > 60, then Q=60 (maximum queue depth is 60)

If Q < 3, then Q=3 (minimum queue depth is 3)

In this algorithm:

Q = The queue for any MDisk in a specific controller

P = Number of WWPNs visible to SVC in a specific controller

N = Number of nodes in the cluster

M = Number of MDisks provided by the specific controller

C = A constant. C varies by controller type:

– FAStT200, 500, DS4100, and EMC CLARiiON = 200

– DS4700, DS4800, DS6K, and DS8K = 1000

– Any other controller = 500

When the SVC has Q I/Os outstanding for a single MDisk (that is, it is waiting for Q I/Os to complete), it will not submit any more I/O until some of the outstanding I/O completes. Instead, any new I/O requests for that MDisk will be queued inside the SVC, which is undesirable and indicates that the back-end storage is overloaded.

The following example shows how a 4-node SVC cluster calculates queue depth for 150 LUNs on a DS8000 storage controller using six target ports:

Q = ((6 ports x 1000 per port) / 4 nodes) / 150 MDisks = 10

With this configuration, each MDisk has a queue depth of 10.
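The same calculation can be expressed as a short Python sketch. This is only a restatement of the algorithm summarized above for SVC 4.3.0; the real value is computed internally by the SVC and, as noted, the algorithm is subject to change.

# Sketch of the SVC 4.3.0 back-end queue depth calculation described above.
def mdisk_queue_depth(ports, nodes, mdisks, constant):
    q = ((ports * constant) / nodes) / mdisks
    return max(3, min(60, int(q)))  # clamp to the 3..60 range

# Worked example from the text: 4-node cluster, DS8000 (C = 1000),
# 6 target ports, 150 MDisks.
print(mdisk_queue_depth(ports=6, nodes=4, mdisks=150, constant=1000))  # 10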

5.2 MDisk transfer size

The size of I/O that the SVC performs to the MDisk depends on where the I/O originated.


5.2.1 Host I/O

The maximum transfer size under normal I/O is 32 KB, because the internal cache track size is 32 KB, and, therefore, destages from cache can be up to the cache track size. Although a track can hold up to 32 KB, a read or write operation can only partially populate the track; therefore, a read or write operation to the MDisks can be anywhere from 512 bytes to 32 KB.

5.2.2 FlashCopy I/O

The transfer size for FlashCopy is always 256 KB, because the grain size of FlashCopy is 256 KB and any size write that changes data within a 256 KB grain will result in a single 256 KB write.

5.2.3 Coalescing writes

The SVC coalesces writes up to the 32 KB track size if the writes reside in the same tracks prior to destage. For example, if 4 KB is written into a track and another 4 KB is then written to another location in the same track, the track moves to the bottom of the least recently used (LRU) list in the cache upon the second write and now contains 8 KB of actual data. This process can continue until the track reaches the top of the LRU list and is destaged; the data is written to the back-end disk and removed from the cache. Any contiguous data within the track is coalesced for the destage.

Sequential writes

The SVC does not employ a caching algorithm for “explicit sequential detect,” which means coalescing of writes in SVC cache has a random component to it. For example, 4 KB writes to VDisks will translate to a mix of 4 KB, 8 KB, 16 KB, 24 KB, and 32 KB transfers to the MDisks with reducing probability as the transfer size grows.

Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect on the controller’s ability to detect and coalesce sequential content to achieve “full stride writes.”

Sequential reads

The SVC uses “prefetch” logic for staging reads based on statistics maintained on 128 MB regions. If the sequential content is sufficiently high within a region, prefetch occurs with 32 KB reads.

5.3 Selecting LUN attributes for MDisks

The selection of LUN attributes requires the following primary considerations:

� Selecting array size

� Selecting LUN size

� Number of LUNs per array

� Number of physical disks per array

Important: We generally recommend that LUNs are created to use the entire capacity of the array as described in 6.2, “Selecting the number of LUNs per array” on page 104.


Capacity planning consideration

When configuring MDisks to MDGs, we advise that you consider leaving a small amount of MDisk capacity that can be used as “swing” (spare) capacity for image mode VDisk migrations. A good general rule is to allow spare capacity equal to the average capacity of the configured VDisks.

Selecting MDisks for MDGs

All LUNs for MDG creation must have the same performance characteristics. If MDisks of varying performance levels are placed in the same MDG, the performance of the MDG can be reduced to the level of the poorest performing MDisk. Likewise, all LUNs must also possess the same availability characteristics. Remember that the SVC does not provide any Redundant Array of Independent Disks (RAID) capabilities within an MDG. The loss of access to any one of the MDisks within the MDG impacts the entire MDG. However, with the introduction of VDisk Mirroring in SVC 4.3, you can protect against the loss of an MDG by mirroring a VDisk across multiple MDGs. Refer to Chapter 7, “VDisks” on page 119 for more information.

We recommend these best practices for LUN selection within an MDG:

� LUNs are the same type.

� LUNs are the same RAID level.

� LUNs are the same RAID width (number of physical disks in array).

� LUNs have the same availability and fault tolerance characteristics.

MDisks created on LUNs with varying performance and availability characteristics must be placed in separate MDGs.

RAID 5 compared to RAID 10

In general, RAID 10 arrays are capable of higher throughput for random write workloads than RAID 5, because RAID 10 only requires two I/Os per logical write compared to four I/Os per logical write for RAID 5. For random reads and sequential workloads, there is typically no benefit. With certain workloads, such as sequential writes, RAID 5 often shows a performance advantage.

Obviously, selecting RAID 10 for its performance advantage comes at an extremely high cost in usable capacity, and, in most cases, RAID 5 is the best overall choice.

When considering RAID 10, we recommend that you use DiskMagic to determine the difference in I/O service times between RAID 5 and RAID 10. If the service times are similar, the lower cost solution makes the most sense. If RAID 10 shows a service time advantage over RAID 5, the importance of that advantage must be weighed against its additional cost.

5.4 Tiered storage

The SVC makes it easy to configure multiple tiers of storage within the same SVC cluster. As we discussed in 5.3, “Selecting LUN attributes for MDisks” on page 85, it is important that MDisks that belong to the same MDG share the same availability and performance characteristics; however, grouping LUNs of like performance and availability within MDGs is an attractive feature of SVC. You can define tiers of storage using storage controllers of varying performance and availability levels. Then, you can easily provision them based on host, application, and user requirements.


Remember that a single tier of storage can be represented by multiple MDGs. For example, if you have a large pool of tier 3 storage that is provided by many low-cost storage controllers, it is sensible to use a number of MDGs. Using a number of MDGs prevents a single offline MDisk from taking all of the tier 3 storage offline.

When multiple storage tiers are defined, you need to take precautions to ensure that storage is provisioned from the appropriate tiers. You can do this through MDG and MDisk naming conventions, along with clearly defined storage requirements for all hosts within the installation.

5.5 Adding MDisks to existing MDGs

In this section, we discuss adding MDisks to existing MDGs.

5.5.1 Adding MDisks for capacity

Before adding MDisks to existing MDGs, ask yourself first why you are doing this. If MDisks are being added to the SVC cluster to provide additional capacity, consider adding them to a new MDG. Recognize that adding new MDisks to existing MDGs will reduce the reliability characteristics of the MDG and risk destabilizing the MDG if hardware problems exist with the new LUNs. If the MDG is already meeting its performance objectives, we recommend that, in most cases, you add the new MDisks to new MDGs rather than add the new MDisks to existing MDGs.

5.5.2 Checking access to new MDisks

You must be careful when adding MDisks to existing MDGs to ensure that the availability of the MDG is not compromised by adding a faulty MDisk. Because loss of access to a single MDisk will cause the entire MDG to go offline, we recommend that with SVC versions prior to 4.2.1, read/write access to the MDisk is tested before adding the MDisk to an existing online MDG. You can test read/write (R/W) access to the MDisk by creating a test MDG, adding the new MDisk to it, creating a test VDisk, and then performing a simple R/W to that VDisk.

SVC 4.2.1 introduced a new feature where MDisks are tested for reliable read/write access before being added to an MDG. This means that manually performing a test in this way is no longer necessary. This testing before an MDisk is admitted to an MDG is automatic and no user action is required. The test will fail if:

� One or more nodes cannot access the MDisk through the chosen controller port.

� I/O to the disk does not complete within a reasonable time.

� The SCSI inquiry data provided for the disk is incorrect or incomplete.

� The SVC cluster suffers a software error during the MDisk test.

Note that image-mode MDisks are not tested before being added to an MDG, because an offline image-mode MDisk will not take the MDG offline.

Note: When multiple tiers are configured, it is a best practice to clearly indicate the storage tier in the naming convention used for the MDGs and MDisks.


5.5.3 Persistent reserve

A common condition in which MDisks can be configured by the SVC but cannot perform R/W I/O is where a persistent reserve (PR) has been left on a LUN by a previously attached host. Subsystems that are exposed to this condition were previously attached with IBM Subsystem Device Driver (SDD) or SDDPCM, because support for PR comes from these multipath drivers. You do not see this condition on the DS4000 when it was previously attached using RDAC, because RDAC does not implement PR.

In this condition, you need to rezone LUNs and map them back to the host holding the reserve or to another host that has the capability to remove the reserve through the use of a utility, such as lquerypr (included with SDD and SDDPCM).

An alternative option is to remove the PRs from within the storage subsystem. The ESS provides an option to remove the PRs from within the storage subsystem through the GUI (ESS Specialist); however for the DS6000 and DS8000, removing the PRs from within the storage subsystem can only be done by using the command line and, therefore, requires technical support.

5.5.4 Renaming MDisks

We recommend that you rename MDisks from their SVC-assigned name after you discover them. Using a naming convention for MDisks that associates the MDisk to the controller and array helps during problem isolation and avoids confusion that can lead to an administration error.

Note that when multiple tiers of storage exist on the same SVC cluster, you might also want to indicate the storage tier in the name as well. For example, you can use R5 and R10 to differentiate RAID levels or you can use T1, T2, and so on to indicate defined tiers.

5.6 Restriping (balancing) extents across an MDG

Adding MDisks to existing MDGs can result in reduced performance across the MDG due to the extent imbalance that will occur and the potential to create hot spots within the MDG. After adding MDisks to MDGs, we recommend that extents are rebalanced across all available MDisks, either by manually entering commands through the command line interface (CLI) or by automating the rebalancing with a Perl script, which is available as part of the SVCTools package from the alphaWorks® Web site.

If you want to manually balance extents, you can use the following CLI commands to identify and correct extent imbalance across MDGs:

� svcinfo lsmdiskextent

� svctask migrateexts

� svcinfo lsmigrate
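Before issuing migrateexts commands by hand, it helps to work out how many extents each MDisk should end up with. The following Python sketch is a simplified, hypothetical planner (far less capable than the SVCTools script described next): it takes the per-MDisk extent counts reported by lsmdiskextent for one VDisk copy, computes the balanced target, and prints how many extents to move from over-allocated to under-allocated MDisks. The resulting moves can then be turned into svctask migrateexts commands; check the CLI reference for the exact syntax.

# Hypothetical planner: compute extent moves to balance one VDisk copy across
# the MDisks of an MDG (counts as reported by svcinfo lsmdiskextent).
def plan_moves(extents_per_mdisk):
    total = sum(extents_per_mdisk.values())
    n = len(extents_per_mdisk)
    base, remainder = divmod(total, n)
    # Some MDisks get one extra extent when the total does not divide evenly.
    targets = {m: base + (1 if i < remainder else 0)
               for i, m in enumerate(sorted(extents_per_mdisk))}
    donors = {m: c - targets[m] for m, c in extents_per_mdisk.items() if c > targets[m]}
    takers = {m: targets[m] - c for m, c in extents_per_mdisk.items() if c < targets[m]}
    moves = []
    for src, surplus in donors.items():
        for dst in list(takers):
            if surplus == 0:
                break
            n_move = min(surplus, takers[dst])
            moves.append((src, dst, n_move))
            surplus -= n_move
            takers[dst] -= n_move
            if takers[dst] == 0:
                del takers[dst]
    return moves

# Extents of VDisk 0, copy 0, per MDisk (as in Example 5-1: all on mdisk0-3).
counts = {"mdisk0": 64, "mdisk1": 64, "mdisk2": 64, "mdisk3": 64,
          "mdisk4": 0, "mdisk5": 0, "mdisk6": 0, "mdisk7": 0}
for src, dst, num in plan_moves(counts):
    print(f"move {num} extents of vdisk 0 copy 0 from {src} to {dst}")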

The following section describes how to use the script from the SVCTools package to rebalance extents automatically. You can use this script on any host with Perl and an SSH client installed; we show how to install it on a Windows Server 2003 server.

Best practice: Use a naming convention for MDisks that associates the MDisk with its corresponding controller and array within the controller, for example, DS8K_R5_12345.


5.6.1 Installing prerequisites and the SVCTools package

For this test, we installed SVCTools on a Windows Server 2003 server. The major prerequisites are:

� PuTTY: This tool provides SSH access to the SVC cluster. If you are using an SVC Master Console or a System Storage Productivity Center (SSPC) server, it has already been installed. If not, you can download PuTTY from the author’s Web site at:

http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

The easiest package to install is the “Windows installer,” which installs all the PuTTY tools in one location.

� Perl: Perl packages for Windows are available from a number of sources. We used ActivePerl, which can be downloaded free-of-charge from:

http://www.activestate.com/Products/activeperl/index.mhtml

The SVCTools package is available at:

http://www.alphaworks.ibm.com/tech/svctools

This package is a compressed file, which can be extracted to wherever is convenient. We extracted it to C:\SVCTools on the Master Console. The key files for the extent balancing script are:

� The SVCToolsSetup.doc file, which explains the installation and use of the script in detail

� The lib\IBM\SVC.pm file, which must be copied to the Perl lib directory. With ActivePerl installed in C:\Perl, copy it to C:\Perl\lib\IBM\SVC.pm.

� The examples\balance\balance.pl file, which is the rebalancing script.

5.6.2 Running the extent balancing script

The MDG on which we tested the script was unbalanced, because we recently expanded it from four MDisks to eight MDisks. Example 5-1 shows that all of the VDisk extents are on the original four MDisks.

Example 5-1 The lsmdiskextent script output showing an unbalanced MDG

IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue "mdisk_grp_name=itso_ds4500"

id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 1            itso_ds45_18gb 18.0GB   0000000000000000 itso_ds4500     600a0b80001744310000011a4888478c00000000000000000000000000000000
1  mdisk1 online managed 1            itso_ds45_18gb 18.0GB   0000000000000001 itso_ds4500     600a0b8000174431000001194888477800000000000000000000000000000000
2  mdisk2 online managed 1            itso_ds45_18gb 18.0GB   0000000000000002 itso_ds4500     600a0b8000174431000001184888475800000000000000000000000000000000
3  mdisk3 online managed 1            itso_ds45_18gb 18.0GB   0000000000000003 itso_ds4500     600a0b8000174431000001174888473e00000000000000000000000000000000
4  mdisk4 online managed 1            itso_ds45_18gb 18.0GB   0000000000000004 itso_ds4500     600a0b8000174431000001164888472600000000000000000000000000000000
5  mdisk5 online managed 1            itso_ds45_18gb 18.0GB   0000000000000005 itso_ds4500     600a0b8000174431000001154888470c00000000000000000000000000000000
6  mdisk6 online managed 1            itso_ds45_18gb 18.0GB   0000000000000006 itso_ds4500     600a0b800017443100000114488846ec00000000000000000000000000000000
7  mdisk7 online managed 1            itso_ds45_18gb 18.0GB   0000000000000007 itso_ds4500     600a0b800017443100000113488846c000000000000000000000000000000000

IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7

The balance.pl script was then run on the Master Console using the command:

C:\SVCTools\examples\balance>perl balance.pl itso_ds45_18gb -k "c:\icat.ppk" -i 9.43.86.117 -r -e

In this command:

� itso_ds45_18gb is the MDG to be rebalanced.

� -k "c:\icat.ppk" gives the location of the PuTTY private key file, which is authorized for administrator access to the SVC cluster.

� -i 9.43.86.117 gives the IP address of the cluster.


� -r requires that the optimal solution is found. If this option is not specified, the extents can still be somewhat unevenly spread at completion, but not specifying -r will often require fewer migration commands and less time. If time is important, it might be preferable to not use -r at first, and then rerun the command with -r if the solution is not good enough.

� -e specifies that the script will actually run the extent migration commands. Without this option, it will merely print the commands that it might have run. This option can be used to check that the series of steps is logical before committing to migration.

In this example, with 4 x 8 GB VDisks, the migration completed within around 15 minutes. You can use the command svcinfo lsmigrate to monitor progress; this command shows a percentage for each extent migration command issued by the script.

After the script completed, we checked that the extents had been correctly rebalanced, as Example 5-2 shows. In a test run of 40 minutes of I/O (25% random, 70/30 R/W) to the four VDisks, performance for the balanced MDG was around 20% better than for the unbalanced MDG.

Example 5-2 The lsmdiskextent output showing a balanced MDG

IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  31                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  33                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0

Notes on the use of the extent balancing script

To use the extent balancing script:

� Migrating extents might have a performance impact, if the SVC or (more likely) the MDisks are already at the limit of their I/O capability. The script minimizes the impact by using the minimum priority level for migrations. Nevertheless, many administrators prefer to run these migrations during periods of low I/O workload, such as overnight.

� The balance.pl script has other command line options that you can use to tune how extent balancing works, for example, excluding certain MDisks or certain VDisks from the rebalancing. Refer to the SVCToolsSetup.doc in svctools.zip for details.

� Because the script is written in Perl, the source code is available for you to modify and extend its capabilities. If you want to modify the source code, make sure that you pay attention to the documentation in Plain Old Documentation (POD) format within the script.

5.7 Removing MDisks from existing MDGs

You might want to remove MDisks from an MDG, for example, when decommissioning a storage controller. When removing MDisks from an MDG, consider whether to manually migrate extents from the MDisks. It is also necessary to make sure that you remove the correct MDisks.

5.7.1 Migrating extents from the MDisk to be deleted

If an MDisk contains VDisk extents, these extents need to be moved to the remaining MDisks in the MDG. Example 5-3 shows how to list the VDisks that have extents on a given MDisk using the CLI.

Example 5-3 Listing which VDisks have extents on an MDisk to be deleted

IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk14
id number_of_extents copy_id
5  16                0
3  16                0
6  16                0
8  13                1
9  23                0
8  25                0


Specify the -force flag on the svctask rmmdisk command, or check the corresponding checkbox in the GUI. Either action causes the SVC to automatically move all used extents on the MDisk to the remaining MDisks in the MDG. In most environments, where the extents were automatically allocated in the first place, moving all used extents on the MDisk in this manner will be fine.

Alternatively, you might want to manually perform the extent migrations. For example, database administrators try to tune performance by arranging high workload VDisks on the outside of physical disks. To preserve this type of arrangement, the user must migrate all extents off the MDisk before deletion; otherwise, the automatic migration will randomly allocate extents to MDisks (and areas of MDisks). After all extents have been migrated, the MDisk removal can proceed without the -force flag.

5.7.2 Verifying an MDisk’s identity before removal

It is critical that MDisks appear to the SVC cluster as unmanaged prior to removing their controller LUN mapping. Unmapping LUNs from the SVC that are still part of an MDG will result in the MDG going offline and will impact all hosts with mappings to VDisks in that MDG.

If the MDisk has been named using the naming convention described in the previous section, the correct LUNs will be easier to identify. However, we recommend that you verify that the LUNs being unmapped from the controller match the associated MDisks on the SVC by using either the Controller LUN Number field or the unique identifier (UID) field.

The UID is the best identifier to use here, because it is unique across all MDisks on all controllers. The Controller LUN Number is only unique within a given controller and for a certain host. Therefore when using the Controller LUN Number, you must check that you are managing the correct storage controller and check that you are looking at the mappings for the correct SVC host object.
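A quick way to perform this cross-check is to display the detailed view of the MDisk on the SVC and compare its UID with the LUN properties shown by the storage controller. A minimal sketch, assuming an MDisk named mdisk14 (exact field names can vary slightly by SVC code level):

svcinfo lsmdisk mdisk14

In the output, compare the UID value with the Logical Drive ID (DS4000) or the volume identifier reported by the controller, and check the controller LUN number and controller name fields, before you remove the LUN mapping.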

Refer to Chapter 5, “MDisks” on page 83 for correlating ESS, DS6000, and DS8000 volume IDs to Controller LUN Number.

Figure 5-1 on page 94 shows an example of the Controller LUN Number and UID fields from the SVC MDisk details.


Figure 5-1 Controller LUN Number and UID fields from the SVC MDisk details panel

Figure 5-2 on page 95 shows an example of the Logical Drive Properties for the DS4000. Note that the DS4000 refers to UID as the Logical Drive ID.


Figure 5-2 Logical Drive properties for DS4000, including the LUN UID

5.8 Remapping managed MDisks

You generally do not unmap managed MDisks from the SVC, because it causes the MDG to go offline. However, if managed MDisks have been unmapped from the SVC for a specific reason, it is important to know that the LUN must present the same UID to the SVC after it has been mapped back.

If the LUN is mapped back with a different UID, the SVC will recognize this MDisk as a new MDisk, and the associated MDG will not come back online. Consider this situation for storage controllers that support LUN selection, because selecting a different LUN ID will change the UID. If the LUN has been mapped back with a different LUN ID, it must be remapped again using the previous LUN ID.

Another instance where the UID can change on a LUN is in the case where DS4000 support has regenerated the metadata for the logical drive definitions as part of a recovery procedure.

Note: The SVC identifies MDisks based on the UID of the LUN.


When logical drive definitions are regenerated, the LUN will appear as a new LUN just as it does when it is created for the first time (the only exception is that the user data will still be present).

In this case, restoring the UID on a LUN back to its prior value can only be done with the assistance of DS4000 support. Both the previous UID and the subsystem identifier (SSID) will be required, both of which can be obtained from the controller profile. To view the logical drive properties, click Logical/Physical View → LUN → Open Properties.

Refer to Figure 5-2 on page 95 for an example of the Logical Drive Properties panel for a DS4000 logical drive. This panel shows Logical Drive ID (UID) and SSID.

5.9 Controlling extent allocation order for VDisk creation

When creating striped mode VDisks, it is sometimes desirable to control the order in which extents are allocated across the MDisks in the MDG for the purpose of balancing workload across controller resources. For example, you can alternate extent allocation across “DA pairs” and even and odd “extent pools” in the DS8000.

The following example using DS8000 LUNs illustrates how the extent allocation order can be changed to provide a better balance across controller resources.

Table 5-1 shows the initial discovery order of six MDisks. Note that adding these MDisks to an MDG in this order results in three contiguous extent allocations alternating between the even and odd extent pools, as opposed to alternating between extent pools for each extent.

Table 5-1 Initial discovery order

LUN ID   MDisk ID   MDisk name   Controller resource (DA pair/extent pool)
1000     1          mdisk01      DA2/P0
1001     2          mdisk02      DA6/P16
1002     3          mdisk03      DA7/P30
1100     4          mdisk04      DA0/P9
1101     5          mdisk05      DA4/P23
1102     6          mdisk06      DA5/P39

To change extent allocation so that each extent alternates between even and odd extent pools, the MDisks can be renamed after being discovered and then added to the MDG in their new order.

Table 5-2 on page 97 shows how the MDisks have been renamed so that when they are added to the MDG in their new order, the extent allocation will alternate between even and odd extent pools.

Note: When VDisks are created, the extents are allocated across MDisks in the MDG in a round-robin fashion in the order in which the MDisks were initially added to the MDG.



Table 5-2 MDisks renamed

LUN ID   MDisk ID   MDisk name (original/new)   Controller resource (DA pair/extent pool)
1000     1          mdisk01/md001               DA2/P0
1100     4          mdisk04/md002               DA0/P9
1001     2          mdisk02/md003               DA6/P16
1101     5          mdisk05/md004               DA4/P23
1002     3          mdisk03/md005               DA7/P30
1102     6          mdisk06/md006               DA5/P39
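A minimal CLI sketch of this renaming and re-adding sequence follows, assuming a new MDG named MDG_DS8K and the names shown in Table 5-2; adjust the names and the extent size to your environment:

svctask chmdisk -name md001 mdisk01
svctask chmdisk -name md002 mdisk04
svctask chmdisk -name md003 mdisk02
(repeat for md004 through md006)
svctask mkmdiskgrp -name MDG_DS8K -ext 256
svctask addmdisk -mdisk md001 MDG_DS8K
svctask addmdisk -mdisk md002 MDG_DS8K
(continue adding md003 through md006 one at a time, in order)

Adding the MDisks one at a time ensures that the order in which they join the MDG matches the renamed order.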

There are two options available for VDisk creation. We describe both options and the differences between them:

� Option A: Explicitly select the candidate MDisks within the MDG that will be used (through the command line interface (CLI) or GUI). Note that when explicitly selecting the MDisk list, the extent allocation will round-robin across MDisks in the order that they are represented on the list starting with the first MDisk on the list:

– Example A1: Creating a VDisk with MDisks from the explicit candidate list order: md001, md002, md003, md004, md005, and md006. The VDisk extent allocations then begin at “md001” and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md001, md002, md003, md004, md005, and md006.

– Example A2: Creating a VDisk with MDisks from the explicit candidate list order: md003, md001, md002, md005, md006, and md004. The VDisk extent allocations then begin at “md003” and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md003, md001, md002, md005, md006, and md004.

� Option B: Do not explicitly select the candidate MDisks within an MDG that will be used (through the command line interface (CLI) or GUI). Note that when the MDisk list is not explicitly defined, the extents will be allocated across MDisks in the order that they were added to the MDG, and the MDisk that will receive the first extent will be randomly selected.

Example B1: Creating a VDisk without an explicit candidate list, where the MDisks were added to the MDG in the order md001, md002, md003, md004, md005, and md006. The VDisk extent allocations then begin at a randomly selected MDisk (assume that “md003” is selected) and round-robin across the MDisks in the order that they were originally added to the MDG. In this case, the VDisk is allocated in the following order: md003, md004, md005, md006, md001, and md002.
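As a hedged illustration of Option A, the following command creates a 100 GB striped VDisk with an explicit MDisk candidate list in the order used in Example A2; the MDG name MDG_DS8K, the I/O Group, and the VDisk name are assumptions:

svctask mkvdisk -mdiskgrp MDG_DS8K -iogrp 0 -size 100 -unit gb -vtype striped -mdisk md003:md001:md002:md005:md006:md004 -name vd_optionA

Omitting the -mdisk parameter gives the Option B behavior, where the starting MDisk is selected randomly.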

Summary:

� Independent of the order in which a storage subsystem’s LUNs (volumes) are discovered by the SVC, recognize that renaming MDisks and changing the order in which they are added to the MDG influences how the VDisk’s extents are allocated.

� Renaming MDisks into a particular order and then adding them to the MDG in that order will allow the starting MDisk to be randomly selected for each VDisk created and, therefore, is the optimal method for balancing VDisk extent allocation across storage subsystem resources.



� When MDisks are added to an MDG based on the order in which the MDisks were discovered, the allocation order can be explicitly specified; however, the MDisk used for the first extent will always be the first MDisk specified on the list.

� When creating VDisks from the GUI:

– Recognize that you are not required to select MDisks from the Managed Disk Candidates list and click Add; you can instead just enter a capacity value into the “Type the size of the virtual disks” field and select whether you require formatting of the VDisk. With this approach, Option B is the methodology that is applied to allocate the VDisk’s extents within the MDG.

– When a set or a subset of MDisks is selected and added (by clicking Add) to the “Managed Disks Striped in this Order” column, Option A is the methodology that is applied, and the VDisk’s extents are explicitly distributed across the selected MDisks in that order.

Figure 5-3 shows the MDisk selection panel for creating VDisks.

Figure 5-3 MDisk selection for a striped-mode VDisk

5.10 Moving an MDisk between SVC clusters

It can sometimes be desirable to move an MDisk to a separate SVC cluster. Before beginning this task, consider the alternatives, which include:

� Using Metro Mirror or Global Mirror to copy the data to a remote cluster. One instance in which this might not be possible is where the SVC cluster is already in a mirroring partnership with another SVC cluster, and data needs to be migrated to a third cluster.

� Attaching a host server to two SVC clusters and using host-based mirroring to copy the data.


� Using storage controller-based copy services. If you use storage controller-based copy services, make sure that the VDisks containing the data are image-mode and cache-disabled.

If none of these options are appropriate, follow these steps to move an MDisk to another cluster (a command sketch follows the steps):

1. Ensure that the VDisk is in image mode rather than striped or sequential, so that its MDisk contains only the raw client data and not any SVC metadata. If you want to move data from a non-image mode VDisk, first use the svctask migratetoimage command to migrate it to a single image-mode MDisk. For a Space-Efficient VDisk (SEV), image mode means that all metadata for the VDisk is present on the same MDisk as the client data; the MDisk will not be directly readable by a host, but it can be imported by another SVC cluster.

2. Remove the image-mode VDisk from the first cluster using the svctask rmvdisk command.

3. Check by using svcinfo lsvdisk that the VDisk is no longer displayed. You must wait until it is removed to allow cached data to destage to disk.

4. Change the back-end storage LUN mappings to prevent the source SVC cluster from seeing the disk, and then make it available to the target cluster.

5. Perform an svctask detectmdisk command on the target cluster.

6. Import the MDisk to the target cluster. If it is not an SEV, you will use the svctask mkvdisk command with the -image option. If it is an SEV, you will also need to use two other options:

– -import instructs the SVC to look for SEV metadata on the specified MDisk.

– -rsize indicates that the disk is Space-Efficient. The value given to -rsize must be at least the amount of space that the source cluster used on the Space-Efficient VDisk. If it is smaller, a 1862 error will be logged. In this case, delete the VDisk and try the mkvdisk command again.

7. The VDisk is now online. If it is not, and the VDisk is Space-Efficient, check the SVC error log for a 1862 error; if a 1862 error is present, it indicates why the VDisk import failed (for example, metadata corruption). You might then be able to use the svctask repairsevdisk command to correct the problem.

Note: You must not use the -force option of the rmvdisk command. If you use the -force option, data in cache will not be written to the disk, which might result in metadata corruption for an SEV.
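The following command sequence sketches steps 1 to 6 for a fully allocated VDisk. It assumes a VDisk named vdisk_mig, an unmanaged MDisk mdisk20 on the source cluster, MDGs named MDG_IMG on both clusters, and that the remapped LUN is detected as mdisk5 on the target cluster; for an SEV, add the -import and -rsize options to the final mkvdisk command as described in step 6:

On the source cluster:
svctask migratetoimage -vdisk vdisk_mig -mdisk mdisk20 -mdiskgrp MDG_IMG
svctask rmvdisk vdisk_mig
svcinfo lsvdisk vdisk_mig        (repeat until the VDisk is no longer listed)

On the target cluster, after the LUN has been remapped:
svctask detectmdisk
svctask mkvdisk -mdiskgrp MDG_IMG -iogrp 0 -vtype image -mdisk mdisk5 -name vdisk_mig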


Chapter 6. Managed disk groups

In this chapter, we describe aspects to consider when planning Managed Disk Groups (MDGs) for an IBM System Storage SAN Volume Controller (SVC) implementation. We discuss the following areas:

� Availability considerations for MDGs

� Selecting the number of volumes, or Logical Unit Numbers (LUNs), per storage subsystem array

� Selecting the number of storage subsystem arrays per MDG

� Striping compared to sequential mode VDisks

� Selecting storage subsystems


6.1 Availability considerations for MDGs

While the SVC itself provides many advantages through the consolidation of storage, it is important to understand the availability implications that storage subsystem failures can have on availability domains within the SVC cluster.

In this section, we point out that while the SVC offers significant performance benefits through its ability to stripe across back-end storage volumes, it is also worthwhile considering the effects that various configurations will have on availability.

When selecting Managed Disks (MDisks) for an MDG, performance is often the primary consideration. While performance is nearly always important, there are many instances where the availability of the configuration is traded for little or no performance gain. A performance-optimized configuration includes MDisks from multiple arrays (and possibly multiple storage subsystems/controllers) within the same MDG. With large array sizes, including MDisks from multiple arrays in the same MDG typically involves configuring each array into multiple LUNs and assigning them as MDisks to multiple MDGs. These types of configurations have an availability cost associated with them and might not yield the performance benefits that you intend.

Well-designed MDGs balance the required performance and storage management objectives against availability, and therefore, all three objectives must be considered during the planning phase.

Remember that the SVC must take the whole MDG offline if a single MDisk in that MDG goes offline, so the number of storage subsystem arrays per MDG has an impact on availability. For example, suppose that you have 40 arrays of 1 TB each for a total capacity of 40 TB. With all 40 arrays placed in the same MDG, the entire 40 TB of capacity is at risk if one of the 40 arrays fails and causes an MDisk to go offline. If the 40 arrays are instead spread over a larger number of MDGs, an array failure affects less storage capacity and the failure domain is limited, provided that the MDisks from a given array are all assigned to the same MDG. If MDisks from a given array are not all assigned to the same MDG, a single array failure impacts every MDG in which that array resides, and the failure domain expands to multiple MDGs.

The following best practices focus on availability and not on performance, so there are valid reasons why these best practices do not fit in all cases. As is always the case, consider performance in terms of specific application workload characteristics and requirements.

Note: Configurations designed with performance in mind tend to also offer the most in terms of ease of use from a storage management perspective, because they encourage greater numbers of resources within the same MDG.

Note: Increasing the performance “potential” of an MDG does not necessarily equate to a gain in application performance.


In the following sections, we examine the effects of these best practices on performance.

6.1.1 Performance considerations

Most applications meet performance objectives when average response times for random I/O are in the 2 - 15 millisecond range; however, there are response-time sensitive applications (typically transaction-oriented) that cannot tolerate maximum response times of more than a few milliseconds. You must consider availability in the design of these applications; however, be careful to ensure that sufficient back-end storage subsystem capacity is available to prevent elevated maximum response times.

Considering application boundaries and dependencies

Reducing hardware failure boundaries for back-end storage is only part of what you must consider. When determining MDG layout, you also need to consider application boundaries and dependencies in order to identify any availability benefits that one configuration might have over another configuration.

Recognize that reducing hardware failure boundaries is not always advantageous from an application perspective. For instance, when an application uses multiple VDisks from an MDG, there is no advantage to splitting those VDisks between multiple MDGs, because the loss of either of the MDGs results in an application outage. However, if an SVC cluster is serving storage for multiple applications, there might be an advantage to having several applications continue uninterrupted while an application outage has occurred on other applications. It is the latter scenario that places the most emphasis on availability when planning the MDG layout.

6.1.2 Selecting the MDisk Group

You can use the SVC to create tiers of storage in which each tier has different performance characteristics by only including MDisks that have the same performance characteristics within an MDG. So, if you have a storage infrastructure with, for example, three classes of storage, you create each VDisk from the MDG that has the class of storage that most closely matches the VDisk’s expected performance characteristics.

Because migration between storage pools (MDGs) is non-disruptive to users, it is an easy task to migrate a VDisk to another storage pool if the actual performance differs from what was expected.
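For example, the following command migrates a VDisk to a higher performing pool; it assumes a VDisk named vdisk7 and a target MDG named MDG_TIER1 that has the same extent size as the source MDG:

svctask migratevdisk -vdisk vdisk7 -mdiskgrp MDG_TIER1 -threads 2

The -threads value controls the number of parallel migration threads; a low value limits the impact on concurrent host I/O.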

Best practices for availability:

� Each storage subsystem must be used with only a single SVC cluster.

� Each array must be included in only one MDG.

� Each MDG must only contain MDisks from a single storage subsystem.

� Each MDG must contain MDisks from no more than approximately 10 storage subsystem arrays.

Note: We recommend that you use the Disk Magic™ application to size the performance demand for specific workloads. You can obtain a copy of Disk Magic, which can assist you with this effort, from:

http://www.intellimagic.net


Batch and OLTP workloads

Clients often want to know whether to mix their batch and online transaction processing (OLTP) workloads in the same MDG. Batch and OLTP workloads might both require the same tier of storage, but in many SVC installations, there are multiple MDGs in the same storage tier so that the workloads can be separated.

We usually recommend mixing workloads so that the maximum resources are available to any workload when needed. However, batch workloads are a good example of the opposing point of view. There is a fundamental problem with letting batch and online work share resources: the amount of I/O resources that a batch job can consume is often limited only by the amount of I/O resources available.

To address this problem, it can obviously help to segregate the batch workload to its own MDG, but doing so does not necessarily prevent node or path resources from being overrun. Consider those resources as well if you implement a policy of batch isolation.

For SVC, an interesting alternative is to cap the data rate at which batch volumes are allowed to run by limiting the maximum throughput of a VDisk; refer to 7.3.6, “Governing of VDisks” on page 130. Capping the batch data rate can potentially let online work benefit from periods when the batch load is light while limiting the damage when the batch load is heavy.
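As a sketch, the following command caps a batch VDisk named vdisk_batch at roughly 200 MBps; the exact governing options depend on your SVC code level, so treat the names as assumptions and refer to 7.3.6 for details:

svctask chvdisk -rate 200 -unitmb vdisk_batch

Omitting -unitmb makes the -rate value an I/O-per-second limit instead of a bandwidth limit.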

A lot depends on the timing of when the workloads will run. If you have mainly OLTP during the day shift and the batch workloads run at night, there is normally no problem with mixing the workloads in the same MDG. But if you run the two workloads concurrently, and the batch workload runs with no cap or throttling and requires high levels of I/O throughput, we recommend that wherever possible you segregate the workloads onto different MDGs that are supported by different back-end storage resources.

6.2 Selecting the number of LUNs per array

We generally recommend that you configure LUNs to use the entire array, which is especially true for midrange storage subsystems, where multiple LUNs configured to an array have been shown to result in a significant performance degradation. The performance degradation is attributed mainly to smaller cache sizes and the inefficient use of available cache, defeating the subsystem’s ability to perform “full stride writes” for Redundant Array of Independent Disks 5 (RAID 5) arrays. Additionally, I/O queues for multiple LUNs directed at the same array can have a tendency to overdrive the array.

Higher end storage controllers, such as the IBM System Storage DS8000 series, make this much less of an issue through the use of large cache sizes. However, large array sizes might require that multiple LUNs are created due to LUN size limitations. Later, we examine the performance implications of having more than one LUN per array on a DS8000 storage subsystem. However, on higher end storage controllers, most workloads show the difference between a single LUN per array compared to multiple LUNs per array to be negligible.

Consider the manageability aspects of creating multiple LUNs per array. Be careful with the placement of these LUNs so that you do not create conditions where over-driving an array can occur. Additionally, placing these LUNs in multiple MDGs expands failure domains considerably, as we discussed in 6.1, “Availability considerations for MDGs” on page 102.

Note: If there is uncertainty about in which storage pool (MDG) to create a VDisk, initially use the pool with the lowest performance and then move the VDisk up to a higher performing pool later if required.

Table 6-1 provides our recommended guidelines for array provisioning on IBM storage subsystems.

Table 6-1 Array provisioning

Controller type                        LUNs per array
IBM System Storage DS4000              1
IBM System Storage DS6000              1
IBM System Storage DS8000              1 - 2
IBM Enterprise Storage Server (ESS)    1 - 2

6.2.1 Performance comparison of one LUN compared to two LUNs per array

The following example compares one LUN per array with two LUNs per array using DS8000 arrays. Because any performance benefit relies on both LUNs within an array being evenly loaded, this comparison was performed by placing both LUNs for each array within the same MDG. Testing was performed on two MDGs with eight MDisks per MDG. Table 6-2 shows the MDG layout for Config1 with two LUNs per array, and Table 6-3 on page 106 shows the MDG layout for Config2 with a single LUN per array.

Table 6-2 Two LUNs per array

DS8000 array   LUN1   LUN2
Array1         MDG1   MDG1
Array2         MDG1   MDG1
Array3         MDG1   MDG1
Array4         MDG1   MDG1
Array5         MDG2   MDG2
Array6         MDG2   MDG2
Array7         MDG2   MDG2
Array8         MDG2   MDG2



Table 6-3 One LUN per array

DS8000 array   LUN1
Array1         MDG1
Array2         MDG1
Array3         MDG1
Array4         MDG1
Array5         MDG2
Array6         MDG2
Array7         MDG2
Array8         MDG2

We performed testing using a four node SVC cluster with two I/O Groups and eight VDisks per MDG.

The following workloads were used in the testing:

� Ran-R/W-50/50-0%CH

� Seq-R/W-50/50-25%CH

� Seq-R/W-50/50-0%CH

� Ran-R/W-70/30-25%CH

� Ran-R/W-50/50-25%CH

� Ran-R/W-70/30-0%CH

� Seq-R/W-70/30-25%CH

� Seq-R/W-70/30-0%CH

We collected the following performance metrics for a single MDG using IBM TotalStorage Productivity Center (TPC). Figure 6-1 on page 107 and Figure 6-2 on page 108 show the I/Os per second (IOPS) and response time comparisons between Config1 (two LUNs per array) and Config2 (one LUN per array).


Note: Ran=Random, Seq=Sequential, R/W= Read/Write, and CH=Cache Hit (25%CH means that 25% of all I/Os are read cache hits)


Figure 6-1 IOPS comparison between two LUNs per array and one LUN per array


Figure 6-2 Response time comparison between two LUNs per array and one LUN per array

The test shows a small response time advantage to the two LUNs per array configuration and a small IOPS advantage to the one LUN per array configuration for sequential workloads. Overall, the performance differences between these configurations are minimal.

6.3 Selecting the number of arrays per MDG

The capability to stripe across disk arrays is the single most important performance advantage of the SVC; however, striping across more arrays is not necessarily better. The objective here is to only add as many arrays to a single MDG as required to meet the performance objectives. Because it is usually difficult to determine what is required in terms of performance, the tendency is to add far too many arrays to a single MDG, which again increases the failure domain as we discussed previously in 6.1, “Availability considerations for MDGs” on page 102.

It is also worthwhile to consider the effect of aggregate load across multiple MDGs. It is clear that striping workload across multiple arrays has a positive effect on performance when you are talking about dedicated resources, but the performance gains diminish as the aggregate load increases across all available arrays. For example, if you have a total of eight arrays and are striping across all eight arrays, your performance is much better than if you were striping across only four arrays. However, if the eight arrays are divided into two LUNs each and are also included in another MDG, the performance advantage drops as the load of MDG2 approaches that of MDG1, which means that when workload is spread evenly across all MDGs, there will be no difference in performance.

More arrays in the MDG have more of an effect with lower performing storage controllers. So, for example, we require fewer arrays from a DS8000 than we do from a DS4000 to achieve the same performance objectives. Table 6-4 on page 109 shows the recommended number of arrays per MDG that is appropriate for general cases. Again, when it comes to performance, there can always be exceptions.

Table 6-4 Recommended number of arrays per MDG

Controller type   Arrays per MDG
DS4000            4 - 24
ESS/DS8000        4 - 12

DA pair considerations for selecting ESS and DS8000 arrays

The ESS and DS8000 storage architectures both access disks through pairs of device adapters (DA pairs) with one adapter in each storage subsystem controller. The ESS contains four DA pairs, and the DS8000 scales from two to eight DA pairs. When possible, consider adding arrays to MDGs based on multiples of the installed DA pairs. For example, if the storage controller contains six DA pairs, use either six or 12 arrays in an MDG with arrays from all DA pairs in a given MDG.

Performance comparison of reducing the number of arrays per MDG

The following test compares the performance between eight arrays per MDG and four arrays per MDG. The configuration with eight arrays per MDG represents a performance-optimized configuration, and the four arrays per MDG configuration represents a configuration that has better availability characteristics.

We performed testing on the following configuration:

� There are eight ranks from a DS8000.

� Each rank is configured as one RAID 5 array.

� Each RAID 5 array is divided into four LUNs.

� Four MDGs are configured.

� Each MDG uses one LUN (MDisk) from each of the eight arrays.

� The VDisks are created in sequential mode.

The array to MDisk mapping for this configuration is represented in Table 6-5.

Table 6-5 Configuration one: Each array is contained in four MDGs

DS8000 array   LUN1   LUN2   LUN3   LUN4
Array1         MDG1   MDG2   MDG3   MDG4
Array2         MDG1   MDG2   MDG3   MDG4
Array3         MDG1   MDG2   MDG3   MDG4
Array4         MDG1   MDG2   MDG3   MDG4
Array5         MDG1   MDG2   MDG3   MDG4
Array6         MDG1   MDG2   MDG3   MDG4
Array7         MDG1   MDG2   MDG3   MDG4
Array8         MDG1   MDG2   MDG3   MDG4

You can see from this design that if a single array fails, all four MDGs are affected, and all SVC VDisks that are using storage from this DS8000 fail.



Table 6-6 shows an alternative to this configuration. Here, the arrays are divided into two LUNs each, and there are half the number of arrays for each MDG as there were in the first configuration. In this design, the failure boundary of an array failure is cut in half, because any single array failure only affects half of the MDGs.

Table 6-6 Configuration two: Each array is contained in two MDGs

DS8000 array   LUN1   LUN2
Array1         MDG1   MDG3
Array2         MDG1   MDG3
Array3         MDG1   MDG3
Array4         MDG1   MDG3
Array5         MDG2   MDG4
Array6         MDG2   MDG4
Array7         MDG2   MDG4
Array8         MDG2   MDG4

We collected the following performance metrics using TPC to compare these configurations.

The first test was performed with all four MDGs evenly loaded. Figure 6-3 on page 111 and Figure 6-4 on page 112 show the IOPS and response time comparisons between Config1 (four LUNs per array) and Config2 (two LUNs per array) for varying workloads.



Figure 6-3 IOPS comparison of eight arrays/MDG and four arrays/MDG with all four MDGs active


Figure 6-4 Response time comparison between eight and four arrays/MDG with all four MDGs active

This test shows virtually no difference between using eight arrays per MDG compared to using four arrays per MDG, when all MDGs are evenly loaded (with the exception of a small advantage in IOPS for the eight array MDG for sequential workloads).

We performed two additional tests to show the potential effect when MDGs are not loaded evenly. We performed the first test using only one of the four MDGs, while the other three MDGs remained idle. This test presents the worst case scenario, because the eight array MDG has the fully dedicated bandwidth of all eight arrays available to it, and therefore, halving the number of arrays has a pronounced effect. This test tends to be an unrealistic scenario, because it is unlikely that all host workload will be directed at a single MDG.

Figure 6-5 on page 113 shows the IOPS comparison between these configurations.


Figure 6-5 IOPS comparison between eight and four arrays/MDG with a single MDG active

We performed the second test with I/O running to only two of the four MDGs, which is shown in Figure 6-6 on page 114.


Figure 6-6 IOPS comparison between eight arrays/MDG and four arrays/MDG with two MDGs active

Figure 6-6 shows the results from the test where only two of the four MDGs are loaded. This test shows no difference between the eight arrays per MDG configuration and the four arrays per MDG configuration for random workload. This test shows a small advantage to the eight arrays per MDG configuration for sequential workloads.

Our conclusions are:

� The performance advantage with striping across a larger number of arrays is not as pronounced as you might expect.

� You must consider the number of MDisks per array along with the number of arrays per MDG to understand aggregate MDG loading effects.

� You can achieve availability improvements without compromising performance objectives.

6.4 Striping compared to sequential type

With extremely few exceptions, you must always configure VDisks using striping.

However, one exception to this rule is an environment with a 100% sequential workload where disk loading across all VDisks is guaranteed to be balanced by the nature of the application; specialized video streaming applications are one example. Another exception is an environment with a high dependency on a large number of flash copies. In this case, FlashCopy loads the VDisks evenly, and the sequential I/O that is generated by the flash copies has a higher throughput potential than what is possible with striping. This situation is a rare exception given the unlikely requirement to optimize for FlashCopy as opposed to online workload.

6.5 SVC cache partitioning

In a situation where more I/O is driven to an SVC node than can be sustained by the back-end storage, the SVC cache can become exhausted. This situation can happen even if only one storage controller is struggling to cope with the I/O load, but it impacts traffic to others as well. To avoid this situation, SVC cache partitioning provides a mechanism to protect the SVC cache from not only overloaded controllers, but also misbehaving controllers.

The SVC cache partitioning function is implemented on a per Managed Disk Group (MDG) basis. That is, the cache automatically partitions the available resources on a per MDG basis.

The overall strategy is to protect the individual controller from overloading or faults. If many controllers (or in this case, MDGs) are overloaded, the overall cache can still suffer.

Table 7 shows the upper limit of write cache data that any one partition, or MDG, can occupy.

Table 7 Upper limit of write cache data

Number of MDGs   Upper limit
1                100%
2                66%
3                40%
4                30%
5 or more        25%

The effect of the SVC cache partitioning is that no single MDG occupies more than its upper limit of cache capacity with write data. Upper limits are the point at which the SVC cache starts to limit incoming I/O rates for VDisks created from the MDG.

If a particular MDG reaches the upper limit, it experiences the same result as a global cache resource that is full. That is, host writes are serviced on a one-out, one-in basis as the cache destages writes to the back-end storage. However, only writes targeted at the full MDG are limited; all I/O destined for other (non-limited) MDGs continues normally.

Read I/O requests for the limited MDG also continue normally. However, because the SVC is destaging write data at a rate that is obviously greater than the controller can actually sustain (otherwise, the partition does not reach the upper limit), reads are serviced equally as slowly.

The main thing to remember is that the partitioning limits only write I/Os. In general, a 70/30 or 50/50 ratio of read to write operations is observed. Of course, there are applications, or workloads, that perform 100% writes; however, write cache hits are much less of a benefit than read cache hits. A write always hits the cache. If modified data already resides in the cache, it is overwritten, which might save a single destage operation. However, read cache hits provide a much more noticeable benefit, saving seek and latency time at the disk layer.

Note: Electing to use sequential type over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting system performance.



In all benchmarking tests performed, even with single active MDGs, good path SVC I/O group throughput remains the same as it was before the introduction of SVC cache partitioning.

For in-depth information about SVC cache partitioning, we recommend the following IBM Redpaper publication:

� IBM SAN Volume Controller 4.2.1 Cache Partitioning, REDP-4426-00

6.6 SVC quorum disk considerations

When back-end storage is initially added to an SVC cluster as an MDG, three quorum disks are automatically created by allocating space from the assigned MDisks. As more back-end storage controllers (and therefore MDGs) are added to the SVC cluster, the quorum disks do not get reallocated to span multiple back-end storage subsystems. To eliminate a situation where all quorum disks go offline due to a back-end storage subsystem failure, we recommend allocating quorum disks on multiple back-end storage subsystems. This design is of course only possible when multiple back-end storage subsystems (and therefore multiple MDGs) are available.

Even when there is only a single storage subsystem with multiple MDGs created from it, the quorum disks must be allocated from several MDGs to avoid an array failure causing the loss of the quorum. You can reallocate quorum disks from either the SVC Console or the SVC command line interface (CLI). The SVC CLI command to use is:

svctask setquorum -quorum <quorum id> <mdisk_id>

In this command:

� The <quorum id> represents the quorum disk number and can have a value of 0, 1, or 2.

� The <mdisk_id> is the MDisk from where the quorum disk must now allocate space. The specified MDisk must be assigned to the desired MDG, and free space (256 MB or one extent, whichever is larger) must be available in the MDG.

To check if a specific MDisk is used as a quorum disk, the following SVC CLI command can be used:

svcinfo lsmdisk <mdisk_id>

If this command shows a non-blank quorum-index value, the MDisk is used as a quorum disk.
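For example, to move quorum disk 2 onto an MDisk named mdisk12 (a purely illustrative name) in another MDG, and then confirm the change:

svctask setquorum -quorum 2 mdisk12
svcinfo lsmdisk mdisk12

After the command completes, the detailed lsmdisk output for mdisk12 shows 2 as its quorum index value.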

6.7 Selecting storage subsystems

When selecting storage subsystems, the decision generally comes down to the ability of the storage subsystem to meet the availability objectives of the applications. Because the SVC does not provide any data redundancy, the availability characteristics of the storage subsystems’ controllers have the most impact on the overall availability of the data virtualized by the SVC.

Performance becomes less of a determining factor due to the SVC’s ability to use various storage subsystems, regardless of whether they scale up or scale out. For example, the DS8000 is a scale-up architecture that delivers “best of breed” performance per unit, and the DS4000 can be scaled out with enough units to deliver the same performance. Because the SVC hides the scaling characteristics of the storage subsystems, the inherent performance characteristics of the storage subsystems tend not to be a direct determining factor.


A significant consideration when comparing native performance characteristics between storage subsystem types is the amount of scaling that is required to meet the performance objectives. While lower performing subsystems can typically be scaled to meet performance objectives, the additional hardware that is required lowers the availability characteristics of the SVC cluster. Remember that all storage subsystems possess an inherent failure rate, and therefore, the failure rate of an MDG becomes the failure rate of the storage subsystem times the number of units.

Of course, there might be other factors that lead you to select one storage subsystem over another storage subsystem, such as utilizing available resources or a requirement for additional features and functions, such as the System z® attach capability.


Chapter 7. VDisks

In this chapter, we show the new features of SVC Version 4.3.0 and discuss Virtual Disks (VDisks). We describe creating them, managing them, and migrating them across I/O Groups.

We then discuss VDisk performance and how you can use TotalStorage Productivity Center (TPC) to analyze performance and to help guide you to possible solutions.


7.1 New features in SVC Version 4.3.0

In this section, we highlight the following new VDisk features and details for performance enhancement:

� Space-Efficient VDisks
� VDisk mirroring

7.1.1 Real and virtual capacities

One feature of SVC Version 4.3.0 is the Space-Efficient VDisk (SE VDisk). You can configure a VDisk to be either “Space-Efficient” or “Fully Allocated.” An SE VDisk is created with two capacities: a real capacity and a virtual capacity. You can still create VDisks with a striped, sequential, or image mode virtualization policy, just as you can any other VDisk.

The real capacity defines how much disk space is actually allocated to a VDisk. The virtual capacity is the capacity of the VDisk that is reported to other SVC components (for example, FlashCopy or Remote Copy) and to the hosts.

A directory maps the virtual address space to the real address space. The directory and the user data share the real capacity.

There are two operating modes for SE VDisks. An SE VDisk can be configured to be Auto-Expand or not. If you select the Auto-Expand operating mode, the SVC automatically expands the real capacity of the SE VDisk. The mode of the respective SE VDisk can be switched at any time.

7.1.2 Space allocation

As mentioned, when an SE VDisk is initially created, a small amount of the real capacity is used for initial metadata. Write I/Os to grains of the SE VDisk that have not previously been written to cause grains of the real capacity to be used to store metadata and user data. Write I/Os to grains that have previously been written to simply update the grain where the data was previously written.

Smaller granularities can save more space, but they have larger directories. When you use SE with FlashCopy (FC), specify the same grain size for both SE and FC. For more details about SEFC, refer to 8.1.6, “Space-Efficient FlashCopy (SEFC)” on page 159.

7.1.3 Space-Efficient VDisk performance

SE VDisks require more I/Os because of the directory accesses:

� For truly random workloads, an SE VDisk requires approximately one directory I/O for every user I/O, so performance will be 50% of a normal VDisk.

� The directory is 2-way write-back cached (just like the SVC fastwrite cache), so certain applications perform better.

� SE VDisks require more CPU processing, so the performance per I/O group will be lower.

Note: The grain is defined when the VDisk is created and can be 32 KB, 64 KB, 128 KB, or 256 KB.


You need to use the striping policy in order to spread SE VDisks across many MDisks.

SE VDisks only save capacity if the host server does not write to the whole VDisk. Whether a Space-Efficient VDisk works well is partly dependent on how the filesystem allocates the space:

� Certain filesystems (for example, NTFS (NT File System)) will write to the whole VDisk before overwriting deleted files, while other filesystems will reuse space in preference to allocating new space.

� Filesystem problems can be moderated by tools, such as “defrag” or by managing storage using host Logical Volume Managers (LVMs).

The SE VDisk is also dependent on how applications use the filesystem, for example, certain applications only delete log files when the filesystem is nearly full.

7.1.4 Testing an application with Space-Efficient VDisk

To help you understand what works in combination with SE VDisks, perform this test (a sample command sketch follows the list):

1. Create an SE VDisk with Auto-Expand turned off.

2. Test the application.

3. If the application and SE do not work well, the VDisk will fill up and in the worst case, it will go offline.

4. If the application and SE do work well, the VDisk will not fill up and will remain online.

5. You can configure warnings and also monitor how much capacity is being used.

6. If necessary, the user can expand or shrink the real capacity of the VDisk.

7. When you have determined if the combination of the application and SE works well, you can enable Auto-Expand.
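A hedged sketch of steps 1, 5, and 6 follows. It assumes an MDG named MDG1, a 100 GB virtual capacity, a 20% initial real capacity, and a 32 KB grain size; adjust the names and sizes to your environment, and note that command and option names can vary slightly by code level:

svctask mkvdisk -mdiskgrp MDG1 -iogrp 0 -size 100 -unit gb -rsize 20% -grainsize 32 -warning 80% -name sev_test
svcinfo lssevdiskcopy sev_test
svctask expandvdisksize -rsize 10 -unit gb sev_test

The first command creates the SE VDisk without -autoexpand and with a warning at 80% of the real capacity, the second command shows the used and real capacities so that you can monitor consumption during the test, and the third command grows the real capacity by 10 GB if the test shows that more space is needed.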

7.1.5 What is VDisk mirroring

With the VDisk mirroring feature, we can now create a VDisk with one or two copies. These copies can be in the same MDisk Group or in different MDisk Groups (even MDisk Groups with different extent sizes). The first MDisk Group that is specified contains the “primary” copy.

If a VDisk is created with two copies, by default both copies use the same virtualization policy, just as any other VDisk. However, it is also possible to have two copies of a VDisk with different virtualization policies. In combination with space efficiency, each copy of a VDisk can be Space-Efficient or fully allocated and can be in striped, sequential, or image mode.

A mirrored VDisk has all of the capabilities of a VDisk and also the same restrictions as a VDisk (for example, a mirrored VDisk is owned by an I/O Group, just as any other VDisk).

This feature also provides a point-in-time copy functionality that is achieved by “splitting” a copy from the VDisk.

Important: Do not use SE VDisks where high I/O performance is required.

Note: There is no single recommendation for SE VDisks that guarantees the best performance or practice. As already explained, it depends on how the particular environment uses them. For the absolute best performance, use fully allocated VDisks instead of SE VDisks.


7.1.6 Creating or adding a mirrored VDisk

When a mirrored VDisk is created and the format has been specified, all copies are formatted before the VDisk comes online. The copies are then considered synchronized.

Alternatively, with the “no synchronization” option chosen, the mirrored VDisks are not synchronized.

This might be helpful in these cases:

� If it is known that the already formatted MDisk space will be used for mirrored VDisks.

� If it is not required that the copies be synchronized.
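The following commands give a minimal sketch of creating or adding a mirrored VDisk, assuming MDisk Groups named MDG1 and MDG2, I/O Group 0, and illustrative VDisk names. The first command creates a new VDisk with two copies; the second adds a second copy to an existing VDisk instead:

svctask mkvdisk -mdiskgrp MDG1:MDG2 -iogrp 0 -size 50 -unit gb -copies 2 -name mirr_vd01
svctask addvdiskcopy -mdiskgrp MDG2 existing_vd01

The first MDisk Group in the colon-separated list holds the primary copy.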

7.1.7 Availability of mirrored VDisks

VDisk mirroring provides low-level Redundant Array of Independent Disks 1 (RAID 1) mirroring to protect against controller and MDisk Group failures, because it allows you to create a VDisk with two copies in different MDisk Groups. If one storage controller or MDisk Group fails, a VDisk copy is not affected if it has been placed on a different storage controller or in a different MDisk Group.

For FlashCopy usage, a mirrored VDisk is only online to other nodes if it is online in its own I/O Group and if the other nodes have visibility to the same copies as the nodes in the I/O Group. If a mirrored VDisk is a source VDisk in a FlashCopy relationship, asymmetric path failures or a failure of the mirrored VDisk’s I/O Group can cause the target VDisk to be taken offline.

7.1.8 Mirroring between controllers

As mentioned, one advantage of mirrored VDisks is having the VDisk copies on different storage controllers/MDisk Groups. Normally, the read I/O is directed to the primary copy, but the primary copy must be available and synchronized. The location of the primary copy can be selected at its creation, but the location can also be changed later.

The write performance will be constrained by the lower performance controller, because writes must complete to both copies before the VDisk is considered to have been written successfully.

Important: For the best practice and best performance, put all the primary mirrored VDisks on the same storage controller, or you might see a performance impact. Selecting the copy that is allocated on the higher performance storage controller will maximize the read performance of the VDisk.

7.2 Creating VDisks

IBM System Storage SAN Volume Controller, SG24-6423-06, fully describes the creation of VDisks.

The best practices that we strongly recommend are:

� Decide on your naming convention before you begin. It is much easier to assign the correct names at the time of VDisk creation than to modify them afterwards. If you do need to change the VDisk name, use the svctask chvdisk command (refer to Example 7-1). This command changes the name of the VDisk Test_0 to Test_1.

Example 7-1 The svctask chvdisk command

IBM_2145:itsosvccl1:admin>svctask chvdisk -name Test_1 Test_0

� Balance the VDisks across the I/O Groups in the cluster to balance the load across the cluster. At the time of VDisk creation, the workload to be put on the VDisk might not be known. In this case, if you are using the GUI, accept the system default of load balancing allocation. Using the command line interface (CLI), you must manually specify the I/O Group. In configurations with large numbers of attached hosts where it is not possible to zone a host to multiple I/O Groups, it might not be possible to choose to which I/O Group to attach the VDisks. The VDisk has to be created in the I/O Group to which its host belongs. For moving a VDisk across I/O Groups, refer to 7.2.3, “Moving a VDisk to another I/O Group” on page 125.

� By default, the preferred node, which owns a VDisk within an I/O Group, is selected on a load balancing basis. At the time of VDisk creation, the workload to be put on the VDisk might not be known. But it is important to distribute the workload evenly on the SVC nodes within an I/O Group. The preferred node cannot easily be changed. If you need to change the preferred node, refer to 7.2.2, “Changing the preferred node within an I/O Group” on page 124.

� The maximum number of VDisks per I/O Group is 2 048.

� The maximum number of VDisks per cluster is 8 192 (eight node cluster).

� The smaller the extent size that you select, the finer the granularity of space that the VDisk occupies on the underlying storage controller. A VDisk occupies an integer number of extents, but its length does not need to be an integer multiple of the extent size. The length does need to be an integer multiple of the block size. Any space left over between the last logical block in the VDisk and the end of the last extent in the VDisk is unused. A small extent size is used in order to minimize this unused space. The counter view is that the smaller the extent size, the smaller the total storage volume that the SVC can virtualize. The extent size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give a reasonable balance between VDisk granularity and cluster capacity. There is no longer a default value set. Extent size is set during the Managed Disk (MDisk) Group creation.

As mentioned in the first section of this chapter, a VDisk can be created as Space-Efficient or fully allocated, in one of these three modes: striped, sequential, or image and with one or two copies (VDisk mirroring).

With extremely few exceptions, you must always configure VDisks using striping mode.

Note: Migrating VDisks across I/O Groups is a disruptive action. Therefore, it is best to specify the correct I/O Group at the time of VDisk creation.

Important: VDisks can only be migrated between Managed Disk Groups (MDGs) that have the same extent size, except for mirrored VDisks. The two copies can be in different MDisk Groups with different extent sizes.


7.2.1 Selecting the MDisk Group

As discussed in 6.1.2, “Selecting the MDisk Group” on page 103, you can use the SVC to create tiers (each one with different performance characteristics) of storage.

7.2.2 Changing the preferred node within an I/O Group

The plan is that a future release of the SVC code will simplify changing the preferred node within an I/O Group so that a single SVC command can make the change. Currently, no non-disruptive or easy method exists to change the preferred node within an I/O Group.

There are three alternative techniques that you can use; they are all disruptive to the host to which the VDisk is mapped:

� Migrate the VDisk out of the SVC as an image mode-managed disk (MDisk) and then import it back as an image mode VDisk. Make sure that you select the correct preferred node. The required steps are:

a. Migrate the VDisk to an image mode VDisk.

b. Cease I/O operations to the VDisk.

c. Disconnect the VDisk from the host operating system. For example, in Windows, remove the drive letter.

d. On the SVC, unmap the VDisk from the host.

e. Delete the image mode VDisk, which removes the VDisk from the MDG.

f. Add the image mode MDisk back into the SVC as an image mode VDisk, selecting the preferred node that you want.

g. Resume I/O operations on the host.

h. You can now migrate the image mode VDisk to a regular VDisk.

� If remote copy services are enabled on the SVC, perform an intra-cluster Metro Mirror to a target VDisk with the preferred node that you want. At a suitable opportunity:

a. Cease I/O to the VDisk.
b. Flush the host buffers.
c. Stop copy services and end the copy services relationship.
d. Unmap the original VDisk from the host.
e. Map the target VDisk to the host.
f. Resume I/O operations.

� FlashCopy the VDisk to a target VDisk in the same I/O Group with the preferred node that you want, using the auto-delete option. The steps to follow are:

a. Cease I/O to the VDisk.
b. Start FlashCopy.
c. When the FlashCopy completes, unmap the source VDisk from the host.
d. Map the target VDisk to the host.
e. Resume I/O operations.
f. Delete the source VDisk.

Note: Electing to use sequential mode over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting the system performance.


There is a fourth, non-SVC method of changing the preferred node within an I/O Group if the host operating system or logical volume manager supports disk mirroring. The steps are:

1. Create a VDisk, the same size as the existing one, on the desired preferred node.
2. Mirror the data to this VDisk using host-based logical volume mirroring.
3. Remove the original VDisk from the Logical Volume Manager (LVM).

7.2.3 Moving a VDisk to another I/O Group

The procedure of migrating a VDisk between I/O Groups is disruptive, because access to the VDisk is lost. If a VDisk is moved between I/O Groups, the path definitions of the VDisks are not refreshed dynamically. The old IBM Subsystem Device Driver (SDD) paths must be removed and replaced with the new ones.

The best practice is to migrate VDisks between I/O Groups with the hosts shut down. Then, follow the procedure listed in 9.2, “Host pathing” on page 183 for the reconfiguration of SVC VDisks to hosts. We recommend that you remove the stale configuration and reboot the host in order to reconfigure the VDisks that are mapped to a host.

Ensure that when you migrate a VDisk to a new I/O Group, you quiesce all I/O operations for the VDisk. Determine the hosts that use this VDisk. Stop or delete any FlashCopy mappings or Metro/Global Mirror relationships that use this VDisk. To check if the VDisk is part of a relationship or mapping, issue the svcinfo lsvdisk command that is shown in Example 7-2 where vdiskname/id is the name or ID of the VDisk.

Example 7-2 Output of svcinfo lsvdisk command

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
...

Look for the FC_id and RC_id fields. If these fields are not blank, the VDisk is part of a mapping or a relationship.

The procedure is:

1. Cease I/O operations to the VDisk.

2. Disconnect the VDisk from the host operating system. For example, in Windows, remove the drive letter.

3. Stop any copy operations.


4. Issue the command to move the VDisk (refer to Example 7-3). This command does not work while there is data in the SVC cache that is to be written to the VDisk. After two minutes, the data automatically destages if no other condition forces an earlier destaging.

5. On the host, rediscover the VDisk. For example in Windows, run a rescan, then either mount the VDisk or add a drive letter. Refer to Chapter 9, “Hosts” on page 175.

6. Resume copy operations as required.

7. Resume I/O operations on the host.

After any copy relationships are stopped, you can move the VDisk across I/O Groups with a single command in an SVC:

svctask chvdisk -iogrp newiogrpname/id vdiskname/id

In this command, newiogrpname/id is the name or ID of the I/O Group to which you move the VDisk and vdiskname/id is the name or ID of the VDisk.

Example 7-3 shows the command to move the VDisk named Image_mode0 from its existing I/O Group, io_grp1, to PerfBestPrac.

Example 7-3 Command to move a VDisk to another I/O Group

IBM_2145:itsosvccl1:admin>svctask chvdisk -iogrp PerfBestPrac Image_mode0

Migrating VDisks between I/O Groups can be a potential issue if the old definitions of the VDisks are not removed from the configuration prior to importing the VDisks to the host. Migrating VDisks between I/O Groups is not a dynamic configuration change. It must be done with the hosts shut down. Then, follow the procedure listed in Chapter 9, “Hosts” on page 175 for the reconfiguration of SVC VDisks to hosts. We recommend that you remove the stale configuration and reboot the host to reconfigure the VDisks that are mapped to a host.

For details about how to dynamically reconfigure IBM Subsystem Device Driver (SDD) for the specific host operating system, refer to Multipath Subsystem Device Driver: User’s Guide, SC30-4131-01, where this procedure is also described in great depth.

This command does not work if there is any data in the SVC cache, which must be flushed out first. There is a -force flag; however, this flag discards the data in the cache rather than flushing it to the VDisk. If the command fails due to outstanding I/Os, it is better to wait a couple of minutes, after which the SVC automatically flushes the data to the VDisk.

7.3 VDisk migration

In this section, we discuss the best practices to follow when you perform VDisk migrations.

Note: Do not move a VDisk to an offline I/O Group under any circumstances. You must ensure that the I/O Group is online before moving the VDisks to avoid any data loss.

Note: Using the -force flag can result in data integrity issues.


7.3.1 Migrating with VDisk mirroring

VDisk mirroring offers the facility to migrate VDisks between MDisk Groups with different extent sizes:

1. First, add a copy to the target MDisk Group.

2. Wait until the synchronization is complete.

3. Remove the copy in the source MDisk Group.

The migration from a Space-Efficient to a fully allocated VDisk is almost the same:

1. Add a target fully allocated copy.

2. Wait for synchronization to complete.

3. Remove the source Space-Efficient copy.
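The corresponding command sequence might look similar to this sketch, where myvdisk and target_mdg are placeholders and copy 0 is assumed to be the original copy; verify the exact addvdiskcopy, lsvdisksyncprogress, and rmvdiskcopy syntax against your SVC code level:

   svctask addvdiskcopy -mdiskgrp target_mdg myvdisk
   (adds a second copy of the VDisk in the target MDisk Group)
   svcinfo lsvdisksyncprogress myvdisk
   (repeat until the new copy reports 100% synchronized)
   svctask rmvdiskcopy -copy 0 myvdisk
   (removes the original copy, leaving the VDisk in the target MDisk Group)

For the Space-Efficient to fully allocated migration, the same sequence applies; the fully allocated copy is typically created by simply not specifying the Space-Efficient parameters on addvdiskcopy.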

7.3.2 Migrating across MDGs

Migrating a VDisk from one MDG to another MDG is non-disruptive to the host application using the VDisk. Depending on the workload of the SVC, there might be a slight performance impact. For this reason, we recommend that you migrate a VDisk from one MDG to another MDG when there is a relatively low load on the SVC.
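A migration of this kind is started with a single command, sketched here with placeholder names; note that svctask migratevdisk requires the source and target MDGs to have the same extent size (if the extent sizes differ, use the VDisk mirroring method in 7.3.1 instead):

   svctask migratevdisk -vdisk myvdisk -mdiskgrp target_mdg -threads 2

Using a low number of threads (1 to 4 are allowed) helps to limit the additional back-end load while the migration runs.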

7.3.3 Image type to striped type migration

When migrating existing storage into the SVC, the existing storage is brought in as image type VDisks, which means that each VDisk is based on a single MDisk. In general, we recommend that the VDisk is migrated, as soon as it is practical, to a striped type VDisk, which is striped across multiple MDisks and, therefore, multiple RAID arrays. You generally expect to see a performance improvement by migrating from image type to striped type. Example 7-4 shows the command. This process is fully described in IBM System Storage SAN Volume Controller, SG24-6423-06.

Example 7-4 Image mode migration command

IBM_2145:itsosvccl1:admin>svctask migratevdisk -mdiskgrp itso_ds45_64gb -threads 4 -vdisk image_mode0

This command migrates our VDisk, image_mode0, to the MDG, itso_ds45_64gb, and uses four threads while migrating. Note that instead of using the VDisk name, you can use its ID number.

7.3.4 Migrating to image type VDisk

An image type VDisk is a direct “straight through” mapping to exactly one image mode MDisk. If a VDisk is migrated to another MDisk, the VDisk is represented as being in managed mode during the migration. It is only represented as an image type VDisk after it has reached the state where it is a straight through mapping.

Image type disks are used to migrate existing data into an SVC and to migrate data out of virtualization. Image type VDisks cannot be expanded.

The usual reason for migrating a VDisk to an image type VDisk is to move the data on the disk to a non-virtualized environment. This operation is also carried out to enable you to change the preferred node that is used by a VDisk. Refer to 7.2.2, “Changing the preferred node within an I/O Group” on page 124. The procedure of migrating a VDisk to an image type VDisk is non-disruptive to host I/O.

In order to migrate a striped type VDisk to an image type VDisk, you must be able to migrate to an available unmanaged MDisk. The destination MDisk must be greater than or equal to the size of the VDisk. Regardless of the mode in which the VDisk starts, it is reported as managed mode during the migration. Both of the MDisks involved are reported as being in image mode during the migration. If the migration is interrupted by a cluster recovery, the migration will resume after the recovery completes.

You must perform these command line steps:

1. To determine the name of the VDisk to be moved, issue the command:

svcinfo lsvdisk

The output is in the form that is shown in Example 7-5.

Example 7-5 The svcinfo lsvdisk output

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count
0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000024:0:1
1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000025:0:1
2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000026:0:1
3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF2800000000000009:0:1
4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000027:0:1
5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF280000000000000B:0:1
6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF280000000000000C:0:1
7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::60050768018381BF2800000000000016:0:1
8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF2800000000000013:0:2
9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381BF2800000000000014:0:1
10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::60050768018381BF2800000000000028:0:1
11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::60050768018381BF280000000000002A:0:1
12:Test_1:0:PerfBestPrac:online:0:itso_ds45_64gb:8.0GB:striped:::::60050768018381BF280000000000002B:0:1


2. In order to migrate the VDisk, you need the name of the MDisk to which you will migrate it. Example 7-6 shows the command that you use.

Example 7-6 The svcinfo lsmdisk command output

IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_name:UID
0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:600a0b80001744310000011a4888478c00000000000000000000000000000000
1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:600a0b8000174431000001194888477800000000000000000000000000000000
2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:600a0b8000174431000001184888475800000000000000000000000000000000
3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:600a0b8000174431000001174888473e00000000000000000000000000000000
4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:600a0b8000174431000001164888472600000000000000000000000000000000
5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:600a0b8000174431000001154888470c00000000000000000000000000000000
6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:600a0b800017443100000114488846ec00000000000000000000000000000000
7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:600a0b800017443100000113488846c000000000000000000000000000000000
8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b80001744310000013a48a32b5400000000000000000000000000000000
9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b80001744310000011b4888aeca00000000000000000000000000000000
...

From this command, we can see that mdisk8 and mdisk9 are candidates for the image type migration, because they are unmanaged.

3. We now have enough information to enter the command to migrate the VDisk to image type, and you can see the command in Example 7-7.

Example 7-7 The migratetoimage command

IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4 -mdisk mdisk8 -mdiskgrp itso_ds45_64gb

4. If there is no unmanaged MDisk to which to migrate, you can remove an MDisk from an MDisk Group. However, you can only remove an MDisk from an MDisk Group if there are enough free extents on the remaining MDisks in the group to migrate any used extents on the MDisk that you are removing.

7.3.5 Preferred paths to a VDisk

For I/O purposes, SVC nodes within the cluster are grouped into pairs, which are called I/O Groups. A single pair is responsible for serving I/O on a specific VDisk. One node within the I/O Group represents the preferred path for I/O to a specific VDisk. The other node represents the non-preferred path. This preference alternates between nodes as each VDisk is created within an I/O Group to balance the workload evenly between the two nodes.

The SVC implements the concept of each VDisk having a preferred owner node, which improves cache efficiency and cache usage. The cache component read/write algorithms are dependent on one node owning all the blocks for a specific track. The preferred node is set at the time of VDisk creation either manually by the user or automatically by the SVC. Because read miss performance is better when the host issues a read request to the owning node, you want the host to know which node owns a track. The SCSI command set provides a mechanism for determining a preferred path to a specific VDisk. Because a track is just part of a VDisk, the cache component distributes ownership by VDisk. The preferred paths are then all the paths through the owning node. Therefore, a preferred path is any port on a preferred controller, assuming that the SAN zoning is correct.

By default, the SVC assigns ownership of even-numbered VDisks to one node of a caching pair and the ownership of odd-numbered VDisks to the other node. It is possible for the ownership distribution in a caching pair to become unbalanced if VDisk sizes are significantly different between the nodes or if the VDisk numbers assigned to the caching pair are predominantly even or odd.

To provide flexibility in making plans to avoid this problem, the ownership for a specific VDisk can be explicitly assigned to a specific node when the VDisk is created. A node that is explicitly assigned as an owner of a VDisk is known as the preferred node. Because it is expected that hosts will access VDisks through the preferred nodes, those nodes can become overloaded. When a node becomes overloaded, VDisks can be moved to other I/O Groups, because the ownership of a VDisk cannot be changed after the VDisk is created. We described this situation in 7.2.3, “Moving a VDisk to another I/O Group” on page 125.

SDD is aware of the preferred paths that SVC sets per VDisk. SDD uses a load balancing and optimizing algorithm when failing over paths; that is, it tries the next known preferred path. If this effort fails and all preferred paths have been tried, it load balances on the non-preferred paths until it finds an available path. If all paths are unavailable, the VDisk goes offline. It can take time, therefore, to perform path failover when multiple paths go offline.

SDD also performs load balancing across the preferred paths where appropriate.
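On a host running SDD, you can confirm how the paths to a VDisk are being used, and that the preferred paths are the ones carrying the I/O, with the SDD query commands, for example:

   datapath query device
   datapath query adapter

The output format differs by platform; refer to the Multipath Subsystem Device Driver: User's Guide for details of the columns that identify path state and selection counts.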

7.3.6 Governing of VDisks

I/O governing effectively throttles the amount of IOPS (or MBs per second) that can be achieved to and from a specific VDisk. You might want to use I/O governing if you have a VDisk that has an access pattern that adversely affects the performance of other VDisks on the same set of MDisks, for example, a VDisk that uses most of the available bandwidth.

Of course, if this application is highly important, migrating the VDisk to another set of MDisks might be advisable. However, in some cases, it is an issue with the I/O profile of the application rather than a measure of its use or importance.

Base the choice between I/O and MB as the I/O governing throttle on the disk access profile of the application. Database applications generally issue large amounts of I/O, but they only transfer a relatively small amount of data. In this case, setting an I/O governing throttle based on MBs per second does not achieve much throttling. It is better to use an IOPS throttle.

At the other extreme, a streaming video application generally issues a small amount of I/O, but it transfers large amounts of data. In contrast to the database example, setting an I/O governing throttle based on IOPS does not achieve much throttling. For a streaming video application, it is better to use an MB per second throttle.

Note: The performance can be better if the access is made on the preferred node. The data can still be accessed by the partner node in the I/O Group in the event of a failure.


Before running the chvdisk command, run the svcinfo lsvdisk command against the VDisk that you want to throttle in order to check its parameters as shown in Example 7-8.

Example 7-8 The svcinfo lsvdisk command output

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
...

The throttle setting of zero indicates that no throttling has been set. Having checked the VDisk, you can then run the svctask chvdisk command. The complete syntax of the command is:

svctask chvdisk [-iogrp iogrp_name|iogrp_id] [-rate throttle_rate [-unitmb]] [-name new_name_arg] [-force] vdisk_name|vdisk_id

To just modify the throttle setting, we run:

svctask chvdisk -rate 40 -unitmb Image_mode0

Running the lsvdisk command now gives us the output that is shown in Example 7-9.

Example 7-9 Output of lsvdisk command

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image


formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
virtual_disk_throttling (MB) 40
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type image
mdisk_id 10
mdisk_name mdisk10
fast_write_state empty
used_capacity 18.00GB
real_capacity 18.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize

This example shows that the throttle setting (virtual_disk_throttling) is 40 MBps on this VDisk. To set the throttle to an I/O rate instead, which is the default unit, we omit the -unitmb flag:

svctask chvdisk -rate 2048 Image_mode0

You can see in Example 7-10 that the throttle setting has no unit parameter, which means that it is an I/O rate setting.

Example 7-10 The svctask chvdisk command and svcinfo lsvdisk output

IBM_2145:itsosvccl1:admin>svctask chvdisk -rate 2048 Image_mode0
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image


formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 2048
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1

7.4 Cache-disabled VDisks

You use cache-disabled VDisks primarily when you are virtualizing an existing storage infrastructure and you want to retain the existing storage system copy services. You might also want to use cache-disabled VDisks where there is intellectual capital in existing copy services automation scripts. We recommend that you keep the use of cache-disabled VDisks to a minimum for normal workloads.

You can use cache-disabled VDisks also to control the allocation of cache resources. By disabling the cache for certain VDisks, more cache resources will be available to cache I/Os to other VDisks in the same I/O Group. This technique is particularly effective where an I/O Group is serving VDisks that will benefit from cache and other VDisks where the benefits of caching are small or non-existent.

Currently, there is no direct way to enable the cache for previously cache-disabled VDisks. There are three options to turn caching back on for a VDisk:

•  If the VDisk is an image-mode VDisk, you can remove the VDisk from the SVC cluster and redefine it with the cache enabled.

•  Use the SVC FlashCopy function to copy the content of the cache-disabled VDisk to a new cache-enabled VDisk. After the FlashCopy has been started, change the VDisk to host mapping to the new VDisk, which involves an outage.

•  Use the SVC Metro Mirror or Global Mirror function to mirror the data to another cache-enabled VDisk. As in the second option, you have to change the VDisk to host mapping after the mirror operation is complete, which also involves an outage.

Note: An I/O governing rate of 0 (displayed as virtual_disk_throttling in the CLI output of the svcinfo lsvdisk command) does not mean that zero IOPS (or MBs per second) can be achieved. It means that no throttle is set.


7.4.1 Underlying controller remote copy with SVC cache-disabled VDisks

Where synchronous or asynchronous remote copy is used in the underlying storage controller, the controller LUNs at both the source and destination must be mapped through the SVC as image mode disks with the SVC cache disabled. Note that, of course, it is possible to access either the source or the target of the remote copy from a host directly, rather than through the SVC. You can use the SVC copy services with the image mode VDisk representing the primary site of the controller remote copy relationship. It does not make sense to use SVC copy services with the VDisk at the secondary site, because the SVC does not see the data flowing to this LUN through the controller.

Figure 7-1 shows the relationships between the SVC, the VDisk, and the underlying storage controller for a cache-disabled VDisk.

Figure 7-1 Cache-disabled VDisk in remote copy relationship

7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks

Where point-in-time (PiT) copy is used in the underlying storage controller, the controller LUNs for both the source and the target must be mapped through the SVC as image mode disks with the SVC cache disabled as shown in Figure 7-2 on page 135.

Note that, of course, it is possible to access either the source or the target of the FlashCopy from a host directly rather than through the SVC.
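In both of these configurations, the controller LUNs are typically presented through the SVC with a command similar to the following sketch, where the MDisk Group, I/O Group, MDisk, and VDisk names are placeholders; -vtype image preserves the existing data layout, and -cache none disables the SVC cache for the VDisk:

   svctask mkvdisk -mdiskgrp itso_imagegrp -iogrp io_grp0 -vtype image -mdisk mdiskX -cache none -name controller_copy_vdisk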


Figure 7-2 PiT copy with cache-disabled VDisks

7.4.3 Changing cache mode of VDisks

There is no non-disruptive method to change the cache mode of a VDisk. If you need to change the cache mode of a VDisk, follow this procedure:

1. Convert the VDisk to an image mode VDisk. Refer to Example 7-11.

Example 7-11 Migrate to an image mode VDisk

IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4 -mdisk mdisk8 -mdiskgrp itso_ds45_64gb

2. Stop I/O to the VDisk.

3. Unmap the VDisk from the host.

4. Run the svcinfo lsmdisk command to check your unmanaged MDisks.

5. Remove the VDisk, which makes the MDisk on which it is created become unmanaged. Refer to Example 7-12.

Example 7-12 Removing the VDisk Test_1

IBM_2145:itsosvccl1:admin>svctask rmvdisk Test_1

6. Make an image mode VDisk on the unmanaged MDisk that was just released from the SVC. Check the MDisks by running the svcinfo lsmdisk command first. Refer to Example 7-13 on page 136.


Example 7-13 Making a cache-disabled VDisk

IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_name:UID
0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:600a0b80001744310000011a4888478c00000000000000000000000000000000
1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:600a0b8000174431000001194888477800000000000000000000000000000000
2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:600a0b8000174431000001184888475800000000000000000000000000000000
3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:600a0b8000174431000001174888473e00000000000000000000000000000000
4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:600a0b8000174431000001164888472600000000000000000000000000000000
5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:600a0b8000174431000001154888470c00000000000000000000000000000000
6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:600a0b800017443100000114488846ec00000000000000000000000000000000
7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:600a0b800017443100000113488846c000000000000000000000000000000000
8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b80001744310000013a48a32b5400000000000000000000000000000000
9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b80001744310000011b4888aeca00000000000000000000000000000000
...

IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5 -unit gb -iogrp PerfBestPrac -name Image_mode1 -cache none
Virtual Disk, id [13], successfully created
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1
id 13
name Image_mode1
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002D
throttling 0
preferred_node_id 1
fast_write_state empty
cache none
udid
fc_map_count 0
sync_rate 50
copy_count 1


copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 5.00GB
real_capacity 5.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize

7. If you want to create the VDisk with read/write cache, omit the -cache parameter, because cache-enabled is the default setting. Refer to Example 7-14.

Example 7-14 Removing VDisk and recreating with cache enabled

IBM_2145:itsosvccl1:admin>svctask rmvdisk Image_mode1
IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5 -unit gb -iogrp PerfBestPrac -name Image_mode1
Virtual Disk, id [13], successfully created
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1
id 13
name Image_mode1
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002D
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
...

8. You can then map the VDisk to the host and continue I/O operations after rescanning the host. Refer to Example 7-15 on page 138.


Example 7-15 Mapping VDISK-Image to host Diomede_Win2k8

IBM_2145:itsosvccl1:admin>svctask mkvdiskhostmap -host Diomede_Win2k8 Image_mode1
Virtual Disk to Host map, id [5], successfully created

7.5 VDisk performance

The answer to many performance questions is “it depends,” which is not much help when you are trying to solve storage performance problems, or rather perceived storage performance problems. There are no absolutes with performance, so it is difficult to state a specific performance number for a VDisk.

Some people expect that the SVC will greatly add to the latency of I/O operations, because the SVC is in-band. But because the SVC is an in-band appliance, all writes are essentially write-hits: completion is returned to the host as soon as the SVC cache has mirrored the write to its partner node. When the workload is heavy, the cache destages write data based on a least recently used (LRU) algorithm, thus ensuring that new host writes continue to be serviced as quickly as possible. The rate of destage is ramped up to free space more quickly when the cache reaches certain thresholds, which avoids cache-full situations.

Reads are likely to be read-hits, and sequential workloads get the benefit of both controller prefetch and SVC prefetch algorithms, giving the latest SVC nodes the ability to deliver more than 10 GBps on large-transfer sequential read-miss workloads. Random reads are at the mercy of the back-end storage; the SVC “fast path” adds only tens of microseconds of additional latency on a read-miss. The chances are that such a read will also be a read-miss on the controller, where a high-end system responds in around 10 milliseconds. The order of magnitude of the additional latency introduced by the SVC is therefore “lost in the noise.”

A VDisk, just as any storage device, has three basic properties: capacity, I/O rate, and throughput as measured in megabytes per second. One of these properties will be the limiting factor in your environment. Having cache and striping across large numbers of disks can help increase these numbers. But eventually, the fundamental laws of physics apply. There will always be a limiting number. One of the major problems with designing a storage infrastructure is that while it is relatively easy to determine the required capacity, determining the required I/O rate and throughput is not so easy. All too often the exact requirement is only known after the storage infrastructure has been built, and the performance is inadequate. One of the advantages of the SVC is that it is possible to compensate for a lack of information at the design stage due to the SVC’s flexibility and the ability to non-disruptively migrate data to different types of back-end storage devices.

The throughput for VDisks can range from fairly small numbers (1 to 10 IOPS) to extremely large values (more than 1 000 IOPS). This throughput depends greatly on the nature of the application and across how many MDisks the VDisk is striped. When the I/O rate, or throughput, approaches 1 000 IOPS per VDisk, it is either because the volume is getting extremely good performance, usually from extremely good cache behavior, or because the VDisk is striped across multiple MDisks and, hence, usually across multiple RAID arrays on the back-end storage system. Otherwise, it is not possible to perform so many IOPS to a VDisk that is based on a single RAID array and still realize a good response time.

Note: Before removing the VDisk host mapping, it is essential that you follow the procedures in Chapter 9, “Hosts” on page 175 so that you can remount the disk with its access to data preserved.


The MDisk I/O limit depends on many factors. The primary factor is the number of disks in the RAID array on which the MDisk is built and the speed or revolutions per minute (RPM) of the disks. But when the number of IOPS to an MDisk is near or above 1 000, the MDisk is considered extremely busy. For 15 K RPM disks, the limit is a bit higher. But these high I/O rates to the back-end storage systems are not consistent with good performance; they imply that the back-end RAID arrays are operating at extremely high utilizations, which is indicative of considerable queuing delays. Good planning demands a solution that reduces the load on such busy RAID arrays.

For more precision, we will consider the upper limit of performance for 10 K and 15 K RPM, enterprise class devices. Be aware that different people have different opinions about these limits, but all the numbers in Table 7-1 represent extremely busy disk drive modules (DDMs).

Table 7-1   DDM speeds

   DDM speed   Maximum operations/second   6+P operations/second   7+P operations/second
   10 K        150 - 175                   900 - 1050              1050 - 1225
   15 K        200 - 225                   1200 - 1350             1400 - 1575

While disks might achieve these throughputs, these ranges imply a lot of queuing delay and high response times. These ranges probably represent acceptable performance only for batch-oriented applications, where throughput is the paramount performance metric. For online transaction processing (OLTP) applications, these throughputs might already have unacceptably high response times. Because 15 K RPM DDMs are most commonly used in OLTP environments (where response time is at a premium), a simple rule is if the MDisk does more than 1 000 operations per second, it is extremely busy, no matter what the drive’s RPM is.

In the absence of additional information, we often assume, and our performance models assume, that 10 milliseconds (msec) response time is pretty high. But for a particular application, 10 msec might be too low or too high. Many OLTP environments require response times closer to 5 msec, while batch applications with large sequential transfers might run fine with 20 msec response time. The appropriate value can also change between shifts or on the weekend. A response time of 5 msec might be required from 8 a.m. until 5 p.m., while 50 msec is perfectly acceptable near midnight. It is all client and application dependent.

What really matters is the average front-end response time, which is what counts for the users. You can measure the average front-end response time by using TPC for Disk with its performance reporting capabilities. Refer to Chapter 11, “Monitoring” on page 221 for more information.

Figure 7-3 on page 140 shows the overall response time of a VDisk that is under test. Here, we have plotted the overall response time. Additionally, TPC allows us to plot read and write response times as distinct entities if one of these response times was causing problems to the user. This response time in the 1 - 2 msec range gives an acceptable level of performance for OLTP applications.


Figure 7-3 VDisk overall response time

If we look at the I/O rate on this VDisk, we see the chart in Figure 7-4 on page 141, which shows us that the I/O rate to this VDisk was in the region of 2 000 IOPS, which normally is an unacceptably high response time for a LUN that is based on a single RAID array. However, in this case, the VDisk was striped across two MDisks, which gives us an I/O rate per MDisk in the order of 1 200 IOPS. This I/O rate is high and normally gives a high user response time; however, here, the SVC front-end cache mitigates the high latency at the back end, giving the user a good response time.

Although there is no immediate issue with this VDisk, if the workload characteristics change and the VDisk becomes less cache friendly, you need to consider adding another MDisk to the MDG, making sure that it comes from another RAID array, and striping the VDisk across all three MDisks.


Figure 7-4 VDisk I/O rate

7.5.1 VDisk performance

It is vital that you constantly monitor systems when they are performing well so that you can establish baseline levels of good performance. Then, if performance as experienced by the user degrades, you have the baseline numbers for a comparison. We strongly recommend that you use TPC to monitor and manage your storage environment.

OLTP workloads
Probably the most important parameter as far as VDisks are concerned is the I/O response time for OLTP workloads. After you have established what VDisk response time provides good user performance, you can set TPC alerting to notify you if this number is exceeded by about 25%. Then, you check the I/O rate of the MDisks on which this VDisk is built. If there are multiple MDisks per RAID array, you need to check the RAID array performance. You can perform all of these tasks using TPC. The “magic” number here is 1 000 IOPS, assuming that the RAID array is 6+P. Refer to Table 7-1 on page 139.

If one of the back-end storage arrays is running at more than 1 000 IOPS and the user is experiencing poor performance because of degraded response time, this array is probably the root cause of the problem.


If users complain of response time problems, yet the VDisk response as measured by TPC has not changed significantly, this situation indicates that the problem is in the SAN network between the host and the SVC. You can diagnose where the problem is with TPC. The best way to determine the location of the problem is to use the Topology Viewer to look at the host using Datapath Explorer (DPE). This view enables you to see the paths from the host to the SVC, which we show in Figure 7-5.

Figure 7-5 DPE view of the host to the SVC

Figure 7-5 shows the paths from the disk as seen by the server through its host bus adapters (HBAs) to the SVC VDisk. By hovering the cursor over the switch port, you can see the throughput of that port. You can also use TPC to produce reports showing the overall throughput of the ports, which we show in Figure 7-6 on page 143.


Figure 7-6 Throughput of the ports

TPC can present the throughput of the ports graphically over time as shown in Figure 7-7 on page 144.


Figure 7-7 Port throughput rate

From this type of graph, you can identify performance bottlenecks in the SAN fabric and make the appropriate changes.

Batch workloads
With batch workloads in general, the most important parameter is the throughput rate as measured in megabytes per second. The goal rate is harder to quantify than the OLTP response time figure, because throughput is heavily dependent on the block size. Additionally, high response times can be acceptable for these workloads. So, it is not possible to give a single metric to quantify performance. It really is a question of “it depends.”

The larger the block size, the greater the potential throughput to the SVC. Block size is often determined by the application. With TPC, you can measure the throughput of a VDisk and the MDisks on which it is built. The important measure for the user is the time that the batch job takes to complete. If this time is too long, the following steps are a good starting point.

Determine the data rate that is needed for timely completion and compare it with the storage system’s capability as documented in performance white papers and Disk Magic. If the storage system is capable of greater performance:

1. Make sure that the application transfer size is as large as possible.

2. Consider increasing the number of concurrent application streams, threads, files, and partitions.

3. Make sure that the host is capable of supporting the required data rate. For example, use tests, such as dd (a sample invocation follows this list), and use TPC to monitor the results.


4. Check whether the flow of data through the SAN is balanced by using the switch performance monitors within TPC (extremely useful).

5. Check whether all switch and host ports are operating at the maximum permitted data rate, for example, 2 Gbps or 4 Gbps.

6. Watch out for cases where the whole batch window stops on a single file or database getting read or written, which can be a practical exposure for obvious reasons. Unfortunately, sometimes there is nothing that can be done. However, it is worthwhile evaluating this situation to see whether, for example, the database can be divided into partitions, or the large file replaced by multiple smaller files. Or, the use of the SVC in combination with SDD might help with a combination of striping and added paths to multiple VDisks. These efforts can allow parallel batch streams to the VDisks and, thus, speed up batch runs.
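For step 3, a simple sequential read test with dd gives an indication of whether the host, its HBAs, and the SAN paths can sustain the required data rate. This is only a sketch; the device name is a placeholder, and a large block size is used to approximate a batch-style transfer size:

   dd if=/dev/sdX of=/dev/null bs=1024k count=4096

Run this kind of test against a test VDisk or during a quiet period, because it generates a significant read load, and watch the corresponding VDisk, MDisk, and switch port throughput in TPC while it runs.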

The chart shown in Figure 7-8 gives an indication of what can be achieved with tuning the VDisk and the application. From point A to point B shows the normal steady state running of the application on the VDisk built on a single MDisk. We then migrated the VDisk so that it spanned two MDisks. From point B to point C shows the drop in performance during the migration. When the migration was complete, the line from point D to point E shows that the performance had almost doubled. The application was one with 75% reads and 75% sequential access. The application was then modified so that it was 100% sequential. The resulting gain in performance is shown between point E and point F.

Figure 7-8 Large 64 KB block workloads with improvements


Figure 7-9 shows the performance enhancements that can be achieved by modifying the number of parallel streams flowing to the VDisk. The line from point A to point B shows the performance with a single stream application. We then doubled the size of the workload, but we kept it in single stream. As you can see from the line between point C and point D, there is no improvement in performance. We were then able to split the workload into two parallel streams at point E. As you can see from the graph, from point E to point F shows that the throughput to the VDisk has increased by over 60%.

Figure 7-9 Effect of splitting a large job into two parallel streams

Mixed workloads
As discussed in 7.2.1, “Selecting the MDisk Group” on page 124, we usually recommend mixing workloads, so that the maximum resources are available to any workload when needed. When there is a heavy batch workload and there is no VDisk throttling, we recommend that the VDisks are placed on separate MDGs.

This action is illustrated by the chart in Figure 7-10 on page 147. VDisk 21 is running an OLTP workload, and VDisk 20 is running a batch job. Both VDisks were in the same MDG sharing the same MDisks, which were spread over three RAID arrays. As you can see between point A and point B, the response time for the OLTP workload is extremely high, averaging 10 milliseconds. At point in time B, we migrated VDisk 20 to another MDG, using MDisks built on different RAID arrays. As you can see, after the migration had completed, the response time (from point D to point E) dropped for both the batch job and, more importantly, the OLTP workload.


Figure 7-10 Effect of migrating batch workload

7.6 The effect of load on storage controllers

Because the SVC can present the capacity of a few MDisks as many more VDisks (which are then assigned to hosts that generate I/O), the SVC can drive considerably more I/O to a storage controller than the controller would receive if the SVC were not in the middle. Adding FlashCopy to this situation can add even more I/O to a storage controller, on top of the I/O that the hosts are generating.

It is important to take the load that you can put onto a storage controller into consideration when defining VDisks for hosts to make sure that you do not overload a storage controller.

So, assuming that a typical physical drive can handle 150 IOPS (a Serial Advanced Technology Attachment (SATA) might handle slightly fewer IOPS than 150) and by using this example, you can calculate the maximum I/O capability that an MDG can handle.


Then, as you define the VDisks and the FlashCopy mappings, calculate the maximum average I/O that the SVC will receive per VDisk before you start to overload your storage controller.

This example assumes:

•  An MDisk is defined from an entire array (that is, the array only provides one LUN and that LUN is given to the SVC as an MDisk).

•  Each MDisk that is assigned to an MDG is the same size and same RAID type and comes from a storage controller of the same type.

•  MDisks from a storage controller are contained entirely in the same MDG.

The raw I/O capability of the MDG is the sum of the capabilities of its MDisks. For example, for five RAID 5 MDisks with eight component disks on a typical back-end device, the I/O capability is:

5 x (150 x 7) = 5250

This raw number might be constrained by the I/O processing capability of the back-end storage controller itself.

FlashCopy copying contributes to the I/O load of a storage controller, and thus, it must be taken into consideration. The effect of a FlashCopy is equivalent to adding a number of loaded VDisks to the group, and thus, a weighting factor can be calculated to make allowance for this load.

The effect of FlashCopy copies depends on the type of I/O taking place. For example, in a group with two FlashCopy copies and random reads and writes to those VDisks, the weighting factor is 14 x 2 = 28. The total weighting factor for FlashCopy copies is given in Table 7-2.

Table 7-2   FlashCopy weighting

   Type of I/O to the VDisk        Impact on I/O      Weight factor for FlashCopy
   None/very little                Insignificant      0
   Reads only                      Insignificant      0
   Sequential reads and writes     Up to 2x I/Os      2 x F
   Random reads and writes         Up to 15x I/O      14 x F
   Random writes                   Up to 50x I/O      49 x F

Thus, to calculate the average I/O per VDisk before overloading the MDG, use this formula:

I/O rate = (I/O capability) / (number of VDisks + weighting factor)

So, using the example MDG as defined previously, if we added 20 VDisks to the MDG and that MDG was able to sustain 5 250 IOPS, and there were two FlashCopy mappings that also have random reads and writes, the maximum I/O per VDisk is:

5250 / (20 + 28) ≈ 110

Note that this is an average I/O rate, so if half of the VDisks sustain 200 IOPS and the other half of the VDisks sustain 20 IOPS, the average is still 110 IOPS.


Conclusion
As you can see from the previous examples, TPC is an extremely useful and powerful tool for analyzing and solving performance problems. If you want a single parameter to monitor to gain an overview of your system’s performance, it is the read and write response times for both VDisks and MDisks. This parameter shows everything that you need in one view. It is the key day-to-day performance validation metric. It is relatively easy to notice that a system that usually had 2 ms writes and 6 ms reads suddenly has 10 ms writes and 12 ms reads and is getting overloaded. A general monthly check of CPU usage will show you how the system is growing over time and highlight when it is time to add a new I/O Group (or cluster).

In addition, there are useful rules for OLTP-type workloads, such as the maximum I/O rates for back-end storage arrays, but for batch workloads, it really is a case of “it depends.”


Chapter 8. Copy services

In this chapter, we discuss the best practices for using the Advanced Copy Services functions, such as FlashCopy, Metro Mirror, and Global Mirror. We also describe guidelines to obtain the best performance.


8.1 SAN Volume Controller Advanced Copy Services functions

In this section, we describe the best practices for the SAN Volume Controller (SVC) Advanced Copy Services functions and how to get the best performance from them.

8.1.1 Setting up FlashCopy services

Regardless of whether you use FlashCopy to make one target disk, or multiple target disks, it is important that you consider the application and the operating system. Even though the SVC can make an exact image of a disk with FlashCopy at the point in time that you require, it is pointless if the operating system, or more importantly, the application, cannot use the copied disk.

Data stored to a disk from an application normally goes through these steps:

1. The application records the data using its defined application programming interface. Certain applications might first store their data in application memory before sending it to disk at a later time. Normally, subsequent reads of the block just being written will get the block in memory if it is still there.

2. The application sends the data to a file. The file system accepting the data might buffer it in memory for a period of time.

3. The file system will send the I/O to a disk controller after a defined period of time (or even based on an event).

4. The disk controller might cache its write in memory before sending the data to the physical drive.

If the SVC is the disk controller, it will store the write in its internal cache before sending the I/O to the real disk controller.

5. The data is stored on the drive.

At any point in time, there might be any number of unwritten blocks of data in any of these steps, waiting to go to the next step.

It is also important to realize that sometimes the order of the data blocks created in step 1 might not be the same order that is used when sending the blocks to steps 2, 3, or 4. So it is possible that, at any point in time, data arriving in step 4 might be missing a vital component that has not yet been sent from step 1, 2, or 3.

FlashCopy copies are normally created with data that is visible from step 4. So, to maintain application integrity, when a FlashCopy is created, any I/O that is generated in step 1 must make it to step 4 when the FlashCopy is started. In other words, there must not be any outstanding write I/Os in steps 1, 2, or 3.

If there were outstanding write I/Os, the copy of the disk that is created at step 4 is likely to be missing those transactions, and if the FlashCopy is to be used, these missing I/Os can make it unusable.


8.1.2 Steps to making a FlashCopy VDisk with application data integrity

The steps that you must perform when creating FlashCopy copies are:

1. Your host is currently writing to a VDisk as part of its day-to-day usage. This VDisk becomes the source VDisk in our FlashCopy mapping.

2. Identify the size and type (image, sequential, or striped) of the VDisk. If the VDisk is an image mode VDisk, you need to know its size in bytes. If it is a sequential or striped mode VDisk, its size, as reported by the SVC Master Console or SVC command line interface (CLI), is sufficient.

To identify the VDisks in an SVC cluster, use the svcinfo lsvdisk command, as shown in Example 8-1.

Figure 8-1 on page 154 shows how to obtain the same information using the SVC GUI. If you want to put VDisk 10 into a FlashCopy mapping, you do not need to know the byte size of that VDisk, because it is a striped VDisk. Creating a target VDisk of 18 GB by using the SVC GUI or CLI is sufficient.

Example 8-1 Using the command line to see the type of the VDisks

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count:copy_count
0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000024:0:1
1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000025:0:1
2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000026:0:1
3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF2800000000000009:0:1
4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018381BF2800000000000027:0:1
5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF280000000000000B:0:1
6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::60050768018381BF280000000000000C:0:1
7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::60050768018381BF2800000000000016:0:1
8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF2800000000000013:0:2
9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381BF2800000000000014:0:1
10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::60050768018381BF2800000000000028:0:1
11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::60050768018381BF280000000000002A:0:1


Figure 8-1 Using the SVC GUI to see the type of VDisks

The VDisk 11, which is used in our example, is an image-mode VDisk. In this example, you need to know its exact size in bytes.

In Example 8-2, we use the -bytes parameter of the svcinfo lsvdisk command to find its exact size. Thus, the target VDisk must be created with a size of 19 327 352 832 bytes, not 18 GB. Figure 8-2 on page 155 shows the exact size of an image mode VDisk using the SVC GUI.

Example 8-2 Find the exact size of an image mode VDisk using the command line interface

IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -bytes 11
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 19327352832
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
...


Figure 8-2 Find the exact size of an image mode VDisk using the SVC GUI

3. Create a target VDisk of the required size as identified by the source VDisk in Figure 8-3 on page 163. The target VDisk can be either an image, sequential, or striped mode VDisk; the only requirement is that it must be exactly the same size as the source VDisk. The target VDisk can be cache-enabled or cache-disabled.

4. Define a FlashCopy mapping, making sure that you have the source and target disks defined in the correct order. (If you use your newly created VDisk as a source and the existing host’s VDisk as the target, you will destroy the data on the VDisk if you start the FlashCopy.)

5. As part of the define step, you can specify the copy rate from 0 to 100. The copy rate will determine how quickly the SVC will copy the data from the source VDisk to the target VDisk.

If you set the copy rate to 0 (NOCOPY), the SVC copies only those blocks that have changed on the source VDisk or the target VDisk (if the target VDisk is mounted read/write to a host) since the mapping was started.

6. The prepare process for the FlashCopy mapping can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the source VDisks to the storage controller’s disks. After the preparation completes, the mapping has a Prepared status and the target VDisk behaves as though it was a cache-disabled VDisk until the FlashCopy mapping is either started or deleted.

Note: If you create a FlashCopy mapping where the source VDisk is a target VDisk of an active Metro Mirror relationship, you add additional latency to that existing Metro Mirror relationship (and possibly affect the host that is using the source VDisk of that Metro Mirror relationship as a result).

The reason for the additional latency is that the FlashCopy prepares and disables the cache on the source VDisk (which is the target VDisk of the Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage controller before the completion is returned to the host.


7. After the FlashCopy mapping is prepared, you can then quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process will be different for each application and for each operating system.

One guaranteed way to quiesce the host is to stop the application and unmount the VDisk from the host.

8. As soon as the host completes its flushing, you can then start the FlashCopy mapping. The FlashCopy starts extremely quickly (at most, a few seconds).

9. When the FlashCopy mapping has started, you can then unquiesce your application (or mount the volume and start the application), at which point the cache is re-enabled for the source VDisks. The FlashCopy continues to run in the background and ensures that the target VDisk is an exact copy of the source VDisk when the FlashCopy mapping was started.

You can perform step 1 on page 153 through step 5 on page 155 while the host that owns the source VDisk performs its typical daily activities (that means no downtime). While step 6 on page 155 is running, which can last several minutes, there might be a delay in I/O throughput, because the cache on the VDisk is temporarily disabled.

Step 7 must be performed when the application is down. However, these steps complete quickly and application downtime is minimal.

The target FlashCopy VDisk can now be assigned to another host, and it can be used for read or write even though the FlashCopy process has not completed.
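As a summary of steps 3 through 9, the CLI sequence might look similar to the following sketch. The VDisk names, size, copy rate, mapping name, and host name are placeholders, and the host quiesce and unquiesce actions are shown as annotations because they are performed outside the SVC:

   svctask mkvdisk -mdiskgrp itso_mdg -iogrp io_grp0 -size 18 -unit gb -name app_vdisk_fctgt
   svctask mkfcmap -source app_vdisk -target app_vdisk_fctgt -name app_fcmap -copyrate 50
   svctask prestartfcmap app_fcmap
   (wait for the mapping to reach the Prepared state, then quiesce the application and flush the host buffers)
   svctask startfcmap app_fcmap
   (unquiesce the application; the background copy continues)
   svctask mkvdiskhostmap -host backup_host app_vdisk_fctgt

Whether you assign the target VDisk to another host immediately, as shown on the last line, or wait for the background copy to complete depends on how you intend to use the copy.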

8.1.3 Making multiple related FlashCopy VDisks with data integrity

Where a host has more than one VDisk, and those VDisks are used by one application, FlashCopy consistency might need to be performed across all disks at exactly the same moment in time to preserve data integrity.

Here are examples when this situation might apply:

•  A Windows Exchange server has more than one drive, and each drive is used for an Exchange Information Store. For example, the Exchange server has a D drive, an E drive, and an F drive. Each drive is an SVC VDisk that is used to store different information stores for the Exchange server.

Thus, when performing a “snap copy” of the exchange environment, all three disks need to be flashed at exactly the same time, so that if they were used during a recovery, no one information store has more recent data on it than another information store.

•  A UNIX® relational database has several VDisks to hold different parts of the relational database. For example, two VDisks are used to hold two distinct tables, and a third VDisk holds the relational database transaction logs.

Again, when a snap copy of the relational database environment is taken, all three disks need to be in sync. That way, when they are used in a recovery, the relational database is not missing any transactions that might have occurred if each VDisk was copied by using FlashCopy independently.

Note: If you intend to use the target VDisk on the same host as the source VDisk at the same time that the source VDisk is visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.


Here are the steps to ensure that data integrity is preserved when VDisks are related to each other:

1. Your host is currently writing to the VDisks as part of its daily activities. These VDisks will become the source VDisks in our FlashCopy mappings.

2. Identify the size and type (image, sequential, or striped) of each source VDisk. If any of the source VDisks is an image mode VDisk, you will need to know its size in bytes. If any of the source VDisks are sequential or striped mode VDisks, their size as reported by the SVC Master Console or SVC command line will be sufficient.

3. Create a target VDisk of the required size for each source identified in the previous step. The target VDisk can be either an image, sequential, or striped mode VDisk; the only requirement is that they must be exactly the same size as their source VDisk. The target VDisk can be cache-enabled or cache-disabled.

4. Define a FlashCopy Consistency Group. This Consistency Group will be linked to each FlashCopy mapping that you have defined, so that data integrity is preserved between each VDisk.

5. Define a FlashCopy mapping for each source VDisk, making sure that you have the source disk and the target disk defined in the correct order. (If you use any of your newly created VDisks as a source and the existing host’s VDisk as the target, you will destroy the data on the VDisk if you start the FlashCopy).

When defining the mapping, make sure that you link this mapping to the FlashCopy Consistency Group that you defined in the previous step.

As part of defining the mapping, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the source VDisks to the target VDisks. If you set the copy rate to 0 (NOCOPY), the SVC copies only the blocks that change on the source VDisk (or on the target VDisk, if the target VDisk is mounted read/write to a host) after the Consistency Group is started.

6. Prepare the FlashCopy Consistency Group. This preparation process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the VDisks in the Consistency Group to the storage controller’s disks. After the preparation process completes, the Consistency Group has a Prepared status and all source VDisks behave as though they were cache-disabled VDisks until the Consistency Group is either started or deleted.

7. After the Consistency Group is prepared, you can then quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process differs for each application and for each operating system.

One guaranteed way to quiesce the host is to stop the application and unmount the VDisks from the host.

Note: If you create a FlashCopy mapping where the source VDisk is a target VDisk of an active Metro Mirror relationship, this mapping adds additional latency to that existing Metro Mirror relationship (and possibly affects the host that is using the source VDisk of that Metro Mirror relationship as a result).

The reason for the additional latency is that the FlashCopy Consistency Group preparation process disables the cache on all source VDisks (which might be target VDisks of a Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage controller before the complete status is returned to the host.


8. As soon as the host completes its flushing, you can then start the Consistency Group. The FlashCopy start completes extremely quickly (at most, a few seconds).

9. When the Consistency Group has started, you can then unquiesce your application (or mount the VDisks and start the application), at which point the cache is re-enabled. The FlashCopy continues to run in the background and preserves the data that existed on the VDisks when the Consistency Group was started.

Step 1 on page 157 through step 6 on page 157 can be performed while the host that owns the source VDisks is performing its typical daily duties (that is, no downtime). While step 6 on page 157 is running, which can take several minutes, there might be a delay in I/O throughput, because the cache on the VDisks is temporarily disabled.

You must perform step 7 when the application is down; however, these steps complete quickly so that the application downtime is minimal.

The target FlashCopy VDisks can now be assigned to another host and used for read or write even though the FlashCopy processes have not completed.
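
As a sketch only, the Consistency Group procedure above might translate into the following CLI sequence; the group name, mapping sources and targets, and the NOCOPY rate are hypothetical choices:

   # Create the Consistency Group
   svctask mkfcconsistgrp -name fccg_app01
   # Define one mapping per source/target pair and link it to the group
   svctask mkfcmap -source app_vd01 -target app_vd01_copy -consistgrp fccg_app01 -copyrate 0
   svctask mkfcmap -source app_vd02 -target app_vd02_copy -consistgrp fccg_app01 -copyrate 0
   svctask mkfcmap -source app_vd03 -target app_vd03_copy -consistgrp fccg_app01 -copyrate 0
   # Prepare the group: flushes the cache; source VDisks behave as cache-disabled
   svctask prestartfcconsistgrp fccg_app01
   # ... quiesce the application ...
   # Start every mapping in the group at the same point in time
   svctask startfcconsistgrp fccg_app01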

8.1.4 Creating multiple identical copies of a VDisk

Since SVC 4.2, you can create multiple point-in-time copies of a source VDisk. These point-in-time copies can be made at different times (for example, hourly) so that an image of a VDisk can be captured before a previous image has completed.

If there is a requirement to have more than one VDisk copy created at exactly the same time, using FlashCopy Consistency Groups is the best method.

By placing the FlashCopy mappings into a Consistency Group (where each mapping uses the same source VDisks), when the FlashCopy Consistency Group is started, each target will be an identical image of all the other VDisk FlashCopy targets.

The VDisk Mirroring feature, which is new in SVC 4.3, allows you to have one or two copies of a VDisk, too. For more details, refer to Chapter 7, “VDisks” on page 119.

8.1.5 Creating a FlashCopy mapping with the incremental flag

By creating a FlashCopy mapping with the incremental flag, only the data that has been changed since the last FlashCopy was started is written to the target VDisk.

This functionality is necessary in cases where we want, for example, a full copy of a VDisk for disaster tolerance, application testing, or data mining. It greatly reduces the time required to establish a full copy of the source data as a new snapshot when the first background copy is completed. In cases where clients maintain fully independent copies of data as part of their disaster tolerance strategy, using incremental FlashCopy can be useful as the first layer in their disaster tolerance and backup strategy.
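
For illustration only, an incremental mapping might be defined as shown below; the -incremental parameter of svctask mkfcmap and the VDisk and mapping names are assumptions to verify against your code level:

   # Create a FlashCopy mapping that tracks changed grains between copies
   svctask mkfcmap -source prod_vd01 -target dr_vd01 -copyrate 50 -incremental -name fcmap_dr01
   # Each subsequent start copies only the data changed since the previous copy
   svctask startfcmap -prep fcmap_dr01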

Note: If you intend to use any of the target VDisks on the same host as their source VDisk at the same time that the source VDisk is visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.


8.1.6 Space-Efficient FlashCopy (SEFC)

Using the Space-Efficient VDisk (SEV) feature, which was introduced in SVC 4.3, FlashCopy can be used in a more efficient way. SEV allows for the late allocation of MDisk space (also called thin-provisioning). Space-Efficient VDisks (SE VDisks) present a virtual size to hosts, while the real MDisk Group space (the number of extents x the size of the extents) allocated for the VDisk might be considerably smaller.

SE VDisks as target VDisks offer the opportunity to implement SEFC. SE VDisks as source VDisk and target VDisk can also be used to make point-in-time copies.

There are two distinct cases:

� Copy of an SE source VDisk to an SE target VDisk

The background copy only copies allocated regions, and the incremental feature can be used for refresh mapping (after a full copy is complete).

� Copy of a Fully Allocated (FA) source VDisk to an SE target VDisk

For this combination, you must have a zero copy rate to avoid fully allocating the SE target VDisk.

You can use SE VDisks for cascaded FlashCopy and multiple target FlashCopy, and you can mix SE and fully allocated VDisks. SE VDisks can also be used for incremental FlashCopy, although doing so only makes sense if both the source and the target are Space-Efficient.

The recommendations for SEFC are:

� SEV grain size must be equal to the FlashCopy grain size.

� SEV grain size must be 64 KB for the best performance and the best space efficiency.

The exception is where the SEV target VDisk is going to become a production VDisk (will be subjected to ongoing heavy I/O). In this case, the 256 KB SEV grain size is recommended to provide better long term I/O performance at the expense of a slower initial copy.
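
As an illustration, a Space-Efficient FlashCopy target with a 64 KB grain size might be created as follows; the MDisk Group, VDisk name, size, and 20% real capacity are hypothetical, and the parameters reflect the SVC 4.3 mkvdisk syntax:

   # Create an SE VDisk: 100 GB virtual size, 20% real capacity, auto-expand,
   # and a 64 KB grain size to match the recommended FlashCopy grain size
   svctask mkvdisk -mdiskgrp MDG_SATA -iogrp 0 -size 100 -unit gb -rsize 20% -autoexpand -grainsize 64 -name sefc_target01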

8.1.7 Using FlashCopy with your backup application

If you are using FlashCopy together with your backup application and you do not intend to keep the target disk after the backup has completed, we recommend that you create the FlashCopy mappings using the NOCOPY option (background copy rate = 0).

Note: The defaults for grain size are different: 32 KB for SE VDisk and 256 KB for FlashCopy mapping.

Note: Even if the 256 KB SEV grain size is chosen, it is still beneficial if you keep the FlashCopy grain size to 64 KB. It is then possible to minimize the performance impact to the source VDisk, even though this size increases the I/O workload on the target VDisk. Clients with extremely large numbers of FlashCopy/Remote Copy relationships might still be forced to choose a 256 KB grain size for FlashCopy due to constraints on the amount of bitmap memory.


If you intend to keep the target so that you can use it as part of a quick recovery process, you might choose one of the following options:

� Create the FlashCopy mapping with NOCOPY initially. If the target is used and migrated into production, you can change the copy rate at the appropriate time to the appropriate rate to have all the data copied to the target disk. When the copy completes, you can delete the FlashCopy mapping and delete the source VDisk, thus, freeing the space.

� Create the FlashCopy mapping with a low copy rate. Using a low rate might enable the copy to complete without an impact to your storage controller, thus, leaving bandwidth available for production work. If the target is used and migrated into production, you can change the copy rate to a higher value at the appropriate time to ensure that all data is copied to the target disk. After the copy completes, you can delete the source, thus, freeing the space.

� Create the FlashCopy with a high copy rate. While this copy rate might add additional I/O burden to your storage controller, it ensures that you get a complete copy of the source disk as quickly as possible.
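
For example, if a NOCOPY target later needs to become a full, independent copy, the copy rate of the existing mapping can be raised (the mapping name is hypothetical):

   # Raise the background copy rate from NOCOPY (0) to 80
   svctask chfcmap -copyrate 80 fcmap_backup01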

By using the target on a different Managed Disk Group (MDG), which, in turn, uses a different array or controller, you reduce your window of risk if the storage providing the source disk becomes unavailable.

With Multiple Target FlashCopy, you can now use a combination of these methods. For example, you can use the NOCOPY rate for an hourly snapshot of a VDisk with a daily FlashCopy using a high copy rate.

8.1.8 Using FlashCopy for data migration

SVC FlashCopy can help you with data migration, especially if you want to migrate from a controller that the SVC does not officially support (and your own testing reveals that the SVC can communicate with the device). Another reason to use SVC FlashCopy is to keep a copy of your data behind on the old controller in order to help with a back-out plan in the event that you want to stop the migration and revert back to the original configuration.

In this example, you can use the following steps to help migrate to a new storage environment with minimum downtime, which enables you to leave a copy of the data in the old environment if you need to back up to the old configuration.

To use FlashCopy to help with migration:

1. Your hosts are using the storage from either an unsupported controller or a supported controller that you plan on retiring.

2. Install the new storage into your SAN fabric and define your arrays and logical unit numbers (LUNs). Do not mask the LUNs to any host; you will mask them to the SVC later.

3. Install the SVC into your SAN fabric and create the required SAN zones for the SVC nodes and SVC to see the new storage.

4. Mask the LUNs from your new storage controller to the SVC and use svctask detectmdisk on the SVC to discover the new LUNs as MDisks.

5. Place the MDisks into the appropriate MDG.

6. Zone the hosts to the SVC (while maintaining their current zone to their storage) so that you can discover and define the hosts to the SVC.

7. At an appropriate time, install the IBM SDD onto the hosts that will soon use the SVC for storage. If you have performed testing to ensure that the host can use both SDD and the original driver, you can perform this step anytime before the next step.


8. Quiesce or shut down the hosts so that they no longer use the old storage.

9. Change the masking on the LUNs on the old storage controller so that the SVC now is the only user of the LUNs. You can change this masking one LUN at a time so that you can discover them (in the next step) one at a time and not mix any LUNs up.

10. Use svctask detectmdisk to discover the LUNs as MDisks. We recommend that you also use svctask chmdisk to rename the LUNs to something more meaningful.

11. Define a VDisk from each LUN and note its exact size (to the number of bytes) by using the svcinfo lsvdisk command.

12. Define a FlashCopy mapping and start the FlashCopy mapping for each VDisk by using the steps in 8.1.2, “Steps to making a FlashCopy VDisk with application data integrity” on page 153 (a command sketch follows this list).

13. Assign the target VDisks to the hosts and then restart your hosts. Your host sees the original data with the exception that the storage is now an IBM SVC LUN.
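
A hedged sketch of the commands behind steps 10 through 12 follows; the MDisk, MDisk Group, VDisk, and mapping names are hypothetical, and the target size must be taken from the lsvdisk output:

   # Step 10: discover the old controller's LUNs and give them meaningful names
   svctask detectmdisk
   svctask chmdisk -name old_ctrl_lun01 mdisk12
   # Step 11: create an image mode VDisk from the old LUN and note its exact size
   svctask mkvdisk -mdiskgrp MDG_OLD -iogrp 0 -vtype image -mdisk old_ctrl_lun01 -name host_vd01
   svcinfo lsvdisk -bytes host_vd01
   # Step 12: create a target VDisk of identical size on the new storage, then
   # define and start the FlashCopy mapping
   svctask mkvdisk -mdiskgrp MDG_NEW -iogrp 0 -size <size_in_bytes> -unit b -name host_vd01_new
   svctask mkfcmap -source host_vd01 -target host_vd01_new -copyrate 50 -name fcmap_mig01
   svctask startfcmap -prep fcmap_mig01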

With these steps, you have made a copy of the existing storage, and the SVC has not been configured to write to the original storage. Thus, if you encounter any problems with these steps, you can reverse everything that you have done, assign the old storage back to the host, and continue without the SVC.

By using FlashCopy in this example, any incoming writes go to the new storage subsystem and any read requests that have not been copied to the new subsystem automatically come from the old subsystem (the FlashCopy source).

You can alter the FlashCopy copy rate, as appropriate, to ensure that all the data is copied to the new controller.

After the FlashCopy completes, you can delete the FlashCopy mappings and the source VDisks. After all the LUNs have been migrated across to the new storage controller, you can remove the old storage controller from the SVC node zones and then, optionally, remove the old storage controller from the SAN fabric.

You can also use this process if you want to migrate to a new storage controller and not keep the SVC after the migration. At step 2 on page 160, make sure that you create LUNs that are the same size as the original LUNs. Then, at step 11, use image mode VDisks. When the FlashCopy mappings complete, you can shut down the hosts and map the storage directly to them, remove the SVC, and continue on the new storage controller.

8.1.9 Summary of FlashCopy rules

To summarize the FlashCopy rules:

� FlashCopy services can only be provided inside an SVC cluster. If you want to FlashCopy to remote storage, the remote storage needs to be defined locally to the SVC cluster.

� To maintain data integrity, ensure that all application I/Os and host I/Os are flushed from any application and operating system buffers.

� You might need to stop your application in order for it to be “restarted” with a copy of the VDisk that you make. Check with your application vendor if you have any doubts.

� Be careful if you want to map the target flash-copied VDisk to the same host that already has the source VDisk mapped to it. Check that your operating system supports this configuration.


� The target VDisk must be the same size as the source VDisk; however, the target VDisk can be a different type (image, striped, or sequential mode) or have different cache settings (cache-enabled or cache-disabled).

� If you stop a FlashCopy mapping or a Consistency Group before it has completed, you will lose access to the target VDisks. If the target VDisks are mapped to hosts, they will have I/O errors.

� A VDisk cannot be a source in one FlashCopy mapping and a target in another FlashCopy mapping.

� A VDisk can be the source for up to 16 targets.

� A FlashCopy target cannot be used in a Metro Mirror or Global Mirror relationship.

8.2 Metro Mirror and Global Mirror

In the following topics, we discuss Metro Mirror and Global Mirror guidelines and best practices.

8.2.1 Using both Metro Mirror and Global Mirror between two clusters

A Remote Copy (RC) Mirror relationship is a relationship between two individual VDisks of the same size. The management of the RC Mirror relationships is always performed in the cluster where the source VDisk exists.

However, you must consider the performance implications of this configuration, because write data from all mirroring relationships will be transported over the same inter-cluster links.

Metro Mirror and Global Mirror respond differently to a heavily loaded, poorly performing link.

Metro Mirror will usually maintain the relationships in a consistent synchronized state, meaning that primary host applications will start to see poor performance (as a result of the synchronous mirroring being used).

Global Mirror, however, offers a higher level of write performance to primary host applications. With a well-performing link, writes are completed asynchronously. If link performance becomes unacceptable, the link tolerance feature automatically stops Global Mirror relationships to ensure that the performance for application hosts remains within reasonable limits.

Therefore, with active Metro Mirror and Global Mirror relationships between the same two clusters, Global Mirror writes might suffer degraded performance if Metro Mirror relationships consume most of the inter-cluster link’s capability. If this degradation reaches a level where hosts writing to Global Mirror experience extended response times, the Global Mirror relationships can be stopped when the link tolerance threshold is exceeded. If this situation happens, refer to 8.2.9, “Diagnosing and fixing 1920 errors” on page 170.

8.2.2 Performing three-way copy service functions

If you have a requirement to perform three-way (or more) replication using copy service functions (synchronous or asynchronous mirroring), you can address this requirement by using a combination of SVC copy services with image mode cache-disabled VDisks and storage controller copy services. Both relationships are active, as shown in Figure 8-3 on page 163.


Figure 8-3 Using three-way copy services

In Figure 8-3, the Primary Site uses SVC copy services (Global Mirror or Metro Mirror) to the secondary site. Thus, in the event of a disaster at the primary site, the storage administrator enables access to the target VDisk (from the secondary site), and the business application continues processing.

While the business continues processing at the secondary site, the storage controller copy services replicate to the third site.

8.2.3 Using native controller Advanced Copy Services functions

Native copy services are not supported on all storage controllers. There is a summary of the known limitations at the following Web site:

http://www-1.ibm.com/support/docview.wss?&uid=ssg1S1002852

The storage controller is unaware of the SVC

When you use the copy services function in a storage controller, remember that the storage controller has no knowledge that the SVC exists or that the SVC is using those disks on behalf of the real hosts. Therefore, when allocating source volumes and target volumes in a point-in-time copy relationship or a remote mirror relationship, make sure that you choose them in the right order. If you accidentally use a source logical unit number (LUN) with SVC data on it as a target LUN, you will destroy that data.

If that LUN was a Managed Disk (MDisk) in an MDisk group (MDG) with striped or sequential VDisks on it, the accident might cascade up and bring the MDG offline, which, in turn, takes all the VDisks that belong to that group offline.

Important: The SVC only supports copy services between two clusters.


When defining LUNs in point-in-time copy or a remote mirror relationship, double-check that the SVC does not have visibility to the LUN (mask it so that no SVC node can see it), or if the SVC must see the LUN, ensure that it is an unmanaged MDisk.

The storage controller might, as part of its Advanced Copy Services function, take a LUN offline or suspend reads or writes. The SVC does not understand why this happens; therefore, the SVC might log errors when these events occur.

If you mask target LUNs to the SVC and rename your MDisks as you discover them and if the Advanced Copy Services function prohibits access to the LUN as part of its processing, the MDisk might be discarded and rediscovered with an SVC-assigned MDisk name.

Cache-disabled image mode VDisks

When the SVC uses a LUN from a storage controller that is a source or target of Advanced Copy Services functions, you can only use that LUN as a cache-disabled image mode VDisk.

If you use the LUN for any other type of SVC VDisk, you risk data loss, not only of the data on that LUN; you can also potentially bring down all of the VDisks in the MDG to which you assigned that LUN (MDisk).

If you leave caching enabled on a VDisk, the underlying controller does not get any write I/Os as the host writes them; the SVC caches them and destages them at a later time, which can have additional ramifications if a target host is dependent on the write I/Os from the source host as they are written.
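
For reference, a minimal sketch of creating such a cache-disabled image mode VDisk follows (the MDisk Group, MDisk, and VDisk names are hypothetical):

   # Present the controller LUN through the SVC without SVC caching
   svctask mkvdisk -mdiskgrp MDG_COPYSVC -iogrp 0 -vtype image -mdisk ctrl_target_lun01 -cache none -name image_vd01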

When to use storage controller Advanced Copy Services functions

The SVC provides you with greater flexibility than only using native copy service functions, namely:

� Standard storage device driver. Regardless of the storage controller behind the SVC, you can use the IBM Subsystem Device Driver (SDD) to access the storage. As your environment changes and your storage controllers change, using SDD negates the need to update device driver software as those changes occur.

� The SVC can provide copy service functions between any supported controller to any other supported controller, even if the controllers are from different vendors. This capability enables you to use a lower class or cost of storage as a target for point-in-time copies or remote mirror copies.

� The SVC enables you to move data around without host application interruption, which can be useful, especially when the storage infrastructure is retired when new technology becomes available.

However, certain storage controllers can provide additional copy service features and functions compared to the capability of the current version of SVC. If you have a requirement to use those features, you can use those additional copy service features and leverage the features that the SVC provides by using cache-disabled image mode VDisks.

8.2.4 Configuration requirements for long distance links

IBM has tested a number of Fibre Channel extender and SAN router technologies for use with the SVC.

The list of supported SAN routers and Fibre Channel extenders is available at this Web site:

http://www.ibm.com/storage/support/2145


If you use one of these extenders or routers, you need to test the link to ensure that the following requirements are met before you place SVC traffic onto the link:

� For SVC 4.1.0.x, the round-trip latency between sites must not exceed 68 ms (34 ms one-way) for Fibre Channel (FC) extenders or 20 ms (10 ms one-way) for SAN routers.

� For SVC 4.1.1.x and later, the round-trip latency between sites must not exceed 80 ms (40 ms one-way).

The latency of long distance links is dependent on the technology that is used. Typically, for each 100 km (62.1 miles) of distance, it is assumed that 1 ms is added to the latency, which for Global Mirror means that the remote cluster can be up to 4 000 km (2485 miles) away.

� When testing your link for latency, it is important that you take into consideration both current and future expected workloads, including any times when the workload might be unusually high. You must evaluate the peak workload by considering the average write workload over a period of one minute or less plus the required synchronization copy bandwidth.

� SVC uses part of the bandwidth for its internal SVC inter-cluster heartbeat. The amount of traffic depends on how many nodes are in each of the two clusters. Table 8-1 shows the amount of traffic, in megabits per second, generated by different sizes of clusters.

These numbers represent the total traffic between the two clusters when no I/O is taking place to mirrored VDisks. Half of the data is sent by one cluster, and half of the data is sent by the other cluster. The traffic will be divided evenly over all available inter-cluster links; therefore, if you have two redundant links, half of this traffic will be sent over each link during fault-free operation.

Table 8-1   SVC inter-cluster heartbeat traffic (megabits per second)

Local/remote cluster    Two nodes    Four nodes    Six nodes    Eight nodes
Two nodes               2.6          4.0           5.4          6.7
Four nodes              4.0          5.5           7.1          8.6
Six nodes               5.4          7.1           8.8          10.5
Eight nodes             6.7          8.6           10.5         12.4

� If the link between the sites is configured with redundancy so that it can tolerate single failures, the link must be sized so that the bandwidth and latency statements continue to be accurate even during single failure conditions.

8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships

If you have a situation where you have a large source VDisk (or a large number of source VDisks) that you want to replicate to a remote site and your planning shows that the SVC mirror initial sync time will take too long (or will be too costly if you pay for the traffic that you use), here is a method of setting up the sync using another medium (that might be less expensive).

Another reason that you might want to use these steps is if you want to increase the size of the VDisks currently in a Metro Mirror relationship or a Global Mirror relationship. To increase the size of these VDisks, you must delete the current mirror relationships and redefine the mirror relationships after you have resized the VDisks.



In this example, we use tape media as the source for the initial sync for the Metro Mirror relationship or the Global Mirror relationship target before using SVC to maintain the Metro Mirror or Global Mirror. This example does not require downtime for the hosts using the source VDisks.

Here are the steps:

1. The hosts are up and running and using their VDisks normally. There is no Metro Mirror relationship or Global Mirror relationship defined yet.

You have identified all the VDisks that will become the source VDisks in a Metro Mirror relationship or a Global Mirror relationship.

2. You have already established the SVC cluster relationship with the target SVC.

3. Define a Metro Mirror relationship or a Global Mirror relationship for each source VDisk. When defining the relationship, ensure that you use the -sync option, which stops the SVC from performing an initial sync.

4. Stop each mirror relationship by using the -access option, which enables write access to the target VDisks. We will need this write access later.

5. Make a copy of the source VDisk to the alternate media by using the dd command to copy the contents of the VDisk to tape. Another option might be using your backup tool (for example, IBM Tivoli® Storage Manager) to make an image backup of the VDisk.

6. Ship your media to the remote site and apply the contents to the targets of the Metro/Global Mirror relationship; you can mount the Metro Mirror and Global Mirror target VDisks to a UNIX server and use the dd command to copy the contents of the tape to the target VDisk. If you used your backup tool to make an image of the VDisk, follow the instructions for your tool to restore the image to the target VDisk. Do not forget to remove the mount, if this is a temporary host.

7. Unmount the target VDisks from your host. When you start the Metro Mirror and Global Mirror relationship later, the SVC will stop write access to the VDisk while the mirror relationship is running.

8. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship catches up, the target VDisk is not usable at all. As soon as it reaches Consistent Copying, your remote VDisk is ready for use in a disaster.

Note: If you fail to use the -sync option, all of these steps are redundant, because the SVC performs a full initial sync anyway.

Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already include some of those changes and is likely to have missed others.

When the relationship is restarted, the SVC will apply all of the changes that occurred since the relationship was stopped in step 4. After all the changes are applied, you will have a consistent target image.

Note: It will not matter how long it takes to get your media to the remote site and perform this step. The quicker you can get it to the remote site and loaded, the quicker SVC is running and maintaining the Metro Mirror and Global Mirror.
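
A minimal sketch of the SVC commands behind steps 3, 4, and 8 follows, together with an illustrative dd invocation for steps 5 and 6; all names, device paths, and the -primary and -force options on the restart are assumptions to check for your environment:

   # Step 3: define the relationship as already synchronized (no initial sync)
   # (add -global to mkrcrelationship for a Global Mirror relationship)
   svctask mkrcrelationship -master prod_vd01 -aux dr_vd01 -cluster SVC_DR -sync -name rc_vd01
   # Step 4: stop the relationship and allow write access to the target
   svctask stoprcrelationship -access rc_vd01
   # Step 5, on the local host: dump the source VDisk to tape
   dd if=/dev/vpath0 of=/dev/rmt0 bs=1M
   # Step 6, on a host at the remote site: restore the tape onto the target VDisk
   dd if=/dev/rmt0 of=/dev/vpath5 bs=1M
   # Step 8: restart the relationship; the SVC applies the changes made since step 4
   svctask startrcrelationship -primary master -force rc_vd01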


8.2.6 Global Mirror guidelines

When using SVC Global Mirror, all components in the SAN (switches, remote links, and storage controllers) must be capable of sustaining the workload generated by application hosts, as well as the Global Mirror background copy workload. If this is not true, Global Mirror might automatically stop your relationships to protect your application hosts from increased response times.

The Global Mirror partnership’s background copy rate must be set to a value appropriate to the link and secondary back-end storage.

Cache-disabled VDisks are not supported as participants in a Global Mirror relationship.

We recommend that you use a SAN performance monitoring tool, such as IBM TotalStorage Productivity Center (TPC), which allows you to continuously monitor the SAN components for error conditions and performance problems.

TPC can alert you as soon as there is a performance problem or if a Global (or Metro Mirror) link has been automatically suspended by the SVC. A remote copy relationship that remains stopped without intervention can severely impact your recovery point objective. Additionally, restarting a link that has been suspended for a long period of time can add additional burden to your links while the synchronization catches up.

The gmlinktolerance parameter

The gmlinktolerance parameter of the remote copy partnership must be set to an appropriate value. The default value of 300 seconds (5 minutes) is appropriate for most clients.

If you plan to perform SAN maintenance that might impact SVC Global Mirror relationships, you must take one of the following actions (a command sketch follows this list):

� Pick a maintenance window where application I/O workload is reduced for the duration of the maintenance

� Disable the gmlinktolerance feature or increase the gmlinktolerance value (meaning that application hosts might see extended response times from Global Mirror VDisks)

� Stop the Global Mirror relationships
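
For example, the second option might be implemented by raising the link tolerance for the maintenance window and restoring it afterward; the use of svctask chcluster for this setting is an assumption to verify against your code level:

   # Raise the Global Mirror link tolerance to 30 minutes for the maintenance window
   svctask chcluster -gmlinktolerance 1800
   # Restore the default value afterward
   svctask chcluster -gmlinktolerance 300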

VDisk preferred node

Global Mirror VDisks must have their preferred nodes evenly distributed between the nodes of the clusters.

The preferred node property of a VDisk helps to balance the I/O load between nodes in that I/O Group. This property is also used by Global Mirror to route I/O between clusters.

The SVC node that receives a write for a VDisk is normally that VDisk’s preferred node. For VDisks in a Global Mirror relationship, that node is also responsible for sending that write to the preferred node of the target VDisk. The primary preferred node is also responsible for sending any writes relating to background copy; again, these writes are sent to the preferred node of the target VDisk.

Note: The preferred node for a VDisk cannot be changed non-disruptively or easily after the VDisk is created.


Each node of the remote cluster has a fixed pool of Global Mirror system resources for each node of the primary cluster. That is, each remote node has a separate queue for I/O from each of the primary nodes. This queue is a fixed size and is the same size for every node.

If preferred nodes for the VDisks of the remote cluster are set so that every combination of primary node and secondary node is used, Global Mirror performance will be maximized.

Figure 8-4 shows an example of Global Mirror resources that are not optimized. All VDisks on the Local Cluster that have a preferred node of Node 1 are replicated to target VDisks on the Remote Cluster that also have a preferred node of Node 1.

With this configuration, the Remote Cluster Node 1 resources reserved for Local Cluster Node 2 are not used. Nor are the resources for Local Cluster Node 1 used for Remote Cluster Node 2.

Figure 8-4 Global Mirror resources not optimized

If the configuration was changed to the configuration shown in Figure 8-5, all Global Mirror resources for each node are used, and SVC Global Mirror operates with better performance than that of the configuration shown in Figure 8-4.

Figure 8-5 Global Mirror resources optimized


Back-end storage controller requirements

The capabilities of the storage controllers in a remote SVC cluster must be provisioned to allow for:

� The peak application workload to the Global Mirror or Metro Mirror VDisks

� The defined level of background copy

� Any other I/O being performed at the remote site

The performance of applications at the primary cluster can be limited by the performance of the back-end storage controllers at the remote cluster.

To maximize the number of I/Os that applications can perform to Global Mirror and Metro Mirror VDisks:

� Global Mirror and Metro Mirror VDisks at the remote cluster must be in dedicated MDisk Groups. The MDisk Groups must not contain non-mirror VDisks.

� Storage controllers must be configured to support the mirror workload that is required of them, which might be achieved by:

– Dedicating storage controllers to only Global Mirror and Metro Mirror VDisks

– Configuring the controller to guarantee sufficient quality of service for the disks used by Global Mirror and Metro Mirror

– Ensuring that physical disks are not shared between Global Mirror or Metro Mirror VDisks and other I/O

– Verifying that MDisks within a mirror MDisk group are similar in their characteristics (for example, Redundant Array of Independent Disks (RAID) level, physical disk count, and disk speed)

8.2.7 Migrating a Metro Mirror relationship to Global Mirror

It is possible to change a Metro Mirror relationship to a Global Mirror relationship or a Global Mirror relationship to a Metro Mirror relationship. This procedure, however, requires an outage to the host and is only successful if you can guarantee that no I/Os are generated to either the source or target VDisks during these steps:

1. Your host is currently running with VDisks that are in a Metro Mirror or Global Mirror relationship. This relationship is in the state Consistent-Synchronized.

2. Stop the application and the host.

3. Optionally, unmap the VDisks from the host to guarantee that no I/O can be performed on these VDisks. If there are currently outstanding write I/Os in the cache, you might need to wait at least two minutes before you can unmap the VDisks.

4. Stop the Metro Mirror or Global Mirror relationship, and ensure that the relationship stops with Consistent Stopped.

5. Delete the current Metro Mirror or Global Mirror relationship.

6. Create the new Metro Mirror or Global Mirror relationship. Ensure that you create it as synchronized to stop the SVC from resynchronizing the VDisks. Use the -sync flag with the svctask mkrcrelationship command.

7. Start the new Metro Mirror or Global Mirror relationship.

8. Remap the source VDisks to the host if you unmapped them in step 3.

9. Start the host and the application.
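
A sketch of the SVC commands behind steps 4 through 7, converting a Metro Mirror relationship to Global Mirror, follows (the relationship, VDisk, and cluster names are hypothetical):

   # Steps 4 and 5: stop and delete the existing Metro Mirror relationship
   svctask stoprcrelationship rc_vd01
   svctask rmrcrelationship rc_vd01
   # Step 6: re-create it as Global Mirror, marked as already synchronized
   svctask mkrcrelationship -master prod_vd01 -aux dr_vd01 -cluster SVC_DR -global -sync -name rc_vd01
   # Step 7: start the new relationship
   svctask startrcrelationship rc_vd01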


8.2.8 Recovering from suspended Metro Mirror or Global Mirror relationships

It is important to understand that when a Metro Mirror or Global Mirror relationship is started for the first time, or started after it has been stopped or suspended for any reason, the target disk is not in a consistent state while the synchronization is “catching up,” and it does not become consistent until the synchronization completes.

If you attempt to use the target VDisk at any time that a synchronization has started and before it gets to the synchronized state (by stopping the mirror relationship and making the target writable), the VDisk will contain only parts of the source VDisk and must not be used.

This inconsistency is particularly important if you have a Global/Metro Mirror relationship running (that is synchronized) and the link fails (thus, the mirror relationship suspends). When you restart the mirror relationship, the target disk will not be usable until the mirror catches up and becomes synchronized again.

Depending on the number of changes that need to be applied to the target and on your bandwidth, this situation can leave you without a usable target VDisk until the synchronization completes.

To avoid this exposure, we recommend that you make a FlashCopy of the target VDisks before you restart the mirror relationship. That way, you at least have a usable target VDisk, even though it contains older data.

8.2.9 Diagnosing and fixing 1920 errors

The SVC generates a 1920 error message whenever a Metro Mirror or Global Mirror relationship has stopped due to poor performance. A 1920 error does not occur during normal operation as long as you use a supported configuration and your SAN fabric links have been sized to suit your workload.

This 1920 error can be a temporary error, for example, as a result of maintenance, or a permanent error due to a hardware failure or an unexpectedly high host I/O workload.

If several 1920 errors have occurred, you must diagnose the cause of the earliest error first.

In order to diagnose the cause of the first error, it is extremely important that TPC, or your chosen SAN performance analysis tool, is correctly configured and monitoring statistics when the problem occurs. If you use TPC, set TPC to collect available statistics using the lowest collection interval period, which is currently five minutes.

These situations are the most likely reasons for a 1920 error:

� Maintenance caused a change, such as switch or storage controller changes, for example, updating firmware or adding additional capacity

Extremely important: If the relationship is not stopped in the consistent state, or if any host I/O takes place between stopping the old Metro Mirror or Global Mirror relationship and starting the new Metro Mirror or Global Mirror relationship, those changes will never be mirrored to the target VDisks. As a result, the data on the source and target VDisks is not exactly the same, and the SVC will be unaware of the inconsistency.


� The remote link is overloaded. Using TPC, you can check the following metrics to see if the remote link was a cause:

– Look at the total Global Mirror auxiliary VDisk write throughput before the Global Mirror relationships were stopped.

If this write throughput is approximately equal to your link bandwidth, it is extremely likely that your link is overloaded, which might be due to application host I/O or a combination of host I/O and background (synchronization) copy activity.

– Look at the total Global Mirror source VDisk write throughput before the Global Mirror relationships were stopped.

This write throughput represents only the I/O performed by the application hosts. If this number approaches the link bandwidth, you might need to either upgrade the link’s bandwidth, reduce the I/O that the application is attempting to perform, or choose to mirror fewer VDisks using Global Mirror.

If, however, the auxiliary disks show much more write I/O than the source VDisks, this situation suggests a high level of background copy. Try decreasing the Global Mirror partnership’s background copy rate parameter to bring the total application I/O bandwidth and background copy rate within the link’s capabilities.

– Look at the total Global Mirror source VDisk write throughput after the Global Mirror relationships were stopped.

If write throughput increases greatly (by 30% or more) when the relationships were stopped, this situation indicates that the application host was attempting to perform more I/O than the link can sustain. While the Global Mirror relationships are active, the overloaded link causes higher response times to the application host, which decreases the throughput that it can achieve. After the relationships have stopped, the application host sees lower response times, and you can see the true I/O workload. In this case, the link bandwidth must be increased, the application host I/O rate must be decreased, or fewer VDisks must be mirrored using Global Mirror.

� The storage controllers at the remote cluster are overloaded. Any of the MDisks on a storage controller that are providing poor service to the SVC cluster can cause a 1920 error if this poor service prevents application I/O from proceeding at the rate required by the application host.

If you have followed the specified back-end storage controller requirements, it is most likely that the error has been caused by a decrease in controller performance due to maintenance actions or a hardware failure of the controller.

Use TPC to obtain the back-end write response time for each MDisk at the remote cluster. A response time for any individual MDisk that exhibits a sudden increase of 50 ms or more, or that is higher than 100 ms, indicates a problem:

– Check the storage controller for error conditions, such as media errors, a failed physical disk, or associated activity, such as RAID array rebuilding.

If there is an error, fix the problem and restart the Global Mirror relationships.

If there is no error, consider whether the secondary controller is capable of processing the required level of application host I/O. It might be possible to improve the performance of the controller by:

• Adding more physical disks to a RAID array

• Changing the RAID level of the array

• Changing the controller’s cache settings (and checking that the cache batteries are healthy, if applicable)

• Changing other controller-specific configuration parameters


� The storage controllers at the primary site are overloaded. Analyze the performance of the primary back-end storage using the same steps you use for the remote back-end storage.

The main effect of bad performance is to limit the amount of I/O that can be performed by application hosts. Therefore, back-end storage at the primary site must be monitored regardless of Global Mirror.

However, if bad performance continues for a prolonged period, it is possible that a 1920 error will occur and the Global Mirror relationships will stop.

� One of the SVC clusters is overloaded. Use TPC to obtain the port to local node send response time and port to local node send queue time.

If the total of these statistics for either cluster is higher than 1 millisecond, the SVC might be experiencing an extremely high I/O load.

Also, check the SVC node CPU utilization; if this figure is in excess of 50%, this situation might also contribute to the problem.

In either case, contact your IBM service support representative (IBM SSR) for further assistance.

� FlashCopy mappings are in the prepared state. If the Global Mirror target VDisks are the sources of a FlashCopy mapping, and that mapping is in the prepared state for an extended time, performance to those VDisks can be impacted, because the cache is disabled. Starting the FlashCopy mapping re-enables the cache, improving the VDisks’ performance for Global Mirror I/O.

8.2.10 Using Metro Mirror or Global Mirror with FlashCopy

SVC allows you to use a VDisk that is in a Metro Mirror or Global Mirror relationship as the source VDisk of a FlashCopy mapping. However, you cannot use a VDisk that is already in a Metro Mirror or Global Mirror relationship as a FlashCopy mapping target.

When you prepare a FlashCopy mapping, the SVC puts the source VDisks into a temporary cache-disabled state. This temporary state adds additional latency to the Metro Mirror relationship, because I/Os that are normally committed to SVC memory now need to be committed to the storage controller.

One method of avoiding this latency is to temporarily stop the Metro Mirror or Global Mirror relationship before preparing the FlashCopy mapping. When the Metro Mirror or Global Mirror relationship is stopped, the SVC records all changes that occur to the source VDisks and applies those changes to the target when the remote copy mirror is restarted. The steps to temporarily stop the Metro Mirror or Global Mirror relationship before preparing the FlashCopy mapping are:

1. Stop the Metro Mirror or Global Mirror relationship (or Consistency Group). The SVC starts recording the changes that are made to the source VDisks of the relationship.

2. Prepare the FlashCopy mapping. Because the remote copy relationship is stopped, the temporary cache-disabled state of the FlashCopy source VDisks no longer adds latency to the relationship or to the host that is using its source VDisks.

3. Start the FlashCopy mapping. The cache is re-enabled when the mapping starts.

4. Restart the Metro Mirror or Global Mirror relationship. The SVC applies the changes that occurred while the relationship was stopped, and the target becomes consistent again when the resynchronization completes.

8.2.11 Using TPC to monitor Global Mirror performance

It is important to use a SAN performance monitoring tool to ensure that all SAN components perform correctly. While a SAN performance monitoring tool is useful in any SAN environment, it is particularly important when using an asynchronous mirroring solution, such as SVC Global Mirror. Performance statistics must be gathered at the highest possible frequency, which is currently five minutes for TPC.

Note that if your VDisk or MDisk configuration is changed, you must restart your TPC performance report to ensure that performance is correctly monitored for the new configuration.

If using TPC, monitor:

� Global Mirror Secondary Write Lag

You monitor Global Mirror Secondary Write Lag to identify mirror delays (tpcpool metric 942).

� Port to Remote Node Send Response Time

This time needs to be less than 80 ms (the maximum latency supported by SVC Global Mirror). A number in excess of 80 ms suggests that the long-distance link has excessive latency, which needs to be rectified. One possibility to investigate is that the link is operating at maximum bandwidth (tpcpool metrics 931 and 934).

� Sum of Port to Local Node Send Response Time and Port to Local Node Send Queue Time

This sum must be less than 1 ms for the primary cluster. A number in excess of 1 ms might indicate that an I/O Group is reaching its I/O throughput limit, which can limit performance.



� CPU Utilization Percentage

CPU Utilization must be below 50%.

� Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the remote cluster

This sum needs to be less than 100 ms. A longer response time can indicate that the storage controller is overloaded. If the response time for a specific storage controller is outside its specified operating range, investigate it for the same reason.

� Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the primary cluster

This sum must also be less than 100 ms. If the response time is greater than 100 ms, application hosts might see extended response times if the SVC’s cache becomes full.

� Write Data Rate for Global Mirror MDisk groups at the remote cluster

This data rate indicates the amount of data that is being written by Global Mirror. If this number approaches either the inter-cluster link bandwidth or the storage controller throughput limit, be aware that further increases can cause overloading of the system and monitor this number appropriately.

8.2.12 Summary of Metro Mirror and Global Mirror rules

To summarize the Metro Mirror and Global Mirror rules:

� FlashCopy targets cannot be in a Metro Mirror or Global Mirror relationship; only FlashCopy sources can be in a Metro Mirror or Global Mirror relationship.

� Metro Mirror or Global Mirror source or target VDisks cannot be moved to different I/O Groups.

� Metro Mirror or Global Mirror VDisks cannot be resized.

� Intra-cluster Metro Mirror or Global Mirror can only mirror between VDisks in the same I/O Group.

� The target VDisks must be the same size as the source VDisks; however, the target VDisk can be a different type (image, striped, or sequential mode) or have different cache settings (cache-enabled or cache-disabled).


Chapter 9. Hosts

This chapter describes best practices for monitoring host systems attached to the SAN Volume Controller (SVC).

A host system is an Open Systems computer that is connected to the switch through a Fibre Channel (FC) interface.

Most of the tuning, troubleshooting, and performance considerations for a host attached to an SVC apply within the host itself. There are three major areas of concern:

� Using multipathing and bandwidth (physical capability of SAN and back-end storage)

� Understanding how your host performs I/O and the types of I/O

� Utilizing measurement and test tools to determine host performance and for tuning

This topic supplements the IBM System Storage SAN Volume Controller Host Attachment User’s Guide Version 4.3.0, SC26-7905-02, at:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STCCCXR&context=STCCCYH&dc=DA400&q1=english&q2=-Japanese&uid=ssg1S7002159&loc=en_US&cs=utf-8&lang=en


9.1 Configuration recommendations

There are basic configuration recommendations when using the SVC to manage storage that is connected to any host. The considerations include how many paths through the fabric are allocated to the host, how many host ports to use, how to spread the hosts across I/O Groups, logical unit number (LUN) mapping, and the correct size of virtual disks (VDisks) to use.

9.1.1 The number of paths

From general experience, we have determined that it is best to limit the total number of paths from any host to the SVC. We recommend that you limit the total number of paths that the multipathing software on each host is managing to four paths, even though the maximum supported is eight paths. Following these rules solves many issues with high port fanouts, fabric state changes, and host memory management, and improves performance.

Refer to the following Web site for the latest maximum configuration requirements:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156

The major reason to limit the number of paths available to a host from the SVC is for error recovery, failover, and failback purposes. The overall time for handling errors by a host is significantly reduced. Additionally, the resources consumed within the host are reduced each time that you remove a path from multipathing management. A two-path configuration has just one path to each node; this configuration is supported but not recommended for most environments. However, refer to the host attachment guide for specific host and OS requirements:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STCCCXR&context=STCCCYH&dc=DA400&q1=english&q2=-Japanese&uid=ssg1S7002159&loc=en_US&cs=utf-8&lang=en

We have measured the effect of multipathing on performance, as shown in Table 9-1. As the results show, the differences in performance are generally minimal, but they can reduce performance by almost 10% for specific workloads. These numbers were produced with an AIX host running IBM Subsystem Device Driver (SDD) against the SVC. The host was tuned specifically for performance by adjusting queue depths and buffers.

We tested a range of reads and writes, random and sequential, cache hits and misses, at 512 byte, 4 KB, and 64 KB transfer sizes.

Table 9-1 on page 177 shows the effects of multipathing.


Table 9-1   Effect of multipathing on write performance

R/W test                            Four paths    Eight paths    Difference
Write Hit 512 b Sequential IOPS     81 877        74 909         -8.6%
Write Miss 512 b Random IOPS        60 510.4      57 567.1       -5.0%
70/30 R/W Miss 4K Random IOPS       130 445.3     124 547.9      -5.6%
70/30 R/W Miss 64K Random MBps      1 810.8138    1 834.2696     1.3%
50/50 R/W Miss 4K Random IOPS       97 822.6      98 427.8       0.6%
50/50 R/W Miss 64K Random MBps      1 674.5727    1 678.1815     0.2%

9.1.2 Host ports

The general recommendation for utilizing host ports connected to the SVC is to limit the number of physical ports to two ports on two different physical adapters. Each of these ports will be zoned to one target port in each SVC node, thus limiting the number of total paths to four, preferably on totally separate redundant SAN fabrics.

If four host ports are preferred for maximum redundant paths, the requirement is to zone each host adapter to one SVC target port on each node (for a maximum of eight paths). The benefits of path redundancy are outweighed by the host memory resource utilization required for more paths.

Use one host object to represent a cluster of hosts and use multiple worldwide port names (WWPNs) to represent the ports from all the hosts that will share the same set of VDisks.

9.1.3 Port masking

You can use a port mask to control the node target ports that a host can access. The port mask applies to logins from the host port that are associated with the host object. You can use this capability to simplify the switch zoning by limiting the SVC ports within the SVC configuration, rather than utilizing direct one-to-one zoning within the switch. This capability can simplify zone management.

The port mask is a four-bit field that applies to all nodes in the cluster for the particular host. For example, a port mask of 0001 allows a host to log in to a single port on every SVC node in the cluster, if the switch zone also includes both host and SVC node ports.
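
As an illustration, and assuming that the -mask parameter is available on the svctask chhost command at your code level, the mask from this example might be applied as follows (the host object name is hypothetical):

   # Allow host AIX_prod01 to log in only through port 1 of each SVC node
   svctask chhost -mask 0001 AIX_prod01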


Best practice: Though it is supported in theory, we strongly recommend that you keep Fibre Channel tape and Fibre Channel disks on separate host bus adapters (HBAs). These devices have two extremely different data patterns when operating in their optimum mode, and the switching between them can cause undesired overhead and performance slowdown for the applications.


9.1.4 Host to I/O Group mapping

An I/O Group consists of two SVC nodes that share management of VDisks within a cluster. The recommendation is to utilize a single I/O Group (iogrp) for all VDisks allocated to a particular host. This recommendation has many benefits. One major benefit is the minimization of port fanouts within the SAN fabric. Another benefit is to maximize the potential host attachments to the SVC, because maximums are based on I/O Groups. A third benefit is within the host itself, which has fewer target ports to manage.

The number of host ports and host objects allowed per I/O Group depends upon the switch fabric type. Refer to the maximum configurations document for these maximums:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283

Occasionally, an extremely powerful host can benefit from spreading its VDisks across I/O Groups for load balancing. Our recommendation is to start with a single I/O Group and use the performance monitoring tools, such as TotalStorage Productivity Center (TPC), to determine if the host is I/O Group-limited. If additional I/O Groups are needed for the bandwidth, more host ports can be allocated to the other I/O Group. For example, start with two HBAs zoned to one I/O Group. To add bandwidth, add two more HBAs and zone them to the other I/O Group. The host object in the SVC will contain both sets of HBAs. Because each VDisk is allocated to only a single I/O Group, the load can then be balanced by selecting which I/O Group each host VDisk is allocated to.
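For example, a minimal sketch of creating VDisks split across the two I/O Groups with the SVC CLI; the MDisk group and VDisk names are hypothetical:

svctask mkvdisk -mdiskgrp mdiskgrp0 -iogrp io_grp0 -size 100 -unit gb -name hostA_data1
svctask mkvdisk -mdiskgrp mdiskgrp0 -iogrp io_grp1 -size 100 -unit gb -name hostA_data2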

9.1.5 VDisk size as opposed to quantity

In general, host resources, such as memory and processing time, are used up by each storage LUN that is mapped to the host. For each extra path, additional memory can be used, and a portion of additional processing time is also required. You can control this effect by using fewer, larger LUNs rather than many small LUNs; however, doing so might require tuning of queue depths and I/O buffers to support the larger LUNs efficiently. A host without tunable parameters, such as Windows, does not benefit from large VDisk sizes, whereas AIX greatly benefits from larger VDisks with a smaller number of VDisks and paths presented to it.

9.1.6 Host VDisk mapping

When you create a VDisk-to-host mapping, the host ports that are associated with the host object can see the LUN that represents the VDisk on up to eight Fibre Channel ports (the four ports on each node in an I/O Group). Nodes always present the logical unit (LU) that represents a specific VDisk with the same LUN on all ports in an I/O Group.

This LUN mapping is identified by a Small Computer System Interface ID (SCSI ID), and the SVC software will automatically assign the next available ID if none is specified. There is also a unique identifier on each VDisk called the LUN serial number.

The best practice recommendation is to allocate the SAN boot OS VDisk the lowest SCSI ID (zero for most hosts) and then allocate the various data disks. While not required, if you share a VDisk among multiple hosts, control the SCSI IDs so that they are identical across the hosts. This consistency will ensure ease of management at the host level.
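For example, a minimal sketch (with hypothetical host and VDisk names) that assigns the SAN boot VDisk SCSI ID 0 and the first data VDisk SCSI ID 1 explicitly:

svctask mkvdiskhostmap -host hostA -scsi 0 hostA_boot
svctask mkvdiskhostmap -host hostA -scsi 1 hostA_data1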

If you are using image mode to migrate a host into the SVC, allocate the VDisks in the same order that they were originally assigned on the host from the back-end storage.


An invocation example:

svcinfo lshostvdiskmap -delim :

The resulting output:

id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID
2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A
2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B
2:host2:2:12:vdisk12:0000000000000ACA:6005076801958001500000000000000C
2:host2:3:13:vdisk13:0000000000000ACA:6005076801958001500000000000000D
2:host2:4:14:vdisk14:0000000000000ACA:6005076801958001500000000000000E

In this example, VDisk 10 has a unique device identifier (UID) of 6005076801958001500000000000000A, while the SCSI ID that host2 uses for access is 0.

svcinfo lsvdiskhostmap -delim : EEXCLS_HBin01
id:name:SCSI_id:host_id:host_name:wwpn:vdisk_UID
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938CFDF:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938D01F:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D65B:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D3D3:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D615:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D612:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CFBD:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CE29:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EE1D8:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EDFFE:600507680191011D4800000000000466

If using IBM multipathing software (IBM Subsystem Device Driver (SDD) or SDDDSM), the command datapath query device shows the vdisk_UID (unique identifier) and so enables easier management of VDisks. The SDDPCM equivalent command is pcmpath query device.

Host-VDisk mapping from more than one I/O Group

The SCSI ID field in the host-VDisk map might not be unique for a VDisk for a host, because it does not completely define the uniqueness of the LUN. The target port is also used as part of the identification. If two I/O Groups of VDisks are assigned to a host port, one set will start with SCSI ID 0 and then increment (given the default), and the SCSI IDs for the second I/O Group will also start at zero and then increment by default. Refer to Example 9-1 on page 180 for a sample of this type of host map. VDisk s-0-6-4 and VDisk s-1-8-2 both have a SCSI ID of 1, yet they have different LUN serial numbers.


Example 9-1 Host-VDisk mapping for one host from two I/O Groups

IBM_2145:ITSOCL1:admin>svcinfo lshostvdiskmap senegal
id name    SCSI_id  vdisk_id  vdisk_name  wwpn              vdisk_UID
0  senegal 1        60        s-0-6-4     210000E08B89CCC2  60050768018101BF28000000000000A8
0  senegal 2        58        s-0-6-5     210000E08B89CCC2  60050768018101BF28000000000000A9
0  senegal 3        57        s-0-5-1     210000E08B89CCC2  60050768018101BF28000000000000AA
0  senegal 4        56        s-0-5-2     210000E08B89CCC2  60050768018101BF28000000000000AB
0  senegal 5        61        s-0-6-3     210000E08B89CCC2  60050768018101BF28000000000000A7
0  senegal 6        36        big-0-1     210000E08B89CCC2  60050768018101BF28000000000000B9
0  senegal 7        34        big-0-2     210000E08B89CCC2  60050768018101BF28000000000000BA
0  senegal 1        40        s-1-8-2     210000E08B89CCC2  60050768018101BF28000000000000B5
0  senegal 2        50        s-1-4-3     210000E08B89CCC2  60050768018101BF28000000000000B1
0  senegal 3        49        s-1-4-4     210000E08B89CCC2  60050768018101BF28000000000000B2
0  senegal 4        42        s-1-4-5     210000E08B89CCC2  60050768018101BF28000000000000B3
0  senegal 5        41        s-1-8-1     210000E08B89CCC2  60050768018101BF28000000000000B4

Example 9-2 shows the datapath query device output of this Windows host. Note that the order of the two I/O Groups' VDisks is reversed from the host-VDisk map. VDisk s-1-8-2 is first, followed by the rest of the LUNs from the second I/O Group, then VDisk s-0-6-4, and the rest of the LUNs from the first I/O Group. Most likely, Windows discovered the second set of LUNs first. However, the relative order within an I/O Group is maintained.

Example 9-2 Using datapath query device for the host VDisk map

C:\Program Files\IBM\Subsystem Device Driver>datapath query device

Total Devices : 12

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL      1342        0
    2    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL      1444        0

DEV#: 1  DEVICE NAME: Disk2 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL      1405        0
    1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL      1387        0
    3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL         0        0

DEV#: 2  DEVICE NAME: Disk3 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL      1398        0
    1    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL      1407        0
    3    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL         0        0

DEV#: 3  DEVICE NAME: Disk4 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL      1504        0
    1    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL      1281        0
    3    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL         0        0

DEV#: 4  DEVICE NAME: Disk5 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL      1399        0
    2    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL      1391        0

DEV#: 5  DEVICE NAME: Disk6 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL      1400        0
    1    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL      1390        0
    3    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL         0        0

DEV#: 6  DEVICE NAME: Disk7 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL      1379        0
    1    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL      1412        0
    3    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL         0        0

DEV#: 7  DEVICE NAME: Disk8 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL      1417        0
    2    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL      1381        0

DEV#: 8  DEVICE NAME: Disk9 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL      1388        0
    2    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL      1413        0

DEV#: 9  DEVICE NAME: Disk10 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
============================================================================
Path#    Adapter/Hard Disk               State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk10 Part0    OPEN    NORMAL      1293        0
    1    Scsi Port2 Bus0/Disk10 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk10 Part0    OPEN    NORMAL      1477        0
    3    Scsi Port3 Bus0/Disk10 Part0    OPEN    NORMAL         0        0

DEV#: 10  DEVICE NAME: Disk11 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
============================================================================
Path#    Adapter/Hard Disk               State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk11 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk11 Part0    OPEN    NORMAL     59981        0
    2    Scsi Port3 Bus0/Disk11 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk11 Part0    OPEN    NORMAL     60179        0

DEV#: 11  DEVICE NAME: Disk12 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
============================================================================
Path#    Adapter/Hard Disk               State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk12 Part0    OPEN    NORMAL     28324        0
    1    Scsi Port2 Bus0/Disk12 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk12 Part0    OPEN    NORMAL     27111        0
    3    Scsi Port3 Bus0/Disk12 Part0    OPEN    NORMAL         0        0

Sometimes, a host might discover everything correctly at initial configuration, but it does not keep up with the dynamic changes in the configuration. The scsi id is therefore extremely important. For more discussion about this topic, refer to 9.2.4, “Dynamic reconfiguration” on page 185.

9.1.7 Server adapter layout

If your host system has multiple internal I/O busses, place the two adapters used for SVC cluster access on two different I/O busses to maximize availability and performance.


9.1.8 Availability as opposed to error isolation

It is important to balance availability through multiple paths across the SAN to the two SVC nodes against the need for error isolation. Normally, people add more paths to a SAN to increase availability, which leads to the conclusion that you want all four ports in each node zoned to each port in the host. However, our experience has shown that it is better to limit the number of paths so that the error recovery software within a switch or a host is able to manage the loss of paths quickly and efficiently. Therefore, it is beneficial to keep the fan-out from a host port through the SAN to an SVC port as close to one-to-one as possible. Limit each host port to a different set of SVC ports on each node, which keeps errors that originate from a single SVC port or from one fabric isolated to a single host adapter and makes it easier to isolate a failing port or switch.

9.2 Host pathing

Each host mapping associates a VDisk with a host object and allows all HBA ports in the host object to access the VDisk. You can map a VDisk to multiple host objects. When a mapping is created, multiple paths might exist across the SAN fabric from the hosts to the SVC nodes that are presenting the VDisk. Most operating systems present each path to a VDisk as a separate storage device. The SVC, therefore, requires that multipathing software is running on the host. The multipathing software manages the many paths that are available to the VDisk and presents a single storage device to the operating system.

9.2.1 Preferred path algorithm

I/O traffic for a particular VDisk is, at any one time, managed exclusively by the nodes in a single I/O Group. The distributed cache in the SAN Volume Controller is two-way. When a VDisk is created, a preferred node is chosen. This task is controllable at the time of creation. The owner node for a VDisk is the preferred node when both nodes are available.

When I/O is performed to a VDisk, the node that processes the I/O duplicates the data onto the partner node that is in the I/O Group. A write from the SVC node to the back-end managed disk (MDisk) is only destaged via the owner node (normally, the preferred node). Therefore, when a new write or read comes in on the non-owner node, it has to send some extra messages to the owner-node to check if it has the data in cache, or if it is in the middle of destaging that data. Therefore, performance will be enhanced by accessing the VDisk through the preferred node.

IBM multipathing software (SDD, SDDPCM, or SDDDSM) will check the preferred path setting during initial configuration for each VDisk and manage the path usage:

- Non-preferred paths: Failover only
- Preferred path: Chosen multipath algorithm (default: load balance)

9.2.2 Path selection

There are many algorithms used by multipathing software to select the paths used for an individual I/O for each VDisk. For enhanced performance with most host types, the recommendation is to load balance the I/O between only preferred node paths under normal conditions. The load across the host adapters and the SAN paths will be balanced by alternating the preferred node choice for each VDisk. Care must be taken when allocating VDisks with the SVC Console GUI to ensure adequate dispersion of the preferred node among the VDisks. If the preferred node is offline, all I/O will go through the non-preferred node in write-through mode.
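As a sketch, the preferred node can be requested at creation time and checked afterward; the names are hypothetical, and the -node parameter of mkvdisk is an assumption about this CLI release, so verify it against your installed documentation:

svctask mkvdisk -mdiskgrp mdiskgrp0 -iogrp io_grp0 -node 1 -size 50 -unit gb -name hostA_data3
svcinfo lsvdisk hostA_data3

The detailed lsvdisk view includes a preferred_node_id field that shows which node owns the VDisk.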

Certain multipathing software does not utilize the preferred node information, so it might balance the I/O load for a host differently. Veritas DMP is one example.

Table 9-2 shows, for 16 devices and random 4 Kb read misses, the effect on response time of using the preferred node contrasted with the non-preferred node. The effect is significant.

Table 9-2   The 16 device random 4 Kb read miss response time (4.2 nodes, usecs)

Preferred node (owner)   Non-preferred node   Delta
18 227                   21 256               3 029

Table 9-3 shows the change in throughput for the same 16 device random 4 Kb read miss workload when using the preferred node as opposed to a non-preferred node.

Table 9-3   The 16 device random 4 Kb read miss throughput (IOPS)

Preferred node (owner)   Non-preferred node   Delta
105 274.3                90 292.3             14 982

Table 9-4 shows the effect of using the non-preferred paths compared to the preferred paths on read performance.

Table 9-4   Random (1 TB) 4 Kb read response time (4.1 nodes, usecs)

Preferred node (owner)   Non-preferred node   Delta
5 074                    5 147                73

Table 9-5 shows the effect of using non-preferred nodes on write performance.

Table 9-5   Random (1 TB) 4 Kb write response time (4.2 nodes, usecs)

Preferred node (owner)   Non-preferred node   Delta
5 346                    5 433                87

IBM SDD software, SDDDSM software, and SDDPCM software recognize the preferred nodes and utilize the preferred paths.

9.2.3 Path management

The SVC design is based on multiple path access from the host to both SVC nodes. Multipathing software is expected to retry down multiple paths upon detection of an error.

We recommend that you check the multipathing software's display of available and in-use paths periodically and just before any SAN maintenance or software upgrades. IBM multipathing software (SDD, SDDPCM, and SDDDSM) makes this monitoring easy through the datapath query device or pcmpath query device command.



Fast node reset

There was a major improvement in SVC 4.2 in software error recovery. Fast node reset restarts a node following a software failure before the host fails I/O to applications. This node reset time improved from several minutes for "standard" node reset in previous SVC versions to about thirty seconds for SVC 4.2.

Pre-SVC 4.2.0 node reset behavior

When an SVC node is reset, it will disappear from the fabric. So from a host perspective, a few seconds of non-response from the SVC node will be followed by receipt of a registered state change notification (RSCN) from the switch. Any query to the switch name server will find that the SVC ports for the node are no longer present. The SVC ports/node will be gone from the name server for around 60 seconds.

SVC 4.2.0 node reset behavior

When an SVC node is reset, the node ports will not disappear from the fabric. Instead, the node will keep the ports alive. So from a host perspective, SVC will simply stop responding to any SCSI traffic. Any query to the switch name server will find that the SVC ports for the node are still present, but any FC login attempts (for example, PLOGI) will be ignored. This state will persist for around 30-45 seconds.

This improvement is a major enhancement for host path management of potential double failures, such as a software failure of one node while the other node in the I/O Group is being serviced, and software failures during a code upgrade. This new feature will also enhance path management when host paths are misconfigured and include only a single SVC node.

9.2.4 Dynamic reconfiguration

Many users want to dynamically reconfigure the storage connected to their hosts. The SVC gives you this capability by virtualizing the storage behind the SVC so that a host will see only the SVC VDisks presented to it. The host can then add or remove storage dynamically and reallocate using VDisk-MDisk changes.

After you decide to virtualize your storage behind an SVC, an image mode migration is used to move the existing back-end storage behind the SVC. This process is simple and straightforward, but it requires the host to be gracefully shut down. The SAN must then be rezoned so that the SVC appears to the back-end storage as a host, the back-end storage LUNs must be mapped to the SVC, and the SAN must be rezoned so that the SVC presents storage to the host. The host is then brought back up with the appropriate multipathing software, and the LUNs are now managed as SVC image mode VDisks. These VDisks can then be migrated to new storage or moved to striped storage anytime in the future with no host impact whatsoever.
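A minimal sketch of creating the image mode VDisk after the back-end LUN has been presented to the SVC as an unmanaged MDisk; the MDisk group, MDisk, and VDisk names are hypothetical:

svctask mkvdisk -mdiskgrp imagegrp -iogrp io_grp0 -vtype image -mdisk mdisk12 -name legacy_lun0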

There are times, however, when users want to change the SVC VDisk presentation to the host. Doing this dynamically is error-prone and not recommended, but it is possible if you keep several key issues in mind.

Hosts do not dynamically reprobe storage unless prompted by an external change or by the user manually causing rediscovery. Most operating systems do not notice a change in a disk allocation automatically, because they rely on saved device database information, such as the Windows registry or the AIX Object Data Manager (ODM) database.


Add new VDisks or paths

Normally, adding new storage to a host and running the discovery methods (such as cfgmgr) are safe, because there is no old, leftover information that is required to be removed. Simply scan for new disks or run cfgmgr several times if necessary to see the new disks.
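For example, on an AIX host with SDD, a typical discovery sequence after mapping additional VDisks might be the following commands (run cfgmgr again if the new disks do not appear the first time):

cfgmgr
lsdev -Cc disk
datapath query device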

Removing VDisks and then later allocating new VDisks to the host

The problem surfaces when a user removes a vdiskhostmap on the SVC during the process of removing a VDisk. After a VDisk is unmapped from the host, the device becomes unavailable and the SVC reports that there is no such disk on this port. Usage of datapath query device after the removal will show a closed, offline, invalid, or dead state as shown here:

Windows host:

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#    Adapter/Hard Disk              State   Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    CLOSE   OFFLINE         0        0
    1    Scsi Port3 Bus0/Disk1 Part0    CLOSE   OFFLINE       263        0

AIX host:

DEV#: 189  DEVICE NAME: vpath189  TYPE: 2145  POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State     Mode       Select   Errors
    0    fscsi0/hdisk1654     DEAD      OFFLINE         0        0
    1    fscsi0/hdisk1655     DEAD      OFFLINE         2        0
    2    fscsi1/hdisk1658     INVALID   NORMAL          0        0
    3    fscsi1/hdisk1659     INVALID   NORMAL          1        0

The next time that a new VDisk is allocated and mapped to that host, the SCSI ID will be reused if it is allowed to default, and the host can possibly confuse the new device with the old device definition that is still left over in the device database or system memory. It is possible to get two devices that use identical device definitions in the device database, such as in this example.

Note that both vpath189 and vpath190 have the same hdisk definitions while they actually contain different device serial numbers. The path fscsi0/hdisk1654 exists in both vpaths.

DEV#: 189  DEVICE NAME: vpath189  TYPE: 2145  POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State   Mode      Select    Errors
    0    fscsi0/hdisk1654     CLOSE   NORMAL         0         0
    1    fscsi0/hdisk1655     CLOSE   NORMAL         2         0
    2    fscsi1/hdisk1658     CLOSE   NORMAL         0         0
    3    fscsi1/hdisk1659     CLOSE   NORMAL         1         0

DEV#: 190  DEVICE NAME: vpath190  TYPE: 2145  POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk    State   Mode      Select    Errors
    0    fscsi0/hdisk1654     OPEN    NORMAL         0         0
    1    fscsi0/hdisk1655     OPEN    NORMAL   6336260         0
    2    fscsi1/hdisk1658     OPEN    NORMAL         0         0
    3    fscsi1/hdisk1659     OPEN    NORMAL   6326954         0


The multipathing software (SDD) recognizes that there is a new device, because at configuration time, it issues an inquiry command and reads the mode pages. However, if the user did not remove the stale configuration data, the Object Data Manager (ODM) for the old hdisks and vpaths still remains and confuses the host, because the SCSI ID to device serial number mapping has changed. You can avoid this situation if you remove the hdisk and vpath information from the device configuration database (rmdev -dl vpath189, rmdev -dl hdisk1654, and so forth) prior to mapping new devices to the host and running discovery.

Removing the stale configuration and rebooting the host is the recommended procedure for reconfiguring the VDisks mapped to a host.

Another process that might cause host confusion is expanding a VDisk. The SVC will tell a host through the scsi check condition “mode parameters changed,” but not all hosts are able to automatically discover the change and might confuse LUNs or continue to use the old size.

Review the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, SC23-6628, for more details and supported hosts:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156

9.2.5 VDisk migration between I/O Groups

Migrating VDisks between I/O Groups is another potential issue if the old definitions of the VDisks are not removed from the configuration. Migrating VDisks between I/O Groups is not a dynamic configuration change, because each node has its own worldwide node name (WWNN); therefore, the host will see the new nodes as different SCSI targets. This process causes major configuration changes. If the stale configuration data is still known by the host, the host might continue to attempt I/O to the old I/O Group's node targets during multipath selection.

Example 9-3 shows the Windows SDD host display prior to I/O Group migration.

Example 9-3 Windows SDD host display prior to I/O Group migration

C:\Program Files\IBM\Subsystem Device Driver>datapath query device

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk              State   Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL          0        0
    1    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL    1873173        0
    2    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL          0        0
    3    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL    1884768        0

DEV#: 1  DEVICE NAME: Disk2 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk              State   Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL          0        0
    1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL    1863138        0
    2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL          0        0
    3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL    1839632        0

If you just quiesce the host I/O and then migrate the VDisks to the new I/O Group, you will get closed offline paths for the old I/O Group and open normal paths to the new I/O Group. However, these devices do not work correctly, and there is no way to remove the stale paths without rebooting. Note the change in the pathing in Example 9-4 for device 0 (SERIAL 60050768018101BF28000000000000A0).

Example 9-4 Windows VDISK moved to new I/O Group dynamically showing the closed offline paths

C:\Program Files\IBM\Subsystem Device Driver>datapath query device

Total Devices : 12

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk              State    Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    CLOSED   OFFLINE         0        0
    1    Scsi Port2 Bus0/Disk1 Part0    CLOSED   OFFLINE   1873173        0
    2    Scsi Port3 Bus0/Disk1 Part0    CLOSED   OFFLINE         0        0
    3    Scsi Port3 Bus0/Disk1 Part0    CLOSED   OFFLINE   1884768        0
    4    Scsi Port2 Bus0/Disk1 Part0    OPEN     NORMAL          0        0
    5    Scsi Port2 Bus0/Disk1 Part0    OPEN     NORMAL         45        0
    6    Scsi Port3 Bus0/Disk1 Part0    OPEN     NORMAL          0        0
    7    Scsi Port3 Bus0/Disk1 Part0    OPEN     NORMAL         54        0

DEV#: 1  DEVICE NAME: Disk2 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk              State   Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL          0        0
    1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL    1863138        0
    2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL          0        0
    3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL    1839632        0

To change the I/O Group, you must first flush the cache within the nodes in the current I/O Group to ensure that all data is written to disk. The SVC command line interface (CLI) guide recommends that you suspend I/O operations at the host level.

The recommended way to quiesce the I/O is to take the volume groups offline, remove the saved configuration (AIX ODM) entries, such as hdisks and vpaths for those that are planned for removal, and then gracefully shut down the hosts. Migrate the VDisk to the new I/O Group and power up the host, which will discover the new I/O Group. If the stale configuration data was not removed prior to the shutdown, remove it from the stored host device databases (such as ODM if it is an AIX host) at this point. For Windows hosts, the stale registry information is normally ignored after reboot. Doing VDisk migrations in this way will prevent stale configuration issues.
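As a sketch, after the host is shut down and the stale entries are removed, the move itself is done from the SVC CLI; the VDisk name is hypothetical, and the -iogrp parameter of chvdisk is an assumption for this release, so check the CLI guide for your code level:

svctask chvdisk -iogrp io_grp1 hostA_data1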

9.3 I/O queues

Host operating system and host bus adapter software must have a way to fairly prioritize I/O to the storage. The host bus might run significantly faster than the I/O bus or external storage; therefore, there must be a way to queue I/O to the devices. Each operating system and host adapter have unique methods to control the I/O queue. It can be host adapter-based or memory and thread resources-based, or based on how many commands are outstanding for a particular device. You have several configuration parameters available to control the I/O queue for your configuration. There are host adapter parameters and also queue depth parameters for the various storage devices (VDisks on the SVC). There are also algorithms within multipathing software, such as qdepth_enable.

9.3.1 Queue depths

Queue depth is used to control the number of concurrent operations occurring on different storage resources. Refer to “Limiting Queue Depths in Large SANs,” in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, SC23-6628-02, for more details:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156

Queue depth control must be considered for the overall SVC I/O Group to maintain performance within the SVC. It must also be controlled on an individual host adapter and LUN basis to avoid taxing host memory and physical adapter resources. Refer to the host attachment scripts and host attachment guides for initial recommendations for queue depth choices, because they are specific to each host OS and HBA.

You can obtain the IBM System Storage SAN Volume Controller V4.3.0 - Host Attachment Guide, SC26-7905-03, at:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159

AIX host attachment scripts are available here:

http://www-1.ibm.com/support/dlsearch.wss?rs=540&q=host+attachment&tc=ST52G7&dc=D410

Queue depth control within the host is accomplished through limits placed by the adapter resources for handling I/Os and by setting a queue depth maximum per LUN. Multipathing software also controls queue depth using different algorithms. SDD recently made an algorithm change in this area to limit queue depth individually by LUN as opposed to an overall system queue depth limitation.

The host I/O will be converted to MDisk I/O as needed. The SVC submits I/O to the back-end (MDisk) storage as any host normally does. The host allows user control of the queue depth that is maintained on a disk; the SVC controls the queue depth for MDisk I/O without any user intervention. After the SVC has submitted I/Os and has "Q" I/Os outstanding for a single MDisk (that is, it is waiting for Q I/Os to complete), it will not submit any more I/O until some I/O completes. That is, any new I/O requests for that MDisk will be queued inside the SVC.

The graph in Figure 9-1 on page 190 shows the effect of host queue depth on IOPS for a simple configuration of 32 VDisks and one host.


Figure 9-1 IOPS compared to queue depth for 32 VDisk tests on a single host

Figure 9-2 shows another example of queue depth sensitivity for 32 VDisks on a single host.

Figure 9-2 MBps compared to queue depth for 32 VDisk tests on a single host

9.4 Multipathing software

The SVC requires the use of multipathing software on hosts that are connected. The latest recommended levels for each host operating system and multipath software package are documented in the SVC Web site:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_Multi_Host

Note that the prior recommended levels of host software packages are also tested for SVC 4.3.0, which allows for flexibility in maintaining the host software levels with respect to the SVC software version. In other words, it is possible to upgrade the SVC before or after upgrading the host software levels, depending on your maintenance schedule.


9.5 Host clustering and reserves

To prevent hosts from sharing storage inadvertently, it is prudent to establish a storage reservation mechanism. The mechanisms for restricting access to SVC VDisks utilize the Small Computer Systems Interface-3 (SCSI-3) persistent reserve commands or the SCSI-2 legacy reserve and release commands.

There are several methods that the host software uses for implementing host clusters. They require sharing the VDisks on the SVC between hosts. In order to share storage between hosts, control must be maintained over accessing the VDisks. Certain clustering software uses software locking methods. Other methods of control can be chosen by the clustering software or by the device drivers to utilize the SCSI architecture reserve/release mechanisms. The multipathing software can change the type of reserve used from a legacy reserve to persistent reserve, or remove the reserve.

Persistent reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and command options that provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation policy with a specified target device. The functionality provided by the persistent reserve commands is a superset of the legacy reserve/release commands. The persistent reserve commands are incompatible with the legacy reserve/release mechanism, and target devices can only support reservations from either the legacy mechanism or the new mechanism. Attempting to mix persistent reserve commands with legacy reserve/release commands will result in the target device returning a reservation conflict error.

Legacy reserve and release mechanisms (SCSI-2) reserved the entire LUN (VDisk) for exclusive use down a single path, which prevents access from any other host or even access from the same host utilizing a different host adapter.

The persistent reserve design establishes a method and interface through a reserve policy attribute for SCSI disks, which specifies the type of reservation (if any) that the OS device driver will establish before accessing data on the disk.

Four possible values are supported for the reserve policy:

- No_reserve: No reservations are used on the disk.

- Single_path: Legacy reserve/release commands are used on the disk.

- PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.

- PR_shared: Persistent reservation is used to establish shared host access to the disk.

When a device is opened (for example, when the AIX varyonvg command opens the underlying hdisks), the device driver will check the ODM for a reserve_policy and a PR_key_value and open the device appropriately. For persistent reserve, it is necessary that each host attached to the shared disk use a unique registration key value.
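For example, on an AIX MPIO/SDDPCM host, a sketch of checking and changing the reserve policy for a hypothetical hdisk (the device must not be in use when the attribute is changed, or add -P to defer the change):

lsattr -El hdisk10 -a reserve_policy
chdev -l hdisk10 -a reserve_policy=no_reserve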

Clearing reserves

It is possible to accidentally leave a reserve on the SVC VDisk or even the SVC MDisk during migration into the SVC or when reusing disks for another purpose. There are several tools available from the hosts to clear these reserves. The easiest tools to use are the commands lquerypr (AIX SDD host) and pcmquerypr (AIX SDDPCM host). There is also a Windows SDD/SDDDSM tool, which is menu driven.


The Windows Persistent Reserve Tool is called PRTool.exe and is installed automatically when SDD or SDDDSM is installed:

C:\Program Files\IBM\Subsystem Device Driver>PRTool.exe

It is possible to clear SVC VDisk reserves by removing all the host-VDisk mappings when SVC code is at 4.1.0 or higher.
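A hedged sketch of that approach with hypothetical host and VDisk names; removing every mapping for the VDisk clears the reserve, and the VDisk is then mapped back:

svctask rmvdiskhostmap -host hostA vdisk10
svctask mkvdiskhostmap -host hostA -scsi 0 vdisk10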

Example 9-5 shows how to determine if there is a reserve on a device using the AIX SDD lquerypr command on a reserved hdisk.

Example 9-5 The lquerypr command

[root@ktazp5033]/reserve-checker-> lquerypr -vVh /dev/hdisk5
connection type: fscsi0
open dev: /dev/hdisk5
Attempt to read reservation key...
Attempt to read registration keys...
Read Keys parameter
Generation : 935
Additional Length: 32
Key0 : 7702785F
Key1 : 7702785F
Key2 : 770378DF
Key3 : 770378DF
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF

This example shows that the device is reserved by a different host. The advantage of using the vV parameter is that the full persistent reserve keys on the device are shown, as well as the errors if the command fails. An example of a failing pcmquerypr command to clear the reserve shows the error:

# pcmquerypr -ph /dev/hdisk232 -V
connection type: fscsi0
open dev: /dev/hdisk232
couldn't open /dev/hdisk232, errno=16

Use the AIX include file errno.h to find out what the 16 indicates. This error indicates a busy condition, which can indicate a legacy reserve or a persistent reserve from another host (or this host from a different adapter). However, there are certain AIX technology levels (TLs) that have a diagnostic open issue, which prevents the pcmquerypr command from opening the device to display the status or to clear a reserve.

The following hint and tip give more information about AIX TL levels that break the pcmquerypr command:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003122&loc=en_US&cs=utf-8&lang=en

SVC MDisk reserves

Sometimes, a host image mode migration will appear to succeed, but when the VDisk is actually opened for read or write I/O, problems occur. The problems can result from not removing the reserve on the MDisk before using image mode migration into the SVC. There is no way to clear a leftover reserve on an SVC MDisk from the SVC. The reserve will have to be cleared by mapping the MDisk back to the owning host and clearing it through host commands or through back-end storage commands as advised by IBM technical support.


9.5.1 AIX

The following topics describe items specific to AIX.

HBA parameters for performance tuning

The following example settings can be used to start off your configuration in the specific workload environment. These settings are suggestions, and they are not guaranteed to be the answer to all configurations. Always try to set up a test of your data with your configuration to see if there is further tuning that can help. Again, knowledge of your specific data I/O pattern is extremely helpful.

AIX operating system settings

The following section outlines the settings that can affect performance on an AIX host. We look at these settings in relation to how they impact the two workload types.

Transaction-based settings

The host attachment script (devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte) sets the default values of the attributes for the SVC hdisks.

You can modify these values, but they are an extremely good place to start. There are also HBA parameters that are useful to set for higher performance or for configurations with large numbers of hdisks.

All attribute values that are changeable can be changed using the chdev command for AIX.

The AIX settings that can directly affect transaction performance are the queue_depth hdisk attribute and num_cmd_elem in the HBA attributes.

The queue_depth hdisk attribute

For the logical drive known as the hdisk in AIX, the setting is the attribute queue_depth:

# chdev -l hdiskX -a queue_depth=Y -P

In this example, "X" is the hdisk number, and "Y" is the value to which you are setting the queue_depth attribute.

For a high transaction workload of small random transfers, try queue_depth of 25 or more, but for large sequential workloads, performance is better with shallow queue depths, such as 4.

The num_cmd_elem attribute

For the HBA settings, the attribute num_cmd_elem for the fcs device represents the number of commands that can be queued to the adapter:

chdev -l fcsX -a num_cmd_elem=1024 -P

The default value is 200, and the maximum value is:

- LP9000 adapters: 2048
- LP10000 adapters: 2048
- LP11000 adapters: 2048
- LP7000 adapters: 1024

Best practice: For a high volume of transactions on AIX or a large numbers of hdisks on the fcs adapter, we recommend that you increase num_cmd_elem to 1 024 for the fcs devices being used.


The AIX settings that can directly affect throughput performance with large I/O block sizes are the lg_term_dma and max_xfer_size parameters for the fcs device.

The lg_term_dma attribute

This AIX Fibre Channel adapter attribute controls the direct memory access (DMA) memory resource that an adapter driver can use. The default value of lg_term_dma is 0x200000, and the maximum value is 0x8000000. A recommended change is to increase the value of lg_term_dma to 0x400000. If you still experience poor I/O performance after changing the value to 0x400000, you can increase the value of this attribute again. If you have a dual-port Fibre Channel adapter, the maximum value of the lg_term_dma attribute is divided between the two adapter ports. Therefore, never increase lg_term_dma to the maximum value for a dual-port Fibre Channel adapter, because this value will cause the configuration of the second adapter port to fail.

The max_xfer_size attribute

This AIX Fibre Channel adapter attribute controls the maximum transfer size of the Fibre Channel adapter. Its default value is 100 000, and the maximum value is 1 000 000. You can increase this attribute to improve performance. You can change this attribute only with AIX 5.2.0 or higher.

Note that setting the max_xfer_size affects the size of a memory area used for data transfer by the adapter. With the default value of max_xfer_size=0x100000, the area is 16 MB in size, and for other allowable values of max_xfer_size, the memory area is 128 MB in size.
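For example, a sketch of checking and then raising both attributes on a hypothetical adapter fcs0; the -P flag records the change in the ODM so that it takes effect after the adapter is reconfigured or the system is rebooted:

lsattr -El fcs0 -a lg_term_dma -a max_xfer_size
chdev -l fcs0 -a lg_term_dma=0x400000 -a max_xfer_size=0x200000 -P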

Throughput-based settings

In the throughput-based environment, you might want to decrease the queue depth setting to a smaller value than the default from the host attach. In a mixed application environment, you do not want to lower the num_cmd_elem setting, because other logical drives might need this higher value to perform. In a purely high throughput workload, this value will have no effect.

We recommend that you test your host with the default settings first and then make these possible tuning changes to the host parameters to verify if these suggested changes actually enhance performance for your specific host configuration and workload.

Configuring for fast fail and dynamic tracking

For host systems that run an AIX 5.2 or higher operating system, you can achieve the best results by using the fast fail and dynamic tracking attributes. Before configuring your host system to use these attributes, ensure that the host is running the AIX operating system Version 5.2 or higher. Perform the following steps to configure your host system to use the fast fail and dynamic tracking attributes:

1. Issue the following command to set the Fibre Channel SCSI I/O Controller Protocol Device event error recovery policy to fast_fail for each Fibre Channel adapter:

chdev -l fscsi0 -a fc_err_recov=fast_fail

The previous example command was for adapter fscsi0.

2. Issue the following command to enable dynamic tracking for each Fibre Channel device:

chdev -l fscsi0 -a dyntrk=yes

The previous example command was for adapter fscsi0.

Best practice: The recommended start values for high throughput sequential I/O environments are lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type) and max_xfer_size = 0x200000.


Multipathing

When the AIX operating system was first developed, multipathing was not embedded within the device drivers. Therefore, each path to an SVC VDisk was represented by an AIX hdisk. The SVC host attachment script devices.fcp.disk.ibm.rte sets up the predefined attributes within the AIX database for SVC disks, and these attributes have changed with each iteration of host attachment and AIX technology levels. Both SDD and Veritas DMP utilize the hdisks for multipathing control. The host attachment is also used for other IBM storage devices. The Host Attachment allows AIX device driver configuration methods to properly identify and configure SVC (2145), DS6000 (1750), and DS8000 (2107) LUNs:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en

SDD

IBM Subsystem Device Driver (SDD) multipathing software has been designed and updated consistently over the last decade and is an extremely mature multipathing technology. The SDD software also supports many other IBM storage types directly connected to AIX, such as the 2107. SDD algorithms for handling multipathing have also evolved. There are throttling mechanisms within SDD that controlled overall I/O bandwidth in SDD Releases 1.6.1.0 and lower. This throttling mechanism has evolved to be single vpath specific and is called qdepth_enable in later releases.

SDD utilizes persistent reserve functions, placing a persistent reserve on the device in place of the legacy reserve when the volume group is varied on. However, if HACMP is installed, HACMP controls the persistent reserve usage depending on the type of varyon used. Enhanced concurrent volume groups (VGs), varied on with varyonvg -c, have no reserves; regular VGs varied on with varyonvg utilize the persistent reserve.

Datapath commands are an extremely powerful method for managing the SVC storage and pathing. The output shows the LUN serial number of the SVC VDisk and which vpath and hdisk represent that SVC LUN. Datapath commands can also change the multipath selection algorithm. The default is load balance, but the multipath selection algorithm is programmable. The recommended best practice when using SDD is also load balance using four paths. The datapath query device output will show a somewhat balanced number of selects on each preferred path to the SVC:

DEV#: 12  DEVICE NAME: vpath12  TYPE: 2145  POLICY: Optimized
SERIAL: 60050768018B810A88000000000000E0
====================================================================
Path#    Adapter/Hard Disk    State   Mode       Select   Errors
    0    fscsi0/hdisk55       OPEN    NORMAL    1390209        0
    1    fscsi0/hdisk65       OPEN    NORMAL          0        0
    2    fscsi0/hdisk75       OPEN    NORMAL    1391852        0
    3    fscsi0/hdisk85       OPEN    NORMAL          0        0

We recommend that you verify that the selects during normal operation are occurring on the preferred paths (use datapath query device -l). Also, verify that you have the correct connectivity.


SDDPCM

As Fibre Channel technologies matured, AIX was enhanced by adding native multipathing support called Multipath I/O (MPIO). This structure allows a manufacturer of storage to create software plug-ins for their specific storage. The IBM SVC version of this plug-in is called SDDPCM, which requires a host attachment script called devices.fcp.disk.ibm.mpio.rte:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en

SDDPCM and AIX MPIO have been continually improved since their release. We recommend that you are at the latest release levels of this software.

The preferred path indicator for SDDPCM will not display until after the device has been opened for the first time, which differs from SDD, which displays the preferred path immediately after being configured.

SDDPCM features four types of reserve policies:

- No_reserve policy
- Exclusive host access single path policy
- Persistent reserve exclusive host policy
- Persistent reserve shared host access policy

The usage of the persistent reserve now depends on the hdisk attribute: reserve_policy. Change this policy to match your storage security requirements.

There are three path selection algorithms:

- Failover
- Round-robin
- Load balancing

The latest SDDPCM code of 2.1.3.0 and later has improvements in failed path reclamation by a health checker, a failback error recovery algorithm, Fibre Channel dynamic device tracking, and support for SAN boot device on MPIO-supported storage devices.

9.5.2 SDD compared to SDDPCM

There are several reasons for choosing SDDPCM over SDD. SAN boot is much improved with native MPIO/SDDPCM software. Multiple Virtual I/O Servers (VIOSs) are supported. Certain applications, such as Oracle® ASM, will not work with SDD.

It is also worth noting that with SDD, all paths can go to the dead state, which improves HACMP and Logical Volume Manager (LVM) mirroring failovers. With SDDPCM, one path will always remain open even if the LUN is dead, and this design causes longer failovers.

With SDDPCM utilizing HACMP, enhanced concurrent volume groups require the no reserve policy for both concurrent and non-concurrent resource groups. Therefore, HACMP uses a software locking mechanism instead of implementing persistent reserves. HACMP used with SDD does utilize persistent reserves based on what type of varyonvg was executed.

SDDPCM pathing

SDDPCM pcmpath commands are the best way to understand configuration information about the SVC storage allocation. The following example shows how much can be determined from this command, pcmpath query device, about the connections to the SVC from this host.


DEV#: 0  DEVICE NAME: hdisk0  TYPE: 2145  ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name    State   Mode      Select   Errors
    0    fscsi0/path0         OPEN    NORMAL    155009        0
    1    fscsi1/path1         OPEN    NORMAL    155156        0

In this example, both paths are being used for the SVC connections. These counts are not the normal select counts for a properly mapped SVC, and two paths are not an adequate number of paths. Use the -l option on pcmpath query device to check whether these paths are both preferred paths. If they are both preferred paths, one SVC node must be missing from the host view.

Using the -l option shows an asterisk on both paths, indicating a single node is visible to the host (and is the non-preferred node for this VDisk):

    0*   fscsi0/path0         OPEN    NORMAL      9795        0
    1*   fscsi1/path1         OPEN    NORMAL      9558        0

This information indicates a problem that needs to be corrected. If zoning in the switch is correct, perhaps this host was rebooted while one SVC node was missing from the fabric.

Veritas

Veritas DMP multipathing is also supported for the SVC. Veritas DMP multipathing requires certain AIX APARs and the Veritas Array Support Library. It also requires a certain version of the host attachment script devices.fcp.disk.ibm.rte to recognize the 2145 devices as hdisks rather than MPIO hdisks. In addition to the normal ODM databases that contain hdisk attributes, there are several Veritas filesets that contain configuration data:

- /dev/vx/dmp
- /dev/vx/rdmp
- /etc/vxX.info

Storage reconfiguration of VDisks presented to an AIX host will require cleanup of the AIX hdisks and these Veritas filesets.

9.5.3 Virtual I/O server

Virtual SCSI is based on a client/server relationship. The Virtual I/O Server (VIOS) owns the physical resources and acts as the server, or target, device. Physical adapters with attached disks (VDisks on the SVC, in our case) on the Virtual I/O Server partition can be shared by one or more partitions. These partitions contain a virtual SCSI client adapter that sees these virtual devices as standard SCSI compliant devices and LUNs.

There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI hdisks and logical volume (LV) VSCSI hdisks.

PV VSCSI hdisks are entire LUNs from the VIOS point of view, which appear as VDisks from the virtual I/O client (VIOC) point of view. If you are concerned about the failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks, because an LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM volume groups (VGs) on the VIOS and cannot span PVs in that VG, nor be striped LVs. Due to these restrictions, we recommend using PV VSCSI hdisks.
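As a sketch, a PV VSCSI device is typically created on the VIOS by mapping the entire hdisk (an SVC VDisk) to a virtual SCSI server adapter; the device names here are hypothetical, and the command is run from the VIOS padmin shell:

mkvdev -vdev hdisk5 -vadapter vhost0 -dev hostA_vd0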

Multipath support for SVC attachment to Virtual I/O Server is provided by either SDD or MPIO with SDDPCM. Where Virtual I/O Server SAN Boot or dual Virtual I/O Server configurations are required, only MPIO with SDDPCM is supported. We recommend using MPIO with SDDPCM due to this restriction with the latest SVC-supported levels as shown by:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_Virtual_IO_Server

Details of the Virtual I/O Server-supported environments are at:

http://www14.software.ibm.com/webapp/set2/sas/f/vios/home.html

There are many questions answered on the following Web site for usage of the VIOS:

http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html

One common question is how to migrate data into a VIO environment or how to reconfigure storage on a VIOS. This question is addressed in the previous link.

Many clients want to know if SCSI LUNs can be moved between the physical and virtual environment “as is.” That is, given a physical SCSI device (LUN) with user data on it that resides in a SAN environment, can this device be allocated to a VIOS and then provisioned to a client partition and used by the client “as is”?

The answer is no, this function is not supported at this time. The device cannot be used “as is.” Virtual SCSI devices are new devices when created, and the data must be put on them after creation, which typically requires a type of backup of the data in the physical SAN environment with a restoration of the data onto the VDisk.

Why do we have this limitation?

The VIOS uses several methods to uniquely identify a disk for use as a virtual SCSI disk; they are:

� Unique device identifier (UDID)
� IEEE volume identifier
� Physical volume identifier (PVID)

Each of these methods can result in different data formats on the disk. The preferred disk identification method for VDisks is the use of UDIDs.

MPIO uses the UDID method. Most non-MPIO disk storage multipathing software products use the PVID method instead of the UDID method. Because of the different data format associated with the PVID method, clients with non-MPIO environments need to be aware that certain future actions performed in the VIOS logical partition (LPAR) can require data migration, that is, a type of backup and restoration of the attached disks. These actions can include, but are not limited to:

� Conversion from a non-MPIO environment to MPIO

� Conversion from the PVID to the UDID method of disk identification

� Removal and rediscovery of the Disk Storage ODM entries

� Updating non-MPIO multipathing software under certain circumstances

� Possible future enhancements to VIO

Due in part to the differences in disk format that we just described, VIO is currently supported for new disk installations only.

AIX, VIO, and SDD development are working on changes to make this migration easier in the future. One enhancement is to use the UDID or IEEE method of disk identification. If you use the UDID method, it might be possible to contact IBM technical support to get a method of migrating that might not require restoration.


A quick and simple method to determine whether a backup and restoration is necessary is to run the command lquerypv -h /dev/hdisk## 80 10 to read the PVID off the disk. If the output differs between the VIOS and the VIOC, you must use backup and restore.
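For example (a sketch that assumes the LUN appears as hdisk4 on the VIOS and as hdisk1 on the VIOC; substitute your own hdisk numbers):

# On the VIOS (from the root shell via oem_setup_env):
lquerypv -h /dev/hdisk4 80 10

# On the VIO client:
lquerypv -h /dev/hdisk1 80 10

# Compare the PVID bytes at offset 80 in the two outputs; if they differ,
# plan for a backup and restoration rather than using the disk "as is".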

How to back up the VIO configuration

To back up the VIO configuration:

1. Save off the volume group information from the VIOC (PVIDs and VG names).

2. Save off the disk mapping, PVID, and LUN ID information from all VIOSs. This step includes mapping the VIOS hdisk to the VIOC hdisk, and you must save at least the PVID information.

3. Save off the physical LUN to host LUN ID information on the storage subsystem for use when you reconfigure the hdisks.

After all the pertinent mapping data has been collected and saved, it is possible to back up and reconfigure your storage and then restore using the AIX commands:

� Back up the VG data on the VIOC.

� For rootvg, the supported method is a mksysb backup and a reinstallation; for non-rootvg volume groups, use savevg and restvg (a sketch follows).
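The following is a minimal sketch of the non-rootvg case. It assumes a volume group named datavg on the VIOC and enough space under /backup to hold the image; adjust the names and target hdisks to your environment:

# On the VIOC: save the volume group structure and data to a file
savevg -i -f /backup/datavg.img datavg

# ... reconfigure or remap the storage (VDisks, VIOS mappings) ...

# On the VIOC: restore the volume group onto the newly presented hdisks
restvg -f /backup/datavg.img hdisk3 hdisk4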

9.5.4 Windows

There are two multipathing driver options released for Windows 2003 Server hosts. Windows 2003 Server device driver development has concentrated on the storport.sys driver, which has significant interoperability differences from the older scsiport driver set. Additionally, Windows has released a native multipathing I/O option with a storage-specific plug-in. SDDDSM was designed to support these newer methods of interfacing with Windows 2003 Server. In order to release new enhancements more quickly, the newer hardware architectures (for example, 64-bit EM64T) are tested only on the SDDDSM code stream; therefore, only SDDDSM packages are available for them.

The older SDD multipathing driver works with the scsiport drivers. This version is required for Windows 2000 Server hosts, because storport.sys is not available there. The SDD software is also available for Windows 2003 Server hosts when the scsiport HBA drivers are used.

Clustering and reserves

Windows SDD or SDDDSM utilizes the persistent reserve functions to implement Windows Clustering. A stand-alone Windows host will not utilize reserves.

Review this Microsoft article about clustering to understand how a cluster works:

http://support.microsoft.com/kb/309186/

When SDD or SDDDSM is installed, the reserve and release functions described in this article are translated into proper persistent reserve and release equivalents to allow load balancing and multipathing from each host.

SDD compared to SDDDSM

The major requirement for choosing SDD over SDDDSM is to ensure that the matching host bus adapter driver type is also loaded on the system. Choose the storport driver for SDDDSM and the scsiport versions for SDD. From an error isolation perspective, the tracing available and collected by sddgetdata is easier to follow with the SDD software, which is the more mature release. Future enhancements will concentrate on SDDDSM within the Windows MPIO framework.

Tunable parameters

With Windows operating systems, the queue depth settings are the responsibility of the host adapters and are configured through the BIOS settings. Configuring the queue depth settings varies from vendor to vendor. Refer to your manufacturer’s instructions about how to configure your specific cards and to the IBM System Storage SAN Volume Controller Host Attachment User’s Guide Version 4.3.0, SC26-7905-03, at:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159

Queue depth is also controlled by the Windows application program. The application program has control of how many I/O commands it will allow to be outstanding before waiting for completion.

For IBM FAStT FC2-133 (and QLogic-based HBAs), the queue depth is known as the execution throttle, which can be set with either the QLogic SANSurfer tool or in the BIOS of the QLogic-based HBA by pressing Ctrl+Q during the startup process.

Changing back-end storage LUN mappings dynamically

Unmapping a LUN from a Windows SDD or SDDDSM server and then mapping a different LUN using the same SCSI ID can cause data corruption and loss of access. The procedure for reconfiguration is documented at the following Web site:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003316&loc=en_US&cs=utf-8&lang=en

Recommendations for disk alignment using Windows with SVC VDisks

The recommended settings for the best performance with SVC when you use Microsoft Windows operating systems and applications with a significant amount of I/O can be found at the following Web site:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&q1=microsoft&uid=ssg1S1003291&loc=en_US&cs=utf-8&lang=en

9.5.5 Linux

IBM has decided to transition SVC multipathing support from IBM SDD to Linux® native DM-MPIO multipathing. Refer to the V4.3.0 - Recommended Software Levels for SAN Volume Controller for which versions of each Linux kernel require SDD or DM-MPIO support:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278

If your kernel is not listed for support, contact your IBM marketing representative to request a Request for Price Quotation (RPQ) for your specific configuration.

Linux Clustering is not supported, and Linux OS does not use the legacy reserve function. Therefore, there are no persistent reserves used in Linux. Contact IBM marketing for RPQ support if you need Linux Clustering in your specific environment.


SDD compared to DM-MPIO

For reference on the multipathing choices for Linux operating systems, SDD development has provided the white paper, Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is available at:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en

Tunable parameters

Linux performance is influenced by HBA parameter settings and queue depth. Queue depth for Linux servers can be determined by using the formula specified in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, SC23-6628-02, at:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156

Refer to the settings for each specific HBA type and general Linux OS tunable parameters in the IBM System Storage SAN Volume Controller V4.3.0 - Host Attachment Guide, SC26-7905-03, at:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159

In addition to the I/O and OS parameters, Linux also has tunable file system parameters.

You can use the command tune2fs to increase file system performance based on your specific configuration. The journal mode and size can be changed. Also, the directories can be indexed. Refer to the following open source document for details:

http://swik.net/how-to-increase-ext3-and-reiserfs-filesystems-performance
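As a hedged illustration of the kind of tuning described above (assuming an ext3 file system on /dev/sdb1; verify the options against your distribution's tune2fs documentation before use):

tune2fs -O dir_index /dev/sdb1                  # enable hashed b-tree directory indexing
e2fsck -fD /dev/sdb1                            # optionally rebuild indexes for existing directories (run unmounted)
tune2fs -o journal_data_writeback /dev/sdb1     # set the default journal data mode to writeback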

9.5.6 Solaris

There are several options for multipathing support on Solaris™ hosts. Depending on the OS levels listed in the latest SVC software level matrix, you can choose IBM SDD, Symantec/VERITAS Volume Manager (DMP), or Solaris MPxIO.

SAN boot support and clustering support are available for Symantec/VERITAS Volume Manager, and SAN boot support is also available for MPxIO.

Solaris MPxIO

Releases of SVC code prior to 4.3.0 did not support load balancing of the MPxIO software.

Configure your SVC host object with the type attribute set to tpgs if you want to run MPxIO on your Sun™ SPARC host. For example:

svctask mkhost -name new_name_arg -hbawwpn wwpn_list -type tpgs

In this command, -type specifies the type of host. Valid entries are hpux, tpgs, or generic. The tpgs option enables extra target port unit attentions. The default is generic.
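For example, with a hypothetical host name and WWPNs (followed by a quick verification):

svctask mkhost -name solhost01 -hbawwpn 210000E08B05ADFC:210100E08B25ADFC -type tpgs
svcinfo lshost solhost01        # confirm that the host object was created with type tpgs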

Symantec/VERITAS Volume Manager

When managing IBM SVC storage in Symantec’s volume manager products, you must install an array support library (ASL) on the host so that the volume manager is aware of the storage subsystem properties (active/active or active/passive). If the appropriate ASL is not installed, the volume manager does not claim the LUNs. Usage of the ASL is required to enable the special failover/failback multipathing that SVC requires for error recovery.


Use the following commands to determine the basic configuration of a Symantec/Veritas server:

pkginfo -l                                   (lists all installed packages)
showrev -p | grep vxvm                       (obtains the version of the volume manager)
vxddladm listsupport                         (shows which ASLs are configured)
vxdisk list
vxdmpadm listctlr all                        (shows all attached subsystems and provides a type where possible)
vxdmpadm getsubpaths ctlr=cX                 (lists paths by controller)
vxdmpadm getsubpaths dmpnodename=cXtXdXs2    (lists paths by LUN)

The following examples show whether the SVC is properly connected and show at a glance which ASL is in use (the native DMP ASL or the SDD ASL).

Here is an example of what you see when Symantec volume manager is correctly seeing our SVC, using the SDD passthrough mode ASL:

# vxdmpadm listenclosure all
ENCLR_NAME      ENCLR_TYPE    ENCLR_SNO          STATUS
============================================================
OTHER_DISKS     OTHER_DISKS   OTHER_DISKS        CONNECTED
VPATH_SANVC0    VPATH_SANVC   0200628002faXX00   CONNECTED

Here is an example of what we see when SVC is configured using native DMP ASL:

# vxdmpadm listenclosure all
ENCLR_NAME      ENCLR_TYPE    ENCLR_SNO          STATUS
============================================================
OTHER_DISKS     OTHER_DISKS   OTHER_DISKS        CONNECTED
SAN_VC0         SAN_VC        0200628002faXX00   CONNECTED

ASL specifics for SVC

For SVC, ASLs have been developed for both DMP multipathing and SDD passthrough multipathing.

For SDD passthrough:

http://support.veritas.com/docs/281321

# pkginfo -l VRTSsanvc
PKG=VRTSsanvc
BASEDIR=/etc/vx
NAME=Array Support Library for IBM SAN.VC with SDD.
PRODNAME=VERITAS ASL for IBM SAN.VC with SDD.

For native DMP:

http://support.veritas.com/docs/276913

# pkginfo -l VRTSsanvc
PKGINST: VRTSsanvc
NAME: Array Support Library for IBM SAN.VC in NATIVE DMP mode

To check the installed Symantec/VERITAS version:

showrev -p |grep vxvm

To check what IBM ASLs are configured into the volume manager:

vxddladm listsupport |grep -i ibm


Following the installation of a new ASL using pkgadd, you need to either reboot or issue vxdctl enable. To list the ASLs that are active, run vxddladm listsupport.
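A typical sequence looks like the following sketch (the package file name and its location are examples only):

pkgadd -d /tmp/VRTSsanvc.pkg            # install the downloaded ASL package
vxdctl enable                           # make the volume manager rescan and claim the devices without a reboot
vxddladm listsupport | grep -i ibm      # confirm that the IBM SAN.VC ASL is now active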

How to troubleshoot configuration issues

Here is an example of what you see when the appropriate ASL is not installed or the system has not enabled it. The key indicator is the enclosure type OTHER_DISKS:

vxdmpadm listctlr all
CTLR-NAME     ENCLR-TYPE     STATE      ENCLR-NAME
=====================================================
c0            OTHER_DISKS    ENABLED    OTHER_DISKS
c2            OTHER_DISKS    ENABLED    OTHER_DISKS
c3            OTHER_DISKS    ENABLED    OTHER_DISKS

vxdmpadm listenclosure all
ENCLR_NAME      ENCLR_TYPE    ENCLR_SNO     STATUS
============================================================
OTHER_DISKS     OTHER_DISKS   OTHER_DISKS   CONNECTED
Disk            Disk          DISKS         DISCONNECTED

9.5.7 VMware

Review the V4.3.0 - Recommended Software Levels for SAN Volume Controller Web site for the various ESX levels that are supported:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_VMWare

To get continued support for earlier VMware levels (for example, if you use level 3.01), you must upgrade to a minimum VMware level of 3.02. For more details, contact your IBM marketing representative and ask about submitting an RPQ for support. The necessary patches, and the procedures to apply them, are supplied after the specific configuration has been reviewed and approved.

Multipathing solutions supported

Multipathing is supported at ESX level 2.5.x and higher; therefore, installing multipathing software is not required. Single pathing is only supported at ESX level 2.1.

VMware® multipathing does not support dynamic pathing; preferred paths set in the SVC are ignored. The VMware multipathing software performs static load balancing for I/O, based upon a host setting that defines the preferred path for a given volume.

Multipathing configuration maximums

The maximum supported configuration for the VMware multipathing software is:

� A total of 256 SCSI devices
� Four paths to each VDisk

Note: Each path to a VDisk equates to a single SCSI device.

For more information about VMware and SVC, VMware storage and zoning recommendations, HBA settings, and attaching VDisks to VMware, refer to IBM System Storage SAN Volume Controller V4.3, SG24-6423, at:

http://www.redbooks.ibm.com/redpieces/abstracts/sg246423.html


9.6 Mirroring considerations

As you plan how to fully utilize the various options to back up your data through mirroring functions, consider how to keep a consistent set of data for your application. A consistent set of data implies a level of control by the application or host scripts to start and stop mirroring with both host-based mirroring and back-end storage mirroring features. It also implies a group of disks that must be kept consistent.

Host applications have a certain granularity to their storage writes. The data has a consistent view to the host application only at certain times. This level of granularity is at the file system level as opposed to the SCSI read/write level. The SVC guarantees consistency at the SCSI read/write level when its features of mirroring are in use. However, a host file system write might require multiple SCSI writes. Therefore, without a method of controlling when the mirroring stops, the resulting mirror can be missing a portion of a write and look corrupted. Normally, a database application has methods to recover the mirrored data and to back up to a consistent view, which is applicable in the case of a disaster that breaks the mirror. However, we recommend that you have a normal procedure of stopping at a consistent view for each mirror in order to be able to easily start up the backup copy for non-disaster scenarios.

9.6.1 Host-based mirroring

Host-based mirroring is a fully redundant method of mirroring using two mirrored copies of the data. Mirroring is done by the host software. If you use this method of mirroring, we recommend that each copy is placed on a separate SVC cluster.

9.7 Monitoring

A consistent set of monitoring tools is available when IBM SDD, SDDDSM, and SDDPCM are used for the multipathing software on the various OS environments. Examples earlier in this chapter showed how the datapath query device and datapath query adapter commands can be used for path monitoring.

Path performance can also be monitored via datapath commands:

datapath query devstats (or pcmpath query devstats)

The datapath query devstats command shows performance information for a single device, all devices, or a range of devices. Example 9-6 shows the output of datapath query devstats for two devices.

Example 9-6 The datapath query devstats command output

C:\Program Files\IBM\Subsystem Device Driver>datapath query devstats

Total Devices : 2

Device #: 0
=============
            Total Read   Total Write   Active Read   Active Write   Maximum
I/O:           1755189       1749581             0              0         3
SECTOR:       14168026     153842715             0              0       256

Transfer Size:  <= 512      <= 4k      <= 16K     <= 64K     > 64K
                   271    2337858         104    1166537         0

Device #: 1
=============
            Total Read   Total Write   Active Read   Active Write   Maximum
I/O:          20353800       9883944             0              1         4
SECTOR:      162956588     451987840             0            128       256

Transfer Size:  <= 512      <= 4k      <= 16K     <= 64K     > 64K
                   296   27128331         215    3108902         0

Also, an adapter level statistics command is available: datapath query adaptstats (also mapped to pcmpath query adaptstats). Refer to Example 9-7 for a two adapter example.

Example 9-7 The datapath query adaptstats output

C:\Program Files\IBM\Subsystem Device Driver>datapath query adaptstats

Adapter #: 0
=============
            Total Read   Total Write   Active Read   Active Write   Maximum
I/O:          11060574       5936795             0              0         2
SECTOR:       88611927     317987806             0              0       256

Adapter #: 1
=============
            Total Read   Total Write   Active Read   Active Write   Maximum
I/O:          11048415       5930291             0              1         2
SECTOR:       88512687     317726325             0            128       256

It is possible to clear these counters so that you can script the usage to cover a precise amount of time. The commands also allow you to choose devices to return as a range, single device, or all devices. The command to clear the counts is datapath clear device count.

9.7.1 Automated path monitoring

There are many situations in which a host can lose one or more paths to storage. If the problem is isolated to that one host, it might go unnoticed until a SAN issue, such as a switch failure or even a routine code upgrade, takes the remaining paths offline. The result is a loss-of-access event that can seriously affect your business. To prevent this situation, many clients have found it useful to implement automated path monitoring using SDD commands and common system utilities. For instance, a simple command string on a UNIX system can count the number of dead paths:

datapath query device | grep dead | lc

This command can be combined with a scheduler, such as cron, and a notification system, such as an e-mail, to notify SAN administrators and system administrators if the number of paths to the system changes.
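A minimal sketch of such a check follows. It substitutes wc -l for the lc command shown above, and the script name, schedule, and e-mail address are assumptions that you must adapt to your environment:

#!/bin/ksh
# check_sdd_paths.ksh - alert when SDD reports dead paths
DEAD=$(datapath query device | grep -ci dead)
if [ "$DEAD" -gt 0 ]; then
   echo "$(hostname): $DEAD dead SDD path(s) detected" | \
      mail -s "SDD path alert on $(hostname)" sanadmin@example.com
fi

An entry such as 0 * * * * /usr/local/bin/check_sdd_paths.ksh in the root crontab runs the check hourly.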


9.7.2 Load measurement and stress tools

Generally, load measurement tools are specific to each host operating system. For example, AIX has the iostat tool, and Windows has perfmon.msc /s.
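For example, on AIX the following command produces an extended, per-disk report with timestamps, sampled every 5 seconds for 12 intervals (the flags vary by AIX level, so check your iostat documentation):

iostat -DlT 5 12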

Industry-standard performance benchmarking tools are available by joining the Storage Performance Council. Information about how to join is available here:

http://www.storageperformance.org/home

These tools can both generate stress and measure the resulting stress in a standardized way. We highly recommend them for generating stress in your test environments so that you can compare your results against the published industry measurements.

Another recommended stress tool is Iometer, which is available for Windows and Linux hosts:

http://www.iometer.org

IBM System p maintains wikis on AIX performance tools and has made a set of tools available for its users:

http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/Performance+Monitoring+Tools

http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/nstress

Xdd is a tool for measuring and analyzing disk performance characteristics on single systems or clusters of systems. It was designed by Thomas M. Ruwart of I/O Performance, Inc. to provide consistent and reproducible measurements of the sustained transfer rate of an I/O subsystem. It is a command line-based tool that grew out of the UNIX community and has been ported to run in Windows environments as well.

Xdd is a free software program distributed under a GNU General Public License. Xdd is available for download at:

http://www.ioperformance.com/products.htm

The Xdd distribution comes with all the source code necessary to install Xdd and the companion programs for the timeserver and the gettime utility programs.

DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02, has detailed descriptions of how to use these measurement and test tools:

http://www.redbooks.ibm.com/abstracts/sg246363.html?Open


Chapter 10. Applications

In this chapter, we provide information about laying out storage for the best performance for general applications, IBM AIX Virtual I/O (VIO) servers, and IBM DB2® databases specifically. While most of the specific information is directed to hosts running the IBM AIX operating system, the information is also relevant to other host types.


10.1 Application workloads

In general, there are two types of data workload (data processing):

� Transaction-based
� Throughput-based

These workloads are different by nature and must be planned for in quite different ways. Knowing and understanding how your host servers and applications handle their workload is an important part of being successful with your storage configuration efforts and the resulting performance.

A workload that is characterized by a high number of transactions per second and a high number of I/Os Per Second (IOPS) is called a transaction-based workload.

A workload that is characterized by a large amount of data transferred, normally with large I/O sizes, is called a throughput-based workload.

These two workload types are conflicting in nature and consequently will require different configuration settings across all components comprising the storage infrastructure. Generally, I/O (and therefore application) performance will be best when the I/O activity is evenly spread across the entire I/O subsystem.

But first, let us describe each type of workload in greater detail and explain what you can expect to encounter in each case.

10.1.1 Transaction-based workloads

High performance transaction-based environments cannot be created with a low-cost model of a storage server. Indeed, transaction process rates are heavily dependent on the number of back-end physical drives that are available for the storage subsystem controllers to use for parallel processing of host I/Os, which frequently results in having to decide how many physical drives you need.

Generally, transaction intense applications also use a small random data block pattern to transfer data. With this type of data pattern, having more back-end drives enables more host I/Os to be processed simultaneously, because read cache is far less effective than write cache, and the misses need to be retrieved from the physical disks.

In many cases, slow transaction performance problems can be traced directly to “hot” files that cause a bottleneck on a critical component (such as a single physical disk). This situation can occur even when the overall storage subsystem sees a fairly light workload. When bottlenecks occur, they can present an extremely difficult and frustrating task to resolve. Because workload content can continually change throughout the course of the day, these bottlenecks can be extremely mysterious in nature and appear and disappear or move over time from one location to another location.

Generally, I/O (and therefore application) performance will be best when the I/O activity is evenly spread across the entire I/O subsystem.

10.1.2 Throughput-based workloads

Throughput-based workloads are seen with applications or processes that require massive amounts of data sent and generally use large sequential blocks to reduce disk latency.


Generally, a smaller number of physical drives are needed to reach adequate I/O performance than with transaction-based workloads. For instance, 20 - 28 physical drives are normally enough to reach maximum I/O throughput rates with the IBM System Storage DS4000 series of storage subsystems. In a throughput-based environment, read operations make use of the storage subsystem cache to stage greater chunks of data at a time to improve the overall performance. Throughput rates are heavily dependent on the storage subsystem’s internal bandwidth. Newer storage subsystems with broader bandwidths are able to reach higher numbers and bring higher rates to bear.

10.1.3 Storage subsystem considerations

It is of great importance that the selected storage subsystem model is able to support the required I/O workload. Besides availability concerns, adequate performance must be ensured to meet the requirements of the applications, which include evaluation of the disk drive modules (DDMs) used and if the internal architecture of the storage subsystem is sufficient. With today’s mechanically based DDMs, it is important that the DDM characteristics match the needs. In general, a high rotation speed of the DDM platters is needed for transaction-based throughputs, where the DDM head continuously moves across the platters to read and write random I/Os. For throughput-based workloads, a lower rotation speed might be sufficient, because of the sequential I/O nature. As for the subsystem architecture, newer generations of storage subsystems have larger internal caches, higher bandwidth busses, and more powerful storage controllers.

10.1.4 Host considerations

When discussing performance, we need to consider far more than just the performance of the I/O workload itself. Many settings within the host frequently affect the overall performance of the system and its applications. All areas must be checked to ensure that we are not focusing on a result rather than the cause. However, in this book we are focusing on the I/O subsystem part of the performance puzzle; so we will discuss items that affect its operation.

Several of the settings and parameters that we discussed in Chapter 9, “Hosts” on page 175 must match both for the host operating system (OS) and for the host bus adapters (HBAs) being used as well. Many operating systems have built-in definitions that can be changed to enable the HBAs to be set to the new values.

10.2 Application considerations

When gathering data for planning from the application side, it is important to first consider the workload type for the application.

If multiple applications or workload types will share the system, you need to know the type of workloads of each application, and if the applications have both types or are mixed (transaction-based and throughput-based), which workload will be the most critical. Many environments have a mix of transaction-based and throughput-based workloads; generally, the transaction performance is considered the most critical.

However, in some environments, for example, a Tivoli Storage Manager backup environment, the streaming high throughput workload of the backup itself is the critical part of the operation. The backup database, although a transaction-centered workload, is a less critical workload.


10.2.1 Transaction environments

Applications that use high transaction workloads are known as Online Transaction Processing (OLTP) systems. Examples of these systems are database servers and mail servers.

If you have a database, you tune the server type parameters, as well as the database’s logical drives, to meet the needs of the database application. If the host server has a secondary role of performing nightly backups for the business, you need another set of logical drives, which are tuned for high throughput for the best backup performance you can get within the limitations of the mixed storage subsystem’s parameters.

So, what are the traits of a transaction-based application? In the following sections, we explain these traits in more detail.

As mentioned earlier, you can expect to see a high number of transactions and a fairly small I/O size. Different databases use different I/O sizes for their logs (refer to the following examples), and these logs vary from vendor to vendor. In all cases, the logs are generally high write-oriented workloads. For table spaces, most databases use between a 4 KB and a 16 KB I/O size. In certain applications, larger chunks (for example, 64 KB) will be moved to host application cache memory for processing. Understanding how your application is going to handle its I/O is critical to laying out the data properly on the storage server.

In many cases, the table space is generally a large file made up of small blocks of data records. The records are normally accessed using small I/Os of a random nature, which can result in about a 50% cache miss ratio. For this reason and to not waste space with unused data, plan for the SAN Volume Controller (SVC) to read and write data into cache in small chunks (use striped VDisks with smaller extent sizes).

Another point to consider is whether the typical I/O is read or write. In most OLTP environments, there is generally a mix of about 70% reads and 30% writes. However, the transaction logs of a database application have a much higher write ratio and, therefore, perform better in a different managed disk (MDisk) group (MDG). Also, you need to place the logs on a separate virtual disk (VDisk), which for best performance must be located on a different MDG that is defined to better support the heavy write need. Mail servers also frequently have a higher write ratio than read ratio.

10.2.2 Throughput environments

With throughput workloads, you have fewer transactions, but much larger I/Os. I/O sizes of 128 K or greater are normal, and these I/Os are generally of a sequential nature. Applications that typify this type of workload are imaging, video servers, seismic processing, high performance computing (HPC), and backup servers.

With large size I/O, it is better to use large cache blocks to be able to write larger chunks into cache with each operation. Generally, you want the sequential I/Os to take as few back-end I/Os as possible and to get maximum throughput from them. So, carefully decide how the logical drive will be defined and how the VDisks are dispersed on the back-end storage MDisks.

Many environments have a mix of transaction-oriented workloads and throughput-oriented workloads. Unless you have measured your workloads, assume that the host workload is mixed and use SVC striped VDisks over several MDisks in an MDG in order to have the best performance and eliminate trouble spots or “hot spots.”

Best practice: Database table spaces, journals, and logs must never be collocated on the same MDisk or MDG in order to avoid placing them on the same back-end storage logical unit number (LUN) or Redundant Array of Independent Disks (RAID) array.

10.3 Data layout overview

In this section, we document data layout from an AIX point of view. Our objective is to help ensure that AIX and storage administrators, specifically those responsible for allocating storage, know enough to lay out the storage data, consider the virtualization layers, and avoid the performance problems and hot spots that come with poor data layout. The goal is to balance I/Os evenly across the physical disks in the back-end storage subsystems.

We will specifically show you how to lay out storage for DB2 applications as a good example of how an application might balance its I/Os within the application.

There are also various implications for the host data layout based on whether you utilize SVC image mode or SVC striped mode VDisks.

10.3.1 Layers of volume abstraction

Back-end storage is laid out into RAID arrays by RAID type, the number of disks in the array, and the LUN allocation to the SVC or host. The RAID array is a certain number of disk drive modules (DDMs) (usually containing from two to 32 disks and most often, around 10 disks) in a RAID configuration (RAID 0, RAID 1, RAID 5, or RAID 10, typically); although, certain vendors call their entire disk subsystem an “array.”

Use of an SVC adds another layer of virtualization to understand, because there are VDisks, which are LUNs served from the SVC to a host, and MDisks, which are LUNs served from back-end storage to the SVC.

The SVC VDisks are presented to the host as LUNs. These LUNs are then mapped as physical volumes on the host, which might build logical volumes out of the physical volumes.

Figure 10-1 on page 212 shows the layers of storage virtualization.


Figure 10-1 Layers of storage virtualization

10.3.2 Storage administrator and AIX LVM administrator roles

Storage administrators control the configuration of the back-end storage subsystems and their RAID arrays (RAID type and number of disks in the array, although there are restrictions on the number of disks in the array and other restrictions depending upon the disk subsystem). They normally also decide the layout of the back-end storage LUNs (MDisks), SVC MDGs, SVC VDisks, and which VDisks are assigned to which hosts.

The AIX administrators control the AIX Logical Volume Manager (LVM) and in which volume group (VG) the SVC VDisks (LUNs) are placed. They also create logical volumes (LVs) and file systems within the VGs. These administrators have no control over where multiple files or directories reside within an LV unless there is only one file or directory in the LV.

There is also an application administrator for those applications, such as DB2, which balance their I/Os by striping directly across the LVs.

Together, the storage administrator, LVM administrator, and application administrator control on which physical disks the LVs reside.

10.3.3 General data layout recommendations

Our primary recommendation for laying out data on SVC back-end storage for general applications is to use striped VDisks across MDGs consisting of similar-type MDisks with as few MDisks as possible per RAID array. This general purpose rule is applicable to most SVC back-end storage configurations and removes a significant data layout burden for the storage administrators.

Consider where the “failure boundaries” are in the back-end storage and take them into consideration when locating application data. A failure boundary is defined as what will be affected if we lose a RAID array (an SVC MDisk): all the VDisks and servers striped on that MDisk are affected, together with all other VDisks in that MDG. Consider also that spreading out the I/Os evenly across the back-end storage has both a performance benefit and a management benefit. We recommend that an entire set of back-end storage is managed together, considering the failure boundary. If a company has several lines of business (LOBs), it might decide to manage the storage along each LOB so that each LOB has a unique set of back-end storage. So, for each set of back-end storage (a group of MDGs or, perhaps better, just one MDG), we create only striped VDisks across all the back-end storage arrays, which is beneficial, because the failure boundary is limited to one LOB, and performance and storage management are handled as a unit for each LOB independently.

What we do not recommend is to create striped VDisks that are striped across different sets of back-end storage, because using different sets of back-end storage makes the failure boundaries difficult to determine, unbalances the I/O, and might limit the performance of those striped VDisks to the slowest back-end device.

For SVC configurations where SVC image mode VDisks must be used, we recommend that the back-end storage configuration for the database consists of one LUN (and therefore one image mode VDisk) per array, or an equal number of LUNs per array, so that the Database Administrator (DBA) can guarantee that the I/O workload is distributed evenly across the underlying physical disks of the arrays. Refer to Figure 10-2 on page 214.

Use striped mode VDisks for applications that do not already stripe their data across physical disks. Striped VDisks are the all-purpose VDisks for most applications. Use striped mode VDisks if you need to manage a diversity of growing applications and balance the I/O performance based on probability.

If you understand your application storage requirements, you might take an approach that explicitly balances the I/O rather than a probabilistic approach to balancing the I/O. However, explicitly balancing the I/O requires either testing or a good understanding of the application and the storage mapping and striping to know which approach works better.

Examples of applications that stripe their data across the underlying disks are DB2, GPFS™, and Oracle ASM. These types of applications might require additional data layout considerations as described in 10.4, “When the application does its own balancing of I/Os” on page 216.


Figure 10-2 General data layout recommendations for AIX storage

SVC striped mode VDisks

We recommend striped mode VDisks for applications that do not already stripe their data across disks.

Creating VDisks that are striped across all RAID arrays in an MDG ensures that the AIX LVM setup does not matter. This approach works extremely well for most general applications and eliminates data layout considerations for the physical disks.

Use striped VDisks with the following considerations:

� Use extent sizes of 64 MB to maximize sequential throughput when it is important. Refer to Table 10-1 for a table of extent size compared to capacity.

� Use striped VDisks when the number of VDisks does not matter.

� Use striped VDisks when the number of VGs does not affect performance.

� Use striped VDisks when sequential I/O rates are greater than the sequential rate for a single RAID array on the back-end storage. Extremely high sequential I/O rates might require a different layout strategy.

� Use striped VDisks when you prefer the use of extremely large LUNs on the host.

Refer to 10.6, “VDisk size” on page 220 for details about how to utilize large VDisks.

Table 10-1   Extent size versus maximum storage capacity

Extent size    Maximum storage capacity of SVC cluster
16 MB          64 TB
32 MB          128 TB
64 MB          256 TB
128 MB         512 TB
256 MB         1 PB
512 MB         2 PB
1 GB           4 PB
2 GB           8 PB

General data layout recommendation for AIX:

� Evenly balance I/Os across all physical disks (one method is by striping the VDisks).

� To maximize sequential throughput, use a maximum range of physical disks (AIX command mklv -e x) for each LV.

� MDisk and VDisk sizes:

  – Create one MDisk per RAID array.

  – Create VDisks based on the space needed, which overcomes disk subsystems that do not allow dynamic LUN detection.

� When you need more space on the server, dynamically extend the VDisk on the SVC and then use the AIX command chvg -g to see the increased size in the system.
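As a concrete illustration of the striped VDisk recommendation above, the following sketch creates a 500 GB VDisk striped across all MDisks in an MDG; the MDG name, I/O group, and VDisk name are hypothetical:

svctask mkvdisk -mdiskgrp MDG1_DS4K -iogrp io_grp0 -vtype striped -size 500 -unit gb -name app_vdisk01
svcinfo lsvdisk app_vdisk01          # verify the new VDisk and its MDisk group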


10.3.4 Database strip size considerations (throughput workload)

It is also worthwhile thinking about the relative strip sizes (a strip is the amount of data written to one volume or “container” before going to the next volume or container). Database strip sizes are typically small. Let us assume they are 32 KB. The SVC strip size (called extent) is user selectable and in the range of 16 MB to 2 GB. The back-end RAID arrays have strip sizes in the neighborhood of 64 - 512 KB. Then, there is the number of threads performing I/O operations (assume they are sequential, because if they are random, it does not matter). The number of sequential I/O threads is extremely important and is often overlooked, but it is a key part of the design to get performance from applications that perform their own striping. Comparing striping schemes for a single sequential I/O thread might be appropriate for certain applications, such as backups, extract, transform, and load (ETL) applications, and several scientific/engineering applications, but typically, it is not appropriate for DB2 or Tivoli Storage Manager.

If we have one thread per volume or “container” performing sequential I/O, using SVC image mode VDisks ensures that the I/O is done sequentially with full strip writes (assuming RAID 5). With SVC striped VDisks, we might run into situations where two threads are doing I/O to the same back-end RAID array or run into convoy effects that temporarily reduce performance (convoy effects result in longer periods of lower throughput).

Tivoli Storage Manager uses a similar scheme as DB2 to spread out its I/O, but it also depends on ensuring that the number of client backup sessions is equal to the number of Tivoli Storage Manager storage volumes or containers. Tivoli Storage Manager performance issues can be improved by using LVM to spread out the I/Os (called PP striping), because it is difficult to control the number of client backup sessions. For this situation, a good approach is to use SVC striped VDisks rather than SVC image mode VDisks. The perfect situation for Tivoli Storage Manager is n client backup sessions going to n containers (each container on a separate RAID array).

To summarize, if you are well aware of the application’s I/O characteristics and the storage mapping (from the application all the way to the physical disks), you might want to consider explicit balancing of the I/Os using SVC image mode VDisks to maximize the application’s striping performance. Normally, using SVC striped VDisks makes sense, balances the I/O well for most situations, and is significantly easier to manage.

10.3.5 LVM volume groups and logical volumes

Without an SVC managing the back-end storage, the administrator must ensure that the host operating system aligns its device data partitions or slices with those of the logical drive. Misalignment can result in numerous boundary crossings that are responsible for unnecessary multiple drive I/Os. Certain operating systems do this automatically, and you just need to know the alignment boundary that they use. Other operating systems, however, might require manual intervention to set their start point to a value that aligns them.

With an SVC managing the storage for the host as striped VDisks, aligning the partitions is easier, because the extents of the VDisk are spread across the MDisks in the MDG. The storage administrator must ensure an adequate distribution.

Understanding how your host-based volume manager (if used) defines and makes use of the logical drives when they are presented is also an important part of the data layout. Volume managers are generally set up to place logical drives into usage groups for their use. The volume manager then creates volumes by carving up the logical drives into partitions (sometimes referred to as slices) and then building a volume from them by either striping or concatenating them to form the desired volume size.


How the partitions are selected for use and laid out can vary from system to system. In all cases, you need to ensure that spreading the partitions is done in a manner to achieve maximum I/Os available to the logical drives in the group. Generally, large volumes are built across a number of different logical drives to bring more resources to bear. You must be careful when selecting logical drives when you do this in order to not use logical drives that will compete for resources and degrade performance.

10.4 When the application does its own balancing of I/Os

In this section, we discuss how to lay out data when the SVC is implemented with applications that can balance their I/Os themselves.

10.4.1 DB2 I/O characteristics and data structures

DB2 tables are put into DB2 tablespaces. DB2 tablespaces are made up of containers that are identified storage locations, such as a raw device (logical volume) or a file system. DB2 spreads data and I/Os evenly across all containers in a tablespace by placing one DB2 extent of data in each container in a round-robin fashion. Each container will have the same I/O activity. Thus, you do not use LVM to spread out I/Os across physical disks. Rather, you create a tablespace with one container on each array, which causes DB2 to explicitly balance I/Os, because data is being accessed equally off each array.

As we will see, a single DB2 container resides on a single logical volume; thus, each container of a tablespace (the logical volume, or a file or directory on it) must reside on a single LUN on an array. This storage design achieves the goal of balanced I/Os spread evenly across physical disks. There are also db2logs that do not share the round-robin extent design. The db2logs reside on one LV, which is generally spread across all disks evenly.

Note that this storage design differs from the recommended storage design for other applications in general. For example, assuming that we are using a disk subsystem directly, the general best practice for highest performance is to create RAID arrays of the same type and size (or nearly the same size), then to take one LUN from each array and from the LVM create a VG, and then create LVs that are spread across every LUN in the VG. In other words, this technique is a spread everything (all LVs) across everything (all physical disks) approach (which is quite similar to what the SVC can do). It is better to not use this approach for DB2, because this approach uses probability to balance I/Os across physical disks, while DB2 explicitly assures that I/Os are balanced.

DB2 also evenly balances I/Os across DB2 database partitions, which can exist on different AIX logical partitions (LPARs). The same I/O principles are applied to each partition separately.

DB2 also has multiple options for containers, including:

� Storage Managed Space (SMS) file system directories
� Database Managed Space (DMS) file system files
� DMS raw
� Automatic Storage for DB2 8.2.2

DMS and SMS are DB2 acronyms for Database Managed Space and Storage Managed Space. Think of DMS containers as pre-allocated storage and SMS containers as dynamic storage.


Note that if we use SMS file system directories, it is important to have one file system (and underlying LV) per container. That is, do not have two SMS file system directory containers in the same file system. Also, for DMS file system files, it is important to have just one file per file system (and underlying LV) per container. In other words, we have only one container per LV. The reason for these restrictions is that we do not have control of where each container resides in the LV; thus, we cannot assure that the LVs are balanced across physical disks.

The simplest way to think of DB2 data layout is to assume that we are using many disks and that we create one container per disk. In general, each container has the same sustained IOPS bandwidth and resides on a set of physically independent physical disks, because each container will be accessed equally by DB2 agents.

DB2 also has multiple types of tablespaces and storage uses. For example, tablespaces can be created separately for table data, indexes, and DB2 temporary work areas. The principles of storage design for even I/O balancing among tablespace containers applies to each of these tablespace types. Furthermore, containers for different tablespace types can be shared on the same array, thus, allowing all database objects to have equal opportunity at using all I/O performance of the underlying storage subsystem and disks. Also note that different options can be used for each container type, for example, DMS file containers might be used for data tablespaces, and SMS file system directories might be used for DB2 temporary tablespace containers.

DB2 connects physical storage to DB2 tables and database structures through the use of DB2 tablespaces. Collaboration between a DB2 DBA and the AIX Administrator (or storage administrator) to create the DB2 tablespace definitions can ensure that the guidance provided for the database storage design is implemented for optimal I/O performance of the storage subsystem by the DB2 database.

Use of Automatic Storage bypasses LVM entirely, and here, DB2 uses disks for containers. So in this case, each disk must have similar IOPS characteristics. We will not describe this option here.

10.4.2 DB2 data layout example

Assume that we have one database partition, a regular tablespace for data, and a temporary tablespace for DB2 temporary work. Further assume that we are using DMS file containers for the regular tablespace and SMS file directories for the DB2 temporary tablespace. This situation provides us with two options for LUN and LVM configuration:

� Create one LUN per array for SMS containers and one LUN per array for DMS containers.

� Create one LUN per array. Then, on each LUN, create one LV (and associated file system) for SMS containers and one LV (and associated file system) for DMS containers.

In either case, the number of VGs is irrelevant from a data layout point of view, but one VG is usually easier to administer and has an advantage for the db2log LV. For the file system logs, JFS2 in-line logs balance the I/Os across the physical disks as well. The second approach is more flexible for growth, at least on disk subsystems that do not allow dynamic LUN expansion, because as the database grows, we can increase the LVs as needed. There also does not need to be any initial planning for the size difference between DB2 tables and DB2 temporary space, which is why DB2 practitioners will frequently recommend creating only one LUN on an array. This storage design provides simplicity while maintaining the highest levels of I/O performance.
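As a hedged illustration of the second option, assume three file systems, each on an LV that maps to a LUN from a different array, mounted at /db2/array1, /db2/array2, and /db2/array3. A DMS tablespace with one file container per array can then be created from the DB2 command line (the names and sizes are examples only):

db2 "CREATE TABLESPACE TS_DATA
       MANAGED BY DATABASE
       USING (FILE '/db2/array1/ts_data_c1' 10 G,
              FILE '/db2/array2/ts_data_c2' 10 G,
              FILE '/db2/array3/ts_data_c3' 10 G)
       EXTENTSIZE 32"

DB2 then places extents round-robin across the three containers, so each array receives the same I/O activity.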

For the db2log LV, we have similar options and we can create one LUN per array and then create the LV across all the LUNs.


A second approach to growth is to add another array, the LUNs, and the LVs and allow DB2 to rebalance the data across the containers. This approach also increases the IOPS number available to DB2.

A third approach to growth is to add one or two disks to each RAID array (for disk subsystems that support dynamic RAID array expansion). This approach increases IOPS bandwidth.

For DB2 data warehouses, or extremely high bandwidth DB2 databases on the SVC, utilizing sequential mode VDisks and DB2 managed striping might be preferred.

But for other general applications, we generally recommend using striped VDisks to balance the I/Os. This recommendation also has the advantage of eliminating LVM data layout as an issue. We also recommend using SDDPCM instead of IBM Subsystem Device Driver (SDD). Growth can be handled for general applications by dynamically increasing the size of the VDisk and then using chvg -g for LVM to see the increased size. For DB2, growth can be handled by adding another container (a sequential or image mode VDisk) and allowing DB2 to restripe the data across the VDisks.
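A minimal sketch of that growth path for a general application follows (the VDisk and volume group names are hypothetical):

# On the SVC: grow the VDisk by 50 GB
svctask expandvdisksize -size 50 -unit gb app_vdisk01

# On the AIX host: pick up the new LUN size in LVM without an outage
chvg -g appvg
lsvg appvg          # confirm the additional free physical partitions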

10.4.3 SVC striped VDisk recommendation

While we have recommended that applications that can handle their own striping are set up not to use the striping provided by SVC, it usually does little harm to do both kinds of striping.

One danger of multiple striping upon striping is the “beat” effect, similar to the harmonics of music. One striping method reverses (undoes) the benefits of the other striping method. However, the beat effect is easy to avoid by ensuring a wide difference in stripe granularities (sizes of the strips, extents, and so on).

You can design a careful test of an application configuration to ensure that application striping is optimal when using SVC image mode disks, therefore, supplying maximum performance. However, in a production environment, the usual scenario is a mix of different databases, built at different times for different purposes, that is housed in a large and growing number of tablespaces. Under these conditions, it is extremely difficult to ensure that application striping continues to work well in terms of distributing the total load across the whole set of physical disks.

Therefore, we recommend SVC striping even when the application does its own striping, unless you have carefully planned and tested the application and the entire environment. This approach adds a great deal more robustness to the situation. It now becomes easy to accommodate completely new databases and tablespaces with no special planning and without disrupting the balance of work. Also, the extra level of striping ensures that the load will be balanced even if the application striping fails. Perhaps most important, this recommendation lifts a significant burden from the database administrator, because good performance can be achieved with much less care and planning.

10.5 Data layout with the AIX virtual I/O (VIO) server

The purpose of this section is to describe strategies to get the best I/O performance by evenly balancing I/Os across physical disks when using the VIO Server.


10.5.1 Overview

In setting up storage at a VIO server (VIOS), a broad range of possibilities exists for creating VDisks and serving them up to VIO clients (VIOCs). The obvious consideration is to create sufficient storage for each VIOC. Less obvious, but equally important, is getting the best use of the storage. Performance and availability are of paramount importance. There are typically internal Small Computer System Interface (SCSI) disks (usually used for the VIOS operating system) and SAN disks. Availability for disk is usually handled by RAID on the SAN or by SCSI RAID adapters on the VIOS. We will assume here that any internal SCSI disks are used for the VIOS operating system and possibly for the VIOCs’ operating systems. Furthermore, we will assume that the applications are configured so that only limited I/O occurs to the internal SCSI disks on the VIOS and to the VIOCs’ rootvgs. If you expect that your rootvg will have a significant IOPS rate, configure it in the same fashion as we recommend for other application VGs later.

VIOS restrictions

There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI hdisks and logical volume (LV) VSCSI hdisks.

PV VSCSI hdisks are entire LUNs from the VIOS point of view, and if you are concerned about failure of a VIOS and have configured redundant VIOS for that reason, you must use PV VSCSI hdisks. So, PV VSCSI hdisks are entire LUNs that are VDisks from the VIOC point of view.

An LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM VGs on the VIOS and cannot span PVs in that VG, nor be striped LVs.

VIOS queue depth

From a performance point of view, the queue_depth of VSCSI hdisks is limited to 3 at the VIOC, which limits the IOPS bandwidth to approximately 300 IOPS (assuming an average I/O service time of 10 ms). Thus, you need to configure a sufficient number of VSCSI hdisks to get the IOPS bandwidth needed. The queue depth limit changed to 256 in Version 1.3 of the VIOS (August 2006), although you still need to consider the IOPS bandwidth of the back-end disks. When possible, set the queue depth of each VIOC hdisk to match that of the VIOS hdisk to which it maps.
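For example (a sketch assuming the backing device on the VIOS is hdisk10 and the corresponding VSCSI disk on the client is hdisk2):

# On the VIOS (root shell via oem_setup_env): check the queue depth of the backing hdisk
lsattr -El hdisk10 -a queue_depth

# On the VIO client: set a matching queue depth; -P defers the change until
# the device is reconfigured or the partition is rebooted
chdev -l hdisk2 -a queue_depth=20 -P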

10.5.2 Data layout strategies

You can use the SVC or AIX LVM (with appropriate configuration of vscsi disks at the VIOS) to balance the I/Os across the back-end physical disks. When using an SVC, here is how to balance the I/Os evenly across all arrays on the back-end storage subsystems:

- You create just a few LUNs per array on the back-end disk in each MDG (the normal practice is to have RAID arrays of the same type and size, or nearly the same size, and the same performance characteristics in an MDG).

- You create striped VDisks on the SVC that are striped across all back-end LUNs (see the CLI sketch that follows this list).

- When you do this, the LVM setup does not matter, and you can use PV vscsi hdisks and redundant VIOSs or LV vscsi hdisks (if you are not worried about VIOS failure).
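As a hedged illustration of the second step, the following SVC CLI command creates one striped VDisk; the MDisk Group name, I/O Group, size, and VDisk name are hypothetical, and with -vtype striped the VDisk extents are spread across all of the MDisks in the group by default:

IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp MDG1_DS4500 -iogrp 0 -vtype striped -size 100 -unit gb -name vios_appvg_vd01

You can then map the VDisk to the VIOS host object with svctask mkvdiskhostmap in the usual way.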


10.6 VDisk size

Larger VDisks might need more disk buffers and larger queue_depths, depending on the I/O rates; however, they bring a large benefit in reduced AIX memory use and fewer path management resources, so it is worthwhile to tune the queue_depths and adapter resources for this purpose. It is generally preferable to use fewer, larger LUNs: although increasing the queue_depth and disk buffers does require application downtime, handling a large number of AIX LUNs consumes a considerable amount of operating system resources.

10.7 Failure boundaries

As mentioned in 10.3.3, “General data layout recommendations” on page 212, it is important to consider failure boundaries in the back-end storage configuration. If all of the LUNs are spread across all physical disks (either by LVM or SVC VDisk striping), and you experience a single RAID array failure, you might lose all your data. So, there are situations in which you probably want to limit the spread for certain applications or groups of applications. You might have a group of applications where if one application fails, none of the applications can perform any productive work.

When implementing the SVC, limiting the spread can be accounted for through the MDG layout. Refer to Chapter 5, “MDisks” on page 83 for more information about failure boundaries in the back-end storage configuration.


Chapter 11. Monitoring

The SAN Volume Controller (SVC) provides a range of data about how it performs and also about the performance of other components of the SAN. When you properly monitor performance, having the SVC in the SAN makes it easier to recognize and fix faults and performance problems.

In this chapter, we first describe how to collect SAN topology and performance information using TotalStorage Productivity Center (TPC). We then show several examples of misconfiguration and failures, and how they can be identified in the TPC Topology Viewer and performance reports. Finally, we describe how to monitor the SVC error log effectively by using the e-mail notification function.

The examples in this chapter were taken from TotalStorage Productivity Center (TPC) V3.3.2.79, which was released in June 2008 to support SVC 4.3. You must always use the latest version of TPC that is supported by your SVC code; TPC is often updated to support new SVC features. If you have an earlier version of TPC installed, you might still be able to reproduce the reports described here, but certain data might not be available.


11.1 Configuring TPC to analyze the SVC

TPC manages all storage controllers using their Common Information Model (CIM) object manager (CIMOM) interface. CIMOM interfaces enable a Storage Management Initiative Specification (SMI-S) management application, such as TPC, to communicate to devices using a standards-based protocol. The CIMOM interface will translate an SMI-S command into a proprietary command that the device understands and then convert the proprietary response back into the SMI-S-based response.

The SVC’s CIMOM interface is supplied with the SVC Master Console and is automatically installed as part of the SVC Master Console installation. The Master Console can manage multiple SVC clusters, and TPC is aware of all of the clusters that it manages. TPC does not directly connect to the Config Node of the SVC cluster to manage the SVC cluster.

If you see that TPC is having difficulty communicating with or monitoring the SVC, check the health and status of the SVC Master Console.

To configure TPC to manage the SVC:

1. Start the TPC GUI application. Navigate to Administrative Services → Data Sources → CIMOM Agents → Add CIMOM. Enter the information in the Add CIMOM panel that appears. Refer to Figure 11-1 for an example.

Figure 11-1 Configuring TPC to manage the SVC

2. When you click Save, TPC will validate the information that you have provided by testing the connection to the CIMOM. If there is an error, an alert will pop up, and you must correct the error before you can save the configuration again.

Note: For TPC to manage the SVC, you must have TCP/IP connectivity between the TPC Server and the SVC Master Console. TPC will not communicate with the SVC nodes, so it is acceptable that the SVC nodes are not on the same network to which TPC has access.


3. After the connection has been successfully configured, TPC must run a CIMOM Discovery (under Administrative Services → Discovery → CIMOM) before you can set up performance monitoring or before the SVC cluster will appear in the Topology Viewer.

11.2 Using TPC to verify the fabric topology

After TPC has probed the SAN environment, it takes the information from all the SAN components (switches, storage controllers, and hosts) and automatically builds a graphical display of the SAN environment. This graphical display is available through the Topology Viewer option on the TPC navigation tree.

The information on the Topology Viewer panel is current as of the last successful probe. By default, TPC will probe the environment daily; however, you can execute an unplanned or immediate probe at any time.

Normally, the probe takes less than five minutes to complete. If you are analyzing the environment for problem determination, we recommend that you execute an unplanned probe to ensure that you have the latest up-to-date information on the SAN environment. Make sure that the probe completes successfully.

11.2.1 SVC node port connectivity

It is important that each SVC node port is connected to a switch in your SAN fabric. If any SVC node port is not connected, each node in the cluster will display an error on the LCD display (probably error 1060). TPC will also show the health of the cluster as a warning in the Topology Viewer.
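As a cross-check from the SVC CLI, you can list the nodes and then display the detailed view of each node, which includes the status of its Fibre Channel ports. This is a hedged sketch; the exact output fields can vary slightly between SVC code levels, and node1 is a hypothetical node name:

IBM_2145:itsosvccl1:admin>svcinfo lsnode
IBM_2145:itsosvccl1:admin>svcinfo lsnode node1

In the detailed view, look for the port status entries (one per node port) and confirm that every port is active.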

It is equally important to ensure that:

- You have at least one port from each node in each fabric.

- You have an equal number of ports in each fabric from each node; that is, do not have three ports in fabric one and only one port in fabric two for an SVC node.

Figure 11-2 on page 224 shows using TPC (under IBM TotalStorage Productivity Center → Topology → Storage) to verify that we have an even number of ports in each fabric. The example configuration shows that:

- Our SVC is connected to two fabrics (we have named our fabrics FABRIC-2GBS and FABRIC-4GBS).

- We have four SVC nodes in this cluster. TPC has organized our switch ports so that each column represents a node, which you can see because the worldwide port names (WWPNs) in each column have similar numbers.

- We have an even number of ports in each switch. Figure 11-2 on page 224 shows the links to each switch at the same time. It might be easier to validate this setup by clicking one switch at a time (refer to Figure 11-5 on page 227).

Note: The SVC Config Node (that owns the IP address for the cluster) has a 10 session Secure Shell (SSH) limit. TPC will use one of these sessions while interacting with the SVC. You can read more information about the session limit in 3.2.1, “SSH connection limitations” on page 42.


Figure 11-2 Checking the SVC ports to ensure they are connected to the SAN fabric

TPC can also show us where our host and storage are in our fabric and which switches the I/Os will go through when I/Os are generated from the host to the SVC or from the SVC to the storage controller.

For redundancy, all storage controllers must be connected to at least two fabrics, and those same fabrics must be the fabrics to which the SVC is connected.

Figure 11-3 on page 225 shows our DS4500 is also connected to fabrics FABRIC-2GBS and FABRIC-4GBS as we planned.

Information: When we cabled our SVC, we intended to connect ports 1 and 3 to one switch (IBM_2109_F32) and ports 2 and 4 to the other switch (swd77). We thought that we were really careful about labeling our cables and configuring our ports.

TPC showed us that we did not configure the ports this way, and additionally, we made two mistakes. Figure 11-2 shows that we:

- Correctly configured all four nodes with port 1 to switch IBM_2109_F32

- Correctly configured all four nodes with port 2 to switch swd77

- Incorrectly configured two nodes with port 3 to switch swd77

- Incorrectly configured two nodes with port 4 to switch IBM_2109_F32

Information: Our DS4500 was shared with other users, so we were only able to use two ports of the available four ports. The other two ports were used by a different SAN infrastructure.


Figure 11-3 Checking that your storage is in each fabric

11.2.2 Ensuring that all SVC ports are online

Information in the Topology Viewer can also confirm the health and status of the SVC and the switch ports. In the Topology Viewer, TPC shows each Fibre Channel port with a box next to the WWPN. If this box has a black line in it, the port is connected to another device. Table 11-1 shows an example of the ports with their connected status.

Table 11-1 TPC port connection status

TPC port view                     Status
[Box containing a black line]     This is a port that is connected.
[Empty box]                       This is a port that is not connected.

Figure 11-2 on page 224 shows an example where all of the SVC ports are connected and the switch ports are healthy.

Figure 11-4 on page 226 shows an example where the SVC ports are not healthy. In this example, the two ports that have a black line drawn between the switch and the SVC node port are in fact down.

Because TPC knew where these two ports were connected on a previous probe (and, thus, they were previously shown with a green line), the probe discovered that these ports were no longer connected, which resulted in the green line becoming a black line.



If these ports had never been connected to the switch, no lines would show for them, and we would only see six of the eight ports connected to the switch.

Figure 11-4 Showing SVC ports that are not connected

11.2.3 Verifying SVC port zones

When TPC probes the SAN environment to obtain information on SAN connectivity, it also collects information on the SAN zoning that is currently active. The SAN zoning information is also available on the Topology Viewer through the Zone tab.

By opening the Zone tab and clicking both the switch and the zone configuration for the SVC, we can confirm that all of the SVC node ports are correct in the Zone configuration.

Figure 11-5 on page 227 shows that we have defined an SVC node zone called SVC_CL1_NODE in our FABRIC-2GBS, and we have correctly included all of the SVC node ports.


Figure 11-5 Checking that our zoning is correct

Our SVC will also be used in a Metro Mirror and Global Mirror relationship with another SVC cluster. For this to be a supported configuration, we must make sure that every SVC node in this cluster is zoned so that it can see every node port in the remote cluster.

In each fabric, we made a zone called SVC_MM_NODE with all of the node ports for all of the SVC nodes in both clusters. We can check each SVC to make sure that all of its ports are in fact in this zone. Figure 11-6 on page 228 shows that we have correctly configured all ports for the SVC cluster ITSO_CL1.
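On IBM System Storage/Brocade b-type switches, for example, such an intercluster zone can be built with the standard zoning commands, as in the hedged sketch below; the alias names and the zone configuration name are hypothetical, and each alias is assumed to already contain the node port WWPNs of one cluster:

zonecreate "SVC_MM_NODE", "ITSO_CL1_NODES; ITSO_CL2_NODES"
cfgadd "PROD_CFG", "SVC_MM_NODE"
cfgsave
cfgenable "PROD_CFG"

Repeat the equivalent change in the second fabric so that both fabrics carry the intercluster zone.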

(In the Topology Viewer panels shown in these figures, the gray box indicates the ports that are in the zone; click a switch to see which ports are connected to it.)


Figure 11-6 Verifying Metro Mirror and Global Mirror zones

11.2.4 Verifying paths to storage

TPC 3.3 introduced a new feature called the Data Path View. You can use this view to see the path between two objects, and the Data Path View shows the objects and the switch fabric in one view.

Using the Data Path View, we can see that mdisk1 in SVC ITSOCL1 is available through all the SVC ports and trace that connectivity to its logical unit number (LUN) ST-7S10-5. Figure 11-7 on page 229 shows this view.

What is not shown in Figure 11-7 on page 229 is that you can hover over the MDisk, LUN, and switch ports with your mouse and get both health and performance information about these components. This capability enables you to verify the status of each component to see how well it is performing.

(In the Topology Viewer, Shift-click each zone to see all of its ports.)


Figure 11-7 Verifying the health between two objects in the SVC


11.2.5 Verifying host paths to the SVC

By using the computer display in TPC, you can see all the fabric and storage information for the computer that you select.

Figure 11-8 shows the host KANAGA, which has two host bus adapters (HBAs). This host has also been configured to access part of the SVC storage (the SVC storage is only partially shown in this panel).

Our Topology View confirms that KANAGA is physically connected to both of our fabrics.

By using the Zone tab, we can see that only one zone configuration applies to KANAGA in the FABRIC-2GBS fabric and that no zone configuration is active for KANAGA in the FABRIC-4GBS fabric. Therefore, KANAGA does not have redundant paths, and if switch IBM_2109_F32 went offline, KANAGA would lose access to its SAN storage.

By clicking the zone configuration, we can see which port is included in a zone configuration and thus which switch has the zone configuration. The port that has no zone configuration will not be surrounded by a gray box.

Figure 11-8 Kanaga has two HBAs but is only zoned into one fabric

Using the Fabric Manager component of TPC, we can quickly fix this situation. The fixed results are shown in Figure 11-9 on page 231.


Figure 11-9 Kanaga with the zoning fixed

You can also use the Data Path Viewer in TPC to confirm path connectivity between a disk that an operating system sees and the VDisk that the SVC provides.

Figure 11-10 on page 232 shows two diagrams for the path information relating to host KANAGA:

- The top (left) diagram shows the path information before we fixed our zoning configuration. It confirms that KANAGA only has one path to the SVC VDisk vdisk4. Figure 11-8 on page 230 confirmed that KANAGA has two HBAs and that they are connected to our SAN fabrics. From this panel, we can deduce that our problem is likely to be a zoning configuration problem.

- The lower (right) diagram shows the result after the zoning was fixed.

What Figure 11-10 on page 232 does not show is that you can hover over each component to get health and performance information, which might also be useful when you perform problem determination and analysis.


Figure 11-10 Viewing host paths to the SVC


11.3 Analyzing performance data using TPC

TPC can collect performance information for all of the components that make up your SAN. With the performance information about the switches and storage, it is possible to view the end-to-end performance for a specific host in our SAN environment.

There are three methods of using the performance data that TPC collects:

- Using the TPC GUI to manage fabric and disk performance

By default, the TPC GUI is installed on the TPC server. You can also optionally install the TPC GUI on any supported Windows or UNIX workstation by running the setup on disk1 of the TPC media and choosing a custom installation.

By using the TPC GUI, you can monitor the performance of the:

– Switches by navigating to Fabric Manager → Reporting → Switch Performance

– Storage controllers by navigating to Disk Manager → Reporting → Storage Subsystem Performance

Both options are in the TPC navigation tree on the left side of the GUI.

The reports under these menu options provide the most detailed information about the performance of the devices.

- Using the TPC GUI with the Data Path Viewer

With TPC 3.3, there is a new Data Path Viewer display, which enables you to see the end-to-end performance between:

– A host and its disks (VDisks if they come from the SVC, or LUNs if they come directly from a storage controller)

– The SVC and the storage controllers that provide LUNs

– A storage controller and all the hosts to which it provides storage (including the SVC)

With the Data Path Viewer, all the information and the connectivity between a source (Initiator Entity) and a target (Target Entity) are shown in one display.

By turning on the Topology Viewer Health, Performance, and Alert overlays, you can hover over each component to get a full understanding of how it is performing and its health.

To use the Data Path Viewer, navigate to the Topology Viewer (under IBM TotalStorage Productivity Center → Topology), right-click a computer or storage controller and select Open DataPath View.

- Using the TPC command line interface (CLI) TPCTOOL

The TPCTOOL command line interface enables you to script the extraction of data out of TPC so that you can do more advanced performance analysis, which is particularly useful if you want to include multiple performance metrics about one or more devices on one report.

For example, if you have an application that spans multiple hosts with multiple disks coming from multiple controllers, you can use TPCTOOL to collect all the performance information from each component and group all of it together onto one report (a small sketch follows this list).

Using TPCTOOL assumes you have an advanced understanding of TPC and requires scripting to take full advantage of it. We recommend that you use at least TPC V3.1.2 if you plan on using the CLI interface.
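As a simple, hedged sketch of this approach, assume that you have already exported two comma-separated reports with tpctool getrpt (the file names here are hypothetical) and that the first column of each file is the timestamp; standard UNIX tools can then combine them into a single report:

# Sort both exports on the timestamp column, then join them on it
sort -t, -k1,1 svc_vdisk_rates.csv > svc_vdisk_rates.sorted.csv
sort -t, -k1,1 switch_port_rates.csv > switch_port_rates.sorted.csv
join -t, -1 1 -2 1 svc_vdisk_rates.sorted.csv switch_port_rates.sorted.csv > combined_report.csv

The combined file can then be loaded into a spreadsheet or plotting tool to show, for example, VDisk response time against switch port data rate on one chart.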


11.3.1 Setting up TPC to collect performance information

TPC performance collection is either turned on or turned off. You do not need to specify the performance information that you want to collect. TPC will collect all performance counters that the SVC (or storage controller) provides and insert them into the TPC database. After the counters are there, you can report on the results using any of the three methods described in the previous section.

To enable the performance collection, navigate to Disk Manager → Monitoring and right-click Storage Performance Monitors.

We recommend that you create a separate performance monitor for each CIMOM from which you want to collect performance data. Each CIMOM provides different sampling intervals, and if you combine all of your different storage controllers into one performance collection, the sample interval might not be as granular as you want.

Additionally, by having separate performance monitor collections, you can start and stop individual monitors as required.

11.3.2 Viewing TPC-collected information

TPC collects and reports on many statistics as recorded by the SVC nodes. With these statistics, you can get general cluster performance information or more detailed specific VDisk or MDisk performance information.

An explanation of the metrics and how they are calculated is available in Appendix A of the TotalStorage Productivity Center User Guide located at this Web site:

http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcugd31389.htm

The TPC GUI interface provides you with an easy intuitive method of querying the TPC database to obtain information about many of the counters that it stores. One limitation of the TPC GUI is that you can only report on “like” counters at one time. For example, you cannot display response times and data rates on the same graph.

You also cannot include information from related devices on the same report. For example, you cannot combine port utilization from a switch with the host data rate as seen on the SVC. This information can only be provided in separate reports with the TPC GUI.

If you use the TPC command line interface, you will be able to collect all of the individual metrics on which you want to report and massage that data into one report.

When starting to analyze the performance of the SVC environment to identify a performance problem, we recommend that you identify all of the components between the host and the back-end storage and verify the performance of each of those components individually.

Note: Make sure that your TPC server, SVC Master Console, and SVC cluster are set with the correct times for their time zones.

If your SVC is configured for Coordinated Universal Time (UTC), ensure that it is in fact on UTC time and not local time. TPC will adjust the time on the performance data that it receives before inserting the data in the TPC database.

If the time does not match the time zone, it is difficult to compare performance among objects, for example, the switch performance or the storage controller performance.
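As a hedged sketch, you can confirm and, if necessary, change the cluster time zone from the SVC CLI; the time zone ID is a placeholder that you look up in the output of svcinfo lstimezones:

IBM_2145:itsosvccl1:admin>svcinfo showtimezone
IBM_2145:itsosvccl1:admin>svcinfo lstimezones
IBM_2145:itsosvccl1:admin>svctask settimezone -timezone <id>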


Traffic between a host, the SVC nodes, and a storage controller goes through these steps:

1. The host generates the I/O and transmits it on the fabric.

2. The I/O is received on the SVC node ports.

3. If the I/O is a write I/O:

a. The SVC node writes the I/O to the SVC node cache.

b. The SVC node sends a copy to its partner node to write to the partner node’s cache.

c. If the I/O is part of a Metro Mirror and Global Mirror, a copy needs to go to the target VDisk of the relationship.

d. If the I/O is part of a FlashCopy and the FlashCopy block has not been copied to the target VDisk, this action needs to be scheduled.

4. If the I/O is a read I/O:

a. The SVC needs to check the cache to see if the Read I/O is already there.

b. If the I/O is not in the cache, the SVC needs to read the data from the physical LUNs.

5. At some point, write I/Os will be sent to the storage controller.

6. The SVC might also do some read ahead I/Os to load the cache in case the next read I/O from the host is the next block.

TPC can help you report on most of these steps so that it is easier to identify where a bottleneck might exist.

11.3.3 Cluster, I/O Group, and node reports

The TPC cluster performance information is useful to get an overall idea of how the cluster is performing and to get an understanding of the workload passing through your cluster.

The I/O Group and node reports enable you to drill down into the health of the cluster and obtain a more granular understanding of the performance.

The available reports fit into the following categories.

SVC node resource performance

These reports enable you to understand the workload on the cluster resources, particularly the load on CPU and cache memory. There is also a report that shows the traffic between nodes.

Figure 11-11 on page 236 shows an example of several of the available I/O Group resource performance metrics. In this example, we generated excessive I/O directly to our storage controller (of which the SVC was unaware) together with an excess load on two hosts that each had 11 VDisks from our SVC cluster. The purpose of this exercise was to show how a storage controller under stress is reflected in the TPC results.


Figure 11-11 Multiple I/O Group resource performance metrics

An important metric in this report is the CPU utilization (in dark blue). The CPU utilization reports give you an indication of how busy the cluster CPUs are. A continually high CPU Utilization rate indicates a busy cluster. If the CPU utilization remains constantly high, it might be time to increase the cluster by adding more resources.

You can add cluster resources by adding another I/O Group to the cluster (two nodes) up to the maximum of four I/O Groups per cluster.

If there are already four I/O Groups in a cluster and the reports still indicate high CPU utilization, it is time to build a new cluster and consider either migrating part of the storage to the new cluster or servicing new storage requests from it.

We recommend that you plan additional resources for the cluster if your CPU utilization indicates workload continually above 70%.

The cache memory resource reports provide an understanding of the utilization of the SVC cache. These reports provide you with an indication of whether the cache is able to service and buffer the current workload.

In Figure 11-11, you will notice that there is an increase in the Write-cache Delay Percentage and Write-cache Flush Through Percentage and a drop in the Write-cache Hits Percentage, Read Cache Hits, and Read-ahead percentage of cache hits. This change is noted about halfway through the graph.

This change in these performance metrics together with an increase in back-end response time shows that the storage controller is heavily burdened with I/O, and at this time interval, the SVC cache is probably full of outstanding write I/Os. (We expected this result with our test run.) Host I/O activity will now be impacted with the backlog of data in the SVC cache and with any other SVC workload that is happening on the same MDisks (FlashCopy and Global/Metro Mirror).


If cache utilization is a problem, you can add additional cache to the cluster by adding an I/O Group and moving VDisks to the new I/O Group.

SVC fabric performance

The SVC fabric performance reports help you understand the SVC’s impact on the fabric and give you an indication of the traffic between:

- The SVC and the hosts that receive storage
- The SVC and the back-end storage
- Nodes in the SVC cluster

These reports can help you understand if the fabric might be a performance bottleneck and if upgrading the fabric can lead to performance improvement. Figure 11-12 is one version of a port send and receive data rate report.

Figure 11-12 Port receive and send data rate for each I/O Group

Figure 11-12 and Figure 11-13 on page 238 show two versions of port rate reports. Figure 11-12 shows the overall SVC node port rates for send and receive traffic. With a 2 Gb per second fabric, these rates are well below the throughput capability of this fabric, and thus, the fabric is not a bottleneck here. Figure 11-13 on page 238 shows the port traffic broken down into host, node, and disk traffic. During our busy time as reported in Figure 11-11 on page 236, we can see that host port traffic drops while disk port traffic continues. This information indicates that the SVC is communicating with the storage controller, possibly flushing outstanding I/O write data in the cache and performing other non-host functions, such as FlashCopy and Metro Mirror and Global Mirror copy synchronization.


Figure 11-13 Total port to disk, host, and local node report

Figure 11-14 on page 239 shows an example TPC report looking at port rates between the SVC nodes, hosts, and disk storage controllers. This report shows low queue and response times, indicating that the nodes do not have a problem communicating with each other.

If this report showed unusually high queue times and high response times, our write activity would be affected (because each node communicates with each other node over the fabric).

Unusually high numbers in this report indicate:

- SVC node or port problem (unlikely)
- Fabric switch congestion (more likely)
- Faulty fabric ports or cables (most likely)


Figure 11-14 Port to local node send and receive response and queue times

SVC storage performance

The remaining TPC reports give you a high level understanding of the SVC’s interaction with hosts and back-end storage. Most reports provide both an I/O rate report (measured in IOPS) and a data rate report (measured in MBps).

The particularly interesting areas of these reports include the back-end read and write rates and the back-end read and write response times, which are shown in Figure 11-15.

Figure 11-15 Back-end read and write response times


In Figure 11-15 on page 239, we see an unusual spike in back-end response time for both read and write operations, and this spike is consistent for both of our I/O Groups. This report confirms that we are receiving poor response from our storage controller and explains our lower than expected host performance.

Our cache resource reports (in Figure 11-11 on page 236) also show an unusual pattern in cache usage during the same time interval. Thus, we can attribute the cache behavior to the poor back-end response time that the SVC is receiving from the storage controller. The cause of this poor response time must be investigated using all available information from the SVC and the back-end storage controller. Possible causes, which might be visible in the storage controller management tool, include:

- Physical drive failure can lead to an array rebuild, which drives internal read/write workload in the controller while the rebuild is in progress. If this array rebuild is causing poor latency, it might be desirable to adjust the array rebuild priority to lessen the load. However, this priority must be balanced with the increased risk of a second drive failure during the rebuild, which can cause data loss in a Redundant Array of Independent Disks 5 (RAID 5) array.

- Cache battery failure can lead to cache being disabled by the controller, which can usually be resolved simply by replacing the failed battery.

Summary of the available cluster reports in TPC 3.3

These are the types of data available in reports on the cluster, I/O Groups, and nodes:

- Overall Data Rates and I/O Rates

- Backend I/O Rates and Data Rates

- Response Time and Backend Response Time

- Transfer Size and Backend Transfer Size

- Disk to Cache Transfer Rate

- Queue Time

- Overall Cache Hit Rates and Write Cache Delay

- Readahead and Dirty Write cache

- Write cache overflow, flush-through, and write-through

- Port Data Rates and I/O Rates

- CPU Utilization

- Data Rates, I/O Rates, Response Time, and Queue Time for:

– Port to Host
– Port to Disk
– Port to Local Node
– Port to Remote Node

- Global Mirror Rates

- Peak Read and Write Rates

11.3.4 Managed Disk Group, Managed Disk, and Volume reports

The Managed Disk Group, Managed Disk, and Volume reports enable you to report on the performance of storage both from the back end and from the front end. Note that “Volumes” in TPC correspond to VDisks when monitoring an SVC.


By including a VDisk on a report, together with the LUNs from the storage controllers (which in turn are the MDisks over which the VDisks can be striped), you can see the performance that a host is receiving (through the VDisks) together with the impact on the storage controller (through the LUNs).

Figure 11-16 shows a VDisk named IOTEST and the associated LUNs from our DS4000 storage controller. We can see which of the LUNs are being used while IOTEST is being used.

Figure 11-16 Viewing VDisk and LUN performance

11.3.5 Using TPC to alert on performance constraints

Along with reporting on SVC performance, TPC can also generate alerts when performance has not met or has exceeded a defined threshold.

Like most TPC tasks, the alerting can report to:

- Simple Network Management Protocol (SNMP), which can enable you to send a trap to an upstream systems management application. The SNMP trap can then be used with other events occurring within the environment to help determine the root cause of an SNMP trap generated by the SVC.

For example, if the SVC reported to TPC that a Fibre Channel port went offline, it might result from a switch failure. This “port failed” trap, together with the “switch offline” trap, can be analyzed by a systems management tool, which discovers that this is a switch problem and not an SVC problem, and calls the switch technician.

- TEC Event. You can select to send a Tivoli Enterprise Console® (TEC) event.

- Login Notification. You can select to send the alert to a TotalStorage Productivity Center user. The user receives the alert upon logging in to TotalStorage Productivity Center. In the Login ID field, type the user ID.

- UNIX or Windows NT® Server system event logger.


- Script. The script option enables you to run a defined set of commands that might help address this event; for example, a script might simply open a trouble ticket in your help desk ticketing system.

- Notification by e-mail. TPC will send an e-mail to each person listed.

Useful performance alerts

While you can use performance alerts to monitor any value reported by TPC, certain alerts will be more useful when identifying serious problems. These alerts include:

- Node CPU utilization threshold

The CPU utilization report alerts you when your SVC nodes become too busy. CPU utilization depends on the amount of host I/O, as well as the extent to which advanced copy services are being used. If this statistic increases beyond 70%, you might want to think about increasing the size of your cluster or adding a new cluster.

- Overall port response time threshold

The port response time alert can let you know when the SAN fabric is becoming a bottleneck. If the response times are consistently poor, perform additional analysis of your SAN fabric.

- Overall back-end response time threshold

An increase in back-end response time might indicate that you are overloading your back-end storage. The exact value at which to set the alert depends on what kind of storage controller you are using, the RAID configuration, and the typical I/O workload. A high-end controller, such as a DS8000, might be expected to have a lower typical latency than a DS4500. RAID 1 typically is faster than RAID 5. To evaluate the normal working range of your back-end storage, use TPC to collect data for a period of typical workload.

After you have established the normal working range of your controller, create a performance alert for back-end response time. You might want to set more than one alert level. For example, response time of more than 100 ms nearly always indicates that back-end storage is being overloaded, so 100 ms might be a suitable high importance alert level. You might set a low importance alert for a lower value, such as 20% over the typical response time.

11.3.6 Monitoring MDisk performance for mirrored VDisks

The new VDisk Mirroring function in SVC 4.3 allows you to mirror VDisks between different MDisk groups to improve availability. However, it is important to note that write performance of the VDisk will depend on the worst performing MDisk group. Reads are always performed from the primary VDisk Copy, if it is available. Writes remain in the SVC cache until both MDisks have completed the I/O. Therefore, if one group performs significantly worse, this problem will reduce the write performance of the VDisk as a whole.

You can use TPC to ensure that the performance of the groups is comparable:

- Report on back-end disk performance by selecting Disk Manager → Storage Subsystem Performance → By managed disk

- Include Write Data Rate

- Choose Selection and check only the MDisks that are members of the groups being used for mirrored VDisks.


The graph from this report will show whether one MDisk group performs significantly worse than the other MDisk group. If there is a gap between the two MDisk groups, consider taking steps to avoid adverse performance impact, which might include:

- Migrating other, non-mirrored VDisks from the poorly performing MDisk group to allow more bandwidth for the mirrored VDisk’s I/O

- Migrating one of the mirrored VDisk’s copies to another MDisk group with spare performance capacity (see the CLI sketch that follows this list)

- Accepting the current performance if the slower of the two MDisk groups is still reasonable
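For the second option, a hedged sketch of the SVC 4.3 CLI sequence is to add a copy of the VDisk in the target MDisk Group, wait for the copies to synchronize, and then remove the copy that is in the slower group; the VDisk name, MDisk Group name, and copy ID are hypothetical:

IBM_2145:itsosvccl1:admin>svctask addvdiskcopy -mdiskgrp MDG_FAST vdisk10
IBM_2145:itsosvccl1:admin>svcinfo lsvdisksyncprogress vdisk10
IBM_2145:itsosvccl1:admin>svctask rmvdiskcopy -copy 0 vdisk10

Do not remove the old copy until the synchronization progress reaches 100%, and confirm which copy ID belongs to the slower MDisk Group before running rmvdiskcopy.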

11.4 Monitoring the SVC error log with e-mail notifications

In a SAN environment, it is important to ensure that events, such as hardware failures, are recognized promptly and that corrective action is taken. Redundancy in SAN design allows hosts to continue performing I/O even when these failures occur; however, there are two reasons to fix problems promptly:

- While operating in a degraded state, the performance of key components might be lower. For example, if a storage controller port fails, the remaining ports might not be able to cope with the I/O bandwidth from hosts or SVC.

- The longer a SAN runs with a failed component, the higher the likelihood that a second component will fail, risking a loss of access.

The SVC error log provides information about errors within the SVC, as well as problems with attached devices, such as hosts, switches, and back-end storage. By making good use of this information, having a SVC in a SAN can make it easier to diagnose problems and restore the SAN to a healthy state.

There are three ways to access the SVC error log:

- You can view the error log directly using the SVC Console GUI, which allows searching the error log for particular problems or viewing the whole log to gain an overview of what has happened. However, the administrator must consciously decide to check the error log.

- Simple Network Management Protocol (SNMP) allows continuous monitoring of events as they occur. When an error is logged, the SVC sends an SNMP trap through Ethernet to a monitoring service running on a server. Different responses can be set up for different error classes (or severities), for example, a warning about error recovery activity on back-end storage might simply be logged, while an MDisk Group going offline might trigger an e-mail to the administrator.

- SVC 4.2.0.3 and higher are capable of sending e-mails directly to a standard Simple Mail Transfer Protocol (SMTP) mail server, which means that a separate SNMP server is no longer required. The existing e-mail infrastructure at a site can be used instead, which is often preferable.

Best practice: You must configure SNMP or e-mail notification and test the configuration when the cluster is created, which will make it easier to detect and resolve SAN problems as the SVC environment grows.


11.4.1 Verifying a correct SVC e-mail configuration

After the e-mail settings have been configured on the SVC, it is important to make sure that e-mail can be successfully sent. The svctask testemail command allows you to test sending the e-mail. If the command completes without error, and the test e-mail arrives safely in the administrator’s incoming e-mail, you can be confident that error notifications will be received. If not, you must investigate where the problem lies.

The testemail output in Example 11-1 shows an example of a failed e-mail test. In this case, the test failed because the specified IP address did not exist. The part of the lscluster output that is not related to e-mail has been removed for clarity.

Example 11-1 Sending a test e-mail

IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
email_server 9.43.86.82
email_server_port 25
email_reply [email protected]

IBM_2145:itsosvccl1:admin>svcinfo lsemailuser
id name        address            err_type user_type inventory
0  admin_email [email protected]  all      local     off

IBM_2145:itsosvccl1:admin>svctask testemail admin_email
CMMVC6280E Sendmail error EX_TEMPFAIL. The sendmail command could not create a connection to a remote system.

Possible causes include:

- Ethernet connectivity issues between the SVC cluster and the mail server. For example, the SVC might be behind a firewall protecting the data center network, or even on a separate network segment that has no access to the mail server. As with the Master Console or System Storage Productivity Center (SSPC), the mail server must be accessible by the SVC. SMTP uses TCP port 25 (unless you have configured an alternative port); if there is a firewall, enable this port outbound from the SVC.

- Mail server relay blocking. Many administrators implement filtering rules to prevent spam, which is particularly likely if you are sending e-mail to a user who is on a different mail server or outside of the mail server’s own network. On certain platforms, the default configuration prevents mail forwarding to any other machine. You must check the mail server log to see whether it is rejecting mail from the SVC. If it is, the mail server administrator must adjust the configuration to allow the forwarding of these e-mails.

- An invalid "FROM" address. Certain mail servers will reject e-mail if no valid "FROM" address is included. SVC takes this FROM address from the email_reply field of lscluster. Therefore, make sure that a valid reply-to address is specified when setting up e-mail. You can change the reply-to address by using the command svctask chemail -reply <address> (see the example that follows this list).
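For example, after correcting the reply-to address (the address shown here is a placeholder), rerun the test to confirm that notification now works:

IBM_2145:itsosvccl1:admin>svctask chemail -reply storage.admin@example.com
IBM_2145:itsosvccl1:admin>svctask testemail admin_email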

If you cannot find the cause of the e-mail failure, contact your IBM service support representative (IBM SSR).


Chapter 12. Maintenance

As with any piece of enterprise storage equipment, the IBM SAN Volume Controller (SVC) is not a completely “hands-off” device. It requires configuration changes to meet growing needs, updates to software for enhanced performance, features, and reliability, and the tracking of all the data that you used to configure your SVC.


12.1 Configuration and change tracking

The IBM SAN Volume Controller provides great flexibility in your storage configuration that you do not otherwise have. However, with this flexibility comes an added layer of configuration that is not present in a "normal" SAN. Even so, your total administrative burden often decreases, because extremely few changes are necessary on your disk arrays when the SVC manages them.

There are many tools and techniques that you can use to prevent your SVC installation from spiralling out of control. What is most important is what information you track, not how you track it. For smaller installations, everything can be tracked on simple spreadsheets. In environments with several clusters, hundreds of hosts, and a whole team of administrators, more automated solutions, such as TotalStorage Productivity Center or custom databases, might be required.

We do not discuss how to track your changes, because there are far too many tools and methods available to describe here. Rather, we discuss what sort of information is extremely useful to track. You need to decide what is the best method.

In theory, your documentation must be sufficient for any engineer, who is skilled with the products that you own, to take a copy of all of your configuration information and use it to create a functionally equivalent copy of the environment from nothing. If your documentation does not allow you to achieve this goal, you are not tracking enough information.

It is a best practice to create this documentation as you install your solution. Putting this information together after deployment is likely to be a tedious, boring, and error-prone task.

In the following sections, we provide what we think is the minimum documentation needed for an SVC solution. Do not view it as an exhaustive list; you might have additional business requirements that require other data to be tracked.

12.1.1 SAN

Tracking how your SAN is configured is extremely important.

SAN diagram

The most basic piece of SAN documentation is the SAN diagram. If you ever call IBM Support asking for help with your SAN, you can be sure that the SAN diagram is likely to be one of the first things that you are asked to produce.

Maintaining a proper SAN diagram is not as difficult as it sounds. It is not necessary for the diagram to show every last host and the location of every last port; this information is more properly collected (and easier to read) in other places. To understand how difficult an overly detailed diagram is to read, refer to Figure 12-1 on page 247.

Note: Do not store all change tracking and SAN, SVC, and storage inventory information on the SAN itself.


Figure 12-1 An overly detailed SAN diagram

Instead, a SAN diagram must only include every switch, every storage device, all inter-switch links (ISLs), along with how many there are, and a representation of which switches have hosts connected to them. An example is shown in Figure 12-2 on page 248. In larger SANs with many storage devices, the diagram can still be too large to print without a large-format printer, but it can still be viewed on a panel using the zoom feature. We suggest a tool, such as Microsoft Visio®, to create your diagrams. Do not worry about finding fancy stencils or official shapes, because your diagram does not need to show exactly into which port everything is plugged. You can use your port inventory for that. Your diagram can be appropriately simple. You will notice that our sample diagram just uses simple geometric shapes and “standard” stencils to represent a SAN.

Note: These SAN diagrams are just sample diagrams. They do not necessarily depict a SAN that you actually want to deploy.


Figure 12-2 A more useful diagram of a SAN

Port inventory

Along with the SAN diagram, an inventory of "what is supposed to be plugged in where" is also quite important. Again, you can create this inventory manually or generate it with automated tools. Before using automated tools, remember that it is important that your inventory contains not just what is currently plugged into the SAN, but also what is supposed to be attached to the SAN. If a server has lost its SAN connection, merely looking at the current status of the SAN will not tell you where it was supposed to be attached.

This inventory must exist in a format that can be exported and sent to someone else and retained in an archive for long-term tracking.

The list, spreadsheet, database, or automated tool needs to contain the following information for each port in the SAN:

- The name of the attached device and whether it is a storage device, host, or another switch

- The port on the device to which the switch port is attached, for example, Host Slot 6 for a host connection or Switch Port 126 for an ISL

- The speed of the port

- If the port is not an ISL, the attached worldwide port name (WWPN)

- For host ports or SVC ports, the destination aliases to which the host is zoned (a sample entry follows this list)
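For example, a single, entirely hypothetical inventory entry might look like this:

Switch/Port:  SAN_A_SW1 / port 12
Attached to:  Host ORAPROD01, HBA slot 6, port 0
Port speed:   4 Gbps
WWPN:         10:00:00:00:c9:aa:bb:cc
Zoned to:     SVC_CL1_IOGRP0_P1, SVC_CL1_IOGRP0_P3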

Automated tools, obviously, can do a decent job of keeping this inventory up-to-date, but even with a fairly large SAN, a simple database, combined with standard operating procedures, can be equally effective. For smaller SANs, spreadsheets are a time-honored and simple method of record keeping.


Zoning

While you need snapshots of your zoning configuration, you do not really need a separate spreadsheet or database just to keep track of your zones. If you lose your zoning configuration, you can rebuild the SVC parts from your zoning snapshot, and the host zones can be rebuilt from your port inventory.

12.1.2 SVC

For the SVC, there are several important components that you need to document.

Managed disks (MDisks) and Managed Disk Groups (MDGs)

Records for each MDG need to contain the following information:

- Name

- The total capacity of the MDG

- Approximate remaining capacity

- Type (image or managed)

- For each MDisk in the group:

– The physical location of each logical unit number (LUN) (that is, rank, loop pair, or controller blade)

– Redundant Array of Independent Disks (RAID) level

– Capacity

– Number of disks

– Disk types (for example, 15k or 4 Gb)

Virtual disks (VDisks)

The VDisk list needs to contain the following information for every VDisk in the SAN:

- Name
- Owning host
- Capacity
- MDG
- Type of I/O (sequential, random, or mixed)
- Striped or sequential
- Type (image or managed)

12.1.3 Storage

Actually, for the LUNs themselves, you do not need to track anything outside of what is already in your configuration documentation for the MDisks, unless the disk array is also used for direct-attached hosts.

12.1.4 General inventory

Generally separate from your spreadsheets or databases that describe the configurations of the components, you also need a general inventory of your equipment. This inventory can include information, such as:

- The physical serial number of the hardware
- Support phone numbers


- Support contract numbers
- Warranty end dates
- Current running code level
- Date that the code was last checked for updates

12.1.5 Change tickets and tracking

If you have ever called support (for any vendor) for assistance with a complicated problem, you will be asked if you have changed anything recently. Being able to produce a record of what was changed, if anything, is the key that leads to a swift resolution of a large number of problems. While you might not have done anything wrong, knowing what was changed can help the support person find the action that eventually caused the problem.

As mentioned at the beginning of this section, in theory, the record of your changes must have sufficient detail that you can take all the change documentation and create a functionally equivalent copy of the environment from the beginning.

The most common way that changes are actually performed in the field is that the changes are made and then any documentation is written afterward. As in the field of computer programming, this method often leads to incomplete or useless documentation; a self-documenting SAN is just as much of a fallacy as self-documenting code. Instead, write the documentation first and make it detailed enough that you have a “self-configuring” environment. A “self-configuring” environment means that if your documentation is detailed enough, the actual act of sitting down at the configuration consoles to execute changes becomes an almost trivial process that does not involve any actual decision-making. This method is actually not as difficult as it sounds when you combine it with the checklists that we explain and demonstrate in 12.2, “Standard operating procedures” on page 251.

12.1.6 Configuration archiving

There must be at least occasional historical snapshots of your SAN and SVC configuration, so that if there are issues, these devices can be rolled back to their previous configuration. Historical snapshots can also be useful in measuring the performance impact of changes. In any case, because modern storage is relatively inexpensive, just a couple of GBs can hold a couple of years of complete configuration snapshots, even if you pull them before and after every single SAN change.

These snapshots can include:

- The supportShow output from Brocade switches

- The show tech-support details output from Cisco switches

- Data collections from Enterprise Fabric Connectivity Manager (EFCM)-equipped McDATA switches

- SVC configuration (Config) dumps (see the sketch that follows this list)

- DS4x00 subsystem profiles

- DS8x00 LUN inventory commands:

– lsfbvol
– lshostconnect
– lsarray
– lsrank
– lsioports
– lsvolgrp


Obviously, you do not need to pull DS4x00 profiles if the only thing you are modifying is SAN zoning.
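As a hedged sketch of capturing an SVC configuration snapshot, you can run the backup command over SSH and then copy the resulting files off the cluster with secure copy; the key file, cluster address, and archive directory are hypothetical:

# Create the configuration backup files in /tmp on the config node
ssh -i privatekey admin@svccluster1 svcconfig backup

# Retrieve the backup files and archive them with the change ticket
scp -i privatekey admin@svccluster1:/tmp/svc.config.backup.* /archive/svc/cluster1/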

12.2 Standard operating procedures

The phrase “standard operating procedure” (SOP) often brings to mind thick binders filled with useless, mind-numbing processes that nobody reads or uses in their daily job. It does not have to be this way, even for a relatively complicated environment.

For all of the common changes that you make to your environment, there must be procedures written that ensure that changes are made in a consistent fashion and also ensure that the changes are documented properly. If the same task is done in different ways, it can make things confusing quite quickly, especially if you have multiple staff responsible for storage administration. These procedures might be created for tasks, such as adding a new host to the SAN/SVC, allocating new storage, performing disk migrations, configuring new Copy Services relationships, and so forth.

One way to implement useful procedures is to integrate them with checklists that can then serve as change tracking records. We describe one example of a combination checklist and SOP document for adding a new server to the SAN and allocating storage to it on an SVC next.

In Example 12-1, our procedures have all of the variables set off in __Double Underscores__. The example guidance to what decisions to make is in italics.

Example 12-1 Host addition standard operating procedure, checklist, and change record

Abstract: Request __ABC456__: Add new server __XYZ123__ to the SAN and allocate __200GB__ from SVC Cluster __1__
Date of Implementation: __08/01/2008__
Implementing Storage Administrator: Katja Gebuhr (x1234)
Server Administrator: Jon Tate (x5678)
Impact: None. This is a non-disruptive change.
Risk: Low.
Time estimate: __30 minutes__
Backout Plan: Reverse changes

Implementation Checklist:

1. ___ Verify (via phone or e-mail) that the server administrator has installed all code levels listed on the intranet site http://w3.itsoelectronics.com/storage_server_code.html

2. ___ Verify that the cabling change request __CAB927__ has been completed.

3. ___ For each HBA in the server, update the switch configuration spreadsheet with the new server using the information below.

To decide on which SVC cluster to use: All new servers must be allocated to SVC cluster 2, unless otherwise indicated by the Storage Architect.

Note: Do not actually use this procedure exactly as described. It is almost certainly missing information vital to the proper operation of your environment. Use it instead as a general guide as to what a SOP can look like.


To decide which I/O Group to Use: These must roughly be evenly distributed. Note: If this is a high-bandwidth host, the Storage Architect might give a specific I/O Group assignment, which should be noted in the abstract.

To select which Node Ports to Use: If the last digit of the first WWPN is odd (in hexadecimal, B, D, and F are also odd), use ports 1 and 3; if even, use ports 2 and 4.

HBA A:
Switch: __McD_1__
Port: __47__
WWPN: __00:11:22:33:44:55:66:77__
Port Name: __XYZ123_A__
Host Slot/Port: __5__
Targets: __SVC 1, IOGroup 2, Node Ports 1__

HBA B:
Switch: __McD_2__
Port: __47__
WWPN: __00:11:22:33:44:55:66:88__
Port Name: __XYZ123_B__
Host Slot/Port: __6__
Targets: __SVC 1, IOGroup 2, Node Ports 4__

4. ___ Log in to EFCM and modify the Nicknames for the new ports (using the information above).

5. ___ Collect Data Collections from both switches and attach them to this ticket with the filenames of <ticket_number>_<switch name>_old.zip

6. ___ Add new zones to the zoning configuration using the standard naming convention and the information above.

7. ___ Collect Data Collections from both switches again and attach them with the filenames of <ticket_number>_<switch name>_new.zip

8. Log on to the SVC Console for Cluster __2__ and:___ Obtain a config dump and attach it to this ticket under the filename <ticket_number>_<cluster_name>_old.zip___ Add the new host definition to the SVC using the information above and setting the host type to __Generic__ Do not type in the WWPN. If it does not appear in the drop-down list, cancel the operation and retry. If it still does not appear, check zoning and perform other troubleshooting as necessary.___ Create new VDisk(s) with the following parameters:

To decide on the MDiskGroup: For current requests (as of 8/1/08) use ESS4_Group_5, assuming that it has sufficient free space. If it does not have sufficient free space, inform the storage architect prior to submitting this change ticket and request an update to these procedures.

Use Striped (instead of Sequential) VDisks for all requests, unless otherwise noted in the abstract.

Name: __XYZ123_1__
Size: __200GB__
IO Group: __2__
MDisk Group: __ESS4_Group_5__
Mode: __Striped__

9. ___ Map the new VDisk to the Host

10.___ Obtain a config dump and attach it to this ticket under <ticket_number>_<cluster_name>_new.zip

11.___ Update the SVC Configuration spreadsheet using the above information, and the following supplemental data:
Request: __ABC456__
Project: __Foo__

12.Also update the entry for the remaining free space in the MDiskGroup with the information pulled from the SVC console.

13.___ Call the Server Administrator in the ticket header and request storage discovery. Ask them to obtain a pathcount to the new disk(s). If it is not 4, perform the necessary troubleshooting to determine why there is an incorrect number of paths.

14.___ Request that the server administrator confirm R/W connectivity to the paths.

15.Make notes on anything unusual in the implementation here: ____

Note that the example checklist does not contain pages upon pages of screen captures or “click Option A, select Option 7....” Instead, it assumes that the user of the checklist understands the basic operational steps for the environment.

After the change is over, the entire checklist, along with the configuration snapshots, needs to be stored in a safe place, not the SVC or any other SAN-attached location.

You must use detailed checklists even for non-routine changes, such as migration projects, to help the implementation go smoothly and provide an easy-to-read record of what was done. Writing a one-use checklist might seem horribly inefficient, but if you have to review the process for a complex project a few weeks after implementation, you might discover that your memory of exactly what was done is not as good as you thought. Also, complex, one-off projects are actually more likely to have steps skipped, because they are not routine.

12.3 Code upgrades

Code upgrades in a networked environment, such as a SAN, are complex enough. Because the SVC introduces an additional layer of code, upgrades can become a bit tricky.

12.3.1 Upgrade code levels

The SVC requires an additional layer of testing on top of the normal product testing performed by the rest of IBM storage product development. For this reason, SVC testing of newly available SAN code often runs several months behind other IBM products, which makes determining the correct code level quite easy; simply refer to the “Recommended Software Levels” and “Supported Hardware List” on the SVC support Web site under “Plan/Upgrade.”

If possible, do not run software levels that are higher than what is recommended on those lists. We do recognize that there can be situations where you need a particular code fix that is only available in a level of code later than what appears on the support matrix. If that is the case, contact your IBM marketing representative and ask for a Request for Price Quotation (RPQ); however, this particular type of modification usually does not cost you anything. These requests are relayed to IBM SVC Development and Test and are routinely granted. The purpose behind this process is to ensure that SVC Test has not run into an interoperability issue in the level of code that you want to run.

12.3.2 Upgrade frequency

Most clients perform major code upgrades every 12 - 18 months, which usually include upgrades across the entire infrastructure, so that all of the code levels are “in sync.”

It is common to wait three months or so after major version upgrades to gauge the stability of the code level. Other clients use an “n-1” policy, which means that a code upgrade does not get deployed until its replacement is released. For instance, they do not deploy 4.2 until either 4.3 or 5.0 ships.

12.3.3 Upgrade sequence

Unless you have another compelling reason (such as a fix that the SVC readme file says you must install first), upgrade the Master Console and the SVC first. Backward compatibility usually works much better than forward compatibility. Do so even if the code levels on everything else were not tested on the latest SVC release.

The exception to this rule is if you discover that a part of your SAN is accidentally running ancient code, such as a server running a three-year-old copy of IBM Subsystem Device Driver (SDD).

The following list shows a desirable order:

- SVC Master Console GUI
- SVC cluster code
- SAN switches
- Host systems (host bus adapter (HBA), operating system (OS) and service packs, and multipathing driver)
- Storage controller

12.3.4 Preparing for upgrades

Before performing any SAN switch or SVC upgrade, make sure that your environment has no outstanding problems. Prior to the upgrade, you need to:

- Check all hosts for the proper number of paths. If a host was for some reason not communicating with one of the nodes in an I/O Group, it will experience an outage during an SVC upgrade, because nodes individually reset to complete the upgrade. There are several techniques that you can use to make this process less tedious; refer to 9.7.1, “Automated path monitoring” on page 205. A simple per-host check is also sketched after this list.

- Check the SVC error log for unfixed errors. Remedy all outstanding errors. (Certain clients have been known to just automatically click “this error has been fixed” to clear out the log, which is an extremely bad idea; you must make sure that you understand an error before stating that it has been fixed.)

- Check your switch logs for issues. Pay special attention to your SVC and storage ports. Things to look for are signal errors, such as Link Resets and cyclic redundancy check (CRC) errors, unexplained logouts, or ports in an error state. Also, make sure that your fabric is stable with no ISLs going up and down often.

- Examine the readme files or release notes for the code that you are preparing to upgrade. There can be important notes about required pre-upgrade dependencies, unfixed issues, necessary APARs, and so on. This requirement applies to all SAN-attached devices, such as your HBAs and switches, not just the SVC.
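The following is a minimal sketch of such a check for a UNIX host running SDD. It assumes that the datapath command is in the PATH and that every vpath is expected to have four open paths; adjust EXPECTED to match your environment.

#!/bin/ksh
# Minimal pre-upgrade path check for an SDD host (sketch only).
EXPECTED=4

datapath query device | awk -v want=$EXPECTED '
    /^DEV#:/ { dev = $2 }
    / OPEN / { open[dev]++ }
    END { for (d in open)
              printf "vpath %s: %d open paths%s\n", d, open[d],
                     (open[d] == want ? "" : "  <== investigate before upgrading") }'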

You must also expect a write performance hit during an SVC upgrade. Because node resets are part of the upgrade, the write cache will be disabled on the I/O Group currently being upgraded.

12.3.5 SVC upgrade

Before applying the SVC code upgrade, review the following Web page to ensure the compatibility between the SVC code and the SVC Console GUI. The SAN Volume Controller and SVC Console GUI Compatibility Web site is:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888

Furthermore, certain concurrent upgrade paths are only available through an intermediate level. Refer to the SAN Volume Controller Concurrent Compatibility and Code Cross-Reference Web page for more information:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1001707

It is wise to schedule a low I/O activity time for the SVC code upgrade. Before making any other changes in your SAN environment, allow the SVC code upgrade to finish. Allow at least one hour to perform the code upgrade for a single SVC I/O Group and 30 minutes for each additional I/O Group. In a worst case scenario, an upgrade can take up to two hours, which can happen when the SVC code upgrade also upgrades the BIOS, the service processor (SP), and the SVC service card.

New features are not available until all nodes in the cluster are at the same level. Features that are dependent on a remote cluster Metro Mirror or Global Mirror might not be available until the remote cluster is at the same level, too.

Upgrade the SVC cluster in a Metro or Global Mirror cluster relationship

When upgrading the SVC cluster software where the cluster participates in an intercluster relationship, make sure to only upgrade one cluster at a time. Do not attempt to upgrade both SVC clusters concurrently. This action is not policed by the software upgrade process. Allow the software upgrade to complete on one cluster before you start the upgrade on the other cluster.

If both clusters are upgraded concurrently, it might lead to a loss of synchronization. In stress situations, it might lead to a loss of availability.

Important: If the Concurrent Code Upgrade (CCU) appears to stop for a long time (up to an hour), this delay can occur because it is upgrading a low level BIOS. Never power off during a CCU upgrade unless you have been instructed to do so by IBM service personnel. If the upgrade does encounter a problem and fails, it will back out the upgrade itself.

12.3.6 Host code upgrades

Making sure that hosts run correctly and with the current HBA drivers, multipath drivers, and HBA firmware is a chronic problem for a lot of storage administrators. In most IT environments, server administration is separate from storage administration, which makes enforcement of proper code levels extremely difficult.

One thing often not realized by server administrators is that proper SAN code levels are just as important to the proper operation of the server as the latest security patches or OS updates. There is no reason not to install updates to storage-related code on the same schedule as the rest of the OS.

The ideal solution to this problem is software inventory tools that are accessible to both administration staffs. These tools can be “homegrown” or are available from many vendors, including IBM.

If automatic inventory tools are not available, an alternative approach is to have an intranet site, which is maintained by the storage staff, that details the code levels that server administrators need to be running. This effort will likely be more successful if it is integrated into a larger intranet site detailing required code levels and patches for everything else.

12.3.7 Storage controller upgrades

If you have to take a controller completely offline for disruptive maintenance, SVC code Version 4.3.0 allows you to use the VDisk mirroring feature to prepare for this event. You can add a copy of the VDisk in a different MDisk group, where LUNs from a different storage controller are used. This feature allows you to take the controller offline, fix a problem, or perform an upgrade. When the maintenance is done, you can bring the controller back online and synchronize the data that has changed while it was offline.
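As a hedged sketch (assuming the VDisk Mirroring commands introduced with Version 4.3.0; the MDisk group and VDisk names below are placeholders), the CLI sequence is roughly: add a copy in an MDisk group on another controller, wait for it to synchronize, perform the maintenance, and optionally remove the extra copy afterward:

svctask addvdiskcopy -mdiskgrp DS8K_Group_1 XYZ123_1
svcinfo lsvdisksyncprogress XYZ123_1
svctask rmvdiskcopy -copy 1 XYZ123_1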

12.4 SAN hardware changes

Part of SAN/SVC maintenance sometimes involves upgrading or replacing equipment, which might require extensive preparation before performing the change.

12.4.1 Cross-referencing the SDD adapter number with the WWPN

It is extremely common in SAN maintenance operations to gracefully take affected adapters or paths offline before performing actions that will take them down in an abrupt manner. This method allows the multipathing software to complete any outstanding commands using that path before it disappears. If you choose to gracefully take affected adapters or paths offline first, it is extremely important that you verify which adapter you will be working on before running any commands to take the adapter offline.

One common misconception is that the adapter IDs in SDD correspond to the slot number, FCS/FSCSI number, or any other ID that might be assigned somewhere else; they do not. Instead, you need to run several commands to properly associate the WWPN of the adapter, which can be obtained from your SAN records, to the switch on which you are performing maintenance.

For example, let us suppose that we need to perform SAN maintenance with an AIX system on the adapter with a WWPN ending in F5:B0.

The steps are:

1. Run datapath query WWPN, which will return output similar to:

[root@abc]> datapath query wwpn
Adapter Name   PortWWN
fscsi0         10000000C925F5B0
fscsi1         10000000C9266FD1

As you can see, the adapter that we want is fscsi0.

2. Next, cross-reference fscsi0 with the output of datapath query adapter:

Active Adapters :4
Adpt#   Name     State    Mode      Select     Errors  Paths  Active
    0   scsi3    NORMAL   ACTIVE    129062051       0     64       0
    1   scsi2    NORMAL   ACTIVE     88765386     303     64       0
    2   fscsi2   NORMAL   ACTIVE    407075697    5427   1024       0
    3   fscsi0   NORMAL   ACTIVE    341204788   63835    256       0

From here, we can see that fscsi0 has the adapter ID of 3 in SDD. We will use this ID when taking the adapter offline prior to maintenance. Note how the SDD ID was 3 even though the adapter had been assigned the device name fscsi0 by the OS.
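These two lookups can be combined in a small helper script. The following is a sketch only, for an AIX host with SDD; the script name and the WWPN-suffix argument are hypothetical, and the parsing assumes the output formats shown above.

#!/bin/ksh
# Usage: find_sdd_adapter.sh F5B0   (the last hex digits of the WWPN of interest)
SUFFIX=$1

# Find the fscsi device whose WWPN ends in the given suffix.
FSCSI=$(datapath query wwpn | grep -i "${SUFFIX}\$" | awk '{ print $1 }')

# Cross-reference it with the SDD adapter ID.
ADPT=$(datapath query adapter | awk -v a="$FSCSI" '$2 == a { print $1 }')

echo "WWPN suffix $SUFFIX -> device $FSCSI -> SDD adapter ID $ADPT"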

12.4.2 Changes that result in the modification of the destination FCID

There are many changes to your SAN that will result in the modification of the destination Fibre Channel ID (FCID), which is also known as the N_Port ID. The following operating systems have suggested procedures that you must perform before the change takes place. If you do not perform these steps, you might have difficulty bringing the paths back online.

The changes that trigger this issue will be noted in this chapter. Note that changes in the FCID of the host itself will not trigger this issue.

AIX

In AIX without the SDDPCM, if you do not properly manage a destination FCID change, running cfgmgr will create brand-new hdisk devices, all of your old paths will go into a defined state, and you will have difficulty removing them from your Object Data Manager (ODM) database.

There are two ways of preventing this issue in AIX.

Dynamic Tracking

This is an AIX feature present in AIX 5.2 Technology Level (TL) 1 and higher. It causes AIX to bind hdisks to the WWPN instead of the destination FCID. However, this feature is not enabled by default, has extensive prerequisite requirements, and is disruptive to enable. For these reasons, we do not recommend that you rely on this feature to aid in scheduled changes. The alternate procedure is not particularly difficult, but if you are still interested in Dynamic Tracking, refer to the IBM System Storage Multipath Subsystem Device Driver User’s Guide, SC30-4096, for full details.

If you choose to use Dynamic Tracking, we strongly recommend that AIX is at the latest available TL. If Dynamic Tracking is enabled, no special procedures are necessary to change the FCID.

Manual device swaps with SDD

Use these steps to perform manual device swaps with SDD (a consolidated script sketch follows the steps):

1. Using the procedure in 12.4.1, “Cross-referencing the SDD adapter number with the WWPN” on page 256, obtain the SDD adapter ID.

2. Run the command datapath set adapter X offline where X is the SDD adapter ID.

3. Run the command datapath remove adapter X. Again, X is the SDD adapter ID.

4. Run rmdev -Rdl fcsY where Y is the FCS/FSCSI number. If you receive an error message about the devices being in use, you probably took the wrong adapter offline.

5. Perform your maintenance.

6. Run cfgmgr to detect your “new” hdisk devices.

7. Run addpaths to get the “new” hdisks back into your SDD vpaths.
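For repeated use, these steps can be wrapped in a short script. The following is a sketch only; the SDD adapter ID and fcs device name are placeholders that must be verified with the procedure in 12.4.1 before anything is taken offline.

#!/bin/ksh
# Placeholder values: SDD adapter ID 3 and device fcs0, verified beforehand per 12.4.1.
ADPT=3
FCS=fcs0

datapath set adapter $ADPT offline
datapath remove adapter $ADPT
rmdev -Rdl $FCS      # a "device busy" error here usually means the wrong adapter was taken offline

echo "Perform the maintenance now, then press Enter to rediscover devices."
read junk

cfgmgr               # detect the "new" hdisk devices
addpaths             # bring the new hdisks back into the SDD vpaths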

Device swaps with SDDPCM

With or without Dynamic Tracking, a destination FCID change is not a problem if you are using AIX Multipath I/O (MPIO) with the SDDPCM.

Other operating systems

Unfortunately, whether the HBA binds to the FCID is HBA driver-dependent. Consult your HBA vendor for further details. (We were able to provide details for AIX, because there is only one supported adapter driver.) The most common Intel HBAs made by QLogic are not affected by this issue.

12.4.3 Switch replacement with a like switch

If you are replacing a switch with another switch of the same model, your preparation is fairly straightforward:

1. If the current switch is still up and running, take a snapshot of its configuration.

2. Check all affected hosts to make sure that the path on which you will be relying during the replacement is operational.

3. If there are hosts attached to the switch, gracefully take the paths offline. In SDD, the appropriate command is datapath set adapter X offline where X is the adapter number. While technically taking the paths offline is not necessary, it is nevertheless a good idea. Follow the procedure in 12.4.1, “Cross-referencing the SDD adapter number with the WWPN” on page 256 for details.

4. Power off the old switch. Note that the SVC will log all sorts of error messages when you power off the old switch. Perform at least a spot-check of your hosts to make sure that your access to disk still works.

5. Remove the old switch, put in the new switch, and power up the new switch; do not attach any of the Fibre Channel ports yet.

6. If appropriate, match the code level on the new switch with the other switches in your fabric.

7. Give the new switch the same Domain ID as the old switch. You might also want to upload the configuration of the old switch into the new switch. In the case of a Cisco switch with AIX hosts using SDD, uploading the old configuration is important, because it ensures that the FCIDs of the destination devices remain constant.

8. Plug the ISLs into the new switch and make sure that the new switch merges into the fabric successfully.

9. Attach the storage ports, making sure to use the same physical ports as the old switch.

10.Attach the SVC ports and perform appropriate maintenance procedures to bring the disk paths back online.

11.Attach the host ports and bring their paths back online.

12.4.4 Switch replacement or upgrade with a different kind of switch

The only difference from the procedure in the previous section is that you are obviously not going to upload the configuration of the old switch into the new switch. You must still give the new switch the same Domain ID as the old switch. Remember that the FCIDs will almost certainly change when installing this new switch, so be sure to follow the appropriate procedures for your operating system here.

12.4.5 HBA replacement

Replacing a HBA is a fairly trivial operation if done correctly with the appropriate preparation:

1. Ensure that your SAN is currently zoned by WWPN instead of worldwide node name (WWNN). If you are using WWNN, change your zoning first.

2. If you do not have hot-swappable HBAs, power off your system, replace the HBA, power the system back on, and skip to step 5.

3. Using the procedure in 12.4.1, “Cross-referencing the SDD adapter number with the WWPN” on page 256, gracefully take the appropriate path offline.

4. Follow the appropriate steps for your hardware and software platform to replace the HBA and bring it online.

5. Ensure that the new HBA is successfully logging in to the name server on the switch. If it is not, fix this issue before continuing to the next step. (The WWPN for which you are looking is usually on a sticker on the back of the HBA or somewhere on the HBA’s packing box.)

6. In the zoning interface for your switch, replace the WWPN of the old adapter with the WWPN of the new adapter.

7. Swap out the WWPNs in the SVC host definition interface.

8. Perform the device detection procedures appropriate for your OS to bring the paths back up and verify that the paths are up with your multipathing software. (Use the command datapath query adapter in SDD.)

12.5 Naming convention

Without a proper naming convention, your SAN and SVC configuration can quickly become extremely difficult to maintain. The naming convention needs to be planned ahead of time and documented for your administrative staff. It is more important that your names are useful and informative rather than extremely short.

12.5.1 Hosts, zones, and SVC ports

If you examine 1.3.6, “Sample standard SVC zoning configuration” on page 16, you see a sample naming convention that you might want to use in your own environment.

12.5.2 Controllers

It is common to refer to disk controllers by part of their serial number, which helps facilitate troubleshooting by making the cross-referencing of logs easier. If you have a unique name, by all means, use it, but it is helpful to append the serial number to the end.

12.5.3 MDisks

The MDisks almost certainly must be renamed from the default name of mdiskX. The name must include the serial number of the controller, the array number/name, and the volume number/name. Unfortunately, you are limited to fifteen characters. This design builds a name similar to:

23K45_A7V10 - Serial 23K45, Array 7, Volume 10.

12.5.4 VDisks

The VDisk name must indicate for what host the VDisk is intended, along with any other identifying information that might distinguish this VDisk from other VDisks.

12.5.5 MDGs

MDG names must indicate from which controller the group comes, the RAID level, and the disk size and type. For example, 23K45_R1015k300 is an MDG on 23K45, RAID 10, 15k, 300 GB drives. (As with the other names on the SVC, you are limited to 15 characters).
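Assuming the standard rename commands in the SVC CLI, existing objects can be brought into line with these conventions non-disruptively. The names below are examples only, and the trailing argument is the object ID (or current name) of the MDisk, VDisk, and MDG, respectively:

svctask chmdisk -name 23K45_A7V10 12
svctask chvdisk -name XYZ123_1 5
svctask chmdiskgrp -name 23K45_R1015k300 2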

Chapter 13. Cabling, power, cooling, scripting, support, and classes

In this chapter, we discuss valuable miscellaneous advice regarding the implementation of the IBM SAN Volume Controller (SVC). This chapter includes several of the supporting installations upon which the SVC relies, together with information about scripting the SVC. We also include references for further information.

13.1 Cabling

None of the advice in the following section is specific to the SVC. However, because cabling problems can produce SVC issues that will be troublesome and tedious to diagnose, reminders about how to structure cabling might be useful.

13.1.1 General cabling advice

All cabling used in a SAN environment must be high-quality cables certified for the speeds at which you will be using the cable. Because current SVC nodes come with shortwave small form-factor pluggable (SFP) optical transceivers, multi-mode cabling is to be used for connecting the nodes. For most installations, multi-mode cabling translates into multi-mode cables with a core diameter of 50 microns. When using the current SVC node maximum speed of 4 Gbps, the cables that you use must be certified to meet the 400-M5-SN-I cabling specification. This specification refers to 400 MBps, 50 micron core, multi-mode, shortwave no-Open Fiber Control (OFC) laser, intermediate distance.

Note that we do not recommend recycling old 62.5 micron core cabling, which is likely to cause problems. There are specifications for using 62.5 micron cabling, but you are greatly limited as far as your maximum cable length, and many cables will not meet the stringent standards required by Fibre Channel. Also, because the SVC nodes come with LC connectors, we do not recommend that you use any conversions between LC and SC connectors in the fiber path.

We recommend that you use factory-terminated cables from a reputable vendor. Only use field-terminated cables for permanent cable installations, and only when they are installed by qualified personnel with fiber splicing skills. When using field-terminated cables, ensure that a fiber path quality test is conducted, for instance, with an Optical Time Domain Reflectometer (OTDR). To ensure cable quality, and prepare for future link speeds, we also advise that you get cables that meet the OM3 cable standard.

If you have a large data center, remember that at 4 Gbps, you are limited to a maximum cable length of 150 meters (492 feet and 1.7 inches). You must set up your SAN so that all switches are within 150 meters of the SVC nodes.

13.1.2 Long distance optical links

Certain installations will require long distance direct fiber links to connect devices in the SAN. For such links, single-mode cables are used together with longwave optical transceivers. For long distance links, you must always ensure quality by measurements prior to production usage. Also, it is of paramount importance that you use the correct optical transceivers and that your SAN switches are capable of supporting stable, error-free operations. We recommend that you consult IBM for any planned links longer than a kilometer (0.62 miles), which is especially important when using wavelength division multiplexing (WDM) solutions instead of direct fiber links.

13.1.3 Labeling

All cables must be labeled at both ends with their source and destination locations. Even in the smallest SVC installations, a lack of cable labels quickly becomes an unusable mess when you are trying to trace problems. A small SVC installation consisting of a two-port storage subsystem, 10 hosts, and a single SVC cluster with two nodes will require 30 fiber-optic cables to set up.

To ensure that unambiguous information can be read from the labels, we recommend that you institute a standard labeling scheme to be used in your environment. The labels at both cable ends must be identical. An example labeling scheme consists of three lines per label, with the following content:

Line 1: Cable first end physical location <-> Cable second end physical location

Line 2: Cable first end device name and port number

Line 3: Cable second end device name and port number

For one of the SVC clusters that was used when writing this book, the label for both ends of the cable connecting SVC node 1, port 1 to the SAN switch, port 2 looks like:

NSJ1R2U14 <-> NSJ1R3U16

itsosvccl1_n1 p1

IBM_2005_B5K_1 p2

In line one, “NSJ” refers to the site name, “Rn” is the rack number and “Un” is the rack unit number. Line two has the name of the SVC cluster node 1 together with port 1, and line three has the name of the corresponding SAN switch together with port 2.

If your cabling installation includes patch panels in the cabling path, information about these patch panels must be included in the labeling. We recommend using a cable management system to keep track of cabling information and routing. For small installations, you can use a simple spreadsheet, but for large data centers, we recommend that you use one of the customized commercial solutions that are available.

Note: We strongly recommend that you use only cable labels that are made for this purpose, because they have a specific adhesive that works well with the cable jacket. Otherwise, labels made for other purposes tend to lose their grip on the cable over time.

13.1.4 Cable management

With SAN switches increasing in port density, it is now theoretically possible to install more than 1 500 ports into a single rack cabinet (this number is based on the IBM System Storage/Cisco MDS 9513 SAN director).

We do not recommend that you install more than 1 500 ports into a single rack cabinet.

Most SAN installations are far too dynamic for this idea to ever work. If you ever have to swap out a faulty line card/port blade, or even worse, a switch chassis, you will be presented with an inaccessible nightmare of cables. For this reason, we strongly advise you to use proper cable management trays and guides. As a general rule, cable management takes about as much space as your switches take.

13.1.5 Cable routing and support

Most guides to fiber cabling specify a minimum bend radius of around 2.5 cm (approximately 1 inch). Note that this is a radius; the minimum bend diameter needs to be twice that length.

Proper bend radius is a lofty goal that you need to design your cabling plan to meet. However, we have never actually seen a production data center that did not have at least a few cables that did not meet that standard. While a few short cables are not a disaster, proper bend radius will become even more important as SAN speeds increase. You can expect well over twice the number of physical layer issues at 4 Gbps as you might have seen in a 2 Gbps SAN. And, 8 Gbps will have even more stringent requirements.

There are two major causes of insufficient bend radius:

- Incorrect use of server cable management arms. These hinged arms are extremely popular in racked server designs, including the IBM design. However, you must be careful to ensure that when these arms are slid in and out, the cables in the arm do not become kinked.

- Insufficient cable support. You cannot rely on the strain-relief boots built into the ends of the cable to provide support. Over time, your cables will inevitably sag if you rely on these strain-relief boots. A common scene in many data centers is a “waterfall” of cables hanging down from the SAN switch without any other support than the strain-relief boots. Use loosely looped cable ties or cable straps to support the weight of your cables. And as stated elsewhere, make sure that you install a proper cable management system.

13.1.6 Cable length

Cables must be as close as possible to exactly the required length, with little slack. Therefore, purchase a variety of cable lengths and use the cables that will leave you the least amount of slack.

If you do have slack in your cable, you must neatly spool up the excess into loops that are around 20 cm (7.87 inches) in diameter and bundle them together. Try to avoid putting these bundled loops in a great heap on the floor, or you might never be able to remove any cables until your entire data center is destined for the scrap yard.

13.1.7 Cable installation

Before plugging in any cables, it is an extremely good idea to clean the ends of the cables with a disposable, lint-free alcohol swab; this is especially important when reusing cables. Also, gently use canned air to blow any dust out of the optical transceivers.

13.2 Power

Because the SVC nodes can be compared to standard one unit rack servers, they have no particularly exotic power requirements. Nevertheless, power is often a source of field issues.

13.2.1 Bundled uninterruptible power supply units

The most notable power feature of the SVC is the requirement for its bundled uninterruptible power supply (UPS) units.

The most important consideration with the UPS units is to make sure that they are not cross-connected, which means that you must ensure that the serial cable and the power cable from a specific UPS unit connect to the same SVC node.

Also, remember that the function of the UPS units is solely to provide battery power to the SVC nodes long enough to copy the write cache from memory onto the internal disk of the nodes. The shutdown process will begin immediately when power is lost, and the shutdown cannot be stopped by bringing back power during the shutdown. The SVC nodes will restart immediately when power is restored. Therefore, compare the UPS units to the built-in batteries found in most storage subsystem controllers, and do not think of them as substitutes for the normal data center UPS units. If you want continuous availability, you will need to provide other sources of backup power to ensure that the power feed to your SVC cluster is never interrupted.

13.2.2 Power switch

The UPS unit that comes bundled with an SVC node has only a single power inlet, and therefore, it can only be connected to a single power feed. If you prefer to have each node connected to two power feeds, there is a small power switch available from IBM. This unit accepts two incoming power feeds and provides a single outgoing feed. If one of the incoming feeds goes down, the outgoing feed is not interrupted.

Figure 13-1 Optional SVC power switch

13.2.3 Power feeds

There must be as much separation as possible between the feeds that power each node in an SVC I/O Group. The nodes must be plugged into completely different circuits within the data center; you do not want a single breaker tripping to cause an entire SVC I/O Group to shut down.

13.3 Cooling

The SVC has no extraordinary cooling requirements. From the perspective of a data center space planner, it can be compared to a pack of standard one unit rack servers. The most important considerations are:

- The SVC nodes cool front-to-back. When installing the nodes, make sure that the node front faces toward where the cold air comes in.

- Fill empty spaces in your rack with filler panels to help prevent recirculating hot exhaust air back into the air intakes. The most common filler panels do not even require screws to mount.

- Data centers with rows of racks must be set up with “hot” and “cold” aisles. Air intakes must face the cold aisles, and hot air is then blown into the hot aisles. You do not want the hot air from one rack dumping into the intake of another rack.

- In a raised-floor installation, the vent tiles must only be in the cold aisles. Vent tiles in the hot aisle can cause air recirculation problems.

- If you need to deploy fans on the floor to fix “hot spots,” you need to reevaluate your data center cooling configuration. Fans on the floor are a poor solution that will almost certainly lead to reduced equipment life. Instead, engage IBM, or any one of a number of professional data center contractors, to evaluate your cooling configuration. It might be possible to fix your cooling by reconfiguring existing airflow without having to purchase any additional cooling units.

13.4 SVC scripting

While the SVC Console GUI is an extremely user friendly tool, like other GUIs, it is not well suited to performing large numbers of repetitive operations. For complex, often-repeated operations, it is more convenient to script the SVC command line interface (CLI). The SVC CLI can be scripted using any program that can pass text commands to the SVC cluster Secure Shell (SSH) connection. Using PuTTY, the component to use is plink.exe.
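As a minimal sketch (the cluster address, key file, and object names below are placeholders, and key-based SSH access is assumed), a repeatable change can be driven from a shell script; on Windows, plink.exe accepts the same command strings:

#!/bin/ksh
SVC=admin@svccluster1
KEY=$HOME/.ssh/svc_admin_key

# Create and map a VDisk in one scripted, repeatable sequence.
ssh -i $KEY $SVC "svctask mkvdisk -iogrp 0 -mdiskgrp ESS4_Group_5 -size 200 -unit gb -name XYZ123_1"
ssh -i $KEY $SVC "svctask mkvdiskhostmap -host XYZ123 XYZ123_1"

# Equivalent call through plink on Windows:
#   plink -i svc_admin_key.ppk admin@svccluster1 "svcinfo lsvdisk -delim :"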

Engineers in IBM have developed a scripting toolkit that is designed to help automate SVC operations. It is Perl-based and available at no charge from:

http://www.alphaworks.ibm.com/tech/svctools

The scripting toolkit includes a sample script that you can use to redistribute extents across existing MDisks in the group. Refer to 5.6, “Restriping (balancing) extents across an MDG” on page 88 for an example use of the redistribute extents script from the scripting toolkit.

13.4.1 Standard changes

For organizations that are incorporating change management processes, the SVC scripting option is well suited for facilitating the creation of standard SVC changes. Standard changes are pre-tested changes to a production environment. By using scripted execution, you can ensure that the execution sequence for a specific change type remains the same as what has been tested.

13.5 IBM Support Notifications Service

Unless you enjoy browsing the SVC Web site on a regular basis, it is an excellent idea to sign up for the IBM Support Notifications Service. This service will send you e-mails when information on the SVC Support Web site changes, including notices of new code releases, product alerts (flashes), new publications, and so on.

The IBM Support Notifications Service is available for the SVC, along with many other IBM products, at:

http://www.ibm.com/support/subscriptions/us/

Note: The scripting toolkit is made available to users through IBM’s AlphaWorks Web site. As with all software available on AlphaWorks, it is not extensively tested and is provided on an as-is basis. It is not supported in any formal way by IBM Product Support. Use it at your own risk.

You can obtain notifications for the SVC from the “System Storage support notifications” section of this Web site.

You need an IBM ID to subscribe. If you do not have an IBM ID, you can create one (for free) by following a link from the sign-on page.

13.6 SVC Support Web site

The primary support portal for the IBM SVC is at:

http://www.ibm.com/storage/support/2145

This page is the primary source for new and updated information about the IBM SVC. From here, you can obtain a variety of SVC-related information, including installation and configuration guides, problem resolutions, and product feature presentations.

13.7 SVC-related publications and classes

There are several IBM publications and educational options available for the SVC.

13.7.1 IBM Redbooks publications

These are useful publications describing the SVC and important related topics:

- IBM System Storage SAN Volume Controller V4.3, SG24-6423-06. This book is an SVC configuration cookbook covering many aspects of how to implement the SVC successfully. It is in an extremely easy to read format.

- Implementing the SVC in an OEM Environment, SG24-7275. This book describes how to integrate the SVC with several non-IBM storage systems (EMC, HP, NetApp®, and HDS), as well as with the IBM DS4000 series. It also discusses storage migration scenarios.

- IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194. While this book was written for Version 3.1, it can be applied to later TotalStorage Productivity Center (TPC) 3.x versions. It is a cookbook style book about TPC implementation.

- TPC Version 3.3 Update Guide, SG24-7490. This book describes new features in TPC Version 3.3.

- Implementing an IBM/Brocade SAN, SG24-6116. This book discusses many aspects to consider when implementing a SAN that is based on IBM System Storage b-type/Brocade products.

- Implementing an IBM/Cisco SAN, SG24-7545. This book discusses many aspects to consider when implementing a SAN that is based on the Cisco SAN portfolio for IBM System Storage.

- IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation, SG24-7544. This book discusses many aspects regarding design and implementation of routed SANs (MetaSANs) with the IBM System Storage b-type/Brocade portfolio. It also describes SAN distance extension technology.

- IBM System Storage/Cisco Multiprotocol Routing: An Introduction and Implementation, SG24-7543. This book discusses many aspects regarding design and implementation of routed SANs with the Cisco portfolio for IBM System Storage. It also describes SAN distance extension technology.

There are many other IBM Redbooks publications available that describe TPC, SANs, and IBM System Storage Products, as well as many other topics. To browse all of the IBM Redbooks publications about Storage, go to:

http://www.redbooks.ibm.com/portals/Storage

13.7.2 Courses

IBM offers several courses to help you learn how to implement the SVC:

- SAN Volume Controller (SVC) - Planning and Implementation (ID: SN821) or SAN Volume Controller (SVC) Planning and Implementation Workshop (ID: SN830). These courses provide a basic introduction to SVC implementation. The workshop course includes a hands-on lab; otherwise, the course content is identical.

- IBM TotalStorage Productivity Center Implementation and Configuration (ID: SN856). This course is extremely useful if you plan to use TPC to manage your SVC environment.

- TotalStorage Productivity Center for Replication Workshop (ID: SN880). This course describes managing replication with TPC. The replication part of TPC is virtually a separate product from the rest of TPC, and it is not covered in the basic implementation and configuration course.

Chapter 14. Troubleshooting and diagnostics

The SAN Volume Controller (SVC) has proven to be a robust and reliable virtualization engine that has demonstrated excellent availability in the field. Nevertheless, from time to time, problems occur. In this chapter, we provide an overview about common problems that can occur in your environment. We discuss and explain problems related to the SVC, the Storage Area Network (SAN) environment, storage subsystems, hosts, and multipathing drivers. Furthermore, we explain how to collect the necessary problem determination data and how to overcome these problems.

14.1 Common problems

Today’s SANs, storage subsystems, and host systems are complicated, often consisting of hundreds or thousands of disks, multiple redundant subsystem controllers, virtualization engines, and different types of Storage Area Network (SAN) switches. All of these components have to be configured, monitored, and managed properly, and in the case of an error, the administrator will need to know what to look for and where to look.

The SVC is a great tool for isolating problems in the storage infrastructure. With functions found in the SVC, the administrator can more easily locate any problem areas and take the necessary steps to fix the problems. In many cases, the SVC and its service and maintenance features will guide the administrator directly, provide help, and suggest remedial action. Furthermore, the SVC will probe whether the problem still persists.

When experiencing problems with the SVC environment, it is important to ensure that all components comprising the storage infrastructure are interoperable. In an SVC environment, the SVC support matrix is the main source for this information. You can download the SVC Version 4.3 support matrix from:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&uid=ssg1S1003277&loc=en_US&cs=utf-8&lang=en

Although the latest SVC code level is supported to run on older HBAs, storage subsystem drivers, and code levels, we recommend that you use the latest tested levels.

14.1.1 Host problems

From the host point of view, you can experience a variety of problems, ranging from performance degradation to inaccessible disks. There are a few things that you can check from the host itself before drilling down to the SAN, SVC, and storage subsystems.

Areas to check on the host:

- Any special software that you are using
- Operating system version and maintenance/service pack level
- Multipathing type and driver level
- Host bus adapter (HBA) model, firmware, and driver level
- Fibre Channel SAN connectivity

Based on this list, the host administrator needs to check for and correct any problems.

You can obtain more information about managing hosts on the SVC in Chapter 9, “Hosts” on page 175.

14.1.2 SVC problems

The SVC has good error logging mechanisms. It not only keeps track of its internal problems, but it also tells the user about problems in the SAN or storage subsystem. It also helps to isolate problems with the attached host systems. Every SVC node maintains a database of other devices that are visible in the SAN fabrics. This database is updated as devices appear and disappear.

Fast node reset

The SVC cluster software incorporates a fast node reset function. The intention of a fast node reset is to avoid I/O errors and path changes from the host’s point of view if a software problem occurs in one of the SVC nodes. The fast node reset function means that SVC software problems can be recovered without the host experiencing an I/O error and without requiring the multipathing driver to fail over to an alternative path. The fast node reset is done automatically by the SVC node. This node will inform the other members of the cluster that it is resetting.

Besides SVC node hardware and software problems, failures in the SAN zoning configuration are a common source of trouble. A misconfiguration in the SAN zoning might lead to the SVC cluster not working, because the SVC cluster nodes communicate with each other by using the Fibre Channel SAN fabrics.

You must check the following areas from the SVC perspective:

- The attached hosts. Refer to 14.1.1, “Host problems” on page 270.
- The SAN. Refer to 14.1.3, “SAN problems” on page 272.
- The attached storage subsystem. Refer to 14.1.4, “Storage subsystem problems” on page 272.

There are several SVC command line interface (CLI) commands with which you can check the current status of the SVC and the attached storage subsystems. Before starting the complete data collection or starting the problem isolation on the SAN or subsystem level, we recommend that you use the following commands first and check the status from the SVC perspective.

You can use these helpful CLI commands to check the environment from the SVC perspective:

- svcinfo lscontroller controllerid

Check that multiple worldwide port names (WWPNs) that match the back-end storage subsystem controller ports are available.

Check that the path_counts are evenly distributed across each storage subsystem controller or that they are distributed correctly based on the preferred controller. Use the path_count calculation found in 14.3.4, “Solving back-end storage problems” on page 288. The total of all path_counts must add up to the number of managed disks (MDisks) multiplied by the number of SVC nodes.

- svcinfo lsmdisk

Check that all MDisks are online (not degraded or offline).

- svcinfo lsmdisk mdiskid

Check several of the MDisks from each storage subsystem controller. Are they online? And, do they all have path_count = number of nodes?

- svcinfo lsvdisk

Check that all virtual disks (VDisks) are online (not degraded or offline). If the VDisks are degraded, are there stopped FlashCopy jobs? Restart these stopped FlashCopy jobs or delete the mappings.

- svcinfo lshostvdiskmap

Check that all VDisks are mapped to the correct hosts. If a VDisk is not mapped correctly, create the necessary VDisk to host mapping.

- svcinfo lsfabric

Use of the various options, such as -controller, can allow you to check different parts of the SVC configuration to ensure that multiple paths are available from each SVC node port to an attached host or controller. Confirm that all SVC node port WWPNs are connected to the back-end storage consistently.
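A few of these checks can also be strung together in a small script so they are easy to repeat before and after changes. This is a sketch only; it assumes key-based SSH access from a management host, and the cluster address is a placeholder:

#!/bin/ksh
SVC=admin@svccluster1

echo "--- Controllers ---"
ssh $SVC "svcinfo lscontroller"

echo "--- MDisks that are not online ---"
ssh $SVC "svcinfo lsmdisk" | egrep -i "degraded|offline"

echo "--- VDisks that are not online ---"
ssh $SVC "svcinfo lsvdisk" | egrep -i "degraded|offline"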

14.1.3 SAN problems

Introducing the SVC into your SAN environment and using its virtualization functions are not difficult tasks. There are basic rules to follow before you can use the SVC in your environment. These rules are not complicated; however, you can make mistakes that lead to accessibility problems or reduced performance. There are two types of SAN zones needed to run the SVC in your environment: a host zone and a storage zone. In addition, there must be an SVC zone that contains all of the SVC node ports of the SVC cluster; this SVC zone enables intra-cluster communication.

Chapter 1, “SAN fabric” on page 1 provides you with valuable information and important points about setting up the SVC in a SAN fabric environment.

Because the SVC is in the middle of the SAN and connects the host to the storage subsystem, it is important to check and monitor the SAN fabrics.

14.1.4 Storage subsystem problems

Today, we have a wide variety of heterogeneous storage subsystems. All these subsystems have different management tools, different setup strategies, and possible problem areas. All subsystems must be correctly configured and in good working order, without open problems, in order to support a stable environment. You need to check the following areas if you have a problem:

- Storage subsystem configuration: Ensure that a valid configuration is applied to the subsystem.
- Storage controller: Check the health and configurable settings on the controllers.
- Array: Check the state of the hardware, such as a disk drive module (DDM) failure or enclosure problems.
- Storage volumes: Ensure that the Logical Unit Number (LUN) masking is correct.
- Host attachment ports: Check the status and configuration.
- Connectivity: Check the available paths (SAN environment).
- Layout and size of Redundant Array of Independent Disks (RAID) arrays and LUNs: Performance and redundancy are important factors.

In the storage subsystem chapter, we provide you with additional information about managing subsystems. Refer to Chapter 4, “Storage controller” on page 57.

Determining the correct number of paths to a storage subsystem

Using SVC CLI commands, it is possible to find out the total number of paths to a storage subsystem. To determine the proper value of the available paths, you need to use the following formula:

Number of MDisks x Number of SVC nodes per cluster = Number of paths
mdisk_link_count x Number of SVC nodes per cluster = Sum of path_count

Example 14-1 shows how to obtain this information using the commands svcinfo lscontroller controllerid and svcinfo lsnode.

Example 14-1 The svcinfo lscontroller 0 command

IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8

IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
6  Node1 1000739007        50050768010037E5 online 0           io_grp0       no          20400001C3240007 8G4
5  Node2 1000739004        50050768010037DC online 0           io_grp0       yes         20400001C3240004 8G4
4  Node3 100068A006        5005076801001D21 online 1           io_grp1       no          2040000188440006 8F4
8  Node4 100068A008        5005076801021D22 online 1           io_grp1       no          2040000188440008 8F4

Example 14-1 shows that two MDisks are present for the storage subsystem controller with ID 0, and there are four SVC nodes in the SVC cluster, which means that in this example the path_count is:

2 x 4 = 8

If possible, spread the paths across all storage subsystem controller ports, which is the case for Example 14-1 (four for each WWPN).
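This check can also be scripted. The following sketch (assuming key-based SSH access; the cluster address and controller ID are placeholders) compares the expected and reported path counts for one controller:

#!/bin/ksh
SVC=admin@svccluster1
CTRL=0

NODES=$(ssh $SVC "svcinfo lsnode -nohdr" | wc -l)
MDISKS=$(ssh $SVC "svcinfo lscontroller $CTRL" | awk '/^mdisk_link_count/ { print $2 }')
ACTUAL=$(ssh $SVC "svcinfo lscontroller $CTRL" | awk '/^path_count/ { sum += $2 } END { print sum }')

echo "Expected paths: $((MDISKS * NODES))   Reported paths: $ACTUAL"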

14.2 Collecting data and isolating the problem

Data collection and problem isolation in an IT environment are sometimes difficult tasks. In the following section, we explain the essential steps to collect debug data to find and isolate problems in an SVC environment. Today, there are many approaches to monitoring the complete client environment. IBM offers the IBM TotalStorage Productivity Center (TPC) storage management software. Together with problem and performance reporting, TPC offers a powerful alerting mechanism and an extremely powerful Topology Viewer, which enables the user to monitor the storage infrastructure.

Refer to Chapter 11, “Monitoring” on page 221 for more information about the TPC Topology Viewer.

14.2.1 Host data collection

Data collection methods vary by operating system. In this section, we show how to collect the data for several major host operating systems.

As a first step, always collect the following information from the host:

- Operating system: Version and level
- Host Bus Adapter (HBA): Driver and firmware level
- Multipathing driver level

Then, collect the following operating system specific information:

- IBM AIX

Collect the AIX system error log by running snap -gfiLGc for each AIX host.

- For Microsoft Windows or Linux hosts

Use the IBM Dynamic System Analysis (DSA) tool to collect data for the host systems. Visit the following links for information about the DSA tool:

– http://multitool.pok.ibm.com

– http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=SERV-DSA

If your server is based on non-IBM hardware, use the Microsoft problem reporting tool, MPSRPT_SETUPPerf.EXE, found at:

http://www.microsoft.com/downloads/details.aspx?familyid=cebf3c7c-7ca5-408f-88b7-f9c79b7306c0&displaylang=en

For Linux hosts, another option is to run the tool sysreport.

- VMware ESX Server

Run the following script on the service console:

/usr/bin/vm-support

This script collects all relevant ESX Server system and configuration information, as well as ESX Server log files.

In most cases, it is also important to collect the multipathing driver used on the host system. Again, based on the host system, the multipathing drivers might be different.

If this is an IBM Subsystem Device Driver (SDD), SDDDSM, or SDDPCM host, use datapath query device or pcmpath query device to check the host multipathing. Ensure that there are paths to both the preferred and non-preferred SVC nodes. For more information, refer to Chapter 9, “Hosts” on page 175.

Check that paths are open for both preferred paths (with select counts in high numbers) and non-preferred paths (the * or nearly zero select counts). In Example 14-2, path 0 and path 2 are the preferred paths with a high select count. Path 1 and path 3 are the non-preferred paths, which show an asterisk (*) and 0 select counts.

Example 14-2 Checking paths

C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#      Adapter/Hard Disk           State   Mode      Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL   1752399       0
    1 * Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL         0       0
    2   Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL   1752371       0
    3 * Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL         0       0

Multipathing driver data (SDD)

IBM Subsystem Device Driver (SDD) has been enhanced to collect SDD trace data periodically and to write the trace data to the system’s local hard drive. You collect the data by running the sddgetdata command. If this command is not found, collect the following four files, where SDD maintains its trace data:

- sdd.log
- sdd_bak.log
- sddsrv.log
- sddsrv_bak.log

These files can be found in one of the following directories:

- AIX: /var/adm/ras
- Hewlett-Packard UNIX: /var/adm
- Linux: /var/log
- Solaris: /var/adm
- Windows 2000 Server and Windows NT Server: \WINNT\system32
- Windows Server 2003: \Windows\system32

SDDPCM

SDDPCM has been enhanced to collect SDDPCM trace data periodically and to write the trace data to the system’s local hard drive. SDDPCM maintains four files for its trace data:

- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log

Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by running sddpcmgetdata. The sddpcmgetdata script collects information that is used for problem determination and then creates a tar file at the current directory with the current date and time as a part of the file name, for example:

sddpcmdata_hostname_yyyymmdd_hhmmss.tar

When you report an SDDPCM problem, it is essential that you run this script and send this tar file to IBM Support for problem determination. Refer to Example 14-3.

Example 14-3 Use of the sddpcmgetdata script (output shortened for clarity)

>sddpcmgetdata
>ls
sddpcmdata_confucius_20080814_012513.tar

If the sddpcmgetdata command is not found, collect the following files:

- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
- The output of the pcmpath query adapter command
- The output of the pcmpath query device command

You can find these files in the /var/adm/ras directory.

SDDDSM

SDDDSM also provides the sddgetdata script to collect information to use for problem determination. SDDGETDATA.BAT is the batch file that generates the following files:

- The sddgetdata_%host%_%date%_%time%.cab file
- SDD\SDDSrv logs
- Datapath output
- Event logs
- Cluster log
- SDD specific registry entry
- HBA information

Example 14-4 shows an example of this script.

Example 14-4 Use of the sddgetdata script for SDDDSM (output shortened for clarity)

C:\Program Files\IBM\SDDDSM>sddgetdata.bat
Collecting SDD trace Data
Collecting datapath command outputs
Collecting SDD and SDDSrv logs
Collecting Most current driver trace
Generating a CAB file for all the Logs
sdddata_DIOMEDE_20080814_42211.cab file generated

C:\Program Files\IBM\SDDDSM>dir
 Volume in drive C has no label.
 Volume Serial Number is 0445-53F4

 Directory of C:\Program Files\IBM\SDDDSM

06/29/2008  04:22 AM           574,130 sdddata_DIOMEDE_20080814_42211.cab

Data collection script for IBM AIX

In Example 14-5, we provide a script that collects all of the necessary data for an AIX host at one time (both operating system and multipathing data). Execute the script by using these steps:

1. vi /tmp/datacollect.sh

2. Cut and paste the script into the /tmp/datacollect.sh file and save the file.

3. chmod 755 /tmp/datacollect.sh

4. /tmp/datacollect.sh

Example 14-5 Data collection script

#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r              # Clean up old snaps
snap -gGfkLN                    # Collect new; don't package yet
cd /tmp/ibmsupt/other           # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c                         # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0

14.2.2 SVC data collection

You can collect data for the SVC either by using the SVC Console GUI or by using the SVC CLI. In the following sections, we describe how to collect SVC data using the SVC CLI, which is the easiest method.

Data collection for SVC code Version 4.x

Because the config node is always the SVC node with which you communicate, it is essential that you copy all the data from the other nodes to the config node. In order to copy the files, first run the command svcinfo lsnode to determine the non-config nodes.

The output of this command is shown in Example 14-6.

Example 14-6 Determine the non-config nodes (output shortened for clarity)

IBM_2145:itsosvccl1:admin>svcinfo lsnode
id  name   WWNN              status  IO_group_id  config_node
1   node1  50050768010037E5  online  0            no
2   node2  50050768010037DC  online  0            yes

Example 14-6 shows that the node with ID 2 is the config node. Therefore, for every node except the config node, you must run the command svctask cpdumps.

There is no feedback given for this command. Example 14-7 shows the command for the node with ID 1.

Example 14-7 Copy the dump files from the other nodes

IBM_2145:itsosvccl1:admin>svctask cpdumps -prefix /dumps 1
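If the cluster contains several non-config nodes, you can issue the command for each of them from a management workstation by running the CLI over SSH. The following is only a minimal sketch; the cluster address and the node IDs are placeholders that you must replace with your own values, and key-based SSH access to the admin user is assumed to be configured already:

#!/bin/ksh
CLUSTER=svccluster.example.com        # placeholder: SVC cluster IP address or name
for node in 1 3                       # placeholder: IDs of the non-config nodes
do
    ssh admin@$CLUSTER "svctask cpdumps -prefix /dumps $node"
done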

To collect all the files, including the config.backup file, trace file, errorlog file, and more, you need to run the svc_snap dumpall command. This command collects all of the data, including the dump files. To ensure that there is a current backup of the SVC cluster configuration, run a svcconfig backup before issuing the svc_snap dumpall command. Refer to Example 14-8 for an example run.

It is sometimes better to use svc_snap without the dumpall parameter, which captures the data collection apart from the dump files, and then to request only the individual dump files that you need.

Example 14-8 The svc_snap dumpall command

IBM_2145:itsosvccl1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...
Dumping error log...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Creating snap package...
Snap data collected in /dumps/snap.104603.080815.160321.tgz

After the data collection with the svc_snap dumpall command is complete, you can verify that the new snap file appears in your 2145 dumps directory by using the svcinfo ls2145dumps command. Refer to Example 14-9 on page 279.

Note: Dump files are extremely large. Only request them if you really need them.

Example 14-9 The svcinfo ls2145dumps command (output shortened for clarity)

IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps
id  2145_filename
0   dump.104603.080801.161333
1   svc.config.cron.bak_node2
..
23  104603.trc
24  snap.104603.080815.160321.tgz

To copy the file from the SVC cluster, use secure copy (SCP). The PuTTY SCP function is described in more detail in IBM System Storage SAN Volume Controller V4.3, SG24-6423.
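For example, from a Windows master console with the PuTTY command-line tools installed, copying the snap file might look like the following one-line sketch (the private key file, cluster address, and target directory are placeholders):

pscp -i privatekey.ppk admin@svccluster.example.com:/dumps/snap.104603.080815.160321.tgz C:\temp\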

14.2.3 SAN data collection

In this section, we describe capturing and collecting the switch support data. If there are problems that cannot be fixed by a simple maintenance task, such as exchanging hardware, IBM Support will ask you to collect the SAN data.

We list how to collect the switch support data for Brocade, McDATA, and Cisco SAN switches.

IBM System Storage/Brocade SAN switches

For most of the current Brocade switches, you need to issue the supportSave command to collect the support data.

Example 14-10 shows the use of the supportSave command (interactive mode) on an IBM System Storage SAN32B-3 (type 2005-B5K) SAN switch running Fabric OS v6.1.0c.

Example 14-10 The supportSave output from IBM SAN32B-3 switch (output shortened for clarity)

IBM_2005_B5K_1:admin> supportSave
This command will collect RASLOG, TRACE, supportShow, core file, FFDC data
and other support information and then transfer them to a FTP/SCP server
or a USB device. This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transfered after this command.
OK to proceed? (yes, y, no, n): [no] y

Host IP or Host Name: 9.43.86.133
User Name: fos
Password:
Protocol (ftp or scp): ftp
Remote Directory: /

Saving support information for switch:IBM_2005_B5K_1, module:CONSOLE0...
..._files/IBM_2005_B5K_1-S0-200808132042-CONSOLE0.gz: 5.77 kB 156.68 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:RASLOG...
...files/IBM_2005_B5K_1-S0-200808132042-RASLOG.ss.gz: 38.79 kB 0.99 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_OLD...
...M_2005_B5K_1-S0-200808132042-old-tracedump.dmp.gz: 239.58 kB 3.66 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_NEW...
...M_2005_B5K_1-S0-200808132042-new-tracedump.dmp.gz: 1.04 MB 1.81 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:ZONE_LOG...
...les/IBM_2005_B5K_1-S0-200808132042-ZONE_LOG.ss.gz: 51.84 kB 1.65 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:RCS_LOG...
..._files/IBM_2005_B5K_1-S0-200808132044-CONSOLE1.gz: 5.77 kB 175.18 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:SSAVELOG...
..._files/IBM_2005_B5K_1-S0-200808132044-sslog.ss.gz: 1.87 kB 55.14 kB/s
SupportSave completed
IBM_2005_B5K_1:admin>

Information: If there is no dump file available on the SVC cluster or for a particular SVC node, you need to contact your next level of IBM Support. The support personnel will guide you through the procedure to take a new dump.

IBM System Storage/Cisco SAN switches

Establish a terminal connection to the switch (Telnet, SSH, or serial) and collect the output from the following commands:

� terminal length 0

� show tech-support detail

� terminal length 24
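One simple way to capture this output from a UNIX or Linux administration workstation is to log the whole SSH session with the script command. The following session is only a sketch, and the switch address is a placeholder:

script /tmp/showtech_mds_$(date +%Y%m%d).txt
ssh admin@mds-switch.example.com
terminal length 0
show tech-support detail
terminal length 24
exit
exit

The first exit leaves the switch CLI and the second exit stops the script logging; the resulting text file contains the complete show tech-support detail output and can be sent to IBM Support.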

IBM System Storage/McDATA SAN switches

Enterprise Fabric Connectivity Manager (EFCM) is the preferred way of collecting data for McDATA switches.

For EFCM 8.7 and higher levels (without the group manager license), select the switch for which you want to collect data, right-click it, and launch the switch local Element Manager. Refer to Figure 14-1.

On the Element Manager panel, choose Maintenance → Data collection → Extended, and save the compressed file on the local disk. Name the compressed file to reflect your problem ticket number before uploading the file to IBM Support.

Figure 14-1 Data collection for McDATA using Element Manager

If you have the group manager license for EFCM, you can collect data from multiple switches in one run. Refer to Figure 14-2 on page 281.

Figure 14-2 Selecting Group Manager from EFCM

To collect data when you are in the EFCM Group Manager, select Run Data Collection as the Group Action (Figure 14-3). From this point, a wizard will guide you through the data collection process. Name the generated zipped file to reflect your problem ticket number before uploading the file to IBM Support.

Figure 14-3 Selecting the data collection action in EFCM Group Manager

14.2.4 Storage subsystem data collection

How you collect the data depends on the storage subsystem model. We only show how to collect the support data for IBM System Storage storage subsystems.

IBM System Storage DS4000 series

With Storage Manager levels higher than 9.1, there is a feature called Collect All Support Data. To collect the information, open the Storage Manager and select Advanced → Troubleshooting → Collect All Support Data as shown in Figure 14-4 on page 282.

Figure 14-4 DS4000 data collection

IBM System Storage DS8000 and DS6000 series

By issuing the following series of commands, you get an overview of the current configuration of an IBM System Storage DS8000 or DS6000:

� lssi

� lsarray -l

� lsrank

� lsvolgrp

� lsfbvol

� lsioport -l

� lshostconnect
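The DS CLI can run such a list of commands in batch mode, which makes it easy to save the output to a file. As a sketch, put the commands in a plain text file (for example, ds_overview.cli), one command per line, and then invoke the DS CLI in script mode; the profile path and the file names here are placeholders:

dscli -cfg /opt/ibm/dscli/profile/dscli.profile -script ds_overview.cli > ds_overview.out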

The complete data collection is normally performed by the IBM service support representative (IBM SSR) or the IBM Support center. The IBM product engineering (PE) package includes all current configuration data as well as diagnostic data.

14.3 Recovering from problems

In this section, we provide guidance about how to recover from several of the more common problems that you might encounter. We also show example problems and how to fix them. In all cases, it is essential to read and understand the current product limitations to verify the configuration and to determine if you need to upgrade any components or to install the latest fixes or “patches.”

To obtain support for IBM products, visit the major IBM Support Web page on the Internet:

http://www.ibm.com/support/us/en/

From this IBM Support Web page, you can obtain various types of support by following the links that are provided on this page.

To review the SVC Web page for the latest flashes, the concurrent code upgrades, code levels, and matrixes, go to:

http://www.ibm.com/storage/support/2145/

14.3.1 Solving host problems

Apart from hardware-related problems, there can be problems in areas such as the operating system or the software used on the host. These problems are normally handled by the host administrator or the service provider of the host system.

However, the multipathing driver installed on the host and its features can help you determine possible problems. In Example 14-11, we show two faulty paths reported by the IBM Subsystem Device Driver (SDD) on the host, obtained by using the datapath query device -l command. The faulty paths are the paths in the “close” state. Faulty paths can be caused by both hardware and software problems.

Hardware problems, such as:

� Faulty small form-factor pluggable transceiver (SFP) in the host or the SAN switch
� Faulty fiber optic cables
� Faulty Host Bus Adapters (HBA)

Software problems, such as:

� A back level multipathing driver
� A back level HBA firmware
� Failures in the zoning
� The wrong host to VDisk mapping

Example 14-11 SDD output on a host with faulty paths

C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   3  DEVICE NAME: Disk4 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#              Adapter/Hard Disk          State     Mode      Select  Errors
    0   Scsi Port2 Bus0/Disk4 Part0           CLOSE     OFFLINE   218297       0
    1 * Scsi Port2 Bus0/Disk4 Part0           CLOSE     OFFLINE        0       0
    2   Scsi Port3 Bus0/Disk4 Part0           OPEN      NORMAL    222394       0
    3 * Scsi Port3 Bus0/Disk4 Part0           OPEN      NORMAL         0       0

Based on our field experience, we recommend that you check the hardware first:

� Check if any connection error indicators are lit on the host or SAN switch.

� Check if all of the parts are seated correctly (cables securely plugged in the SFPs, and the SFPs plugged all the way into the switch port sockets).

� Ensure that there are no broken fiber optic cables (if possible, swap the cables to cables that are known to work).

After the hardware check, continue to check the software setup:

� Check that the HBA driver level and firmware level are at the recommended and supported levels.

� Check the multipathing driver level, and make sure that it is at the recommended and supported level.

� Check for link layer errors reported by the host or the SAN switch, which can indicate a cabling or SFP failure.

� Verify your SAN zoning configuration.

� Check the general SAN switch status and health for all switches in the fabric.
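On an AIX host, for example, a few commands give a quick view of the multipathing driver level, HBA firmware, and link error counters. The following is only a minimal sketch; fcs0 is an example adapter name, and the fileset pattern depends on which multipathing driver is installed:

lslpp -l "devices.sddpcm*"                  # multipathing driver fileset level (SDDPCM shown)
lscfg -vpl fcs0 | grep -E "Z9|ZA"           # HBA firmware levels in the vital product data
fcstat fcs0 | grep -i -E "failure|loss"     # link failure and loss-of-sync/signal counters
datapath query adapter                      # or pcmpath query adapter, for adapter path states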

In Example 14-12, we discovered that one of the HBAs was experiencing a link failure due to a fiber optic cable that had been bent too far. After we changed the cable, the missing paths reappeared.

Example 14-12 Output from datapath query device command after fiber optic cable change

C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   3  DEVICE NAME: Disk4 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#              Adapter/Hard Disk          State     Mode      Select  Errors
    0   Scsi Port3 Bus0/Disk4 Part0           OPEN      NORMAL    218457       1
    1 * Scsi Port3 Bus0/Disk4 Part0           OPEN      NORMAL         0       0
    2   Scsi Port2 Bus0/Disk4 Part0           OPEN      NORMAL    222394       0
    3 * Scsi Port2 Bus0/Disk4 Part0           OPEN      NORMAL         0       0

14.3.2 Solving SVC problems

For any problem in an environment implementing the SVC, we advise you to use the “Run Maintenance Procedure” function in the SVC Console GUI, as shown in Figure 14-5 on page 285, before trying to fix the problem anywhere else.

The maintenance procedure checks the error condition, and if it was a temporary failure, it marks this problem as fixed; otherwise, the problem persists. In this case, the SVC will guide you through several verification steps to help you isolate the problem area.

The SVC error log provides you with information, such as all of the events on the SVC, all of the error messages, and SVC warning information. Although you can mark the error as fixed in the error log, we recommend that you always use the “Run Maintenance Procedure” function as shown in Figure 14-5 on page 285.

The SVC error log has a feature called Sense Expert as shown in Figure 14-5 on page 285.

Figure 14-5 Error with Sense Expert available

When you click Sense Expert, the sense data is translated into data that is more clearly explained and more easily understood, as shown in Figure 14-6 on page 286.

Figure 14-6 Sense Expert output

Another common practice is to use the SVC CLI to find problems. The following list of commands provides you with information about the status of your environment:

� svctask detectmdisk (discovers any changes in the back-end storage configuration)

� svcinfo lscluster clustername (checks the SVC cluster status)

� svcinfo lsnode nodeid (checks the SVC nodes and port status)

� svcinfo lscontroller controllerid (checks the back-end storage status)

� svcinfo lsmdisk (provides a status of all the MDisks)

� svcinfo lsmdisk mdiskid (checks the status of a single MDisk)

� svcinfo lsmdiskgrp (provides a status of all the MDisk groups)

� svcinfo lsmdiskgrp mdiskgrpid (checks the status of a single MDisk group)

� svcinfo lsvdisk (checks if VDisks are online)
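You can run this set of commands in one pass over SSH and keep the output as a point-in-time status snapshot of the environment. The following is only a minimal sketch from a management workstation; the cluster address is a placeholder, and key-based SSH access to the admin user is assumed:

#!/bin/ksh
CLUSTER=svccluster.example.com                 # placeholder: SVC cluster IP address or name
OUT=/tmp/svc_status_$(date +%Y%m%d_%H%M).txt
for cmd in lscluster lsnode lscontroller lsmdisk lsmdiskgrp lsvdisk
do
    echo "=== svcinfo $cmd ===" >> $OUT
    ssh admin@$CLUSTER "svcinfo $cmd" >> $OUT
done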

Important: Although the SVC raises error messages, most problems are not caused by the SVC. Most problems are introduced by the storage subsystems or the SAN.

If the problem is caused by the SVC and you are unable to fix it either with the “Run Maintenance Procedure” function or with the error log, you need to collect the SVC debug data as explained in 14.2.2, “SVC data collection” on page 277.

If the problem is related to anything outside of the SVC, refer to the appropriate section in this chapter to try to find and fix the problem.

Cluster upgrade checks

There are a number of prerequisite checks to perform to confirm readiness prior to performing an SVC cluster code load:

� Check the back-end storage configurations for SCSI ID to LUN ID mappings. Normally, a 1625 error is detected if there is a problem, but it is also worthwhile to check these mappings manually.

Specifically, you need to make sure that the SCSI ID to LUN ID mapping is the same for each SVC node port.

You can use these commands on the Enterprise Storage Server (ESS) to pull the data out to check ESS mapping:

esscli list port -d "ess=<ESS name>"
esscli list hostconnection -d "ess=<ESS name>"
esscli list volumeaccess -d "ess=<ESS name>"

Then, verify that the mapping is identical.

Use the following commands for an IBM System Storage DS8000 series storage subsystem to check the SCSI ID to LUN ID mappings:

lsioport -l
lshostconnect -l
showvolgrp -lunmap <volume group>
lsfbvol -l -vol <SVC volume groups>

LUN mapping problems are unlikely on a DS8000-based storage subsystem because of the way that volume groups are allocated; however, it is still worthwhile to verify the configuration just prior to upgrades.

For the IBM System Storage DS4000 series, we also recommend that you verify that each SVC node port has an identical LUN mapping.

From the DS4000 Storage Manager, you can use the Mappings View to verify the mapping. You can also run the data collection for the DS4000 and use the subsystem profile to check the mapping.

� For storage subsystems from other vendors, use the corresponding steps to verify the correct mapping.

� Check the host multipathing to ensure path redundancy.

� Use the svcinfo lsmdisk and svcinfo lscontroller commands to check the SVC cluster to ensure the path redundancy to any back-end storage controllers.

� Use the “Run Maintenance Procedure” function or “Analyze Error Log” function in the SVC Console GUI to investigate any unfixed or investigated SVC errors.

� Download and execute the SAN Volume Controller Software Upgrade Test Utility:

http://www-1.ibm.com/support/docview.wss?uid=ssg1S4000585

� Review the latest flashes, hints, and tips prior to the cluster upgrade. There will be a list of directly applicable flashes, hints, and tips on the SVC code download page. Also, review the latest support flashes on the SVC support page.

14.3.3 Solving SAN problems

A variety of situations can cause problems in the SAN and on the SAN switches. Problems can be related to either a hardware fault or to a software problem on the switch. Hardware defects are normally the easiest problems to find. Here is a short list of possible hardware failures:

� Switch power, fan, or cooling units
� Application-specific integrated circuit (ASIC)
� Installed SFP modules
� Fiber optic cables

Software failures are more difficult to analyze; in most cases, you need to collect data and involve IBM Support. Before taking any other step, we recommend that you check the installed code level for known problems and check whether a newer code level is available that resolves the problem that you are experiencing.

The most common SAN problems are usually related to zoning, for example, choosing the wrong WWPN for a host zone. A host zone must contain two SVC node ports, one from each SVC node of the I/O group, but the zone in Example 14-13 contains two ports that belong to the same node. The result is that the host and its multipathing driver will not see all of the necessary paths.

Example 14-13 Wrong WWPN zoning

zone: Senegal_Win2k3_itsosvccl1_iogrp0_Zone
                50:05:07:68:01:20:37:dc
                50:05:07:68:01:40:37:dc
                20:00:00:e0:8b:89:cc:c2

The correct zoning must look like the zoning shown in Example 14-14.

Example 14-14 Correct WWPN zoning

zone: Senegal_Win2k3_itsosvccl1_iogrp0_Zone
                50:05:07:68:01:40:37:e5
                50:05:07:68:01:40:37:dc
                20:00:00:e0:8b:89:cc:c2
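On an IBM System Storage/Brocade b-type switch, you can quickly confirm what the zone actually contains and whether the SVC ports are logged in to the fabric. The commands in the following sketch are standard Fabric OS commands, although the exact output depends on the code level:

IBM_2005_B5K_1:admin> zoneshow "Senegal_Win2k3_itsosvccl1_iogrp0_Zone"
IBM_2005_B5K_1:admin> cfgactvshow
IBM_2005_B5K_1:admin> nodefind 50:05:07:68:01:40:37:e5

The zoneshow output lists the zone members, cfgactvshow displays the currently active zone configuration, and nodefind confirms that a given WWPN is logged in to the fabric.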

The following SVC error codes are related to the SAN environment:

� Error 1060: Fibre Channel ports are not operational.
� Error 1220: A remote port is excluded.

If you are unable to fix the problem with these actions, follow 14.2.3, “SAN data collection” on page 279 to collect the SAN switch debugging data, and then contact IBM Support.

14.3.4 Solving back-end storage problems

SVC is a great tool to find and analyze back-end storage subsystem problems, because the SVC has a monitoring and logging mechanism.

However, the SVC is not as helpful in finding problems from a host perspective, because the SVC is a SCSI target for the host, and the SCSI protocol defines that errors are reported via the host.

Typical problems for storage subsystem controllers include incorrect configuration, which results in a 1625 error code. Other problems related to the storage subsystem are failures pointing to the managed disk I/O (error code 1310), disk media (error code 1320), and error recovery procedure (error code 1370).

However, not all messages have a single explicit cause. Therefore, you have to check multiple areas and not just the storage subsystem. Next, we explain how to determine the root cause of the problem and in what order to start checking:

1. Run the maintenance procedures under SVC.

2. Check the attached storage subsystem for misconfigurations or failures.

3. Check the SAN for switch problems or zoning failures.

4. Collect all support data and involve IBM Support.

Now, we look at these steps sequentially:

1. Run the maintenance procedures under SVC.

To run the SVC Maintenance Procedures, open the SVC Console GUI. Select Service and Maintenance → Run Maintenance Procedures. On the Maintenance Procedures panel that appears in the right pane, click Start Analysis (Figure 14-7).

Figure 14-7 Start Analysis from the SVC Console GUI

For more information about how to use the SVC Maintenance Procedures, refer to IBM System Storage SAN Volume Controller V4.3, SG24-6423-06, or the SVC Service Guide, S7002158.

2. Check the attached storage subsystem for misconfigurations or failures:

a. Independent of the type of storage subsystem, the first thing for you to check is whether there are any open problems on the system. Use the service or maintenance features provided with the storage subsystem to fix these problems.

b. Then, check if the LUN masking is correct. When attached to the SVC, you have to make sure that the LUN masking maps to the active zone set on the switch. Create a similar LUN mask for each storage subsystem controller port that is zoned to the SVC. Also, observe the SVC restrictions for back-end storage subsystems, which can be found at:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283

c. Next, we show an example of a misconfigured storage subsystem, and how this misconfigured storage system will appear from the SVC’s point of view. Furthermore, we explain how to fix the problem.

By running the svcinfo lscontroller ID command, you will get output similar to the output that is shown in Example 14-15 on page 290. As highlighted in the example, the MDisks, and therefore the LUNs, are not equally allocated. In our example, the LUNs provided by the storage subsystem are visible through only one path, that is, one storage subsystem WWPN.

Example 14-15 The svcinfo lscontroller command

IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 8
max_path_count 12
WWPN 200500A0B8174433
path_count 0
max_path_count 8

This imbalance has two possible causes:

• If the back-end storage subsystem implements a preferred controller design, perhaps the LUNs are all allocated to the same controller. This situation is likely with the IBM System Storage DS4000 series, and you can fix it by redistributing the LUNs evenly across the DS4000 controllers and then rediscovering the LUNs on the SVC. Because we used a DS4500 storage subsystem (type 1742) in Example 14-15, we need to check for this situation.

• Another possible cause is that the WWPN with zero count is not visible to all the SVC nodes via the SAN zoning or the LUN masking on the storage subsystem. Use the SVC CLI command svcinfo lsfabric 0 to confirm.

If you are unsure which of the attached MDisks has which corresponding LUN ID, use the SVC CLI command svcinfo lsmdisk (refer to Example 14-16). This command also shows to which storage subsystem a specific MDisk belongs (the controller ID).

Example 14-16 Determine the ID for the MDisk

IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
id  name    status  mode     mdisk_grp_id  mdisk_grp_name  capacity  ctrl_LUN_#        controller_name  UID
0   mdisk0  online  managed  0             MDG-1           600.0GB   0000000000000000  controller0      600a0b800017423300000059469cf84500000000000000000000000000000000
2   mdisk2  online  managed  0             MDG-1           70.9GB    0000000000000002  controller0      600a0b800017443100000096469cf0e800000000000000000000000000000000

The problem turned out to be with the LUN allocation across the DS4500 controllers. After fixing this allocation on the DS4500, an SVC MDisk rediscovery fixed the problem from the SVC’s point of view. Example 14-17 on page 291 shows an equally distributed MDisk.

Example 14-17 Equally distributed MDisk on all available paths

IBM_2145:itsosvccl1:admin>svctask detectmdisk

IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8

d. In our example, the problem was solved by changing the LUN allocation. If step 2 did not solve the problem, you need to continue with step 3.

3. Check the SANs for switch problems or zoning failures.

Many situations can cause problems in the SAN. Refer to 14.2.3, “SAN data collection” on page 279 for more information.

4. Collect all support data and involve IBM Support.

Collect the support data for the involved SAN, SVC, or storage systems as described in 14.2, “Collecting data and isolating the problem” on page 274.

Common error recovery steps using the SVC CLI

In this section, we describe how to use the SVC CLI to perform common error recovery steps for back-end SAN problems or storage problems.

The maintenance procedures perform these steps, but it is sometimes quicker to run these commands directly via the CLI. Run these commands anytime that you have:

� Experienced a back-end storage issue (for example, error code 1370 or error code 1630)

� Performed maintenance on the back-end storage subsystems

It is especially important to run these commands when there is a back-end storage configuration or zoning change to ensure that the SVC follows the changes.

The SVC CLI commands for common error recovery are:

� The svctask detectmdisk command (discovers the changes in the back end)

� The svcinfo lscontroller command and the svcinfo lsmdisk command (give you overall status of all of the controllers and MDisks)

� The svcinfo lscontroller controllerid command (checks the controller that was causing the problems and verifies that all the WWPNs are listed as you expect)

� svctask includemdisk mdiskid (for each degraded or offline MDisk)

� The svcinfo lsmdisk command (Are all MDisks online now?)

� The svcinfo lscontroller controllerid command (checks that the path_counts are distributed somewhat evenly across the WWPNs)

Finally, run the maintenance procedures on the SVC to fix every error.
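Putting these commands together, a typical recovery sequence after back-end maintenance might look like the following sketch (MDisk ID 5 is only an example; substitute the degraded or offline MDisks reported on your cluster):

IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue status=degraded
IBM_2145:itsosvccl1:admin>svctask includemdisk 5
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0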

14.4 Livedump

SVC livedump is a procedure that IBM Support might ask your clients to run for problem investigation.

Sometimes, investigations require a livedump from the configuration node in the SVC cluster. A livedump is a lightweight dump from a node, which can be taken without impacting host I/O. The only impact is a slight reduction in system performance (due to reduced memory being available for the I/O cache) until the dump is finished. The instructions for a livedump are:

1. Prepare the node for taking a livedump: svctask preplivedump <node id/name>

This command will reserve the necessary system resources to take a livedump. The operation can take some time, because the node might have to flush data from the cache. System performance might be slightly affected after running this command, because part of the memory, which normally is available to the cache, is not available while the node is prepared for a livedump.

After the command has completed, the livedump is ready to be triggered, which you can see by looking at the output from svcinfo lslivedump <node id/name>.

The status must be reported as “prepared.”

2. Trigger the livedump: svctask triggerlivedump <node id/name>

This command completes as soon as the data capture is complete, but before the dump file has been written to disk.

3. Query the status and copy the dump off when complete:

svcinfo lslivedump <nodeid/name>

The status shows “dumping” while the file is being written to disk and “inactive” after it is completed. After the status returns to the inactive state, you can find the livedump file in /dumps on the node with a filename of the format:

livedump.<panel_id>.<date>.<time>

You can then copy this file off the node, just as you copy a normal dump, by using the GUI or SCP.
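Putting the three steps together, a livedump capture of a hypothetical node named node1 might look like the following sketch (the status output is shortened here, and the exact lslivedump output format depends on the code level):

IBM_2145:itsosvccl1:admin>svctask preplivedump node1
IBM_2145:itsosvccl1:admin>svcinfo lslivedump node1
prepared
IBM_2145:itsosvccl1:admin>svctask triggerlivedump node1
IBM_2145:itsosvccl1:admin>svcinfo lslivedump node1
dumping
IBM_2145:itsosvccl1:admin>svcinfo lslivedump node1
inactive
IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps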

The dump must then be uploaded to IBM Support for analysis.

Note: Only invoke the SVC livedump procedure under the direction of IBM Support.

Chapter 15. SVC 4.3 performance highlights

In this chapter, we discuss the performance improvements that have been made with the 4.3.0 release of the SAN Volume Controller (SVC) code and the advantage of upgrading to the latest 8G4 node hardware. We also discuss how to optimize your system to gain the maximum benefit from the improvements that are not discussed elsewhere in this book. We look in detail at:

� Improvements between SVC 4.2 and SVC 4.3

� Benefits of the latest 8G4 nodes

� Caching and striping capabilities

� Sequential scaling of additional nodes

15.1 SVC and continual performance enhancements

Since the introduction of the SVC in May 2003, IBM has continually increased its performance capabilities to meet increasing client demands. The SVC architecture brought together, for the first time, the full range of capabilities needed by storage administrators to regain control of SAN complexity, while also meeting aggressive goals for storage reliability and performance. On 29 October 2004, SVC Release 1.2.1 increased the potential for storage consolidation by doubling the maximum number of supported SVC nodes from four to eight.

There is also a performance white paper available to IBM employees at this Web site:

http://tinyurl.com/2el4ar

Contact your IBM marketing representative for details about getting this white paper.

The release of Version 2 of the SVC code included performance improvements that increased the online transaction processing (OLTP) performance. With the release of SVC 3.1, not only were there continued code improvements but a new release of hardware: the 8F2 node with a doubling of cache and improved processor and internal bus speeds. The 8F4 node included support for 4 Gbps SANs and an increase of performance.

SVC 4.2, and the new 8G4 node brought a dramatic increase in performance as demonstrated by the results in the Storage Performance Council (SPC) Benchmarks: SPC-1 and SPC-2.

The benchmark number 272,505.19 SPC-1 IOPS is the industry-leading OLTP result and the PDF is available here:

http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf

The throughput benchmark, 7,084.44 SPC-2 MBPS, is the industry-leading throughput benchmark, and the PDF is available here:

http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf

The performance improvement over time can be seen in Figure 15-1 on page 295 for OLTP.

Figure 15-1 SPC-1 Benchmark over time

In Figure 15-2 on page 296, we show the improvement for throughput. Because the SPC-2 benchmark was only introduced in 2006, this graph is of necessity over a shorter time span.

Figure 15-2 SPC-2 benchmark over time

15.2 SVC 4.3 code improvements

SVC code upgrades generally include a range of minor performance improvements. The following larger changes have been made since SVC 4.2.0:

� SVC 4.2.1 improved the ability of the cache to adapt to performance differences between back-end storage controllers. If one Managed Disk Group (MDG) has poor performance, for example, because a cache battery has failed, the amount of write cache that it is allowed to use will be limited, which means that VDisks hosted by other MDGs will continue to benefit from SVC’s write caching despite the broken storage controller.

� SVC 4.3.0 tunes inter-node communication over the SAN, which can improve performance for workloads consisting of many small I/Os.

While these changes will improve performance in certain circumstances, upgrading from older node hardware to the latest 8G4 level will have a much greater effect. The following test results demonstrate the kind of improvement that can be expected.

15.3 Performance increase when upgrading to 8G4 nodes

Figure 15-3 on page 297 uses a variety of workloads to examine the performance gains achieved by upgrading the software on an 8F4 node to SVC 4.2. These gains are compared with those gains that result from a complete hardware and software replacement based upon 8G4 node technology.

Figure 15-3 Comparison of a software only upgrade to a full upgrade of an 8F4 node (variety of workloads, I/O rate times 1000)

As you can see in Figure 15-3, significant gains can be achieved with the software-only upgrade. The 70/30 miss workload, consisting of 70 percent read misses and 30 percent write misses, is of special interest. This workload contains a mix of both reads and writes, which we ordinarily expect to see under production conditions.

Figure 15-4 on page 298 presents another view of the effect of moving to the latest level of software and hardware.

Figure 15-4 Two node SVC cluster with random 4 KB throughput

Figure 15-5 presents a more detailed view of performance on this specific workload, showing that the SVC 4.2 software-only upgrade boosts the maximum throughput for the 70/30 workload by more than 30%. Thus, a significant portion of the overall throughput gain achieved with full hardware and software replacement comes from the software enhancements.

Figure 15-5 Comparison of a software only upgrade to a full upgrade of an 8F4 node 70/30 miss workload

(Chart: 2 Node - 70/30 4K Random Miss — Response Time (ms) versus Throughput (IO/s) for the 4.1.0 8F4, 4.2.0 8F4, and 4.2.0 8G4 configurations.)

15.3.1 Performance scaling of I/O Groups

We turn now to a discussion of the SVC’s capability to scale up to extremely high levels of I/O demand. This section focuses on an online transaction processing (OLTP) workload, typical of a database’s I/O demands; the following section then examines SVC scalability for sequential demands. Figure 15-6 shows the SPC-1 type performance delivered by two, four, six, or eight SVC nodes. The OLTP workload is handled by 1,536 15K RPM disks configured as Redundant Array of Independent Disks 10 (RAID 10). The host connectivity was through 32 Fibre Channel ports.

Figure 15-6 OLTP workload performance with two, four, six, or eight nodes

Figure 15-7 on page 300 presents the database scalability results at a higher level by pulling together the maximum throughputs (observed at a response time of 30 milliseconds or less) for each configuration. The latter figure shows that SVC performance scales nearly linearly with the number of nodes.

Figure 15-7 OLTP workload scalability

As Figure 15-6 on page 299 and Figure 15-7 show, the tested SVC configuration is capable of delivering over 270,000 I/Os per second (IOPS) for the OLTP workload. You are encouraged to compare this result against any other disk storage product currently posted on the SPC Web site at:

http://www.storageperformance.org

Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.

IBM Redbooks publications

For information about ordering these publications, refer to “How to get IBM Redbooks publications” on page 303. Note that several of the documents referenced here might be available in softcopy only:

� IBM System Storage SAN Volume Controller, SG24-6423-06

� Get More Out of Your SAN with IBM Tivoli Storage Manager, SG24-6687

� IBM Tivoli Storage Area Network Manager: A Practical Introduction, SG24-6848

� IBM System Storage: Implementing an IBM SAN, SG24-6116

� DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02

Other resources

These publications are also relevant as further information sources:

� IBM System Storage Open Software Family SAN Volume Controller: Planning Guide, GA22-1052

� IBM System Storage Master Console: Installation and User’s Guide, GC30-4090

� IBM System Storage Open Software Family SAN Volume Controller: Installation Guide, SC26-7541

� IBM System Storage Open Software Family SAN Volume Controller: Service Guide, SC26-7542

� IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide, SC26-7543

� IBM System Storage Open Software Family SAN Volume Controller: Command-Line Interface User's Guide, SC26-7544

� IBM System Storage Open Software Family SAN Volume Controller: CIM Agent Developers Reference, SC26-7545

� IBM TotalStorage Multipath Subsystem Device Driver User’s Guide, SC30-4096

� IBM System Storage Open Software Family SAN Volume Controller: Host Attachment Guide, SC26-7563

� IBM System Storage SAN Volume Controller V4.3, SG24-6423-06

� Implementing the SVC in an OEM Environment, SG24-7275

� IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194

� TPC Version 3.3 Update Guide, SG24-7490

� Implementing an IBM/Brocade SAN, SG24-6116

� Implementing an IBM/Cisco SAN, SG24-7545

� IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation, SG24-7544

� IBM System Storage/Cisco Multiprotocol Routing: An Introduction and Implementation, SG24-7543

� Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is available at:

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en

� TotalStorage Productivity Center User Guide, which is located at:

http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcugd31389.htm

Referenced Web sites

These Web sites are also relevant as further information sources:

� IBM TotalStorage home page:

http://www.storage.ibm.com

� SAN Volume Controller supported platform:

http://www-1.ibm.com/servers/storage/support/software/sanvc/index.html

� Download site for Windows SSH freeware:

http://www.chiark.greenend.org.uk/~sgtatham/putty

� IBM site to download SSH for AIX:

http://oss.software.ibm.com/developerworks/projects/openssh

� Open source site for SSH for Windows and Mac:

http://www.openssh.com/windows.html

� Cygwin Linux-like environment for Windows:

http://www.cygwin.com

� IBM Tivoli Storage Area Network Manager site:

http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageAreaNetworkManager.html

� Microsoft Knowledge Base Article 131658:

http://support.microsoft.com/support/kb/articles/Q131/6/58.asp

� Microsoft Knowledge Base Article 149927:

http://support.microsoft.com/support/kb/articles/Q149/9/27.asp

� Sysinternals home page:

http://www.sysinternals.com

� Subsystem Device Driver download site:

http://www-1.ibm.com/servers/storage/support/software/sdd/index.html

� IBM TotalStorage Virtualization home page:

http://www-1.ibm.com/servers/storage/software/virtualization/index.html

How to get IBM Redbooks publications

You can search for, view, or download IBM Redbooks publications, IBM Redpaper publications, Technotes, draft publications and Additional materials, as well as order hardcopy IBM Redbooks publications, at this Web site:

ibm.com/redbooks

Help from IBM

IBM Support and downloads

ibm.com/support

IBM Global Services

ibm.com/services

Index

Numerics1862 error 992-way write-back cached 120500 84

Aaccess 2, 24, 58, 86, 109, 125, 163, 177, 222, 258access pattern 130accident 163action commands 49active 42, 58, 111, 162, 201, 226, 289Active Directory domain 39adapters 67, 109, 177, 219, 230, 256address 120Address Resolution Protocol 41adds 77, 210Admin 212admin password 53administraive rights 48administration 88, 251administrative access 49administrative rights 48administrator 24, 75, 163, 251, 270administrators 205, 211, 246, 294advanced copy 24, 164aggregate 58, 108AIX 63, 176, 207, 256AIX host 186, 193, 277AIX LVM admin roles 212alert 9, 167, 222alerts 3, 241, 266algorithms 129Alias 17alias 16aliases 14, 248alignment 215amount of I/O 30, 104, 130, 169analysis 76, 170, 231, 292antivirus software 38AOS 50application

availability 86, 102, 219performance 86, 102, 127, 162, 208, 233

Application Specific Integrated Circuit 9application testing 158applications 21, 24, 103, 130, 162, 177, 207architecture 58, 116, 191, 294architectures 109, 199area 189, 217, 274areas 175, 209, 270ARP 41ARP entry 41array 2, 24, 57, 66, 85–86, 102, 104, 138, 160, 169, 201, 210, 249, 262


array overdriving 80arrays 2, 24, 66, 86, 102, 127, 160, 211, 246ASIC 9Assist 50asynchronous 80, 134, 162asynchronously 162attached 3, 58, 84, 123, 175, 248, 270attention 10, 255attributes 83audit 49audit log 49audit log file 49Audit logging 49audit logging facility 49auto 124Auto-Expand 121Automated configuration backup 54automatically discover 187automation 42, 133auxiliary 171availability 10, 66, 86, 102, 182, 265

Bbackend storage controller 148back-end storage controllers 169background copy 167background copy rate 167backplane 9backup 3, 53, 159, 198, 209, 265, 278backup files 53backup node 15backup sessions 215balance 15, 59, 96, 102, 123, 167, 183, 211balance the workload 129balanced 15, 59, 114, 145, 178, 216balancing 19, 96, 123, 178, 213, 215band 138Bandwidth 175bandwidth 2, 25, 68, 112, 130, 160, 178, 209, 252bandwidth requirements 21baseline 78, 141Basic 4, 41basic 2, 25, 138, 176, 246, 268, 272beat effect 218best practices xiii, 1, 86, 102, 122, 162, 175between 3, 31, 58, 86, 103, 123, 177, 210, 222, 265, 293BIOS 35, 200blade 14BladeCenter 22blades 14block 67, 123, 152, 208, 235block size 67, 144, 210blocking 2blocks 129

BM System Storage SAN Volume Controller Host Attach-ment User’s Guide Version 4.2.0 175, 200boot 178boot device 196bottlenecks 144, 208boundary crossing 215bridge 6Brocade 27, 279buffer 152, 236buffers 124, 161, 176, 220bus 24, 188, 230, 294

Ccache 2, 57, 85, 104, 126, 176, 208–210, 235, 255, 264, 292, 294cache disabled 133, 155cache enabled 133cache mode 135cache-disabled VDisk 133–134Cache-disabled VDisks 133cache-enabled 162cache-enabled VDisk 133caching 24, 41, 66, 85, 130, 133, 164caching mechanism 133cap 104capacity 8, 24, 85, 123, 214, 249, 290cards 58, 200certified 20, 262changes 3, 29, 78, 85, 144, 164, 176, 219, 245, 266, 270channel 194chdev 193choice 30–31, 67, 86, 130, 183CIMOM 42, 222Cisco 2, 27, 250, 263, 280classes 103, 267CLI 61, 88, 123, 153, 188, 233, 271

commands 69, 88, 286client 197, 215cluster 2, 23, 38, 55, 58, 84, 102, 123, 177, 222, 251, 271, 298

creation 52, 123IP address 52, 223

cluster connection problems 50cluster ID 49cluster IP address 41cluster partnership 54cluster state information 41clustering 191clustering software 191clusters 20, 24, 191, 222, 246code update 34combination 145, 162, 251command 42, 59, 88, 123, 153, 179, 214, 222, 258, 266, 275command prompt 51commit 155Common Information Model Object Manager 42compatibility 34, 39, 254complexity 11, 294conception 12

concurrent 34, 42, 144, 189, 287config node 41configuration 1, 25, 41, 57, 84, 102, 162, 176, 208, 222, 245, 266, 271, 299configuration backup 54configuration backup file 54configuration changes 187configuration data 187, 282configuration file 53configuration node 52, 292configuration parameters 171, 188configure 86, 194, 219, 222, 245congested 9congestion 2, 238

control 3connected 2, 58, 175, 223, 247, 264, 272connection 42, 74, 192, 222, 248connections 8, 42, 58, 196connectivity 195, 222, 253, 270, 299consistency 204consistent 139, 170, 204, 240, 251consolidation 102, 294container 215containers 215, 217control 24, 71, 96, 133, 177, 212, 246, 294controller port 84copy 24, 103, 124, 204, 235, 246, 264, 278copy rate 155copy services 24, 31, 124core 262core switch 4, 8core switches 10core/edge ASIC 9core-edge 5correctly configured 170, 224corrupted 204corruption 20, 73cost 20, 86, 102, 164, 254counters 205, 234create a FlashCopy 155credentials 52critical 66, 93, 208cross-bar architecture 9current 25, 65, 164, 188, 223, 248, 271CWDM 20

Ddata 3, 24, 59, 85, 162, 177, 208, 245, 262, 269

consistency 156data formats 198data integrity 126, 153data layout 115, 124, 211Data layout strategies 219data migration 160, 198data mining 158data path 77data pattern 208data rate 104, 144, 174, 234data structures 216data traffic 9

database 3, 79, 130, 156, 185, 209, 234, 248, 270, 299log 210

Database Administrator 213date 223, 248, 276DB2 container 216DB2 I/O characteristics 216db2logs 216DBA 213debug 75, 274dedicate bandwidth 21dedicated ISLs 9default 58, 123, 167, 179, 223, 257default values 67defined 18, 148, 152, 210, 226, 257degraded 141, 162, 271delay 139, 156delete

a VDisk 125deleted 155demand 103, 299dependency 114design 1, 24, 79, 103, 138, 184, 215, 263destage 66, 85, 138device 2, 66, 109, 138, 164, 179, 213, 225, 245, 274device driver 164, 191diagnose 15, 170, 262diagnostic 192, 282different vendors 164director 10directors 10directory I/O 120disabled 133, 255disaster 29, 163, 204, 263discovery 59, 96, 186, 253disk 2, 24, 64, 83, 102, 123, 152, 185, 208, 231, 246, 264, 280, 300

latency 208disk access profile 130disk groups 29Disk Magic 144disruptive 3, 34, 123, 251distance 20, 164, 262

limitations 20distance extension 21distances 20DMP 184documentation 1, 246domain 72Domain ID 20, 258domain ID 20Domain IDs 20domains 102download 206, 287downtime 156driver 34, 58, 164, 191, 258, 270drops 108, 237DS4000 58, 88, 105, 206, 241, 281DS4000 Storage

Server 209DS4100 84

DS4500 224DS4800 17, 67, 84DS6000 58, 88, 105, 195, 282DS8000 18, 58, 84, 104, 195, 282dual fabrics 14dual-redundant switch controllers 9DWDM 20

Eedge 2edge switch 3edge switches 4–5, 10efficiency 129egress 9element 25eliminates 76e-mail 21, 205, 251EMC 57EMC Symmetrix 59enable 11, 24, 53, 127, 156, 194, 209, 233, 257enforce 8Enterprise 58, 241, 250, 280error 20, 42, 57, 88, 176, 222, 246, 270Error Code 65error handling 65error log 64, 254, 274error logging 63errors 20, 176, 254, 270ESS 58, 88, 105Ethernet 2, 52evenly balancing I/Os 218event 3, 58, 102, 130, 163, 194, 241events 164, 241, 284exchange 156execution throttle 200expand 30expansion 3, 217extenders 164extension 20extent 29, 68, 83, 123, 210, 215

size 123, 215extent size 123, 214extent sizes 123, 214extents 57, 123, 215

FFabric 22, 32, 230, 250, 280fabric 1, 25, 144, 160, 176, 223, 270

isolation 183login 185

fabric outage 3Fabric Watch 10fabrics 5, 177, 223failover 58, 130, 176, 271failure boundaries 103, 213failure boundary 213FAStT 14, 200

storage 14FAStT200 84

fastwrite cache 120fault isolation 11fault tolerant 86FC 2, 67, 185fcs 19, 193, 256fcs device 194features 24, 164, 196, 245, 264, 270Fibre Channel 2, 58, 164, 175, 257, 262, 270

ports 21, 58routers 164traffic 3

Fibre Channel (FC) 177Fibre Channel ports 58, 178, 258, 288file level access control 49file system 152, 201, 216file system directories 217file system level 204filesets 197firmware 170, 256flag 126, 169FlashCopy 27, 63, 85, 114, 124, 235, 271

applications 65, 114mapping 76prepare 155rules 161source 64, 124Start 124target 134, 235

FlashCopy mapping 153FlashCopy mappings 125flexibility 25, 130, 164, 190, 246flow 3, 145flush the cache 188force flag 126format 49, 75, 198, 247, 267, 292frames 2free extents 129front panel 42full bandwidth 10fully allocated copy 127fully allocated VDisk 127function 61, 117, 163, 200, 264functions 24, 63, 152, 195, 237, 272

GGB 250Gb 67, 237General Public License (GNU) 206Global 227Global Mirror 228Global Mirror relationship 162gmlinktolerance 167GNU 206governing throttle 130grain 85granularity 123, 204graph 79, 144, 236, 295graphs 189group 8, 69, 102, 123, 178, 210, 233, 249, 262, 280groups 9, 29, 70, 83, 119, 179, 212, 236, 287, 299

growth 78, 217GUI 12, 34, 59, 88, 123, 183, 222, 266, 277GUI session 46

HHACMP 42, 195hardware 2, 25, 38, 58, 87, 103, 170, 199, 249, 271, 294HBA 21, 24, 183, 193, 200, 230, 251, 270HBAs 12, 142, 177–178, 200, 209, 230, 255health 196, 222, 284healthy 171, 225heartbeat 165help 8, 42, 104, 119, 160, 193, 211, 235, 246, 266, 270heterogeneous 24, 272high-bandwidth 10high-bandwidth hosts 4hops 3host 2, 24, 58, 84, 112, 123, 162, 175, 207, 246, 270

configuration 15, 125, 161, 211, 272creating 17definitions 125, 186, 209HBAs 15information 35, 184, 231, 252, 275systems 30, 175, 209, 270zone 14, 123, 177, 272

host bus adapter 199host level 178host mapping 138, 178, 271host type 58, 252host zones 17, 249

II/O governing 130I/O governing rate 133I/O group 8, 27, 123, 183, 235, 252, 265I/O Groups 129I/O groups 16, 123, 174, 187, 240I/O performance 194, 217I/O rate setting 132I/O response time 141I/O workload 213IBM Subsystem Device Driver 58, 88, 125–126, 164, 195IBM TotalStorage Productivity Center 22, 167, 223, 267, 274, 301identification 93, 179identify 57, 88, 103, 195, 234identity 49IDs and passwords 52IEEE 198image 27, 86, 123, 152, 178, 211, 249Image mode 32, 127, 162image mode 30, 124, 185image mode VDisk 124, 218Image Mode VDisks 164image mode virtual disk 134image type VDisk 127implement 3, 29, 31, 88, 199, 251, 268implementing xiii, 1, 104, 191import 124

import failed 99improvements 27, 31, 114, 145, 196, 293Improves 24in-band 138information 1, 59, 129, 185, 207, 222, 246, 266, 270infrastructure 103, 133, 164, 224, 254ingress 9initial configuration 182initiating 76initiators 84, 191install 5, 51, 160, 199, 233, 254installation 1, 87, 222, 246, 262insufficient bandwidth 3integrity 126, 152Inter Switch Link 2interface 24, 153, 175, 222, 259Internet Protocol 21interoperability 21, 254interval 170, 234inter-VSAN routing 11introduction 77, 268, 294iogrp 126, 178IOPS 177, 208, 294IP 20–21, 222IP communication 41IP connectivity considerations 41IP traffic 21IPv4 41IPv6 41IPv6 communication 39ISL 2, 248ISL capacity 10ISL links 5ISL oversubscription 3ISL trunks 10ISLs 3, 247isolated 72, 183isolation 2, 59, 88, 104, 183, 271IVR 11

Jjournal 201, 210

Kkernel 200key 185, 215, 250key based SSH communications 46key pairs 48keys 46, 48, 192

Llast extent 123latency 9, 138, 155, 208LBA 64level 12, 25, 58, 86, 139, 162, 178, 218, 239, 270, 297

storage 75, 204, 253, 271levels 65, 86, 104, 141, 190, 217, 251lg_term_dma 194

library 201license 29, 280light 104, 208, 283limitation 42, 189, 234limitations 1, 29, 42, 163, 210, 282limiting factor 138limits 24, 139, 162, 189, 219lines of business 213link 2, 29, 42, 162, 198

bandwidth 21, 165latency 165

link quality 10link reset 10links 164, 223, 262Linux 200list 12, 24, 50, 61, 85, 164, 202, 246, 270list dump 54livedump 292load balance 130, 183load balances traffic 7Load balancing 196load balancing 123, 199loading 68, 114LOBs 213location 75, 85, 142, 208, 246locking 191log 49, 64, 164, 236, 254, 274logged 42, 72Logical Block Address 64logical drive 58, 95, 193, 210, 215logical unit number 163logical units 29logical volumes 215login 42, 177logins 177logs 156, 210, 255, 276long distance 165loops 67, 264lower-performance 122LPAR 198LU 178LUN 30, 57, 83, 104, 134, 163, 176, 210, 228, 249

access 164, 191LUN mapping 93, 178LUN masking 20, 72, 272LUN Number 59, 93LUN per 105, 213LUNs 58, 84, 101, 164, 178, 211, 213, 233, 249LVM 125, 196, 212LVM volume groups 215

MMAC 41MAC address 41maintenance 34, 42, 167, 184, 256, 270maintenance procedures 42, 259, 289maintenance window 167manage 24, 58, 119, 176, 213, 222, 268managed disk 219, 289managed disk group 127, 219

Managed Mode 67, 127
management xiii, 6, 41, 102, 162, 176, 211, 222, 263, 272
   capability 177
   port 177, 241
   software 179
management communication 46
managing 24, 31, 176, 215, 246, 268, 270
map 61, 137, 161, 179
map a VDisk 183
mapping 57, 93, 109, 125, 153, 176, 213, 271
mappings 125, 192, 271
maps 219, 289
mask 9, 164, 177, 289
masking 30, 72, 161, 177, 272
master 35, 153
master console 41, 154
Master Console server 39
max_xfer_size 194
maximum IOs 216
MB 21, 67, 123, 194
Mb 21, 29
McDATA 27, 280
MDGs 83, 123, 213
MDisk 53, 57, 83, 103, 123, 163, 183, 210, 228, 249
   adding 87, 140
   removing 192
MDisk group 124, 163, 210
media 171, 233, 289
Media Access Control 41
media error 64
medium errors 63
member 16
members 67, 271
memory 152, 176, 210, 235, 253, 264, 292
message 20, 42, 170, 258
messages 183, 258, 284
metadata 120
metadata corruption 99
MetaSANs 10
metric 78, 139, 173, 236
Metro 27, 124, 227
Metro Mirror 162, 236
Metro Mirror relationship 155
microcode 65
Microsoft Windows Active Directory 38
Microsoft Windows Server professionals 49
migrate 22, 124, 160, 178
migrate data 127, 198
migrate VDisks 125
migration 3, 30, 63, 126, 160, 185, 253, 267
migration scenarios 8
mirrored 138, 165, 204
mirrored VDisk 122
mirroring 20, 125, 162, 196
misalignment 215
mkrcrelationship 169
Mode 67, 180, 252, 275
mode 27, 52, 86, 101, 162, 177, 211, 262, 290
   settings 162
monitor 22, 141, 167, 272
monitored 78, 141, 172, 204, 270
monitoring 77, 167, 175, 288
monitors 145, 234
mount 126, 156, 265
MPIO 196, 258
multi-cluster installations 5
multipath drivers 88, 256
multipath software 190
multipathing 34, 58, 176, 256, 269
Multipathing software 184
multipathing software 183, 259
multiple paths 130, 183, 272
multiple striping 218
multiple vendors 21
multiplexing 20

N
name server 185, 259
names 16, 49, 122, 199, 259
nameserver 185
naming 13, 60, 87, 122, 252
naming convention 13
naming conventions 53
nest aliases 16
new disks 186
new MDisk 95
No Contact 50
no synchronization 122
NOCOPY 155
node 3, 27, 52, 72, 84, 104, 123, 176, 223, 255, 264, 270, 294
   adding 29
   failure 130, 185
   port 14, 130, 172, 177, 223, 272
node port 14
nodes 3, 24, 52, 71, 84, 123, 177, 223, 254, 264, 271, 293
noise 138
non 11, 24, 73, 124, 183, 213, 237, 251, 267, 275
non-disruptive 127
non-preferred path 129
num_cmd_elem 193–194

O
offline 52, 65, 87, 126, 163, 184, 230, 256, 271
OLTP 210
Online 210
online 87, 115, 225, 257, 271
online transaction processing (OLTP) 210
operating system (OS) 208
operating systems 183, 215, 258, 274
optimize 115, 293
Oracle 196, 213
organizations 21
OS 52, 176, 220, 256
outage 133
overlap 14
overloading 148, 174, 242

over-subscribed 10
oversubscribed 4
over-subscription 9
oversubscription 3
overview 40, 83, 211, 269

P
packet filters 41
parameters 49, 57, 84, 131, 171, 178, 209, 252
partition 197, 216
partitions 66, 144, 197, 215
partnership 54, 167
password 52
password reset feature 53
passwords 52
path 3, 58, 104, 176, 220, 228, 256, 270
   selection 195
paths 7, 34, 58, 130, 176, 228, 253, 272
peak 3, 165
per cluster 28, 123, 236
performance xiii, 3, 24, 57, 86, 102, 119, 162, 175, 207, 223, 245, 270, 293
   degradation 59, 104, 162
performance advantage 86, 108
performance characteristics 103, 124, 206, 219
performance improvement 127, 237, 294
performance monitoring 173, 178
performance requirements 31
Performance Upgrade kit 39
permanent 170
permit 3
persistent 88, 191
PFE xiv
physical 20, 24, 57, 85, 147, 152, 175, 235, 249, 264
physical volume 197, 219
ping 52
PiT 134
Plain Old Documentation 92
planning 15, 86, 101, 139, 165, 209
plink 51
plink.exe 51
PLOGI 185
point-in-time 163
point-in-time copy 164
policies 196
policy 53, 104, 191, 254
pool 24, 68, 168
port 2, 24, 57, 84, 142, 172, 176, 223, 246, 262, 272
   types 59
port bandwidth 9
Port Channels 11
port errors 10
port event 10
Port Fencing 10
port layout 10
port zoning 12
port/traffic isolation 11
port-density 9
ports 2, 27, 58, 84, 142, 176, 223, 248, 263, 271
power 188, 258, 264, 288
preferred 20, 30, 50, 58, 123, 167, 177, 214, 271
preferred node 15, 123, 167, 183
preferred owner node 129
preferred path 58, 129, 183
preferred paths 130, 183, 275
prepare a FlashCopy 172
prepared state 172
Pre-zoning tips 13
primary 29, 85, 102, 134, 162, 212
priority 42
private key 46
problems 2, 34, 59, 87, 138, 161, 192, 208, 250, 262, 269
profile xiv, 66, 96, 130, 287
properties 138, 201
protect 167
protecting 67
provisioning 87, 105
pSeries 19, 73, 206
public key 46
PuTTY 39, 52
PuTTY generated SSH 46
PuTTY SSH 38
PuTTYgen 46, 48
PVID 198
PVIDs 199

Q
queue depth 83, 188, 194, 200, 219
quickly 2, 76, 138, 155, 183, 230, 251, 262
quiesce 125, 156, 187

R
RAID 67, 86, 127, 169, 210, 249, 299
RAID array 139, 171, 211, 213
RAID arrays 138, 212
RAID types 211
ranges 139
RDAC 58, 88
Read cache 208
read miss performance 130
real capacity 63
reboot 125, 188
rebooted 197
receive 97, 237, 258
recovery 29, 52, 95, 128, 156, 176, 289
recovery point 167
Redbooks Web site 303
   Contact us xvi
redundancy 2, 9–10, 58, 116, 165, 177, 224, 272
redundant 24, 45, 72, 165, 177, 219, 230, 270
redundant paths 177
redundant SAN 72
registry 185, 276
relationship 19, 58, 124, 197, 227
reliability 15, 87, 245, 294
remote cluster 35, 165, 227, 255
remote copy 134
remote mirroring 20
remotely 50

remount 138
removed 20, 31, 125, 186
rename 161, 277
repairsevdisk 99
replicate 163
replication 162, 268
reporting 77, 139, 241, 274
reports 142, 186, 221
reset 42, 185, 254, 270
resource consumption 42
resources 24, 75, 96, 102, 133, 168, 176, 216, 235, 292
restart 161, 264
restarting 167
restarts 185
restore 55, 166, 173, 198
restricted rights 48
restricting access 191
rights 48
risk 76, 87, 102, 164, 266
role 48, 210
role-based security 48
roles 47, 212
root 141, 192, 241, 257
round 96, 165, 216
round-robin 97
route 167
router 164
routers 165
routes 11
routing 58, 263
RPQ 3, 200, 254
RSCN 185
rules 77, 149, 161, 176, 272

S
SAN xiii, 1, 23–24, 39, 58, 122, 160, 175, 219, 245, 262, 269–270, 294
   availability 183
   fabric 1, 160, 183, 224
SAN bridge 6
SAN configuration 1
SAN fabric 1, 160, 178, 223, 272
SAN switch models 9
SAN Volume Controller 1, 3, 12, 15, 24, 127, 175
   multipathing 200
SAN zoning 130, 226, 251
save capacity 121
scalability 2, 23, 299
scalable 1, 24
scale 25, 116, 299
scaling 31, 117, 293
scan 186
SCP 49
scripts 133, 189
SCSI 64, 130, 185, 287
   commands 191, 287
SCSI disk 198
SCSI-3 191
SDD 14, 58, 88, 125–126, 164, 176, 195, 218, 254, 274
SDD for Linux 201, 302
SDDDSM 179, 274
SE VDisks 120
secondary 29, 134, 163, 210
secondary site 29, 163
secure 52
Secure Copy Protocol 49
Secure Shell 49
security 12, 47, 196, 256
segment 66
separate zone 18
sequential 85, 101, 123, 176, 208, 249, 299
serial number 60, 178, 249
serial numbers 179
Server 58, 196–197, 218, 222, 251, 275
server 3, 24, 66, 142, 156, 185, 233, 248, 264
Servers 199
servers 3, 25, 196, 207, 251, 265
service 35, 41, 78, 86, 219, 236, 270
settings 52, 171, 193, 208, 272
setup 41, 193, 214, 262, 272
SEV 159
SFPs 21
shapers 41
share 19, 72, 86, 104, 147, 177, 216
shared 21, 169, 191, 217, 224
sharing 11, 146, 191, 209
shortcuts 13
shutdown 125, 161, 185, 264
Simple Network Management Protocol 41
single host 15
single initiator zones 15
single storage device 183
single-member aliases 16
site 29, 31, 64, 134, 163, 203, 234, 251, 300, 302
slice 215
slot number 19, 256
slots 67–68
slotted design 9
SMS 217
snapshot 160, 249
SNMP 41, 241
Software 1, 3, 12, 15, 187, 253, 255, 270
software 2, 164, 176, 245, 266, 271
Solaris 201, 275
solution 1, 38, 86, 139, 173, 208, 246, 266
solutions 119, 246
source 11, 64, 124, 201, 233
sources 265
space 80, 86, 123, 210, 252
Space Efficient 121
space efficient copy 127
Space Efficient VDisk 159
Space Efficient VDisk Performance 120
space-efficient VDisk 63
spare 3, 30, 67, 86
speed 10, 25, 139, 169, 248
speeds 20, 80, 139, 262, 294
split 5, 23, 70, 146
splitting 121
SPS 71

SSH 42, 49, 223
SSH communication 48
SSH connection 46
SSH connection limitations 46
SSH connectivity 38
SSH keypairs 46
SSH Secure Copy 54
SSH session 48
SSPC 39, 89, 244
SSPC server 39
standards 22, 222, 262
start 20, 25, 79, 148, 162, 178, 215, 222, 270
state 42, 127, 162, 176, 255, 292
   synchronized 170
statistics 85, 170, 205, 234
statistics collection 170
status 50, 63, 192, 222, 248, 271
storage 1, 24, 57, 83, 101, 123, 162, 175, 207, 245, 262, 270, 294
storage controller 14, 24, 57, 84, 103, 134, 163, 224
storage controllers 14, 24, 59, 86, 104, 147, 163, 222
Storage Manager 68, 166, 172, 281
storage performance 78, 138, 239
Storage Pool Striping 71
storage subsystems 49
storage traffic 3
streaming 114, 130, 209
strip 215
Strip Size Considerations 215
strip sizes 215
stripe 102, 213
striped 96, 123, 185, 210
striped mode 155, 211
striped mode VDisks 214
striped VDisk recommendation 218
Striping 101
striping 24, 66, 108, 212, 215, 293
striping policy 121
Subsystem Device Driver 14, 58, 88, 125–126, 164, 180, 195, 257, 275
support 25, 58, 88, 210, 246, 263, 294
SVC xiii, 1, 23, 58, 84, 123, 152, 175, 210, 245, 262, 269, 293
SVC CLI 52
SVC Cluster 52
SVC cluster 3, 23, 60, 84, 103, 182, 222, 279
SVC configuration 53, 177, 250, 267, 300
SVC Console 52
SVC Console server 52
SVC Console software 38
SVC error log 99
SVC installations 4, 104, 262
SVC master console 52, 157
SVC node 14, 29, 164, 177, 223, 271
SVC nodes 7, 24, 31, 72, 138, 178, 227, 288, 294
SVC Service mode 52
SVC software 178, 270
SVC zoning 16
svcinfo 88, 125, 179, 271
svcinfo lsmigrate 88
svctask 59, 88, 123, 160, 201, 278
svctask dumpinternallog 50
svctask finderr 50
switch 53, 142, 170, 175, 223, 247
   fabric 3, 255
   failure 3, 205
   interoperability 21
switch fabric 2, 178, 228
switch ports 8, 225
switch splitting 8
switches 2, 49, 167, 247, 262, 270
Symmetrix 57
synchronization 165
Synchronized 169
synchronized 122, 162
system 30, 78, 115, 123, 152, 175, 209, 231, 256, 274, 293
system performance 124, 201, 292
System Storage Productivity Center 39

T
tablespace 210, 216
tablespaces 217–218
tape 3, 166, 177
target 59, 84, 124, 177, 233, 288
target ports 72, 177
targets 187
tasks 241
test 2, 51, 87, 108, 165, 175, 218, 236
tested 25, 87, 164, 176, 218, 254, 266, 300
This 1, 23, 58, 83, 101, 123, 163, 175, 207, 221, 246, 261, 270, 297
thread 188, 215
threads 215
threshold 3, 162, 241
thresholds 138
throttle 130, 200
throttle setting 131
throttles 130
throttling 131
throughput 28, 66, 86, 104, 138, 156, 184, 194, 208, 210, 237, 294
throughput based 208–209
tier 87, 104
time 2, 27, 52, 58, 96, 103, 122, 176, 208, 223, 248, 264, 269, 294
tips 13
Tivoli 166, 172, 241
Tivoli Storage Manager (TSM) 209
tools 175, 246, 272
Topology 142, 223, 274
topology 2, 223, 274
topology issues 7
topology problems 7
total load 218
TPC CIMOM 52
TPC for Replication 39
traditional 11
traffic 3, 7, 165, 183, 235
   congestion 3

   Fibre Channel 21
Traffic Isolation 8
traffic threshold 9
transaction 66, 156, 193, 208
transaction based 208–209
Transaction log 210
transceivers 21
transfer 58, 83, 130, 176, 208
transit 3
trends 78
trigger 15, 250
troubleshooting 12, 175, 252
TSM 215
tuning 145, 175

U
UID 93, 290
unique identifier 75, 178
UNIX 156, 205
Unmanaged MDisk 164
unmanaged MDisk 128
unmap 124
unused space 123
upgrade 171, 185, 254, 287, 297
upgrades 184, 253, 287
upgrading 35, 190, 237, 255–256, 296
upstream 2, 241
URL 50
user data 120
user IDs 52
users 10, 24, 139, 185, 224, 266
using SDD 164, 195, 258
utility 88, 206

V
VDisk 14, 27, 53, 57, 83, 103, 119, 178, 210, 231, 249, 271
   creating 87, 158
   migrating 127
   modifying 146
   showing 142
VDisk migration 64
VDisk Mirroring 121
VIO clients 219
VIO server 198, 219
VIOC 197, 219
VIOS 196–197, 219
virtual address space 120
virtual disk 98, 130, 198
Virtual LAN 41
virtualization 23, 75, 211, 269
virtualization policy 121
virtualizing 11, 185
VLAN 41
volume abstraction 211
volume group 72, 195
VSAN 11
VSANs 2, 11
VSCSI 197, 219

W
Web browser 49
Windows 2003 32, 199
workload 29, 58, 96, 102, 123, 165, 193, 208, 235, 297
   throughput based 208
   transaction based 208
workload type 209
workloads 3, 66, 86, 103, 133, 165, 176, 208, 296
worldwide node name 11
write performance 122
writes 66, 85, 104, 138, 162, 176, 210, 235, 297
WWNN 11–12, 59, 187, 259
WWNs 12
WWPN 12, 30, 57, 84, 223, 248
WWPN zoning 12
WWPNs 12, 73, 177, 259, 271

X
XFPs 21

Z
zone 7, 160, 177, 226, 272
zone name 19
zoned 2, 177, 227, 248, 288
zones 12, 160, 226, 249, 272
zoneset 18, 227, 289
Zoning 20, 249
zoning 6, 11, 29, 73, 130, 177, 226, 249
zoning configuration 11, 231, 249
zoning scheme 13
zSeries 117


SG24-7521-01 ISBN 0738432040

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


SAN Volume Controller Best Practices and Performance Guidelines

Read about best practices learned from the field

Learn about SVC performance advantages

Fine-tune your SVC

This IBM Redbooks publication captures many of the best practices based on field experience and details the performance gains that can be achieved by implementing the IBM System Storage SAN Volume Controller (SVC).

This book is intended for highly experienced storage, SAN, and SVC administrators and technicians.

Readers are expected to have an advanced knowledge of the SVC and SAN environment, and we recommend these books as background reading:

- IBM System Storage SAN Volume Controller, SG24-6423
- Introduction to Storage Area Networks, SG24-5470
- Using the SVC for Business Continuity, SG24-7371

Back cover