National Center for Supercomputing Applications, LCI Conference 2007

SAN Persistent Binding and Multipathing in the 2.6 Kernel

Michelle Butler, Technical Program Manager
Andy Loftus, System Engineer
Storage Enabling Technologies, NCSA
mbutler@ncsa.uiuc.edu or aloftus@ncsa.edu

Slides available at http://dims.ncsa.uiuc.edu/set/san/
Who?
• NCSA
  – a unit of the University of Illinois at Urbana-Champaign
  – a federal, state, university, and industry funded center
• Academic users
  – NSF peer review
• Large number of applications/user needs
  – 3rd party codes, user written…
  – All running on same environment
• Many research areas
NCSA’s 1st Dell Cluster
• Tungsten: 1750 server cluster
  – 3.2 GHz Xeon
    • 2,560 processors (compute only)
    • 16.4 TF; 3.8 TB RAM; 122 TB disk
    • Dell OpenManage
  – Myrinet
    • Full bi-section
  – Lustre over Gig-E
    • 13 DataDirect 8500
    • 104 OSTs, 2 MDS w/ separate disk
    • 11.1 GB/sec sustained
  – Power/Cooling
    • 593 kW / 193 tons
  – Production date: April 2004
  – User environment
    • Platform Computing LSF
    • Softenv
    • Intel Compilers
    • ChaMPIon Pro, MPICH, VMI-2

The first large-scale Dell cluster!
NCSA’s 3rd Dell Cluster
• T2 – retired into:
• Tungsten-3: 1955 blade cluster
  – 2.6 GHz Woodcrest Dual Core
    • 1,040 processors / 2,080 cores
    • 22 TF; 4.1 TB RAM; 20 TB disk
    • Warewulf
  – Cisco InfiniBand
    • 3 to 1 over-subscribed
    • OFED-1.1 w/ HPSM subnet manager
  – Lustre over IB
    • 4 FAStT controllers direct FC
    • 1.2 GB/s sustained
    • 8 OSTs and 2 MDS w/ complete auto failover
  – Power/Cooling
    • 148 kW / 42 tons
  – Production date: March 2007
  – User environment
    • Torque/Moab
    • Softenv
    • Intel Compilers
    • VMI-2
NCSA’s 4th Dell Cluster
• Abe: 1955 blade cluster
  – 2.33 GHz Clovertown Quad-Core
    • 1,200 blades / 9,600 cores
    • 89.5 TF; 9.6 TB RAM; 120 TB disk
    • Perceus management; diskless boot
  – Cisco InfiniBand
    • 2 to 1 oversubscribed
    • OFED-1.1 w/ HPSM subnet manager
  – Lustre over IB
    • 22 OSTs and 2 MDS w/ complete auto failover
    • 2 DDN 9500 controllers direct FC
    • 10 FAStT controllers on SAN fabric
    • 8.4 GB/s sustained
  – Power/Cooling
    • 500 kW / 140 tons
  – Production date: May 2007 (anticipated)
  – User environment
    • Torque/Moab
    • Softenv
    • Intel Compilers
    • MPI: evaluating Intel MPI, MPICH, MVAPICH, VMI-2, etc.

The largest Dell cluster!
NCSA Facility - ACB
• Advanced Computation Building
  – Three rooms, totals:
    • 16,400 sq ft raised floor
    • 4.5 MW power capacity
    • 250 kW UPS
    • 1,500 tons cooling capacity
  – Room 200:
    • 7,000 sq ft – no columns
    • 70" raised floor
    • 2.3 MW power capacity
    • 750 tons cooling capacity
NCSA’s Other Systems
• Distributed Memory Clusters
  – Mercury (IBM, 1.3/1.5 GHz Itanium2):
    • 1,846 processors
    • 10 TF; 4.6 TB RAM; 90 TB disk
• Shared Memory Clusters
  – Copper (IBM p690, 1.3 GHz Power4): 12 x 32 processors
    • 2 TF; 64 or 256 GB RAM each; 35 TB disk
  – Cobalt (SGI Altix, 1.5 GHz Itanium2): 2 x 512 processors
    • 6.6 TF; 1 TB or 3 TB RAM; 250 TB disk
NCSA Storage Systems
• Archival: SGI/Unitree (5 PB total capacity)
  – 72 TB disk cache; 50 tape drives
  – currently 2.8 PB of data in MSS
    • >1 PB ingested in last 6 months
    • project ~3.2 PB by end of CY2006
    • licensed to support 5 PB resident data
  – ~30 data collections hosted
• Infrastructure: 394 TB SAN connected
  – Fibre Channel SAN connected; FC and SATA environments
  – Lustre, IBRIX, NFS filesystems
• Databases:
  – 8 processor, 12 GB memory SGI Altix
    • 30 TB of SAN storage
    • Oracle 10g, MySQL, Postgres
  – Oracle RAC cluster
  – Single-system Oracle deployments for focused projects
Visualization Resources
• 30M-pixel Tiled Display Wall
  – 8192 x 3840 pixel composite display
  – 40 NEC VT540 projectors, arranged in a 5H x 8W matrix
  – driven by 40-node Linux cluster
    • dual-processor 2.4 GHz Intel Xeons with NVIDIA FX 5800 Ultra graphics accelerator cards
    • Myrinet interconnect
    • to be upgraded by early CY2007
  – funded by State of Illinois
• SGI Prisms
  – 8 x 8 processor (1.6 GHz Itanium2)
  – 4 graphics pipes each; 1 GB RAM each
  – InfiniBand connection to Altix machines
SAN at NCSA
• 1.3 PB spinning disk
  – 895 TB SAN attached
• 1,392 Brocade switch ports
• 7 SAN fabrics
• 2 data centers
Persistent Binding
• Device naming problems
• Udev solution
• Examples
• Interactive demo
Device Naming Problem (Before / After)
• Add hardware
• SAN zoning
• New SAN luns
• Modify config

Device node mapping can change with changes to hardware, software, or the SAN.

Devices are assigned random names (based on the next available major/minor pair for the device type).

CLUSTER
- Multiple hosts that see the same disk will assign it to different device nodes
  - may be /dev/sda on system1 but /dev/sdc on system2
- Can change with hardware changes; what used to be /dev/sda is now /dev/sdc

Devfs helps only a little:
- Fixes device naming; on a single host, a disk will always have the same device node
- But different hosts may have different device names for the same physical disk
What needs to happen
• Storage target always maps to the same local device (i.e. /dev/…)
• Local device name should be meaningful
  – /dev/sda conveys no information about the storage device
udev - Persistent Device Naming
• “Udev is … a userspace solution for a dynamic /dev directory, with persistent device naming” *
  – Userspace: not required to remain in memory
  – Dynamic: /dev not filled with unused files
  – Persistent: devices always accessible using the same device node
• Provides for custom device names
* Daniel Drake (http://www.reactivated.net/writing_udev_rules.html)

Devfs provides dynamic and persistent naming, but:
- kernel based – the entire device db is stored in kernel memory, never swapped
- not possible to customize device names

UDEV CUSTOM
- custom names for devices
- custom scripts can be run when specific devices are attached/removed
Setting up udev device mapper

Overview
1. Uniquely identify each lun
2. Assign a meaningful name to each lun
1. Uniquely identify each lun

/sbin/scsi_id

Sample usage:
root# scsi_id -g -u -s /block/sda
SSEAGATE_ST318406LC_____3FE27FZP000073302G5W
root# scsi_id -g -u -s /block/sdb
3600a0b8000122c6d00000000453174fc

Flow: scsi_id takes a device name, issues a SCSI INQUIRY, and returns a unique id.

/sbin/scsi_id
- INPUT: existing local device name
- OUTPUT: string that uniquely identifies the specific device (guaranteed unique among all SCSI devices)

SAMPLE:
- sda: locally installed drive
- sdb: SAN attached disk
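The lookup above can be wrapped in a small loop to build a device-to-id table for every SCSI disk at once. A minimal sketch, assuming only the 2.6-era scsi_id invocation shown above; the SYSFS and SCSI_ID variables are our additions so the paths can be overridden:

```shell
# list_scsi_ids: print "<device> <unique id>" for each SCSI disk in sysfs.
# Sketch only; assumes the scsi_id interface shown above (-g -u -s /block/sdX).
list_scsi_ids() {
    sysfs=${SYSFS:-/sys}
    scsi_id_bin=${SCSI_ID:-/sbin/scsi_id}
    for dev in "$sysfs"/block/sd*; do
        [ -e "$dev" ] || continue
        name=${dev##*/}                              # e.g. sdb
        id=$("$scsi_id_bin" -g -u -s "/block/$name" 2>/dev/null) || continue
        echo "$name $id"
    done
}
```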
2. Associate a meaningful name
• BUS=scsi
  – /sys/bus/scsi
• SYSFS
  – <BUS>/devices/H:B:T:L/<filename>
• PROGRAM & RESULT
  – Program to invoke and result to look for
• NAME
  – Device name to create (relative to /dev)

New udev rules file: /etc/udev/rules.d/20-local.rules
BUS="scsi", SYSFS{vendor}="DDN", SYSFS{model}="S2A 8000",
PROGRAM="/sbin/scsi_id -g -u -s /block/%k",
RESULT="360001ff020021101092fadc32a450100", NAME="disk/fc/sdd4c1l0"

Custom naming is controlled by rulesets stored in /etc/udev/rules.d.
A rule is a list of keys to match against.
When all keys match, the specified action is taken (create a device node or symlink).
Example: Customizing for multiple paths

Problem
Multiple paths to a single lun result in multiple device nodes.
Need to know which path each device uses.
Example: Customizing for multiple paths

Custom script: mpio_scsi_id

Sample udev rule:
BUS="scsi", SYSFS{vendor}="DDN", SYSFS{model}="S2A 8000",
PROGRAM="/root/bin/mpio_scsi_id %k",
RESULT="23000001ff03092f360001ff020021101092fadc32a450100",
NAME="disk/fc/sdd4c1l0"

Flow: udev passes the device name to mpio_scsi_id, which calls scsi_id and fetches the disk controller WWPN; the value returned to udev is WWPN + scsi_id.

Get disk controller WWPN:
(Emulex) /sys/class/fc_transport/target<H>:<B>:<T>/port_name
(QLA) grep + awk to pull value from /proc/scsi/ql2xxx/<host_id>
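The flow above can be sketched as a small script. Only the script's name, its %k input, and the output shape (controller WWPN + scsi_id) come from the slides; the sysfs lookups below are assumptions based on the Emulex layout quoted above, and SYSFS/SCSI_ID are our overridable hooks:

```shell
# mpio_scsi_id (sketch): emit "<controller WWPN><scsi_id>" so a udev RESULT
# can distinguish which controller path a device arrived on.
mpio_scsi_id() {
    dev=$1                                   # kernel name from %k, e.g. sdb
    sysfs=${SYSFS:-/sys}
    scsi_id_bin=${SCSI_ID:-/sbin/scsi_id}

    # /sys/block/sdb/device is a symlink whose final component is H:B:T:L
    hbtl=$(basename "$(readlink -f "$sysfs/block/$dev/device")")
    hbt=${hbtl%:*}                           # strip the lun -> H:B:T

    # Emulex exposes the remote (controller) port WWPN per target
    wwpn=$(cat "$sysfs/class/fc_transport/target$hbt/port_name") || return 1

    id=$("$scsi_id_bin" -g -u -s "/block/$dev") || return 1
    echo "${wwpn#0x}${id}"
}
```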
Demo: udev persistent device naming
• Single HBA
• Single disk unit
  – 4 luns
  – Each lun presented through both controllers
• Host sees 8 logical luns
• Use mpio_scsi_id to identify the ctlr-lun
Demo: udev persistent device naming

Original configuration
• udev config file
  – /etc/udev/udev.conf
• scsi_id config file
  – /etc/scsi_id.config
• Scan fc luns
  – {sysfs}/hostX/scan
  – /dev/disk/by-id

Custom device names
• Custom rules file
  – 20-local.rules
• Restart udev
  – udevstart
• Custom device names created
  – /dev/disk/fc

BEGIN
- tail -f /var/log/messages
1. Enable udev logging
2. Enable scsi_id for all devices (option -g)
3. /proc/partitions
4. Scan fc luns (echo "- - -" > /sys/class/scsi_host/hostX/scan)
5. See udev log lines in messages file; see fc disks in /dev/disk/by-id
6. Enable 20-local rules file
7. udevstart
8. See udev log lines in messages file; see fc disks in /dev/disk/fc

DEFAULT CONFIGURATION
A local rules file already exists. Disable it.
Default behavior for scsi_id is to blacklist everything unknown (-b option). Enable whitelisting of everything (-g option) so scsi_id values will be returned.
Even before custom rules are in place, default udev rule selection activity is visible in /var/log/messages.
After running delete_fc_luns, udev removes the /dev/sdX device files (/var/log/messages).

CUSTOM CONFIGURATION
udev custom rules are selected (see /var/log/messages).
Major/minor numbers line up between /dev/disk/fc/* and /proc/partitions.
Demo: udev persistent device naming

Debugging
• Not all sysfs files are available immediately
  – HBA target WWPN
  – Add udevstart to boot scripts
• Udev tools can help
  – udevinfo
  – udevtest

Examples
• udevinfo -a -p $(udevinfo -q path -n /dev/sdb)
• udevtest /block/sdb

Example: multiple paths on Nadir
- If luns are removed (delete_fc_luns)
- then added (scan_fc_luns)
- no matches are found in 20-local.rules
- Add syslog output to mpio_scsi_id
  + shows the params the script is called with
  + shows what the script returns
  + target_wwpn is not getting set
- Run udevstart (luns already attached now); matches are found in 20-local.rules and device files are created

Probably either a driver or udev issue.
Easiest solution is to run scan_luns and udevstart at system boot time (/etc/rc.d/rc.local).
Custom script: ls_fc_luns

Get HBA list     sysfs: /sys/class/fc_host
Get HBA type     lspci
Get target list  sysfs (Emulex): /sys/class/scsi_host/hostX/targetX:Y:Z
                 /proc (QLA): /proc/scsi/qla2xxx/X
Get lun list     sysfs: /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L
Get lun info     sysfs: /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L/*

Sample output (HBA WWPN, target WWPN, H:B:T:L, device, scsi_id):
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563
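The first step of the table above ("Get HBA list" from /sys/class/fc_host) can be sketched as follows. Only that sysfs path comes from the slides; the port_name read and the SYSFS override are assumptions:

```shell
# list_fc_hosts (sketch): print each FC HBA with its port WWPN, the first
# column of the ls_fc_luns output shown above.
list_fc_hosts() {
    sysfs=${SYSFS:-/sys}
    for host in "$sysfs"/class/fc_host/host*; do
        [ -e "$host" ] || continue
        echo "${host##*/} $(cat "$host/port_name")"
    done
}
```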
Custom script: lip_fc_hosts

Get host list: ls_fc_luns
echo "1" > /sys/class/fc_host/hostX/lip
Custom script: scan_fc_luns

Get host list: ls_fc_luns
echo "- - -" > /sys/class/scsi_host/hostX/scan
Custom script: delete_fc_luns

Get lun list: ls_fc_luns
echo "1" > /sys/class/scsi_host/hostX/targetX:Y:Z/X:Y:Z:L/delete
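The three helper scripts above share one shape: enumerate hosts (or luns) and echo a trigger value into the matching sysfs file. A minimal sketch, assuming only the sysfs paths and trigger values quoted on the slides; the loop structure and SYSFS override are ours:

```shell
# Sketches of the scan/lip/delete helpers.

scan_fc_luns() {            # rescan every FC host for new luns
    sysfs=${SYSFS:-/sys}
    for host in "$sysfs"/class/fc_host/host*; do
        [ -e "$host" ] || continue
        echo "- - -" > "$sysfs/class/scsi_host/${host##*/}/scan"
    done
}

lip_fc_hosts() {            # issue a loop initialization primitive per host
    sysfs=${SYSFS:-/sys}
    for host in "$sysfs"/class/fc_host/host*; do
        [ -e "$host" ] || continue
        echo "1" > "$host/lip"
    done
}

delete_fc_lun() {           # remove one lun, given host name and H:B:T:L
    sysfs=${SYSFS:-/sys}
    host=$1 hbtl=$2
    target=${hbtl%:*}       # H:B:T:L -> H:B:T
    echo "1" > "$sysfs/class/scsi_host/$host/target$target/$hbtl/delete"
}
```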
udev - Additional Resources
• man udev
• http://www.emulex.com/white/hba/wp_linux26udev.pdf
  – Excellent white paper
• http://www.reactivated.net/udevrules.php
  – How to write udev rules
• http://www.us.kernel.org/pub/linux/utils/kernel/hotplug/udev.html
  – Information and links
• http://dims.ncsa.uiuc.edu/set/san
  – FC tools: custom tools used in demo
Linux Multipath I/O

• Overview
• History
• Setup
• Demos
  – Active/Passive controller pair
  – Active/Active controller pair
Linux Multipath - History

Providers
• Storage vendor
• HBA vendor
• Filesystem
• OS

STORAGE VENDOR
- End-to-end solution (they provide disk, HBA, driver, add'l software, sometimes even the FC switch)
- HBAs (and other parts) come at a markup
- One location for support tickets, but no alternate recourse if they can't fix the problem
- Proprietary requirements (typically require 2 HBAs, only works with their systems)

HBA VENDOR
- QLA
  > Linux support spotty
    + 2.4 kernel OK, but strict requirements (2 HBAs, exactly 2 paths per lun, active/active controllers)
    + 2.6 kernel inconsistent behavior
  > Solaris support spotty (2 months to get 1 machine working; the next month it stopped working, machine was untouched)
  > Dropped Windows support prematurely (Windows MPIO layer not complete yet, only an API for vendors)
  > Proprietary solution, only works with their HBAs and configuration software
- Emulex (Unix philosophy: do one thing and do it well; MPIO doesn't belong in the driver)

FILESYSTEM
- 3rd party – Veritas, others
- Parallel filesystems – IBRIX, Lustre, GPFS, CXFS (enable MPIO via failover hosts)

OS
- *NEW* Solaris 10 (MPxIO, but requires Solaris-branded QLA cards)
- *NEW* Linux (device mapper multipath) (RedHat 4, SuSE, others…)
Device Mapper Multipath
• Identify luns by scsi_id
• Create "path groups"
  – Round-robin I/O on all paths in a group
• Monitor paths for failure
  – When no paths are left in the current group, use the next group
• Monitor failed paths for recovery
  – Upon path recovery, re-check group priorities
  – Assign a new active group if necessary
Linux Device Mapper Multipath

Overview
1. Identify unique luns
2. Monitor active paths for failure
3. Monitor failed paths for recovery

Multipath handles these 3 areas.
All settings are saved in /etc/multipath.conf
1. Identify unique luns

Storage device keys:
• vendor
• product
• getuid_callout

device {
    vendor "DDN"
    product "S2A 8000"
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
1. Identify unique luns

Multipath device keys:
• wwid
• alias

multipath {
    wwid 360001ff020021101092fb1152a450900
    alias sdd4l0
}
2. Monitor Healthy Paths for Failure
• Priority group
  – Collection of paths to the same physical lun
  – I/O is split across all paths in round-robin fashion
• path_grouping_policy
  – multibus
  – failover
  – group_by_prio
  – group_by_serial
  – group_by_node

Multipath control creates priority groups.
Paths are grouped based on path_grouping_policy:

MULTIBUS - all paths in one priority group (DDN) (no penalty to access luns via alternate controllers)
FAILOVER - one path per priority group (use only 1 path at a time) (typically only 1 usable path, such as IBM FAStT with AVT disabled)
GROUP_BY_PRIO - paths with the same priority in the same priority group, 1 group for each unique priority (priorities assigned by an external program)
GROUP_BY_SERIAL - paths grouped by SCSI target serial (controller node WWN)
GROUP_BY_NODE - (not tested or researched; never had a need to)
2. Monitor Healthy Paths for Failure
• Path priority
  – Integer value assigned to a path
  – Higher value == higher priority
  – Directly controls priority group selection
• prio_callout
  – 3rd party program to assign priority values to each path

Flow: multipath passes the device name to prio_callout, which returns an integer priority value.

Path grouping policy = group_by_prio
Only matters if using the "group_by_prio" grouping policy.

DIRECTLY CONTROLS PRIORITY GROUP SELECTION
- The priority group with the highest value is the active group

PREVIOUS SLIDE - When all paths in a group have failed, the next group becomes active. That is the priority group with the next highest priority value that has an active path.

PRIO_CALLOUT
- Provided by the vendor or (more typically) a custom script written by the admin for the specific setup
- If not using group_by_prio, set this to /bin/true
2. Monitor Healthy Paths for Failure
• path_checker
  – tur
  – readsector0
  – directio
  – (custom)
    • emc_clariion
    • hp_sw
• no_path_retry
  – queue
  – (N > 0)
  – fail

TUR
- SCSI Test Unit Ready
- Preferred if the lun supports it (OK on DDN, IBM FAStT)
- Does not cause AVT on IBM FAStT
- Does not fill up /var/log/messages on failures

READSECTOR0
- physical lun access via /dev/sdX (IS THIS CORRECT???)

DIRECTIO
- physical lun access via /dev/sgY (IS THIS CORRECT???)

Both readsector0 and directio cause AVT on IBM FAStT, resulting in lun thrashing.
Both readsector0 and directio log "fail" messages in /var/log/messages (could be useful if you want to monitor logs for these events).

NO_PATH_RETRY
- number of retries before failing the path
- queue: queue I/O forever
- (N > 0): queue I/O for N retries, then fail
- fail: fail immediately
3. Monitor failed paths for recovery
• failback
  – immediate (same as n=0)
  – (n > 0)
  – manual

FAILBACK
- When a path recovers, wait n seconds before enabling the path
- The recovered path is added back into multipath's enabled path list
- multipath re-evaluates priority groups and changes the active priority group if needed

MANUAL RECOVERY
- User runs '/sbin/multipath' to update enabled paths and priority groups
Putting it all together

multipaths {
    multipath {
        wwid 3600a0b8000122c6d00000000453174fc
        alias fastt21l0
    }
    multipath {
        wwid 3600a0b80000fd6320000000045317563
        alias fastt21l1
    }
}
devices {
    device {
        vendor "IBM"
        product "1742-900"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        path_grouping_policy group_by_prio
        prio_callout "/usr/local/sbin/path_prio.sh %n"
        path_checker tur
        no_path_retry fail
        failback immediate
    }
}
Putting it all together

/usr/local/etc/primary-paths:
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563 2
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:2 sdd 3600a0b8000122c6d0000000345317524 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:3 sde 3600a0b80000fd6320000000245317593 2
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563 51
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:2 sdk 3600a0b8000122c6d0000000345317524 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:3 sdl 3600a0b80000fd6320000000245317593 51

Flow: multipath passes a device name (e.g. sdb) to path_prio.sh, which finds the matching line in primary-paths and returns the priority (e.g. 50).

PATH_PRIO.SH
- grep the device from the primary-paths file
- return the value from the last column
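The two-step description above can be sketched as a tiny script. Only the script's name, its %n input, and the primary-paths file format come from the slides; the awk one-liner and the PATHS_FILE override are our assumptions:

```shell
# path_prio.sh (sketch): given a device name (%n from multipath.conf),
# look it up in the primary-paths table and print the priority from the
# last column. Each device name appears on exactly one line.
path_prio() {
    dev=$1
    paths_file=${PATHS_FILE:-/usr/local/etc/primary-paths}
    # match the device in column 4 and print the trailing priority
    awk -v d="$dev" '$4 == d { print $NF }' "$paths_file"
}
```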
Demo: Active/Passive Disk
• Host
  – One Emulex LP11000
• Disk
  – IBM DS4500
  – Luns presented through both controllers
  – Luns accessible via only 1 controller at a time
  – AVT enabled

AVT
- A lun will migrate to the alternate controller if requested there
- Tolerance of cable/switch failure
- AVT penalty: lun inaccessible for 5-10 secs while controller ownership changes

SCREENS: /var/log/messages, multi-port-mon, command, script host
1. No luns (ls_fc_luns)
2. /etc/multipath.conf
   1. Multipaths (fastt)
   2. Devices (fastt)
3. /usr/local/sbin/path_prio.sh
   1. Identify controller A, controller B
4. /usr/local/etc/primary-paths
5. Add luns (scan_fc_luns)
   1. See multipath bindings & path_prio.sh output in /var/log/messages
6. View current multipath configuration
   1. multipath -v2 -l
7. Failover test
   1. Script-host: disable disk port A
   2. See multipathd reconfig in /var/log/messages
   3. See I/O path change in multi-port-mon
8. Recover test
   1. Script-host: enable disk port A
Demo: Active/Active Disk
• Host
  – One Emulex LP11000
• Disk
  – DDN 8500
  – Luns accessible via both controllers (no penalty)

SCREENS: multi-port-mon, /var/log/messages, command, script-host
1. /etc/multipath.conf
   1. Devices (DDN) (prio_callout = /bin/true; path_grouping_policy = multibus)
   2. Multipath (DDN)
2. Luns present? (ls_fc_luns) Add luns if needed (scan_fc_luns)
   1. See multipath bindings in /var/log/messages
3. View multipath configuration
   1. multipath -v2 -l
4. Failover test
   1. Expected changes in multi-port-mon
   2. Disable switch port for disk ctlr 1
   3. See failover in /var/log/messages and multi-port-mon
5. Restore ctlr access
   1. Expected changes in multi-port-mon
   2. Enable switch port for disk ctlr 1
   3. See failback in /var/log/messages and multi-port-mon
Path Grouping Policy Matrix

                           1 HBA               2 HBAs
Active/Active              multibus (demo1)    multibus
Active/Passive with AVT    path_prio (demo2)   path_prio
Active/Passive w/o AVT     failover *          failover

* multiple points of failure

ACTIVE/ACTIVE, 2 HBAs
- trivial, same as demo1
- Each HBA sees 1 ctlr
- Can let both HBAs see both ctlrs (4 paths to each lun)
  + Use path_prio if you need to control path usage

ACTIVE/PASSIVE (AVT), 2 HBAs
- trivial, similar to demo2

ACTIVE/PASSIVE (no AVT), 1 HBA
- Tolerant of ctlr failure only
- If anything else fails, luns will not AVT to the alternate ctlr and the host will lose access

ACTIVE/PASSIVE (no AVT), 2 HBAs
- Non-preferred paths will be failed
- Each HBA must have full access to both controllers
Linux Multipath Errata
• Making changes to multipath.conf
  – Stop multipathd service
  – Clear multipath bindings
    • /sbin/multipath -F
  – Create new multipath bindings
    • /sbin/multipath -v2 -l
  – Start multipathd service
• Cannot multipath root or boot device
• user_friendly_names
  – Not really; just random names dm-1, dm-2 …

CANNOT MULTIPATH ROOT OR BOOT DEVICE
- per ap-rhcs-dm-multipath-usagetxt.html (see references section)
Linux Multipath Resources
• multipath.conf.annotated
• man multipath
• http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
  – Multipath tools official home
• http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/ap-rhcs-dm-multipath-usagetxt.html
  – Description of output (multipath -v2 -l)
• http://kbase.redhat.com/faq/FAQ_85_7170.shtm
  – Setup device-mapper multipathing in Red Hat Enterprise Linux 4
• http://dims.ncsa.uiuc.edu/set/san
  – multi-port-mon
  – Set switchport state: (en/dis)able switch port via SNMP

MULTIPATH.CONF.ANNOTATED (RedHat)
- /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.annotated
- Recommended