
Page 1: VMS Clusters: Advanced Concepts

VMS Clusters: Advanced Concepts

CETS2001 Seminar 1090

Sunday, September 9, 2001, 210A

Keith Parris

Page 2: VMS Clusters: Advanced Concepts

Speaker Contact Info

Keith Parris
E-mail: [email protected] or [email protected]
Web: http://www.geocities.com/keithparris/ and http://encompasserve.org/~kparris/

Integrity Computing, Inc.
2812 Preakness Way
Colorado Springs, CO 80916-4375
(719) 392-6696

Page 3: VMS Clusters: Advanced Concepts

Topics to be covered

• Large Clusters

• Multi-Site Clusters

• Disaster-Tolerant Clusters

• Long-Distance Clusters

• Performance-Critical Clusters

• Recent Cluster Developments

Page 4: VMS Clusters: Advanced Concepts

Large Clusters

• Large by what metric?
  – Large in node-count
  – Large in CPU power and/or I/O capacity
  – Large in geographical span

Page 5: VMS Clusters: Advanced Concepts

Large Node-Count Clusters

• What is a “large” number of nodes for a cluster?

• VMS Cluster Software SPD limit:
  – 96 VMS nodes total
  – 16 VMS nodes per CI Star Coupler

Page 6: VMS Clusters: Advanced Concepts

Large Node-Count Clusters

• What does a typical large node-count cluster configuration consist of?
  – Several core boot and disk servers
  – Lots & lots of workstation satellite nodes

Page 7: VMS Clusters: Advanced Concepts

Large Node-Count Clusters

• Why build a large node-count cluster?
  – Shared access to resources by a large number of users
    • Particularly workstation users
  – Easier system management of a large number of VMS systems
    • Managing two clusters is close to twice the work of managing one
    • Adding just 1 more node to an existing cluster is almost trivial in comparison

Page 8: VMS Clusters: Advanced Concepts

Large Node-Count Clusters

• Challenges in building a large node-count cluster:
  – System (re)booting activity
  – LAN problems
  – System management
  – Hot system files

Page 9: VMS Clusters: Advanced Concepts

System (re)booting activity

• Reboot sources:
  – Power failures
  – LAN problems
  – VMS or software upgrades
  – System tuning

Page 10: VMS Clusters: Advanced Concepts

System (re)booting activity

• Fighting reboot pain:
  – Power failures:
    • UPS protection
  – LAN problems:
    • Redundancy
    • Sub-divide LAN to allow “Divide and conquer” troubleshooting technique
    • Monitoring

Page 11: VMS Clusters: Advanced Concepts

System (re)booting activity

• Fighting reboot pain:
  – VMS or software upgrades; patches:
    • Try to target safe “landing zones”
    • Set up for automated reboots
  – System tuning:
    • Run AUTOGEN with FEEDBACK on a regular schedule, and pick up the new parameters during periodic automated reboots (see the sketch below)
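A minimal sketch of such a scheduled run, assuming a weekly batch job (the procedure name, queue, and schedule below are examples, not part of the presentation):

$! WEEKLY_AUTOGEN.COM -- gather FEEDBACK data and compute new parameter
$! values without rebooting; the values take effect at the next planned
$! automated reboot.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK
$!
$! Resubmit this procedure to run again in a week.
$ SUBMIT/QUEUE=SYS$BATCH/AFTER="+7-00:00:00" SYS$MANAGER:WEEKLY_AUTOGEN.COM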

Page 12: VMS Clusters: Advanced Concepts

System (re)booting activity

• Factors affecting (re)boot times:
  – System disk throughput
  – LAN bandwidth, latency, and quality
  – Boot and disk server horsepower

Page 13: VMS Clusters: Advanced Concepts

System disk throughput

• Off-load work from system disk
  – Move re-directable files (SYSUAF, RIGHTSLIST, queue file, etc.) off the system disk
  – Put page/swap files on another disk (preferably local to satellite nodes)
  – Install applications on non-system disks when possible
  – Dump files off the system disk to conserve space
• All this reduces write activity to the system disk, making shadowset or mirrorset performance better

Page 14: VMS Clusters: Advanced Concepts

System disk throughput

• Avoid disk rebuilds at boot time:
  – Set ACP_REBLDSYSD=0 to prevent boot-time rebuild of the system disk
  – While you’re at it, use MOUNT/NOREBUILD on all disks mounted in startup
  – But remember to set up a batch job to do $ SET VOLUME/REBUILD commands during off-hours, to free up disk blocks incorrectly left marked allocated when nodes crash (blocks which were in the node’s free-extent cache); see the sketch below
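A minimal sketch of such an off-hours job (device names, the procedure name, and the schedule are examples only):

$! NIGHTLY_REBUILD.COM -- rebuild the volumes that were mounted /NOREBUILD
$! during startup, reclaiming blocks left marked allocated by node crashes.
$ SET VOLUME/REBUILD SYS$SYSDEVICE:
$ SET VOLUME/REBUILD DSA1:          ! example shadowed data disk
$ SET VOLUME/REBUILD DSA2:
$!
$! Resubmit for tomorrow night.
$ SUBMIT/QUEUE=SYS$BATCH/AFTER="TOMORROW+02:00" SYS$MANAGER:NIGHTLY_REBUILD.COM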

Page 15: VMS Clusters: Advanced Concepts

System disk throughput

• Faster hardware:
  – Caching in the disk controller
  – 10K rpm or 15K rpm magnetic drives
  – Solid-state disks
  – DECram disks (shadowed with non-volatile disks)

Page 16: VMS Clusters: Advanced Concepts

System disk throughput

• Multiple system disk spindles
  – Host-based volume shadowing allows up to 3 copies of each system disk
  – Controller-based mirroring allows up to 6 spindles in each mirrorset
  – Controller-based striping or RAID-5 allows up to 14 spindles in a storageset
  – And you can layer these

Page 17: VMS Clusters: Advanced Concepts

System disk throughput

• Multiple separate system disks for groups of nodes
  – Use “cloning” technique to replicate system disks and avoid doing “n” upgrades for “n” system disks

• Consider throttling satellite boot activity to limit demand

Page 18: VMS Clusters: Advanced Concepts

System disk “Cloning” technique

• Create “Master” system disk with roots for all nodes. Use Backup to create Clone system disks.

• Before an upgrade, save any important system-specific info from Clone system disks into the corresponding roots on the Master system disk
  – Basically anything that’s in SYS$SPECIFIC:[*]
  – Examples: ALPHAVMSSYS.PAR, MODPARAMS.DAT, AGEN$FEEDBACK.DAT
• Perform upgrade on Master disk
• Use Backup to copy Master to Clone disks again (see the example below)
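A minimal sketch of the re-cloning step, assuming the Clone disk can be taken out of use while it is copied (device names and the volume label are examples):

$ MOUNT/NOWRITE $1$DKA0: MASTER_SYS          ! upgraded Master system disk
$ MOUNT/FOREIGN $1$DKA100:                   ! Clone system disk
$ BACKUP/IMAGE/VERIFY $1$DKA0: $1$DKA100:    ! copy Master onto Clone
$ DISMOUNT $1$DKA100:
$ DISMOUNT $1$DKA0: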

Page 19: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

– Divide LAN into multiple segments
– Connect systems with switches or bridges instead of contention-based hubs
– Use full-duplex links when possible

Page 20: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

– Use faster LAN technology at concentration points like backbones and at servers:

• e.g. if using Fast Ethernet for satellites, consider using Gigabit Ethernet for server LAN adapters

– Provide redundant LANs for servers, backbone

Page 21: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

– Try to avoid saturation of any portion of LAN hardware

– Bridge implementations must not drop small packets under heavy loads

• SCS Hello packets are small packets

• If two in a row get lost, a node without redundant LANs will see a Virtual Circuit closure; if failure lasts too long, node will do a CLUEXIT bugcheck

Page 22: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

– Riding through temporary LAN problems while you troubleshoot:

• Raise RECNXINTERVAL parameter

– Default is 20 seconds

– It’s a dynamic parameter
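The slide above notes that RECNXINTERVAL is dynamic; a minimal sketch of raising it on a running node (60 seconds is just an example value):

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET RECNXINTERVAL 60
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT

Remember to lower it again (or record the change in MODPARAMS.DAT) once the underlying LAN problem is fixed.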

Page 23: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

• Where redundant LAN hardware is in place, use the LAVC$FAILURE_ANALYSIS tool from SYS$EXAMPLES:

– It monitors and reports, via OPCOM messages, LAN component failures and repairs

– Described in Appendix D of the OpenVMS Cluster Systems Manual

– Workshop 1257: Network Monitoring for LAVCs
  • Tuesday 1:00 pm, Room 208A
  • Thursday 8:00 am, Room 208A

Page 24: VMS Clusters: Advanced Concepts

LAN bandwidth, latency, and quality

– VOTES:
  • Most configurations with satellite nodes give votes to disk/boot servers and set VOTES=0 on all satellite nodes

• If the sole LAN adapter on a disk/boot server fails, and it has a vote, ALL satellites will CLUEXIT!

• Advice: give at least as many votes to node(s) on the LAN as any single server has, or configure redundant LAN adapters

Page 25: VMS Clusters: Advanced Concepts

LAN redundancy and Votes

[Diagram: a single LAN connecting three satellite nodes with 0 votes each and two server nodes with 1 vote each]

Page 26: VMS Clusters: Advanced Concepts

LAN redundancy and Votes

[Diagram: satellites with 0 votes each, servers with 1 vote each]

Page 27: VMS Clusters: Advanced Concepts

LAN redundancy and Votes

[Diagram: the cluster partitioned into Subset A and Subset B; satellites carry 0 votes each, servers 1 vote each]

Which subset of nodes does VMS select as the optimal subcluster?

Page 28: VMS Clusters: Advanced Concepts

LAN redundancy and Votes

[Diagram: same vote layout: satellites with 0 votes each, servers with 1 vote each]

One possible solution: redundant LAN adapters on servers

Page 29: VMS Clusters: Advanced Concepts

LAN redundancy and Votes

[Diagram: satellites with 1 vote each, servers with 2 votes each]

Another possible solution: Enough votes on LAN to outweigh any single server node

Page 30: VMS Clusters: Advanced Concepts

Boot and disk server horsepower

• MSCP-serving is done in interrupt state on the Primary CPU
  – Interrupts from LAN adapters come in on CPU 0 (the Primary CPU)
• Multiprocessor system may have no more MSCP-serving capacity than a uniprocessor
  – Fast_Path on CI may help

Page 31: VMS Clusters: Advanced Concepts

Large Node-Count Cluster System Management

• Console management software is very helpful for reboots & troubleshooting

• If that’s not available, consider using console firmware’s MOP Trigger Boot function to trigger boots in small waves after a total shutdown

• Alternatively, satellites can be shut down with auto-reboot and then MOP boot service can be disabled, either on an entire boot server, or for individual satellite nodes, to control rebooting

Page 32: VMS Clusters: Advanced Concepts

Hot system files

• Standard multiple-spindle techniques also apply here:
  – Disk striping (RAID-0)
  – Volume Shadowing (host-based RAID-1)
  – Mirroring (controller-based RAID-1)
  – RAID-5 array (host- or controller-based)

• Consider solid-state disk for hot system files, such as SYSUAF, queue file, etc.

Page 33: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Why build a large cluster, in terms of CPU and/or I/O capacity?
  – Handle high demand for the same application(s)
  – Pool resources to handle several applications with lower overall costs and system management workload than separate clusters

Page 34: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Risks:
  – “All eggs in one basket”
  – Hard to schedule downtime
    • Too many applications, with potentially different availability requirements
  – System tuning and performance
    • Which application do you optimize for?
    • Applications may have performance interactions

Page 35: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Plan, configure, and monitor to avoid bottlenecks (saturation of any resource) in all areas:
  – CPU
  – Memory
  – I/O
  – Locking

Page 36: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Generally easiest to scale CPU by first adding CPUs within SMP boxes
  – within limits of VMS or applications’ SMP scalability
• Next step is adding more systems
  – But more systems implies less local locking
  – Local locking code path length and latency are much lower than remote (order-of-magnitude)

Page 37: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Memory scaling is typically easy with 64-bit Alphas: buy more memory
  – May require adding nodes eventually

Page 38: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• I/O scalability is generally achieved by using:
  – Multiple I/O adapters per system
  – More disk controllers, faster controllers, more controller cache
  – More disks; faster disks (solid-state)
  – Disk striping, mirroring, shadowing

Page 39: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Challenges in I/O scalability:
  – CPU 0 interrupt-state saturation
  – Interconnect load balancing

Page 40: VMS Clusters: Advanced Concepts

CPU 0 interrupt-state saturation

• VMS receives interrupts on CPU 0 (the Primary CPU)
• If the interrupt workload exceeds the capacity of the primary CPU, odd symptoms can result
  – CLUEXIT bugchecks, performance anomalies
• VMS has no internal feedback mechanism to divert excess interrupt load
  – e.g. a node may take on more trees to lock-master than it can later handle

• Use MONITOR MODES/CPU=0/ALL to track CPU 0 interrupt state usage and peaks

Page 41: VMS Clusters: Advanced Concepts

CPU 0 interrupt-state saturation

• FAST_PATH capability can move some of the interrupt activity to non-primary CPUs
• Lock mastership workload can be a heavy contributor to CPU 0 interrupt state
  – May have to control or limit this workload

Page 42: VMS Clusters: Advanced Concepts

Interconnect load balancing

• SCS picks path in fixed priority order:
  1. Galaxy Shared Memory Cluster Interconnect (SMCI)
  2. Memory Channel
  3. CI
  4. DSSI
  5. LANs, based on:
     a) Maximum packet size, and
     b) Lowest latency

• PORT_CLASS_SETUP tool available from CSC to allow you to change order of priority if needed

• e.g. to prefer Gigabit Ethernet over DSSI

Page 43: VMS Clusters: Advanced Concepts

Interconnect load balancing

• CI Port Load Sharing code didn’t get ported from VAX to Alpha
  – MOVE_REMOTENODE_CONNECTIONS tool available from CSC to allow you to statically balance VMS$VAXcluster SYSAP connections across multiple CI adapters

Page 44: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Locking performance scaling is generally done by:
  – Improving CPU speed (to avoid CPU 0 interrupt-state saturation)
  – Improving cluster interconnect performance (lower latency, higher bandwidth, and minimizing host CPU overhead)
  – Spreading locking workload across multiple systems

Page 45: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Locking performance scaling
  – Check SHOW CLUSTER/CONTINUOUS with ADD CONNECTIONS, ADD REM_PROC and ADD CR_WAITS to check for SCS credit waits. If counts are present and increasing over time, increase the SCS credits at the remote end as follows:
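Before the parameter changes on the next two slides, a minimal sketch of the check itself, using the ADD commands named above at the SHOW CLUSTER command prompt (exact field names may vary slightly by VMS version):

$ SHOW CLUSTER/CONTINUOUS
Command > ADD CONNECTIONS
Command > ADD REM_PROC
Command > ADD CR_WAITS

A CR_WAITS count that keeps growing on a VMS$VAXcluster connection indicates a credit shortage at the remote end.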

Page 46: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Locking performance scaling
  – For credit waits on VMS$VAXcluster SYSAP connections:
    • Increase CLUSTER_CREDITS parameter
    • Default is 10; maximum is 127
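For example, in MODPARAMS.DAT on the node at the remote end of the connection (32 is only an example value), followed by an AUTOGEN run and a reboot at a convenient time:

! Raise SCS credits offered on VMS$VAXcluster SYSAP connections
! (default 10, maximum 127 per the slide above).
MIN_CLUSTER_CREDITS = 32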

Page 47: VMS Clusters: Advanced Concepts

High-Horsepower Clusters

• Locking performance scaling
  – For credit waits on VMS$DISK_CL_DRVR / MSCP$DISK connections:
    • For a VMS server node, increase the MSCP_CREDITS parameter. Default is 8; maximum is 128.
    • For an HSJ/HSD controller, lower MAXIMUM_HOSTS from the default of 16 to the actual number of VMS systems on the CI/DSSI interconnect
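For example, in MODPARAMS.DAT on the VMS node doing the MSCP serving (again, 32 is only an example value), followed by AUTOGEN and a reboot; the HSJ/HSD MAXIMUM_HOSTS change is made on the controller itself:

! Raise SCS credits offered on MSCP served-disk connections
! (default 8, maximum 128 per the slide above).
MIN_MSCP_CREDITS = 32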

Page 48: VMS Clusters: Advanced Concepts

Multi-Site Clusters

• Consist of multiple “Lobes” with one or more systems, in different locations

• Systems in each “Lobe” are all part of the same VMS Cluster and can share resources

• Sites typically connected by bridges (or bridge-routers; pure routers don’t pass SCS traffic)

Page 49: VMS Clusters: Advanced Concepts

Multi-Site Clusters

• Sites linked by:
  – DS-3/T3 (E3 in Europe) or ATM Telco circuits
  – Microwave link: DS-3/T3 or Ethernet
  – “Dark fiber” where available:
    • FDDI: 40 km with single-mode fiber; 2 km with multi-mode fiber
    • Ethernet over fiber (10 Mb, Fast, Gigabit)
    • Fiber links between Memory Channel switches; up to 3 km
    • Dense Wave Division Multiplexing (DWDM), then ATM

Page 50: VMS Clusters: Advanced Concepts

Multi-Site Clusters

• Inter-site link minimum standards are in the OpenVMS Cluster Software SPD:
  – 10 megabits minimum data rate
  – “Minimize” packet latency
  – Low SCS packet retransmit rate:
    • Less than 0.1% retransmitted. Implies:
      – Low packet-loss rate for bridges
      – Low bit-error rate for links

Page 51: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Goal: Survive loss of up to one entire datacenter

• Foundation:
  – Two datacenters a “safe” distance apart
  – VMS Cluster Software for coordination
  – Bridge with WAN link for cluster interconnect
  – Volume shadowing for two identical copies of data, one at each site

Page 52: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Foundation:
  – Management and monitoring tools

• Remote system console access

• Failure detection and alerting, for things like:

– Network (especially inter-site link) monitoring

– Shadowset member loss

– Node crash

• Quorum recovery tool (for 2-site, balanced votes)

Page 53: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Foundation:
  – Configuration planning and implementation assistance, and staff training
    • Compaq recommends Disaster Tolerant Cluster Services (DTCS) with the Heroix-based toolset
      – Formerly Business Recovery Server (BRS) with the Polycenter toolset
      – and before that, Multi-Datacenter Facility (MDF)

Page 54: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Foundation:
  – Carefully-planned procedures for:

• Normal operations

• Scheduled downtime and outages

• Detailed diagnostic and recovery action plans for various failure scenarios

Page 55: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Management and monitoring toolset choices:
  – Remote system console access
    • CA Manage/IT (formerly Polycenter Console Manager); Heroix RoboCentral; TECSys Development’s ConsoleWorks; Ki Networks’ CLIM
  – Failure detection and alerting
    • Heroix RoboMon; CA Watch/IT (formerly Polycenter Watchdog); Compaq CockpitMgr; BMC Patrol

Page 56: VMS Clusters: Advanced Concepts

Disaster-Tolerant Clusters

• Management and monitoring toolset choices:
  – Network monitoring
    • Unicenter TNG, HP OpenView, Tivoli, ClearViSN, CockpitMgr, etc.
  – Quorum recovery tool
    • DECamds / Availability Manager
    • DTCS or BRS integrated tools (which talk to DECamds’ RMDRIVER)

Page 57: VMS Clusters: Advanced Concepts

Application Scheme 1: Hot Primary/Cold Standby

• All applications normally run at the primary site
  – Second site is idle, except for remote shadowing, until the primary site fails, then it takes over processing
• Performance will be good (all-local locking)
• Fail-over time will be poor, and risk high (standby systems not active and thus not being tested)
• Wastes computing capacity at the remote site

Page 58: VMS Clusters: Advanced Concepts

Application Scheme 2: Hot/Hot but Alternate Workloads

• All applications normally run at one site or the other, but not both; the opposite site takes over upon a failure
• Performance will be good (all-local locking)
• Fail-over time will be poor, and risk moderate (standby systems in use, but specific applications not active and thus not being tested from that site)

• Second site’s computing capacity is actively used

Page 59: VMS Clusters: Advanced Concepts

Application Scheme 3: Uniform Workload Across Sites

• All applications normally run at both sites simultaneously; surviving site takes all load upon failure

• Performance may be impacted (some remote locking) if inter-site distance is large

• Fail-over time will be excellent, and risk low (standby systems are already in use running the same applications, thus constantly being tested)

• Both sites’ computing capacity is actively used, and load can be evenly balanced across sites

Page 60: VMS Clusters: Advanced Concepts

Quorum configurations in Multi-Site Clusters

• 3 sites, equal votes in 2 sites
  – Intuitively ideal; easiest to manage & operate
  – 3rd site serves as tie-breaker
  – 3rd site might contain only a “quorum system”

Page 61: VMS Clusters: Advanced Concepts

Quorum configurations in Multi-Site Clusters

• 3 sites, equal votes in 2 sites
  – Rare in practice, presumably due to the cost of inter-site links beyond on-campus distances
    • Could use links to the quorum site as backup for the main inter-site link if the links are high-bandwidth and bridged together
    • Could use 2 less-expensive, lower-bandwidth links to the quorum site (i.e. 10 megabit), but not bridge them, to lower cost

Page 62: VMS Clusters: Advanced Concepts

Quorum configurations in 3-Site Clusters

[Diagram: three-site configuration with nodes (N) and bridges (B); the two main sites are joined by T3, E3, ATM, or GbE links, and each is connected to the quorum site by a 10 megabit, lower-cost link]

Page 63: VMS Clusters: Advanced Concepts

Quorum configurations in Multi-Site Clusters

• 2 sites:
  – Most common & most problematic:

• How do you arrange votes? Balanced? Unbalanced?

• If votes are balanced, how do you recover from loss of quorum which will result when either site or the inter-site link fails?

Page 64: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Unbalanced Votes
  – More votes at one site
  – Site with more votes can continue without human intervention in the event of loss of the other site or the inter-site link
  – Site with fewer votes hangs on a failure, and requires manual action to continue after loss of the other site

Page 65: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Unbalanced Votes
  – Very common in remote-shadowing-only clusters (not full disaster-tolerant); 0 votes is a common choice for the remote site

Page 66: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Unbalanced Votes
  – Common mistake: 0 votes at the remote site means all remote-site nodes leave the cluster (and take all the remote shadowset members with them) if ANY system with non-zero VOTES loses its LAN adapter
    • Solution is to give the remote node(s) at least as many votes in total as any one local system

Page 67: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

[Diagram: two-site cluster; after the failure, nodes at the site with 0 votes CLUEXIT, while nodes at the site holding the votes continue]

Page 68: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Unbalanced Votes
  – Common mistake: give more votes to the Primary site; leave the Standby site unmanned
    • Cluster can’t run without the Primary, or without human intervention at the unmanned Standby site if the Primary site fails

Page 69: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Balanced Votes
  – Equal votes at each site
  – Manual action required to restore quorum and continue processing in the event of either:
    • Site failure, or
    • Inter-site link failure

Page 70: VMS Clusters: Advanced Concepts

Quorum configurations in Two-Site Clusters

• Balanced Votes
  – Note: Using the REMOVE_NODE option in SHUTDOWN.COM (post V6.2) when taking down a node effectively “unbalances” the votes

Page 71: VMS Clusters: Advanced Concepts

Quorum Recovery Methods

1. Software interrupt at IPL 12 from the console
   • IPC> Q
2. DECamds Console:
   • System Fix; Adjust Quorum

3. DTCS or BRS integrated tool, using same RMDRIVER (DECamds client) interface

Page 72: VMS Clusters: Advanced Concepts

Optimal Sub-cluster Selection

• Connection manager compares potential node subsets that could make up surviving portion of the cluster

1) Pick the sub-cluster with the most votes
2) If votes are tied, pick the sub-cluster with the most nodes
3) If nodes are tied, arbitrarily pick a winner
   • based on comparing SCSSYSTEMID values of the set of nodes with the most-recent cluster software revision

Page 73: VMS Clusters: Advanced Concepts

Shadowing Between Sites

• Shadow copies can generate a high data rate on inter-site links

• Excessive shadow-copy time increases Mean Time To Repair (MTTR) after a site failure or outage

• Acceptable shadow full-copy times and link costs will be the major factors in selecting inter-site link(s)

Page 74: VMS Clusters: Advanced Concepts

Shadowing Between Sites

• Because:
  1) Inter-site latency is typically much greater than intra-site latency, at least if there is any significant distance between sites, and
  2) Direct operations are a minimum of 1-3 ms lower in latency than MSCP-served operations, even when the inter-site distance is small,
  it is most efficient to direct Read operations to the local disks, not the remote disks

– Write operations have to go to all disks in a shadowset, remote as well as local members

Page 75: VMS Clusters: Advanced Concepts

Shadowing Between Sites

• Directing Shadowing Read operations to local disks, in preference to remote disks:

– Set bit 16 (%x10000) in SYSGEN parameter SHADOW_SYS_DISK

– VMS 7.3 (or recent ECO kits) allow you to tell VMS at which site member disks are located
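A minimal sketch of setting the bit with SYSGEN (the %X10001 value is an example for a node whose system disk is itself a shadowset; WRITE CURRENT makes the change take effect at the next reboot):

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW SHADOW_SYS_DISK
SYSGEN> SET SHADOW_SYS_DISK %X10001   ! bit 0 = shadowed system disk, bit 16 = local-read preference
SYSGEN> WRITE CURRENT
SYSGEN> EXIT

Add the same value to MODPARAMS.DAT so a later AUTOGEN run does not undo it.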

Page 76: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Host-Based Volume Shadowing full-copy algorithm is non-intuitive:

1. Read from source disk

2. Do Compare operation with target disk

3. If data is different, write to target disk, then go to Step 1.

Page 77: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Shadow_Server process does the copy I/Os
  – Does one 127-block segment at a time, from the beginning of the disk to the end, with no double-buffering or other speed-up tricks
• The odd algorithm is to ensure correct results on system disks, where a member may be booting or dumping while the shadow copy is going on

Page 78: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Implications:
  – Shadow copy completes fastest if the data is identical beforehand
    • Fortunately, this is the most-common case: re-adding a shadow member into the shadowset again after it was a member before

Page 79: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• If data is very different, empirical tests have shown that it is faster to:

1. Do BACKUP/PHYSICAL from source shadowset to /FOREIGN-mounted target disk

2. Then do shadow copy afterward

than to simply initiate the shadow copy with differing data.

• But be sure to clobber SCB on target disk with an $INITIALIZE (or $MOUNT/OVERRIDE=SHADOW) command before adding new member to shadowset

Page 80: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• For even more speed-up, perform the BACKUP/PHYSICAL on a node on the target side
  – Because remote (MSCP-served) writes take a minimum of 2 round trips, whereas remote reads take a minimum of only 1 round trip

Page 81: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Doing the shadow copy work from a node at the target site, not the source site, is also most efficient, for the same reason. It also uses less inter-site bandwidth.
  – To control which node does the shadow copy (see the sketch below):
    1) Set the dynamic SYSGEN parameter SHADOW_MAX_COPY to a large positive value on target-site node(s)
    2) Set SHADOW_MAX_COPY to 0 on all other nodes
    3) Do the $MOUNT to add the member to the shadowset; wait briefly
    4) Reset the SHADOW_MAX_COPY parameter to its original values on all nodes
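A minimal sketch of that sequence (parameter values, device names, and the volume label are examples; since SHADOW_MAX_COPY is dynamic, USE ACTIVE / WRITE ACTIVE changes it on the running system):

$! On every node EXCEPT the target-site node that should perform the copy:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET SHADOW_MAX_COPY 0
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$!
$! On the target-site node that should perform the copy:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET SHADOW_MAX_COPY 4
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$!
$! Add the member; the copy thread can only run on the node allowed to run one.
$ MOUNT/SYSTEM DSA42:/SHADOW=($1$DGA101:) DATA_DISK
$ WAIT 00:02:00     ! give the copy time to start
$! ...then restore the original SHADOW_MAX_COPY values on all nodes.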

Page 82: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Determining which node is performing a shadow copy:

– Using SDA:
  • From each cluster node, do:

1. SDA> SET PROCESS SHADOW_SERVER

2. SDA> SHOW PROCESS/CHANNELS

3. and look for Busy channel to disk of interest

• Or look for node holding a lock in Exclusive mode on a resource of the form $DSAnnnn$_COPIER

Page 83: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Because of the sequential, non-pipelined nature of the shadow-copy algorithm, to speed shadow copies:
  – Rather than forming controller-based stripesets and shadowing those across sites, shadow individual disks in parallel, and combine them into RAID-0 arrays with host-based RAID software

Page 84: VMS Clusters: Advanced Concepts

Speeding Shadow Copies

• Dividing a disk up into 4 partitions at the controller level, and shadow-copying all 4 in parallel, takes only 40% of the time to shadow-copy the entire disk as a whole

Page 85: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

• Shadowing keeps a “Generation Number” in the SCB on shadow member disks

• Shadowing “Bumps” the Generation number at the time of various shadowset events, such as mounting, or membership changes

Page 86: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

• Generation number is designed to constantly increase over time, never decrease

• Implementation is based on VMS timestamp value, and during a “Bump” operation it is increased to the current time value (or, if it’s already a future time for some reason, such as time skew among cluster member clocks, then it’s simply incremented). The new value is stored on all shadowset members at the time of the Bump.

Page 87: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

• Generation number in SCB on removed members will thus gradually fall farther and farther behind that of current members

• In comparing two disks, a later generation number should always be on the more up-to-date member, under normal circumstances

Page 88: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

• If shadowing can’t “see” a later disk’s SCB (i.e. because the site or link to the site is down), it may use an older member and then update the Generation number to a current timestamp value

• New /POLICY=REQUIRE_MEMBERS qualifier on MOUNT command prevents a mount unless all of the listed members are present for Shadowing to compare Generation numbers on
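For example (device names and the volume label are assumptions), a mount that refuses to proceed unless both listed members are present for the generation-number comparison:

$ MOUNT/SYSTEM DSA42:/SHADOW=($1$DGA101:,$1$DGA201:) -
        /POLICY=REQUIRE_MEMBERS DATA_DISK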

Page 89: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

• Shadow-copy “nightmare scenario”:
  – Shadow copy in the “wrong” direction copies old data over new
• Real-life example:
  – Inter-site link failure occurs
  – Due to unbalanced votes, Site A continues to run
    • Shadowing increases generation numbers on Site A disks after removing Site B members from the shadowset

Page 90: VMS Clusters: Advanced Concepts

Protecting Shadowed Data

– Site B is brought up briefly by itself for whatever reason

• Shadowing can’t see Site A disks. Shadowsets mount with Site B disks only. Shadowing bumps generation numbers on Site B disks. Generation number is now greater than on Site A disks.

– Link gets fixed. Both sites are taken down and rebooted at once.

– Shadowing thinks Site B disks are more current, and copies them over Site A’s. Result: Data Loss.

Page 91: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• After a site failure or inter-site link failure, rebooting the downed site after repairs can be disruptive to the surviving site

• Many DT Cluster sites prevent systems from automatically rebooting without manual intervention
  – Easiest way to accomplish this is to set the console boot flags for conversational boot

Page 92: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• If MOUNT commands are in SYSTARTUP_VMS.COM, shadow copies may start as soon as the first node at the downed site reboots

• Recommendation is to not mount shadowsets automatically at startup; manually initiate shadow copies of application data disks at an opportune time

Page 93: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• In bringing a cluster with cross-site shadowsets completely down and back up, you need to preserve both shadowset members to avoid a full copy operation

• Cross-site shadowsets must be dismounted while both members are still accessible

• This implies keeping MSCP-serving VMS systems up at each site until the shadowsets are dismounted
  – Easy way is to use the CLUSTER_SHUTDOWN option on SHUTDOWN.COM

Page 94: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• In bringing a cluster with cross-site shadowsets back up, you need to ensure both shadowset members are accessible at mount time, to avoid removing a member and thus needing to do a shadow full-copy afterward

• If MOUNT commands are in SYSTARTUP_VMS.COM, the first node up at the first site up will form 1-member shadow sets and drop the other site’s shadow members

Page 95: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• Recommendation is to not mount cross-site shadowsets automatically in startup; wait until at least a couple of systems are up at each site, then manually initiate cross-site shadowset mounts

• Since MSCP-serving is enabled before a node joins a cluster, booting systems at both sites simultaneously works most of the time

Page 96: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• New Shadowing capabilities help in this area:
$ MOUNT DSAnnn: label
  without any other qualifiers will mount a shadowset on an additional node using the existing membership, without the chance of any shadow copies being initiated.

• This allows you to start the application at the second site and run from the first site’s disks, and do the shadow copies later

Page 97: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• DCL code can be written to wait for both shadowset members before MOUNTing, using the /POLICY=REQUIRE_MEMBERS and /NOCOPY qualifiers as safeguards against undesired copies
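A minimal sketch of such DCL, assuming one member at each site (device names, the volume label, and the polling interval are examples):

$! Wait until both cross-site members are visible before mounting.
$WAIT_LOOP:
$ IF .NOT. F$GETDVI("$1$DGA101:","EXISTS") .OR. -
     .NOT. F$GETDVI("$1$DGA201:","EXISTS")
$ THEN
$   WAIT 00:00:30
$   GOTO WAIT_LOOP
$ ENDIF
$! REQUIRE_MEMBERS and NOCOPY guard against a mount that would trigger
$! an unwanted shadow copy.
$ MOUNT/SYSTEM DSA42:/SHADOW=($1$DGA101:,$1$DGA201:) -
        /POLICY=REQUIRE_MEMBERS/NOCOPY DATA_DISK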

Page 98: VMS Clusters: Advanced Concepts

Avoiding Untimely/Unwanted Shadow Copies

• The USERDn SYSGEN parameter may be used to indicate to startup procedures the desired action (see the sketch below):
  – Mount both members (normal case; both sites OK)
  – Mount only the local member (other site is down)
  – Mount only the remote member (other site survived; this site is re-entering the cluster, but deferring shadow copies until later)
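A minimal sketch, assuming USERD1 can be read with F$GETSYI and encodes the action as 0 = both members, 1 = local member only, 2 = remote member only (device names, the volume label, and the encoding are assumptions):

$ ACTION = F$GETSYI("USERD1")
$ IF ACTION .EQ. 0 THEN MEMBERS = "$1$DGA101:,$1$DGA201:"   ! both sites OK
$ IF ACTION .EQ. 1 THEN MEMBERS = "$1$DGA101:"              ! other site down
$ IF ACTION .EQ. 2 THEN MEMBERS = "$1$DGA201:"              ! defer shadow copies
$ MOUNT/SYSTEM DSA42:/SHADOW=('MEMBERS')/POLICY=REQUIRE_MEMBERS DATA_DISK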

Page 99: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance

• Goal is to continue operating despite loss of an entire datacenter
  – All the pieces must be in place to allow that:

• User access to both sites

• Network connections to both sites

• Operations staff at both sites

– Business can’t depend on anything that is only at either site

Page 100: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance

• Sites must be carefully selected to avoid common hazards and loss of both datacenters at once

• Make them a “safe” distance apart
• This must be a compromise. Factors:

– Risks

– Performance (inter-site latency)

– Interconnect costs

– Ease of travel between sites

Page 101: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance: What is a “Safe Distance”

• Analyze likely hazards of proposed sites:
  – Fire (building, forest, gas leak, explosive materials)
  – Storms (Tornado, Hurricane, Lightning, Hail)
  – Flooding (excess rainfall, dam breakage, storm surge, broken water pipe)
  – Earthquakes, Tsunamis

Page 102: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance: What is a “Safe Distance”

• Analyze likely hazards of proposed sites:
  – Nearby transportation of hazardous materials (highway, rail)
  – Terrorist (or disgruntled customer) with a bomb or weapon
  – Civil unrest (riots, vandalism)
  – Enemy attack in war (nearby military or industrial targets)

Page 103: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance: What is a “Safe Distance”

• Select separation direction:
  – Not along the same earthquake fault-line
  – Not along likely storm tracks
  – Not in the same floodplain or downstream of the same dam
  – Not on the same coastline
  – Not in line with prevailing winds (that might carry hazardous materials)

Page 104: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance: What is a “Safe Distance”

• Select separation distance in the safe direction:
  – 1 mile (1-2 km): protects against most building fires, gas leaks, bombs, armed intruders
  – 10 miles (10-20 km): protects against most tornadoes, floods, hazardous material spills
  – 100 miles (100-200 km): protects against most hurricanes, earthquakes, tsunamis, forest fires

Page 105: VMS Clusters: Advanced Concepts

Planning for Disaster Tolerance

• Plan for operation after a disaster; the surviving site will likely have to operate alone for a long period before the other site can be repaired or replaced
  – Provide redundancy within each site

• Mirroring or RAID-5 to protect against disk failures (or enough spare drives to quickly form shadowsets)

– Provide enough capacity within each site to run the business alone

• and handle normal workload growth rate

Page 106: VMS Clusters: Advanced Concepts

Long-Distance Clusters

• The VMS SPD supports a distance of up to 150 miles (250 km) between sites
  – up to 500 miles (800 km) with DTCS or BRS
• Why the limit?
  – Inter-site latency
    • Latency due to the speed of light becomes significant at longer distances: about 1 ms per 50 miles (80 km), round-trip

Page 107: VMS Clusters: Advanced Concepts

Inter-Site Link Choices

• Media possibilities:
  – DS-3/T3 (called E3 in Europe)
  – ATM
  – Dark fiber

• FDDI, Gigabit Ethernet, ATM, or DWDM for several on same fiber

• Fibre Channel inter-site link for storage interconnect can be dark fiber or ATM link

Page 108: VMS Clusters: Advanced Concepts

Inter-Site Link Choices

• Service type choices
  – Telco-provided service, own microwave link, or dark fiber?
  – Dedicated bandwidth, or shared ring?
  – Multiple vendors? Diverse paths?

Page 109: VMS Clusters: Advanced Concepts

Inter-site Round-Trip Latencies

[Chart: approximate inter-site round-trip latencies, in microseconds]

  Local node:          4
  Galaxy SMCI:         100
  Memory Channel:      120
  Gigabit Ethernet:    230
  FDDI:                270
  FDDI via ATM:        285
  CI:                  440
  ATM, 6 miles:        500
  DS3, 250 miles:      4400

Page 110: VMS Clusters: Advanced Concepts

Bandwidth of Inter-Site Link(s)

• Link bandwidth:
  – DS3/T3: 45 Mb/sec (about 4.5 MB/sec)
  – ATM: 155 Mb/sec (100 with GIGAswitch/FDDI)
  – FDDI: 100 Mb/sec, 40 km max. distance
  – FC: 1 Gb/sec, 100 km max. distance
  – Memory Channel 2: 1 Gb/sec, 3 km max. distance

Page 111: VMS Clusters: Advanced Concepts

Bandwidth of Inter-Site Link

• Bandwidth affects performance of:
  – Volume Shadowing full copy
  – Volume Shadowing merge operation
• Link is typically only fully utilized during shadow copies
  – Size link(s) for acceptably small full shadow copy times
  – VMS (PEDRIVER) can use multiple links very well as long as there are multiple nodes per site
    • Significant improvements in this area in VMS 7.3

Page 112: VMS Clusters: Advanced Concepts

Differentiate between latency and bandwidth

• Can’t get around the speed of light and its latency effects over long distances

• Higher-bandwidth link doesn’t mean lower latency

• Multiple links may help latency a bit under heavy loading due to shorter queues

Page 113: VMS Clusters: Advanced Concepts

Local versus Remote operations

• Remote operations suffer inter-site latency:
  – Locking

• Lock request with lock master at remote site

• Lock directory lookups with directory node at remote site

Page 114: VMS Clusters: Advanced Concepts

Local versus Remote operations

• Remote operations suffer inter-site latency:
  – MSCP-served I/Os
    • e.g. shadowset writes
  – Use MONITOR MSCP, SHOW DEVICE/SERVED, and/or AUTOGEN with FEEDBACK to ensure MSCP_BUFFER is high enough to avoid segmenting transfers

– Check SHOW CLUSTER/CONTINUOUS with CR_WAITS and/or AUTOGEN with FEEDBACK to ensure MSCP_CREDITS is high enough to avoid SCS credit waits

Page 115: VMS Clusters: Advanced Concepts

Local versus Remote operations

• Remote operations suffer inter-site latency:
  – Remote MSCP-served writes

• Write requires at least 2 round trips (read takes only 1 round trip)

• Use write-back cache in controller to minimize I/O latency for target disk

Page 116: VMS Clusters: Advanced Concepts

Local versus Remote operations

• Optimize local operations:
  – Read I/Os:

• Be sure to set SHADOW_SYS_DISK bit %X10000 (bit 16) to select local-read optimization

• Specify site for member disks with VMS 7.3 or recent VOLSHAD ECO kits

Page 117: VMS Clusters: Advanced Concepts

Local versus Remote operations

• Optimize local operations:
  – Read I/Os:

• Avoid using FDDI Storage Server versus local CI or DSSI disks in long-distance clusters, because:

– it adds an additional 2-3 milliseconds latency

– SHDRIVER can’t differentiate local from remote disks, so half of all reads will go to the remote site

Page 118: VMS Clusters: Advanced Concepts

Performance-Critical Clusters

• Fast-Failover Clusters
  – For real-time or near-real-time applications
  – State transition impact must be minimized

Page 119: VMS Clusters: Advanced Concepts

Minimizing Impact of State Transitions

• State transition impact is affected by:
  – Frequency/probability of state transitions
  – Failure detection time
  – Length of reconnection interval
  – Number of outstanding locks
  – Presence of a quorum disk
  – Node reboot activity following state transition

Page 120: VMS Clusters: Advanced Concepts

Minimizing Impact of State Transitions

• Frequency/probability of state transitions
  – UPS power protection
  – Patches/fixes in place
  – Minimal number of nodes (3)
  – Stable, proven, tested hardware configuration
  – Protected, isolated cluster interconnects

Page 121: VMS Clusters: Advanced Concepts

Minimizing Impact of State Transitions

• Failure detection time
  – Lower the TIMVCFAIL parameter
• Length of reconnection interval
  – Let Last-Gasp work (bypasses the reconnection interval)
    • i.e. don’t just Halt or power-off the system
  – Lower the RECNXINTERVAL parameter

Page 122: VMS Clusters: Advanced Concepts

Minimizing Impact of State Transitions

• Number of outstanding locks
  – Application design; number of open files
• Presence of a quorum disk
  – Use a quorum node as tie-breaker instead of a quorum disk

Page 123: VMS Clusters: Advanced Concepts

Minimizing Impact of State Transitions

• Node reboot activity following state transition
  – System disk rebuild
    • Set ACP_REBLDSYSD to 0
  – System disk shadow merges
    • Use MSCP (e.g. HSJ) controllers with the Write-Logging Shadowing Assist to allow mini-merges (with DOSD)

Page 124: VMS Clusters: Advanced Concepts

Massive Scalability Challenges

• SMP not fully symmetrical in VMS
  – Interrupts go to the primary CPU
  – No integral feedback mechanism to throttle interrupt load
  – Saturation of CPU 0 in interrupt state causes strange symptoms
    • e.g. failure to service the PEDRIVER timer queue and send out Hello packets may result in CLUEXIT of node(s)

– Spinlock contention puts practical limit on # of CPUs within each box

• This puts upward pressure on cluster node count

Page 125: VMS Clusters: Advanced Concepts

Massive Scalability Challenges

• Lock Mastership workload for a given resource tree is confined to a single node
  – The CPU power of CPU 0 on the lock master node for the busiest resource tree may be a bottleneck for the performance of the entire cluster

Page 126: VMS Clusters: Advanced Concepts

Scalability Challenges

• Interconnects may limit node counts
  – CI node limit: 16 nodes

• CI Star Coupler Extender (CISCE), which allowed up to 32 nodes (16 VMS nodes plus 16 storage nodes), is no longer available

• But HSJ80 with dual CI ports can allow greater than 16 VMS nodes with a creative CI configuration

– Example: 24 or 32 nodes

Page 127: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Configure “extra” redundancy
  – 2N redundancy instead of N+1, and 3N instead of 2N
    • i.e. shadowing instead of RAID-5
  – Layer redundancy when possible
    • e.g. shadowsets of mirrorsets
  – Recognize all potential single points of failure
    • e.g. dual-redundant controller pair tightly coupled
    • e.g. non-mirrored cache memory in controller

Page 128: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Monitor closely
  – Fix broken hardware quickly, because it reduces redundancy and subsequent hardware failures can cause outages

– Quicker mean-time-to-repair means better mean-time-to-failure

Page 129: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Configure reserve capacity
  – Avoid saturation, recovery scenarios, and infrequently-used code paths

Page 130: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Avoid using very-new VMS versions, ECO patch kits, firmware releases, and hardware products
  – Let new VMS releases, ECO kits, and products “age”
  – This lets other sites find the bugs first, and helps avoid installing an ECO kit only to find shortly afterward that it has been put on “Engineering Hold” due to a problem

Page 131: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Allow fail-over mechanisms the freedom to work
  – Leave options open for fail-over in case they are needed:
    • e.g. Rather than leaving preferred paths set permanently down at the HSx controller level, give VMS the freedom to select the path, in anticipation of future failure conditions

• e.g. Leave Ethernet LAN paths enabled for SCS traffic even though FDDI is present and larger packet size is in use

• e.g. Load MSCP server with MSCP_LOAD=2 even on a node which you normally don’t desire to take MSCP serving load

Page 132: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Give time for hardware to mature
  – The second or third revisions of a product often have higher performance and fewer problems than the very first version (and often cost less)
• However, one aspect of “availability” is acceptable performance, so
  – You may have to trade off hardware/software maturity in favor of higher performance to handle high workload growth rates

Page 133: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Test new code and hardware (and any changes) first in a test environment separate from the production environment
  – Ideally, have a full-scale test environment which exactly duplicates the production environment, to allow testing under full (simulated) loading

• Introduce any changes into production cautiously, one node at a time

Page 134: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Introduce new technology using “generational diversity”
  – Instead of using all new technology, use a mix of old (proven) and new (fast, but unproven) technology, backing each other up in redundant configurations

Page 135: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Consider using multiple clusters
  – Provide in-bound user routing across clusters
  – Allow user data to be migrated between clusters on-the-fly, transparent to the user
  – Failure of any single cluster affects only a fraction of the users
  – Partial failure of any cluster can be mitigated by migrating its users to another cluster

Page 136: VMS Clusters: Advanced Concepts

Achieving Extremely High Availability

• Do cluster configuration and application design or re-design with availability in mind:
  – On-line backups
  – Disk defragmentation
  – Indexed file reorganization

Page 137: VMS Clusters: Advanced Concepts

Making System Management of Large Clusters Efficient

• Most large clusters have multiple system disks

• Create a “cluster-common” disk (see the sketch below)
  – Mount it in SYLOGICALS.COM
  – Put all cluster-common files there, and define logicals in SYLOGICALS.COM to point to them:
    • SYSUAF, RIGHTSLIST
    • Queue file, LMF database, etc.
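A minimal sketch of the SYLOGICALS.COM fragment (device name, volume label, and directory layout are assumptions; SYSUAF, RIGHTSLIST, QMAN$MASTER, and LMF$LICENSE are the standard redirection logicals):

$! Mount the cluster-common disk and point the redirectable files at it.
$ MOUNT/SYSTEM/NOASSIST $1$DGA10: CLUCOMMON CLUSTER_COMMON
$ DEFINE/SYSTEM/EXEC SYSUAF       CLUSTER_COMMON:[SYSEXE]SYSUAF.DAT
$ DEFINE/SYSTEM/EXEC RIGHTSLIST   CLUSTER_COMMON:[SYSEXE]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXEC QMAN$MASTER  CLUSTER_COMMON:[SYSEXE]        ! queue file directory
$ DEFINE/SYSTEM/EXEC LMF$LICENSE  CLUSTER_COMMON:[SYSEXE]LMF$LICENSE.LDB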

Page 138: VMS Clusters: Advanced Concepts

Making System Management of Large Clusters Efficient

– Put startup files on the cluster-common disk also, and replace the startup files on all system disks with a pointer to the common one:
  • e.g. SYS$STARTUP:SYSTARTUP_VMS.COM contains only:
$ @CLUSTER_COMMON:SYSTARTUP_VMS
– To allow for differences between nodes, test for the node name in the common startup files, e.g.:
$ NODE = F$GETSYI("NODENAME")
$ IF NODE .EQS. "GEORGE" THEN ...

Page 139: VMS Clusters: Advanced Concepts

Making System Management of Large Clusters Efficient

• Create a MODPARAMS_COMMON.DAT file on the cluster-common disk which contains the system parameter settings common to all nodes
  – For multi-site or disaster-tolerant clusters, also create one of these for each site

• Include an AGEN$INCLUDE line in each node-specific MODPARAMS.DAT to include the common parameter settings
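For example, each node's SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT might start like this (the file and parameter names below the include line are examples):

! Pull in the cluster-wide settings first; node-specific overrides follow.
AGEN$INCLUDE_PARAMS CLUSTER_COMMON:[SYSEXE]MODPARAMS_COMMON.DAT
!
! Node-specific settings:
SCSNODE = "NODE01"
MIN_CLUSTER_CREDITS = 32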

Page 140: VMS Clusters: Advanced Concepts

Recent Cluster Developments

• SCSI Device Naming Changes

• Fibre Channel storage

• Data Replication Manager

Page 141: VMS Clusters: Advanced Concepts

Fibre Channel and SCSI

• Fibre Channel and SCSI are storage-only interconnects
  – They provide access to storage devices and controllers
    • Storage can be shared between several nodes
      – SCSI Bus or SCSI Hub
      – FC Switch
    • Each node can access the storage directly

Page 142: VMS Clusters: Advanced Concepts

Fibre Channel and SCSI in Clusters

• Fibre Channel and SCSI are storage-only interconnects
  – They cannot carry the SCS protocol
    • e.g. Connection Manager and Lock Manager traffic
  – Need an SCS-capable cluster interconnect also
    • Memory Channel, CI, DSSI, FDDI, Ethernet, or Galaxy Shared Memory C.I. (SMCI)

Page 143: VMS Clusters: Advanced Concepts

SCSI and Fibre Channel Notes

• Warning: Features and restrictions are still changing at a rapid rate:
  – Check the OpenVMS web site for the latest information, and the status of relevant ECO kits

Page 144: VMS Clusters: Advanced Concepts

SCSI Device Naming

• Earlier, Node Allocation Class or Port Allocation Class were used

• Now, HSZ Allocation Class can also be used

Page 145: VMS Clusters: Advanced Concepts

SCSI and FC Notes

• Fail-over between a direct path and an MSCP-served path is not supported

Page 146: VMS Clusters: Advanced Concepts

Direct vs. MSCP-Served Paths

[Diagram: two nodes sharing storage through a SCSI hub]

Page 147: VMS Clusters: Advanced Concepts

Direct vs. MSCP-Served Paths

[Diagram: two nodes sharing storage through a SCSI hub]

Page 148: VMS Clusters: Advanced Concepts

Direct vs. MSCP-Served Paths

[Diagram: a node and an FC switch at each of two sites, with a cross-site shadowset]

Page 149: VMS Clusters: Advanced Concepts

Direct vs. MSCP-Served Paths

[Diagram: a node and an FC switch at each of two sites, with a cross-site shadowset]

Page 150: VMS Clusters: Advanced Concepts

Fibre Channel Notes

• After an FC switch, cable, or disk virtual unit reconfiguration, the documentation recommends you do:
SYSMAN> IO SCSI_PATH_VERIFY
SYSMAN> IO AUTOCONFIGURE
  – This is unlike CI or DSSI, where path changes are detected and new devices appear automatically

Page 151: VMS Clusters: Advanced Concepts

Fibre Channel Notes

• Volume Shadowing Mini-Merges are not supported for SCSI or Fibre Channel (yet)
  – A Full Merge will occur after any node crashes while it has a shadowset mounted

Page 152: VMS Clusters: Advanced Concepts

New Failure Scenario: SCS link OK but FC link broken

[Diagram: a node, an FC switch, and a shadowset member at each of two sites; the SCS (LAN) link between the sites is intact, but the FC link between the switches is broken]

Page 153: VMS Clusters: Advanced Concepts

New Failure Scenario: SCS link OK but FC link broken

• With the inter-site FC link broken, Volume Shadowing doesn’t know which member to remove
  – Node on the left wants to remove the right-hand disk, and vice-versa
• New DCL commands allow you to force removal of one member manually
  – Assumes you are logged in already, or SYSUAF, etc. aren’t on a cross-site shadowed disk
  – “ICKI” solution

• Solved by direct-to-MSCP-served failover

Page 154: VMS Clusters: Advanced Concepts

Data Replication Manager

[Diagram: a node, an FC switch, and an HSG80 controller at each of two sites; a mirrorset spans the two HSG80s]

Page 155: VMS Clusters: Advanced Concepts

Data Replication Manager

• Multi-site mirroring between HSG80 controllers

• Host writes to one HSG80
  – That HSG80 sends the data to the HSG80 at the opposite site

Page 156: VMS Clusters: Advanced Concepts

Data Replication Manager

[Diagram: a host write goes to the HSG80 in charge of the mirrorset, which writes locally and forwards the write to the HSG80 at the other site]

Page 157: VMS Clusters: Advanced Concepts

Data Replication Manager

• Since the HSG80 controller coordinates writes to a mirrorset, the mirrorset is visible on, and can only be accessed through, one HSG80 at a time

Page 158: VMS Clusters: Advanced Concepts

Data Replication Manager

• Nodes at the opposite site from the controller in charge of the mirrorset will have to do remote I/Os, both for writes and for reads
  – Storage Engineering plans the ability to access data simultaneously from both sites in a future “DRM II” implementation

Page 159: VMS Clusters: Advanced Concepts

Data Replication Manager

[Diagram: the two-site DRM configuration; writes flow through the HSG80 in charge of the mirrorset]

Page 160: VMS Clusters: Advanced Concepts

Data Replication Manager

[Diagram: the two-site DRM configuration after the HSG80 in charge of the mirrorset changes]

Nodes must now switch to access data through this controller

Page 161: VMS Clusters: Advanced Concepts

Data Replication Manager

• Because data can be accessed only through one controller at a time, failover is messy and involves manual or, at best, scripted changes

Page 162: VMS Clusters: Advanced Concepts

Speaker Contact Info

Keith Parris
E-mail: [email protected] or [email protected]
Web: http://www.geocities.com/keithparris/ and http://encompasserve.org/~kparris/

Integrity Computing, Inc.
2812 Preakness Way
Colorado Springs, CO 80916-4375
(719) 392-6696