Veritas Cluster Server 6.0 for UNIX: Install and Configure Lesson 13: Cluster Communications


  • Veritas Cluster Server 6.0 for UNIX: Install and Configure Lesson 13: Cluster Communications

  • Lesson introduction (course outline)

    Lesson 1: High Availability Concepts
    Lesson 2: VCS Building Blocks
    Lesson 3: Preparing a Site for VCS
    Lesson 4: Installing VCS
    Lesson 5: VCS Operations
    Lesson 6: VCS Configuration Methods
    Lesson 7: Preparing Services for VCS
    Lesson 8: Online Configuration
    Lesson 9: Offline Configuration
    Lesson 10: Configuring Notification
    Lesson 11: Handling Resource Faults
    Lesson 12: Intelligent Monitoring Framework
    Lesson 13: Cluster Communications
    Lesson 14: Protecting Data Using SCSI 3-Based Fencing
    Lesson 15: Coordination Point Server

  • Lesson objectives

    Topics and objectives:
    - VCS communications review: Describe how components communicate in a VCS environment.
    - Cluster membership: Describe how VCS determines cluster membership.
    - Cluster interconnect configuration: Describe the files that specify the cluster interconnect configuration.
    - Joining the cluster membership: Describe how systems join the cluster membership.
    - Changing the interconnect configuration: Change the cluster interconnect configuration.

  • VCS communications review: After completing this topic, you will be able to describe how components communicate in a VCS environment.

  • On-node and off-node communication (diagram: agents and HAD on each node, communicating through GAB and LLT across the cluster interconnect)

    - LLT sends a heartbeat on each interface every half second (every second on low-priority links).
    - Each LLT module tracks the status of heartbeats from each peer on each interface.
    - LLT forwards the heartbeat status of each node to GAB.

  • Cluster interconnect specifications

  • Cluster membership: After completing this topic, you will be able to describe how VCS determines cluster membership.

  • GAB status and membership notation

    # gabconfig -a
    GAB Port Memberships
    ===============================================
    Port a gen f7c001 membership 01        ;         ;12
    Port b gen f7c004 membership 01        ;         ;12
    Port h gen f7c002 membership 01        ;         ;12

    Reading the output (cluster with four member nodes: 0, 1, 21, 22):
    - Port a: GAB is communicating; Port b: fencing is communicating; Port h: HAD is communicating.
    - "01" at the left: nodes 0 and 1 are members.
    - The first ";" is the 10s placeholder (a 0 is displayed there if node 10 is a member).
    - The second ";" is the 20s placeholder; the "12" after it means nodes 21 and 22 are members.

    A 22-node cluster with node IDs 0 through 21 appears as:
    Port a gen a4e095 membership 0123456789012345678901
    Port b gen a4e098 membership 0123456789012345678901
    Port h gen a4e096 membership 0123456789012345678901

  • LLT link status

    s1# lltstat -nvv active
    LLT node information:
        Node       State    Link    Status    Address
       * 0 s1      OPEN     dev1    UP        00:0C:29:97:FB:5D
                            dev2    UP        00:0C:29:97:FB:67
         1 s2      OPEN     dev1    UP        00:0C:29:C4:1D:0A
                            dev2    UP        00:0C:29:C4:1D:14

    * indicates which system runs the command.
    lltconfig: status of the LLT protocol
    lltstat -n[vv] [active | configured]
      -nvv: status of nodes, with very verbose output
      active: status of only active nodes
      configured: status of all configured nodes

    # lltconfig
    LLT is running

  • Cluster interconnect configuration: After completing this topic, you will be able to describe the files that specify the cluster interconnect configuration.

  • The llttab file (/etc/llttab)

    - Assigns node numbers to systems
    - Sets the cluster ID number to identify a system with a cluster
    - Specifies the network devices used for the cluster interconnect
    - Modifies default LLT behavior, such as heartbeat frequency

    set-cluster 10
    set-node s1
    link nxge0 /dev/nxge:0 - ether - -
    link nxge4 /dev/nxge:4 - ether - -

  • The llttab file on AIX

    # cat /etc/llttab
    set-node S1
    set-cluster 10
    link en1 /dev/en:1 - ether - -
    link en2 /dev/en:2 - ether - -

    link directive fields, in order: Tag Name, Device:Unit, Range (- = all), Link Type, SAP, MTU

  • The llttab file on HP-UX

    # cat /etc/llttab
    set-node S1
    set-cluster 10
    link lan1 /dev/lan:1 - ether - -
    link lan2 /dev/lan:2 - ether - -

    Highlighted fields: Range (- = all), SAP

  • The llttab file on Linux

    # cat /etc/llttab
    set-node S1
    set-cluster 10
    link eth1 eth1 - ether - -
    link eth2 eth-00:05:56:3f:02:2f - ether 0xCAFD -

    link directive fields, in order: Tag Name, Device, Range (- = all), Link Type, SAP, MTU

  • The llthosts file (/etc/llthosts)

    - Associates a system name with a VCS cluster node ID
    - Has the same entries on all systems
    - Maps unique node numbers to system names
    - Matches system names with llttab and main.cf
    - Matches system names with sysname

    0 s1
    1 s2

  • How node and cluster numbers are specified

    /etc/llttab:
    set-node s1
    set-cluster 10
    link qfe0 /dev/qfe:0 - ether - -
    link qfe4 /dev/qfe:4 - ether - -

    /etc/llthosts:
    0 s1
    1 s2

    Node numbers range from 0 to 63; cluster ID numbers range from 0 to 64K - 1.

  • The sysname file

    - Contains a short-form system name used by VCS
    - Removes the VCS dependency on the UNIX uname command
    - Is optional, but created by default during cluster configuration using CPI
    - Can be specified in llttab so the same file can be copied to multiple nodes

  • The GAB configuration file

    The /etc/gabtab file:
    - Contains the command to start GAB: /sbin/gabconfig -c -n number_of_systems
    - Specifies the number of systems that must be communicating to allow VCS to start.

    /sbin/gabconfig -c -n 4

  • Cluster startup: After completing this topic, you will be able to describe how systems join the cluster membership and services come online.

  • Seeding during startup (diagram: each node broadcasts "I am alive" heartbeats on the cluster interconnect)

  • How LLT and GAB are started automatically on AIX (/etc/rc.d/rc2.d)

    S70llt: /sbin/lltconfig -c (if /etc/llttab exists)
    S92gab: sh /etc/gabtab (runs /sbin/gabconfig -c -n #)
    S99vcs: hastart

  • How LLT and GAB are started automatically on HP-UX

    /sbin/rc2.d/S680llt: /sbin/lltconfig -c (if /etc/llttab exists)
    /sbin/rc2.d/S920gab: sh /etc/gabtab (runs /sbin/gabconfig -c -n #)
    /sbin/rc2.d/S990vcs: hastart

  • How LLT and GAB are started automatically on Linux (/etc/rc[2345].d)

    /etc/rc3.d/S66llt: /sbin/lltconfig -c (if /etc/llttab exists)
    /etc/rc3.d/S67gab: sh /etc/gabtab (runs /sbin/gabconfig -c -n #)
    /etc/rc3.d/S99vcs: hastart

  • How LLT and GAB are started automatically on Solaris 10 (SMF)

    /lib/svc/method/llt: /sbin/lltconfig -c (if /etc/llttab exists)
    /lib/svc/method/gab: sh /etc/gabtab (runs /sbin/gabconfig -c -n #)
    /lib/svc/method/vcs: hastart

  • Probing resources during normal startup

    - During startup, HAD autodisables service groups (in the diagram, groups A and B are autodisabled for s1 and s2).
    - HAD directs agents to probe (monitor) all resources on all systems in the SystemList to determine their status.
    - If agents successfully probe resources, HAD brings service groups online according to the AutoStart and AutoStartList attributes.

  • System and cluster interconnect failures: After completing this topic, you will be able to describe how VCS responds to common failures.

  • VCS response to system failures (diagram)

    s3 faults; service group C is started on s1 or s2.
    Regular membership: s1, s2. No membership: s3.

  • Failover duration on a system failure

    - Detect the system failure: 21 seconds for heartbeat timeouts.
    - Select a failover target: less than one second.
    - Bring the service group online on another system in the cluster: depends on application startup time.

    Failover duration is the sum of these tasks.

  • Manual seeding

    Scenario: s3 is down for maintenance; s1 and s2 are rebooted.
    1. LLT starts on s1 and s2. GAB starts but cannot seed with s3 down.
    2. Manually force GAB to seed on s1: gabconfig -x.
    3. GAB on s2 now seeds because it can detect another seeded system (s1).
    4. Start HAD on s1 and s2.
    Warning: Manual seeding can cause a split-brain condition.

  • Interconnect failure and potential split brain condition

    - s1 and s2 determine that s3 is faulted. No jeopardy occurs; no groups are autodisabled.
    - s3 determines that s1 and s2 are faulted.
    - If all systems are in all groups' SystemList, VCS tries to bring the groups online on both sides.

  • Interconnect failures with a low-priority public link

    - No change in membership.
    - Regular membership: s1, s2, s3. Jeopardy membership: s3.
    - The public network is now used for heartbeat and status information.

  • Changing the interconnect configuration: After completing this topic, you will be able to change the cluster interconnect configuration.

  • Example reconfiguration scenarios

    Manual configuration:
    - Merging clusters
    - Changing parameters that are persistent across reboots, such as the peer inactive timeout

    Automatic changes by CPI:
    - Adding or removing cluster nodes

    The lltconfig command:
    - Changes parameters temporarily
    - Must be run on each node to take effect cluster-wide

    /etc/llttab:
    set-node s1
    set-cluster 10
    . . .
    set-timer peerinact:1200

  • Manually modifying the interconnect

    1. Save the cluster configuration: haconf -dump -makero
    2. Stop VCS, leaving applications running: hastop -all -force
    3. Stop fencing: vxfenconfig -U
    4. Stop GAB: gabconfig -U
    5. Stop LLT: lltconfig -U
    6. Edit the configuration files as needed:
       /etc/gabtab (path/gabconfig -c -n #)
       /etc/llttab (set-node s1, set-cluster 10, ...)
       /etc/llthosts (0 s1, 1 s2)
    7. Start LLT: lltconfig -c
    8. Start GAB: sh /etc/gabtab
    9. Start fencing: vxfenconfig -c
    10. Start VCS: hastart

  • Example LLT link specification (/etc/llttab)

    set-node s1
    set-cluster 10
    link nxge0 /dev/nxge:0 - ether - -
    link nxge4 /dev/nxge:4 - ether - -
    link-lowpri e1000g0 /dev/e1000g:0 - ether - -

    link directive fields, in order: Tag name, Device:Unit, Range (- = all), Link type, SAP, MTU

  • Lesson summary

    Key points:
    - The cluster interconnect is used for cluster membership and status information.
    - The cluster interconnect configuration may never require modification, but it can be altered for site-specific requirements.

    Reference materials:
    - Veritas Cluster Server Installation Guide
    - Veritas Cluster Server User's Guide

  • End of Presentation

    Transcript: BRAD WILLER: Welcome to VCS 6.0 Install and Configure, Lesson 13: Cluster Communications.

    Transcript: This is Lesson 13 of 15. Author's Original Notes: This is the 13th lesson of the VCS Install and Configure course.

    Transcript: In this lesson we're going to first review the lower-level communication that happens between the nodes. Then we're going to look at how cluster membership occurs. Then we're going to look at the cluster interconnects and how we configure them. We'll look at what happens when a node joins the cluster. And then finally we'll look at how to go about changing the interconnects if we need to.

    Transcript: The first topic is reviewing VCS communications.

    Transcript: If you remember, what we've been doing up to this point is looking really at just the VCS level, at the HAD level, service groups, resources, etc. Now we're going to look down below that at the communication. So the stack we had on every node was first LLT, then GAB, then also fencing, which we just didn't put on here to simplify it. And then finally HAD and the agents sit on top. So from an infrastructure perspective LLT starts up on each node first and opens up the NICs; again, the NICs are for the private interconnects, or cluster interconnects, different names for the same thing. Then GAB starts up. GAB tells LLT to start heartbeating. At that point LLT starts sending heartbeats out across all the interconnects, all the cluster interconnects. By default LLT heartbeats every half a second, but it's not until GAB tells LLT to start heartbeating that that actually occurs. Now, GAB is responsible for forming the cluster or seeding the cluster, so now we have to wait until all the appropriate GABs are talking to each other. In this particular example we have a three node cluster. So once three GABs are talking to each other, now we have a cluster and everything else can start up. And again we would also see fencing start up in the same manner.
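
    As a quick sanity check, the same stack can be verified from the bottom up on a running node, using only commands covered later in this lesson:

    # lltconfig               (is the LLT protocol configured and running?)
    # lltstat -nvv active     (per-link heartbeat status for the active nodes)
    # gabconfig -a            (port a/b/h memberships: GAB, fencing, HAD)
    # hastatus -sum           (HAD's view of systems and service groups)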

    Transcript: From our cluster interconnects, remember that's the responsibility of LLT, LLT manages those, we actually support up to eight links in a cluster. We want to see at least two, and remember we want no single points of failure, so we want to see at least two. If you need more than that for whatever reason, you can go up to eight. The links themselves are divided really into two categories. The first is what we refer to as a high-pri link. That is the default. A high-pri link, LLT heartbeats every half a second across it. We send status information, we send configuration changes, basically use that for everything that needs to be communicated between the different nodes in the cluster. You also can set up what's called a low-pri link. What a low-pri link is meant for is to go across your public network. We purposely because we know this is typically going to be your public network, we purposely minimize the traffic that goes across that link. So we only heartbeat every second. We don't send any status information, we don't send any cluster changes information. So we just use the high-pri links for that stuff. We use the low-pri link as just another way to communicate. If all the high-pri links go down for some reason, then one of the low-pri links will automatically, because you could have multiple, automatically would get bumped up to be a high-pri link. Now as far as the configuration goes it will still say it's a low-pri link, but we will treat it as a high-pri link, which means it'll now heartbeat every half a second, status information will go across there, config changes will go across there. As I mentioned it's really meant just for your public -- in a public network. Because we know on your public network the vast majority of traffic should be your application traffic. That's what we want to focus on. We know we have the separate cluster interconnects for cluster traffic. It does no good really to put a low-pri link across one of your cluster interconnects because that's dedicated to the cluster, so you're really wasting that link. So again only on your public interface. Author's Original Notes:

    Transcript: Let's now look at cluster membership.

    Transcript: First off let's look at our GAB config -a command, as I mentioned before, probably the second most common command. Hastatus -sum is probably the most common command, GAB config -a the second most. What GAB config -a shows you is it shows what ports have membership. And again they're GAB ports, this is not TCP/IP ports. Concept is similar, but it's GAB ports, and so we use letters for the GAB ports. The common ones you'll see are the three that are shown on the slide here. The first one you'll see is GAB port a. That's always for GAB itself. Remember LLT starts up, then GAB starts up. The appropriate number of GABs have to be talking to each other. If that doesn't occur, you don't have port a membership. Next is port b, which is always I/O fencing. So depending on the version that you have if you don't have I/O fencing configured, you either will not see port b at all, or you'll see port b but it'll be in what's referred to as disabled mode. We'll talk specifically about I/O fencing in a future lesson. And then port h, h is for HAD. So on this slide we actually have membership for all three ports. Now there are other ports, but again with standard VCS these are the only ones you're going to see. If you have Storage Foundation, Cluster File System or our SFRAC product, then you'll see a lot of other ports. The gabconfig -a command not only shows you what ports are present, but it actually shows you what nodes have membership for that particular port. Now the output is a little bit difficult to read so you just have to get used to it. But if you notice it says port a, then it says gen, and then we have a random number there. That's used internally by the code, nothing we can do about it, so we really don't focus on it. But what you will notice is if there's a membership change you'll see that number actually changes. So we refer to it internally as the seeding number. But what we're really focused on is you notice the keyword membership, then we're interested to the information to the right of that. What there is, is there's actually a single byte for every single possible cluster ID. So in -- since we're talking about 6.0 here, remember the range of nodes is 0 to 63 because we could have a 64 node cluster. So there's one position for every possible node number out there. If you see the node number there, that means that node has membership. So for example let's just take the really simplest part if we had just a two node cluster. Normally if you have a two node cluster the node numbers start at 0 and work their way up, so we would have 0 and 1. It actually isn't a requirement. The requirement is that node numbers be unique. So again they have to be in the range of 0 to 63. The installer automatically starts with 0 up. So it's very common that if I had just a two node cluster I'd have node 0 and node 1. So if you look at just that part of the output and pretend the rest isn't there, what that's showing us is that for port a membership, both node 0 is there and node 1 is there. So I know the two GABs are talking to each other. Same thing with port b, I know fencing is talking between the two nodes. And same thing with HAD, I know the two HADS are there. Now let's look at the rest of this. What you'll notice is we have 0, 1, then there's a bunch of spaces, and then a semicolon. So again there's one space, one character position for every single node number. So what that's showing me is node 0 is there, node 1 is there. 
Node 2 is not there, node 3 is not there, node 4 is not there, all the way -- wherever you see a semicolon, that's a placeholder (a 0 would be displayed there if, say, node 10 were a member). So that first semicolon would be where node 10 is, then you'd have node 11, node 12, etc. The next semicolon is node 20, and then you'll notice we actually do have a 1 and a 2. That corresponds to node 21 and node 22. So in this example four nodes are currently in membership, node 0, node 1, node 21, node 22. Now we purposely did an unusual example like that so that you could see the way the output would look. If I really truly had a four node cluster, most likely my node numbers are going to be 0 to 3. So in my port a output where it says membership, I would see 0, 1, 2, 3. And that's the typical type of thing you'll see, but again this is trying to show you how you tell whether a node is present or not. The other thing to be aware of is this just shows the nodes that are present, so you can extrapolate which nodes are missing. However from this output I can't tell for sure how many nodes make up this cluster. So for example I see node 0, node 1, node 21, node 22. So this could be just a four node cluster with those four nodes, or this could be a larger cluster with a lot of missing nodes. So I can't explicitly know from this output. But usually what happens is you know what your cluster looks like as far as number of nodes, node numbers, that type of thing. And so when you see this output you can extrapolate as to what's there and what's not there. If you look down at the bottom there is an example of a 22 node cluster with node numbers 0 to 21. So notice it says 0123456789, then it goes 0123456789 again, then 01. So the first 0 is node 0, then you have node 1, node 2, etc. The next 0 is actually where node 10 is, then node 11, node 12. Finally you have node 20, node 21. So it's very condensed output. It takes a little while to get used to it, but sooner or later it's easy to read.
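
    For contrast, a hypothetical gabconfig -a output for a plain four-node cluster with node IDs 0 through 3 would read roughly as follows (the gen values are arbitrary):

    # gabconfig -a
    GAB Port Memberships
    ===============================================
    Port a gen ada401 membership 0123
    Port b gen ada404 membership 0123
    Port h gen ada402 membership 0123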

    Transcript: Let's look at things from an LLT perspective. So when I'm doing troubleshooting, more often than not you'll actually work your way down the stack. So you'll typically start with hastatus, as I mentioned hastatus -sum. Then if HAD's not running you'll go to gabconfig -a. Then that will tell you what's up. And if you don't have GAB membership at all, more often than not that tells you you've got some kind of an LLT link issue. So there's a couple of different commands. Lltconfig is the way that you can tell the status of the LLT protocol. We'll see a little later it's also the way you can make changes. If I just do lltconfig and hit return, then that tells me as you see on the far right that either LLT is running or it's not running. Now what that means is LLT is either configured to do something or not configured to do something. The kernel module is going to get loaded when the kernel gets loaded. But the way we've always done things is the different kernel modules, like LLT and GAB have commands that are the name of the kernel module config. So for LLT it's lltconfig, for GAB it's gabconfig, for fencing it's vxfenconfig, etc. And then there's two main options to it, a dash lower case c and a dash upper case U. The lower case c says configure yourself, i.e., start doing something. So in order to get lltconfig and hit return where it says LLT is running, at some point an lltconfig -c had to have been executed. And normally that gets done by startup scripts when you boot the box up. If an lltconfig -c hadn't been done or an lltconfig dash upper case U, the upper case U stands for unconfigure yourself, i.e., stop doing something. So then if that had been the case, if I typed lltconfig and hit return, it would say LLT is not running. If I did an appropriate OS command to look at kernel modules, I would see that the LLT kernel module is still loaded. So it's a configuration thing, do I do something, do I not do something. When it comes to LLT the more typical command you're going to use is lltstat. What we're showing here is some of the typical things. First off it's very, very common to say lltstat -nvv. The first n does the node output that you see on the screen. It, instead of just putting a node number, it actually matches the node number to the system name, to the node name. So for example you'll notice it says 0 s1 and 1 s2. So node 0 is system s1, node 1 is system s2. The vv is verbose and very verbose, and it basically gives the rest of the output that you see on the screen. In older releases -nvv was really the only options you had. And what happens is by default this command actually spits out output for all possible nodes whether they actually exist or not. So in 5.1 SP1 and 6.0, we bumped up to 64 nodes as the possible number of nodes. So if I just do an lltstat -nvv and hit return, I'm actually going to get output for all 64 nodes even though the other nodes don't exist. That's just the way the command works. So typically what you would do is -nvv, and pipe that to more or page or head or something like that, so that you just get the output you need. Starting in 5.1 SP1 and of course 6.0 we added in two other keywords, and that was the word active and the word configured. Active just gives output of the nodes that are part of the cluster, so that are talking from an LLT perspective. So you'll notice the output, you do an lltstat -nvv active, and then you just get node 0 and node 1's information and that's it. So no longer do we have to pipe it to more. 
If you say configured, the difference between active and configured is that active shows the nodes that actually are talking from an LLT perspective, while configured will show all the known nodes. So for example if this was a four node cluster but in my example right now only two nodes are up and running, so node 0 and node 1, active is just going to show those two nodes. If I do configured it would actually show all four nodes, and you would see that nodes 2 and 3 were down. So more often than not you're going to do active, but you could also do configured. On the output you'll notice that it gives a node number and then the state. Usually it'll have the status, that's UP. The key thing is you're going to see the link there, and we're going to see that there are tag names associated with that. Typically those tag names are going to be the actual NIC names, like eth1, eth2 if it was Linux for example, or nxge0, nxge1 for example if it was Solaris, that type of idea, so whatever NICs you're using. And then the status, which if they're working would say UP, and then the address is the MAC address. When LLT is talking it actually discovers the MAC addresses. It obviously knows its own MAC address, but then it discovers the other MAC addresses as they're communicating, and so the output always shows the MAC address.
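
    For instance, on a hypothetical four-node cluster where only s1 and s2 are currently up, the two keywords would behave roughly like this (node and NIC names are illustrative):

    s1# lltstat -nvv active        (lists only node 0 (s1) and node 1 (s2))
    s1# lltstat -nvv configured    (also lists nodes 2 and 3, shown as down)
    s1# lltstat -nvv | more        (older style: pages through all 64 node slots)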

    Transcript: Let's now look at the configuration of the interconnects.

    Transcript: So basically we're going to look at the different LLT and GAB files. From an LLT perspective the first key file is /etc/llttab. In this file, this is the file that has all the information that LLT needs to say who it is, as well as what links to use. If we think back to the installer when we were installing the software and we said yes, we need to configure, the questions we got asked were things like well, what's the name of the cluster, what's the cluster ID, what system names are we using, what links are we using, that type of stuff. So all those questions that the installer asked us get put in different parts, but a big chunk of them go into this file. As you see on the screen there, this is a sample llttab file for Solaris, but the four lines you see there is pretty typical of what you'll see. We call those directives. And so the key directives you'll see is first the set-cluster directive, which specifies what the cluster ID is, and again it got that from the installer. The set-node directive which says who is this node, who am I. Now there's one of three forms that will take. It'll be a hostname like you see there. That's the default that the installer does. In much older releases you could actually see a node number, so in the current context that would be something between 0 and 63. And then you also -- a third format is you actually could put a file name there, and inside the file you would have the name of the host. We'll see that example a little bit later. So that what you see on the slide is the default where we say set-node and then the system name. Then you have your link directives. Now again as we know, we want to see at least two, so we don't have a single point of failure. We could see up to eight there, but that's the typical output. And we'll look at the other platforms in just a second here, and as we talk about the other platforms I'll describe what those other fields are. But the one field I do want to point out right here while we're on the Solaris example is right after the link, that nxge:0 and nxge:4 in this particular example, that's referred to as the tag name. That's what you see in your lltstat output. So best practice is if you can, the tag name should also be the name of the actual NIC you're using. It just makes it so much easier for you when you see the lltstat output. Along that same lines, best practice, you try to use the exact same NICs for all the nodes in your cluster. So on every single node in this cluster we would use nxge:0 and nxge:4 as the two LLT links. Again that's best practice, that really helps you for troubleshooting. Sometimes you can't do that. Sometimes you're given systems you have to use and the first link happens to be nxge:0 on this machine, it happens to be eri:0 on this machine. You know something like that where you have no choice, that's just the cards you've been dealt. But if you have any say, if you think back to the first couple of lessons, we talked about make everything the exact same if you can. So same thing here, use the same LLT links on all the nodes. It really helps from a troubleshooting perspective. Author's Original Notes:

    Transcript: Alright, let's take a look at some examples of other platforms. So first off let's look at AIX. What you're going to see in all of these guys is that the output is very very similar, so it's really easy to extrapolate one to the other. Alright, set-node, set-cluster directives are exactly the same across all nodes. What changes is a little bit in the link directives. The key thing that's really going to change is the nicknames and the actual devices that get used. So in the Solaris example we had nxge:0, nxge:4. Here we've got en:1, en:2. Now the next field is the actual name of the NIC, and this is the typically format that we use in VCS, so with LLT. It's /dev/the name of it, colon, the interface or unit explicitly. So again exactly what that will look like is going to be OS specific, but you'll see they're all very very similar. Then the next fields, the - ether - - is the default, and probably 99 times out 100, maybe more than that, that's exactly what you'll see. Now what those fields are is the first is the range of nodes this applies to. A dash means all. So what you can actually do which is quite nice is if you had the scenario I was just describing, where unfortunately you actually had different NICs you had to use and you had a very large cluster, normally what would happen by default with the installer is the installer will create an llttab file that's specific to that one machine. So if I was using the NICs, I'll just do pseudo names, a and b on this machine and I was doing c and d on this machine, then this llttab file would only have a and b. This llttab file on the second node would have c and d. Now what LLT knows is the order they show up is the order the links get used. Okay so in my example a comes before b, so a would be the first link, b would be the second link. Over here, c would be the first link, d would be the second link. And so what you'd have to make sure you did is that's how they'd have to be linked up from a crossover cable perspective or a switch perspective, is so a on this guy would have to talk to c on this guy, b on this guy would have to talk to d on this guy. So by default the installer makes the llttab file unique on every node. One of the things you can do though that's quite nice is you can utilize the file properly to where you use ranges. So if you really did have values, you could have a case where your llttab file actually includes information for all nodes, and on a particular node it'll only use the information relative to it. But the beauty of that is if I had a very large cluster, say I had 12 nodes, and I needed to make a change, I could actually do it in such a way where I only had to change one file on one node, and then I could propagate it out to all the other nodes. So in large clusters it makes it really nice in having to make configuration changes. Usually like I said you don't deal with that range column and so you always see a dash, but that's what it's for. The next column is the link type, and typically that's ether, short for Ethernet. You also could do UDP. I mentioned this in our overview lesson earlier in the lessons. And we typically do Ethernet because LLT is a Level 2 protocol, a very efficient Level 2 protocol, we do not need TCP/IP. Sometimes though there are scenarios where you do need it, and for example if you had a stretch cluster, one node here, one node here, and the networking team put routers between the buildings. Since LLT is a Level 2 protocol we do not use a TCP/IP stack. 
Therefore it doesn't have an IP address, so it isn't routable. So consequently LLT traffic from this node wouldn't be able to get to the LLT on the other node. In a case like that you actually could put LLT across UDP. Now you're taking a Level 2 protocol and basically putting it on top of a TCP/IP stack. In that case instead of saying ether here, it would say udp. And then the other fields would change because now you have to give IP addresses so that this is routable. Again don't do that unless you absolutely have a scenario like that where you need to. Otherwise let LLT go across Ethernet and be a very efficient low level protocol. Alright, with Ethernet in mind, again the next two are - - typically; the next field is basically the header we use. So the SAP is the actual header. By default we use 0xCAFE, C-A-F-E. If you want, think of that as kind of like the TCP/IP header. So when LLT sends packets out to another node, it basically starts with that header, the 0xCAFE. Then it has the cluster ID and node number, and that uniquely identifies that system. So typically the 0xCAFE is perfectly fine. Sometimes you'd change that, and it's very rare to do that now. But in older releases when the range of cluster IDs was actually very limited, it used to just be 0 to 255, then if you had that many clusters in the same network space you would have conflicts. Alright, because the heartbeats would get confused between two clusters that had the same cluster ID. This doesn't really happen in today's environment because of the fact that the set-cluster directive can be anywhere from 0 to 64K basically. So that's a huge range. But one of the solutions, if I did have two clusters that say were cluster ID 200 and I couldn't change the cluster ID of either of them, then if they were in the same network space, the way that I would make sure that the heartbeats didn't get confused is I would change the SAP. So on one of the clusters I would stay with the default on every single link, which is the dash, and that would be 0xCAFE. On the other one I would change it to say 0xCAFF, and I'd have to do that on every single link. So now the headers are different, so now the heartbeats wouldn't get confused. And then the last one is the maximum transmission unit, which by default is 1,500 bytes. Usually there's no benefit to trying to change that. LLT multiplexes, you know, breaks the packets apart and puts them back together efficiently when it's sending information. And most of the output that goes between the nodes actually fits within one packet.
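
    To make the SAP discussion concrete, here is a hypothetical llttab fragment for one of two clusters that are stuck sharing cluster ID 200 on the same network segment (AIX-style device names, as in the example above); overriding the default SAP on every link keeps the two clusters' heartbeats from being confused:

    set-node s1
    set-cluster 200
    link en1 /dev/en:1 - ether 0xCAFF -
    link en2 /dev/en:2 - ether 0xCAFF -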

    Transcript: Alright, so that's what the different fields are, that's an example of AIX. Now let's look at an example with HP-UX, and what you'll notice is it's almost identical. The tag name of course differs because the NICs are different. The actual device differs. Everything else is the same.

    Transcript: Moving to Linux, now we have some differences. Again for the most part exactly the same. If you look at the first link directive where it says link eth1, etc., you'll notice that's almost identical to the other platforms. So the tag name is eth1, the actual device is eth1, but what you will notice on Linux, you don't say /dev/eth1. You just put that in, and then a - ether - -. Another very common way though on Linux is to actually put the MAC address in there, and this is actually the default that the installer does. So if you look at link eth2, you'll notice eth2 is the tag name, and then we have eth- and the actual MAC address that that's using. Then the rest is the same, - ether - -. If you notice under SAP there's an example of what it would look like if I did have to change it. So in that particular example the eth2 link would use 0xCAFD as its header instead of the default of 0xCAFE.

    Transcript: So that's the llttab file. That's the more complex of the LLT files. The other key file is the /etc/llthosts file. Llthosts is exactly analogous to an /etc/hosts file, but instead of translating hostname to IP address, it translates hostname to node number. This file absolutely should be the same on all nodes. So in this example, we got a two node cluster, system s1 is node 0, system s2 is node 1. This is used by LLT to match up with the llttab file, and it's used by HAD to take the system names that are in main.cf and translate them to node numbers. There's another file that can potentially come into play, and that's the last bullet there is a guy called sysname, and we'll talk about that in a little bit. Author's Original Notes:

    Transcript: This slide shows how we map between the two. So again that first example shows an llttab file. And there's the default with the set-node hostname, and again that's what the installer does. So when LLT starts up it reads llttab, sees hostname s1, then goes to llthosts to translate that into a node number, and it's as simple as that. And again same deal when HAD starts up when it translates, when it's going through the main.cf file, it comes to llthosts to translate to node numbers as well. Author's Original Notes:

    Transcript: The third of the three potential LLT configuration files is called sysname. If you look towards the bottom of the slide you'll notice where that file is located. It's not in /etc like llttab and llthosts are. It's actually located in /etc/VRTSvcs/conf and it's called sysname. Now again also notice it's not conf/config. That's where main.cf, types.cf, etc. are located. So it's actually one directory up. The purpose of the sysname file is to hold the short version of the hostname. So what I'm getting at is in older versions of VCS, HAD did not like a fully-qualified hostname. So if HAD did a uname command to say who am I and the hostname came back as s1.symantec.com for example, HAD didn't like that. So that was the case where you had to have a sysname file. So instead of relying on the uname command, HAD would just look at sysname and go oh, this is who I am. So that's what its purpose was. So if you do a uname command and it actually returns the short version, so just s1 for example, which is pretty common, you don't actually need this file. It's an optional file. The one little gotcha is the installer automatically creates this file. So for better or for worse it will automatically create llttab, llthosts and sysname. So that file would be out there. If you don't have a fully-qualified hostname with your uname command, you could actually delete it. It's completely up to you actually. But realize it's out there, because that will come into play if at some point you were to change your system name. You actually would have to make sure you either delete that file or change that file, because otherwise HAD will always use that as who it is. Now if you remember I mentioned that in the llttab file, the set-node directive had one of three choices: a hostname, which is the default and the common one the installer does; putting a number in there, which was the much older way that we used to do it; or you actually could put a file name in there. So what this slide is showing you is an example of actually using the sysname file and putting a file name in there. The beauty of this is actually twofold. Number one, it reminds you that that file is out there, if it is. And number two, remember I was talking about if I had a really large cluster it'd be nice if I could have an llttab file that was actually the same on all nodes and had all the information I needed in it? And then if I needed to make a change I only had to make a change on the one node and then I could propagate it to all the other nodes. Well, in the simplest version of the file, which is what it's showing here where there's those four lines, the set-node, set-cluster and a couple of link directives, the set-node directive by default is always going to be different because it usually has a hostname in there. If you change that and put a file name instead, now your file is the same on all nodes again, assuming that your link directives are all the same. So you're using the same NICs on all nodes. Or as I mentioned, there is a way, through that first dash in the - ether - -, that you could say what nodes are using what links. So again it's not a requirement you do this. I wouldn't recommend you go change your environment from what it is. But if at some point you are going to make a change, or you have a very large cluster and you want to take advantage of something like this to help ease or automate changes for yourself, then this is what you could do.
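
    For example, here is a hypothetical llttab that references the sysname file instead of a hostname, so the identical file could be copied to every node (this assumes the same NICs are used on all nodes):

    /etc/llttab (same on every node):
    set-node /etc/VRTSvcs/conf/sysname
    set-cluster 10
    link nxge0 /dev/nxge:0 - ether - -
    link nxge4 /dev/nxge:4 - ether - -

    /etc/VRTSvcs/conf/sysname on node s1 contains just the short hostname:
    s1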

    Transcript: Those are the LLT configuration files. Now let's look at it from a GAB perspective. There's one file from a GAB perspective, and we consider it a config file, but it's almost like an executable as well. It isn't an executable. It's a read-only file, read/write from root's perspective. But it's the /etc/gabtab file and it has one line in it. Typically it says what you see on the slide there, gabconfig -c -n, and the number of nodes that make up the cluster. By default that should always be the number of nodes that make up the cluster, not the current number of nodes there, the total number of nodes that typically are there. So if this is a four node cluster it should look like it shows on the slide, gabconfig -c -n 4. The -c says configure yourself, i.e., start up, start doing something. The -n 4 says there have to be four GABs talking to each other before we form a cluster, before we seed the cluster. Now, the reason it should always equal the number of nodes is that it's a split brain protection mechanism. So it ensures that all of the appropriate nodes are there at startup time. If that isn't the case, we'll talk about that a little later. It's not so important if you have I/O fencing. But if you don't have I/O fencing it's absolutely critical that it always equal the number of nodes that are in the cluster. So best practice, keep it as the number of nodes. Now there are other options to this. If you do a man on gabconfig you'll see the other options that potentially could be in there, but this is almost always what the gabtab file looks like.

    Transcript: Okay, so that's the different config files. Now we're going to look at what happens at startup time.

    Transcript: In this slide we're basically going through the startup process. Now the way I want to talk about this slide is let's assume our cluster is completely down and we're booting the boxes up. Now what I'm going to do for the sake of this discussion is purposely boot just one box at a time. So, first off we boot up s1. The kernel gets loaded, the different kernel modules get loaded. But it's not until run control starts up that these actually start doing something. So run control starts kicking off. LLT gets started up first. LLT reads the various config files like llttab, opens up the appropriate NICs. Doesn't do anything else, just opens the appropriate NICs. Then GAB starts up. GAB reads gabtab and says oh, there should be three of us. GAB tells LLT to start heartbeating. So now LLT is heartbeating out the private interconnects. At this point, that's as far as we get. Now from a run control perspective HAD is actually going to start up, because at some point the RC script is going to get executed. However HAD can't do anything until there's port a membership, i.e., until GAB says it's okay. And in this context we're also going to pretend fencing isn't there, just to keep it simple. Alright, so we've got things starting up but basically nothing is there. If I were to do a gabconfig -a I would actually see nothing. I would just get the header output because I have no port a membership yet, because gabtab says there should be three of us. If I do an hastatus -sum I'm going to get can't connect to HAD. Because even if the daemon is out there it can't do anything yet. I will get output if I look at it from an lltstat perspective. Because I will see that the NICs are open and I actually could see that heartbeats are being sent. Alright, so we've got one system up. Now we're going to come along and we're going to boot up s2, same basic process. LLT is going to start up. Alright, GAB is going to start up, tell LLT to start heartbeating. At this point s1 and s2 are now talking to each other. So the LLTs are going to tell GAB, and now the two GABs are talking to each other. However if I do gabconfig -a I'm still not going to get any output, because gabtab said there have to be three. Same deal, HAD will start up, can't do anything. Finally we boot up the third box. LLT starts up, GAB starts up, GAB tells LLT to start heartbeating. Now all three are talking to each other and now we've got three GABs talking to each other. At that point if we do a gabconfig -a, we'll see port a membership for nodes 0, 1 and 2, assuming those are the three node numbers. Okay, now that we have GAB membership, again we're ignoring fencing right now, now the HADs can start up. At this point it's a race to see who gets there first. Remember only one of the HADs can do a local build. Odds are it's going to be either s1 or s2, because those HADs have already kicked off, they're waiting for GAB. On s3 though the run control hasn't happened yet to where HAD gets kicked off. And now again it completely depends on who gets there first, but odds are it'll be s1 or s2. One of those guys gets there first. That guy does a local build, reads the main.cf into memory, and then the other guys pull it across. Now the three HADs are talking to each other. If we were to do a gabconfig -a we would see port h membership with the three HADs.
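
    A minimal way to watch that sequence from a shell on s1 while the other nodes boot, using only commands shown in this lesson (exact messages vary by release):

    s1# lltstat -nvv active    (links are open and heartbeats are being sent)
    s1# gabconfig -a           (header only until enough GABs seed, per the gabtab -n value)
    s1# hastatus -sum          (cannot connect to HAD until port h membership exists)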

    Transcript: Here's the different run control associated with this. Alright, so quickly I'll go through these, first off on AIX. So again the actual run control environment is very platform specific. All of them are rc directories except for Solaris 10. Solaris 10 uses SMF and VCS takes part in that. But basically the run control is, when LLT starts up it makes sure that the llttab file is there. If it is, then it goes ahead and does an lltconfig -c, i.e., start yourself up, configure yourself. Same deal when GAB starts up. If gabtab exists it sources the /etc/gabtab file. Because again, even though inside /etc/gabtab, as it shows on the bottom of the slide, it'll say gabconfig -c -n whatever, that's not an executable file, so we actually have to source it.

    Transcript: For HP-UX, same deal, lltconfig -c, source the /etc/gabtab file. That gets us LLT, gives us port a membership, and of course HAD can start up.

    Transcript: Linux same exact deal, lltconfig -c, source the /etc/gabtab file.

    Transcript: And finally Solaris, Solaris 10, the only one you'll notice that's different. And again that's because Solaris 10 went away from traditional run control and went to what it refers to as SMF. And so instead of, for example, /etc/init.d, it's in /lib/svc/method, but same exact deal. There's an llt method that does an lltconfig -c. There's a gab method that sources the /etc/gabtab file.

    Transcript: Alright, now what I talked about with HAD was at initial startup time, right, is one of them reads in the local main.cf file, does a local build. The other guys do a remote build, i.e., they pull across, load into memory. If it's different than what's out on disk, they write out a new main.cf file. The old main.cf file becomes main.cf.previous. Once that happens and all the appropriate HADs are talking to each other, and it's actually, interestingly enough, it doesn't have to be all HADs. So if this was a four node cluster, typically HAD wouldn't start up until all four HADs are talking to each other. But what it actually is, it has to be all HADs that have port a membership. So we'll get to in a minute how you could have a scenario where you have less than the number of nodes. But let's say we typically have four, but one of the guys had crashed or couldn't come up. So we went ahead and started GAB with just these three nodes. When it comes to HAD starting up, all we really need is those three nodes to be able to proceed. Normally it'd be all four, but in this case we only have three. And again what it boils down to is whatever nodes have port a membership, those are the nodes that have to have port h membership in order for us to be able to start up service groups. Now the way we do that is once we've gone through the local build/remote build thing and HAD now knows what all the service groups are, what systems they can run on, because remember it's not a requirement that a service group run on all systems. So HAD has to process the main.cf file to figure out SystemList. When it goes to start all these up, by default what happens is every single service group gets autodisabled. And you actually can see this when you do an hastatus -sum. There's a autodisabled field, and initially it'll have a Y in there for yes. But when we finish processing what's on this slide, that'll change to a no, and things'll be able to start up. What autodisabled means, autodisabled is kind of like frozen in that you can't online, can't offline, can't switch, can't do anything. So VCS automatically does that. Then what it does is it issues probes, i.e., monitors for every single resource for every single system in the SystemList. All those probes have to come back for the systems that have port a membership before HAD says alright, I'll autoenable this service group. So what it basically boils down to is HAD needs to know what the current startup state is. If we're booting boxes, obviously none of the applications are running, so all the probes are going to come back offline. But if you think about it, we've mentioned that the most common way to bring HAD down is hastop -all -force. That forces HAD down, leaves the applications running. So if you did that, at hastart time when we do all these probes, all the monitors are going to come back as online. So at that point HAD goes oh, it's already running, I'm done. If all the monitors come back as offline, HAD will autoenable the service group. And then if AutoStart is enabled, which is the default, it'll go ahead and autostart the service group. And now remember if you'll notice in step 3 there, not only is it the AutoStart attribute which is either equal to 1 or 0, 1 is the default. But remember we talked about that SystemList was required, AutoStartList wasn't. It's an optional attribute. Most customers consider it required. If you don't set AutoStartList we don't bother to autostart, because there's nothing to tell us where to start it up at. 
If it is set, we'll go ahead and bring it up on one of those nodes in the AutoStartList if it's an active/passive failover service group. If it's a parallel service group, we'll actually bring it up on all nodes in the AutoStartList.
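
    As a reminder of where those attributes live, here is a hypothetical main.cf fragment (group and system names are illustrative); AutoStart defaults to 1, and without AutoStartList HAD does not autostart the group:

    group appsg (
        SystemList = { s1 = 0, s2 = 1, s3 = 2 }
        AutoStartList = { s1, s2 }
        AutoStart = 1
        )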

    Transcript: So that's how we initially bring things up. Now let's take a look at what happens if we have any kind of failures.

    Transcript: The normal response to a failure, so we have a system crash. In this example service group A was running on s1, service group B was running on s2 and service group C was running on s3. Now prior to the crash all the nodes were talking to each other. So we had GAB membership for all three, which of course meant we had LLT membership and LLT communication. We had possibly fencing if that was in place. We had HAD membership for all three. Hastatus -sum showed all three systems, the whole bit. Now system s3 crashes. The way that we figure that out is actually by LLT. Remember LLT is heartbeating every half a second by default. So what happens is the heartbeats stop from node s3, so the other nodes time the LLT links out. Once all the links time out, that system is considered to have failed. And as we'll get into in later lessons, this is one of the key reasons why we have I/O fencing. It will ensure that the system is actually dead in the water and not a split brain scenario, and we'll get into that. But let's assume that it really did crash. Now HAD will get told hey, that system's gone, so then HAD will do failover. And so it'll take service group C, look at SystemList, figure out what node to fail over to, and bring it up on one of the other systems in the SystemList. So in this particular example it would bring it up on either s1 or s2, depending on what the SystemList looked like.

    Transcript: How long that takes to do, this is the default. First off detecting the failure by default is 21 seconds. That's broken into two pieces. The LLT timeout is the major part of that. By default, that's 16 seconds. So what that means is in the last example systems s1 and s2, LLT on both those guys waited until the links timed out for 16 consecutive seconds. Which if you think about it, that means 32 heartbeats because we heartbeat by default every half a second. So 32 heartbeats, 16 seconds occurred. At that point, the link gets timed out. Now LLT tells GAB. Now what GAB -- what the GABs do is they do -- they actually wait five seconds. That's where the 21 total come into play. That five seconds is what we refer to as GAB stable timeout. So GAB gets told about it and so GAB decides I'm going to wait five seconds, make sure it's really true. Okay, five seconds have done. Now we're going to start failover, which normally would then kick into fencing. Let's assume that all happened. Then HAD's going to get told hey, that system is no longer there, you need to do failover. So now HAD for each service group is going to look at SystemList, figure out a failover target, just like we've talked about before, and then bring that service group up on that system. And again how long that takes completely depends on how long the application takes. So the only configurable part here really is that 21 seconds. In the llttab file there's a directive where you could change how long that takes. And also in the gabtab file there's an option to the gabconfig command that you could change that five seconds stable timeout. Normally we encourage you to stay with the defaults. But again once you really understand VCS if you decide you know, I've got crossover cables between my two nodes, I really don't want to wait 16 seconds for it to figure out that the system crashed, you could do that, but we want you to stay with the defaults to start with. Author's Original Notes: In the case of a system failure, service group failover time is the sum of the duration of each of these tasks.

    Transcript: Well let's go back to the scenario where I was manually booting the boxes and they all came up and of course everything started up. Well what if they don't all come up? So we have a site-wide power outage. We had to shut all three nodes down over the weekend. Sunday night comes along, power outage is no longer a case. And so now we boot up the boxes, and we would just boot all three at the same time. Well it's been a year since we've done that, and all of a sudden we start smelling smoke coming from s3. And we look over there and oops, that guy's a little bonfire. Eh, he's not coming back any time soon. In a scenario like that, by default the cluster's not coming up. Because remember gabtab said that there had to be three GABs talking to each other. So s1 and s2 are up, but the two GABs are talking to each other, but they're waiting for a third. They will wait, by default, they will wait indefinitely. So if you have this scenario your override is gabconfig -x. That's step 3 on this slide. So the first thing you really want to do is you want to validate that s3 is truly dead in the water. Because if you don't do that and it turns out that s3 actually was up but you had a split brain scenario and you weren't using I/O fencing, if you use gabconfig -x you could actually shoot yourself in the foot. So validate the node is down. Then you do a gabconfig -x. It does not matter what node you do it on. What gabconfig -x means is go ahead and form a cluster with however many nodes you have. So literally I could have a 64 node cluster and try to boot all 64 of my nodes. And I'm having a really bad day and 62 of them die a horrible death, they don't come up. I've only got two nodes that have actually come up, I could still do a gabconfig -x and now I've got a two node cluster. You actually could do a gabconfig -x on a one node cluster, obviously not much of a cluster but at least you're up and running. So now you've got port a membership. Now fencing would come up, eventually HAD would come up, and then HAD would go through exactly the same thing. Here's the scenario that I was describing earlier. Remember I mentioned that the probes had to come back when we did the initial monitors and autodisabled the service groups? The probes had to come back for all nodes that had port a membership. So in this scenario, remember HAD will know about s1, s2 and s3, so it actually will issue probes for all three systems. But the only ones that have to come back are s1 and s2 because they're the two nodes that have port a membership. Once those come back, now HAD will autoenable the service groups and then start them up if appropriate. When you look at your hastatus -sum in this particular example, you'll actually see a bunch of monitors sitting out there, and it'll be resources waiting to be probed. And it'll be all the resources but for system s3 because that guy doesn't have port a membership. This process that I just described is called manually seeding the cluster. This is your override if a system doesn't come up. This only applies at initial startup time. Once you have a cluster none of this applies. So once you've formed a cluster, nodes can come and go at their leisure. So again if I had a 64 node cluster, I could have nodes crash left and right on me. I could go all the way down to a two node cluster, I still have port a membership. When I reboot one of those boxes, it would come up and immediately join in. 
So it's only at initial startup, when we don't have port a membership, that this manual seeding, or seeding in general, comes into play, and that's where the number of running GAB modules has to match the seed count in gabtab. Author's Original Notes:
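    As a hedged illustration of that manual seeding flow on the surviving nodes (the generation numbers are made up for the example):

        With s3 down and the seed requirement unmet, no ports appear:

            s1# gabconfig -a
            GAB Port Memberships
            ===============================================

        Verify that s3 really is down, then override the seed count from
        any one running node:

            s1# gabconfig -x

        The two surviving nodes now form port a membership:

            s1# gabconfig -a
            GAB Port Memberships
            ===============================================
            Port a gen   4a1c01 membership 01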

    Transcript: Let's look at a couple of other scenarios. This is the bad one: the split brain scenario. In this example I've got a three-node cluster, and for some reason I lose all of the links between s1 and s2 on one side and s3 on the other. Perhaps this is a stretch cluster: maybe s1 and s2 are in one building, s3 is in another building, and all communication between the two buildings has gone down. So we've got a two-node mini-cluster on the left-hand side, s1 and s2, and a one-node mini-cluster on s3. This is bad news; this is the concept of split brain. s1 and s2 think s3 crashed, so once the links time out they bring up service group C, because that's what was running over on s3. On the flip side, s3 thinks s1 and s2 crashed, so once the links time out it brings up A and B. Murphy's Law says all of that will work. Now we've got service group A, for example, running on both sides, and neither side knows about the other. Of course there's just one database, and so the two instances trash the data in it. This is exactly what we need to avoid, and exactly why we have I/O fencing, which we'll talk about in a future lesson. Author's Original Notes:

    Transcript: The scenario I just described is one of the nice reasons for adding a low-pri link. A low-pri link, remember, goes across your public network; it adds another level of redundancy at really no cost. By default all we do on a low-pri link is heartbeat every second, and that's it, so the amount of traffic on the public network is absolutely minimal. In most cases that's all it's ever going to be. If we had a scenario like that last slide, where for some reason we lost all the high-pri links, in this particular example we'd still have a cluster, so I/O fencing doesn't have to kick in. Or if we don't have I/O fencing, we don't have to worry about split brain, because s1, s2 and s3 are still talking across the low-pri link. If you look at the slide, s3 is down to a single link. We call that scenario jeopardy. If I run gabconfig -a, I'll actually see that s3, which in this example is probably node 2, is in jeopardy. All jeopardy means is that you're down to a single LLT link. With I/O fencing, it means nothing more than that. Without I/O fencing, it means more than that. But again, our best practice, and where we're going to focus in this class, is setting up I/O fencing. That's coming in the next lesson. Author's Original Notes:
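    A sketch of how that jeopardy state typically shows up (the generation numbers are invented, and the exact lines gabconfig prints can vary by version):

        s1# gabconfig -a
        GAB Port Memberships
        ===============================================
        Port a gen   a36e01 membership 012
        Port a gen   a36e01 jeopardy   ;2
        Port h gen   a36e04 membership 012
        Port h gen   a36e04 jeopardy   ;2

        The ';2' placeholder notation means node 2, s3 in this example,
        is down to a single LLT link.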

    Transcript: The last thing we really want to talk about here is how we change these files if we need to. Author's Original Notes:

    Transcript: First off, here are a couple of examples of when you might have to change them: for instance, when you're adding a node to a cluster or, as it shows here, merging two clusters, so that instead of two two-node clusters you have one four-node cluster. That can be done very easily. You can also use this to change parameters, like the slide shows. The peerinact directive, for example, is how you'd narrow down that 16-second LLT timeout I talked about; in this example it's narrowed down to just 12 seconds. You could add other LLT directives as well. A very common use for this is to add a low-pri link across the public network, or perhaps one of your NICs died and you need to swap it out and use a different one. Things like that, where you're changing your cluster interconnect, are probably the most common reasons to do this. You can change it manually, which is what we're going to show you; it's a really easy process. In some cases you can do it through the installer, and the lltconfig command lets you do it on the fly, which becomes important when you have other products like SFCFS or SFRAC. In a normal VCS environment, though, you'll see the process involved is pretty easy. From my own experience this was a big "oh wow" for me, because with the other cluster products I had used, making a low-level change like this meant a complete outage, whereas with VCS it doesn't. It was a huge "oh wow" that I could change the low-level cluster configuration and the application folks never knew, because they were happily doing their thing. Author's Original Notes:
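    Here's a sketch of the kind of llttab edits being described; the Solaris-style device names are just examples, and the 12-second peerinact value matches the slide rather than any recommendation:

        set-node s1
        set-cluster 10
        set-timer peerinact:1200
        link nxge0 /dev/nxge:0 - ether - -
        link nxge4 /dev/nxge:4 - ether - -
        link-lowpri e1000g0 /dev/e1000g:0 - ether - -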

    Transcript: Here's the process. This slide looks kind of complicated, but it really isn't. What we're doing is modifying the LLT or GAB levels, typically. We could actually do this for the fencing level as well, though it would have to be a little different from what's shown on this slide. You don't have to do any of this for the HAD level; we'd do an hastop -all -force if we were using the offline config method, but that's just bouncing HAD. Now, a couple of things. Number one: keep it simple. Don't make a ton of changes at once. In other words, don't change main.cf, GAB and LLT all in one fell swoop. Could you? Yes. Do you want to? In my opinion, no, because the more you change, the bigger your window of possible failure. So keep it simple and focus on just changing LLT and/or GAB. Changing the two together is not that big a deal, because GAB is very simple relative to LLT, but I wouldn't add any additional changes further up the stack at the same time; do those independently. What the slide is showing you is snippets of the different files, so let's review. The gabtab file is usually the same on all nodes. It doesn't have to be, so you really want to look at it, but usually it is. That means if I'm changing it, I can change it on one node and propagate it to all the others: modify it on one node and scp it to the rest. The llthosts file absolutely should be the same on all nodes, so I edit it on one if I need to and copy it to the others. The llttab file, by default, is always going to be different: the set-node directive will differ, if nothing else. We talked earlier about how you could change that from set-node with a hostname to set-node with a fully qualified file name; if that were the case and all the other directives were the same, it's the same deal, edit it on one machine and propagate it to the others. So you have to look at your own environment and understand how to go about modifying these. Your first step is to make a backup, which is always a good idea, and then edit the files. Now, one thing that's very different with LLT and GAB versus HAD. Remember that with the offline config method you copied all your *.cf files into a separate directory and made your changes there, because HAD writes out to main.cf. LLT and GAB don't do that. The only time they look at these files is at initial startup. Once you have port a membership, GAB never touches the gabtab file, and LLT never touches the llttab or llthosts files. That means you can edit the files directly; definitely make a backup, though. So you've made your changes on all your nodes. Now do your haconf -dump -makero, because you're going to be bouncing HAD when you take the entire stack down, so make sure whatever's in memory is saved to disk. Then do your hastop -all -force to force HAD down and leave the applications running; there's no reason to bring any of the apps down. Remember that ClusterService always goes down, but as I mentioned in an earlier lesson, don't ever put any of your own resources in ClusterService, so that's no big deal.
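    In command form, that preparation phase might look roughly like this; the backup file names and the scp step are just one way to do it, and remember that llttab usually differs per node because of set-node:

        On each node, back up and then edit the files:

            s1# cp /etc/llttab /etc/llttab.orig
            s1# cp /etc/gabtab /etc/gabtab.orig
            s1# vi /etc/llttab /etc/gabtab

        If gabtab and llthosts are identical across nodes, edit once and
        propagate:

            s1# scp /etc/gabtab /etc/llthosts s2:/etc/

        Save the running configuration, then stop HAD everywhere but leave
        the applications up:

            s1# haconf -dump -makero
            s1# hastop -all -force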
So at a high level, what we're going to do is take the stack down on all nodes, with our new files already in place, and then bring it all right back up. Now, the easiest way to make sure you don't have any timing issues is to remember that you're not just bringing one system down; we're dealing with a cluster here. So think of it horizontally, across nodes, rather than vertically on one node. We have HAD on all nodes, fencing on all nodes, GAB on all nodes and LLT on all nodes. The safest approach is to bring HAD completely down on all nodes and verify it, then fencing on all nodes and verify, then GAB on all nodes and verify, then LLT on all nodes and verify, and then reverse the order and bring everything back up. I'm a big proponent of doing something and then validating or verifying it before moving on. So we start with the HAD level, which we can do in one fell swoop with hastop -all -force. Then we run gabconfig -a and make sure port h disappears on all nodes. Next we stop fencing. Remember, I mentioned that each of the kernel modules has a config command named after it, with a lowercase -c option that says configure yourself, i.e. start up, and an uppercase -U option that says unconfigure yourself, i.e. shut down. The kernel modules do not get unloaded; they stay in the kernel, they just stop doing anything. That's what this slide is showing. So to stop fencing on all our nodes, say we have a three-node cluster, on all three nodes we run vxfenconfig -U. When we talk more about fencing in the coming lessons you'll see there actually are other ways to do this, but I find this is the easiest, especially since we're not changing anything about fencing. After vxfenconfig -U on all three nodes, I validate with gabconfig -a; I should no longer see port b membership. Now I stop GAB: gabconfig -U on all three nodes. Once I've done that, gabconfig -a should show no port a membership on any node. Lastly I have to stop LLT. By the way, how far down the stack you have to go depends on what you're changing: if all we were changing was GAB, we wouldn't need to shut LLT down at all; we'd just stop GAB and bring it back up. But most likely, if you're doing this, you're changing something about LLT. So now I run lltconfig -U, again on all three nodes in my example, make sure LLT is shut down, and then validate: I can run lltconfig and I should see "LLT is not running", or run lltstat -nvv and get essentially no output. My stack is now completely down. I've already made my changes, so now I just bring the stack right back up. It's really a quick bounce, and you can actually hammer this out very fast. So I bring LLT up on all nodes, lltconfig -c on all three of my nodes, and validate. Now, when I run lltstat -nvv at that point, what I'll see is that the NICs are open but none of the LLT modules are talking to each other yet, because as I said earlier, LLT opens the NICs and then doesn't do anything until GAB tells it to.
So I'm not actually going to see heartbeats go out, and the LLT modules aren't going to be talking to each other, until GAB starts up. What I can verify with my lltstat command at this point, though, is the link configuration itself. For example, say I had two cluster interconnects and just added a third, maybe a low-pri link: when I run lltstat I should see all three of those. If I don't, I know I made a mistake in my llttab file. Now I've got my LLT modules up, so I bring up GAB. Notice I don't do a gabconfig -c; what I really want to do is source the /etc/gabtab file. That's especially important if you actually modified that file, because running it validates that the file is good, just as lltconfig -c validated that the llttab file was good, since you'd otherwise get an error as LLT came up. All right, I source my gabtab on all three nodes. That starts GAB, and now GAB can tell LLT to start heartbeating. I run gabconfig -a and I should see port a membership. Now I start fencing. Again, there are multiple ways to do that; we're not making a change to fencing, so the easiest is probably just vxfenconfig -c. Same deal: once fencing starts up, gabconfig -a should show port b membership. Now I can go ahead and do my hastarts. In this context, since I did not change anything about HAD and didn't touch main.cf, I can just run hastart, hastart, hastart. It doesn't matter which node comes up first, because remember I did my haconf -dump before we brought HAD down, so I know all the main.cf files are the same. HAD comes up, I run gabconfig -a and see port h membership, and I should now be able to run hastatus -sum and see everything coming back up. In this context we don't need to start anything: the monitors run and HAD finds the applications already running. The beauty of this entire process, besides being fairly quick, is that we never had to take the applications down, so the business users never knew we just did this. Like I mentioned, that was a huge "oh wow" for me when I first started working with VCS. Author's Original Notes: The UNIX startup file for vxfen should be used to start fencing. For example, for Linux: /etc/rc3.d/S68vxfen
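    Pulling that together, here is a sketch of the full bounce on a three-node cluster. Each command runs on every node, the outputs to check are described rather than shown, and if you did change fencing, the author's note above suggests using the vxfen startup script instead of vxfenconfig -c:

        Take the stack down (HAD is already stopped by hastop -all -force):

            # vxfenconfig -U        then gabconfig -a: port b should be gone
            # gabconfig -U          then gabconfig -a: port a should be gone
            # lltconfig -U          then lltconfig: "LLT is not running"

        Bring the stack back up with the edited files in place:

            # lltconfig -c          then lltstat -nvv: expected links listed
            # sh /etc/gabtab        then gabconfig -a: port a membership
            # vxfenconfig -c        then gabconfig -a: port b membership
            # hastart               then gabconfig -a: port h membership,
                                    and hastatus -sum: groups probed, with
                                    the applications found already running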

    This ensures that the fencing configuration files are regenerated, ensuring that any changes that may have been made to fencing are in effect. If no changes are made to fencing (and that is not the focus of this lesson), you could start fencing with vxfenconfig -c. Transcript: Here's an example of changes to the llttab file. This happens to be a Solaris example, but as you saw when we looked at the different llttab files, they're almost exactly the same on all platforms. What we're showing here in the dotted lines is a couple of things. Number one, a pound sign as the first character of a line marks a comment, so you can put comments into your llttab file if you want to. Completely up to you; it's not that complex a file, and usually you don't. The other thing we're showing is an example of a low-pri link. Notice that the link-lowpri keyword is the only difference between the link statement and the link-lowpri statement; link-lowpri is how you specify that a link is low priority. And again, the whole purpose of a low-pri link is to run across your public network. The tag name, the device, and the "- ether - -" fields are all the same as with the link statements, so it's very easy to do. If you have a normal VCS cluster, I would strongly encourage adding a low-pri link across your public network: it gives you an additional level of redundancy at really no cost, and because the traffic is minimized it should not in any way, shape or form impact your public network or your applications. One last thing I'll point out: if you do add a low-pri link across your public network, let your networking team know that CAFE packets are okay, because the networking team is usually doing some kind of sniffing to see what traffic is on the public network. Remember, that's the SAP column with the dash; the default is 0xCAFE. So if that's true of your site, let them know those are VCS packets and not to worry about them. Author's Original Notes:
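    As a rough field-by-field annotation of those link lines (this follows the standard llttab link syntax; the tag and device names are illustrative Solaris-style examples):

        #           <tag>    <device>       <range> <type> <SAP> <MTU>
        link        nxge0    /dev/nxge:0    -       ether  -     -
        link-lowpri e1000g0  /dev/e1000g:0  -       ether  -     -

        A dash in the SAP column means the default 0xCAFE, a dash in the
        MTU column means the driver default, and a dash in the range
        column means the link applies to all nodes.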

    Transcript: Alright, so in this lesson we took a look at what's involved in cluster membership: how the cluster starts up and how we know who's there. We also looked at the different config files, as well as how you make changes to them. Author's Original Notes:

    Transcript: End of Presentation Author's Original Notes: