Table of Contents

Introduction
Disclaimer
About the author
Introduction to HA
Components of HA
Fundamental Concepts
Restarting Virtual Machines
Virtual SAN and Virtual Volumes specifics
Adding resiliency to HA
Admission Control
VM and Application Monitoring
vSphere HA and ...
Use Case - Stretched Clusters
Advanced Settings
Summarizing
Changelog
VMware vSphere 6.x HA Deepdive

Like many of you I am constantly trying to explore new ways to share content with the rest of the world. Over the course of the last decade I have done this in many different formats; some of them were easy to do and others not so much. Books always fell in that last category, which is a shame as I have always enjoyed writing them.
I wanted to explore the different options there are to create content and share it in different ways, without the need to re-do formatting and waste a lot of time on things I do not want to waste time on. After an afternoon of reading and researching, GitBook popped up. It looked like an interesting platform/solution that would allow me to create content both online and offline, push and pull it to and from a repository, and build both a static website from it as well as publish it in a variety of different formats.
Let it be clear that this is a trial, and this may or may not result in a follow-up. I am starting with the vSphere High Availability content as that is what I am most familiar with and will be easiest to update.
A special thanks goes out to everyone who has contributed in any shape or form to this project. First of all Frank Denneman, the person with whom I wrote the first 3 versions of the Clustering Deepdive and who designed all the great diagrams which you find throughout this publication. Of course also: Doug Baer for editing the content in the past, and my technical conscience: Keith Farkas, Cormac Hogan, Manoj Krishnan, Anne Holler, Mustafa Uysal and Gabriel Tarasuk-Levin.
For offline reading, feel free to download this publication in any of the following formats: PDF - ePub - Mobi.
The source of this publication is stored on both GitBook as well as GitHub. Feel free to submit/contribute where possible and needed. Note that it is also possible to leave feedback on the content by simply clicking on the "+" on the right side of the paragraph you want to comment on (hover over it with your mouse). I will read and incorporate feedback as soon as I have time, hence it is useful to check back regularly and validate your downloaded version against the details below.
vSphere 6.x HA Deepdive, book version: 1.0.4. Book built with GitBook version: 2.6.7.
Thanks for reading, and enjoy!
Duncan Epping
Chief Technologist Storage and Availability - VMware
Disclaimer

Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.
The author of this publication works for VMware. The opinions expressed here are the author's personal opinions. Content published was not approved in advance by VMware and does not necessarily reflect the views and opinions of VMware. This is the author's book, not VMware's.
Copyrights / Licensing
Figure 1 - Creative Commons License
About the Author

Duncan Epping is a Chief Technologist working in the Office of CTO of VMware's Storage and Availability business unit. In that role, he serves as a partner and trusted adviser to VMware's customers, primarily in EMEA. His main responsibilities are ensuring VMware's future innovations align with essential customer needs and translating customer problems into opportunities. Duncan specializes in Software Defined Storage, hyper-converged infrastructures and business continuity/disaster recovery solutions. He has 1 patent granted and 4 patents pending on the topics of availability, storage and resource management. Duncan is a VMware Certified Design Expert (VCDX 007) and the main author and owner of the VMware/virtualization blog Yellow-Bricks.com.
He can be followed on Twitter at @DuncanYB.
Introduction to vSphere High Availability

Availability has traditionally been one of the most important aspects when providing services. When providing services on a shared platform like VMware vSphere, the impact of downtime grows exponentially, as many services run on a single physical machine. As such, VMware engineered a feature called VMware vSphere High Availability. VMware vSphere High Availability, hereafter simply referred to as HA, provides a simple and cost effective solution to increase availability for any application running in a virtual machine, regardless of its operating system. It is configured using a couple of simple steps through vCenter Server (vCenter) and as such provides a uniform and simple interface. HA enables you to create a cluster out of multiple ESXi hosts. This will allow you to protect virtual machines and their workloads. In the event of a failure of one of the hosts in the cluster, impacted virtual machines are automatically restarted on other ESXi hosts within that same VMware vSphere Cluster (cluster).
Figure 2 - High Availability in action
On top of that, in the case of a Guest OS level failure, HA can restart the failed Guest OS. This feature is called VM Monitoring, but is sometimes also referred to as VM-HA. This might sound fairly complex, but again can be implemented with a single click.
Figure 3 - OS Level HA just a single click away
Unlike many other clustering solutions, HA is a simple solution to implement and literally enabled within 5 clicks. On top of that, HA is widely adopted and used in all situations. However, HA is not a 1:1 replacement for solutions like Microsoft Clustering Services / Windows Server Failover Clustering (WSFC). The main difference between WSFC and HA is that WSFC was designed to protect stateful cluster-aware applications while HA was designed to protect any virtual machine regardless of the type of workload within, but it can also be extended to the application layer through the use of VM and Application Monitoring.
In the case of HA, a fail-over incurs downtime as the virtual machine is literally restarted on one of the remaining hosts in the cluster, whereas WSFC transitions the service to one of the remaining nodes in the cluster when a failure occurs. Contrary to what many believe, WSFC does not guarantee that there is no downtime during a transition. On top of that, your application needs to be cluster-aware and stateful in order to get the most out of this mechanism, which limits the number of workloads that could really benefit from this type of clustering.
One might ask why you would want to use HA when a virtual machine is restarted and service is temporarily lost. The answer is simple: not all virtual machines (or services) need 99.999% uptime. For many services, the type of availability HA provides is more than sufficient. On top of that, many applications were never designed to run on top of a WSFC cluster. This means that there is no guarantee of availability or data consistency if an application is clustered with WSFC but is not cluster-aware.
In addition, WSFC clustering can be complex and requires special skills and training. One example is managing patches and updates/upgrades in a WSFC environment; this could even lead to more downtime if not operated correctly and definitely complicates operational procedures. HA however reduces complexity, costs (associated with downtime and WSFC), resource overhead and unplanned downtime for minimal additional costs. It is important to note that HA, contrary to WSFC, does not require any changes to the guest as HA is provided on the hypervisor level. Also, VM Monitoring does not require any additional software or OS modifications except for VMware Tools, which should be installed anyway as a best practice. In case even higher availability is required, VMware also provides a level of
application awareness through Application Monitoring, which has been leveraged by partners like Symantec to enable application level resiliency and could be used by in-house development teams to increase resiliency for their applications.
HA has proven itself over and over again and is widely adopted within the industry; if you are not using it today, hopefully you will be convinced after reading this section of the book.
vSphere 6.0

Before we dive into the main constructs of HA and describe all the choices one has to make when configuring HA, we will first briefly touch on what's new in vSphere 6.0 and describe the basic requirements and steps needed to enable HA. This book covers all the released versions of what is known within VMware as "Fault Domain Manager" (FDM), which was introduced with vSphere 5.0. We will call out the differences in behavior in the different versions where applicable; our baseline however is vSphere 6.0.
What's New in 6.0?
Compared to vSphere 5.0, the changes introduced with vSphere 6.0 for HA appear to be minor. However, some of the new functionality will make the life of many of you much easier. Although the list is relatively short, from an engineering point of view many of these things have been an enormous effort, as they required changes to the deep fundamentals of the HA architecture.
Support for Virtual Volumes - With Virtual Volumes a new type of storage entity is introduced in vSphere 6.0. This has also resulted in some changes in the HA architecture to accommodate this new way of storing virtual machines.
Support for Virtual SAN - This was actually introduced with vSphere 5.5, but as it is new to many of you and led to changes in the architecture, we decided to include it in this update.
VM Component Protection - This allows HA to respond to a scenario where the connection to the virtual machine's datastore is impacted temporarily or permanently:
  HA "Response for Datastore with All Paths Down"
  HA "Response for Datastore with Permanent Device Loss"
Increased host scale - Cluster limit has grown from 32 to 64 hosts.
Increased VM scale - Cluster limit has grown from 4000 VMs to 8000 VMs per cluster.
Secure RPC - Secures the VM/App monitoring channel.
Full IPv6 support.
Registration of "HA Disabled" VMs on hosts after failure.
What is required for HA to Work?

Each feature or product has very specific requirements and HA is no different. Knowing the requirements of HA is part of the basics we have to cover before diving into some of the more complex concepts. For those who are completely new to HA, we will also show you how to configure it.
Prerequisites
Before enabling HA it is highly recommended to validate that the environment meets all the prerequisites. We have also included recommendations from an infrastructure perspective that will enhance resiliency.
Requirements:
Minimum of two ESXi hosts
Minimum of 5 GB memory per host to install ESXi and enable HA
VMware vCenter Server
Shared storage for virtual machines
Pingable gateway or other reliable address
Recommendations:
Redundant management network (not a requirement, but highly recommended)
8 GB of memory or more per host
Multiple shared datastores
Firewall Requirements
The following table contains the ports that are used by HA for communication. If your environment contains firewalls external to the host, ensure these ports are opened for HA to function correctly. HA will open the required ports on the ESX or ESXi firewall.
Port Protocol Direction
8182 UDP Inbound
8182 TCP Inbound
8182 UDP Outbound
8182 TCP Outbound
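To verify that an external firewall is not blocking the TCP rows of the table above, a quick reachability probe can be run from any machine on the management network. This is a minimal sketch, not a VMware tool; the host name in the usage comment is hypothetical, and note that the UDP rows cannot be verified with a simple connect, since UDP gives no handshake.

```python
import socket

def tcp_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the
    timeout. Only useful for the TCP 8182 rows of the firewall table."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical host name):
# tcp_port_reachable("esxi01.lab.local", 8182)
```

A `True` result only proves the port is reachable from where you ran the probe; intermediate firewalls may still block other paths between hosts.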
Configuring vSphere High Availability
HA can be configured with the default settings within a couple of clicks. The following steps will show you how to create a cluster and enable HA, including VM Monitoring, using the vSphere Web Client. Each of the settings and the design decisions associated with these steps will be described in more depth in the following chapters.
1. Click "Hosts & Clusters" under Inventories on the Home tab.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, e.g. ams-hadrs-001.
4. Select Turn On vSphere HA.
5. Ensure "Enable host monitoring" and "Enable admission control" are selected.
6. Select "Percentage of cluster resources..." under Policy and specify a percentage.
7. Enable VM Monitoring Status by selecting "VM and Application Monitoring".
8. Click "OK" to complete the creation of the cluster.
Figure 4 - Ready to complete the New Cluster Wizard
When the HA cluster has been created, the ESXi hosts can be added to the cluster simply by right-clicking the host and selecting "Move To", if they were already added to vCenter, or by right-clicking the cluster and selecting "Add Host".
When an ESXi host is added to the newly-created cluster, the HA agent will be loaded and configured. Once this has completed, HA will enable protection of the workloads running on this ESXi host.
As we have clearly demonstrated, HA is a simple clustering solution that will allow you to protect virtual machines against host failure and operating system failure in literally minutes. Understanding the architecture of HA will enable you to reach that extra 9 when it comes to availability. The following chapters will discuss the architecture and fundamental concepts of HA. We will also discuss all decision-making moments to ensure you will configure HA in such a way that it meets the requirements of your or your customer's environment.
Components of High Availability

Now that we know what the prerequisites are and how to configure HA, the next step is describing which components form HA. Keep in mind that this is still a "high level" overview. There is more under the covers that we will explain in the following chapters. The following diagram depicts a two-host cluster and shows the key HA components.
Figure 5 - Components of High Availability
As you can clearly see, there are three major components that form the foundation for HA as of vSphere 6.0:
FDM
HOSTD
vCenter
The first and probably the most important component that forms HA is FDM (Fault Domain Manager). This is the HA agent.
The FDM agent is responsible for many tasks, such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. FDM also handles heartbeat mechanisms, virtual machine placement, virtual machine restarts, logging and much more. We are not going to discuss all of this in-depth separately as we feel that this will complicate things too much.
FDM, in our opinion, is one of the most important agents on an ESXi host, when HA is enabled, of course, and we are assuming this is the case. The engineers recognized this importance and added an extra level of resiliency to HA. FDM uses a single-process agent. However, FDM spawns a watchdog process. In the unlikely event of an agent failure, the watchdog functionality will pick up on this and restart the agent to ensure HA functionality remains without anyone ever noticing it failed. The agent is also resilient to network interruptions and "all paths down" (APD) conditions. Inter-host communication automatically uses another communication path (if the host is configured with redundant management networks) in the case of a network failure.
HA has no dependency on DNS as it works with IP addresses only. This is one of the major improvements that FDM brought. This does not mean that ESXi hosts need to be registered with their IP addresses in vCenter; it is still a best practice to register ESXi hosts by their fully qualified domain name
(FQDN) in vCenter. Although HA does not depend on DNS, remember that other services may depend on it. On top of that, monitoring and troubleshooting will be much easier when hosts are correctly registered within vCenter and have a valid FQDN.
Basic design principle: Although HA is not dependent on DNS, it is still recommended to register the hosts with their FQDN for ease of operations/management.
vSphere HA also has a standardized logging mechanism, where a single log file has been created for all operational log messages; it is called fdm.log. This log file is stored under /var/log/ as depicted in Figure 6.
Figure 6 - HA log file
Basic design principle: Ensure syslog is correctly configured and log files are offloaded to a safe location to offer the possibility of performing a root cause analysis in case disaster strikes.
HOSTD Agent

One of the most crucial agents on a host is HOSTD. This agent is responsible for many of the tasks we take for granted, like powering on virtual machines. FDM talks directly to HOSTD and vCenter, so it is not dependent on VPXA, like in previous releases. This is, of course, to avoid any unnecessary overhead and dependencies, making HA more reliable than ever before and enabling HA to respond faster to power-on requests. That ultimately results in higher VM uptime.
When, for whatever reason, HOSTD is unavailable or not yet running after a restart, the host will not participate in any FDM-related processes. FDM relies on HOSTD for information about the virtual machines that are registered to the host, and manages the virtual machines using HOSTD APIs. In short, FDM is dependent on HOSTD and if HOSTD is not operational, FDM halts all functions and waits for HOSTD to become operational.
vCenter

That brings us to our final component, the vCenter Server. vCenter is the core of every vSphere Cluster and is responsible for many tasks these days. For our purposes, the following are the most important and the ones we will discuss in more detail:
Deploying and configuring HA agents
Communication of cluster configuration changes
Protection of virtual machines
vCenter is responsible for pushing out the FDM agent to the ESXi hosts when applicable. The push of these agents is done in parallel to allow for faster deployment and configuration of multiple hosts in a cluster. vCenter is also responsible for communicating configuration changes in the cluster to the host which is elected as the master. We will discuss this concept of master and slaves in the following chapter. Examples of configuration changes are modification or addition of an advanced setting or the introduction of a new host into the cluster.
HA leverages vCenter to retrieve information about the status of virtual machines and, of course, vCenter is used to display the protection status (Figure 7) of virtual machines. (What "virtual machine protection" actually means will be discussed in chapter 3.) On top of that, vCenter is responsible for the protection and unprotection of virtual machines. This not only
applies to user-initiated power-offs or power-ons of virtual machines, but also in the case where an ESXi host is disconnected from vCenter, at which point vCenter will request the master HA agent to unprotect the affected virtual machines.
Figure 7 - Virtual machine protection state
Although HA is configured by vCenter, and vCenter exchanges virtual machine state information with HA, vCenter is not involved when HA responds to failures. It is comforting to know that in the case of a failure of the host containing the virtualized vCenter Server, HA takes care of the failure and restarts the vCenter Server on another host, including all other configured virtual machines from that failed host.
There is a corner case scenario with regards to vCenter failure: if the ESXi hosts are so-called "stateless hosts" and Distributed vSwitches are used for the management network, virtual machine restarts will not be attempted until vCenter is restarted. For stateless environments, vCenter and Auto Deploy availability is key as the ESXi hosts literally depend on them.
If vCenter is unavailable, it will not be possible to make changes to the configuration of the cluster. vCenter is the source of truth for the set of virtual machines that are protected, the cluster configuration, the virtual machine-to-host compatibility information, and the host membership. So, while HA, by design, will respond to failures without vCenter, HA relies on vCenter to be available to configure or monitor the cluster.
When a virtual vCenter Server, or the vCenter Server Appliance, has been implemented, we recommend setting the correct HA restart priorities for it. Although vCenter Server is not required to restart virtual machines, there are multiple components that rely on vCenter and, as such, a speedy recovery is desired. When configuring your vCenter virtual machine with a
high priority for restarts, remember to include all services on which your vCenter Server depends for a successful restart: DNS, MS AD and MS SQL (or any other database server you are using).
Basic design principles:
1. In stateless environments, ensure vCenter and Auto Deploy are highly available as recovery time of your virtual machines might be dependent on them.
2. Understand the impact of virtualizing vCenter. Ensure it has high priority for restarts and ensure that services which vCenter Server depends on are available: DNS, AD and database.
Fundamental Concepts

Now that you know about the components of HA, it is time to start talking about some of the fundamental concepts of HA clusters:
Master / Slave agents
Heartbeating
Isolated vs Network partitioned
Virtual Machine Protection
Component Protection
Everyone who has implemented vSphere knows that multiple hosts can be configured into a cluster. A cluster can best be seen as a collection of resources. These resources can be carved up with the use of vSphere Distributed Resource Scheduler (DRS) into separate pools of resources, or used to increase availability by enabling HA.
The HA architecture introduces the concept of master and slave HA agents. Except during network partitions, which are discussed later, there is only one master HA agent in a cluster. Any agent can serve as a master, and all others are considered its slaves. A master agent is in charge of monitoring the health of virtual machines for which it is responsible and restarting any that fail. The slaves are responsible for forwarding information to the master agent and restarting any virtual machines at the direction of the master. The HA agent, regardless of its role as master or slave, also implements the VM/App monitoring feature, which allows it to restart virtual machines in the case of an Operating System failure, or restart services in the case of an application failure.
Master Agent

As stated, one of the primary tasks of the master is to keep track of the state of the virtual machines it is responsible for and to take action when appropriate. In a normal situation there is only a single master in a cluster. We will discuss the scenario where multiple masters can exist in a single cluster in one of the following sections, but for now let's talk about a cluster with a single master. A master will claim responsibility for a virtual machine by taking "ownership" of the datastore on which the virtual machine's configuration file is stored.
Basic design principle: To maximize the chance of restarting virtual machines after a failure, we recommend masking datastores on a cluster basis. Although sharing of datastores across clusters will work, it will increase complexity from an administrative perspective.
That is not all, of course. The HA master is also responsible for exchanging state information with vCenter. This means that it will not only receive but also send information to vCenter when required. The HA master is also the host that initiates the restart of virtual machines when a host has failed. You may immediately want to ask what happens when the master is the one that fails, or, more generically, which of the hosts can become the master and when is it elected?
Election
A master is elected by a set of HA agents whenever the agents are not in network contact with a master. A master election thus occurs when HA is first enabled on a cluster and when the host on which the master is running:
fails,
becomes network partitioned or isolated,
is disconnected from vCenter Server,
is put into maintenance or standby mode,
or when HA is reconfigured on the host.
The HA master election takes approximately 15 seconds and is conducted using UDP. While HA won't react to failures during the election, once a master is elected, failures detected before and during the election will be handled. The election process is simple but robust. The host that is participating in the election with the greatest number of connected datastores will be elected master. If two or more hosts have the same number of datastores connected, the one with the highest Managed Object Id will be chosen. This however is done lexically, meaning that 99 beats 100, as 9 is larger than 1. For each host, the HA state of the host will be shown on the Summary tab. This includes the role, as depicted in the screenshot below where the host is a master host.
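The two election criteria above (most connected datastores, lexical tie-break on the Managed Object Id) can be sketched in a few lines. This is an illustration of the comparison rule only, not FDM's actual implementation; the dictionary shape is invented for the example.

```python
def elect_master(hosts):
    """Pick the election winner: the host with the most connected
    datastores wins; ties are broken by the lexically highest Managed
    Object Id, compared as a string, so "99" beats "100" because the
    character '9' sorts after '1'."""
    return max(hosts, key=lambda h: (h["datastores"], h["moid"]))

# With equal datastore counts, "99" wins the tie-break over "100":
tied = [{"moid": "100", "datastores": 4}, {"moid": "99", "datastores": 4}]
winner = elect_master(tied)  # -> the "99" host
```

Python's tuple comparison mirrors the described rule: the datastore count is compared numerically first, and only on a tie does the string comparison of the MOID decide.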
After a master is elected, each slave that has management network connectivity with it will set up a single secure, encrypted TCP connection to the master. This secure connection is SSL-based. One thing to stress here though is that slaves do not communicate with each other after the master has been elected, unless a re-election of the master needs to take place.
Figure 8 - Master Agent
As stated earlier, when a master is elected it will try to acquire ownership of all of the datastores it can directly access, or access by proxying requests to one of the slaves connected to it using the management network. For regular storage architectures it does this by locking a file called "protectedlist" that is stored on the datastores in an existing cluster. The master will also attempt to take ownership of any datastores it discovers along the way, and it will periodically retry any it could not take ownership of previously.
The naming format and location of this file is as follows:
/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/protectedlist
For those wondering how "cluster-specific-directory" is constructed:
<uuid of vCenter Server>-<number part of the MoID of the cluster>-<random 8 char string>-<name of the host running vCenter Server>
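The naming format above can be made concrete with a small, hypothetical helper that assembles the protectedlist path from its parts. The function and all sample values below are invented for illustration; they only mirror the format described in the text.

```python
def protectedlist_path(datastore_root: str, vc_uuid: str,
                       cluster_moid_number: str, random8: str,
                       vc_hostname: str) -> str:
    """Hypothetical helper: assemble the protectedlist location from
    the naming format described above."""
    cluster_dir = f"{vc_uuid}-{cluster_moid_number}-{random8}-{vc_hostname}"
    return f"{datastore_root}/.vSphere-HA/{cluster_dir}/protectedlist"

# Sample values are placeholders, not real identifiers:
path = protectedlist_path("/vmfs/volumes/datastore1",
                          "44bf43c5", "26", "a1b2c3d4", "vcenter01")
```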
The master uses this protectedlist file to store the inventory. It keeps track of which virtual machines are protected by HA. Calling it an inventory might be slightly overstating it: it is a list of protected virtual machines and it includes information around virtual machine CPU reservation and memory overhead. The master distributes this inventory across all datastores in use by the virtual machines in the cluster. The next screenshot shows an example of this file on one of the datastores.
Figure 9 - Protected list file
Now that we know the master locks a file on the datastore and that this file stores inventory details, what happens when the master is isolated or fails? If the master fails, the answer is simple: the lock will expire and the new master will relock the file if the datastore is accessible to it.
In the case of isolation, this scenario is slightly different, although the result is similar. The master will release the lock it has on the file on the datastore to ensure that when a new master is elected it can determine the set of virtual machines that are protected by HA by reading the file. If, by any chance, a master should fail right at the moment that it became isolated, the restart of the virtual machines will be delayed until a new master has been elected. In a scenario like this, accuracy and the fact that virtual machines are restarted is more important than a short delay.
Let's assume for a second that your master has just failed. What will happen and how do the slaves know that the master has failed? HA uses a point-to-point network heartbeat mechanism. If the slaves have received no network heartbeats from the master, the slaves will try to elect a new master. This new master will read the required information and will initiate the restart of the virtual machines within roughly 10 seconds.
Restarting virtual machines is not the only responsibility of the master. It is also responsible for monitoring the state of the slave hosts and reporting this state to vCenter Server. If a slave fails or becomes isolated from the management network, the master will determine which virtual machines must be restarted. When virtual machines need to be restarted, the master is also responsible for determining the placement of those virtual machines. It uses a placement engine that will try to distribute the virtual machines to be restarted evenly across all available hosts.
All of these responsibilities are really important, but without a mechanism to detect that a slave has failed, the master would be useless. Just like the slaves receive heartbeats from the master, the master receives heartbeats from the slaves so it knows they are alive.
Slaves

A slave has substantially fewer responsibilities than a master: a slave monitors the state of the virtual machines it is running and informs the master about any changes to this state.
The slave also monitors the health of the master by monitoring heartbeats. If the master becomes unavailable, the slaves initiate and participate in the election process. Last but not least, the slaves send heartbeats to the master so that the master can detect outages. Like the master-to-slave communication, all slave-to-master communication is point to point. HA does not use multicast.
Figure 10 - Slave Agent
Files for both Slave and Master

Before explaining the details, it is important to understand that both Virtual SAN and Virtual Volumes have introduced changes to the location and the usage of files. For specifics on these two different storage architectures we refer you to those respective sections in the book.
Both the master and slave use files not only to store state, but also as a communication mechanism. We've already seen the protectedlist file (Figure 9) used by the master to store the list of protected virtual machines. We will now discuss the files that are created by both
the master and the slaves. Remote files are files stored on a shared datastore, and local files are files that are stored in a location only directly accessible to that host.
Remote Files
The set of powered-on virtual machines is stored in a per-host "poweron" file. It should be noted that, because a master also hosts virtual machines, it also creates a "poweron" file.
The naming scheme for this file is as follows: host-number-poweron
Tracking virtual machine power-on state is not the only thing the "poweron" file is used for. This file is also used by the slaves to inform the master that it is isolated from the management network: the top line of the file will either contain a 0 or a 1. A 0 (zero) means not-isolated and a 1 (one) means isolated. The master will inform vCenter about the isolation of the host.
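The isolation-flag convention just described can be sketched as a small parser. Only the first-line flag semantics (0 = not isolated, 1 = isolated) come from the text; treating the remaining lines as the list of powered-on VMs is an assumption made for illustration, not a documented file format.

```python
def parse_poweron(contents: str):
    """Parse a host's 'poweron' file as described above: the first
    line is the isolation flag (0 = not isolated, 1 = isolated).
    Interpreting subsequent lines as powered-on VM identifiers is an
    assumption for this sketch."""
    lines = contents.strip().splitlines()
    isolated = lines[0].strip() == "1"
    powered_on = [ln.strip() for ln in lines[1:] if ln.strip()]
    return isolated, powered_on
```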
Local Files
As mentioned before, when HA is configured on a host, the host will store specific information about its cluster locally.
Figure 11 - Locally stored files
Each host, including the master, will store data locally. The data that is locally stored is important state information: namely, the VM-to-host compatibility matrix, cluster configuration, and host membership list. This information is persisted locally on each host. Updates to this information are sent to the master by vCenter and propagated by the master to the slaves. Although we expect that most of you will never touch these files - and we highly recommend against modifying them - we do want to explain how they are used:
clusterconfig - This file is not human-readable. It contains the configuration details of the cluster.
vmmetadata - This file is not human-readable. It contains the actual compatibility info matrix for every HA protected virtual machine and lists all the hosts with which it is compatible, plus a vm/host dictionary.
fdm.cfg - This file contains the configuration settings around logging. For instance, the level of logging and syslog details are stored in here.
hostlist - A list of hosts participating in the cluster, including hostname, IP addresses, MAC addresses and heartbeat datastores.
Heartbeating

We mentioned it a couple of times already in this chapter, and it is an important mechanism that deserves its own section: heartbeating. Heartbeating is the mechanism used by HA to validate whether a host is alive. HA has two different heartbeating mechanisms. These allow it to determine what has happened to a host when it is no longer responding. Let's discuss traditional network heartbeating first.
Network Heartbeating
Network heartbeating is used by HA to determine if an ESXi host is alive. Each slave will send a heartbeat to its master and the master sends a heartbeat to each of the slaves; this is point-to-point communication. These heartbeats are sent by default every second.
When a slave isn't receiving any heartbeats from the master, it will try to determine whether it is Isolated - we will discuss "states" in more detail later on in this chapter.
Basic design principle: Network heartbeating is key for determining the state of a host. Ensure the management network is highly resilient to enable proper state determination.
Datastore Heartbeating
Datastore heartbeating adds an extra level of resiliency and prevents unnecessary restart attempts from occurring, as it allows vSphere HA to determine whether a host is isolated from the network or is completely unavailable. How does this work?
Datastore heartbeating enables a master to more accurately determine the state of a host that is not reachable via the management network. The datastore heartbeat mechanism is used in case the master has lost network connectivity with the slaves. The datastore heartbeat mechanism is then used to validate whether a host has failed or is merely isolated/network
partitioned. Isolation will be validated through the "poweron" file which, as mentioned earlier, will be updated by the host when it is isolated. Without the "poweron" file, there is no way for the master to validate isolation. Let that be clear! Based on the results of checks of both files, the master will determine the appropriate action to take. If the master determines that a host has failed (no datastore heartbeats), the master will restart the failed host's virtual machines. If the master determines that the slave is Isolated or Partitioned, it will only take action when appropriate, meaning that the master will only initiate restarts when virtual machines are down or powered down/shut down by a triggered isolation response. We will discuss this in more detail in Chapter 4.
By default, HA selects 2 heartbeat datastores - it will select datastores that are available on all hosts, or as many as possible. Although it is possible to configure an advanced setting (das.heartbeatDsPerHost) to allow for more datastores for datastore heartbeating, we do not recommend configuring this option as the default should be sufficient for most scenarios, except for stretched cluster environments where it is recommended to have two in each site, manually selected.
The selection process gives preference to VMFS over NFS datastores, and seeks to choose datastores that are backed by different LUNs or NFS servers when possible. If desired, you can also select the heartbeat datastores yourself. We, however, recommend letting vCenter deal with this operational "burden" as vCenter uses a selection algorithm to select heartbeat datastores that are presented to all hosts. This however is not a guarantee that vCenter can select datastores which are connected to all hosts. It should be noted that vCenter is not site-aware. In scenarios where hosts are geographically dispersed it is recommended to manually select heartbeat datastores to ensure each site has one site-local heartbeat datastore at minimum.
Basic design principle: In a metro-cluster / geographically dispersed cluster we recommend setting the minimum number of heartbeat datastores to four. It is recommended to manually select site-local datastores, two for each site.
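The selection preferences described above (datastores seen by the most hosts, VMFS over NFS, distinct backing LUNs or NFS servers) can be sketched as a ranking function. The scoring and the dictionary shape are assumptions for illustration; this is not vCenter's actual algorithm.

```python
def pick_heartbeat_datastores(datastores, count=2):
    """Sketch of the stated preferences (not vCenter's real algorithm):
    rank by number of hosts that can see the datastore, prefer VMFS
    over NFS, and avoid reusing the same backing LUN / NFS server.
    May return fewer than 'count' if all candidates share a backing."""
    ranked = sorted(datastores,
                    key=lambda d: (-d["hosts_connected"], d["type"] != "VMFS"))
    chosen, backings = [], set()
    for ds in ranked:
        if len(chosen) == count:
            break
        if ds["backing"] in backings:
            continue  # seek a different LUN / NFS server
        chosen.append(ds)
        backings.add(ds["backing"])
    return chosen
```

Sorting on `(-hosts_connected, type != "VMFS")` puts widely connected datastores first and, within a tie, VMFS before NFS, since `False` sorts before `True` in Python.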
Figure 12 - Selecting the heartbeat datastores
The question now arises: what, exactly, is this datastore heartbeating and which datastore is used for this heartbeating? Let's answer which datastore is used for datastore heartbeating first, as we can simply show that with a screenshot, see below. vSphere displays extensive details around the "Cluster Status" on the Cluster's Monitor tab. This for instance shows you which datastores are being used for heartbeating and which hosts are using which specific datastore(s). In addition, it displays how many virtual machines are protected and how many hosts are connected to the master.
In block based storage environments HA leverages an existing VMFS file system mechanism. The datastore heartbeat mechanism uses a so-called "heartbeat region" which is updated as long as the file is open. On VMFS datastores, HA will simply check whether the heartbeat region has been updated. In order to update a datastore heartbeat region, a
vSphere6.xHADeepdive
26FundamentalConcepts
hostneedstohaveatleastoneopenfileonthevolume.HAensuresthereisatleastonefileopenonthisvolumebycreatingafilespecificallyfordatastoreheartbeating.Inotherwords,aper-hostfileiscreatedonthedesignatedheartbeatingdatastores,asshownbelow.Thenamingschemeforthisfileisasfollows:host-number-hb.
On NFS datastores, each host will write to its heartbeat file once every 5 seconds, ensuring that the master will be able to check host state. The master will simply validate this by checking that the time-stamp of the file changed.
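The master's check on an NFS heartbeat file amounts to a timestamp comparison. The sketch below is illustrative only; the three-interval grace period is our assumption for the example, not a documented FDM constant.

```python
import time

HEARTBEAT_INTERVAL = 5  # NFS heartbeat files are rewritten every 5 seconds


def heartbeat_is_live(last_update_ts, now=None, missed_allowed=3):
    """Return True if the heartbeat file's timestamp changed recently
    enough to consider the host alive on this datastore. The grace
    period (missed_allowed intervals) is an illustrative assumption."""
    if now is None:
        now = time.time()
    return (now - last_update_ts) <= missed_allowed * HEARTBEAT_INTERVAL
```

With a 5-second write interval and a 3-interval grace period, a file last touched 10 seconds ago still counts as live, while one untouched for 16 seconds does not.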
Realize that in the case of a converged network environment, the effectiveness of datastore heartbeating will vary depending on the type of failure. For instance, a NIC failure could impact both network and datastore heartbeating. If, for whatever reason, the datastore or NFS share becomes unavailable or is removed from the cluster, HA will detect this and select a new datastore or NFS share to use for the heartbeating mechanism.
Basic design principle: Datastore heartbeating adds a new level of resiliency but is not the be-all end-all. In converged networking environments, the use of datastore heartbeating adds little value due to the fact that a NIC failure may result in both the network and storage becoming unavailable.
Isolated versus Partitioned

We've already briefly touched on it and it is time to have a closer look. When it comes to network failures there are two different states that exist. What are these exactly and when is a host Partitioned rather than Isolated? Before we explain this we want to point out that there is the state as reported by the master and the state as observed by an administrator, and the characteristics these have.
First, consider the administrator's perspective. Two hosts are considered partitioned if they are operational but cannot reach each other over the management network. Further, a host is isolated if it does not observe any HA management traffic on the management network and it can't ping the configured isolation addresses. It is possible for multiple hosts to be isolated at the same time. We call a set of hosts that are partitioned but can communicate with each other a "management network partition". Network partitions involving more than two partitions are possible but not likely.
Now, consider the HA perspective. When any HA agent is not in network contact with a master, it will elect a new master. So, when a network partition exists, a master election will occur so that a host failure or network isolation within this partition will result in appropriate action on the impacted virtual machine(s). The screenshot below shows possible ways in which an Isolation or a Partition can occur.
Figure 13 - Isolated versus Partitioned
If a cluster is partitioned in multiple segments, each partition will elect its own master, meaning that if you have 4 partitions your cluster will have 4 masters. When the network partition is corrected, any of the four masters will take over the role and be responsible for the cluster again. It should be noted that a master could claim responsibility for a virtual machine that lives in a different partition. If this occurs and the virtual machine happens to fail, the master will be notified through the datastore communication mechanism.
In the HA architecture, whether a host is partitioned is determined by the master reporting the condition. So, in the above example, the master on host ESXi-01 will report ESXi-03 and 04 as partitioned, while the master on host 04 will report 01 and 02 as partitioned. When a partition occurs, vCenter reports the perspective of one master.
A master reports a host as partitioned or isolated when it can't communicate with the host over the management network but it can observe the host's datastore heartbeats via the heartbeat datastores. The master cannot alone differentiate between these two states – a host is reported as isolated only if the host informs the master via the datastores that it is isolated.
This still leaves the question open how the master differentiates between a Failed, Partitioned, or Isolated host.
When the master stops receiving network heartbeats from a slave, it will check for host "liveness" for the next 15 seconds. Before the host is declared failed, the master will validate if it has actually failed or not by doing additional liveness checks. First, the master will validate if the host is still heartbeating to the datastore. Second, the master will ping the management IP address of the host. If both are negative, the host will be declared Failed. This doesn't necessarily mean the host has PSOD'ed; it could be the network is unavailable, including the storage network, which would make this host Isolated from an administrator's perspective but Failed from an HA perspective. As you can imagine, however, there are various combinations possible. The following table depicts these combinations including the "state".
State            Network Heartbeat   Storage Heartbeat   Host Liveness Ping   Isolation Criteria Met
Running          Yes                 N/A                 N/A                  N/A
Isolated         No                  Yes                 No                   Yes
Partitioned      No                  Yes                 No                   No
Failed           No                  No                  No                   N/A
FDM Agent Down   N/A                 N/A                 Yes                  N/A
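The table can be captured as a small decision function. This is our paraphrase of the table for illustration, using booleans where the table lists Yes/No and treating the N/A inputs as False:

```python
def classify_host_state(network_hb, storage_hb, liveness_ping, isolation_declared):
    """Map the heartbeat/ping observations from the state table to a
    host state. All arguments are booleans; pass False where the table
    says N/A, since those inputs do not influence the outcome."""
    if network_hb:
        return "Running"
    if liveness_ping:
        return "FDM Agent Down"
    if storage_hb:
        return "Isolated" if isolation_declared else "Partitioned"
    return "Failed"
```

Each row of the table maps to one branch: storage heartbeats alone distinguish Partitioned from Failed, and the host's own declaration (via the datastores) distinguishes Isolated from Partitioned.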
HA will trigger an action based on the state of the host. When the host is marked as Failed, a restart of the virtual machines will be initiated. When the host is marked as Isolated, the master might initiate the restarts.
The one thing to keep in mind when it comes to isolation response is that a virtual machine will only be shut down or powered off when the isolated host knows there is a master out there that has taken ownership for the virtual machine, or when the isolated host loses access to the home datastore of the virtual machine.
For example, if a host is isolated and runs two virtual machines, stored on separate datastores, the host will validate if it can access each of the home datastores of those virtual machines. If it can, the host will validate whether a master owns these datastores. If no master owns the datastores, the isolation response will not be triggered and restarts will not be initiated. If the host does not have access to the datastore, for instance during an "All Paths Down" condition, HA will trigger the isolation response to ensure the "original" virtual machine is powered down and will be safely restarted. This avoids so-called "split-brain" scenarios.
To reiterate, as this is a very important aspect of HA and how it handles network isolations, the remaining hosts in the cluster will only be requested to restart virtual machines when the master has detected that either the host has failed, or has become isolated and the isolation response was triggered.
Virtual Machine Protection

Virtual machine protection happens on several layers but is ultimately the responsibility of vCenter. We have explained this briefly but want to expand on it a bit more to make sure everyone understands the dependency on vCenter when it comes to protecting virtual machines. We do want to stress that this only applies to protecting virtual machines; virtual machine restarts in no way require vCenter to be available at the time.
When the state of a virtual machine changes, vCenter will direct the master to enable or disable HA protection for that virtual machine. Protection, however, is only guaranteed when the master has committed the change of state to disk. The reason for this, of course, is that a failure of the master would result in the loss of any state changes that exist only in memory. As pointed out earlier, this state is distributed across the datastores and stored in the "protectedlist" file.
When the power state change of a virtual machine has been committed to disk, the master will inform vCenter Server so that the change in status is visible both for the user in vCenter and for other processes like monitoring tools.
To clarify the process, we have created a workflow diagram of the protection of a virtual machine from the point it is powered on through vCenter:
Figure 14 - Virtual Machine protection workflow
But what about "unprotection"? When a virtual machine is powered off, it must be removed from the protected list. We have documented this workflow in the following diagram for the situation where the power off is invoked from vCenter.
Figure 15 - Virtual Machine Unprotection workflow
Restarting Virtual Machines

In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA's primary task.
HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:
Failed host
Isolated host
Failed guest operating system
Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenario and include timelines where possible.
Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.
Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the "regular" virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.
Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in the case the automatic restart of an agent virtual machine fails.
Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.
Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:
Agent virtual machines
FT secondary virtual machines
Virtual machines configured with a restart priority of high
Virtual machines configured with a medium restart priority
Virtual machines configured with a low restart priority
It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines is not running on the host at the time placement is done.
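The restart order above amounts to a fixed ranking. A minimal sketch (the category labels are hypothetical, not HA's internal names):

```python
# Ranking per the restart order listed above (lower = restarted earlier).
RESTART_RANK = {"agent": 0, "ft-secondary": 1, "high": 2, "medium": 3, "low": 4}


def restart_order(vms):
    """Sort (name, category) pairs into HA's restart order. Python's
    sort is stable, so VMs within a category keep their given order."""
    return sorted(vms, key=lambda vm: RESTART_RANK[vm[1]])
```

For example, given a mix of a storage appliance (agent), an FT secondary, and high/low priority VMs, the agent VM always sorts first regardless of input order.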
Now that we have briefly touched on it, we would also like to address "restart retries" and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.
Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option "das.maxvmrestartcount". The default value is 5. Note that the initial restart is included.
HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.
As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The 'm' stands for "minutes" in this list.
T0 – Initial Restart
T2m – Restart retry 1
T6m – Restart retry 2
T14m – Restart retry 3
T30m – Restart retry 4
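The intervals in this list follow a doubling pattern (2, 4, 8, 16 minutes between attempts). A sketch that reproduces the documented timeline; note that the 16-minute cap is inferred from the sequence above, not a published constant:

```python
def restart_schedule(max_attempts=5, first_delay_min=2, delay_cap_min=16):
    """Return the minute marks (relative to T0) of each restart
    attempt: the delay between attempts doubles, capped at
    delay_cap_min. Defaults reproduce the T0/T2/T6/T14/T30 sequence."""
    marks, t, delay = [], 0, first_delay_min
    for _ in range(max_attempts):
        marks.append(t)
        t += delay
        delay = min(delay * 2, delay_cap_min)
    return marks
```

Raising das.maxvmrestartcount simply extends the sequence; with the inferred cap, each extra attempt adds 16 minutes.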
Figure 16 - High Availability restart timeline
As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain their own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.
What about VMs which are "disabled" for HA? What will happen with those VMs? Before vSphere 6.0 those VMs would be left alone; as of vSphere 6.0 these VMs will be registered on another host after a failure. This will allow you to easily power on that VM when needed without needing to manually re-register it yourself. Note, HA will not power on the VM, it will just register it for you!
Let's give an example to clarify the scenario in which a master fails during a restart sequence:
Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)
Master: esxi01
The host "esxi02" is running a single virtual machine called "vm01" and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting "vm01" up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and "esxi03" becomes the new master. It will now initiate the restart of "vm01", and if that restart would fail it will retry it up to 4 times again, for a total of 5 including the initial restart.
Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.
When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let's use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of the success or failure of that attempt.
Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempt for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as "completed". In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.
The restart priority, however, does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
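The 32-task limit behaves like a bounded set of in-flight power-on attempts. A simplified sketch of the mechanics described above (function and variable names are illustrative, not FDM's):

```python
from collections import deque

MAX_CONCURRENT_POWER_ONS = 32  # per-host limit on concurrent power-on tasks


def issue_power_ons(pending, in_flight):
    """Move power-on tasks from the priority-ordered `pending` queue
    into `in_flight` until the per-host limit is reached. Remaining
    tasks wait for an in-flight attempt to complete (success or
    failure both free a slot)."""
    while pending and len(in_flight) < MAX_CONCURRENT_POWER_ONS:
        in_flight.append(pending.popleft())
    return in_flight
```

With 33 pending VMs, 32 attempts are issued immediately and the 33rd is issued only once any one of the 32 completes.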
Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.
Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.
Failed host
  Failure of a master
  Failure of a slave
Isolated host and response
Failed Host

When discussing a failed host scenario it is needed to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won't notice the time difference, it is important to call out. Let's start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn't hurt to understand the process and its associated timelines.
The Failure of a Slave
The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:
T0 – Slave failure.
T3s – Master begins monitoring datastore heartbeats for 15 seconds.
T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
T15s – If no heartbeat datastores are configured, the host will be declared dead.
T18s – If heartbeat datastores are configured, the host will be declared dead.
The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as "unreachable". The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared "dead" at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
Figure 17 - Restart timeline slave failure
The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this. On-disk state can be obtained only by one master at a time, since it requires opening the protected list file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine, as vCenter Server also provides the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine's home datastore and has access to the protected list, while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.
This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.
That leaves us with the question of what happens in the case of the failure of a master.
The Failure of a Master
In the case of a master failure, the process and the associated timeline are slightly different. The reason being that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:
T0 – Master failure.
T10s – Master election process initiated.
T25s – New master elected and reads the protected list.
T35s – New master initiates restarts for all virtual machines on the protected list which are not running.
Slaves receive network heartbeats from their master. If the master fails, let's define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
Figure 18 - Restart timeline master failure
Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.
Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the "Isolation Response".
Isolation Response
The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: "Power off", "Leave powered on" and "Shut down". This isolation response answers the question, "what should a host do with the virtual machines it manages when it detects that it is isolated from the network?" Let's discuss these three options more in-depth:
Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the "virtual" power cable of the virtual machine will be pulled out!
Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a "power off" will be executed. This time-out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a "power off" will be initiated immediately.
Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
This setting can be changed in the cluster settings under virtual machine options.
Figure 19 - Cluster default settings
The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.
Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was "Power off"
With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to "Leave powered on"
With vSphere 4.0 it was changed to "Shut down"
With vSphere 5.0 it was changed to "Leave powered on"
Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer's requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that "Leave powered on" was the desired default value.
Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.
The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer "Leave powered on" because it eliminates the chances of having a false positive and its associated downtime. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate if virtual machine restarts can be attempted – there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation as the validation would fail due to the inaccessible datastore from the perspective of the isolated host.
We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won't likely correspond with the failure of the virtual machine networks, the isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.
A second use for power off / shut down is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage; leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.
It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.
Likely datastore access, likely VM network access – Leave Powered On. The virtual machine is running fine; there is no reason to power it off.
Likely datastore access, unlikely VM network access – Either Leave Powered On or Shut Down. Choose Shut Down to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage.
Unlikely datastore access, likely VM network access – Power Off. Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network.
Unlikely datastore access, unlikely VM network access – Leave Powered On or Power Off. Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can't.
The question that we haven't answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response, and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.
As mentioned earlier, it does this by validating if a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the "poweron" file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, it will create a per-virtual machine file under a "poweredoff" directory which indicates for the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.
This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to "isolation detection," which will be described in the following section.
Isolation Detection
We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as earlier explained. There are, however, two scenarios again, and the process and associated timelines differ for each of them:
Isolation of a slave
Isolation of a master
Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want, as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic it will declare itself no longer isolated.
Isolation of a Slave
HA triggers a master election process before it will declare a host isolated. In the below timeline, "s" refers to seconds.
T0 – Isolation of the host (slave)
T10s – Slave enters "election state"
T25s – Slave elects itself as master
T25s – Slave pings "isolation addresses"
T30s – Slave declares itself isolated
T60s – Slave "triggers" isolation response
When the isolation response is triggered, HA creates a "power-off" file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off the virtual machine (or shuts it down) and updates the host's "poweron" file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.
After the completion of this sequence, the master will learn the slave was isolated through the "poweron" file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
Figure 20 - Isolation of a slave timeline
Isolation of a Master
In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, "s" refers to seconds.
T0 – Isolation of the host (master)
T0 – Master pings "isolation addresses"
T5s – Master declares itself isolated
T35s – Master "triggers" isolation response
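Comparing the two timelines: a slave spends T10s–T25s on an election before it can declare itself isolated, while in both cases the response fires 30 seconds after the declaration (matching the default isolation policy delay). A sketch:

```python
def isolation_timeline(role):
    """Key second marks for an isolation event, per the slave and
    master timelines above. In both cases the isolation response
    triggers 30 seconds after the host declares itself isolated."""
    if role == "slave":
        return {"election_starts": 10, "declares_isolated": 30, "triggers_response": 60}
    if role == "master":
        return {"declares_isolated": 5, "triggers_response": 35}
    raise ValueError("role must be 'slave' or 'master'")
```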
Additional Checks
Before a host declares itself isolated, it will ping the default isolation address, which is the gateway specified for the management network, and will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet and it is recommended to specify an additional isolation address which is part of that subnet.
Figure 21 - Isolation Address
Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops, and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.
Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of "hops" between the host and this address.
Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called "das.config.fdm.isolationPolicyDelaySec" and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.
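The clamping behavior of this setting is simple to express; a sketch (the helper name is ours, for illustration):

```python
DEFAULT_ISOLATION_DELAY_SEC = 30


def effective_isolation_delay(configured=None):
    """Effective das.config.fdm.isolationPolicyDelaySec value: unset
    means the 30 second default, and any value below 30 is raised to
    the 30 second minimum."""
    if configured is None:
        return DEFAULT_ISOLATION_DELAY_SEC
    return max(DEFAULT_ISOLATION_DELAY_SEC, configured)
```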
Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.
We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let's assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host's "poweron" file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.
Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or, if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of "disabled" are also filtered out.
Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:
CPU and memory reservation, including the memory overhead of the virtual machine
Unreserved capacity of the hosts in the cluster
Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
Virtual-machine-to-host compatibility set
The number of dvPorts required by a virtual machine and the number available on the candidate hosts
The maximum number of vCPUs and virtual machines that can be run on a given host
Restart latency
Whether the active hosts are running the required number of agent virtual machines
Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.
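A few of the placement criteria above can be pictured as a capacity-and-compatibility filter. The dictionary fields below are illustrative only, not FDM's actual data model:

```python
def candidate_hosts(vm, hosts):
    """Return names of hosts that could take the VM: enough unreserved
    CPU and memory (including the VM's memory overhead) and membership
    in the VM's host compatibility set."""
    return [
        h["name"]
        for h in hosts
        if h["unreserved_cpu_mhz"] >= vm["cpu_reservation_mhz"]
        and h["unreserved_mem_mb"] >= vm["mem_reservation_mb"] + vm["mem_overhead_mb"]
        and h["name"] in vm["compatible_hosts"]
    ]
```

Hosts that fail any one criterion drop out of the candidate set; if the set is empty the VM would land on the pending placement list described below.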
If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power-on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.
If a placement cannot be found, the master will place the virtual machine on a "pending placement list" and will retry placement of the virtual machine when one of the following conditions changes:
A new virtual-machine-to-host compatibility list is provided by vCenter.
A host reports that its unreserved capacity has increased.
A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.)
A new failure is detected and virtual machines have to be failed over.
A failure occurred when failing over a virtual machine.
ButwhataboutDRS?Wouldn’tDRSbeabletohelpduringtheplacementofvirtualmachineswhenallelsefails?Itdoes.ThemasternodewillreporttovCenterthesetofvirtualmachinesthatwerenotplacedduetoinsufficientresources,asisthecasetoday.IfDRSisenabled,thisinformationwillbeusedinanattempttohaveDRSmakecapacityavailable.
ComponentProtectionInvSphere6.0anewfeatureaspartofvSphereHAisintroducedcalledVMComponentProtection.VMComponentProtection(VMCP)invSphere6.0allowsyoutoprotectvirtualmachinesagainstthefailureofyourstoragesystem.TherearetwotypesoffailuresVMCPwillrespondtoandthosearePermanentDeviceLoss(PDL)andAllPathsDown(APD).Beforewelookatsomeofthedetails,wewanttopointoutthatenablingVMCPisextremelyeasy.Itcanbeenabledbyasingletickboxasshowninthescreenshotbelow.
Figure22-VirtualMachineComponentProtection
vSphere6.xHADeepdive
49RestartingVirtualMachines
AsstatedtherearetwoscenariosHAcanrespondto,PDLandAPD.Letslookatthosetwoscenariosabitcloser.WithvSphere5.0afeaturewasintroducedasanadvancedoptionthatwouldallowvSphereHAtorestartVMsimpactedbyaPDLcondition.
APDLcondition,isaconditionthatiscommunicatedbythearraycontrollertoESXiviaaSCSIsensecode.Thisconditionindicatesthatadevice(LUN)hasbecomeunavailableandislikelypermanentlyunavailable.AnexamplescenarioinwhichthisconditionwouldbecommunicatedbythearraywouldbewhenaLUNissetoffline.ThisconditionisusedduringafailurescenariotoensureESXitakesappropriateactionwhenaccesstoaLUNisrevoked.ItshouldbenotedthatwhenafullstoragefailureoccursitisimpossibletogeneratethePDLconditionasthereisnocommunicationpossiblebetweenthearrayandtheESXihost.ThisstatewillbeidentifiedbytheESXihostasanAPDcondition.
Althoughthefunctionalityitselfworkedasadvertised,enablingandmanagingitwascumbersomeanderrorprone.Itwasrequiredtosettheoption“disk.terminateVMOnPDLDefault”manually.WithvSphere6.0asimpleoptionintheWebClientisintroducedwhichallowsyoutospecifywhattheresponseshouldbetoaPDLsensecode.
Figure23-EnablingVirtualMachineComponentProtection
Thetwooptionsprovidedare“IssueEvents”and“PoweroffandrestartVMs”.Notethat“PoweroffandrestartVMs”doesexactlythat,yourVMprocessiskilledandtheVMisrestartedonahostwhichstillhasaccesstothestoragedevice.
UntilnowitwasnotpossibleforvSpheretorespondtoanAPDscenario.APDisthesituationwherethestoragedeviceisinaccessiblebutforunknownreasons.Inmostcaseswherethisoccursitistypicallyrelatedtoastoragenetworkproblem.WithvSphere5.1changeswereintroducedtothewayAPDscenarioswerehandledbythehypervisor.ThismechanismisleveragedbyHAtoallowforaresponse.
WhenanAPDoccursatimerstarts.After140secondstheAPDisdeclaredandthedeviceismarkedasAPDtimeout.Whenthe140secondshaspassedHAwillstartcounting.TheHAtimeoutis3minutesbydefaultatshowninFigure24.Whenthe3minuteshaspassed
vSphere6.xHADeepdive
50RestartingVirtualMachines
HAwilltaketheactiondefined.Thereareagaintwooptions“IssueEvents”and“PoweroffandrestartVMs”.
YoucanalsospecifyhowaggressivelyHAneedstotrytorestartVMsthatareimpactedbyanAPD.Notethataggressive/conservativereferstothelikelihoodofHAbeingabletorestartVMs.Whensetto“conservative”HAwillonlyrestarttheVMthatisimpactedbytheAPDifitknowsanotherhostcanrestartit.Inthecaseof“aggressive”HAwilltrytorestarttheVMevenifitdoesn’tknowthestateoftheotherhosts,whichcouldleadtoasituationwhereyourVMisnotrestartedasthereisnohostthathasaccesstothedatastoretheVMislocatedon.
ItisalsogoodtoknowthatiftheAPDisliftedandaccesstothestorageisrestoredduringthetotaloftheapproximate5minutesand20secondsitwouldtakebeforetheVMrestartisinitiated,thatHAwillnotdoanythingunlessyouexplicitlyconfigureitdoso.Thisiswherethe“ResponseforAPDrecoveryafterAPDtimeout”comesintoplay.IfthereisadesiretodosoyoucanrestarttheVMevenwhenthehosthasrecoveredfromtheAPDscenario,duringthe3minute(defaultvalue)graceperiod.
Basicdesignprinciple:Withoutaccesstosharedstorageavirtualmachinebecomesuseless.ItishighlyrecommendedtoconfigureVMCPtoactonaPDLandAPDscenario.Werecommendtosetbothto“poweroffandrestartsVMs”butleavethe“responseforAPDrecoveryafterAPDtimeout”disabledsothatVMsarenotrebootedunnecessarrily.
vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM-VM affinity or anti-affinity rules. Typically, for people using "affinity" rules this was not an issue, but those using "anti-affinity" rules did see this as a problem. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs "randomly". With vSphere 5.5 this has changed! vSphere HA is now "anti-affinity" aware. In order to ensure anti-affinity rules are respected, you can set an advanced setting or, as of vSphere 6.0, configure this in the vSphere Web Client.
das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"
Note that this also means that when you configure anti-affinity rules, have this advanced setting configured to "true", and somehow there aren't sufficient hosts available to respect these rules, then the rules will still be respected, and it could result in HA not restarting a VM. Make sure you understand this potential impact when configuring this setting and these rules.
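The interaction described above can be made concrete with a short sketch. This is not VMware code — the function, VM and host names are hypothetical — but it illustrates how enforcing a VM-VM anti-affinity rule during restart placement can leave a VM without a restart target when only one compatible host survives.

```python
# Illustrative sketch only (not HA's actual placement code): restart
# placement that enforces VM-VM anti-affinity rules.

def place_vms(vms, hosts, anti_affinity_groups, respect_rules=True):
    """Return a {vm: host} placement; VMs that cannot be placed are omitted."""
    placement = {}
    for vm in vms:
        candidates = list(hosts)
        if respect_rules:
            for group in anti_affinity_groups:
                if vm in group:
                    # Exclude hosts already chosen for a VM in the same group.
                    taken = {placement[v] for v in group if v in placement}
                    candidates = [h for h in candidates if h not in taken]
        if candidates:
            placement[vm] = candidates[0]
    return placement

# Two anti-affine VMs but only one surviving host: with the rule
# respected, vm02 is left unplaced (i.e. HA would not restart it).
strict = place_vms(["vm01", "vm02"], ["esxi03"], [{"vm01", "vm02"}])
relaxed = place_vms(["vm01", "vm02"], ["esxi03"], [{"vm01", "vm02"}],
                    respect_rules=False)
```

With the default ("false") behavior, both VMs end up on the single surviving host; with the rule respected, one stays down — exactly the trade-off to weigh before enabling the setting.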
With vSphere 6.0, support for respecting VM-Host affinity rules has been included. This is enabled through the use of an advanced setting called "das.respectVmHostSoftAffinityRules". When this advanced setting is configured, vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group, then HA will restart the respective VM on one of those hosts. As this is a "should rule", HA has the ability to ignore the rule when needed. If none of the hosts in the VM-Host should rule are available, HA will restart the VM on any other host in the cluster.
das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"
ADD SCREENSHOT HERE!

# Restarting Virtual Machines
In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase the resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA's primary task.
HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

- Failed host
- Isolated host
- Failed guest operating system
Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenarios and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.
Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that agent VMs take precedence during the restart procedure, as the "regular" virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines, and document the process to start up these services in the right order in case the automatic restart of an agent virtual machine fails.
Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.
Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

- Agent virtual machines
- FT secondary virtual machines
- Virtual machines configured with a restart priority of high
- Virtual machines configured with a medium restart priority
- Virtual machines configured with a low restart priority
It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines is not running on that host at the time placement is done.
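The five-tier order listed above can be expressed as a simple sort key. A small illustrative sketch (the VM names and category labels are made up for the example):

```python
# Illustrative sketch: sort VMs into HA's restart order.
RESTART_ORDER = ["agent", "ft-secondary", "high", "medium", "low"]

def restart_order(vms):
    """vms is a list of (name, category) pairs; returns names in restart order."""
    return [name for name, category in
            sorted(vms, key=lambda vm: RESTART_ORDER.index(vm[1]))]

queue = restart_order([("web01", "low"), ("vsa01", "agent"),
                       ("db01", "high"), ("db01-ft", "ft-secondary"),
                       ("app01", "medium")])
# The virtual storage appliance and FT secondary come first.
```

Remember that, as stated above, this prioritization happens per host, not globally across the cluster.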
Now that we have briefly touched on it, we would also like to address "restart retries" and the parallelization of restarts, as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.
Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option "das.maxvmrestartcount". The default value is 5. Note that the initial restart is included.
HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following list will clarify this concept. The "m" stands for "minutes" in this list.
- T0 – Initial restart
- T2m – Restart retry 1
- T6m – Restart retry 2
- T14m – Restart retry 3
- T30m – Restart retry 4
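The schedule above follows a simple pattern: the wait doubles after every failed attempt (2, 4, 8, then 16 minutes). A quick sketch that reproduces the timeline for the default das.maxvmrestartcount of 5:

```python
def restart_attempt_times(max_restart_count=5):
    """Minutes after T0 at which each restart attempt is initiated.
    The wait between attempts doubles: 2, 4, 8, 16 minutes."""
    times, t, delay = [], 0, 2
    for _ in range(max_restart_count):
        times.append(t)
        t += delay
        delay *= 2
    return times

# Default of 5 attempts (initial restart included): T0, T2m, T6m, T14m, T30m.
schedule = restart_attempt_times()
```

The doubling pattern matches the bullet list; how the schedule behaves beyond five attempts with a raised das.maxvmrestartcount is not documented here, so the sketch only claims correctness for the default.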
Figure 24 - High Availability restart timeline
As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, so if multiple masters are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.
Let's give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)

Master: esxi01
The host "esxi02" is running a single virtual machine called "vm01" and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting "vm01" up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and "esxi03" becomes the new master. It will now initiate the restart of "vm01", and if that restart fails it will retry it up to 4 times again, for a total of 5 including the initial restart.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.
When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let's use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of the success or failure of that attempt.

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempts for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as "completed". In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.

The restart priority, however, does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
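The 32-task limit can be made concrete with a small sketch. This models only the behavior described above — a slot frees up when an outstanding attempt completes, successfully or not — and is not HA's actual scheduler:

```python
# Illustrative sketch of the per-host power-on throttle described above.
MAX_CONCURRENT_POWER_ONS = 32

def initial_power_on_batch(restart_list, limit=MAX_CONCURRENT_POWER_ONS):
    """Split the restart list into attempts issued immediately and attempts
    that must wait for one of the outstanding attempts to complete
    (regardless of whether that attempt succeeds or fails)."""
    return restart_list[:limit], restart_list[limit:]

# A failed host with 33 equal-priority VMs: 32 attempts start, 1 waits.
issued, waiting = initial_power_on_batch([f"vm{i:02d}" for i in range(33)])
```

Note how this also explains the gotcha: a high-priority VM at the front of the list occupies a slot until its attempt completes, but a failed attempt still counts as "completed" and releases the slot to lower-priority VMs.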
Basic design principle: Configuring the restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios:
- Failed host
  - Failure of a master
  - Failure of a slave
- Isolated host and response
Failed Host

When discussing a failed host scenario, it is necessary to make a distinction between the failure of a master and the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won't notice the time difference, it is important to call out. Let's start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn't hurt to understand the process and its associated timelines.
The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:
- T0 – Slave failure.
- T3s – Master begins monitoring datastore heartbeats for 15 seconds.
- T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
- T15s – If no heartbeat datastores are configured, the host will be declared dead.
- T18s – If heartbeat datastores are configured, the host will be declared dead.
The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as "unreachable". The master will also start pinging the management network of the failed host at the 10th second, and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared "dead" at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
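The same timeline, summarized as a small lookup (a sketch of the numbers in the text, not HA code):

```python
def slave_failure_timeline(heartbeat_datastores_configured=True):
    """Seconds after the last received network heartbeat (T0) at which
    each event in a slave-failure scenario occurs."""
    return {
        "master starts monitoring datastore heartbeats": 3,
        "host declared unreachable, management-network pings start": 10,
        "host declared dead, restarts initiated":
            18 if heartbeat_datastores_configured else 15,
    }
```

The only difference heartbeat datastores make to this timeline is the 3 extra seconds before the host is declared dead.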
Figure 25 - Restart timeline, slave failure
The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this; on-disk state could be obtained by only one master at a time, since it required opening the protected list file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine, as vCenter Server also provided the necessary details for a restart. As an example, it could happen that one master has locked a virtual machine's home datastore and has access to the protected list, while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario, the master which does not own the home datastore of the virtual machine could restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring, as the master in the other partition would use the information provided by vCenter Server to initiate the restart.
That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different. The reason is that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

- T0 – Master failure.
- T10s – Master election process initiated.
- T25s – New master elected; it reads the protected list.
- T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

Slaves receive network heartbeats from their master. If the master fails, let's define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
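Putting the numbers from the text side by side (a sketch, not HA code) shows the cost of the election: with heartbeat datastores configured, a master failure delays restart initiation by roughly 17 seconds compared to a slave failure (T35s versus T18s).

```python
def master_failure_timeline():
    """Seconds after the master's network heartbeats cease (T0)."""
    return {
        "election initiated": 10,
        "new master elected, reads protected list": 25,
        "restarts initiated for protected, not-running VMs": 35,
    }

# Election overhead versus a slave failure with heartbeat datastores (18 s).
extra_delay = (
    master_failure_timeline()["restarts initiated for protected, not-running VMs"] - 18
)
```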
Figure 26 - Restart timeline, master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.
Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the "Isolation Response".
Isolation Response
The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: "Power off", "Leave powered on" and "Shutdown". The isolation response answers the question, "what should a host do with the virtual machines it manages when it detects that it is isolated from the network?" Let's discuss these three options more in-depth:

- Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or, to put it bluntly, the "virtual" power cable of the virtual machine will be pulled out!
- Shutdown – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a "power off" will be executed. This time-out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a "power off" will be initiated immediately.
- Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
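The decision logic of the three responses can be sketched as follows. This is illustrative only — the strings are made up, and the 300 seconds reflects the default das.isolationShutdownTimeout of 5 minutes:

```python
def isolation_response_action(response, tools_installed=True,
                              guest_shutdown_completed=True,
                              shutdown_timeout=300):
    """Return the action taken for one VM when the isolation response fires."""
    if response == "leave powered on":
        return "no action"                    # VM state remains unchanged
    if response == "power off":
        return "hard power off"               # the "virtual" power cable is pulled
    # response == "shutdown"
    if not tools_installed:
        return "hard power off"               # no VMware Tools: power off immediately
    if guest_shutdown_completed:
        return "guest shutdown"
    return "hard power off after %d s" % shutdown_timeout
```

In other words, "Shutdown" is best-effort: it degrades to a hard power off whenever a clean guest shutdown cannot be performed or does not finish in time.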
This setting can be changed in the cluster settings under virtual machine options.

Figure 27 - Cluster default settings
The default setting for the isolation response has changed multiple times over the last couple of years, and this has caused some confusion:

- Up to ESXi 3.5 U2 / vCenter 2.5 U2, the default isolation response was "Power off"
- With ESXi 3.5 U3 / vCenter 2.5 U3, this was changed to "Leave powered on"
- With vSphere 4.0, it was changed to "Shutdown"
- With vSphere 5.0, it was changed back to "Leave powered on"

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer's requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that "Leave powered on" was the desired default value.
Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including the justification, to ensure all people involved understand your reasons.
The question remains, which setting should be used? The obvious answer applies here: it depends. We prefer "Leave powered on" because it eliminates the chance of having a false positive and its associated downtime. One of the problems that people experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate whether virtual machine restarts can be attempted – there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation, as the validation would fail due to the datastore being inaccessible from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won't likely correspond with the failure of the virtual machine networks, the isolation response would cause unnecessary downtime, as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off/shutdown is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage; leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.
| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
| --- | --- | --- | --- |
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose Shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can't |
The question that we haven't answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response, and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating whether a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the "poweron" file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, the host will create a per-virtual machine file under a "poweredoff" directory which indicates to the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt, in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.
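The bookkeeping just described can be sketched in a few lines. This is an illustration of the logic, not HA's implementation: the master compares the host's "poweron" list against its last known state, and the per-VM power-off files tell it which disappearances were HA's doing.

```python
def vms_to_restart(last_known_running, current_poweron_list, poweroff_files):
    """VMs that disappeared from the host's "poweron" file AND were recorded
    as powered off by the isolation response are candidates for restart."""
    disappeared = set(last_known_running) - set(current_poweron_list)
    return sorted(disappeared & set(poweroff_files))

# vm01 was powered off by HA's isolation response; vm02 disappeared for some
# other reason (e.g. an administrator shut it down), so only vm01 is restarted.
restart = vms_to_restart(["vm01", "vm02", "vm03"], ["vm03"], ["vm01"])
```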
This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to "isolation detection", which will be described in the following section.
Isolation Detection
We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as explained earlier. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful, or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want, as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and then observes election traffic, it will declare itself no longer isolated.
Isolation of a Slave

HA triggers a master election process before it will declare a host isolated. In the below timeline, "s" refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters "election state"
- T25s – Slave elects itself as master
- T25s – Slave pings "isolation addresses"
- T30s – Slave declares itself isolated
- T60s – Slave "triggers" isolation response
When the isolation response is triggered, HA creates a "power-off" file for any virtual machine HA powers off whose home datastore is accessible. Next, it powers off the virtual machine (or shuts it down) and updates the host's "poweron" file. The power-off file is used to record that HA powered off the virtual machine and that HA should therefore restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn that the slave was isolated through the "poweron" file, as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
Figure 28 - Isolation of a slave timeline
Isolation of a Master

In the case of the isolation of a master, the timeline is a bit less complicated because there is no need to go through an election process. In this timeline, "s" refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings "isolation addresses"
- T5s – Master declares itself isolated
- T35s – Master "triggers" isolation response
Additional Checks

Before a host declares itself isolated, it will ping the default isolation address, which is the gateway specified for the management network, and it will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and can be used to reduce the chance of having a false positive. We recommend setting an additional isolation address. If required, you can configure up to 10 additional isolation addresses. Note that a secondary management network will more than likely be on a different subnet; if one is configured, it is recommended to specify an additional isolation address which is part of that subnet.
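The detection check itself boils down to "did any isolation address reply?". A sketch with a stand-in ping function (the addresses and network state below are examples, not recommendations):

```python
def is_isolated(isolation_addresses, ping):
    """A host considers itself isolated only if none of its isolation
    addresses respond; ping is a stand-in for a real ICMP check."""
    return not any(ping(address) for address in isolation_addresses)

reachable = {"192.168.1.1"}                 # hypothetical network state
ping = lambda address: address in reachable

# Default gateway only, versus gateway plus an extra das.isolationaddress:
gateway_only = is_isolated(["10.0.0.1"], ping)                # isolated
with_extra = is_isolated(["10.0.0.1", "192.168.1.1"], ping)   # not isolated
```

This is why an additional address reduces false positives: a single successful reply, from any of the configured addresses, is enough to keep the isolation response from triggering.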
Figure 29 - Isolation Address
Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts, to avoid too many network hops, and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of "hops" between the host and this address.
Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called "das.config.fdm.isolationPolicyDelaySec" and allows you to change the number of seconds to wait before the isolation policy is executed. The minimum value is 30; if set to a value less than 30, the delay will still be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios, 30 seconds should suffice.
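The 30-second floor means the effective delay is simply a clamp of the configured value:

```python
def effective_isolation_policy_delay(configured_seconds):
    """das.config.fdm.isolationPolicyDelaySec is clamped to a 30 s minimum."""
    return max(30, configured_seconds)
```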
Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.
We have explained the difference in behavior, from a timing perspective, for restarting virtual machines in the case of both master node and slave node failures. For now, let's assume that a slave node has failed. When the master node declares the slave node Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host's "poweron" file. These files are read asynchronously, approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.
Before it initiates the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or, if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of "disabled" are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines
Restart latency refers to the amount of time it takes to initiate virtual machine restarts. Virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power-on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.

If a placement cannot be found, the master will place the virtual machine on a "pending placement list" and will retry placement of the virtual machine when one of the following conditions changes:
- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode or a host is added to the cluster).
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn't DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.
Component Protection

In vSphere 6.0, a new feature is introduced as part of vSphere HA called VM Component Protection. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to: Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled with a single tick box, as shown in the screenshot below.
Figure 30 - Virtual Machine Component Protection
As stated, there are two scenarios HA can respond to: PDL and APD. Let's look at those two scenarios a bit closer. With vSphere 5.0, a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs, it is impossible to generate the PDL condition, as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option "disk.terminateVMOnPDLDefault" manually. With vSphere 6.0, a simple option is introduced in the Web Client which allows you to specify what the response should be to a PDL sense code.
Figure31-EnablingVirtualMachineComponentProtection
Thetwooptionsprovidedare“IssueEvents”and“PoweroffandrestartVMs”.Notethat“PoweroffandrestartVMs”doesexactlythat,yourVMprocessiskilledandtheVMisrestartedonahostwhichstillhasaccesstothestoragedevice.
UntilnowitwasnotpossibleforvSpheretorespondtoanAPDscenario.APDisthesituationwherethestoragedeviceisinaccessiblebutforunknownreasons.Inmostcaseswherethisoccursitistypicallyrelatedtoastoragenetworkproblem.WithvSphere5.1changeswereintroducedtothewayAPDscenarioswerehandledbythehypervisor.ThismechanismisleveragedbyHAtoallowforaresponse.
WhenanAPDoccursatimerstarts.After140secondstheAPDisdeclaredandthedeviceismarkedasAPDtimeout.Whenthe140secondshaspassedHAwillstartcounting.TheHAtimeoutis3minutesbydefaultatshowninFigure24.Whenthe3minuteshaspassed
HA will take the action defined. There are again two options: "Issue events" and "Power off and restart VMs".

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive/conservative refers to the likelihood of HA being able to restart VMs. When set to "conservative", HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of "aggressive", HA will try to restart the VM even if it doesn't know the state of the other hosts, which could lead to a situation where your VM is not restarted, as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the "Response for APD recovery after APD timeout" setting comes into play. If there is a desire to do so, you can restart the VM even when the host has recovered from the APD scenario during the 3 minute (default value) grace period.
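The two timers described above add up to the roughly 5 minutes and 20 seconds mentioned. A minimal sketch of that arithmetic, purely illustrative and not a VMware API:

```python
# Illustrative arithmetic only (not a VMware API): the total delay from
# the start of an APD condition until vSphere HA initiates a VM restart,
# using the default values discussed above.

APD_TIMEOUT_SECONDS = 140      # device declared "APD timeout" after 140s
VMCP_DELAY_SECONDS = 3 * 60    # HA then waits another 3 minutes by default

def seconds_until_restart(apd_timeout=APD_TIMEOUT_SECONDS,
                          vmcp_delay=VMCP_DELAY_SECONDS):
    """Total delay before HA takes the configured VMCP action."""
    return apd_timeout + vmcp_delay

total = seconds_until_restart()
print(f"{total} seconds = {total // 60}m{total % 60:02d}s")  # 320 seconds = 5m20s
```

Changing either timeout (both are configurable) shifts the window in which an APD recovery will cancel the restart.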
Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on both PDL and APD scenarios. We recommend setting both to "Power off and restart VMs", but leaving the "Response for APD recovery after APD timeout" disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM affinity or anti-affinity rules. Typically for people using "affinity" rules this was not an issue, but those using "anti-affinity" rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs "randomly". With vSphere 5.5 this has changed! vSphere HA is now "anti-affinity" aware. In order to ensure anti-affinity rules are respected, you can set an advanced setting or, as of vSphere 6.0, configure it in the vSphere Web Client.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to "true", and somehow there aren't sufficient hosts available to respect these rules... the rules will still be respected, and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and these rules.
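To make that trade-off concrete, here is a conceptual sketch of anti-affinity-aware placement. This is not vSphere HA's actual algorithm, and the host and VM names are made up; the point is simply that a strictly respected rule can leave a VM with no valid restart target:

```python
# Conceptual sketch (NOT vSphere HA's real placement algorithm): when an
# anti-affinity rule must be respected, a VM can only be restarted on a
# host that runs no other member of its rule group.

def pick_restart_host(vm, candidate_hosts, anti_affinity_group, placements):
    """Return the first host not already running a member of the VM's
    anti-affinity group, or None if the rule cannot be respected."""
    for host in candidate_hosts:
        peers_on_host = anti_affinity_group & placements.get(host, set())
        if not peers_on_host:
            return host
    return None  # rule respected -> the VM is simply not restarted

# Two surviving hosts, each already running one member of the rule group:
placements = {"esxi-01": {"app-a"}, "esxi-02": {"app-b"}}
rule = {"app-a", "app-b", "app-c"}
print(pick_restart_host("app-c", ["esxi-01", "esxi-02"], rule, placements))
```

With only these two hosts available the function returns `None`, which mirrors the scenario in the text: the rule is honored and the VM stays down.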
With vSphere 6.0, support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called "das.respectVmHostSoftAffinityRules". When this advanced setting is configured, vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group, then HA will restart the respective VM on such a host. As this is a "should rule", HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule are available, HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"

ADD SCREENSHOT HERE!
Virtual SAN and Virtual Volumes specifics

In the last couple of sections we have discussed the ins and outs of HA, all of it based on VMFS based or NFS based storage. With the introduction of Virtual SAN and Virtual Volumes also come changes to some of the discussed concepts.

HA and Virtual SAN

Virtual SAN is VMware's approach to Software Defined Storage. We are not going to explain the ins and outs of Virtual SAN, but do want to provide a basic understanding for those who have never done anything with it. Virtual SAN leverages host local storage and creates a shared datastore out of it.

Figure 32 - Virtual SAN Cluster
Virtual SAN requires a minimum of 3 hosts, and each of those 3 hosts will need to have 1 SSD for caching and 1 capacity device (which can be SSD or HDD). Only the capacity devices will contribute to the available capacity of the datastore. If you have 1 TB worth of capacity devices per host, then with three hosts the total size of your datastore will be 3 TB.

Having said that, with Virtual SAN 6.1 VMware introduced a "2-node" option. This 2-node option is actually 2 regular VSAN nodes with a third "witness" node.

The big differentiator between most storage systems and Virtual SAN is that availability of the virtual machine's storage is defined on a per virtual disk or per virtual machine basis. This is called "Failures To Tolerate" and can be configured to any value between 0 (zero) and 3. When configured to 0, the virtual machine will have only 1 copy of its virtual disks, which means that if a host fails where the virtual disks are stored, the virtual machine is lost. As such, all virtual machines are deployed by default with Failures To Tolerate (FTT) set to 1. A virtual disk is what VSAN refers to as an object. An object, when FTT is configured as 1 or higher, has multiple components. In the diagram below we demonstrate the FTT=1 scenario, where the virtual disk has 2 "data components" and a "witness component". The witness is used as a "quorum" mechanism.
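The capacity and component counts described above can be sketched in a few lines. This is a deliberately simplified model: the witness count follows the common 2n+1 quorum rule of thumb, and real VSAN object layouts can differ per policy and object size:

```python
# Simplified model of the Virtual SAN numbers discussed above; real VSAN
# placement is more sophisticated (this is not a VMware API).

def raw_capacity_tb(hosts, capacity_per_host_tb):
    """The capacity devices of all hosts pool into one shared datastore."""
    return hosts * capacity_per_host_tb

def object_components(ftt):
    """FTT=n requires n+1 data copies; witnesses bring the total to an
    odd number (2n+1 rule of thumb) so a majority can always be found."""
    data = ftt + 1
    total = 2 * ftt + 1
    return data, total - data  # (data components, witness components)

print(raw_capacity_tb(3, 1))   # 3 TB raw, matching the example above
print(object_components(1))    # (2, 1): 2 data components + 1 witness
```

Note that usable capacity is lower than raw capacity: FTT=1 stores each data component twice, so the same 3 TB raw holds roughly 1.5 TB of protected VM data.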
Figure 33 - Virtual SAN Object model
As the diagram above depicts, a virtual machine can be running on the first host while its storage components are on the remaining hosts in the cluster. As you can imagine, from an HA point of view this changes things, as access to the network is not only critical for HA to function correctly but also for Virtual SAN. When it comes to networking, note that when Virtual SAN is configured in a cluster, HA will use the same network for its communications (heartbeating etc.). On top of that, it is good to know that VMware highly recommends 10GbE to be used for Virtual SAN.

Basic design principle: 10GbE is highly recommended for Virtual SAN. As vSphere HA also leverages the Virtual SAN network and availability of VMs is dependent on network connectivity, ensure that at a minimum two 10GbE ports are used, along with two physical switches for resiliency.

The reason that HA uses the same network as Virtual SAN is simple: it is to avoid network partition scenarios where HA communications are separated from Virtual SAN and the state of the cluster is unclear. Note that you will need to ensure that there is a pingable isolation address on the Virtual SAN network, and this isolation address will need to be configured as such through the use of the advanced setting "das.isolationAddress0". We also recommend disabling the use of the default isolation address through the advanced setting "das.useDefaultIsolationAddress" (set to false).

When an isolation does occur, the isolation response is triggered as explained in earlier chapters. For Virtual SAN the recommendation is simple: configure the isolation response to "Power off, then failover". This is the safest option. Virtual SAN can be compared to the "converged network with IP based storage" example we provided: it is very easy to reach a situation where a host is isolated, all virtual machines remain running, but they are restarted on another host because the connection to the Virtual SAN datastore is lost.

Basic design principle: Configure your isolation address and your isolation policy accordingly. We recommend selecting "Power off" as the isolation policy and a reliable, pingable device as the isolation address.

What about things like heartbeat datastores and the folder structure that exists on a VMFS datastore; has any of that changed with Virtual SAN? Yes, it has. First of all, in a "Virtual SAN" only environment the concept of Heartbeat Datastores is not used at all. The reason for this is straightforward: as HA and Virtual SAN share the same network, it is safe to assume that when the HA heartbeat is lost because of a network failure, so is access to the Virtual SAN datastore. Only in an environment where there is also traditional storage will heartbeat datastores be configured, leveraging those traditional datastores as heartbeat datastores. Note that we do not feel there is a reason to introduce traditional storage just to provide HA this functionality; HA and Virtual SAN work perfectly fine without heartbeat datastores.
Normally HA metadata is stored in the root of the datastore. For Virtual SAN this is different, as the metadata is stored in the VM's namespace object. The protected list is held in memory and updated automatically when VMs are powered on or off.

Now you may wonder: what happens when there is an isolation? How does HA know where to start the VM that is impacted? Let's take a look at a partition scenario.

Figure 34 - VSAN Partition scenario

In this scenario a network problem has caused a cluster partition. Where a VM is restarted is determined by which partition owns the virtual machine files. Within a VSAN cluster this is fairly straightforward. There are two partitions, one of which is running the VM with its VMDK, while the other partition has a VMDK replica and a witness. Guess what happens? Right, VSAN uses the witness to see which partition has quorum, and based on that result one of the two partitions will win. In this case, Partition 2 has more than 50% of the components of this object and as such is the winner. This means that the VM will be
restarted on either "esxi-03" or "esxi-04" by vSphere HA. Note that the VM in Partition 1 will be powered off only if you have configured the isolation response to do so. We would like to stress that this is highly recommended! (Isolation response -> power off)
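The quorum decision just described can be illustrated with a few lines of Python. This is a simplification of VSAN's ownership logic, using the component counts from the partition scenario above:

```python
# Simplified illustration of the quorum decision described above: the
# partition holding a strict majority (>50%) of an object's components
# owns the object, and HA restarts the VM in that partition.

def winning_partition(components_by_partition):
    """components_by_partition maps partition name -> number of the
    object's components visible in that partition. Returns the partition
    with a strict majority, or None if no partition has quorum."""
    total = sum(components_by_partition.values())
    for partition, count in components_by_partition.items():
        if count * 2 > total:  # strictly more than 50%
            return partition
    return None

# Partition 1: the running VM's VMDK (1 component).
# Partition 2: the VMDK replica plus the witness (2 components).
print(winning_partition({"partition-1": 1, "partition-2": 2}))  # partition-2
```

An even split returns `None`, which is exactly why the witness component exists: it keeps the total component count odd so a majority can always be determined.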
HA and Virtual Volumes

Let us start with first describing what Virtual Volumes is and what value it brings for an administrator. Virtual Volumes was developed to make your life (as vSphere admin) and that of the storage administrator easier. This is done by providing a framework that enables the vSphere administrator to assign policies to virtual machines or virtual disks. In these policies, capabilities of the storage array can be defined. These capabilities can be things like snapshotting, deduplication, RAID level, thin/thick provisioning, etc. What is offered to the vSphere administrator is up to the storage administrator, and of course up to what the storage system can offer to begin with. When a virtual machine is deployed and a policy is assigned, the storage system will enable certain functionality of the array based on what was specified in the policy. So there is no longer a need to assign capabilities to a LUN which holds many VMs, but rather per VM or even per VMDK level control. So how does this work? Well, let's take a look at an architectural diagram first.
Figure 35 - Virtual Volumes Architecture

The diagram shows a couple of components which are important in the VVol architecture. Let's list them out:

- Protocol Endpoints, aka PE
- Virtual Datastore and a Storage Container
- Vendor Provider / VASA
- Policies
- Virtual Volumes

Let's take a look at all of these in the above order. Protocol Endpoints, what are they?

Protocol Endpoints are literally the access points to your storage system. All IO to virtual volumes is proxied through a Protocol Endpoint, and you can have 1 or more of these per storage system, if your storage system supports having multiple of course. (Implementations of different vendors will vary.) PEs are compatible with different protocols (FC, FCoE, iSCSI, NFS), and if you ask me, with Virtual Volumes that whole protocol discussion will come to an end. You
could see a Protocol Endpoint as a "mount point" or a device, and yes, they will count towards your maximum number of devices per host (256). (Virtual Volumes themselves won't count towards that!)

Next up is the Storage Container. This is the place where you store your virtual machines, or better said, where your virtual volumes end up. The Storage Container is a storage system logical construct and is represented within vSphere as a "virtual datastore". You need 1 per storage system, but you can have many when desired. To this Storage Container you can apply capabilities. So if you would like your virtual volumes to be able to use array based snapshots, then the storage administrator will need to assign that capability to the storage container. Note that a storage administrator can grow a storage container without even informing you. A storage container isn't formatted with VMFS or anything like that, so you don't need to increase the volume in order to use the space.

But how does vSphere know which container is capable of doing what? In order to discover a storage container and its capabilities, we need to be able to talk to the storage system first. This is done through the vSphere APIs for Storage Awareness. You simply point vSphere to the Vendor Provider, and the vendor provider will report to vSphere what's available; this includes both the storage containers as well as the capabilities they possess. Note that a single Vendor Provider can be managing multiple storage systems, which in their turn can have multiple storage containers with many capabilities. These vendor providers can also come in different flavours: for some storage systems it is part of their software, but for others it will come as a virtual appliance that sits on top of vSphere.

Now that vSphere knows which systems there are and which containers are available with which capabilities, you can start creating policies. These policies can be a combination of capabilities and will ultimately be assigned to virtual machines, or even virtual disks. You can imagine that in some cases you would like Quality of Service enabled to ensure performance for a VM, while in other cases it isn't as relevant, but you need to have a snapshot every hour. All of this is enabled through these policies. No longer will you be maintaining that spreadsheet with all your LUNs, which data services were enabled, and whatnot; no, you simply assign a policy. (Yes, a proper naming scheme will be helpful when defining policies.) When requirements change for a VM, you don't move the VM around; you change the policy and the storage system will do what is required in order to make the VM (and its disks) compliant again with the policy. Not the VM really, but the Virtual Volumes.
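As a rough illustration of this policy-driven model, the sketch below represents policies and container capabilities as plain dictionaries. The capability names ("snapshot_interval_hours", "qos_iops", "dedupe") and the compliance check are invented for this example; they are not part of any VVol or VASA API:

```python
# Hypothetical illustration of per-VM/per-VMDK policies replacing the
# "capabilities per LUN" model. Capability names and the matching rule
# are made up for this sketch; this is not a VVol API.

GOLD = {"snapshot_interval_hours": 1, "qos_iops": 5000, "dedupe": True}
SILVER = {"snapshot_interval_hours": 24, "dedupe": True}

def is_compliant(offered_capabilities, policy):
    """A container can back a policy only if it offers every capability
    the policy demands (values compared for an exact match here)."""
    return all(offered_capabilities.get(k) == v for k, v in policy.items())

# Capabilities one storage container reports via its vendor provider:
container = {"snapshot_interval_hours": 1, "qos_iops": 5000,
             "dedupe": True, "thin_provisioning": True}
print(is_compliant(container, GOLD))    # True
print(is_compliant(container, SILVER))  # False: snapshot interval differs
```

The point of the sketch: changing a VM's requirements means assigning a different policy, and it is the storage system's job to make the backing virtual volumes compliant again.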
Okay, those are the basics. Now, what about Virtual Volumes and vSphere HA? What changes when you are running Virtual Volumes, and what do you need to keep in mind when it comes to HA?

First of all, let me mention this: in some cases storage vendors have designed a solution where the "vendor provider" isn't designed in an HA fashion (VMware allows for Active/Active, Active/Standby or just "Active" as in a single instance). Make sure to validate
what kind of implementation your storage vendor has, as the Vendor Provider needs to be available when powering on VMs. The following quote explains why:

When a Virtual Volume is created, it is not immediately accessible for IO. To access Virtual Volumes, vSphere needs to issue a "Bind" operation to a VASA Provider (VP), which creates an IO access point for a Virtual Volume on a Protocol Endpoint (PE) chosen by the VP. A single PE can be the IO access point for multiple Virtual Volumes. An "Unbind" operation will remove this IO access point for a given Virtual Volume.

That is the "Virtual Volumes" implementation aspect, but of course things have also changed from a vSphere HA point of view. No longer do we have VMFS or NFS datastores to store files on or use for heartbeating. What changes from that perspective? First of all, a VM is carved up in different Virtual Volumes:

- VM Configuration
- Virtual Machine Disks
- Swap File
- Snapshots (if there are any)

Besides these different types of objects, when vSphere HA is enabled there is also a volume used by vSphere HA, and this volume will contain all the metadata which is normally stored under "/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/" on regular VMFS. For each Fault Domain a separate folder will be created in this VVol.

All VM related HA files which normally would be under the VM folder, like for instance the power-off file, are now stored in the VM Configuration VVol object. Conceptually speaking this is similar to regular VMFS; implementation wise, however, it is completely different.

Another thing that changes with VVols is Heartbeat Datastores.

BEING WORKED ON - EARLY DRAFT
Adding Resiliency to HA (Network Redundancy)

In the previous chapter we extensively covered both Isolation Detection, which triggers the selected Isolation Response, and the impact of a false positive. The Isolation Response enables HA to restart virtual machines when "Power off" or "Shutdown" has been selected and the host becomes isolated from the network. However, this also means that it is possible that, without proper redundancy, the Isolation Response may be unnecessarily triggered. This leads to downtime and should be prevented.

To increase resiliency for networking, VMware implemented the concept of NIC teaming in the hypervisor for both VMkernel and virtual machine networking. When discussing HA, this is especially important for the Management Network.

NIC teaming is the process of grouping together several physical NICs into one single logical NIC, which can be used for network fault tolerance and load balancing.

Using this mechanism, it is possible to add redundancy to the Management Network to decrease the chances of an isolation event. This is, of course, also possible for other "Portgroups" but that is not the topic of this chapter or book. Another option is configuring an additional Management Network by enabling the "management network" tick box on another VMkernel port. A little understood fact is that if there are multiple VMkernel networks on the same subnet, HA will use all of them for management traffic, even if only one is specified for management traffic!

Although there are many configurations possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry accepted best practice.

Requirements:

- 2 physical NICs
- VLAN trunking

Recommended:

- 2 physical switches
- If available, enable "link state tracking" to ensure link failures are reported

The vSwitch should be configured as follows:
- vSwitch0: 2 physical NICs (vmnic0 and vmnic1)
- 2 portgroups (Management Network and vMotion VMkernel)
- Management Network active on vmnic0 and standby on vmnic1
- vMotion VMkernel active on vmnic1 and standby on vmnic0
- Failback set to No

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to "No" to avoid the chance of an unwanted isolation event, which can occur when a physical switch routes no traffic during boot but the ports are reported as "up". (NIC Teaming tab)
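A small sketch to make the recommended layout checkable. The dictionary format is invented for this illustration; it is not an ESXi data structure or API:

```python
# Illustrative check of the active/standby teaming layout recommended
# above; the data structure is made up for this sketch (not an ESXi API).

vswitch0 = {
    "Management Network": {"active": ["vmnic0"], "standby": ["vmnic1"],
                           "failback": False},
    "vMotion":            {"active": ["vmnic1"], "standby": ["vmnic0"],
                           "failback": False},
}

def validate_teaming(portgroups):
    """Each portgroup needs a standby uplink and failback disabled, and
    the portgroups should not share the same active NIC, so each runs
    dedicated on its own physical NIC during normal operation."""
    for pg in portgroups.values():
        if not pg["standby"] or pg["failback"]:
            return False
    actives = [pg["active"][0] for pg in portgroups.values()]
    return len(set(actives)) == len(actives)

print(validate_teaming(vswitch0))  # True
```

A configuration with no standby uplink or with failback enabled would fail this check, which is exactly the kind of setup that invites an unnecessary isolation event.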
Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, which is especially useful in blade server environments. Easy to configure.

Cons: Just a single active path for heartbeats.

The following diagram depicts this active/standby scenario:
Figure 36 - Active-Standby Management Network design

To increase resiliency, we also recommend implementing the following advanced settings and using NIC ports on different PCI busses, preferably NICs of a different make and model. When using a different make and model, even a driver failure could be mitigated.

Advanced settings: das.isolationaddressX = <ip-address>

The isolation address setting is discussed in more detail in the section titled "Fundamental Concepts". In short, it is the IP address that the HA agent pings to identify if the host is completely isolated from the network or just not receiving any heartbeats. If multiple VMkernel networks on different subnets are used, it is recommended to set an isolation address per network to ensure that each of these will be able to validate isolation of the host.
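The per-network logic can be sketched as follows. This is conceptual only: the addresses are examples, the reachability results are passed in rather than actually pinged, and real HA combines this with heartbeat traffic before declaring isolation:

```python
# Conceptual sketch only: a host should consider itself isolated only
# when EVERY configured isolation address is unreachable. Real HA also
# factors in missing heartbeats; here we just show the address logic.

def is_isolated(ping_results):
    """ping_results: dict of isolation address -> bool (reachable?).
    A single reachable address is enough to rule out full isolation."""
    return not any(ping_results.values())

# One isolation address per VMkernel subnet (das.isolationaddress0/1);
# the IPs below are made-up examples.
print(is_isolated({"192.168.1.1": False, "10.0.0.1": True}))   # False
print(is_isolated({"192.168.1.1": False, "10.0.0.1": False}))  # True
```

This also shows why one address per subnet matters: with only one address configured, a failure on that single subnet could look like full isolation even while the other network is healthy.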
Basic design principle: Take advantage of some of the basic features vSphere has to offer, like NIC teaming. Combining different physical NICs will increase the overall resiliency of your solution.
Corner Case Scenario: Split-Brain

A split brain scenario is a scenario where a single virtual machine is powered up multiple times, typically on two different hosts. This is possible in the scenario where the isolation response is set to "Leave powered on" and network based storage, like NFS / iSCSI and even Virtual SAN, is used. This situation can occur during a full network isolation, which may result in the lock on the virtual machine's VMDK being lost, enabling HA to actually power up the virtual machine. As the virtual machine was not powered off on its original host (isolation response set to "Leave powered on"), it will exist in memory on the isolated host and in memory with a disk lock on the host that was requested to restart the virtual machine.

Keep in mind that this truly is a corner case scenario which is very unlikely to occur in most environments. In case it does happen, HA relies on the "lost lock detection" mechanism to mitigate this scenario. In short, ESXi detects that the lock on the VMDK has been lost and, when the datastore becomes accessible again and the lock cannot be reacquired, issues a question whether the virtual machine should be powered off; HA automatically answers the question with Yes. However, you will only see this question if you directly connect to the ESXi host during the failure. HA will generate an event for this auto-answered question though.

As stated above, the question will be auto-answered and the virtual machine will be powered off to recover from the split brain scenario. The question still remains: in the case of an isolation with iSCSI or NFS, should you power off virtual machines or leave them powered on?

As just explained, HA will automatically power off your original virtual machine when it detects a split-brain scenario. This process, however, is not instantaneous, and as such it is recommended to use the isolation response of "Power off". We also recommend increasing heartbeat network resiliency to avoid getting into this situation. We will discuss the options you have for enhancing Management Network resiliency in the next chapter.

Link State Tracking

This was already briefly mentioned in the list of recommendations, but this feature is something we would like to emphasize. We have noticed that people often forget about it, even though many switches offer this capability, especially in blade server environments.

Link state tracking will mirror the state of an upstream link to a downstream link. Let's clarify that with a diagram.
Figure 37 - Link State Tracking mechanism

The diagram above depicts a scenario where an uplink of a "Core Switch" has failed. Without Link State Tracking, the connection from the "Edge Switch" to vmnic0 will be reported as up. With Link State Tracking enabled, the state of the link on the "Edge Switch" will reflect the state of the link of the "Core Switch" and as such be marked as "down". You might wonder why this is important, but think about it for a second. Many features that vSphere offers rely on networking, and so do your virtual machines. In the case where the state is not reflected, some functionality might just fail; for instance, network heartbeating could fail if it needs to flow through the core switch. We call this a "black hole" scenario: the host sends traffic down a path that it believes is up, but the traffic never reaches its destination due to the failed upstream link.

Basic design principle: Know your network environment, talk to the network administrators, and ensure advanced features like Link State Tracking are used when possible to increase resiliency.
Admission Control

Admission Control is more than likely the most misunderstood concept vSphere holds today, and because of this it is often disabled. However, Admission Control is a must when availability needs to be guaranteed, and isn't that the reason for enabling HA in the first place?

What is HA Admission Control about? Why does HA contain this concept called Admission Control? The "Availability Guide", a.k.a. the HA bible, states the following:

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

Please read that quote again, and especially the first two words. Indeed, it is vCenter that is responsible for Admission Control, contrary to what many believe. Although this might seem like a trivial fact, it is important to understand that this implies that Admission Control will not disallow HA initiated restarts. HA initiated restarts are done on a host level and not through vCenter.

As said, Admission Control guarantees that capacity is available for an HA initiated failover by reserving resources within a cluster. It calculates the capacity required for a failover based on available resources. In other words, if a host is placed into maintenance mode or disconnected, it is taken out of the equation. This also implies that if a host has failed or is not responding, but has not been removed from the cluster, it is still included in the equation. "Available resources" indicates that the virtualization overhead has already been subtracted from the total amount.

To give an example: VMkernel memory is subtracted from the total amount of memory to obtain the memory available for virtual machines. There is one gotcha with Admission Control that we want to bring to your attention before drilling into the different policies. When Admission Control is enabled, HA will in no way violate availability constraints. This means that it will always ensure multiple hosts are up and running, and this applies to manual maintenance mode actions and, for instance, to VMware Distributed Power Management. So, if a host is stuck trying to enter Maintenance Mode, remember that it might be HA which is not allowing Maintenance Mode to proceed, as it would violate the Admission Control Policy. In this situation, users can manually vMotion virtual machines off the host or temporarily disable admission control to allow the operation to proceed.
But what if you use something like Distributed Power Management (DPM)? Would that place all hosts in standby mode to reduce power consumption? No, DPM is smart enough to take hosts out of standby mode to ensure enough resources are available to provide for HA initiated failovers. If by any chance the resources are not available, HA will wait for these resources to be made available by DPM and then attempt the restart of the virtual machines. In other words, the retry count (5 retries by default) is not wasted in scenarios like these.

Admission Control Policy

The Admission Control Policy dictates the mechanism that HA uses to guarantee enough resources are available for an HA initiated failover. This section gives a general overview of the available Admission Control Policies. The impact of each policy is described in the following section, including our recommendation. HA has three mechanisms to guarantee enough capacity is available to respect virtual machine resource reservations.

Figure 38 - Admission control policy
Below we have listed all three options currently available as the Admission Control Policy. Each option has a different mechanism to ensure resources are available for a failover, and each option has its caveats.

Admission Control Mechanisms

Each Admission Control Policy has its own Admission Control mechanism. Understanding each of these Admission Control mechanisms is important to appreciate the impact each one has on your cluster design. For instance, setting a reservation on a specific virtual machine can have an impact on the achieved consolidation ratio. This section will take you on a journey through the trenches of Admission Control Policies and their respective mechanisms and algorithms.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the "Host Failures Cluster Tolerates" policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism.

This admission control policy can be configured in an N-1 fashion. This means that the number of host failures you can specify in a 32 host cluster is 31.

Within the vSphere Web Client it is possible to manually specify the slot size, as can be seen in the screenshot below. The vSphere Web Client also allows you to view which virtual machines span multiple slots. This can be very useful in scenarios where the slot size has been explicitly specified; we will explain why in just a second.
Figure 39 - Host Failures

The so-called "slots" mechanism is used when "Host failures cluster tolerates" has been selected as the Admission Control Policy. The details of this mechanism have changed several times in the past, and it is one of the most restrictive policies; more than likely, it is also the least understood.

Slots dictate how many virtual machines can be powered on before vCenter starts yelling "Out Of Resources!" Normally, a slot represents one virtual machine. Admission Control does not limit HA in restarting virtual machines; it ensures enough unfragmented resources are available to power on all virtual machines in the cluster by preventing "over-commitment". Technically speaking, "over-commitment" is not the correct terminology, as Admission Control ensures virtual machine reservations can be satisfied and that all virtual machines' initial memory overhead requirements are met. Although we have already touched on this, it doesn't hurt repeating it, as it is one of those myths that keeps coming back: HA initiated failovers are not prone to the Admission Control Policy. Admission Control is done by vCenter. HA initiated restarts, in a normal scenario, are executed directly on the ESXi host without the use of vCenter. The corner case is where HA requests DRS (DRS is a vCenter task!) to defragment resources, but that is beside the point. Even if resources are low and vCenter would complain, it couldn't stop the restart from happening.

Let's dig into this concept we have just introduced: slots.

A slot is defined as a logical representation of the memory and CPU resources that satisfy the reservation requirements for any powered-on virtual machine in the cluster.

In other words, a slot is the worst case CPU and memory reservation scenario in a cluster. This directly leads to the first "gotcha".
HA uses the highest CPU reservation of any given powered-on virtual machine and the highest memory reservation of any given powered-on virtual machine in the cluster. If no reservation higher than 32 MHz is set, HA will use a default of 32 MHz for CPU. If no memory reservation is set, HA will use a default of 0 MB + memory overhead for memory. (See the VMware vSphere Resource Management Guide for more details on memory overhead per virtual machine configuration.) The following example will clarify what "worst-case" actually means.

Example: If virtual machine "VM1" has 2 GHz of CPU reserved and 1024 MB of memory reserved, and virtual machine "VM2" has 1 GHz of CPU reserved and 2048 MB of memory reserved, the slot size for memory will be 2048 MB (+ its memory overhead) and the slot size for CPU will be 2 GHz. It is a combination of the highest reservation of both virtual machines that leads to the total slot size. Reservations defined at the Resource Pool level, however, will not affect HA slot size calculations.

Basic design principle: Be really careful with reservations. If there's no need to have them on a per virtual machine basis, don't configure them, especially when using Host Failures Cluster Tolerates. If reservations are needed, resort to resource pool based reservations.

Now that we know the worst-case scenario is always taken into account when it comes to slot size calculations, we will describe what dictates the amount of available slots per cluster, as that ultimately dictates how many virtual machines can be powered on in your cluster.

First, we will need to know the slot size for memory and CPU. Next, we will divide the total available CPU resources of a host by the CPU slot size, and the total available memory resources of a host by the memory slot size. This leaves us with a total number of slots for both memory and CPU for a host. The most restrictive number (worst-case scenario) is the number of slots for this host. In other words, when you have 25 CPU slots but only 5 memory slots, the amount of available slots for this host will be 5, as HA always takes the worst case scenario into account to "guarantee" all virtual machines can be powered on in case of a failure or isolation.
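The slot math above can be sketched in a few lines, reusing the VM1/VM2 example (2 GHz / 1024 MB and 1 GHz / 2048 MB) and the 32 MHz CPU default. Memory overhead is left out to keep the arithmetic readable, and the host capacity numbers are made up:

```python
# Illustrative sketch of the slot calculation described above; memory
# overhead is ignored and the host capacities are invented examples.

def slot_size(vms, cpu_default_mhz=32):
    """Slot size = highest CPU reservation and highest memory reservation
    of any powered-on VM, each taken independently (worst case)."""
    cpu = max(max(vm["cpu_mhz"] for vm in vms), cpu_default_mhz)
    mem = max(vm["mem_mb"] for vm in vms)
    return cpu, mem

def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host's slot count is the most restrictive of its CPU-based and
    memory-based slot counts."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)

# VM1: 2 GHz / 1024 MB reserved; VM2: 1 GHz / 2048 MB reserved.
vms = [{"cpu_mhz": 2000, "mem_mb": 1024}, {"cpu_mhz": 1000, "mem_mb": 2048}]
slot = slot_size(vms)
print(slot)                                # (2000, 2048)
print(slots_per_host(18000, 16384, slot))  # min(9, 8) = 8 slots
```

Note how a single large reservation drags down the slot count for every host in the cluster, which is exactly why the design principle above warns against casual per-VM reservations.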
The question we receive a lot is: how do I know what my slot size is? The details around slot sizes can be monitored in the HA section of the cluster's Monitor tab by checking the "Advanced Runtime Info" section when the "Host Failures" Admission Control Policy is configured.
Figure 40 - High Availability cluster monitor section

Advanced Runtime Info will show the specifics of the slot size and more useful details, such as the number of slots available, as depicted in Figure 41.
Figure 41 - High Availability advanced runtime info

As you can imagine, using reservations on a per virtual machine basis can lead to very conservative consolidation ratios. However, this is something that is configurable through the Web Client. If you have just one virtual machine with a really high reservation, you can set an explicit slot size by going to "Edit Cluster Services" and specifying it under the Admission Control Policy section, as shown in Figure 39.

If one of these advanced settings is used, HA will ensure that the virtual machine that skewed the numbers can be restarted by "assigning" multiple slots to it. However, when you are low on resources, this could mean that you are not able to power on the virtual machine with this reservation, because resources may be fragmented throughout the cluster instead of available on a single host. HA will notify DRS that a power-on attempt was unsuccessful, and a request will be made to defragment the resources to accommodate the remaining virtual machines that need to be powered on. In order for this to be successful, DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations.
The following diagram depicts a scenario where a virtual machine spans multiple slots:

Figure 42 - Virtual machine spanning multiple HA slots

Notice that because the memory slot size has been manually set to 1024 MB, one of the virtual machines (grouped with dotted lines) spans multiple slots due to a 4 GB memory reservation. As you might have noticed, none of the hosts has enough resources available to satisfy the reservation of the virtual machine that needs to fail over. Although in total there are enough resources available, they are fragmented, and HA will not be able to power on this particular virtual machine directly, but will request DRS to defragment the resources to accommodate this virtual machine's resource requirements.

Admission Control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings. It will take the number of slots this virtual machine will consume into account by subtracting them from the total number of available slots, but it will not verify the amount of available slots per host to ensure failover. As stated earlier, though, HA will request DRS to defragment the resources. This is by no means a guarantee of a successful power-on attempt.
Basic design principle: Avoid using advanced settings to decrease the slot size as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations we recommend using the percentage based admission control policy.

Within the vSphere Web Client there is functionality which enables you to identify virtual machines which span multiple slots, as shown in Figure 29. We highly recommend monitoring this section on a regular basis to get a better understanding of your environment and to identify those virtual machines that might be problematic to restart in case of a host failure.

Unbalanced Configurations and Impact on Slot Calculation

It is an industry best practice to create clusters with similar hardware configurations. However, many companies started out with a small VMware cluster when virtualization was first introduced. When the time has come to expand, chances are fairly large the same hardware configuration is no longer available. The question is: will you add the newly bought hosts to the same cluster or create a new cluster?

From a DRS perspective, large clusters are preferred as it increases the load balancing opportunities. However, there is a caveat for DRS as well, which is described in the DRS section of this book. For HA, there is a big caveat. When you think about it and understand the internal workings of HA, more specifically the slot algorithm, you probably already know what is coming up.

Let's first define the term "unbalanced cluster."

An unbalanced cluster would, for instance, be a cluster with 3 hosts of which one contains substantially more memory than the other hosts in the cluster.

Let's try to clarify that with an example.

Example: What would happen to the total number of slots in a cluster with the following specifications?

- Three host cluster
- Two hosts have 16 GB of available memory
- One host has 32 GB of available memory

The third host is a brand new host that has just been bought and as prices of memory dropped immensely the decision was made to buy 32 GB instead of 16 GB.
The cluster contains a virtual machine that has 1 vCPU and 4 GB of memory. A 1024 MB memory reservation has been defined on this virtual machine. As explained earlier, a reservation will dictate the slot size, which in this case leads to a memory slot size of 1024 MB + memory overhead. For the sake of simplicity, we will calculate with 1024 MB. The following diagram depicts this scenario:
Figure 43 - High Availability memory slot size

When Admission Control is enabled and the number of host failures has been selected as the Admission Control Policy, the number of slots will be calculated per host and for the cluster in total. This will result in:

Host     | Number of slots
ESXi-01  | 16 slots
ESXi-02  | 16 slots
ESXi-03  | 32 slots

As Admission Control is enabled, a worst-case scenario is taken into account. When a single host failure has been specified, this means that the host with the largest number of slots will be taken out of the equation. In other words, for our cluster, this would result in:

ESXi-01 + ESXi-02 = 32 slots available
Although you have doubled the amount of memory in one of your hosts, you are still stuck with only 32 slots in total. As clearly demonstrated, there is absolutely no point in buying additional memory for a single host when your cluster is designed with Admission Control enabled and the number of host failures has been selected as the Admission Control Policy.

In our example, the memory slot size happened to be the most restrictive; however, the same principle applies when CPU slot size is most restrictive.

Basic design principle: When using admission control, balance your clusters and be conservative with reservations as it leads to decreased consolidation ratios.

Now, what would happen in the scenario above when the number of allowed host failures is set to 2? In this case ESXi-03 is taken out of the equation and one of any of the remaining hosts in the cluster is also taken out, resulting in 16 slots. This makes sense, doesn't it?
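The worst-case slot arithmetic described above can be sketched in a few lines. This is an illustrative helper only (the function name and inputs are hypothetical, not a VMware API); in reality the per-host slot counts come from dividing each host's capacity by the cluster-wide slot size:

```python
def ha_available_slots(host_slots, host_failures=1):
    """Worst-case slot count used by the 'Host Failures Cluster
    Tolerates' admission control policy: the largest host(s) are
    removed from the equation before summing the remaining slots.

    Hypothetical sketch -- host_slots is a list of per-host slot
    counts derived from the cluster-wide slot size.
    """
    if host_failures:
        remaining = sorted(host_slots)[:-host_failures]
    else:
        remaining = list(host_slots)
    return sum(remaining)

# The unbalanced three-host cluster from the example (16/16/32 slots):
print(ha_available_slots([16, 16, 32], host_failures=1))  # 32
print(ha_available_slots([16, 16, 32], host_failures=2))  # 16
```

With one tolerated failure the 32-slot host is discarded, leaving 32 slots; with two tolerated failures, the two largest hosts are discarded, leaving 16.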
Can you avoid large HA slot sizes due to reservations without resorting to advanced settings? That's the question we get almost daily and the answer is the "Percentage of Cluster Resources Reserved" admission control mechanism.
Percentage of Cluster Resources Reserved

The Percentage of Cluster Resources Reserved admission control policy is one of the most used admission control policies. The simple reason for this is that it is the least restrictive and most flexible. It is also very easy to configure, as shown in the screenshot below.

Figure 44 - Setting a different percentage for CPU/Memory

The main advantage of the percentage based Admission Control Policy is that it avoids the commonly experienced slot size issue where values are skewed due to a large reservation. But if it doesn't use the slot algorithm, what does it use?
When you specify a percentage, and let's assume for now that the percentage for CPU and memory will be configured equally, that percentage of the total amount of available resources will stay reserved for HA purposes. First of all, HA will add up all available resources to see how much it has available (virtualization overhead will be subtracted) in total. Then, HA will calculate how much resources are currently reserved by adding up all reservations for memory and for CPU for all powered-on virtual machines.
For those virtual machines that do not have a reservation, a default of 32 MHz will be used for CPU and a default of 0 MB + memory overhead will be used for memory. (The amount of overhead per configuration type can be found in the "Understanding Memory Overhead" section of the Resource Management guide.)

In other words, admission control verifies that:

((total amount of available resources – total reserved virtual machine resources) / total amount of available resources) >= (percentage HA should reserve as spare capacity)

Total reserved virtual machine resources includes the default reservation of 32 MHz and the memory overhead of the virtual machine.
Let's use a diagram to make it a bit clearer:

Figure 45 - Percentage of cluster resources reserved

Total cluster resources are 24 GHz (CPU) and 96 GB (MEM). This would lead to the following calculations:

((24 GHz - (2 GHz + 1 GHz + 32 MHz + 4 GHz)) / 24 GHz) = 69% available
((96 GB - (1.1 GB + 114 MB + 626 MB + 3.2 GB)) / 96 GB) = 85% available

As you can see, the amount of memory differs from the diagram. Even if a reservation has been set, the amount of memory overhead is added to the reservation. This example also demonstrates how keeping the CPU and memory percentages equal could create an imbalance. Ideally, of course, the hosts are provisioned in such a way that there is no CPU/memory imbalance. Experience over the years has proven, unfortunately, that most environments run out of memory resources first and this might need to be factored in when calculating the correct value for the percentage. However, this trend might be changing as memory is getting cheaper every day.
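The percentage calculation can be sketched as follows. Note that this simplified helper (hypothetical, not a VMware API) ignores the per-VM memory/CPU overhead that HA adds to each reservation, which is why its result differs slightly from the figures quoted above:

```python
def spare_capacity_pct(total, reservations):
    """Fraction of cluster resources still available for failover, per
    the percentage-based admission control formula:
    (total - sum(reservations)) / total.

    Illustrative sketch only; real HA also adds per-VM overhead to
    each reservation before subtracting.
    """
    return round((total - sum(reservations)) / total * 100)

# CPU side of the example: 24 GHz total, reservations of 2 GHz, 1 GHz,
# the 32 MHz default, and 4 GHz (overhead excluded here):
print(spare_capacity_pct(24, [2, 1, 0.032, 4]))  # 71
```

The result (71%) lands slightly above the 69% in the text because the per-VM overhead is left out of this sketch.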
In order to ensure virtual machines can always be restarted, Admission Control will constantly monitor if the policy has been violated or not. Please note that this Admission Control process is part of vCenter and not of the ESXi host! When one of the thresholds is reached, memory or CPU, Admission Control will disallow powering on any additional virtual machines as that could potentially impact availability. These thresholds can be monitored in the HA section of the Cluster's summary tab.

Figure 46 - High Availability summary

If you have an unbalanced cluster (hosts with different sizes of CPU or memory resources), your percentage should be equal to or preferably larger than the percentage of resources provided by the largest host. This way you ensure that all virtual machines residing on this host can be restarted in case of a host failure.

As explained earlier, this Admission Control Policy does not use slots. As such, resources might be fragmented throughout the cluster. Although DRS is notified to rebalance the cluster, if needed, to accommodate these virtual machines' resource requirements, a guarantee cannot be given. We recommend selecting the highest restart priority for this virtual machine (of course, depending on the SLA) to ensure it will be able to boot.
The following example and diagram (Figure 37) will make it more obvious: You have 3 hosts, each with roughly 80% memory usage, and you have configured HA to reserve 20% of resources for both CPU and memory. A host fails and all virtual machines will need to failover. One of those virtual machines has a 4 GB memory reservation. As you can imagine, HA will not be able to initiate a power-on attempt, as there are not enough memory resources available to guarantee the reserved capacity. Instead an event will get generated indicating "not enough resources for failover" for this virtual machine.

Figure 47 - Available resources

Basic design principle: Although HA will utilize DRS to try to accommodate for the resource requirements of this virtual machine, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).

Failover Hosts

The third option one could choose is to select one or multiple designated failover hosts. This is commonly referred to as a hot standby.
Figure 48 - Select failover hosts Admission Control Policy

It is "what you see is what you get". When you designate hosts as failover hosts, they will not participate in DRS and you will not be able to run virtual machines on these hosts! These hosts are literally reserved for failover situations. HA will attempt to use these hosts first to failover the virtual machines. If, for whatever reason, this is unsuccessful, it will attempt a failover on any of the other hosts in the cluster. For example, if three hosts would fail, including the hosts designated as failover hosts, HA will still try to restart the impacted virtual machines on the host that is left. Although this host was not a designated failover host, HA will use it to limit downtime.
Figure 49 - Select multiple failover hosts

Decision Making Time

As with any decision you make, there is an impact to your environment. This impact could be positive but also, for instance, unexpected. This especially goes for HA Admission Control. Selecting the right Admission Control Policy can lead to a quicker Return On Investment and a lower Total Cost of Ownership. In the previous section, we described all the algorithms and mechanisms that form Admission Control and in this section we will focus more on the design considerations around selecting the appropriate Admission Control Policy for your or your customer's environment.

The first decision that will need to be made is whether Admission Control will be enabled. We generally recommend enabling Admission Control as it is the only way of guaranteeing your virtual machines will be allowed to restart after a failure. It is important, though, that the policy is carefully selected and fits your or your customer's requirements.

Basic design principle: Admission control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it.
Although we have already explained all the mechanisms that are being used by each of the policies in the previous section, we will give a high-level overview and list all the pros and cons in this section. On top of that, we will expand on what we feel is the most flexible Admission Control Policy and how it should be configured and calculated.

Host Failures Cluster Tolerates

This option is, historically speaking, the most used for Admission Control. Most environments are designed with an N+1 redundancy and N+2 is also not uncommon. This Admission Control Policy uses "slots" to ensure enough capacity is reserved for failover, which is a fairly complex mechanism. Slots are based on VM-level reservations and if reservations are not used a default slot size for CPU of 32 MHz is defined and for memory the largest memory overhead of any given virtual machine is used.

Pros:

- Fully automated (when a host is added to a cluster, HA re-calculates how many slots are available)
- Guarantees failover by calculating slot sizes

Cons:

- Can be very conservative and inflexible when reservations are used, as the largest reservation dictates slot sizes
- Unbalanced clusters lead to wastage of resources
- Complexity for the administrator from a calculation perspective
Percentage as Cluster Resources Reserved

The percentage based Admission Control is based on a per-reservation calculation instead of the slots mechanism. The percentage based Admission Control Policy is less conservative than "Host Failures" and more flexible than "Failover Hosts".

Pros:

- Accurate, as it considers the actual reservation per virtual machine to calculate available failover resources
- Cluster dynamically adjusts when resources are added

Cons:

- Manual calculations are needed when adding additional hosts to a cluster while the number of host failures needs to remain unchanged
- Unbalanced clusters can be a problem when the chosen percentage is too low and resources are fragmented, which means failover of a virtual machine can't be guaranteed as the reservation of this virtual machine might not be available as a block of resources on a single host
Please note that, although a failover cannot be guaranteed, there are few scenarios where a virtual machine will not be able to restart, due to the integration HA offers with DRS and the fact that most clusters have spare capacity available to account for virtual machine demand variance. Although this is a corner-case scenario, it needs to be considered in environments where absolute guarantees must be provided.

Specify Failover Hosts

With the "Specify Failover Hosts" Admission Control Policy, when one or multiple hosts fail, HA will attempt to restart all virtual machines on the designated failover hosts. The designated failover hosts are essentially "hot standby" hosts. In other words, DRS will not migrate virtual machines to these hosts when resources are scarce or the cluster is imbalanced.

Pros:

- What you see is what you get
- No fragmented resources

Cons:

- What you see is what you get
- Dedicated failover hosts are not utilized during normal operations

Recommendations

We have been asked many times for our recommendation on Admission Control and it is difficult to answer as each policy has its pros and cons. However, we generally recommend a Percentage based Admission Control Policy. It is the most flexible policy as it uses the actual reservation per virtual machine instead of taking a "worst case" scenario approach like the number of host failures does. However, the number of host failures policy guarantees the failover level under all circumstances. Percentage based is less restrictive, but offers lower guarantees that in all scenarios HA will be able to restart all virtual machines. With the added level of integration between HA and DRS we believe a Percentage based Admission Control Policy will fit most environments.
Basic design principle: Do the math, and take customer requirements into account. We recommend using a "percentage" based admission control policy, as it is the most flexible.

Now that we have recommended which Admission Control Policy to use, the next step is to provide guidance around selecting the correct percentage. We cannot tell you what the ideal percentage is as that totally depends on the size of your cluster and, of course, on your resiliency model (N+1 vs. N+2). We can, however, provide guidelines around calculating how much of your resources should be set aside and how to prevent wasting resources.

Selecting the Right Percentage

It is a common strategy to select a single host as a percentage of resources reserved for failover. We generally recommend selecting a percentage which is the equivalent of a single or multiple hosts. Let's explain why and what the impact is of not using the equivalent of a single or multiple hosts.

Let's start with an example: a cluster consists of 8 ESXi hosts, each containing 70 GB of available RAM. This might sound like an awkward memory configuration but to simplify things we have already subtracted 2 GB as virtualization overhead. Although virtualization overhead is probably less than 2 GB, we have used this number to make the calculations easier. This example zooms in on memory but this concept also applies to CPU, of course.

For this cluster we will define the percentage of resources to reserve for both memory and CPU as 20%. For memory, this leads to a total usable cluster memory capacity of 448 GB:

(70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB) * (1 – 20%)

A total of 112 GB of memory is reserved as failover capacity.

Once a percentage is specified, that percentage of resources will be unavailable for virtual machines; therefore it makes sense to set the percentage as close as possible to the value that equals the resources a single (or multiple) host represents. We will demonstrate why this is important in subsequent examples.
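The arithmetic above can be sketched in a couple of lines (hypothetical helper, not a VMware API):

```python
def reserve_for_failover(host_mem_gb, reserve_pct):
    """Split cluster memory into reserved failover capacity and usable
    capacity under the percentage-based admission control policy.
    Illustrative sketch only."""
    total = sum(host_mem_gb)
    reserved = total * reserve_pct / 100
    return reserved, total - reserved

# Eight hosts with 70 GB of available RAM each, 20% reserved:
reserved, usable = reserve_for_failover([70] * 8, 20)
print(reserved, usable)  # 112.0 448.0
```

This reproduces the example: 112 GB is set aside for failover, leaving 448 GB usable.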
In the example above, 20% was reserved for resources in an 8-host cluster. This configuration reserves more resources than a single host contributes to the cluster. HA's main objective is to provide automatic recovery for virtual machines after a physical server failure. For this reason, it is recommended to reserve resources equal to a single or multiple hosts. When using the per-host level of granularity in an 8-host cluster (homogeneously configured hosts), the resource contribution per host to the cluster is 12.5%. However, the percentage used must be an integer (whole number). It is recommended to round up to the value guaranteeing that the full capacity of one host is protected; in this example (Figure 40), the conservative approach would lead to a percentage of 13%.
Figure 50 - Setting the correct value

Aggressive Approach

We have seen many environments where the percentage was set to a value that was less than the contribution of a single host to the cluster. Although this approach reduces the amount of resources reserved for accommodating host failures and results in higher consolidation ratios, it also offers a lower guarantee that HA will be able to restart all virtual machines after a failure. One might argue that this approach will more than likely work as most environments will not be fully utilized; however, it also eliminates the guarantee that after a failure all virtual machines will be recovered. Wasn't that the reason for enabling HA in the first place?
Adding Hosts to Your Cluster

Although the percentage is dynamic and calculates capacity at a cluster level, changes to your selected percentage might be required when expanding the cluster. The reason is that the amount of reserved resources for a failover might not correspond with the contribution per host and as a result lead to resource wastage. For example, adding 4 hosts to an 8-host cluster and continuing to use the previously configured admission control policy value of 13% will result in a failover capacity that is equivalent to roughly 1.5 hosts. Figure 41 depicts a scenario where an 8-host cluster is expanded to 12 hosts. Each host holds 8 2-GHz cores and 70 GB of memory. The cluster was originally configured with admission control set to 13%, which equals 109.2 GB and 24.96 GHz. If the requirement is to allow a single host failure, 7.68 GHz and 33.6 GB is "wasted", as clearly demonstrated in the diagram below.
Figure 51 - Avoid wasting resources
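The waste in this expansion scenario can be reproduced with a short calculation. The helper below is hypothetical; it compares the configured percentage against the rounded-up single-host-failure percentage for the grown cluster (1/12 ≈ 8.33%, rounded up to 9%):

```python
import math

def wasted_capacity(n_hosts, host_ghz, host_gb, configured_pct):
    """Resources reserved beyond what a rounded-up single-host-failure
    percentage would require after a cluster expansion.
    Hypothetical helper illustrating the example's arithmetic."""
    needed_pct = math.ceil(1 / n_hosts * 100)   # e.g. 1/12 -> 9%
    delta = (configured_pct - needed_pct) / 100
    return n_hosts * host_ghz * delta, n_hosts * host_gb * delta

# 12 hosts of 16 GHz (8 x 2 GHz cores) / 70 GB, still using the old
# 13% setting from the 8-host days:
ghz_waste, gb_waste = wasted_capacity(12, 16, 70, 13)
print(round(ghz_waste, 2), round(gb_waste, 1))  # 7.68 33.6
```

This matches the 7.68 GHz and 33.6 GB quoted above: the 4-point gap between 13% and 9% is pure reservation overshoot.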
How to Define Your Percentage?

As explained earlier, it will fully depend on the N+X model that has been chosen. Based on this model, we recommend selecting a percentage that equals the amount of resources the failed-over host(s) would represent. So, in the case of an 8-host cluster and N+2 resiliency, the percentage should be set as follows: 2 / 8 * 100 = 25%

Basic design principle: In order to avoid wasting resources we recommend carefully selecting your N+X resiliency architecture. Calculate the required percentage based on this architecture.
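Assuming the N+X approach above, the percentage can be derived as follows (hypothetical helper; the setting must be a whole number, hence the round-up):

```python
import math

def admission_control_pct(n_hosts, host_failures):
    """Percentage of cluster resources to reserve for an N+X design,
    rounded up to an integer as the setting requires. Sketch only."""
    return math.ceil(host_failures / n_hosts * 100)

print(admission_control_pct(8, 1))  # 13  (N+1 on 8 hosts)
print(admission_control_pct(8, 2))  # 25  (N+2 on 8 hosts)
```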
VM and Application Monitoring

VM and Application Monitoring is an often overlooked but really powerful feature of HA. The reason for this is most likely that it is disabled by default and relatively new compared to HA. We have tried to gather all the information we could around VM and Application Monitoring, but it is a pretty straightforward feature that actually does what you expect it to do.

Figure 52 - VM and Application Monitoring

Why Do You Need VM/Application Monitoring?

VM and Application Monitoring acts on a different level from HA. VM/App Monitoring responds to a single virtual machine or application failure as opposed to HA which responds to a host failure. An example of a single virtual machine failure would, for instance, be the infamous "blue screen of death". In the case of App Monitoring, the type of failure that triggers a response is defined by the application developer or administrator.

How Does VM/App Monitoring Work?
VM Monitoring resets individual virtual machines when needed. VM/App Monitoring uses a heartbeat similar to HA. If heartbeats, in this case VMware Tools heartbeats, are not received for a specific (and configurable) amount of time, the virtual machine will be restarted. These heartbeats are monitored by the HA agent and are not sent over a network, but stay local to the host.
Figure 53 - VM Monitoring sensitivity

When enabling VM/App Monitoring, the level of sensitivity (Figure 43) can be configured. The default setting should fit most situations. Low sensitivity basically means that the number of allowed "missed" heartbeats is higher and the chances of running into a false positive are lower. However, if a failure occurs and the sensitivity level is set to Low, the experienced downtime will be higher. When quick action is required in the event of a failure, "high sensitivity" can be selected. As expected, this is the opposite of "low sensitivity". Note that the advanced settings mentioned in the following table are deprecated and listed for educational purposes.

Sensitivity | Failure interval | Max failures | Max resets time window
Low         | 120 seconds      | 3            | 7 days
Medium      | 60 seconds       | 3            | 24 hours
High        | 30 seconds       | 3            | 1 hour

It is important to remember that VM Monitoring does not infinitely reboot virtual machines, unless you specify a custom policy with this requirement. This is to avoid a problem from repeating. By default, when a virtual machine has been rebooted three times within an hour, no further attempts will be taken until the specified time window has elapsed. Advanced settings can be used to change this default behavior, or "custom" can be selected as shown in Figure 43.
Although the heartbeat produced by VMware Tools is reliable, VMware added a further verification mechanism. To avoid false positives, VM Monitoring also monitors I/O activity of the virtual machine. When heartbeats are not received AND no disk or network activity has occurred over the last 120 seconds (by default), the virtual machine will be reset. Changing the advanced setting "das.iostatsInterval" can modify this 120-second interval.
It is recommended to align das.iostatsInterval with the failure interval selected in the VM Monitoring section of vSphere HA within the Web Client or the vSphere Client.

Basic design principle: Align das.iostatsInterval with the failure interval.

Screenshots

One of the most useful features of VM Monitoring is the fact that it takes screenshots of the virtual machine's console. The screenshots are taken right before VM Monitoring resets a virtual machine. It is a very useful feature when a virtual machine "freezes" every once in a while for no apparent reason. This screenshot can be used to debug the virtual machine's operating system when needed, and is stored in the virtual machine's working directory, as logged in the Events view on the Monitor tab of the virtual machine.

Basic design principle: VM and Application Monitoring can substantially increase availability. It is part of the HA stack and we strongly recommend using it!

VM Monitoring Implementation Details

VM/App Monitoring is implemented as part of the HA agent itself. The agent uses the "Performance Manager" to monitor disk and network I/O; VM/App Monitoring uses the "usage" counters for both disk and network and it requests these counters once enough heartbeats have been missed that the configured policy is triggered.

As stated before, VM/App Monitoring uses heartbeats just like host-level HA. The heartbeats are monitored by the HA agent, which is responsible for the restarts. Of course, this information is also being rolled up into vCenter, but that is done via the Management Network, not using the virtual machine network. This is crucial to know, as it means that when a virtual machine network error occurs, the virtual machine heartbeat will still be received. When an error occurs, HA will trigger a restart of the virtual machine when all three conditions are met:

1. No VMware Tools heartbeat received
2. No network I/O over the last 120 seconds
3. No storage I/O over the last 120 seconds

Just like with host-level HA, the HA agent works independently of vCenter when it comes to virtual machine restarts.
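The three conditions can be summarized in a small decision sketch (hypothetical helper; the real check is performed by the HA agent against Performance Manager counters over the das.iostatsInterval window):

```python
def should_reset_vm(tools_heartbeat, net_io_bytes, disk_io_bytes):
    """Sketch of the VM Monitoring reset decision: a reset is only
    issued when the Tools heartbeat is lost AND the VM produced no
    network or storage I/O during the interval (120 s by default).
    Illustrative only, not VMware code."""
    return (not tools_heartbeat) and net_io_bytes == 0 and disk_io_bytes == 0

print(should_reset_vm(False, 0, 0))     # True  -> reset the VM
print(should_reset_vm(False, 4096, 0))  # False -> I/O seen, no reset
print(should_reset_vm(True, 0, 0))      # False -> heartbeat healthy
```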
Timing

The VM/App Monitoring feature monitors the heartbeat(s) issued by a guest and resets the virtual machine if there is a heartbeat failure that satisfies the configured policy for the virtual machine. HA can monitor just the heartbeats issued by the VMware Tools process, or can monitor these heartbeats plus those issued by an optional in-guest agent.

If the VM Monitoring heartbeats stop at time T-0, the minimum time before HA will declare a heartbeat failure is in the range of 91 seconds to 119 seconds, whereas for heartbeats issued by an in-guest application agent, HA will declare a failure in the range of 61 seconds to 89 seconds. Once a heartbeat failure is declared for application heartbeats, HA will attempt to reset the virtual machine. However, for VMware Tools heartbeats, HA will first check whether any I/O has been issued by the virtual machine for the last 2 minutes (by default) and only if there has been no I/O will it issue a reset. Due to how HOSTD publishes the I/O statistics, this check could delay the reset by approximately 20 seconds for virtual machines that were issuing I/O within approximately 1 minute of T-0.

Timing details: the range depends on when the heartbeats stop relative to the HOSTD thread that monitors them. For the lower bound of the VMware Tools heartbeats, the heartbeats stop a second before the HOSTD thread runs, which means that at T+31 the FDM agent on the host will be notified of a tools yellow state, and then at T+61 of the red state, which HA reacts to. HA then monitors the heartbeat failure for a minimum of 30 seconds, leading to the minimum of T+91. The 30-second monitoring period done by HA can be increased using the das.failureInterval policy setting. For the upper bound, the FDM is not notified until T+89 (at T-0 the failure occurs, at T+29 HOSTD notices it and starts the heartbeat failure timer, at T+59 HOSTD reports a yellow state, and at T+89 it reports a red state).

For the heartbeats issued by an in-guest agent, no yellow state is sent, so there is no additional 30-second period.
Application Monitoring

Application Monitoring is a part of VM Monitoring. Application Monitoring is a feature that partners and/or customers can leverage to increase resiliency, as shown in the screenshot below, but from an application point of view rather than from a VM point of view. There is an SDK available to the general public and it is part of the Guest SDK.
Figure 54 - VM and Application Monitoring

The Guest SDK is currently primarily used by application developers from partners like Symantec to develop solutions that increase resilience on a different level than VM Monitoring and HA. In the case of Symantec, a simplified version of Veritas Cluster Server (VCS) is used to enable application availability monitoring, including responding to issues. Note that this is not a multi-node clustering solution like VCS itself, but a single-node solution.

Symantec ApplicationHA, as it is called, is triggered to get the application up and running again by restarting it. Symantec's ApplicationHA is aware of dependencies and knows in which order services should be started or stopped. If, however, this fails for a certain number of times (a configurable option within ApplicationHA), VMware HA will be requested to take action. This action will be a restart of the virtual machine.

Although Application Monitoring is relatively new and there are only a few partners currently exploring the capabilities, in our opinion, it does add a whole new level of resiliency. Your in-house development team could leverage functionality offered through the API, or you could use a solution developed by one of VMware's partners. We have tested ApplicationHA by Symantec and personally feel it is the missing link. It enables you as a system administrator to integrate your virtualization layer with your application layer. It ensures that services which are protected are restarted in the correct order and it avoids the common pitfalls associated with restarts and maintenance. Note that VMware also introduced an "Application Monitoring" solution which was based on Hyperic technology; this product, however, has been deprecated and as such will not be discussed in this publication.
Application Awareness API

The Application Awareness API is open for everyone. We feel that this is not the place to do a full deep dive on how to use it, but we do want to discuss it briefly.

The Application Awareness API allows anyone to talk to it, including scripts, which makes the possibilities endless. Currently there are 6 functions defined:

- VMGuestAppMonitor_Enable(): enables monitoring
- VMGuestAppMonitor_MarkActive(): call every 30 seconds to mark the application as active
- VMGuestAppMonitor_Disable(): disables monitoring
- VMGuestAppMonitor_IsEnabled(): returns the status of monitoring
- VMGuestAppMonitor_GetAppStatus(): returns the current application status recorded for the application
- VMGuestAppMonitor_Free(): frees the result of the VMGuestAppMonitor_GetAppStatus() call
These functions can be used by your development team; however, App Monitoring also offers a new executable. This allows you to use the functionality App Monitoring offers without the need to compile a full binary. This new command, vmware-appmonitor.exe, takes the following arguments, which are, not coincidentally, similar to the functions:

- enable
- disable
- markActive
- isEnabled
- getAppStatus

When running the command vmware-appmonitor.exe, which can be found under "VMware-GuestAppMonitorSDK\bin\win32\", the following output is presented:

Usage: vmware-appmonitor.exe {enable | disable | markActive | isEnabled | getAppStatus}

As shown, there are multiple ways of leveraging Application Monitoring to enhance resiliency on an application level.
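A guest-side watchdog script could wrap the executable along these lines. This is a sketch only: the path constant and wrapper function are our assumptions, and the commented-out call would only work inside a guest where the Guest SDK is installed:

```python
import subprocess

# Hypothetical install path; adjust to wherever the Guest SDK lives.
APPMONITOR = r"VMware-GuestAppMonitorSDK\bin\win32\vmware-appmonitor.exe"

def appmonitor_cmd(action):
    """Build a command line for one of the documented actions and
    reject anything the executable does not support."""
    valid = {"enable", "disable", "markActive", "isEnabled", "getAppStatus"}
    if action not in valid:
        raise ValueError(f"unsupported action: {action}")
    return [APPMONITOR, action]

# Inside the guest, a watchdog would run something like this every
# 30 seconds while the monitored application is healthy:
# subprocess.run(appmonitor_cmd("markActive"), check=True)
print(appmonitor_cmd("markActive")[1])  # markActive
```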
vSphere HA and...

Now that you know how HA works inside out, we want to explain the different integration points between HA, DRS and Storage DRS.

HA and Storage DRS

vSphere HA informs Storage DRS when a failure has occurred. This is to prevent the relocation of any HA-protected virtual machine, meaning a virtual machine that was powered on, but which failed, and has not been restarted yet due to there being insufficient capacity available. Further, Storage DRS is not allowed to Storage vMotion a virtual machine that is owned by a master other than the one vCenter Server is talking to. This is because in such a situation, HA would not be able to reprotect the virtual machine until the master to which vCenter Server is talking is able to lock the datastore again.

Storage vMotion and HA

If a virtual machine needs to be restarted by HA and the virtual machine is in the process of being Storage vMotioned and the virtual machine fails, the restart process is not started until vCenter informs the master that the Storage vMotion task has completed or has been rolled back. If the source host fails, however, HA will restart the virtual machine as part of the normal workflow. During a Storage vMotion, the HA agent on the host on which the Storage vMotion was initiated masks the failure state of the virtual machine. If, for whatever reason, vCenter is unavailable, the masking will time out after 15 minutes to ensure that the virtual machine will be restarted.

Also note that when a Storage vMotion completes, vCenter will report the virtual machine as unprotected until the master reports it protected again under the new path.

HA and DRS

HA integrates on multiple levels with DRS. It is a huge improvement and it is something that we wanted to stress as it has changed both the behavior and the reliability of HA.

HA and Resource Fragmentation
When a failover is initiated, HA will first check whether there are resources available on the destination hosts for the failover. If, for instance, a particular virtual machine has a very large reservation and the Admission Control Policy is based on a percentage, it could happen that resources are fragmented across multiple hosts. (For more details on this scenario, see Chapter 7.) HA will ask DRS to defragment the resources to accommodate this virtual machine's resource requirements. Although HA will request a defragmentation of resources, a guarantee cannot be given. As such, even with this additional integration, you should still be cautious when it comes to resource fragmentation.

Flattened Shares

When custom shares have been set on a virtual machine, an issue can arise when that VM needs to be restarted. When HA fails over a virtual machine, it will power on the virtual machine in the Root Resource Pool. However, the virtual machine's shares were those configured by a user for it, and not scaled for it being parented under the Root Resource Pool. This could cause the virtual machine to receive either too many or too few resources relative to its entitlement.

A scenario where and when this can occur would be the following:

VM1 has 1000 shares and Resource Pool A has 2000 shares. However, Resource Pool A has 2 virtual machines and both virtual machines will have 50% of those "2000" shares. The following diagram depicts this scenario:
Figure 55 - Flatten shares starting point

When the host fails, both VM2 and VM3 will end up on the same level as VM1, the Root Resource Pool. However, as a custom shares value of 10,000 was specified on both VM2 and VM3, they will completely blow away VM1 in times of contention. This is depicted in the following diagram:
Figure 56 - Flatten shares host failure

This situation would persist until the next invocation of DRS re-parents the virtual machines VM2 and VM3 to their original Resource Pool. To address this issue, HA calculates a flattened share value before the virtual machine is failed over. This flattening process ensures that the virtual machine will get the resources it would have received if it had failed over to the correct Resource Pool. This scenario is depicted in the following diagram. Note that both VM2 and VM3 are placed under the Root Resource Pool with a shares value of 1000.
Figure 57 - Flatten shares after host failure before DRS invocation

Of course, when DRS is invoked, both VM2 and VM3 will be re-parented under Resource Pool 1 and will again receive the number of shares they had been originally assigned.
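Based on the example above, the flattening appears to boil down to scaling the pool's shares by the VM's fraction of the shares inside that pool. The helper below is our interpretation of that example, not VMware's actual implementation:

```python
def flattened_shares(pool_shares, vm_shares, pool_vm_shares_total):
    """Approximate sketch of HA's share flattening: the pool's shares
    are scaled by the VM's fraction of all shares inside the pool, so
    the VM competes fairly while temporarily parented at the root.
    Hypothetical formula inferred from the book's example."""
    return pool_shares * vm_shares / pool_vm_shares_total

# Resource Pool A holds 2000 shares; VM2 and VM3 each own 10,000 of
# the 20,000 shares configured inside the pool (i.e. 50% each):
print(flattened_shares(2000, 10_000, 20_000))  # 1000.0
```

This reproduces the 1000-share value shown in Figure 57 for VM2 and VM3.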
DPM and HA

If DPM is enabled and resources are scarce during an HA failover, HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.

If HA strict Admission Control is enabled (the default), DPM will maintain the necessary level of powered-on capacity to meet the configured HA failover capacity. HA places a constraint to prevent DPM from powering down too many ESXi hosts if doing so would violate the Admission Control Policy.

When HA admission control is disabled, HA will prevent DPM from powering off all but one host in the cluster. A minimum of two hosts is kept powered on regardless of the resource consumption. The reason this behavior has changed is that it is impossible to restart virtual machines when the only host left in the cluster has just failed.
In a failure scenario, if HA cannot restart some virtual machines, it asks DRS/DPM to try to defragment resources or bring hosts out of standby to allow HA another opportunity to restart the virtual machines. Another change is that DRS/DPM will power on or keep on hosts needed to address cluster constraints, even if those hosts are lightly utilized. Once again, in order for this to be successful, DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations and allow the restart of virtual machines to occur.
Use Case: Stretched Cluster

In this part we will be discussing a specific infrastructure architecture and how HA, DRS and Storage DRS can be leveraged and should be deployed to increase availability. Be it availability of your workload or the resources provided to your workload, we will guide you through some of the design considerations and decision points along the way. Of course, a full understanding of your environment will be required in order to make appropriate decisions regarding specific implementation details. Nevertheless, we hope that this section will provide a proper understanding of how certain features play together and how these can be used to meet the requirements of your environment and build the desired architecture.
Scenario

The scenario we have chosen is a stretched cluster, also referred to as a VMware vSphere Metro Storage Cluster solution. We have chosen this specific scenario as it allows us to explain a multitude of design and architectural considerations. Although this scenario has been tested and validated in our lab, every environment is unique; our recommendations are based on our experience and your mileage may vary.
A VMware vSphere Metro Storage Cluster (vMSC) configuration is a VMware vSphere certified solution that combines synchronous replication with storage array based clustering. These solutions are typically deployed in environments where the distance between datacenters is limited, often metropolitan or campus environments.
The primary benefit of a stretched cluster model is that it enables fully active and workload-balanced datacenters to be used to their full potential. Many customers find this architecture attractive due to the capability of migrating virtual machines with vMotion and Storage vMotion between sites. This enables on-demand and non-intrusive cross-site mobility of workloads. The capability of a stretched cluster to provide this active balancing of resources should always be the primary design and implementation goal.
Stretched cluster solutions offer the following benefits:
- Workload mobility
- Cross-site automated load balancing
- Enhanced downtime avoidance
- Disaster avoidance
Technical requirements and constraints
Due to the technical constraints of an online migration of VMs, the following specific requirements, which are listed in the VMware Compatibility Guide, must be met prior to consideration of a stretched cluster implementation:
- Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
- The maximum supported network latency between sites for the VMware ESXi management networks is 10ms round-trip time (RTT).
- vMotion and Storage vMotion support a maximum of 150ms latency as of vSphere 6.0, but this is not intended for stretched clustering usage.
- The maximum supported latency for synchronous storage replication links is 10ms RTT. Refer to documentation from the storage vendor, because the maximum tolerated latency is lower in most cases. The most commonly supported maximum RTT is 5ms.
- The ESXi vSphere vMotion network requires a redundant network link with a minimum of 250Mbps.
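The latency and bandwidth limits above can be captured in a small design check. This is a hypothetical helper of our own (the parameter names are assumptions, not VMware terminology); only the numeric limits come from the list above.

```python
def meets_vmsc_limits(mgmt_rtt_ms, storage_rtt_ms, vmotion_mbps,
                      vendor_storage_rtt_limit_ms=10):
    """Check a proposed stretched-cluster design against the vMSC limits:
    10ms management RTT, 10ms (or lower vendor limit) storage RTT,
    and at least 250Mbps on the vMotion network."""
    return (mgmt_rtt_ms <= 10
            and storage_rtt_ms <= min(10, vendor_storage_rtt_limit_ms)
            and vmotion_mbps >= 250)

print(meets_vmsc_limits(5, 4, 500, vendor_storage_rtt_limit_ms=5))  # True
print(meets_vmsc_limits(5, 8, 500, vendor_storage_rtt_limit_ms=5))  # False
```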
The storage requirements are slightly more complex. A vSphere Metro Storage Cluster requires what is in effect a single storage subsystem that spans both sites. In this design, a given datastore must be accessible (that is, be able to be read and written to) simultaneously from both sites. Further, when problems occur, the ESXi hosts must be able to continue to access datastores from either array transparently and with no impact to ongoing storage operations.
This precludes traditional synchronous replication solutions, because they create a primary-secondary relationship between the active (primary) LUN where data is being accessed and the secondary LUN that is receiving replication. To access the secondary LUN, replication is stopped, or reversed, and the LUN is made visible to hosts. This "promoted" secondary LUN has a completely different LUN ID and is essentially a newly available copy of a former primary LUN. This type of solution works for traditional disaster recovery-type configurations, because it is expected that VMs must be started up on the secondary site. The vMSC configuration requires simultaneous, uninterrupted access to enable live migration of running VMs between sites.
The storage subsystem for a vMSC must be able to be read from and written to in both locations simultaneously. All disk writes are committed synchronously at both locations to ensure that data is always consistent regardless of the location from which it is being read. This storage architecture requires significant bandwidth and very low latency between the sites in the cluster. Increased distances or latencies cause delays in writing to disk and a dramatic decline in performance. They also preclude successful vMotion migration between cluster nodes that reside in different locations.
Uniform versus Non-Uniform
vMSC solutions are classified into two distinct categories, based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions because this influences design considerations. The following two main categories are as described on the VMware Hardware Compatibility List:
- Uniform host access configuration – ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across a distance.
- Nonuniform host access configuration – ESXi hosts at each site are connected only to storage node(s) at the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.
The following in-depth descriptions of both categories clearly define them from architectural and implementation perspectives.
With uniform host access configuration, hosts in datacenter A and datacenter B have access to the storage systems in both datacenters. In effect, the storage area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster software is an example of uniform storage. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array in datacenter A, all ESXi hosts access that datastore via the array in datacenter A. For ESXi hosts in datacenter A, this is local access. ESXi hosts in datacenter B that are running VMs hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage, or an operator-controlled shift of control of the LUN to datacenter B, all ESXi hosts continue to detect the identical LUN being presented, but it is now being accessed via the array in datacenter B.
The ideal situation is one in which VMs access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters and avoids the performance impact of reads traversing the interconnect.
The notion of "site affinity" for a VM is dictated by the read/write copy of the datastore. "Site affinity" is also sometimes referred to as "site bias" or "LUN locality." This means that when a VM has site affinity with datacenter A, its read/write copy of the datastore is located in datacenter A. This is explained in more detail in the "vSphere DRS" subsection of this section.
Figure 58 - Uniform Configuration
With nonuniform host access configuration, hosts in datacenter A have access only to the array within the local datacenter; the array, as well as its peer array in the opposite datacenter, is responsible for providing access to datastores in one datacenter to ESXi hosts in the opposite datacenter. EMC VPLEX is an example of a storage system that can be deployed as a nonuniform storage cluster, although it can also be configured in a uniform manner. VPLEX provides the concept of a "virtual LUN," which enables ESXi hosts in each datacenter to read and write to the same datastore or LUN. VPLEX technology maintains the cache state on each array, so ESXi hosts in either datacenter detect the LUN as local. EMC calls this solution "write anywhere." Even when two VMs reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either VM. A key point with this configuration is that each LUN or datastore has "site affinity," also sometimes referred to as "site bias" or "LUN locality." In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore will be the only one remaining with read/write access to it. This prevents any data corruption in case of a failure scenario.
Figure 59 - Nonuniform Configuration
Our examples use uniform storage, because these configurations are currently the most commonly deployed. Many of the design considerations, however, also apply to nonuniform configurations. We point out exceptions when this is not the case.
Scenario Architecture

In this section we will describe the architecture deployed for this scenario. We will also discuss some of the basic configuration and behavior of the various vSphere features. For an in-depth explanation of each respective feature, refer to the HA and the DRS sections of this book. We will make specific recommendations based on VMware best practices and provide operational guidance where applicable. In our failure scenarios, it will be explained how these practices prevent or limit downtime.
Infrastructure
The described infrastructure consists of a single vSphere 6.0 cluster with four ESXi 6.0 hosts. These hosts are managed by a single vCenter Server 6.0 instance. The first site is called Frimley; the second site is called Bluefin. The network between Frimley datacenter and Bluefin datacenter is a stretched layer 2 network. There is a minimal distance between the sites, as is typical in campus cluster scenarios.
Each site has two ESXi hosts, and the vCenter Server instance is configured with vSphere DRS affinity to the hosts in Bluefin datacenter. In a stretched cluster environment, only a single vCenter Server instance is used. This is different from a traditional VMware Site Recovery Manager™ configuration, in which a dual vCenter Server configuration is required. The configuration of VM-to-host affinity rules is discussed in more detail in the "vSphere DRS" subsection of this document.
Eight LUNs are depicted in the diagram below. Four of these are accessed through the virtual IP address active on the iSCSI storage system in the Frimley datacenter; four are accessed through the virtual IP address active on the iSCSI storage system in the Bluefin datacenter.
Figure 60 - Test Environment
| Location | Hosts | Datastores | Local Isolation Address |
|----------|-------|------------|-------------------------|
| Bluefin | 172.16.103.184 | Bluefin01 | 172.16.103.10 |
| | 172.16.103.185 | Bluefin02 | n/a |
| | | Bluefin03 | n/a |
| | | Bluefin04 | n/a |
| Frimley | 172.16.103.182 | Frimley01 | 172.16.103.11 |
| | 172.16.103.183 | Frimley02 | n/a |
| | | Frimley03 | n/a |
| | | Frimley04 | n/a |
The vSphere cluster is connected to a stretched storage system in a fabric configuration with a uniform device access model. This means that every host in the cluster is connected to both storage heads. Each of the heads is connected to two switches, which are connected to two similar switches in the secondary location. For any given LUN, one of the two storage heads presents the LUN as read/write via iSCSI. The other storage head maintains the replicated, read-only copy that is effectively hidden from the ESXi hosts.
vSphere Configuration

Our focus in this section is on vSphere HA, vSphere DRS, and vSphere Storage DRS in relation to stretched cluster environments. Design and operational considerations regarding vSphere are commonly overlooked and underestimated. Much emphasis has traditionally been placed on the storage layer, but little attention has been applied to how workloads are provisioned and managed.
One of the key drivers for using a stretched cluster is workload balance and disaster avoidance. How do we ensure that our environment is properly balanced without impacting availability or severely increasing the operational expenditure? How do we build the requirements into our provisioning process and validate periodically that we still meet them? Ignoring the requirements makes the environment confusing to administer and less predictable during the various failure scenarios for which it should be of help.
Each of these three vSphere features has very specific configuration requirements and can enhance environment resiliency and workload availability. Architectural recommendations based on our findings during the testing of the various failure scenarios are given throughout this section.
vSphere HA
The environment has four hosts and a uniform stretched storage solution. A full site failure is one scenario that must be taken into account in a resilient architecture. VMware recommends enabling vSphere HA admission control. Workload availability is the primary driver for most stretched cluster environments, so providing sufficient capacity for a full site failure is recommended. The hosts are equally divided across both sites. To ensure that all workloads can be restarted by vSphere HA on just one site, configuring the admission control policy to 50 percent for both memory and CPU is recommended.
VMware recommends using a percentage-based policy because it offers the most flexibility and reduces operational overhead. Even when new hosts are introduced to the environment, there is no need to change the percentage, and there is no risk of a skewed consolidation ratio due to possible use of VM-level reservations.
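The 50 percent figure follows directly from the site topology. The small calculation below is our own illustration (not a VMware tool): reserve the share of cluster capacity contributed by the largest site, so the surviving site can restart everything after a full site failure.

```python
def admission_control_percentage(hosts_site_a, hosts_site_b):
    """Percentage of cluster CPU/memory to reserve so that one whole
    site can fail and the other site can absorb all restarts."""
    total = hosts_site_a + hosts_site_b
    return round(100 * max(hosts_site_a, hosts_site_b) / total)

print(admission_control_percentage(2, 2))  # 50, as in the scenario above
```

With two hosts per site, half the cluster's resources must stay free; with asymmetric sites, the larger site's share dictates the reservation.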
The screenshot below shows a vSphere HA cluster configured with admission control enabled and with the percentage-based policy set to 50 percent.
Figure 61 - vSphere HA Configuration
vSphere HA uses heartbeat mechanisms to validate the state of a host. There are two such mechanisms: network heartbeating and datastore heartbeating. Network heartbeating is the primary mechanism for vSphere HA to validate availability of the hosts. Datastore heartbeating is the secondary mechanism used by vSphere HA; it determines the exact state of the host after network heartbeating has failed.
If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect whether it is merely isolated from its master node or completely isolated from the network. It does this by pinging the default gateway. In addition to this mechanism, one or more isolation addresses can be specified manually to enhance the reliability of isolation validation. VMware recommends specifying a minimum of two additional isolation addresses, with each address site local.
In our scenario, one of these addresses physically resides in the Frimley datacenter; the other physically resides in the Bluefin datacenter. This enables vSphere HA validation for complete network isolation, even in case of a connection failure between sites. The next screenshot shows an example of how to configure multiple isolation addresses. The vSphere HA advanced setting used is das.isolationaddress. More details on how to configure this can be found in VMware Knowledge Base article 1002117.
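The isolation check can be sketched as follows. This is our own simplification of the logic, not VMware code: the ICMP probe is stubbed with a reachability set, the gateway address is an assumption, and the two isolation addresses are taken from the test environment table.

```python
def is_isolated(default_gateway, isolation_addresses, reachable):
    """A host declares itself isolated only when the default gateway AND
    every configured das.isolationaddress fail to respond."""
    probes = [default_gateway] + list(isolation_addresses)
    return all(addr not in reachable for addr in probes)

# One isolation address per site, per the test environment table.
probes = ["172.16.103.10", "172.16.103.11"]
# Assumed gateway address for illustration only:
gw = "172.16.103.1"
print(is_isolated(gw, probes, reachable={"172.16.103.11"}))  # False
print(is_isolated(gw, probes, reachable=set()))              # True
```

Because one probe address lives at each site, losing the inter-site link alone does not make a host wrongly declare itself isolated: the site-local address still answers.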
The minimum number of heartbeat datastores is two and the maximum is five. For vSphere HA datastore heartbeating to function correctly in any type of failure scenario, VMware recommends increasing the number of heartbeat datastores from two to four in a stretched cluster environment. This provides full redundancy for both datacenter locations. Defining four specific datastores as preferred heartbeat datastores is also recommended, selecting two from one site and two from the other. This enables vSphere HA to heartbeat to a datastore even in the case of a connection failure between sites. Subsequently, it enables vSphere HA to determine the state of a host in any scenario.
Adding an advanced setting called das.heartbeatDsPerHost can increase the number of heartbeat datastores. This is shown in the screenshot below.
Figure 62 - vSphere HA Advanced Settings
To designate specific datastores as heartbeat devices, VMware recommends using "Select any of the cluster datastores taking into account my preferences". This enables vSphere HA to select any other datastore if the four designated datastores that have been manually selected become unavailable. VMware recommends selecting two datastores in each location to ensure that datastores are available at each site in the case of a site partition.
Figure 63 - Datastore Heartbeating
Permanent Device Loss and All Paths Down Scenarios
As of vSphere 6.0, enhancements have been introduced to enable an automated failover of VMs residing on a datastore that has either an all paths down (APD) or a permanent device loss (PDL) condition. PDL is applicable only to block storage devices.
A PDL condition, as is discussed in one of our failure scenarios, is a condition that is communicated by the array controller to the ESXi host via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition is communicated by the array is when a LUN is set offline. This condition is used in nonuniform models during a failure scenario to ensure that the ESXi host takes appropriate action when access to a LUN is revoked. When a full storage failure occurs, it is impossible to generate the PDL condition, because there is no communication possible between the array and the ESXi host. This state is identified by the ESXi host as an APD condition. Another example of an APD condition is when the storage network has failed completely. In this scenario, the ESXi host also does not detect what has happened with the storage and declares an APD.
To enable vSphere HA to respond to both an APD and a PDL condition, vSphere HA must be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). After the creation of the cluster, VMCP must be enabled, as is shown below.
Figure 64 - VM Component Protection
The configuration screen can be found as follows:
1. Log in to VMware vSphere Web Client.
2. Click Hosts and Clusters.
3. Click the cluster object.
4. Click the Manage tab.
5. Click vSphere HA and then Edit.
6. Select Protect against Storage Connectivity Loss.
7. Select individual functionality, as described in the following, by opening Failure conditions and VM response.
The configuration for PDL is basic. In the Failure conditions and VM response section, the response following detection of a PDL condition can be configured. VMware recommends setting this to Power off and restart VMs. When this condition is detected, a VM is restarted instantly on a healthy host within the vSphere HA cluster.
For an APD scenario, configuration must occur in the same section, as is shown in the screenshot below. Besides defining the response to an APD condition, it is also possible to alter the timing and to configure the behavior when the failure is restored before the APD timeout has passed.
Figure 65 - VMCP Detailed Configuration
When an APD condition is detected, a timer is started. After 140 seconds, the APD condition is officially declared and the device is marked as APD timeout. When 140 seconds have passed, vSphere HA starts counting. The default vSphere HA timeout is 3 minutes. When the 3 minutes have passed, vSphere HA restarts the impacted VMs, but VMCP can be configured to respond differently if preferred. VMware recommends configuring it to Power off and restart VMs (conservative).
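The timeline above can be summed up in a few lines. This sketch only arranges the two default values from the text (the 140-second APD timeout and the 3-minute vSphere HA delay); it is an illustration, not configuration code.

```python
APD_TIMEOUT_S = 140      # device marked "APD timeout" after this many seconds
HA_APD_DELAY_S = 3 * 60  # default vSphere HA delay after APD is declared

def apd_restart_time(apd_start_s=0):
    """Earliest moment (in seconds) at which vSphere HA restarts the
    impacted VMs, measured from when the APD condition began."""
    return apd_start_s + APD_TIMEOUT_S + HA_APD_DELAY_S

print(apd_restart_time())  # 320 seconds after the APD began
```

In other words, with default settings an impacted VM is restarted no earlier than 320 seconds after the storage became unreachable.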
Conservative refers to the likelihood that vSphere HA will be able to restart VMs. When set to conservative, vSphere HA restarts only the VM that is impacted by the APD if it detects that a host in the cluster can access the datastore on which the VM resides. In the case of aggressive, vSphere HA attempts to restart the VM even if it doesn't detect the state of the other hosts. This can lead to a situation in which a VM is not restarted, because there is no host that has access to the datastore on which the VM is located.
If the APD is lifted and access to the storage is restored before the timeout has passed, vSphere HA does not unnecessarily restart the VM, unless explicitly configured to do so. If a response is desired even when the environment has recovered from the APD condition, Response for APD recovery after APD timeout can be configured to Reset VMs. VMware recommends leaving this setting disabled.
With the release of vSphere 5.5, an advanced setting called Disk.AutoremoveOnPDL was introduced. It is enabled by default. This functionality enables vSphere to remove devices that are marked as PDL and helps prevent reaching, for example, the 256-device limit for an ESXi host. However, if the PDL scenario is resolved and the device returns, the ESXi host's storage system must be rescanned before this device appears. VMware recommends disabling Disk.AutoremoveOnPDL in the host advanced settings by setting it to 0.
Figure 66 - Disk.AutoremoveOnPDL
vSphere DRS
vSphere DRS is used in many environments to distribute load within a cluster. It offers many other features that can be very helpful in stretched cluster environments. VMware recommends enabling vSphere DRS to facilitate load balancing across hosts in the cluster. The vSphere DRS load-balancing calculation is based on CPU and memory use. Care should be taken with regard to both storage and networking resources as well as to traffic flow. To avoid storage and network traffic overhead in a stretched cluster environment, VMware recommends implementing vSphere DRS affinity rules to enable a logical separation of VMs. This subsequently helps improve availability. For VMs that are responsible for infrastructure services, such as Microsoft Active Directory and DNS, it assists by ensuring separation of these services across sites.
vSphere DRS affinity rules also help prevent unnecessary downtime, and storage and network traffic flow overhead, by enforcing preferred site affinity. VMware recommends aligning vSphere VM-to-host affinity rules with the storage configuration; that is, setting VM-to-host affinity rules with a preference that a VM run on a host at the same site as the array that is configured as the primary read/write node for a given datastore. For example, in our test configuration, VMs stored on the Frimley01 datastore are set with VM-to-host affinity with a preference for hosts in the Frimley datacenter. This ensures that in the case of a network connection failure between sites, VMs do not lose connection with the storage system that is primary for their datastore. VM-to-host affinity rules aim to ensure that VMs stay local to the storage primary for that datastore. This coincidentally also results in all read I/O staying local.
NOTE: Different storage vendors use different terminology to describe the relationship of a LUN to a particular array or controller. For the purposes of this document, we use the generic term "storage site affinity," which refers to the preferred location for access to a given LUN.
VMware recommends implementing "should rules", because these can be violated by vSphere HA in the case of a full site failure. Availability of services should always prevail. In the case of "must rules," vSphere HA does not violate the rule set, and this can potentially lead to service outages. In the scenario where a full datacenter fails, "must rules" do not allow vSphere HA to restart the VMs, because the VMs do not have the required affinity to start on the hosts in the other datacenter. This necessitates the recommendation to implement "should rules." vSphere DRS communicates these rules to vSphere HA, and they are stored in a "compatibility list" governing allowed start-up. If a single host fails, VM-to-host "should rules" are ignored by default. VMware recommends configuring vSphere HA rule settings to respect VM-to-host affinity rules where possible. With a full site failure, vSphere HA can restart the VMs on hosts that violate the rules. Availability takes preference in this scenario.
Figure 67 - HA Affinity Rule Settings
Under certain circumstances, such as massive host saturation coupled with aggressive recommendation settings, vSphere DRS can also violate "should rules." Although this is very rare, we recommend monitoring for violation of these rules, because a violation might impact availability and workload performance.
VMware recommends manually defining "sites" by creating a group of hosts that belong to a site and then adding VMs to these sites based on the affinity of the datastore on which they are provisioned. In our scenario, only a limited number of VMs were provisioned. VMware recommends automating the process of defining site affinity by using tools such as VMware vCenter Orchestrator™ or VMware vSphere PowerCLI™. If automating the process is not an option, use of a generic naming convention is recommended to simplify the creation of these groups. VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to the group with the correct site affinity.
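The naming-convention approach can be illustrated as follows. This is our own sketch (the VM names are made up for illustration): a VM's site group is derived from the site prefix of the datastore it lives on, which is exactly what an automation tool would do at scale.

```python
SITES = ("Frimley", "Bluefin")

def site_for_datastore(datastore_name):
    """Derive the site group from the datastore's site-prefixed name."""
    for site in SITES:
        if datastore_name.startswith(site):
            return site
    raise ValueError(f"no site prefix in {datastore_name!r}")

# vm -> datastore placements (hypothetical VM names, real datastore names
# from the test environment table)
placements = {"vm01": "Frimley01", "vm02": "Bluefin03"}
groups = {vm: site_for_datastore(ds) for vm, ds in placements.items()}
print(groups)  # {'vm01': 'Frimley', 'vm02': 'Bluefin'}
```

Running such a mapping periodically and comparing it against the actual DRS VM group membership is one way to perform the regular validation recommended above.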
The following screenshots depict the configuration used for our scenario. In the first screenshot, all VMs that should remain local to the Bluefin datacenter are added to the Bluefin VM group.
Figure 68 - VM Group
Next, a Bluefin host group is created that contains all hosts residing in this location.
Figure 69 - Host Group
Next, a new rule is created that is defined as a "should run on" rule. It links the host group and the VM group for the Bluefin location.
Figure 70 - Rule Definition
This should be done for both locations, which should result in two rules.
Figure 71 - VM/Host Rules
Correcting Affinity Rule Violation
vSphere DRS assigns a high priority to correcting affinity rule violations. During invocation, the primary goal of vSphere DRS is to correct any violations and generate recommendations to migrate VMs to the hosts listed in the host group. These migrations have a higher priority than load-balancing moves and are started before them.
vSphere DRS is invoked every 5 minutes by default, but it is also triggered if the cluster detects changes. For instance, when a host reconnects to the cluster, vSphere DRS is invoked and generates recommendations to correct the violation. Our testing has shown that vSphere DRS generates recommendations to correct affinity rule violations within 30 seconds after a host reconnects to the cluster. vSphere DRS is limited by the overall capacity of the vSphere vMotion network, so it might take multiple invocations before all affinity rule violations are corrected.
vSphere Storage DRS
vSphere Storage DRS enables aggregation of datastores into a single unit of consumption from an administrative perspective, and it balances VM disks when defined thresholds are exceeded. It ensures that sufficient disk resources are available to a workload. VMware recommends enabling vSphere Storage DRS with I/O Metric disabled. The use of I/O Metric or VMware vSphere Storage I/O Control is not supported in a vMSC configuration, as is described in VMware Knowledge Base article 2042596.
Figure 72 - Storage DRS Configuration
vSphere Storage DRS uses vSphere Storage vMotion to migrate VM disks between datastores within a datastore cluster. Because the underlying stretched storage systems use synchronous replication, a migration or series of migrations has an impact on replication traffic and might cause the VMs to become temporarily unavailable due to contention for network resources during the movement of disks. Migration to random datastores can also potentially lead to additional I/O latency in uniform host access configurations if VMs are not migrated along with their virtual disks. For example, if a VM residing on a host at site A has its disk migrated to a datastore at site B, it continues operating but with potentially degraded performance. The VM's disk reads are now subject to the increased latency associated with reading from the virtual iSCSI IP at site B. Reads are subject to intersite latency rather than being satisfied by a local target.
To control if and when migrations occur, VMware recommends configuring vSphere Storage DRS in manual mode. This enables human validation of each recommendation and allows recommendations to be applied during off-peak hours, while still gaining the operational benefit and efficiency of the initial placement functionality.
VMware recommends creating datastore clusters based on the storage configuration with respect to storage site affinity. Datastores with a site affinity for site A should not be mixed in datastore clusters with datastores with a site affinity for site B. This enables operational consistency and eases the creation and ongoing management of vSphere DRS VM-to-host affinity rules. Ensure that all vSphere DRS VM-to-host affinity rules are updated accordingly when VMs are migrated via vSphere Storage vMotion between datastore clusters and when crossing defined storage site affinity boundaries. To simplify the provisioning process, VMware recommends aligning naming conventions for datastore clusters and VM-to-host affinity rules.
Figure 73 - Datastore Clusters
The naming convention used in our testing gives both datastores and datastore clusters a site-specific name to facilitate the alignment of vSphere DRS host affinity with VM deployment in the correct site.
Failure Scenarios

There are many failures that can be introduced in clustered systems. But in a properly architected environment, vSphere HA, vSphere DRS, and the storage subsystem do not detect many of these. We do not address the zero-impact failures, such as the failure of a single network cable, because they are explained in depth in the documentation provided by the storage vendors of the various solutions. We discuss the following "common" failure scenarios:
- Single-host failure in Frimley datacenter
- Single-host isolation in Frimley datacenter
- Storage partition
- Datacenter partition
- Disk shelf failure in Frimley datacenter
- Full storage failure in Frimley datacenter
- Full compute failure in Frimley datacenter
- Full compute failure in Frimley datacenter and full storage failure in Bluefin datacenter
- Loss of complete Frimley datacenter
We also examine scenarios in which specific settings are incorrectly configured. These settings determine the availability and recoverability of VMs in a failure scenario. It is important to understand the impact of misconfigurations such as the following:
- Incorrectly configured VM-to-host affinity rules
- Incorrectly configured heartbeat datastores
- Incorrectly configured isolation address
- Incorrectly configured PDL handling
- vCenter Server split-brain scenario
Single-Host Failure in Frimley Data Center
In this scenario, we describe the complete failure of a host in Frimley datacenter. This scenario is depicted below.
Figure 74 - Single-Host Failure Scenario
Result: vSphere HA successfully restarted all VMs in accordance with VM-to-host affinity rules.
Explanation: If a host fails, the cluster's vSphere HA master node detects the failure because it no longer receives network heartbeats from the host. The master then starts monitoring for datastore heartbeats. Because the host has failed completely, it cannot generate datastore heartbeats; these too are detected as missing by the vSphere HA master node. During this time, a third availability check, pinging the management addresses of the failed host, is conducted. If all of these checks return as unsuccessful, the master declares the missing host dead and attempts to restart all the protected VMs that had been running on the host before the master lost contact with the host.
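The master's three checks can be modeled as a small decision function. This is a simplified sketch of the logic just described, not VMware code, and the state names are our own labels: a host is declared dead only when all three checks fail.

```python
def host_state(net_heartbeat, ds_heartbeat, ping_ok):
    """Simplified model of the master's liveness determination."""
    if net_heartbeat:
        return "live"
    if ds_heartbeat:
        # Heartbeating via a datastore but not the network: the host is
        # up but isolated or partitioned, not dead.
        return "isolated-or-partitioned"
    if ping_ok:
        # Answers ping but sends no heartbeats at all.
        return "unreachable"
    return "dead"  # all checks failed: restart the protected VMs

print(host_state(net_heartbeat=False, ds_heartbeat=False, ping_ok=False))
# -> dead
```

The datastore heartbeat is what lets the master distinguish the scenario in this section (a truly failed host) from the isolation scenario in the next one.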
The vSphere VM-to-host affinity rules defined at the cluster level are "should rules." Because the vSphere HA rule settings specify that VM-to-host affinity rules should be respected, all VMs are restarted within the correct site.
However, if the host elements of the VM-to-host group are temporarily without resources, or if they are unavailable for restarts for any other reason, vSphere HA can disregard the rules and restart the remaining VMs on any of the remaining hosts in the cluster, regardless of location and rules. If this occurs, vSphere DRS attempts to correct any violated affinity rules at the first invocation and automatically migrates VMs in accordance with their affinity rules to bring VM placement into alignment. VMware recommends manually invoking vSphere DRS after the cause of the failure has been identified and resolved. This ensures that all VMs are placed on hosts in the correct location to avoid possible performance degradation due to misplacement.
Single-Host Isolation in Frimley Data Center
In this scenario, we describe the response to isolation of a single host in Frimley datacenter from the rest of the network.
Figure 75 - Single-Host Isolation Scenario
Result: VMs remain running because the isolation response is configured to Leave powered on.
Explanation: When a host is isolated, the vSphere HA master node detects the isolation because it no longer receives network heartbeats from the host. The master then starts monitoring for datastore heartbeats. Because the host is isolated, it generates datastore heartbeats for the secondary vSphere HA detection mechanism. Detection of valid host heartbeats enables the vSphere HA master node to determine that the host is running but is isolated from the network. Depending on the isolation response configured, the impacted host can power off or shut down VMs, or can leave them powered on. The isolation response is triggered 30 seconds after the host has detected that it is isolated.
VMware recommends aligning the isolation response to business requirements and physical constraints. From a best practices perspective, Leave powered on is the recommended isolation response setting for the majority of environments. Isolated hosts are rare in a properly architected environment, given the built-in redundancy of most modern designs. In environments that use network-based storage protocols, such as iSCSI and NFS, and where networks are converged, the recommended isolation response is Power off. In these environments, it is more likely that a network outage that causes a host to become isolated also affects the host's ability to communicate with the datastores.
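The decision rule above can be stated compactly. This is an illustration of the recommendation as described, not a policy engine; the two boolean inputs are our own parameterization.

```python
def recommended_isolation_response(network_based_storage, converged_network):
    """Pick an isolation response from the storage/network design, per the
    best-practice guidance above."""
    if network_based_storage and converged_network:
        # Host isolation likely also severs datastore access, so power the
        # VMs off so they can be restarted on hosts that still have storage.
        return "Power off"
    return "Leave powered on"

print(recommended_isolation_response(True, True))    # Power off
print(recommended_isolation_response(False, False))  # Leave powered on
```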
If an isolation response different from the recommended Leave powered on is selected, and a Power off or Shut down response is triggered, the vSphere HA master restarts the VMs on the remaining nodes in the cluster. The vSphere VM-to-host affinity rules defined at the cluster level are "should rules." However, because the vSphere HA rule settings specify that the vSphere HA VM-to-host affinity rules should be respected, all VMs are restarted within the correct site under "normal" circumstances.
Storage Partition
In this scenario, a failure has occurred on the storage network between datacenters, as is depicted below.
Figure 76 - Storage Partition Scenario
Result: VMs remain running with no impact.
Explanation: Storage site affinity is defined for each LUN, and the vSphere DRS rules align with this affinity. Therefore, because storage remains available within each site, no VM is impacted.
If for any reason the affinity rule for a VM has been violated, and the VM is running on a host in Frimley datacenter while its disk resides on a datastore that has affinity with Bluefin datacenter, it cannot successfully issue I/O following an intersite storage partition. This is because the datastore is in an APD condition. In this scenario, the VM can be restarted because vSphere HA is configured to respond to APD conditions. The response occurs after the 3-minute grace period has passed. This 3-minute period starts after the APD timeout of 140 seconds has passed and the APD condition has been declared.
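The APD timing described above can be sketched as a small calculation. The 140-second APD timeout and the 3-minute grace period come from the text; the helper function itself is purely illustrative and not part of any VMware API.

```python
# Illustrative timeline for the APD response described above.
# Constants come from the text: a 140-second APD timeout followed
# by a 3-minute grace period before vSphere HA may restart the VM.

APD_TIMEOUT_S = 140          # time until the APD condition is declared
APD_GRACE_PERIOD_S = 3 * 60  # grace period after APD is declared

def seconds_until_ha_restart(io_failure_at_s: float) -> float:
    """Earliest time (seconds on a shared clock) at which vSphere HA
    may restart a VM whose datastore lost access at io_failure_at_s."""
    apd_declared_at = io_failure_at_s + APD_TIMEOUT_S
    return apd_declared_at + APD_GRACE_PERIOD_S

# A VM losing datastore access at t=0 becomes eligible for restart
# roughly 320 seconds (140 + 180) later.
print(seconds_until_ha_restart(0))  # 320
```

In other words, an affinity-rule violation in this scenario costs roughly five and a half minutes of downtime before the restart is even attempted.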
To avoid unnecessary downtime in an APD scenario, VMware recommends monitoring compliance of vSphere DRS rules. Although vSphere DRS is invoked every 5 minutes, this does not guarantee resolution of all affinity rule violations. Therefore, to prevent unnecessary downtime, rigid monitoring is recommended that enables quick identification of anomalies such as a VM's compute residing in one site while its storage resides in the other site.
Data Center Partition

In this scenario, the Frimley data center is isolated from the Bluefin data center, as is depicted below.

Figure 77 - Data Center Partition Scenario

Result: VMs remain running with no impact.
Explanation: In this scenario, the two data centers are fully isolated from each other. This scenario is similar to both the storage partition and the host isolation scenario. VMs are not impacted by this failure because vSphere DRS rules were correctly implemented and no rules were violated.

vSphere HA follows this logical process to determine which VMs require restarting during a cluster partition:

The vSphere HA master node running in Frimley data center detects that all hosts in Bluefin data center are unreachable. It first detects that no network heartbeats are being received. It then determines whether any storage heartbeats are being generated. This check does not detect storage heartbeats because the storage connection between sites has also failed, and the heartbeat datastores are updated only "locally." Because the VMs with affinity to the remaining hosts are still running, no action is needed for them. Next, vSphere HA determines whether a restart can be attempted. However, the read/write version of the datastores located in Bluefin data center is not accessible by the hosts in Frimley data center. Therefore, no attempt is made to start the missing VMs.

Similarly, the ESXi hosts in Bluefin data center detect that there is no master available, and they initiate a master election process. After the master has been elected, it tries to determine which VMs had been running before the failure and attempts to restart them. Because all VMs with affinity to Bluefin data center are still running there, there is no need for a restart. Only the VMs with affinity to Frimley data center are unavailable, and vSphere HA cannot restart them because the datastores on which they are stored have affinity with Frimley data center and are unavailable in Bluefin data center.

If VM-to-host affinity rules have been violated, that is, VMs have been running at a location where their storage is not defined as read/write by default, the behavior changes. The following sequence describes what would happen in that case:

1. The VM with affinity to Frimley data center but residing in Bluefin data center is unable to reach its datastore. This results in the VM being unable to write to or read from disk.
2. In Frimley data center, this VM is restarted by vSphere HA because the hosts in Frimley data center do not detect the instance running in Bluefin data center.
3. Because the datastore is available only to Frimley data center, one of the hosts in Frimley data center acquires a lock on the VMDK and is able to power on this VM.
4. This can result in a scenario in which the same VM is powered on and running in both data centers.
Figure 78 - Ghost VM

If the APD response is configured to Power off and restart VMs (aggressive), as is recommended in the VM Component Protection section of this publication, the VM is powered off after the APD timeout and the grace period have passed. This behavior is new in vSphere 6.0.

If the APD response is not correctly configured, two VMs will be running, for the following possible reasons:

- The network heartbeat from the host that is running this VM is missing because there is no connection to that site.
- The datastore heartbeat is missing because there is no connection to that site.
- A ping to the management address of the host that is running the VM fails because there is no connection to that site.
- The master located in Frimley data center detects that the VM had been powered on before the failure. Because it is unable to communicate with the VM's host in Bluefin data center after the failure, it attempts to restart the VM because it cannot detect the actual state.
If the connection between sites is restored, a classic "VM split-brain scenario" will exist. For a short period of time, two copies of the VM will be active on the network, with both having the same MAC address. Only one copy, however, will have access to the VM files, and vSphere HA will detect this. As soon as this is detected, all processes belonging to the VM copy that has no access to the VM files will be killed, as is depicted below.

Figure 79 - Tasks and Events

In this example, the downtime equates to a VM's having to be restarted. Proper maintenance of site affinity can prevent this. To avoid unnecessary downtime, VMware recommends close monitoring to ensure that vSphere DRS rules align with datastore site affinity.

Disk Shelf Failure in Frimley Data Center

In this scenario, one of the disk shelves in Frimley data center has failed. Both Frimley01 and Frimley02 on Storage A are impacted.
Figure 80 - Disk Shelf Failure Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, only a disk shelf in Frimley data center has failed. The storage processor has detected the failure and has instantly switched from the primary disk shelf in Frimley data center to the mirror copy in Bluefin data center. There is no noticeable impact to any of the VMs except for a typical short spike in I/O response time. The storage solution fully detects and handles this scenario. There is no need for a rescan of the datastores or the HBAs because the switchover is seamless and the LUNs are identical from the ESXi perspective.

Full Storage Failure in Frimley Data Center
In this scenario, a full storage system failure has occurred in Frimley data center.

Figure 81 - Full Storage Failure Scenario

Result: VMs remain running with no impact.

Explanation: When the full storage system fails in Frimley data center, a takeover command must be initiated manually. As described previously, we used a NetApp MetroCluster configuration to describe this behavior. This takeover command is particular to NetApp environments; depending on the implemented storage system, the required procedure can differ. After the command has been initiated, the mirrored, read-only copy of each of the failed datastores is set to read/write and is instantly accessible. We have described this process at an extremely high level. For more details, refer to the storage vendor's documentation.
From the VM perspective, this failover is seamless: the storage controllers handle this, and no action is required from either the vSphere or storage administrator. All I/O now passes across the intersite connection to the other data center, because VMs remain running in Frimley data center while their datastores are accessible only in Bluefin data center.

vSphere HA does not detect this type of failure. Although the datastore heartbeat might be lost briefly, vSphere HA does not take action, because the vSphere HA master agent checks for the datastore heartbeat only when the network heartbeat has not been received for 3 seconds. Because the network heartbeat remains available throughout the storage failure, vSphere HA is not required to initiate any restarts.
Permanent Device Loss

In the scenario shown in the diagram below, a permanent device loss (PDL) condition occurs because datastore Frimley01 has been taken offline for ESXi-01 and ESXi-02. PDL scenarios are uncommon in uniform configurations and are more likely to occur in a nonuniform vMSC configuration. However, a PDL scenario can, for instance, occur when the configuration of a storage group changes, as in the case of this described scenario.
Figure 82 - Permanent Device Loss

Result: VMs are restarted by vSphere HA on ESXi-03 and ESXi-04.

Explanation: When the PDL condition occurs, VMs running on datastore Frimley01 on hosts ESXi-01 and ESXi-02 are killed instantly. They are then restarted by vSphere HA on hosts within the cluster that have access to the datastore, ESXi-03 and ESXi-04 in this scenario. The PDL and the killing of the VM world group can be witnessed by following the entries in the vmkernel.log file located in /var/log/ on the ESXi hosts. The following is an outtake of the vmkernel.log file where a PDL is recognized and appropriate action is taken.

2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198 (vscsi4:0): opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
VMware recommends configuring Response for Datastore with Permanent Device Loss (PDL) to Power off and restart VMs. This setting ensures that appropriate action is taken when a PDL condition exists. The correct configuration is shown below.

Figure 83 - APD/PDL Configuration

Full Compute Failure in Frimley Data Center

In this scenario, a full compute failure has occurred in Frimley data center.

Figure 84 - Full Compute Failure Scenario

Result: All VMs are successfully restarted in Bluefin data center.
Explanation: The vSphere HA master was located in Frimley data center at the time of the full compute failure at that location. After the hosts in Bluefin data center detected that no network heartbeats had been received, an election process was started. Within approximately 20 seconds, a new vSphere HA master was elected from the remaining hosts. The new master then determined which hosts had failed and which VMs had been impacted by this failure. Because all hosts at the other site had failed and all VMs residing on them had been impacted, vSphere HA initiated the restart of all of these VMs. vSphere HA can initiate 32 concurrent restarts on a single host, providing a low restart latency for most environments. The only sequencing of start order comes from the broad high, medium, and low restart priority categories for vSphere HA. This priority must be set on a per-VM basis. In the test, these priorities were adhered to: high-priority VMs started first, followed by medium-priority and low-priority VMs.

As part of the test, the hosts at the Frimley data center were again powered on. As soon as vSphere DRS detected that these hosts were available, a vSphere DRS run was invoked. Because the initial vSphere DRS run corrects only the vSphere DRS affinity rule violations, resource imbalance was not corrected until the next full invocation of vSphere DRS. vSphere DRS is invoked by default every 5 minutes or when VMs are powered off or on through the use of the vCenter Web Client.
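The restart ordering described above, broad priority categories plus a per-host concurrency cap, can be sketched as follows. The data structures and function name are illustrative only; the real FDM scheduler considers many more factors (placement, resources, retries).

```python
from collections import deque

# Sketch of the restart ordering described above: vSphere HA sequences
# restarts only by the broad per-VM restart priority (high, medium, low)
# and issues up to 32 concurrent power-ons per host.

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
MAX_CONCURRENT_RESTARTS_PER_HOST = 32

def restart_batches(vms):
    """vms: list of (vm_name, priority) tuples. Returns batches of at
    most 32 VM names, ordered high -> medium -> low, roughly as one
    host would issue the power-ons."""
    queue = deque(sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]]))
    batches = []
    while queue:
        batch = [queue.popleft()[0]
                 for _ in range(min(len(queue), MAX_CONCURRENT_RESTARTS_PER_HOST))]
        batches.append(batch)
    return batches

vms = [("db01", "high"), ("web01", "low"), ("app01", "medium"), ("db02", "high")]
print(restart_batches(vms))  # [['db01', 'db02', 'app01', 'web01']]
```

With fewer than 32 impacted VMs, as here, everything lands in a single concurrent batch; only the ordering within it reflects the priority categories.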
Loss of Frimley Data Center

In this scenario, a full failure of Frimley data center is simulated.
Figure 85 - Full Data Center Failure Scenario

Result: All VMs were successfully restarted in Bluefin data center.

Explanation: In this scenario, the hosts in Bluefin data center lost contact with the vSphere HA master and elected a new vSphere HA master. Because the storage system had failed, a takeover command had to be initiated on the surviving site, again due to the NetApp-specific process. After the takeover command had been initiated, the new vSphere HA master accessed the per-datastore files that vSphere HA uses to record the set of protected VMs. The vSphere HA master then attempted to restart the VMs that were not running on the surviving hosts in Bluefin data center. In our scenario, all VMs were restarted within 2 minutes after failure and were fully accessible and functional again.

NOTE: By default, vSphere HA stops attempting to start a VM after 30 minutes. If the storage team does not issue a takeover command within that time frame, the vSphere administrator must manually start up VMs after the storage becomes available.
Stretched Cluster using VSAN

This question keeps on coming up over and over again lately: Stretched Cluster using Virtual SAN, can I do it? When Virtual SAN was first released, the answer to this question was a clear no. Virtual SAN did not allow a "traditional" stretched deployment using 2 "data" sites and a third "witness" site. A regular Virtual SAN cluster stretched across 3 sites within campus distance, however, was possible. Virtual SAN 6.1, however, introduced support for the "traditional" stretched cluster deployment.

Figure 86 - Stretched Virtual SAN Configuration

Everything learned in this publication also applies to a stretched Virtual SAN cluster, meaning all HA and DRS best practices. There are a couple of differences, though, at the time of writing between a vSphere Metro Storage Cluster and a VSAN Stretched Cluster, and in this section we will call out these differences. Please note that there is an extensive Virtual SAN Stretched Clustering Guide available, written by Cormac Hogan, and there is a full Virtual SAN book available, written by Cormac Hogan and myself (Duncan Epping). If you want to know more details about Virtual SAN, we would like to refer you to these two publications.
The first thing that needs to be looked at is the network. From a Virtual SAN perspective there are clear requirements:

- 5 ms RTT latency max between data sites
- 200 ms RTT latency max between data and witness site
- Both L3 and L2 are supported between the data sites
- 10 Gbps bandwidth is recommended between data sites; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
- Multicast is required, which means that if L3 is used, some form of multicast routing is needed
- L3 is expected between the data and the witness sites
- 100 Mbps bandwidth is recommended to the witness site; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
- No multicast is required to the witness site
When it comes to HA and DRS, the configuration is pretty straightforward. There are a couple of things we want to point out, as they are configuration details which are easy to forget about. Some are discussed in-depth above; some are settings you actually do not use with VSAN. We will point this out in the list below:

- Make sure to specify additional isolation addresses, one in each site (das.isolationAddress0 and das.isolationAddress1).
- Disable the default isolation address if it can't be used to validate the state of the environment during a partition (if the gateway isn't available in both sites).
- Disable Datastore heartbeating; without traditional external storage there is no reason to have this.
- Enable HA Admission Control and make sure it is set to 50% for CPU and Memory.
- Keep VMs local by creating "VM/Host" should rules.

That covers most of it, summarized relatively briefly compared to the excellent document Cormac developed with all the details you could wish for. Make sure to read that if you want to know every aspect and angle of a stretched Virtual SAN cluster configuration.
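The checklist above lends itself to a simple sanity check. The sketch below encodes those recommendations against a plain configuration dictionary; the dictionary layout and function are hypothetical and do not reflect any real vSphere API object.

```python
# Hedged sketch: validate a cluster configuration dictionary against
# the stretched VSAN recommendations listed above. The config layout
# is illustrative, not a real vSphere API structure.

def check_stretched_vsan_ha(config: dict) -> list:
    """Return a list of warnings for settings that deviate from the
    recommendations in this section."""
    warnings = []
    # One isolation address per site: das.isolationAddress0/1.
    for key in ("das.isolationAddress0", "das.isolationAddress1"):
        if key not in config.get("advanced_settings", {}):
            warnings.append(f"missing isolation address: {key}")
    # Datastore heartbeating serves no purpose without external storage.
    if config.get("datastore_heartbeating", True):
        warnings.append("datastore heartbeating should be disabled with VSAN")
    # Admission control should reserve half the cluster for site failover.
    if not config.get("admission_control_enabled", False):
        warnings.append("admission control should be enabled")
    elif config.get("failover_capacity_pct") != 50:
        warnings.append("admission control should reserve 50% CPU and memory")
    return warnings

cfg = {
    "advanced_settings": {"das.isolationAddress0": "192.168.1.1"},
    "datastore_heartbeating": False,
    "admission_control_enabled": True,
    "failover_capacity_pct": 50,
}
print(check_stretched_vsan_ha(cfg))  # ['missing isolation address: das.isolationAddress1']
```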
Advanced Settings

There are various types of KB articles and this KB article explains it, but let me summarize it and simplify it a bit to make it easier to digest.

There are various sorts of advanced settings, but for HA three in particular:

- das.* -> Cluster level advanced setting
- fdm.* -> FDM host level advanced setting
- vpxd.* -> vCenter level advanced setting

How do you configure these? Configuring these is typically straightforward, and most of you hopefully know this already. If not, let us go over the steps to help configure your environment as desired.
Cluster Level

In the Web Client: Click "Hosts and Clusters", click your cluster object, click the "Manage" tab, click "Settings" and "vSphere HA", and hit the "Edit" button.

FDM Host Level

Open up an SSH session to your host and edit /etc/opt/vmware/fdm/fdm.cfg.

vCenter Level

In the Web Client: Click "vCenter", click "vCenter Servers", select the appropriate vCenter Server, click the "Manage" tab, and click "Settings" and "Advanced Settings".
In this section we will primarily focus on the ones most commonly used; a full detailed list can be found in KB 2033250. Please note that each bullet details the versions which support the advanced setting.

das.maskCleanShutdownEnabled - 5.0, 5.1, 5.5
Whether the clean shutdown flag will default to false for an inaccessible and powered-off VM. Enabling this option will trigger VM failover if the VM's home datastore isn't accessible when it dies or is intentionally powered off.
das.ignoreInsufficientHbDatastore - 5.0, 5.1, 5.5, 6.0
Suppresses the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is "false". Can be configured as "true" or "false".

das.heartbeatDsPerHost - 5.0, 5.1, 5.5, 6.0
The number of required heartbeat datastores per host. The default value is 2; the value should be between 2 and 5.

das.failuredetectiontime - 4.1 and prior
Number of milliseconds, timeout time, for the isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address, a failure detection time of 15000 is recommended.

das.isolationaddress[x] - 5.0, 5.1, 5.5, 6.0
IP address the ESXi hosts use to check for isolation when no heartbeats are received, where [x] = 0-9. (See the screenshot below for an example.) VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes.

das.usedefaultisolationaddress - 5.0, 5.1, 5.5, 6.0
Value can be "true" or "false" and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set "das.isolationaddress0" to a pingable address and disable the usage of the default gateway by setting this to "false".

das.isolationShutdownTimeout - 5.0, 5.1, 5.5, 6.0
Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.

das.allowNetwork[x] - 5.0, 5.1, 5.5
Enables the use of port group names to control the networks used for VMware HA, where [x] = 0-?. You can set the value to be "Service Console 2" or "Management Network" to use (only) the networks associated with those port group names in the networking configuration. In 5.5 this option is ignored when VSAN is enabled, by the way!

das.bypassNetCompatCheck - 4.1 and prior
Disables the "compatible network" check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is "false"; setting it to "true" disables the check.
das.ignoreRedundantNetWarning - 5.0, 5.1, 5.5
Removes the error icon/message from your vCenter when you don't have a redundant Service Console connection. Default value is "false"; setting it to "true" will disable the warning. HA must be reconfigured after setting the option.

das.vmMemoryMinMB - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with "das.slotMemInMB".

das.slotMemInMB - 5.0, 5.1, 5.5
Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.

das.vmCpuMinMHz - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with "das.slotCpuInMHz".

das.slotCpuInMHz - 5.0, 5.1, 5.5
Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
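The effect of the two slot-size caps can be sketched with a small calculation. This is a simplified model, assuming only that the slot size is driven by the largest reservations and optionally capped by das.slotCpuInMHz / das.slotMemInMB; the real admission control algorithm has more inputs (memory overhead, per-host capacity, and so on).

```python
# Simplified sketch of how das.slotCpuInMHz / das.slotMemInMB cap the
# slot size that HA derives from the largest CPU and memory
# reservations in the cluster. Illustrative only.

def slot_size(vm_reservations, slot_cpu_cap_mhz=None, slot_mem_cap_mb=None):
    """vm_reservations: list of (cpu_reservation_mhz, mem_reservation_mb).
    Returns the (cpu_mhz, mem_mb) slot size, optionally capped by the
    das.slotCpuInMHz / das.slotMemInMB advanced settings."""
    cpu = max(r[0] for r in vm_reservations)
    mem = max(r[1] for r in vm_reservations)
    if slot_cpu_cap_mhz is not None:
        cpu = min(cpu, slot_cpu_cap_mhz)
    if slot_mem_cap_mb is not None:
        mem = min(mem, slot_mem_cap_mb)
    return cpu, mem

# One VM with a 16384 MB reservation skews the memory slot size;
# capping it at 2048 MB yields a far less conservative slot count.
reservations = [(500, 1024), (1000, 16384), (250, 512)]
print(slot_size(reservations))                        # (1000, 16384)
print(slot_size(reservations, slot_mem_cap_mb=2048))  # (1000, 2048)
```

Note the trade-off the text implies: capping the slot size increases the number of available slots, but a VM whose reservation exceeds the cap then consumes multiple slots.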
das.perHostConcurrentFailoversLimit - 5.0, 5.1, 5.5
By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover, as it adds more stress on the hosts and storage.

das.config.log.maxFileNum - 5.0, 5.1, 5.5
Desired number of log rotations.

das.config.log.maxFileSize - 5.0, 5.1, 5.5
Maximum file size in bytes of the log file.

das.config.log.directory - 5.0, 5.1, 5.5
Full directory path used to store log files.

das.maxFtVmsPerHost - 5.0, 5.1, 5.5
The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.

das.includeFTcomplianceChecks - 5.0, 5.1, 5.5
Controls whether vSphere Fault Tolerance compliance checks should be run as part of the cluster compliance checks. Set this option to false to avoid cluster compliance failures when Fault Tolerance is not being used in a cluster.
das.iostatsinterval (VM Monitoring) - 5.0, 5.1, 5.5, 6.0
The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.

das.config.fdm.deadIcmpPingInterval - 5.0, 5.1, 5.5
Default value is 10. ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This parameter controls the interval (expressed in seconds) between pings.

das.config.fdm.icmpPingTimeout - 5.0, 5.1, 5.5
Default value is 5. Defines the time to wait in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible.
das.config.fdm.hostTimeout - 5.0, 5.1, 5.5
Default is 10. Controls how long a master FDM waits in seconds for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned.

das.config.fdm.stateLogInterval - 5.0, 5.1, 5.5
Default is 600. Frequency in seconds to log cluster state.

das.config.fdm.ft.cleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a vSphere Fault Tolerance VM is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power-on of the secondary VM to succeed. If the power-on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary VM.

das.config.fdm.storageVmotionCleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a Storage vMotion is done in an HA enabled cluster using pre-5.0 hosts and the home datastore of the VM is being moved, HA may interpret the completion of the Storage vMotion as a failure, and may attempt to restart the source VM. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete. When the Storage vMotion completes or the timer expires, the master will assess whether a failure occurred.

das.config.fdm.policy.unknownStateMonitorPeriod - 5.0, 5.1, 5.5, 6.0
Defines the number of seconds the HA master agent waits after it detects that a VM has failed before it attempts to restart the VM.

das.config.fdm.event.maxMasterEvents - 5.0, 5.1, 5.5
Default is 1000. Defines the maximum number of events cached by the master.

das.config.fdm.event.maxSlaveEvents - 5.0, 5.1, 5.5
Default is 600. Defines the maximum number of events cached by a slave.
That is a long list of advanced settings indeed, and hopefully no one is planning to try them all out on a single cluster, or even on multiple clusters. Avoid using advanced settings as much as possible, as doing so definitely leads to increased complexity, and often to more downtime rather than less.
Summarizing

Hopefully I have succeeded in giving you a better understanding of the internal workings of HA. I hope that this publication has handed you the tools needed to update your vSphere design and ultimately to increase the resiliency and uptime of your environment.

I have tried to simplify some of the concepts to make them easier to understand, but I acknowledge that some concepts are difficult to grasp and that the amount of architectural change that vSphere 5 and the new functionality that vSphere 6 have brought can be confusing at times. I hope, though, that after reading this everyone is confident enough to make the required or recommended changes.

If there are any questions, please do not hesitate to reach out to me via Twitter or my blog, or leave a comment on the online version of this publication. I will do my best to answer your questions.
Changelog

1.0.1 - Minor edits
1.0.2 - Start with VSAN Stretched Cluster in Use case section
1.0.3 - Start with VVol section in VSAN and VVol specifics section
1.0.4 - Update to VVol section and replaced diagram (figure 15)