
Table of Contents - vmgu.ru · VMware vSphere 6.x HA Deepdive Like many of you I am constantly trying to explore new ways to share content with the rest of the world. Over the course

Table of Contents

Introduction

Disclaimer

About the author

Introduction to HA

Components of HA

Fundamental Concepts

Restarting Virtual Machines

Virtual SAN and Virtual Volumes specifics

Adding resiliency to HA

Admission Control

VM and Application Monitoring

vSphere HA and...

Use Case - Stretched Clusters

Advanced Settings

Summarizing

Changelog

vSphere 6.x HA Deepdive


VMware vSphere 6.x HA Deepdive

Like many of you I am constantly trying to explore new ways to share content with the rest of the world. Over the course of the last decade I have done this in many different formats; some of them were easy to do and others not so much. Books always fell in that last category, which is a shame as I have always enjoyed writing them.

I wanted to explore the different options there are to create content and share it in different ways, without the need to redo formatting and waste a lot of time on things I do not want to waste time on. After an afternoon of reading and researching, GitBook popped up. It looked like an interesting platform/solution that would allow me to create content both online and offline, push and pull it to and from a repository, and build a static website from it as well as publish it in a variety of different formats.

Let it be clear that this is a trial, and this may or may not result in a follow-up. I am starting with the vSphere High Availability content as that is what I am most familiar with and what will be easiest to update.

A special thanks goes out to everyone who has contributed in any shape or form to this project. First of all Frank Denneman, the person with whom I wrote the first 3 versions of the Clustering Deepdive and who designed all the great diagrams which you find throughout this publication. Of course also: Doug Baer for editing the content in the past, and my technical conscience: Keith Farkas, Cormac Hogan, Manoj Krishnan, Anne Holler, Mustafa Uysal and Gabriel Tarasuk-Levin.

For offline reading, feel free to download this publication in any of the following formats: PDF - ePub - Mobi.

The source of this publication is stored on both GitBook and GitHub. Feel free to submit/contribute where possible and needed. Note that it is also possible to leave feedback on the content by simply clicking on the "+" on the right side of the paragraph you want to comment on (hover over it with your mouse). I will read and incorporate feedback as soon as I have time, hence it is useful to check back regularly and validate your downloaded version against the details below.

vSphere 6.x HA Deepdive, book version: 1.0.4. Book built with GitBook version: 2.6.7.

Thanks for reading, and enjoy!


Duncan Epping
Chief Technologist Storage and Availability - VMware


Disclaimer

Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.

The author of this publication works for VMware. The opinions expressed here are the author's personal opinions. Content published was not approved in advance by VMware and does not necessarily reflect the views and opinions of VMware. This is the author's book, not a VMware book.

Copyrights / Licensing

Figure 1 - Creative Commons License


About the Author

Duncan Epping is a Chief Technologist working in the Office of CTO of VMware's Storage and Availability business unit. In that role, he serves as a partner and trusted adviser to VMware’s customers, primarily in EMEA. Main responsibilities are ensuring VMware’s future innovations align with essential customer needs and translating customer problems to opportunities. Duncan specializes in Software Defined Storage, hyper-converged infrastructures and business continuity / disaster recovery solutions. He has 1 patent granted and 4 patents pending on the topic of availability, storage and resource management. Duncan is a VMware Certified Design Expert (VCDX007) and the main author and owner of the VMware/Virtualization blog Yellow-Bricks.com.

He can be followed on Twitter: @DuncanYB.


Introduction to vSphere High Availability

Availability has traditionally been one of the most important aspects when providing services. When providing services on a shared platform like VMware vSphere, the impact of downtime exponentially grows as many services run on a single physical machine. As such VMware engineered a feature called VMware vSphere High Availability. VMware vSphere High Availability, hereafter simply referred to as HA, provides a simple and cost effective solution to increase availability for any application running in a virtual machine regardless of its operating system. It is configured using a couple of simple steps through vCenter Server (vCenter) and as such provides a uniform and simple interface. HA enables you to create a cluster out of multiple ESXi hosts. This will allow you to protect virtual machines and their workloads. In the event of a failure of one of the hosts in the cluster, impacted virtual machines are automatically restarted on other ESXi hosts within that same VMware vSphere Cluster (cluster).

Figure 2 - High Availability in action

On top of that, in the case of a Guest OS level failure, HA can restart the failed Guest OS. This feature is called VM Monitoring, but is sometimes also referred to as VM-HA. This might sound fairly complex but again can be implemented with a single click.


Figure 3 - OS Level HA just a single click away

Unlike many other clustering solutions, HA is a simple solution to implement and literally enabled within 5 clicks. On top of that, HA is widely adopted and used in all situations. However, HA is not a 1:1 replacement for solutions like Microsoft Clustering Services / Windows Server Failover Clustering (WSFC). The main difference between WSFC and HA is that WSFC was designed to protect stateful cluster-aware applications, while HA was designed to protect any virtual machine regardless of the type of workload within. HA can also be extended to the application layer through the use of VM and Application Monitoring.

In the case of HA, a fail-over incurs downtime as the virtual machine is literally restarted on one of the remaining hosts in the cluster, whereas WSFC transitions the service to one of the remaining nodes in the cluster when a failure occurs. Contrary to what many believe, WSFC does not guarantee that there is no downtime during a transition. On top of that, your application needs to be cluster-aware and stateful in order to get the most out of this mechanism, which limits the number of workloads that could really benefit from this type of clustering.

One might ask why you would want to use HA when a virtual machine is restarted and service is temporarily lost. The answer is simple: not all virtual machines (or services) need 99.999% uptime. For many services the type of availability HA provides is more than sufficient. On top of that, many applications were never designed to run on top of a WSFC cluster. This means that there is no guarantee of availability or data consistency if an application is clustered with WSFC but is not cluster-aware.
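To put a figure such as 99.999% into perspective, the following minimal sketch converts an availability target into a yearly downtime budget. The function name is hypothetical and the calculation assumes a 365-day year; it is plain arithmetic, not something HA itself computes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget (in minutes) for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Compare a few common targets side by side.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions "five nines" leaves a little over five minutes per year, which is why a restart-based mechanism like HA is sufficient for many services but not for all of them.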

In addition, WSFC clustering can be complex and requires special skills and training. One example is managing patches and updates/upgrades in a WSFC environment; this could even lead to more downtime if not operated correctly and definitely complicates operational procedures. HA however reduces complexity, costs (associated with downtime and WSFC), resource overhead and unplanned downtime for minimal additional costs. It is important to note that HA, contrary to WSFC, does not require any changes to the guest as HA is provided on the hypervisor level. Also, VM Monitoring does not require any additional software or OS modifications except for VMware Tools, which should be installed anyway as a best practice. In case even higher availability is required, VMware also provides a level of application awareness through Application Monitoring, which has been leveraged by partners like Symantec to enable application level resiliency and could be used by in-house development teams to increase resiliency for their applications.

HA has proven itself over and over again and is widely adopted within the industry; if you are not using it today, hopefully you will be convinced after reading this section of the book.

vSphere 6.0

Before we dive into the main constructs of HA and describe all the choices one has to make when configuring HA, we will first briefly touch on what’s new in vSphere 6.0 and describe the basic requirements and steps needed to enable HA. This book covers all the released versions of what is known within VMware as “Fault Domain Manager” (FDM), which was introduced with vSphere 5.0. We will call out the differences in behavior in the different versions where applicable; our baseline however is vSphere 6.0.

What’s New in 6.0?

Compared to vSphere 5.0 the changes introduced with vSphere 6.0 for HA appear to be minor. However, some of the new functionality will make the life of many of you much easier. Although the list is relatively short, from an engineering point of view many of these things have been an enormous effort as they required changes to the deep fundamentals of the HA architecture.

- Support for Virtual Volumes: With Virtual Volumes a new type of storage entity is introduced in vSphere 6.0. This has also resulted in some changes in the HA architecture to accommodate this new way of storing virtual machines
- Support for Virtual SAN: This was actually introduced with vSphere 5.5, but as it is new to many of you and led to changes in the architecture we decided to include it in this update
- VM Component Protection: This allows HA to respond to a scenario where the connection to the virtual machine’s datastore is impacted temporarily or permanently
  - HA “Response for Datastore with All Paths Down”
  - HA “Response for Datastore with Permanent Device Loss”
- Increased host scale: Cluster limit has grown from 32 to 64 hosts
- Increased VM scale: Cluster limit has grown from 4000 VMs to 8000 VMs per cluster
- Secure RPC: Secures the VM/App monitoring channel
- Full IPv6 support
- Registration of “HA Disabled” VMs on hosts after failure


What is required for HA to Work?

Each feature or product has very specific requirements and HA is no different. Knowing the requirements of HA is part of the basics we have to cover before diving into some of the more complex concepts. For those who are completely new to HA, we will also show you how to configure it.

Prerequisites

Before enabling HA it is highly recommended to validate that the environment meets all the prerequisites. We have also included recommendations from an infrastructure perspective that will enhance resiliency.

Requirements:

- Minimum of two ESXi hosts
- Minimum of 5 GB memory per host to install ESXi and enable HA
- VMware vCenter Server
- Shared Storage for virtual machines
- Pingable gateway or other reliable address

Recommendation:

- Redundant Management Network (not a requirement, but highly recommended)
- 8 GB of memory or more per host
- Multiple shared datastores

Firewall Requirements

The following table contains the ports that are used by HA for communication. If your environment contains firewalls external to the host, ensure these ports are opened for HA to function correctly. HA will open the required ports on the ESX or ESXi firewall.

Port    Protocol    Direction
8182    UDP         Inbound
8182    TCP         Inbound
8182    UDP         Outbound
8182    TCP         Outbound
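When an external firewall sits between hosts, a quick reachability test can confirm that the TCP side of port 8182 is open. The sketch below is only an illustration using Python's standard socket module; the function name and the default timeout are assumptions, and it cannot validate the UDP entries in the table because UDP is connectionless.

```python
import socket


def tcp_port_open(host: str, port: int = 8182, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

A result of False only indicates that the connection could not be established from the machine running the check; it does not say whether the host firewall or an external firewall is responsible.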

Configuring vSphere High Availability


HA can be configured with the default settings within a couple of clicks. The following steps will show you how to create a cluster and enable HA, including VM Monitoring, using the vSphere Web Client. Each of the settings and the design decisions associated with these steps will be described in more depth in the following chapters.

1. Click “Hosts & Clusters” under Inventories on the Home tab.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, e.g. ams-hadrs-001.
4. Select Turn On vSphere HA.
5. Ensure “Enable host monitoring” and “Enable admission control” are selected.
6. Select “Percentage of cluster resources…” under Policy and specify a percentage.
7. Enable VM Monitoring Status by selecting “VM and Application Monitoring”.
8. Click “OK” to complete the creation of the cluster.

Figure 4 - Ready to complete the New Cluster Wizard


When the HA cluster has been created, the ESXi hosts can be added to the cluster simply by right-clicking the host and selecting “Move To”, if they were already added to vCenter, or by right-clicking the cluster and selecting “Add Host”.

When an ESXi host is added to the newly-created cluster, the HA agent will be loaded and configured. Once this has completed, HA will enable protection of the workloads running on this ESXi host.

As we have clearly demonstrated, HA is a simple clustering solution that will allow you to protect virtual machines against host failure and operating system failure in literally minutes. Understanding the architecture of HA will enable you to reach that extra 9 when it comes to availability. The following chapters will discuss the architecture and fundamental concepts of HA. We will also discuss all decision-making moments to ensure you will configure HA in such a way that it meets the requirements of your or your customer’s environment.


Components of High Availability

Now that we know what the prerequisites are and how to configure HA, the next step is to describe which components form HA. Keep in mind that this is still a “high level” overview. There is more under the covers that we will explain in following chapters. The following diagram depicts a two-host cluster and shows the key HA components.

Figure 5 - Components of High Availability

As you can clearly see, there are three major components that form the foundation for HA as of vSphere 6.0:

- FDM
- HOSTD
- vCenter

The first and probably the most important component that forms HA is FDM (Fault Domain Manager). This is the HA agent.


The FDM agent is responsible for many tasks such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. FDM also handles heartbeat mechanisms, virtual machine placement, virtual machine restarts, logging and much more. We are not going to discuss all of this in-depth separately as we feel that this will complicate things too much.

FDM, in our opinion, is one of the most important agents on an ESXi host (when HA is enabled, of course, and we are assuming this is the case). The engineers recognized this importance and added an extra level of resiliency to HA. FDM uses a single-process agent. However, FDM spawns a watchdog process. In the unlikely event of an agent failure, the watchdog functionality will pick up on this and restart the agent to ensure HA functionality remains without anyone ever noticing it failed. The agent is also resilient to network interruptions and “all paths down” (APD) conditions. Inter-host communication automatically uses another communication path (if the host is configured with redundant management networks) in the case of a network failure.
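The watchdog idea described above can be illustrated with a small, generic sketch. This is not the FDM implementation; the class name, the callbacks and the restart counter are all hypothetical and only show the general pattern of a supervisor that restarts an agent once a health check reports it dead.

```python
from typing import Callable


class Watchdog:
    """Restart a supervised agent whenever its health check reports a failure."""

    def __init__(self, start_agent: Callable[[], None], is_alive: Callable[[], bool]) -> None:
        self._start_agent = start_agent  # hypothetical callback that (re)starts the agent
        self._is_alive = is_alive        # hypothetical callback that reports agent health
        self.restarts = 0                # number of restarts triggered so far

    def check_once(self) -> bool:
        """Run one supervision pass; return True if a restart was triggered."""
        if self._is_alive():
            return False
        self._start_agent()
        self.restarts += 1
        return True
```

In a real supervisor `check_once` would be called periodically; keeping the loop outside the class makes the restart logic easy to test in isolation.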

HA has no dependency on DNS as it works with IP addresses only. This is one of the major improvements that FDM brought. This does not mean that ESXi hosts need to be registered with their IP addresses in vCenter; it is still a best practice to register ESXi hosts by their fully qualified domain name (FQDN) in vCenter. Although HA does not depend on DNS, remember that other services may depend on it. On top of that, monitoring and troubleshooting will be much easier when hosts are correctly registered within vCenter and have a valid FQDN.

Basic design principle: Although HA is not dependent on DNS, it is still recommended to register the hosts with their FQDN for ease of operations/management.

vSphere HA also has a standardized logging mechanism, where a single log file has been created for all operational log messages; it is called fdm.log. This log file is stored under /var/log/ as depicted in Figure 6.

Figure 6 - HA log file


Basic design principle: Ensure syslog is correctly configured and log files are offloaded to a safe location to offer the possibility of performing a root cause analysis in case disaster strikes.

HOSTD Agent

One of the most crucial agents on a host is HOSTD. This agent is responsible for many of the tasks we take for granted, like powering on virtual machines. FDM talks directly to HOSTD and vCenter, so it is not dependent on VPXA, like in previous releases. This is, of course, to avoid any unnecessary overhead and dependencies, making HA more reliable than ever before and enabling HA to respond faster to power-on requests. That ultimately results in higher VM uptime.

When, for whatever reason, HOSTD is unavailable or not yet running after a restart, the host will not participate in any FDM-related processes. FDM relies on HOSTD for information about the virtual machines that are registered to the host, and manages the virtual machines using HOSTD APIs. In short, FDM is dependent on HOSTD and if HOSTD is not operational, FDM halts all functions and waits for HOSTD to become operational.

vCenter

That brings us to our final component, the vCenter Server. vCenter is the core of every vSphere Cluster and is responsible for many tasks these days. For our purposes, the following are the most important and the ones we will discuss in more detail:

- Deploying and configuring HA Agents
- Communication of cluster configuration changes
- Protection of virtual machines

vCenter is responsible for pushing out the FDM agent to the ESXi hosts when applicable. The push of these agents is done in parallel to allow for faster deployment and configuration of multiple hosts in a cluster. vCenter is also responsible for communicating configuration changes in the cluster to the host which is elected as the master. We will discuss this concept of master and slaves in the following chapter. Examples of configuration changes are modification or addition of an advanced setting or the introduction of a new host into the cluster.

HA leverages vCenter to retrieve information about the status of virtual machines and, of course, vCenter is used to display the protection status (Figure 7) of virtual machines. (What “virtual machine protection” actually means will be discussed in chapter 3.) On top of that, vCenter is responsible for the protection and unprotection of virtual machines. This not only applies to user-initiated power-offs or power-ons of virtual machines, but also to the case where an ESXi host is disconnected from vCenter, at which point vCenter will request the master HA agent to unprotect the affected virtual machines.

Figure 7 - Virtual machine protection state

Although HA is configured by vCenter and exchanges virtual machine state information with it, vCenter is not involved when HA responds to failure. It is comforting to know that if the host running the virtualized vCenter Server fails, HA takes care of the failure and restarts the vCenter Server on another host, along with all other configured virtual machines from that failed host.

There is a corner case scenario with regards to vCenter failure: if the ESXi hosts are so-called “stateless hosts” and Distributed vSwitches are used for the management network, virtual machine restarts will not be attempted until vCenter is restarted. For stateless environments, vCenter and Auto Deploy availability is key as the ESXi hosts literally depend on them.

If vCenter is unavailable, it will not be possible to make changes to the configuration of the cluster. vCenter is the source of truth for the set of virtual machines that are protected, the cluster configuration, the virtual machine-to-host compatibility information, and the host membership. So, while HA, by design, will respond to failures without vCenter, HA relies on vCenter to be available to configure or monitor the cluster.

When a virtual vCenter Server, or the vCenter Server Appliance, has been implemented, we recommend setting the correct HA restart priorities for it. Although vCenter Server is not required to restart virtual machines, there are multiple components that rely on vCenter and, as such, a speedy recovery is desired. When configuring your vCenter virtual machine with a high priority for restarts, remember to include all services on which your vCenter Server depends for a successful restart: DNS, MS AD and MS SQL (or any other database server you are using).

Basic design principles:

1. In stateless environments, ensure vCenter and Auto Deploy are highly available as recovery time of your virtual machines might be dependent on them.
2. Understand the impact of virtualizing vCenter. Ensure it has high priority for restarts and ensure that services which vCenter Server depends on are available: DNS, AD and database.


Fundamental Concepts

Now that you know about the components of HA, it is time to start talking about some of the fundamental concepts of HA clusters:

- Master / Slave agents
- Heartbeating
- Isolated vs Network partitioned
- Virtual Machine Protection
- Component Protection

Everyone who has implemented vSphere knows that multiple hosts can be configured into a cluster. A cluster can best be seen as a collection of resources. These resources can be carved up with the use of vSphere Distributed Resource Scheduler (DRS) into separate pools of resources or used to increase availability by enabling HA.

The HA architecture introduces the concept of master and slave HA agents. Except during network partitions, which are discussed later, there is only one master HA agent in a cluster. Any agent can serve as a master, and all others are considered its slaves. A master agent is in charge of monitoring the health of virtual machines for which it is responsible and restarting any that fail. The slaves are responsible for forwarding information to the master agent and restarting any virtual machines at the direction of the master. The HA agent, regardless of its role as master or slave, also implements the VM/App monitoring feature, which allows it to restart virtual machines in the case of an operating system failure, or restart services in the case of an application failure.

Master Agent

As stated, one of the primary tasks of the master is to keep track of the state of the virtual machines it is responsible for and to take action when appropriate. In a normal situation there is only a single master in a cluster. We will discuss the scenario where multiple masters can exist in a single cluster in one of the following sections, but for now let’s talk about a cluster with a single master. A master will claim responsibility for a virtual machine by taking “ownership” of the datastore on which the virtual machine’s configuration file is stored.


Basic design principle: To maximize the chance of restarting virtual machines after a failure we recommend masking datastores on a cluster basis. Although sharing of datastores across clusters will work, it will increase complexity from an administrative perspective.

That is not all, of course. The HA master is also responsible for exchanging state information with vCenter. This means that it will not only receive but also send information to vCenter when required. The HA master is also the host that initiates the restart of virtual machines when a host has failed. You may immediately want to ask what happens when the master is the one that fails, or, more generically, which of the hosts can become the master and when is it elected?

Election

A master is elected by a set of HA agents whenever the agents are not in network contact with a master. A master election thus occurs when HA is first enabled on a cluster and when the host on which the master is running:

- fails,
- becomes network partitioned or isolated,
- is disconnected from vCenter Server,
- is put into maintenance or standby mode,
- or when HA is reconfigured on the host.

The HA master election takes approximately 15 seconds and is conducted using UDP. While HA won’t react to failures during the election, once a master is elected, failures detected before and during the election will be handled. The election process is simple but robust. The host that is participating in the election with the greatest number of connected datastores will be elected master. If two or more hosts have the same number of datastores connected, the one with the highest Managed Object Id will be chosen. This however is done lexically, meaning that 99 beats 100 as 9 is larger than 1. For each host, the HA State of the host will be shown on the Summary tab. This includes the role, as depicted in the screenshot below where the host is a master host.
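The selection rule described above (most connected datastores first, then the lexically highest Managed Object Id) can be sketched as follows. This is only an illustration of the ordering, not the FDM election protocol itself; the `Candidate` type and the "host-NN" MoID format used in the example are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    moid: str             # Managed Object Id, assumed here to look like "host-99"
    datastore_count: int  # number of datastores the host is connected to


def elect_master(candidates: list[Candidate]) -> Candidate:
    """Pick the host with the most datastores; break ties on the lexically highest MoID."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    # Tuples compare element by element, and strings compare lexically,
    # so "host-99" ranks above "host-100" because "9" sorts after "1".
    return max(candidates, key=lambda c: (c.datastore_count, c.moid))


if __name__ == "__main__":
    hosts = [Candidate("host-100", 4), Candidate("host-99", 4), Candidate("host-7", 3)]
    print(elect_master(hosts).moid)
```

Comparing the MoID as a string rather than as a number is what produces the "99 beats 100" behavior mentioned in the text.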

After a master is elected, each slave that has management network connectivity with it will set up a single secure, encrypted, TCP connection to the master. This secure connection is SSL-based. One thing to stress here though is that slaves do not communicate with each other after the master has been elected unless a re-election of the master needs to take place.


Figure 8 - Master Agent

As stated earlier, when a master is elected it will try to acquire ownership of all of the datastores it can directly access or access by proxying requests to one of the slaves connected to it using the management network. For regular storage architectures it does this by locking a file called “protectedlist” that is stored on the datastores in an existing cluster. The master will also attempt to take ownership of any datastores it discovers along the way, and it will periodically retry any it could not take ownership of previously.

The naming format and location of this file is as follows:

/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/protectedlist

For those wondering how “cluster-specific-directory” is constructed:

<uuid of vCenter Server>-<number part of the MoID of the cluster>-<random 8 char string>-<name of the host running vCenter Server>
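The naming format above can be expressed as a small helper. This is a sketch for illustration only: the function names are hypothetical, the "number part" is assumed to be the trailing digits of the cluster MoID, and the example values in the usage section are made up.

```python
import posixpath
import re


def cluster_specific_directory(vc_uuid: str, cluster_moid: str,
                               random_part: str, vc_host_name: str) -> str:
    """Join the four documented parts of the cluster-specific directory name."""
    # Assumption: the "number part" of the MoID is its trailing run of digits.
    match = re.search(r"(\d+)$", cluster_moid)
    if match is None:
        raise ValueError(f"no numeric part found in MoID {cluster_moid!r}")
    if len(random_part) != 8:
        raise ValueError("random part must be exactly 8 characters")
    return f"{vc_uuid}-{match.group(1)}-{random_part}-{vc_host_name}"


def protectedlist_path(datastore_root: str, cluster_directory: str) -> str:
    """Build the documented location of the protectedlist file on a datastore."""
    return posixpath.join(datastore_root, ".vSphere-HA", cluster_directory, "protectedlist")


if __name__ == "__main__":
    # All values below are invented examples.
    directory = cluster_specific_directory("1234abcd", "domain-c26", "ab12cd34", "vc01")
    print(protectedlist_path("/vmfs/volumes/datastore1", directory))
```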

The master uses this protectedlist file to store the inventory. It keeps track of which virtual machines are protected by HA. Calling it an inventory might be slightly overstating: it is a list of protected virtual machines and it includes information around virtual machine CPU reservation and memory overhead. The master distributes this inventory across all datastores in use by the virtual machines in the cluster. The next screenshot shows an example of this file on one of the datastores.


Figure 9 - Protected list file

Now that we know the master locks a file on the datastore and that this file stores inventory details, what happens when the master is isolated or fails? If the master fails, the answer is simple: the lock will expire and the new master will relock the file if the datastore is accessible to it.

In the case of isolation, this scenario is slightly different, although the result is similar. The master will release the lock it has on the file on the datastore to ensure that when a new master is elected it can determine the set of virtual machines that are protected by HA by reading the file. If, by any chance, a master should fail right at the moment that it became isolated, the restart of the virtual machines will be delayed until a new master has been elected. In a scenario like this, accuracy and the fact that virtual machines are restarted is more important than a short delay.

Let’s assume for a second that your master has just failed. What will happen and how do the slaves know that the master has failed? HA uses a point-to-point network heartbeat mechanism. If the slaves have received no network heartbeats from the master, the slaves will try to elect a new master. This new master will read the required information and will initiate the restart of the virtual machines within roughly 10 seconds.

Restarting virtual machines is not the only responsibility of the master. It is also responsible for monitoring the state of the slave hosts and reporting this state to vCenter Server. If a slave fails or becomes isolated from the management network, the master will determine which virtual machines must be restarted. When virtual machines need to be restarted, the master is also responsible for determining the placement of those virtual machines. It uses a placement engine that will try to distribute the virtual machines to be restarted evenly across all available hosts.
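To make the idea of even distribution concrete, here is a deliberately simplified sketch that always assigns the next virtual machine to the host with the fewest assignments so far. It is an assumption-laden illustration, not the HA placement engine: the function name is hypothetical and the sketch only counts virtual machines, ignoring resources and compatibility.

```python
def place_evenly(vms: list[str], hosts: list[str]) -> dict[str, list[str]]:
    """Assign each VM to the host that currently has the fewest VMs assigned."""
    if not hosts:
        raise ValueError("at least one available host is required")
    placement: dict[str, list[str]] = {host: [] for host in hosts}
    for vm in vms:
        # min() returns the first host with the smallest count, so ties are
        # broken by the order in which hosts were supplied.
        target = min(hosts, key=lambda host: len(placement[host]))
        placement[target].append(vm)
    return placement


if __name__ == "__main__":
    print(place_evenly(["vm-a", "vm-b", "vm-c", "vm-d", "vm-e"], ["esx01", "esx02"]))
```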


All of these responsibilities are really important, but without a mechanism to detect that a slave has failed, the master would be useless. Just like the slaves receive heartbeats from the master, the master receives heartbeats from the slaves so it knows they are alive.

Slaves

A slave has substantially fewer responsibilities than a master: a slave monitors the state of the virtual machines it is running and informs the master about any changes to this state.

The slave also monitors the health of the master by monitoring heartbeats. If the master becomes unavailable, the slaves initiate and participate in the election process. Last but not least, the slaves send heartbeats to the master so that the master can detect outages. Like the master to slave communication, all slave to master communication is point to point. HA does not use multicast.

Figure 10 - Slave Agent

Files for both Slave and Master

Before explaining the details it is important to understand that both Virtual SAN and Virtual Volumes have introduced changes to the location and the usage of files. For specifics on these two different storage architectures we refer you to those respective sections in the book.

Both the master and slave use files not only to store state, but also as a communication mechanism. We’ve already seen the protectedlist file (Figure 9) used by the master to store the list of protected virtual machines. We will now discuss the files that are created by both the master and the slaves. Remote files are files stored on a shared datastore and local files are files that are stored in a location only directly accessible to that host.

Remote Files

The set of powered on virtual machines is stored in a per-host “poweron” file. It should be noted that, because a master also hosts virtual machines, it also creates a “poweron” file.

The naming scheme for this file is as follows: host-number-poweron

Tracking virtual machine power-on state is not the only thing the “poweron” file is used for. This file is also used by a slave to inform the master that it is isolated from the management network: the top line of the file will contain either a 0 or a 1. A 0 (zero) means not-isolated and a 1 (one) means isolated. The master will inform vCenter about the isolation of the host.
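The role of the top line can be illustrated with a small parser. Only the meaning of the first line (0 or 1) comes from the text above; the assumption that the remaining lines hold one powered-on virtual machine per line, as well as all names in the sketch, are hypothetical and the real file layout may differ.

```python
from dataclasses import dataclass


@dataclass
class PowerOnState:
    isolated: bool             # True when the first line of the file is "1"
    powered_on_vms: list[str]  # assumed: one entry per remaining non-empty line


def parse_poweron(text: str) -> PowerOnState:
    """Parse the contents of a per-host poweron file under the stated assumptions."""
    lines = text.splitlines()
    if not lines or lines[0].strip() not in ("0", "1"):
        raise ValueError("first line must be 0 (not isolated) or 1 (isolated)")
    vms = [line.strip() for line in lines[1:] if line.strip()]
    return PowerOnState(isolated=lines[0].strip() == "1", powered_on_vms=vms)
```

A master reading such a file would only need the first line to learn about isolation, which is why the flag sits at the top.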

Local Files

As mentioned before, when HA is configured on a host, the host will store specific information about its cluster locally.

Figure 11 - Locally stored files

Each host, including the master, will store data locally. The data that is locally stored is important state information, namely the VM-to-host compatibility matrix, cluster configuration, and host membership list. This information is persisted locally on each host. Updates to this information are sent to the master by vCenter and propagated by the master to the slaves. Although we expect that most of you will never touch these files (and we highly recommend against modifying them), we do want to explain how they are used:

clusterconfig – This file is not human-readable. It contains the configuration details of the cluster.
vmmetadata – This file is not human-readable. It contains the actual compatibility info matrix for every HA protected virtual machine and lists all the hosts with which it is compatible, plus a vm/host dictionary.
fdm.cfg – This file contains the configuration settings around logging. For instance, the level of logging and syslog details are stored in here.
hostlist – A list of hosts participating in the cluster, including hostname, IP addresses, MAC addresses and heartbeat datastores.

Heartbeating

We mentioned it a couple of times already in this chapter, and it is an important mechanism that deserves its own section: heartbeating. Heartbeating is the mechanism used by HA to validate whether a host is alive. HA has two different heartbeating mechanisms. These heartbeat mechanisms allow it to determine what has happened to a host when it is no longer responding. Let’s discuss traditional network heartbeating first.

Network Heartbeating

Network heartbeating is used by HA to determine if an ESXi host is alive. Each slave will send a heartbeat to its master and the master sends a heartbeat to each of the slaves; this is a point-to-point communication. These heartbeats are sent by default every second.

When a slave isn’t receiving any heartbeats from the master, it will try to determine whether it is Isolated. We will discuss “states” in more detail later on in this chapter.

Basic design principle: Network heartbeating is key for determining the state of a host. Ensure the management network is highly resilient to enable proper state determination.

Datastore Heartbeating

Datastore heartbeating adds an extra level of resiliency and prevents unnecessary restart attempts from occurring as it allows vSphere HA to determine whether a host is isolated from the network or is completely unavailable. How does this work?

Datastore heartbeating enables a master to more accurately determine the state of a host that is not reachable via the management network. The new datastore heartbeat mechanism is used in case the master has lost network connectivity with the slaves. The datastore heartbeat mechanism is then used to validate whether a host has failed or is merely isolated/network partitioned. Isolation will be validated through the “poweron” file which, as mentioned earlier, will be updated by the host when it is isolated. Without the “poweron” file, there is no way for the master to validate isolation. Let that be clear! Based on the results of checks of both files, the master will determine the appropriate action to take. If the master determines that a host has failed (no datastore heartbeats), the master will restart the failed host’s virtual machines. If the master determines that the slave is Isolated or Partitioned, it will only take action when it is appropriate to take action. This means that the master will only initiate restarts when virtual machines are down or have been powered off or shut down by a triggered isolation response, but we will discuss this in more detail in Chapter 4.

By default, HA selects 2 heartbeat datastores; it will select datastores that are available on all hosts, or as many as possible. Although it is possible to configure an advanced setting (das.heartbeatDsPerHost) to allow for more datastores for datastore heartbeating, we do not recommend configuring this option as the default should be sufficient for most scenarios, except for stretched cluster environments where it is recommended to have two in each site manually selected.

The selection process gives preference to VMFS over NFS datastores, and seeks to choose datastores that are backed by different LUNs or NFS servers when possible. If desired, you can also select the heartbeat datastores yourself. We, however, recommend letting vCenter deal with this operational “burden” as vCenter uses a selection algorithm to select heartbeat datastores that are presented to all hosts. This however is not a guarantee that vCenter can select datastores which are connected to all hosts. It should be noted that vCenter is not site-aware. In scenarios where hosts are geographically dispersed it is recommended to manually select heartbeat datastores to ensure each site has one site-local heartbeat datastore at minimum.

Basic design principle: In a metro-cluster / geographically dispersed cluster we recommend setting the minimum number of heartbeat datastores to four. It is recommended to manually select site local datastores, two for each site.

Figure 12 - Selecting the heartbeat datastores

The question now arises: what, exactly, is this datastore heartbeating and which datastore is used for this heartbeating? Let’s answer which datastore is used for datastore heartbeating first, as we can simply show that with a screenshot, see below. vSphere displays extensive details around the “Cluster Status” on the Cluster’s Monitor tab. This for instance shows you which datastores are being used for heartbeating and which hosts are using which specific datastore(s). In addition, it displays how many virtual machines are protected and how many hosts are connected to the master.

In block based storage environments HA leverages an existing VMFS file system mechanism. The datastore heartbeat mechanism uses a so called “heartbeat region” which is updated as long as the file is open. On VMFS datastores, HA will simply check whether the heartbeat region has been updated. In order to update a datastore heartbeat region, a host needs to have at least one open file on the volume. HA ensures there is at least one file open on this volume by creating a file specifically for datastore heartbeating. In other words, a per-host file is created on the designated heartbeating datastores, as shown below. The naming scheme for this file is as follows: host-number-hb.

On NFS datastores, each host will write to its heartbeat file once every 5 seconds, ensuring that the master will be able to check host state. The master will simply validate this by checking that the time-stamp of the file changed.
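The NFS check described above amounts to comparing a file's modification time against the last value seen. The following is a minimal sketch of that idea using ordinary file system calls; it is not how the HA agent is implemented, and the function name `heartbeat_advanced` is hypothetical.

```python
import os


def heartbeat_advanced(path: str, last_seen_mtime_ns: int) -> tuple[bool, int]:
    """Return (changed, current_mtime_ns) for a per-host heartbeat file.

    `changed` is True when the file's time-stamp differs from the value
    recorded at the previous check, i.e. the host wrote a new heartbeat.
    """
    current = os.stat(path).st_mtime_ns
    return current != last_seen_mtime_ns, current
```

A caller would poll this at an interval longer than the 5-second write period and treat repeated unchanged results as missing heartbeats.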

Realize that in the case of a converged network environment, the effectiveness of datastore heartbeating will vary depending on the type of failure. For instance, a NIC failure could impact both network and datastore heartbeating. If, for whatever reason, the datastore or NFS share becomes unavailable or is removed from the cluster, HA will detect this and select a new datastore or NFS share to use for the heartbeating mechanism.

Basic design principle: Datastore heartbeating adds a new level of resiliency but is not the be-all end-all. In converged networking environments, the use of datastore heartbeating adds little value due to the fact that a NIC failure may result in both the network and storage becoming unavailable.

Isolated versus Partitioned

We’ve already briefly touched on it and it is time to have a closer look. When it comes to network failures there are two different states that exist. What are these exactly and when is a host Partitioned rather than Isolated? Before we explain this we want to point out that there is the state as reported by the master and the state as observed by an administrator, and the characteristics these have.

First, consider the administrator’s perspective. Two hosts are considered partitioned if they are operational but cannot reach each other over the management network. Further, a host is isolated if it does not observe any HA management traffic on the management network and it can’t ping the configured isolation addresses. It is possible for multiple hosts to be isolated at the same time. We call a set of hosts that are partitioned but can communicate with each other a “management network partition”. Network partitions involving more than two partitions are possible but not likely.

Now, consider the HA perspective. When HA agents are not in network contact with a master, they will elect a new master. So, when a network partition exists, a master election will occur so that a host failure or network isolation within this partition will result in appropriate action on the impacted virtual machine(s). The screenshot below shows possible ways in which an Isolation or a Partition can occur.

Figure 13 - Isolated versus Partitioned

If a cluster is partitioned in multiple segments, each partition will elect its own master, meaning that if you have 4 partitions your cluster will have 4 masters. When the network partition is corrected, one of the four masters will take over the role and be responsible for the cluster again. It should be noted that a master could claim responsibility for a virtual machine that lives in a different partition. If this occurs and the virtual machine happens to fail, the master will be notified through the datastore communication mechanism.

In the HA architecture, whether a host is partitioned is determined by the master reporting the condition. So, in the above example, the master on host ESXi-01 will report ESXi-03 and 04 partitioned while the master on host 04 will report 01 and 02 partitioned. When a partition occurs, vCenter reports the perspective of one master.

A master reports a host as partitioned or isolated when it can’t communicate with the host over the management network but can observe the host’s datastore heartbeats via the heartbeat datastores. The master cannot alone differentiate between these two states: a host is reported as isolated only if the host informs the master via the datastores that it is isolated.

This still leaves open the question of how the master differentiates between a Failed, Partitioned, or Isolated host.

When the master stops receiving network heartbeats from a slave, it will check for host “liveness” for the next 15 seconds. Before the host is declared failed, the master will validate if it has actually failed or not by doing additional liveness checks. First, the master will validate if the host is still heartbeating to the datastore. Second, the master will ping the management IP address of the host. If both are negative, the host will be declared Failed. This doesn’t necessarily mean the host has PSOD’ed; it could be the network is unavailable, including the storage network, which would make this host Isolated from an administrator’s perspective but Failed from an HA perspective. As you can imagine, however, there are various combinations possible. The following table depicts these combinations including the “state”.

State | Network Heartbeat | Storage Heartbeat | Host Liveness Ping | Isolation Criteria Met
Running | Yes | N/A | N/A | N/A
Isolated | No | Yes | No | Yes
Partitioned | No | Yes | No | No
Failed | No | No | No | N/A
FDM Agent Down | N/A | N/A | Yes | N/A
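The table can be read as a decision procedure. The sketch below is one way to encode it, under the assumption that the N/A cells mean "not consulted"; the evaluation order (network heartbeat first, then ping, then storage heartbeat) is our reading of the table rather than documented agent logic, and `classify_host` is a hypothetical name.

```python
def classify_host(network_hb: bool, storage_hb: bool,
                  ping_ok: bool, isolation_criteria_met: bool) -> str:
    """Map the master's observations of a host to a state from the table."""
    if network_hb:
        return "Running"            # remaining columns are N/A
    if ping_ok:
        return "FDM Agent Down"     # host answers pings but the agent is silent
    if storage_hb:
        # Host is alive but unreachable over the management network.
        return "Isolated" if isolation_criteria_met else "Partitioned"
    return "Failed"                 # no network heartbeat, no datastore heartbeat, no ping
```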

HA will trigger an action based on the state of the host. When the host is marked as Failed, a restart of the virtual machines will be initiated. When the host is marked as Isolated, the master might initiate the restarts.

The one thing to keep in mind when it comes to isolation response is that a virtual machine will only be shut down or powered off when the isolated host knows there is a master out there that has taken ownership for the virtual machine or when the isolated host loses access to the home datastore of the virtual machine.

For example, if a host is isolated and runs two virtual machines, stored on separate datastores, the host will validate if it can access each of the home datastores of those virtual machines. If it can, the host will validate whether a master owns these datastores. If no master owns the datastores, the isolation response will not be triggered and restarts will not be initiated. If the host does not have access to the datastore, for instance, during an “All Paths Down” condition, HA will trigger the isolation response to ensure the “original” virtual machine is powered down and will be safely restarted. This is to avoid so-called “split-brain” scenarios.
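The per-virtual-machine decision in the example above can be summarized in a few lines. This is a sketch of the rule as described in the text, with hypothetical names; it does not model how the host actually discovers datastore ownership.

```python
def isolation_response_triggered(datastore_accessible: bool,
                                 master_owns_datastore: bool) -> bool:
    """Decide, for one virtual machine on an isolated host, whether the
    configured isolation response is carried out."""
    if not datastore_accessible:
        # e.g. an "All Paths Down" condition: trigger to avoid split-brain.
        return True
    # Datastore reachable: only act when a master has taken ownership,
    # because otherwise nobody would restart the virtual machine.
    return master_owns_datastore
```

For the two-VM example, the function would be evaluated once per home datastore, so one virtual machine may be powered off while the other is left running.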

To reiterate, as this is a very important aspect of HA and how it handles network isolations, the remaining hosts in the cluster will only be requested to restart virtual machines when the master has detected that either the host has failed or has become isolated and the isolation response was triggered.

Virtual Machine Protection

Virtual machine protection happens on several layers but is ultimately the responsibility of vCenter. We have explained this briefly but want to expand on it a bit more to make sure everyone understands the dependency on vCenter when it comes to protecting virtual machines. We do want to stress that this only applies to protecting virtual machines; virtual machine restarts in no way require vCenter to be available at the time.

When the state of a virtual machine changes, vCenter will direct the master to enable or disable HA protection for that virtual machine. Protection, however, is only guaranteed when the master has committed the change of state to disk. The reason for this, of course, is that a failure of the master would result in the loss of any state changes that exist only in memory. As pointed out earlier, this state is distributed across the datastores and stored in the “protectedlist” file.

When the power state change of a virtual machine has been committed to disk, the master will inform vCenter Server so that the change in status is visible both for the user in vCenter and for other processes like monitoring tools.

To clarify the process, we have created a workflow diagram of the protection of a virtual machine from the point it is powered on through vCenter:

Figure 14 - Virtual Machine protection workflow

But what about “unprotection”? When a virtual machine is powered off, it must be removed from the protected list. We have documented this workflow in the following diagram for the situation where the power off is invoked from vCenter.

Figure 15 - Virtual Machine Unprotection workflow

Restarting Virtual Machines

In the previous chapter, we described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA’s primary task.

HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

Failed host
Isolated host
Failed guest operating system

Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenario and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.

Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the “regular” virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in case the automatic restart of an agent virtual machine fails.

Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.

Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

Agent virtual machines
FT secondary virtual machines
Virtual machines configured with a restart priority of high
Virtual machines configured with a medium restart priority
Virtual machines configured with a low restart priority

It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines are not running on the host at the time placement is done.

Now that we have briefly touched on it, we would also like to address “restart retries” and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.

Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5. Note that the initial restart is included.

HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The ‘m’ stands for “minutes” in this list.

T0 – Initial restart
T2m – Restart retry 1
T6m – Restart retry 2
T14m – Restart retry 3
T30m – Restart retry 4
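The listed times are consistent with a waiting period that starts at 2 minutes and doubles after every failed attempt (2, 4, 8 and 16 minutes). The sketch below reproduces the list under that assumption; the doubling rule is inferred from the five values above, so treat the output for any other restart count as a guess. The name `restart_schedule` is hypothetical.

```python
def restart_schedule(max_restart_count: int = 5,
                     first_delay_min: int = 2) -> list[int]:
    """Return the offsets, in minutes from T0, of each restart attempt.

    Assumes the wait between attempts doubles each time, which matches
    the documented defaults but is not confirmed for other values.
    """
    if max_restart_count < 1:
        raise ValueError("the initial restart counts as the first attempt")
    offsets = [0]
    delay = first_delay_min
    for _ in range(max_restart_count - 1):
        offsets.append(offsets[-1] + delay)
        delay *= 2
    return offsets
```

With the defaults this yields the offsets 0, 2, 6, 14 and 30 minutes, matching the list above.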

Figure 16 - High Availability restart timeline

As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.

What about VMs which are “disabled” for HA? What will happen with those VMs? Before vSphere 6.0 those VMs would be left alone; as of vSphere 6.0 these VMs will be registered on another host after a failure. This will allow you to easily power on that VM when needed without needing to manually re-register it yourself. Note, HA will not do a power-on of the VM, it will just register it for you!

Let’s give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)

Master: esxi01

The host “esxi02” is running a single virtual machine called “vm01” and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting “vm01” up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and “esxi03” becomes the new master. It will now initiate the restart of “vm01”, and if that restart fails it will retry it up to 4 times again, for a total of 5 including the initial restart.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.

When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let’s use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of success or failure of one of those attempts.

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempt for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as “completed”. In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.

The restart priority however does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
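To illustrate the interaction between restart priority and the limit of 32 concurrent power-on tasks, the sketch below computes which power-on attempts a single host would issue first. It only models the first batch as described in this section (highest priority tier present, capped at 32); what happens as individual attempts complete is not modeled, and the priority labels and function name are hypothetical.

```python
# Restart order as listed earlier in this chapter; the labels are illustrative.
PRIORITY_ORDER = ["agent", "ft-secondary", "high", "medium", "low"]
MAX_CONCURRENT_POWER_ONS = 32


def first_batch(vms: list[tuple[str, str]]) -> list[str]:
    """Given (name, priority) pairs, return the names whose power-on
    attempts are issued first on one host."""
    if not vms:
        return []
    rank = {priority: index for index, priority in enumerate(PRIORITY_ORDER)}
    top = min(rank[priority] for _, priority in vms)
    top_tier = [name for name, priority in vms if rank[priority] == top]
    return top_tier[:MAX_CONCURRENT_POWER_ONS]
```

With 33 virtual machines of equal priority the batch holds 32 names; with 32 low-priority virtual machines and one high-priority virtual machine it holds only the high-priority one.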

Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.

Failed host
    Failure of a master
    Failure of a slave
Isolated host and response

Failed Host

When discussing a failed host scenario we need to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won’t notice the time difference, it is important to call out. Let’s start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn’t hurt to understand the process and its associated timelines.

The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:

T0 – Slave failure.
T3s – Master begins monitoring datastore heartbeats for 15 seconds.
T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
T15s – If no heartbeat datastores are configured, the host will be declared dead.
T18s – If heartbeat datastores are configured, the host will be declared dead.

The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as “unreachable”. The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared “dead” at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
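The two end points follow from the intervals in the text: the 5-second ping that starts at T10s ends at T15s, and the 15-second datastore heartbeat window that starts at T3s ends at T18s. The sketch below builds the timeline from those intervals; deriving the end points this way is our reading of the numbers above, and the function name is hypothetical.

```python
def slave_failure_timeline(heartbeat_datastores: bool) -> list[tuple[int, str]]:
    """Return (seconds after T0, event) pairs for the failure of a slave."""
    events = [
        (0, "slave fails; network heartbeats stop"),
        (3, "master starts monitoring datastore heartbeats for 15 s"),
        (10, "host declared unreachable; master pings it for 5 s"),
    ]
    # With heartbeat datastores the master waits out the 15 s window (3 + 15);
    # without them it only waits for the ping to finish (10 + 5).
    declared_dead_at = 3 + 15 if heartbeat_datastores else 10 + 5
    events.append((declared_dead_at, "host declared dead; restarts initiated"))
    return events
```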

Figure 17 - Restart timeline slave failure

The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this. The on-disk state can be obtained by only one master at a time, since it requires opening the protectedlist file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine, as vCenter Server also provides the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine’s home datastore and has access to the protected list while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.

That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different. The reason being that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

T0 – Master failure.
T10s – Master election process initiated.
T25s – New master elected and reads the protected list.
T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

Slaves receive network heartbeats from their master. If the master fails, let’s define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
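The master failure timeline can be expressed as a sum of intervals. In the sketch below the election start (10 s) and election duration (15 s) come from the text; the 10-second gap between reading the protected list and initiating restarts is inferred from the T25s and T35s markers. All constant and function names are hypothetical.

```python
ELECTION_START_S = 10               # slaves initiate the election at T10s
ELECTION_DURATION_S = 15            # the election takes 15 s to complete
RESTART_DELAY_AFTER_ELECTION_S = 10 # inferred from T25s -> T35s


def master_failure_timeline() -> list[tuple[int, str]]:
    """Return (seconds after T0, event) pairs for the failure of a master."""
    elected_at = ELECTION_START_S + ELECTION_DURATION_S
    restarts_at = elected_at + RESTART_DELAY_AFTER_ELECTION_S
    return [
        (0, "master fails; slaves stop receiving network heartbeats"),
        (ELECTION_START_S, "master election initiated"),
        (elected_at, "new master elected; reads the protected list"),
        (restarts_at, "new master initiates restarts of protected VMs not running"),
    ]
```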

Figure 18 - Restart timeline master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.

Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the “Isolation Response”.

Isolation Response

The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: “Power off”, “Leave powered on” and “Shut down”. This isolation response answers the question, “what should a host do with the virtual machines it manages when it detects that it is isolated from the network?” Let’s discuss these three options more in-depth:

Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the “virtual” power cable of the virtual machine will be pulled out!
Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a “power off” will be executed. This time out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a “power off” will be initiated immediately.
Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
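The three options and the fallback rules for “Shut down” can be captured in a small decision function. This is a sketch of the behavior described above, not of the HA agent itself; the enum, the function name and the choice to express the 5-minute timeout as 300 seconds are assumptions made for illustration.

```python
from enum import Enum


class IsolationResponse(Enum):
    POWER_OFF = "Power off"
    SHUT_DOWN = "Shut down"
    LEAVE_POWERED_ON = "Leave powered on"


def planned_actions(response: IsolationResponse, tools_installed: bool,
                    shutdown_timeout_s: int = 300) -> list[str]:
    """List the steps taken for one virtual machine when isolation occurs."""
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return []  # state of the virtual machine remains unchanged
    if response is IsolationResponse.POWER_OFF or not tools_installed:
        # Hard stop; also the immediate fallback when VMware Tools is missing.
        return ["power off"]
    return [
        "guest shutdown via VMware Tools",
        f"power off if still running after {shutdown_timeout_s} s",
    ]
```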

This setting can be changed on the cluster settings under virtual machine options.

Figure 19 - Cluster default settings

The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.

Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was “Power off”.
With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to “Leave powered on”.
With vSphere 4.0 it was changed to “Shut down”.
With vSphere 5.0 it has been changed to “Leave powered on”.

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer’s requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that “Leave powered on” was the desired default value.

Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.

The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer “Leave powered on” because it eliminates the chances of having a false positive and its associated downtime. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate if virtual machine restarts can be attempted; there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation as the validation would fail due to the inaccessible datastore from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won’t likely correspond with the failure of the virtual machine networks, isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off/shutdown is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage. Leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.

| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
|---|---|---|---|
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shut down | Choose shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can’t |
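For those who prefer to capture such guidelines in a runbook or script, the table can be expressed as a simple lookup. This is only a sketch of the table above; the function and variable names are our own and are not part of any vSphere API.

```python
from typing import Dict, List, Tuple

# Keys: (host likely retains datastore access, VMs likely retain VM network access).
# Values: the isolation responses the guideline table suggests for that combination.
RECOMMENDED_ISOLATION_POLICY: Dict[Tuple[bool, bool], List[str]] = {
    (True, True): ["Leave Powered On"],
    (True, False): ["Leave Powered On", "Shut down"],
    (False, True): ["Power Off"],
    (False, False): ["Leave Powered On", "Power Off"],
}


def recommend_isolation_policy(datastore_likely: bool, vm_network_likely: bool) -> List[str]:
    """Return the candidate isolation responses for the given likelihoods."""
    return RECOMMENDED_ISOLATION_POLICY[(datastore_likely, vm_network_likely)]


# Storage is likely lost but the VM network likely survives:
print(recommend_isolation_policy(False, True))
```

Where the table lists two options, the rationale column still has to be weighed by a person; the lookup only narrows the choice.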

The question that we haven’t answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating if a master owns the home datastore of the virtual machine. When isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the “poweron” file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, it will create a per-virtual machine file under a “poweredoff” directory which indicates for the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.

This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to “isolation detection,” which will be described in the following section.

Isolation Detection

We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as earlier explained. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. This means that if a single ping is successful, or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered. That is exactly what you want, as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic it will declare itself no longer isolated.

Isolation of a Slave

HA triggers a master election process before it will declare a host is isolated. In the timeline below, “s” refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated
- T60s – Slave “triggers” isolation response

When the isolation response is triggered HA creates a “power-off” file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off the virtual machine (or shuts it down) and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.

Figure 20 - Isolation of a slave timeline

Isolation of a Master

In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, “s” refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings “isolation addresses”
- T5s – Master declares itself isolated
- T35s – Master “triggers” isolation response

Additional Checks

Before a host declares itself isolated, it will ping the default isolation address which is the gateway specified for the management network, and will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet and it is recommended to specify an additional isolation address which is part of the subnet.
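To make the effect of additional isolation addresses concrete, the check can be thought of as "isolated only if no isolation address answers". The sketch below is our own illustration and not HA's implementation; the `ping` callable and the example addresses (taken from a documentation range) are assumptions.

```python
from typing import Callable, Iterable


def host_is_isolated(isolation_addresses: Iterable[str], ping: Callable[[str], bool]) -> bool:
    """A single successful ping is enough to prevent the host from declaring isolation."""
    return not any(ping(address) for address in isolation_addresses)


# Hypothetical situation: the default gateway is unreachable,
# but the additional isolation address still answers.
reachable = {"192.0.2.254"}
addresses = ["192.0.2.1", "192.0.2.254"]
print(host_is_isolated(addresses, lambda address: address in reachable))
```

With only the default gateway configured, the same outage would have led the host to declare itself isolated, which is why an additional address reduces the chance of a false positive.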

Figure 21 - Isolation Address

Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of “hops” between the host and this address.

Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called “das.config.fdm.isolationPolicyDelaySec” and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.
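The clamping behavior and its effect on the isolation timelines can be sketched as follows. We assume here that the delay is the wait between the moment a host declares itself isolated and the moment the response is triggered, which matches the 30-second gap in both timelines (T30s to T60s for a slave, T5s to T35s for a master). The function names are ours.

```python
MIN_ISOLATION_POLICY_DELAY_S = 30  # minimum value stated in the text


def effective_isolation_policy_delay(configured_s: int) -> int:
    """Values below the minimum are treated as 30 seconds."""
    return max(configured_s, MIN_ISOLATION_POLICY_DELAY_S)


def isolation_response_time(declared_isolated_at_s: int, configured_delay_s: int = 30) -> int:
    """Seconds after T0 at which the isolation response is triggered (assumed model)."""
    return declared_isolated_at_s + effective_isolation_policy_delay(configured_delay_s)


print(isolation_response_time(30))      # slave, default delay
print(isolation_response_time(5))       # master, default delay
print(isolation_response_time(30, 10))  # a value below the minimum is clamped to 30
```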

Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.

We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let’s assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host’s “poweron” file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.

Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of “disabled” are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines

Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power-on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.

If a placement cannot be found, the master will place the virtual machine on a “pending placement list” and will retry placement of the virtual machine when one of the following conditions changes:

- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.)
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn’t DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.

Component Protection

In vSphere 6.0 a new feature as part of vSphere HA is introduced called VM Component Protection. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to and those are Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled by a single tick box as shown in the screenshot below.

Figure 22 - Virtual Machine Component Protection

As stated there are two scenarios HA can respond to, PDL and APD. Let’s look at those two scenarios a bit closer. With vSphere 5.0 a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs it is impossible to generate the PDL condition as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option “disk.terminateVMOnPDLDefault” manually. With vSphere 6.0 a simple option in the Web Client is introduced which allows you to specify what the response should be to a PDL sense code.

Figure 23 - Enabling Virtual Machine Component Protection

The two options provided are “Issue Events” and “Power off and restart VMs”. Note that “Power off and restart VMs” does exactly that: your VM process is killed and the VM is restarted on a host which still has access to the storage device.

Until now it was not possible for vSphere to respond to an APD scenario. APD is the situation where the storage device is inaccessible but for unknown reasons. In most cases where this occurs it is typically related to a storage network problem. With vSphere 5.1 changes were introduced to the way APD scenarios were handled by the hypervisor. This mechanism is leveraged by HA to allow for a response.

When an APD occurs a timer starts. After 140 seconds the APD is declared and the device is marked as APD timeout. When the 140 seconds have passed HA will start counting. The HA timeout is 3 minutes by default as shown in Figure 24. When the 3 minutes have passed HA will take the action defined. There are again two options: “Issue Events” and “Power off and restart VMs”.

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive / conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the total of approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the “Response for APD recovery after APD timeout” comes into play. If there is a desire to do so you can restart the VM even when the host has recovered from the APD scenario, during the 3 minute (default value) grace period.
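The 5 minutes and 20 seconds follow directly from the two timers described above: the 140-second APD timeout plus the default 3-minute HA delay. A small worked example, using our own constant and function names:

```python
APD_TIMEOUT_S = 140        # time after which the device is marked as APD timeout
VMCP_APD_DELAY_S = 3 * 60  # default HA delay that starts once the APD timeout is reached


def seconds_until_apd_response(apd_timeout_s: int = APD_TIMEOUT_S,
                               vmcp_delay_s: int = VMCP_APD_DELAY_S) -> int:
    """Total time between the start of the APD and HA acting on it."""
    return apd_timeout_s + vmcp_delay_s


total = seconds_until_apd_response()
minutes, seconds = divmod(total, 60)
print(f"{total} s = {minutes} min {seconds} s")  # 320 s = 5 min 20 s
```

If the HA delay is changed from its default, the total shifts by the same amount.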

Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on a PDL and APD scenario. We recommend setting both to “Power off and restart VMs” but leaving the “Response for APD recovery after APD timeout” disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM Affinity or Anti Affinity rules. Typically for people using “affinity” rules this was not an issue, but those using “anti-affinity” rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs “randomly”. With vSphere 5.5 this has changed! vSphere HA is now “anti-affinity” aware. In order to ensure anti-affinity rules are respected you can set an advanced setting or, as of vSphere 6.0, configure this in the vSphere Web Client.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to “true” and somehow there aren’t sufficient hosts available to respect these rules… then the rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.
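The impact of setting das.respectVmVmAntiAffinityRules to “true” can be illustrated with a toy placement function. This is a deliberately simplified model of our own (it picks the first acceptable host and ignores capacity and all other placement criteria); the host and VM names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_restart_host(vm: str,
                      candidate_hosts: List[str],
                      running: Dict[str, Set[str]],
                      anti_affinity_groups: List[Set[str]],
                      respect_rules: bool) -> Optional[str]:
    """Return a host for the VM, or None when no host satisfies the rules."""
    # Collect the VMs that must not share a host with `vm`.
    conflicts: Set[str] = set()
    for group in anti_affinity_groups:
        if vm in group:
            conflicts |= group - {vm}

    for host in candidate_hosts:
        violates = bool(running.get(host, set()) & conflicts)
        if respect_rules and violates:
            continue  # the rule is enforced, so this host is skipped
        return host
    return None  # no compliant host: the VM is not restarted


running = {"esxi02": {"dc01"}}  # the only surviving host already runs dc01
rules = [{"dc01", "dc02"}]      # dc01 and dc02 must be kept apart
print(pick_restart_host("dc02", ["esxi02"], running, rules, respect_rules=True))
print(pick_restart_host("dc02", ["esxi02"], running, rules, respect_rules=False))
```

With the rule respected, the first call finds no host and the VM stays down; with the setting at its default of “false”, the VM is placed next to its anti-affinity peer.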

With vSphere 6.0 support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”. When the advanced setting “das.respectVmHostSoftAffinityRules” is configured vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group then HA will restart the respective VM on that host. As this is a “should rule” HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule is available HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"

# Restarting Virtual Machines

In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA’s primary task.

HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

- Failed host
- Isolated host
- Failed guest operating system

Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenario and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.

Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the “regular” virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in the case the automatic restart of an agent virtual machine fails.

Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.

Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

- Agent virtual machines
- FT secondary virtual machines
- Virtual Machines configured with a restart priority of high
- Virtual Machines configured with a medium restart priority
- Virtual Machines configured with a low restart priority
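The order listed above can be sketched as a sort on a per-host restart list. The category labels and VM names below are our own shorthand, not identifiers used by HA, and the sketch only orders the list; it says nothing about whether each power-on succeeds.

```python
from typing import List, Tuple

# Restart order as listed above, highest precedence first.
RESTART_ORDER = ["agent", "ft-secondary", "high", "medium", "low"]


def order_for_restart(vms: List[Tuple[str, str]]) -> List[str]:
    """Sort (name, category) pairs by restart precedence; ties keep their input order."""
    rank = {category: position for position, category in enumerate(RESTART_ORDER)}
    return [name for name, category in sorted(vms, key=lambda vm: rank[vm[1]])]


restart_list = [
    ("web01", "low"),
    ("vsa01", "agent"),
    ("db01", "high"),
    ("ft-db02", "ft-secondary"),
    ("app01", "medium"),
]
print(order_for_restart(restart_list))
```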

It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines are not running on the host at the time placement is done.

Now that we have briefly touched on it, we would also like to address “restart retries” and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.

Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5. Note that the initial restart is included.

HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The ‘m’ stands for “minutes” in this list.

- T0 – Initial Restart
- T2m – Restart retry 1
- T6m – Restart retry 2
- T14m – Restart retry 3
- T30m – Restart retry 4
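The offsets in this list are consistent with a waiting period that doubles after every failed attempt (2, 4, 8 and 16 minutes). The sketch below reproduces the list under that assumption. Only the five default attempts are confirmed by the list above; what happens for larger values of das.maxvmrestartcount is an extrapolation on our part.

```python
from typing import List


def restart_attempt_offsets_minutes(max_restart_count: int = 5) -> List[int]:
    """Minutes after T0 at which each attempt starts, assuming the wait doubles each time."""
    offsets = []
    elapsed = 0
    delay = 2  # first waiting period, in minutes
    for _ in range(max_restart_count):
        offsets.append(elapsed)
        elapsed += delay
        delay *= 2
    return offsets


print(restart_attempt_offsets_minutes())  # [0, 2, 6, 14, 30]
```

As the text notes, these are nominal values: each wait only starts once the previous attempt has been detected as failed.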

Figure 24 - High Availability restart timeline

As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not an exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.

Let’s give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 Host (esxi01, esxi02, esxi03, esxi04)

Master: esxi01

The host “esxi02” is running a single virtual machine called “vm01” and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting “vm01” up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and “esxi03” becomes the new master. It will now initiate the restart of “vm01”, and if that restart fails it will retry up to 4 more times, for a total of 5 including the initial restart.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.

When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let’s use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power-on attempts would be initiated. The 33rd power-on attempt will only be initiated when one of those 32 attempts has completed, regardless of success or failure of one of those attempts.
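The 33-virtual-machine example can be sketched as a split between the attempts that are issued immediately and those that have to wait for a free slot. The function name and VM names are illustrative only.

```python
from typing import List, Tuple

MAX_CONCURRENT_POWER_ONS = 32  # per-host limit described in the text


def initial_power_on_wave(vm_names: List[str],
                          limit: int = MAX_CONCURRENT_POWER_ONS) -> Tuple[List[str], List[str]]:
    """Return (attempts issued right away, attempts waiting for a slot to free up)."""
    return vm_names[:limit], vm_names[limit:]


vms = [f"vm{i:02d}" for i in range(1, 34)]  # 33 VMs with the same restart priority
issued, waiting = initial_power_on_wave(vms)
print(len(issued), waiting)  # 32 ['vm33']
```

The waiting attempt is released as soon as any of the 32 outstanding attempts completes, whether it succeeded or failed.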

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power-on attempt for the low-priority virtual machines will not be issued until the power-on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power-on attempt to be reported as “completed”. In theory, this means that if the power-on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.

The restart priority however does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.

Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.

- Failed host
  - Failure of a master
  - Failure of a slave
- Isolated host and response

Failed Host

When discussing a failed host scenario it is necessary to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won’t notice the time difference, it is important to call out. Let’s start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn’t hurt to understand the process and its associated timelines.

The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:

- T0 – Slave failure.
- T3s – Master begins monitoring datastore heartbeats for 15 seconds.
- T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
- T15s – If no heartbeat datastores are configured, the host will be declared dead.
- T18s – If heartbeat datastores are configured, the host will be declared dead.

The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as “unreachable”. The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared “dead” at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
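As a compact summary of the two variants, the timeline can be written down as data. The sketch simply encodes the values given above: datastore heartbeat monitoring starts at T3s and lasts 15 seconds (ending at T18s), while the ping starts at T10s and lasts 5 seconds (ending at T15s). The function name and event descriptions are our own.

```python
from typing import List, Tuple


def slave_failure_timeline(heartbeat_datastores_configured: bool) -> List[Tuple[int, str]]:
    """Return (seconds after T0, event) pairs for the failure of a slave."""
    events = [
        (0, "slave failure"),
        (3, "master starts monitoring datastore heartbeats (for 15 s)"),
        (10, "host declared unreachable; master pings its management network (for 5 s)"),
    ]
    # With heartbeat datastores the master waits for the monitoring window to end (3 + 15);
    # without them it only waits for the ping to end (10 + 5).
    declared_dead_at = 3 + 15 if heartbeat_datastores_configured else 10 + 5
    events.append((declared_dead_at, "host declared dead; restarts initiated"))
    return events


for configured in (False, True):
    print(configured, slave_failure_timeline(configured)[-1])
```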

Figure 25 - Restart timeline slave failure

The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this. The on-disk state could be obtained by only one master at a time, since it required opening the protected list file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine as vCenter Server also provided the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine’s home datastore and has access to the protected list while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.

That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different. The reason is that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

- T0 – Master failure.
- T10s – Master election process initiated.
- T25s – New master elected and reads the protected list.
- T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

Slaves receive network heartbeats from their master. If the master fails (let’s define this as T0, T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.

Figure 26 - Restart timeline master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.

Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the “Isolation Response”.

Isolation Response

The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: “Power off”, “Leave powered on” and “Shut down”. This isolation response answers the question, “what should a host do with the virtual machines it manages when it detects that it is isolated from the network?” Let’s discuss these three options more in-depth:

- Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the “virtual” power cable of the virtual machine will be pulled out!
- Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a “power off” will be executed. This timeout value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a “power off” will be initiated immediately.
- Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
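The fallback logic of the “Shut down” response can be sketched as a small decision function. This models only the behavior described in the list above; the function name, parameters and return strings are our own, and the 300-second default stands for the 5 minutes controlled by das.isolationShutdownTimeout.

```python
from typing import Optional


def shutdown_response_outcome(tools_installed: bool,
                              guest_shutdown_duration_s: Optional[float],
                              timeout_s: int = 300) -> str:
    """Describe how a VM ends up being stopped under the "Shut down" isolation response.

    guest_shutdown_duration_s is how long the guest needs to shut down,
    or None if the guest never completes the shutdown.
    """
    if not tools_installed:
        return "power off immediately"
    if guest_shutdown_duration_s is not None and guest_shutdown_duration_s <= timeout_s:
        return "guest shutdown"
    return "power off after timeout"


print(shutdown_response_outcome(True, 45))     # clean guest shutdown
print(shutdown_response_outcome(True, None))   # guest hangs, falls back to power off
print(shutdown_response_outcome(False, None))  # no VMware Tools
```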

This setting can be changed on the cluster settings under virtual machine options.

Figure 27 - Cluster default settings

The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.

- Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was “Power off”.
- With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to “Leave powered on”.
- With vSphere 4.0 it was changed to “Shut down”.
- With vSphere 5.0 it has been changed to “Leave powered on”.

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer’s requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that “Leave powered on” was the desired default value.

Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.

The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer “Leave powered on” because it eliminates the chances of having a false positive and its associated downtime. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down. This resulted in the power off (or shutdown) of every single virtual machine, with none being restarted. This problem has been mitigated. HA will validate whether virtual machine restarts can be attempted, as there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation, as the validation would fail because the datastore is inaccessible from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won’t likely correspond with the failure of the virtual machine networks, isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off/shutdown is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage. Leaving the virtual machine powered on in that case could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.


| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
|---|---|---|---|
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can’t |
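The guidance in the table can also be read as a simple lookup keyed on the two likelihoods. The sketch below is illustrative only: the dictionary and function names are made up for this example and nothing in it calls a vSphere API.

```python
# Illustrative lookup of the isolation policy guidance table.
# All names here are hypothetical; this is not part of any vSphere API.

# Key: (host likely retains datastore access, VMs likely retain VM network access)
ISOLATION_POLICY_GUIDANCE = {
    (True, True): "Leave Powered On",
    (True, False): "Leave Powered On or Shutdown",
    (False, True): "Power Off",
    (False, False): "Leave Powered On or Power Off",
}


def recommended_isolation_policy(datastore_likely: bool, vm_network_likely: bool) -> str:
    """Return the table's recommendation for the given pair of likelihoods."""
    return ISOLATION_POLICY_GUIDANCE[(datastore_likely, vm_network_likely)]


if __name__ == "__main__":
    # Storage is likely lost but the VM network likely survives.
    print(recommended_isolation_policy(False, True))
```

Two of the four rows still leave a choice to the administrator, which is why the lookup returns the recommendation text rather than a single setting.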

The question that we haven’t answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating whether a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the “poweron” file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, the isolated host will create a per-virtual machine file under a “poweredoff” directory which indicates to the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.

This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to “isolation detection,” which will be described in the following section.

Isolation Detection


We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as earlier explained. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic it will declare itself no longer isolated.

Isolation of a Slave

HA triggers a master election process before it will declare a host is isolated. In the below timeline, “s” refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated
- T60s – Slave “triggers” isolation response

When the isolation response is triggered HA creates a “power-off” file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off the virtual machine (or shuts down) and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.


Figure 28 - Isolation of a slave timeline

Isolation of a Master

In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, “s” refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings “isolation addresses”
- T5s – Master declares itself isolated
- T35s – Master “triggers” isolation response
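To compare the two timelines it can help to write them down as data. The following sketch only restates the offsets listed for the slave and master cases; the constant and function names are assumptions made for this illustration and do not correspond to anything inside the HA agent.

```python
# Offsets in seconds after the isolation event, copied from the two timelines.
# Names are hypothetical; this only models the documented sequence.
SLAVE_TIMELINE = [
    (0, "Isolation of the host (slave)"),
    (10, "Slave enters election state"),
    (25, "Slave elects itself as master"),
    (25, "Slave pings isolation addresses"),
    (30, "Slave declares itself isolated"),
    (60, "Slave triggers isolation response"),
]

MASTER_TIMELINE = [
    (0, "Isolation of the host (master)"),
    (0, "Master pings isolation addresses"),
    (5, "Master declares itself isolated"),
    (35, "Master triggers isolation response"),
]


def seconds_between(timeline, start_fragment, end_fragment):
    """Seconds between the first events whose descriptions contain the fragments."""
    start = next(t for t, event in timeline if start_fragment in event)
    end = next(t for t, event in timeline if end_fragment in event)
    return end - start


if __name__ == "__main__":
    for name, timeline in (("slave", SLAVE_TIMELINE), ("master", MASTER_TIMELINE)):
        delay = seconds_between(timeline, "declares itself isolated", "triggers isolation response")
        print(name, "response fires", delay, "s after the host declares itself isolated")
```

In both cases the isolation response fires 30 seconds after the host declares itself isolated; the slave simply reaches that declaration 25 seconds later because it first goes through the election.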

Additional Checks

Before a host declares itself isolated, it will ping the default isolation address which is the gateway specified for the management network, and will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet and it is recommended to specify an additional isolation address which is part of the subnet.


Figure 29 - Isolation Address

Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP-address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of “hops” between the host and this address.

Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response an advanced setting is available. This setting is called “das.config.fdm.isolationPolicyDelaySec” and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.
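The minimum described above behaves like a simple lower bound on the configured value. A minimal sketch, assuming the value is an integer number of seconds; the function name is hypothetical and only mirrors the rule stated in the text:

```python
# Lower bound stated in the text for das.config.fdm.isolationPolicyDelaySec.
MIN_ISOLATION_POLICY_DELAY_SEC = 30


def effective_isolation_policy_delay(configured_sec: int) -> int:
    """Return the delay that would apply: values below 30 are treated as 30."""
    return max(configured_sec, MIN_ISOLATION_POLICY_DELAY_SEC)


if __name__ == "__main__":
    for value in (10, 30, 45):
        print(value, "->", effective_isolation_policy_delay(value))
```

The setting can therefore only lengthen the wait before the isolation response; it cannot shorten it below the default.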

Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.


We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let’s assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on the host using the information it previously read from the host’s “poweron” file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.

Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of “disabled” are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines

Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.
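The effect of the limit of 32 concurrent power-on attempts can be approximated by splitting a host's restart list into waves. This is a simplification for illustration: the real agent may well start the next power-on as soon as an earlier one completes, and the function and constant names below are hypothetical.

```python
from typing import List

# Limit on concurrent power-on attempts stated in the text.
MAX_CONCURRENT_POWER_ONS = 32


def power_on_waves(vms: List[str], limit: int = MAX_CONCURRENT_POWER_ONS) -> List[List[str]]:
    """Split a restart list into consecutive groups of at most `limit` virtual machines."""
    return [vms[i:i + limit] for i in range(0, len(vms), limit)]


if __name__ == "__main__":
    restart_list = [f"vm-{n:03d}" for n in range(70)]  # hypothetical VM names
    for number, wave in enumerate(power_on_waves(restart_list), start=1):
        print(f"wave {number}: {len(wave)} virtual machines")
```

With 70 virtual machines assigned to one host, this model yields two groups of 32 followed by one group of 6.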

If a placement cannot be found, the master will place the virtual machine on a “pending placement list” and will retry placement of the virtual machine when one of the following conditions changes:


- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.).
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn’t DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.

Component Protection

In vSphere 6.0 a new feature as part of vSphere HA is introduced called VM Component Protection. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to and those are Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled by a single tickbox as shown in the screenshot below.

Figure 30 - Virtual Machine Component Protection


As stated there are two scenarios HA can respond to, PDL and APD. Let’s look at those two scenarios a bit closer. With vSphere 5.0 a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs it is impossible to generate the PDL condition as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option “disk.terminateVMOnPDLDefault” manually. With vSphere 6.0 a simple option in the Web Client is introduced which allows you to specify what the response should be to a PDL sense code.

Figure 31 - Enabling Virtual Machine Component Protection

The two options provided are “Issue Events” and “Power off and restart VMs”. Note that “Power off and restart VMs” does exactly that: your VM process is killed and the VM is restarted on a host which still has access to the storage device.

Until now it was not possible for vSphere to respond to an APD scenario. APD is the situation where the storage device is inaccessible for unknown reasons. In most cases where this occurs it is related to a storage network problem. With vSphere 5.1 changes were introduced to the way APD scenarios were handled by the hypervisor. This mechanism is leveraged by HA to allow for a response.

When an APD occurs a timer starts. After 140 seconds the APD is declared and the device is marked as APD timeout. When the 140 seconds have passed HA will start counting. The HA timeout is 3 minutes by default as shown in Figure 24. When the 3 minutes have passed HA will take the action defined. There are again two options: “Issue Events” and “Power off and restart VMs”.

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive / conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the total of approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the “Response for APD recovery after APD timeout” comes into play. If there is a desire to do so you can restart the VM even when the host has recovered from the APD scenario, during the 3 minute (default value) grace period.
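The figure of roughly 5 minutes and 20 seconds follows directly from adding the APD timeout to the default HA delay. A small sketch of that arithmetic; the constant and function names are made up for this example:

```python
# Values taken from the text; names are hypothetical.
APD_TIMEOUT_SEC = 140        # time until the device is marked as APD timeout
HA_APD_DELAY_SEC = 3 * 60    # default HA delay that starts after the APD timeout


def seconds_until_apd_response(apd_timeout: int = APD_TIMEOUT_SEC,
                               ha_delay: int = HA_APD_DELAY_SEC) -> int:
    """Total time from the start of the APD until HA takes the configured action."""
    return apd_timeout + ha_delay


def format_min_sec(total_seconds: int) -> str:
    """Render a number of seconds as minutes and seconds."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds} s"


if __name__ == "__main__":
    print(format_min_sec(seconds_until_apd_response()))
```

Changing the HA delay in the cluster settings shifts the total by the same amount, since the 140 seconds of the APD timeout always come first.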

Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on a PDL and APD scenario. We recommend setting both to “Power off and restart VMs” but leaving the “Response for APD recovery after APD timeout” disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM Affinity or Anti Affinity rules. Typically for people using “affinity” rules this was not an issue, but those using “anti-affinity” rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs “randomly”. With vSphere 5.5 this has changed! vSphere HA is now “anti affinity” aware. In order to ensure anti-affinity rules are respected you can set an advanced setting or, as of vSphere 6.0, configure this in the vSphere Web Client.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to “true” and somehow there aren’t sufficient hosts available to respect these rules… then the rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.


With vSphere 6.0 support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”. When the advanced setting “das.respectVmHostSoftAffinityRules” is configured vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group then HA will restart the respective VM on that host. As this is a “should rule” HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule is available HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"



Virtual SAN and Virtual Volumes specifics

In the last couple of sections we have discussed the ins and outs of HA, all of it based on VMFS based or NFS based storage. With the introduction of Virtual SAN and Virtual Volumes also come changes to some of the discussed concepts.

HA and Virtual SAN

Virtual SAN is VMware’s approach to Software Defined Storage. We are not going to explain the ins and outs of Virtual SAN, but do want to provide a basic understanding for those who have never done anything with it. Virtual SAN leverages host local storage and creates a shared datastore out of it.

Figure 32 - Virtual SAN Cluster


Virtual SAN requires a minimum of 3 hosts and each of those 3 hosts will need to have 1 SSD for caching and 1 capacity device (can be SSD or HDD). Only the capacity devices will contribute to the available capacity of the datastore. If you have 1 TB worth of capacity devices per host then with three hosts the total size of your datastore will be 3 TB.

Having said that, with Virtual SAN 6.1 VMware introduced a "2-node" option. This 2-node option is actually 2 regular VSAN nodes with a third "witness" node.

The big differentiator between most storage systems and Virtual SAN is that availability of the virtual machines is defined on a per virtual disk or per virtual machine basis. This is called “Failures To Tolerate” and can be configured to any value between 0 (zero) and 3. When configured to 0 the virtual machine will have only 1 copy of its virtual disks, which means that if the host where the virtual disks are stored fails, the virtual machine is lost. As such all virtual machines are deployed by default with Failures To Tolerate (FTT) set to 1. A virtual disk is what VSAN refers to as an object. An object, when FTT is configured as 1 or higher, has multiple components. In the diagram below we demonstrate the FTT=1 scenario, and the virtual disk in this case has 2 "data components" and a "witness component". The witness is used as a "quorum" mechanism.

Figure 33 - Virtual SAN Object model
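The capacity example and the relationship between FTT and the number of data copies can be sketched as follows. The text only states the FTT=0 (one copy) and FTT=1 (two data components) cases; extending this to "FTT + 1 copies" for the values 2 and 3 is an assumption based on mirroring, and the helper names are hypothetical. Witness components and metadata overhead are ignored.

```python
def raw_datastore_capacity_tb(hosts: int, capacity_per_host_tb: float) -> float:
    """Total datastore size: only capacity devices count, summed over all hosts."""
    return hosts * capacity_per_host_tb


def data_copies(ftt: int) -> int:
    """Copies of a virtual disk for a Failures To Tolerate value (assumes mirroring)."""
    if not 0 <= ftt <= 3:
        raise ValueError("Failures To Tolerate must be between 0 and 3")
    return ftt + 1


def consumed_capacity_gb(disk_gb: float, ftt: int) -> float:
    """Raw capacity consumed by one virtual disk, ignoring witness and metadata overhead."""
    return disk_gb * data_copies(ftt)


if __name__ == "__main__":
    print(raw_datastore_capacity_tb(3, 1.0), "TB raw capacity")
    print(data_copies(1), "data copies at FTT=1")
    print(consumed_capacity_gb(100, 1), "GB consumed by a 100 GB disk at FTT=1")
```

Under these assumptions a 100 GB virtual disk with the default FTT=1 consumes about 200 GB of the 3 TB datastore.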


As the diagram above depicts, a virtual machine can be running on the first host while its storage components are on the remaining hosts in the cluster. As you can imagine, from an HA point of view this changes things, as access to the network is not only critical for HA to function correctly but also for Virtual SAN. When it comes to networking, note that when Virtual SAN is configured in a cluster HA will use the same network for its communications (heartbeating etc). On top of that, it is good to know that VMware highly recommends 10GbE to be used for Virtual SAN.

Basic design principle: 10GbE is highly recommended for Virtual SAN. As vSphere HA also leverages the Virtual SAN network and availability of VMs is dependent on network connectivity, ensure that at a minimum two 10GbE ports and two physical switches are used for resiliency.

The reason that HA uses the same network as Virtual SAN is simple: it is to avoid network partition scenarios where HA communications are separated from Virtual SAN and the state of the cluster is unclear. Note that you will need to ensure that there is a pingable isolation address on the Virtual SAN network and this isolation address will need to be configured as such through the use of the advanced setting “das.isolationAddress0”. We also recommend disabling the use of the default isolation address through the advanced setting “das.useDefaultIsolationAddress” (set to false).
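As an illustration, the two advanced settings named above can be captured as key/value pairs and sanity-checked before they are entered in the cluster's advanced options. The IP address below is a placeholder from the documentation range, the check function is hypothetical, and the sketch assumes isolation addresses are given as IP literals; it does not talk to vCenter.

```python
import ipaddress

# Placeholder address; replace with a reliable pingable device on the Virtual SAN network.
VSAN_ISOLATION_ADDRESS = "192.0.2.1"

ha_advanced_options = {
    "das.isolationAddress0": VSAN_ISOLATION_ADDRESS,
    "das.useDefaultIsolationAddress": "false",
}


def check_isolation_options(options: dict) -> list:
    """Return a list of problems found in the isolation address options."""
    problems = []
    addresses = {k: v for k, v in options.items() if k.startswith("das.isolationAddress")}
    for key, value in addresses.items():
        try:
            ipaddress.ip_address(value)
        except ValueError:
            problems.append(f"{key} is not a valid IP address: {value}")
    # Disabling the default address without an alternative leaves nothing to ping.
    if options.get("das.useDefaultIsolationAddress") == "false" and not addresses:
        problems.append("default isolation address disabled but no das.isolationAddressN set")
    return problems


if __name__ == "__main__":
    print(check_isolation_options(ha_advanced_options) or "options look consistent")
```

The second check reflects the reasoning in the text: once the default isolation address is disabled, at least one explicit address on the Virtual SAN network has to be configured.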

When an isolation does occur the isolation response is triggered as explained in earlier chapters. For Virtual SAN the recommendation is simple: configure the isolation response to “Power Off, then fail over”. This is the safest option. Virtual SAN can be compared to the “converged network with IP based storage” example we provided. It is very easy to reach a situation where a host is isolated and all virtual machines remain running but are also restarted on another host because the connection to the Virtual SAN datastore is lost.

Basic design principle: Configure your Isolation Address and your Isolation Policy accordingly. We recommend selecting “power off” as the Isolation Policy and a reliable pingable device as the isolation address.

What about things like heartbeat datastores and the folder structure that exists on a VMFS datastore, has any of that changed with Virtual SAN? Yes it has. First of all, in a “Virtual SAN” only environment the concept of Heartbeat Datastores is not used at all. The reason for this is straightforward: as HA and Virtual SAN share the same network it is safe to assume that when the HA heartbeat is lost because of a network failure, so is access to the Virtual SAN datastore. Only in an environment where there is also traditional storage will heartbeat datastores be configured, leveraging those traditional datastores as heartbeat datastores. Note that we do not feel there is a reason to introduce traditional storage just to provide HA this functionality; HA and Virtual SAN work perfectly fine without heartbeat datastores.


Normally HA metadata is stored in the root of the datastore; for Virtual SAN this is different as the metadata is stored in the VM’s namespace object. The protected list is held in memory and updated automatically when VMs are powered on or off.

Now you may wonder, what happens when there is an isolation? How does HA know where to start the VM that is impacted? Let’s take a look at a partition scenario.

Figure 34 - VSAN Partition scenario

In this scenario a network problem has caused a cluster partition. Where a VM is restarted is determined by which partition owns the virtual machine files. Within a VSAN cluster this is fairly straightforward. There are two partitions, one of which is running the VM with its VMDK and the other partition has a VMDK replica and a witness. Guess what happens? Right, VSAN uses the witness to see which partition has quorum and based on that result, one of the two partitions will win. In this case, Partition 2 has more than 50% of the components of this object and as such is the winner. This means that the VM will be restarted on either “esxi-03” or “esxi-04” by vSphere HA. Note that the VM in Partition 1 will be powered off only if you have configured the isolation response to do so. We would like to stress that this is highly recommended! (Isolation response -> power off)
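The "more than 50% of the components" rule from the partition scenario can be sketched as a simple majority count. This is a simplified model: it treats every component as one equal vote, the host names esxi-01 and esxi-02 for Partition 1 and the exact placement of the replica and witness are assumptions made for the example, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def partition_with_quorum(component_hosts: List[str],
                          partitions: Dict[str, List[str]]) -> Optional[str]:
    """Return the partition holding more than 50% of an object's components, if any."""
    total = len(component_hosts)
    for name, hosts in partitions.items():
        owned = sum(1 for host in component_hosts if host in hosts)
        if owned * 2 > total:
            return name
    return None


if __name__ == "__main__":
    # One data component next to the running VM, replica and witness on the other side.
    components = ["esxi-01", "esxi-03", "esxi-04"]
    partitions = {
        "Partition 1": ["esxi-01", "esxi-02"],
        "Partition 2": ["esxi-03", "esxi-04"],
    }
    print(partition_with_quorum(components, partitions))
```

With two of the three components, Partition 2 holds the majority, which matches the outcome described above: the VM is restarted on one of the hosts in that partition.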

HA and Virtual Volumes

Let us start by describing what Virtual Volumes is and what value it brings for an administrator. Virtual Volumes was developed to make your life (vSphere admin) and that of the storage administrator easier. This is done by providing a framework that enables the vSphere administrator to assign policies to virtual machines or virtual disks. In these policies capabilities of the storage array can be defined. These capabilities can be things like snapshotting, deduplication, raid-level, thin / thick provisioning etc. What is offered to the vSphere administrator is up to the storage administrator, and of course up to what the storage system can offer to begin with. When a virtual machine is deployed and a policy is assigned then the storage system will enable certain functionality of the array based on what was specified in the policy. So there is no longer a need to assign capabilities to a LUN which holds many VMs; instead you get per VM or even per VMDK level control. So how does this work? Well, let’s take a look at an architectural diagram first.


Figure 35 - Virtual Volumes Architecture

The diagram shows a couple of components which are important in the VVol architecture. Let’s list them out:

- Protocol Endpoints aka PE
- Virtual Datastore and a Storage Container
- Vendor Provider / VASA
- Policies
- Virtual Volumes

Let’s take a look at these in the above order. Protocol Endpoints, what are they?

Protocol Endpoints are literally the access point to your storage system. All IO to virtual volumes is proxied through a Protocol Endpoint and you can have 1 or more of these per storage system, if your storage system supports having multiple of course. (Implementations of different vendors will vary.) PEs are compatible with different protocols (FC, FCoE, iSCSI, NFS) and if you ask me that whole discussion with Virtual Volumes will come to an end. You could see a Protocol Endpoint as a “mount point” or a device, and yes they will count towards your maximum number of devices per host (256). (Virtual Volumes itself won’t count towards that!)

Next up is the Storage Container. This is the place where you store your virtual machines, or better said where your virtual volumes end up. The Storage Container is a storage system logical construct and is represented within vSphere as a “virtual datastore”. You need at least 1 per storage system, but you can have many when desired. To this Storage Container you can apply capabilities. So if you would like your virtual volumes to be able to use array based snapshots then the storage administrator will need to assign that capability to the storage container. Note that a storage administrator can grow a storage container without even informing you. A storage container isn’t formatted with VMFS or anything like that, so you don’t need to increase the volume in order to use the space.

But how does vSphere know which container is capable of doing what? In order to discover a storage container and its capabilities we need to be able to talk to the storage system first. This is done through the vSphere APIs for Storage Awareness. You simply point vSphere to the Vendor Provider and the vendor provider will report to vSphere what’s available; this includes both the storage containers as well as the capabilities they possess. Note that a single Vendor Provider can be managing multiple storage systems which in turn can have multiple storage containers with many capabilities. These vendor providers can also come in different flavours: for some storage systems it is part of their software but for others it will come as a virtual appliance that sits on top of vSphere.

Now that vSphere knows which systems there are and what containers are available with which capabilities, you can start creating policies. These policies can be a combination of capabilities and will ultimately be assigned to virtual machines or even virtual disks. You can imagine that in some cases you would like Quality of Service enabled to ensure performance for a VM while in other cases it isn’t as relevant but you need to have a snapshot every hour. All of this is enabled through these policies. No longer will you be maintaining that spreadsheet with all your LUNs and which data services were enabled and what not, no you simply assign a policy. (Yes, a proper naming scheme will be helpful when defining policies.) When requirements change for a VM you don’t move the VM around, no you change the policy and the storage system will do what is required in order to make the VM (and its disks) compliant again with the policy. Not the VM really, but the Virtual Volumes.

Okay, those are the basics, now what about Virtual Volumes and vSphere HA? What changes when you are running Virtual Volumes, and what do you need to keep in mind when running Virtual Volumes when it comes to HA?

First of all, let me mention this: in some cases storage vendors have designed a solution where the "vendor provider" isn't designed in an HA fashion (VMware allows for Active/Active, Active/Standby or just "Active" as in a single instance). Make sure to validate what kind of implementation your storage vendor has, as the Vendor Provider needs to be available when powering on VMs. The following quote explains why:

When a Virtual Volume is created, it is not immediately accessible for IO. To Access Virtual Volumes, vSphere needs to issue a “Bind” operation to a VASA Provider (VP), which creates IO access point for a Virtual Volume on a Protocol Endpoint (PE) chosen by a VP. A single PE can be the IO access point for multiple Virtual Volumes. “Unbind” Operation will remove this IO access point for a given Virtual Volume.

That is the "Virtual Volumes" implementation aspect, but of course things have also changed from a vSphere HA point of view. No longer do we have VMFS or NFS datastores to store files on or use for heartbeating. What changes from that perspective? First of all a VM is carved up in different Virtual Volumes:

- VM Configuration
- Virtual Machine Disks
- Swap File
- Snapshot (if there are any)

Besides these different types of objects, when vSphere HA is enabled there also is a volume used by vSphere HA and this volume will contain all the metadata which is normally stored under "/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/" on regular VMFS. For each Fault Domain a separate folder will be created in this VVol.

All VM related HA files which normally would be under the VM folder, like for instance the power-off file, are now stored in the VM Configuration VVol object. Conceptually speaking this is similar to regular VMFS; implementation wise, however, it is completely different.

Another thing that changes with VVols is Heartbeat Datastores.

BEING WORKED ON - EARLY DRAFT


Adding Resiliency to HA (Network Redundancy)

In the previous chapter we extensively covered both Isolation Detection, which triggers the selected Isolation Response, and the impact of a false positive. The Isolation Response enables HA to restart virtual machines when “Power off” or “Shut down” has been selected and the host becomes isolated from the network. However, this also means that it is possible that, without proper redundancy, the Isolation Response may be unnecessarily triggered. This leads to downtime and should be prevented.

To increase resiliency for networking, VMware implemented the concept of NIC teaming in the hypervisor for both VMkernel and virtual machine networking. When discussing HA, this is especially important for the Management Network.

NIC teaming is the process of grouping together several physical NICs into one single logical NIC, which can be used for network fault tolerance and load balancing.

Using this mechanism, it is possible to add redundancy to the Management Network to decrease the chances of an isolation event. This is, of course, also possible for other “Portgroups” but that is not the topic of this chapter or book. Another option is configuring an additional Management Network by enabling the “management network” tick box on another VMkernel port. A little understood fact is that if there are multiple VMkernel networks on the same subnet, HA will use all of them for management traffic, even if only one is specified for management traffic!

Although there are many configurations possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry accepted best practice.

Requirements:

- 2 physical NICs
- VLAN trunking

Recommended:

- 2 physical switches
- If available, enable “link state tracking” to ensure link failures are reported

The vSwitch should be configured as follows:

- vSwitch0: 2 Physical NICs (vmnic0 and vmnic1).
- 2 Portgroups (Management Network and vMotion VMkernel).
- Management Network active on vmnic0 and standby on vmnic1.
- vMotion VMkernel active on vmnic1 and standby on vmnic0.
- Failback set to No.

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to “No” to avoid chances of an unwanted isolation event, which can occur when a physical switch routes no traffic during boot but the ports are reported as “up”. (NIC Teaming Tab)
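The effect of the failback setting can be illustrated with a small simulation. This is only a sketch of the behavior described above, not how ESXi implements teaming; the `Portgroup` class and its `set_link` method are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Portgroup:
    """Toy model of an active/standby portgroup (not a vSphere API)."""
    name: str
    active: str
    standby: str
    failback: bool = False
    link_up: dict = field(init=False)
    current: str = field(init=False)

    def __post_init__(self):
        self.link_up = {self.active: True, self.standby: True}
        self.current = self.active

    def set_link(self, nic: str, up: bool) -> str:
        """Record a link state change and return the NIC now carrying traffic."""
        self.link_up[nic] = up
        if not self.link_up[self.current]:
            # The NIC in use lost its link: move to the other NIC if it is up.
            other = self.standby if self.current == self.active else self.active
            if self.link_up[other]:
                self.current = other
        elif self.failback and self.link_up[self.active]:
            # Failback "Yes": return to the preferred NIC once it reports up.
            self.current = self.active
        return self.current


mgmt = Portgroup("Management Network", active="vmnic0", standby="vmnic1")
after_failure = mgmt.set_link("vmnic0", False)   # traffic moves to vmnic1
after_recovery = mgmt.set_link("vmnic0", True)   # failback "No": stays on vmnic1
```

With failback set to “No”, traffic stays on the standby NIC after the original NIC reports “up” again, so a switch port that is up but not yet forwarding cannot pull the Management Network back onto a dead path.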

Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, especially useful in blade server environments. Easy to configure.

Cons: Just a single active path for heartbeats.

The following diagram depicts this active/standby scenario:

Figure 36 - Active-Standby Management Network design

To increase resiliency, we also recommend implementing the following advanced settings and using NIC ports on different PCI busses, preferably NICs of a different make and model. When using a different make and model, even a driver failure could be mitigated.

Advanced Settings: das.isolationaddressX = <ip-address>

The isolation address setting is discussed in more detail in the section titled "Fundamental Concepts". In short, it is the IP address that the HA agent pings to identify if the host is completely isolated from the network or just not receiving any heartbeats. If multiple VMkernel networks on different subnets are used, it is recommended to set an isolation address per network to ensure that each of these will be able to validate isolation of the host.
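As a small illustration, the per-network isolation addresses can be turned into the corresponding advanced settings. The helper name and the example IP addresses below are hypothetical, and numbering the settings from 0 is an assumption on our side; the sketch only builds the key/value pairs and does not talk to vCenter.

```python
import ipaddress


def isolation_address_settings(addresses):
    """Map a list of IP addresses to das.isolationaddressX advanced settings."""
    settings = {}
    for index, address in enumerate(addresses):
        # ip_address() raises ValueError on a malformed address.
        ipaddress.ip_address(address)
        settings[f"das.isolationaddress{index}"] = address
    return settings


# Hypothetical example: one pingable address per VMkernel subnet.
settings = isolation_address_settings(["192.168.1.1", "10.0.0.1"])
```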

Basic design principle: Take advantage of some of the basic features vSphere has to offer like NIC teaming. Combining different physical NICs will increase overall resiliency of your solution.

Corner Case Scenario: Split-Brain

A split brain scenario is a scenario where a single virtual machine is powered up multiple times, typically on two different hosts. This is possible in the scenario where the isolation response is set to “leave powered on” and network based storage, like NFS/iSCSI and even Virtual SAN, is used. This situation can occur during a full network isolation, which may result in the lock on the virtual machine’s VMDK being lost, enabling HA to actually power up the virtual machine. As the virtual machine was not powered off on its original host (isolation response set to “leave powered on”), it will exist in memory on the isolated host and in memory with a disk lock on the host that was requested to restart the virtual machine.

Keep in mind that this truly is a corner case scenario which is very unlikely to occur in most environments. In case it does happen, HA relies on the “lost lock detection” mechanism to mitigate this scenario. In short, ESXi detects that the lock on the VMDK has been lost and, when the datastore becomes accessible again and the lock cannot be reacquired, issues a question whether the virtual machine should be powered off; HA automatically answers the question with Yes. However, you will only see this question if you directly connect to the ESXi host during the failure. HA will generate an event for this auto-answered question though.

As stated above the question will be auto-answered and the virtual machine will be powered off to recover from the split brain scenario. The question still remains: in the case of an isolation with iSCSI or NFS, should you power off virtual machines or leave them powered on?

As just explained, HA will automatically power off your original virtual machine when it detects a split-brain scenario. This process however is not instantaneous, and as such it is recommended to use the isolation response of “Power off” or “Shut down”. We also recommend increasing heartbeat network resiliency to avoid getting into this situation. We will discuss the options you have for enhancing Management Network resiliency in the next chapter.

Link State Tracking

This was already briefly mentioned in the list of recommendations, but this feature is something we would like to emphasize. We have noticed that people often forget about this even though many switches offer this capability, especially in blade server environments.

Link state tracking will mirror the state of an upstream link to a downstream link. Let’s clarify that with a diagram.

Figure 37 - Link State tracking mechanism

The diagram above depicts a scenario where an uplink of a “Core Switch” has failed. Without Link State Tracking, the connection from the “Edge Switch” to vmnic0 will be reported as up. With Link State Tracking enabled, the state of the link on the “Edge Switch” will reflect the state of the link of the “Core Switch” and as such be marked as “down”. You might wonder why this is important but think about it for a second. Many features that vSphere offers rely on networking and so do your virtual machines. In the case where the state is not reflected, some functionality might just fail; for instance, network heartbeating could fail if it needs to flow through the core switch. We call this a ‘black hole’ scenario: the host sends traffic down a path that it believes is up, but the traffic never reaches its destination due to the failed upstream link.
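The black hole condition can be expressed as a two-line truth table. The sketch below is a toy model of the behavior described above, with hypothetical function names; it does not represent any switch vendor's implementation.

```python
def downlink_reported_up(downlink_ok: bool, uplink_ok: bool, tracking: bool) -> bool:
    """State the edge switch reports to the host NIC (toy model)."""
    if tracking:
        # With link state tracking the downlink mirrors the upstream link.
        return downlink_ok and uplink_ok
    return downlink_ok


def is_black_hole(downlink_ok: bool, uplink_ok: bool, tracking: bool) -> bool:
    """True when the host sees the link as up but traffic cannot pass upstream."""
    return downlink_reported_up(downlink_ok, uplink_ok, tracking) and not uplink_ok


# Failed core uplink, healthy cable between edge switch and vmnic0.
without_tracking = is_black_hole(True, False, tracking=False)
with_tracking = is_black_hole(True, False, tracking=True)
```

Without tracking the host keeps using the dead path; with tracking the link is reported down, so NIC teaming can fail over to the standby NIC.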

Basic design principle: Know your network environment, talk to the network administrators and ensure advanced features like Link State Tracking are used when possible to increase resiliency.

Admission Control

Admission Control is more than likely the most misunderstood concept vSphere holds today and because of this it is often disabled. However, Admission Control is a must when availability needs to be guaranteed, and isn’t that the reason for enabling HA in the first place?

What is HA Admission Control about? Why does HA contain this concept called Admission Control? The “Availability Guide”, a.k.a. the HA bible, states the following:

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

Please read that quote again and especially the first two words. Indeed it is vCenter that is responsible for Admission Control, contrary to what many believe. Although this might seem like a trivial fact it is important to understand that this implies that Admission Control will not disallow HA initiated restarts. HA initiated restarts are done on a host level and not through vCenter.

As said, Admission Control guarantees that capacity is available for an HA initiated failover by reserving resources within a cluster. It calculates the capacity required for a failover based on available resources. In other words, if a host is placed into maintenance mode or disconnected, it is taken out of the equation. This also implies that if a host has failed or is not responding but has not been removed from the cluster, it is still included in the equation. “Available Resources” indicates that the virtualization overhead has already been subtracted from the total amount.

To give an example: VMkernel memory is subtracted from the total amount of memory to obtain the memory available for virtual machines.

There is one gotcha with Admission Control that we want to bring to your attention before drilling into the different policies. When Admission Control is enabled, HA will in no way violate availability constraints. This means that it will always ensure multiple hosts are up and running and this applies for manual maintenance mode actions and, for instance, to VMware Distributed Power Management. So, if a host is stuck trying to enter Maintenance Mode, remember that it might be HA which is not allowing Maintenance Mode to proceed as it would violate the Admission Control Policy. In this situation, users can manually vMotion virtual machines off the host or temporarily disable admission control to allow the operation to proceed.

But what if you use something like Distributed Power Management (DPM), would that place all hosts in standby mode to reduce power consumption? No, DPM is smart enough to take hosts out of standby mode to ensure enough resources are available to provide for HA initiated failovers. If by any chance the resources are not available, HA will wait for these resources to be made available by DPM and then attempt the restart of the virtual machines. In other words, the retry count (5 retries by default) is not wasted in scenarios like these.

Admission Control Policy

The Admission Control Policy dictates the mechanism that HA uses to guarantee enough resources are available for an HA initiated failover. This section gives a general overview of the available Admission Control Policies. The impact of each policy is described in the following section, including our recommendation. HA has three mechanisms to guarantee enough capacity is available to respect virtual machine resource reservations.

Figure 38 - Admission control policy

Below we have listed all three options currently available as the Admission Control Policy. Each option has a different mechanism to ensure resources are available for a failover and each option has its caveats.

Admission Control Mechanisms

Each Admission Control Policy has its own Admission Control mechanism. Understanding each of these Admission Control mechanisms is important to appreciate the impact each one has on your cluster design. For instance, setting a reservation on a specific virtual machine can have an impact on the achieved consolidation ratio. This section will take you on a journey through the trenches of Admission Control Policies and their respective mechanisms and algorithms.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the “Host Failures Cluster Tolerates” policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism.

This admission control policy can be configured in an N-1 fashion. This means that the number of host failures you can specify in a 32 host cluster is 31.

Within the vSphere Web Client it is possible to manually specify the slot size as can be seen in the below screenshot. The vSphere Web Client also allows you to view which virtual machines span multiple slots. This can be very useful in scenarios where the slot size has been explicitly specified; we will explain why in just a second.

Figure 39 - Host Failures

The so-called “slots” mechanism is used when the “Host failures cluster tolerates” has been selected as the Admission Control Policy. The details of this mechanism have changed several times in the past and it is one of the most restrictive policies; more than likely, it is also the least understood.

Slots dictate how many virtual machines can be powered on before vCenter starts yelling “Out Of Resources!” Normally, a slot represents one virtual machine. Admission Control does not limit HA in restarting virtual machines; it ensures enough unfragmented resources are available to power on all virtual machines in the cluster by preventing “over-commitment”. Technically speaking “over-commitment” is not the correct terminology as Admission Control ensures virtual machine reservations can be satisfied and that all virtual machines’ initial memory overhead requirements are met. Although we have already touched on this, it doesn’t hurt repeating it as it is one of those myths that keeps coming back: HA initiated failovers are not prone to the Admission Control Policy. Admission Control is done by vCenter. HA initiated restarts, in a normal scenario, are executed directly on the ESXi host without the use of vCenter. The corner-case is where HA requests DRS (DRS is a vCenter task!) to defragment resources but that is beside the point. Even if resources are low and vCenter would complain, it couldn’t stop the restart from happening.

Let’s dig into this concept we have just introduced, slots.

A slot is defined as a logical representation of the memory and CPU resources that satisfy the reservation requirements for any powered-on virtual machine in the cluster.

In other words a slot is the worst case CPU and memory reservation scenario in a cluster. This directly leads to the first “gotcha.”

HA uses the highest CPU reservation of any given powered-on virtual machine and the highest memory reservation of any given powered-on virtual machine in the cluster. If no reservation of higher than 32 MHz is set, HA will use a default of 32 MHz for CPU. If no memory reservation is set, HA will use a default of 0 MB + memory overhead for memory. (See the VMware vSphere Resource Management Guide for more details on memory overhead per virtual machine configuration.) The following example will clarify what “worst-case” actually means.

Example: If virtual machine “VM1” has 2 GHz of CPU reserved and 1024 MB of memory reserved and virtual machine “VM2” has 1 GHz of CPU reserved and 2048 MB of memory reserved, the slot size for memory will be 2048 MB (+ its memory overhead) and the slot size for CPU will be 2 GHz. It is a combination of the highest reservation of both virtual machines that leads to the total slot size. Reservations defined at the Resource Pool level, however, will not affect HA slot size calculations.
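The worst-case rule from this example can be written down in a few lines. This is a simplified sketch of the calculation as described in the text, not the actual HA code; the dictionary keys are hypothetical and the memory overhead values are made up for illustration.

```python
DEFAULT_CPU_SLOT_MHZ = 32  # used when no VM has a larger CPU reservation


def slot_size(powered_on_vms):
    """Return the (cpu_mhz, memory_mb) slot size for a list of powered-on VMs.

    Each VM is a dict with the hypothetical keys cpu_res_mhz, mem_res_mb
    and mem_overhead_mb.
    """
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm["cpu_res_mhz"] for vm in powered_on_vms])
    # Memory overhead is added to the reservation; with no reservation the
    # slot is 0 MB plus overhead.
    mem = max([0] + [vm["mem_res_mb"] + vm["mem_overhead_mb"] for vm in powered_on_vms])
    return cpu, mem


# VM1 and VM2 from the example; the overhead values are made up.
vms = [
    {"cpu_res_mhz": 2000, "mem_res_mb": 1024, "mem_overhead_mb": 100},
    {"cpu_res_mhz": 1000, "mem_res_mb": 2048, "mem_overhead_mb": 120},
]
cpu_slot, mem_slot = slot_size(vms)  # CPU from VM1, memory from VM2
```

Note that the CPU value comes from VM1 and the memory value from VM2: no single virtual machine needs a slot this large, yet every slot in the cluster is sized this way.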

Basic design principle: Be really careful with reservations. If there’s no need to have them on a per virtual machine basis, don’t configure them, especially when using host failures cluster tolerates. If reservations are needed, resort to resource pool based reservations.

Now that we know the worst-case scenario is always taken into account when it comes to slot size calculations, we will describe what dictates the amount of available slots per cluster as that ultimately dictates how many virtual machines can be powered on in your cluster.

First, we will need to know the slot size for memory and CPU. Next, we will divide the total available CPU resources of a host by the CPU slot size and the total available memory resources of a host by the memory slot size. This leaves us with a total number of slots for both memory and CPU for a host. The most restrictive number (worst-case scenario) is the number of slots for this host. In other words, when you have 25 CPU slots but only 5 memory slots, the amount of available slots for this host will be 5 as HA always takes the worst case scenario into account to “guarantee” all virtual machines can be powered on in case of a failure or isolation.
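The per-host calculation described above is a pair of integer divisions followed by a minimum. The sketch below uses a hypothetical function name, and the host capacities are chosen only so that they reproduce the 25 CPU slots and 5 memory slots from the text.

```python
def host_slots(cpu_mhz, mem_mb, cpu_slot_mhz, mem_slot_mb):
    """Number of slots a host provides: the most restrictive of CPU and memory."""
    cpu_slots = cpu_mhz // cpu_slot_mhz   # whole slots only
    mem_slots = mem_mb // mem_slot_mb
    return min(cpu_slots, mem_slots)


# Hypothetical host: 50,000 MHz and 10,240 MB available for virtual machines,
# with a slot size of 2,000 MHz and 2,048 MB.
slots = host_slots(50_000, 10_240, 2_000, 2_048)  # min(25, 5)
```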

The question we receive a lot is: how do I know what my slot size is? The details around slot sizes can be monitored on the HA section of the Cluster’s Monitor tab by checking the “Advanced Runtime Info” section when the “Host Failures” Admission Control Policy is configured.

Figure 40 - High Availability cluster monitor section

Advanced Runtime Info will show the specifics of the slot size and more useful details such as the number of slots available, as depicted in Figure 41.

Figure 41 - High Availability advanced runtime info

As you can imagine, using reservations on a per virtual machine basis can lead to very conservative consolidation ratios. However, this is something that is configurable through the Web Client. If you have just one virtual machine with a really high reservation, you can set an explicit slot size by going to “Edit Cluster Services” and specifying the values under the Admission Control Policy section as shown in Figure 39.

If one of these advanced settings is used, HA will ensure that the virtual machine that skewed the numbers can be restarted by “assigning” multiple slots to it. However, when you are low on resources, this could mean that you are not able to power on the virtual machine with this reservation because resources may be fragmented throughout the cluster instead of available on a single host. HA will notify DRS that a power-on attempt was unsuccessful and a request will be made to defragment the resources to accommodate the remaining virtual machines that need to be powered on. In order for this to be successful DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations.

The following diagram depicts a scenario where a virtual machine spans multiple slots:

Figure 42 - Virtual machine spanning multiple HA slots

Notice that because the memory slot size has been manually set to 1024 MB, one of the virtual machines (grouped with dotted lines) spans multiple slots due to a 4 GB memory reservation. As you might have noticed, none of the hosts has enough resources available to satisfy the reservation of the virtual machine that needs to fail over. Although in total there are enough resources available, they are fragmented and HA will not be able to power on this particular virtual machine directly but will request DRS to defragment the resources to accommodate this virtual machine’s resource requirements.

Admission Control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings. It will take the number of slots this virtual machine will consume into account by subtracting them from the total number of available slots, but it will not verify the amount of available slots per host to ensure failover. As stated earlier, though, HA will request DRS to defragment the resources. This is by no means a guarantee of a successful power-on attempt.
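The difference between the cluster-wide view of Admission Control and what a restart actually needs can be sketched as follows. The function names and the free-slot numbers are hypothetical, and taking the larger of the CPU and memory slot counts for a virtual machine is our simplifying assumption.

```python
import math


def slots_needed(cpu_res_mhz, mem_res_mb, cpu_slot_mhz, mem_slot_mb):
    """Slots one VM consumes when its reservation exceeds the slot size."""
    return max(
        math.ceil(cpu_res_mhz / cpu_slot_mhz),
        math.ceil(mem_res_mb / mem_slot_mb),
        1,  # every powered-on VM takes at least one slot
    )


def cluster_has_slots(free_slots_per_host, needed):
    """Cluster-wide check: only the total number of free slots is counted."""
    return sum(free_slots_per_host) >= needed


def single_host_fits(free_slots_per_host, needed):
    """What a restart requires: all slots free on one and the same host."""
    return any(free >= needed for free in free_slots_per_host)


# 4 GB reservation with a manually set 1024 MB memory slot size.
needed = slots_needed(0, 4096, 32, 1024)
free = [2, 2, 3]  # made-up free slots on three hosts
total_ok = cluster_has_slots(free, needed)
host_ok = single_host_fits(free, needed)
```

Here the total check passes while no single host can take the virtual machine, which is exactly the case where HA has to ask DRS to defragment resources.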

Basic design principle: Avoid using advanced settings to decrease the slot size as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations we recommend using the percentage based admission control policy.

Within the vSphere Web Client there is functionality which enables you to identify virtual machines which span multiple slots, as shown in Figure 39. We highly recommend monitoring this section on a regular basis to get a better understanding of your environment and to identify those virtual machines that might be problematic to restart in case of a host failure.

Unbalanced Configurations and Impact on Slot Calculation

It is an industry best practice to create clusters with similar hardware configurations. However, many companies started out with a small VMware cluster when virtualization was first introduced. When the time has come to expand, chances are fairly large the same hardware configuration is no longer available. The question is: will you add the newly bought hosts to the same cluster or create a new cluster?

From a DRS perspective, large clusters are preferred as they increase the load balancing opportunities. However there is a caveat for DRS as well, which is described in the DRS section of this book. For HA, there is a big caveat. When you think about it and understand the internal workings of HA, more specifically the slot algorithm, you probably already know what is coming up.

Let’s first define the term “unbalanced cluster.”

An unbalanced cluster would, for instance, be a cluster with 3 hosts of which one contains substantially more memory than the other hosts in the cluster.

Let’s try to clarify that with an example.

Example: What would happen to the total number of slots in a cluster of the following specifications?

- Three host cluster
- Two hosts have 16 GB of available memory
- One host has 32 GB of available memory

The third host is a brand new host that has just been bought and as prices of memory dropped immensely the decision was made to buy 32 GB instead of 16 GB.

The cluster contains a virtual machine that has 1 vCPU and 4 GB of memory. A 1024 MB memory reservation has been defined on this virtual machine. As explained earlier, a reservation will dictate the slot size, which in this case leads to a memory slot size of 1024 MB + memory overhead. For the sake of simplicity, we will calculate with 1024 MB. The following diagram depicts this scenario:

Figure 43 - High Availability memory slot size

When Admission Control is enabled and the number of host failures has been selected as the Admission Control Policy, the number of slots will be calculated per host and the cluster in total. This will result in:

Host       Number of slots
ESXi-01    16 slots
ESXi-02    16 slots
ESXi-03    32 slots

As Admission Control is enabled, a worst-case scenario is taken into account. When a single host failure has been specified, this means that the host with the largest number of slots will be taken out of the equation. In other words, for our cluster, this would result in:

ESXi-01 + ESXi-02 = 32 slots available

Although you have doubled the amount of memory in one of your hosts, you are still stuck with only 32 slots in total. As clearly demonstrated, there is absolutely no point in buying additional memory for a single host when your cluster is designed with Admission Control enabled and the number of host failures has been selected as the Admission Control Policy.

In our example, the memory slot size happened to be the most restrictive; however, the same principle applies when CPU slot size is most restrictive.

Basic design principle: When using admission control, balance your clusters and be conservative with reservations as they lead to decreased consolidation ratios.

Now, what would happen in the scenario above when the number of allowed host failures is set to 2? In this case ESXi-03 is taken out of the equation and one of any of the remaining hosts in the cluster is also taken out, resulting in 16 slots. This makes sense, doesn’t it?
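The worst-case reasoning for the three-host example can be checked with a short sketch. The function name is hypothetical; it simply removes the hosts with the most slots, as described above, and sums what is left.

```python
def slots_after_failures(slots_per_host, host_failures):
    """Slots left when the hosts with the most slots are assumed to fail."""
    remaining = sorted(slots_per_host)              # ascending order
    if host_failures > 0:
        remaining = remaining[:-host_failures]      # drop the largest hosts
    return sum(remaining)


cluster = [16, 16, 32]  # ESXi-01, ESXi-02, ESXi-03
one_failure = slots_after_failures(cluster, 1)   # ESXi-03 removed
two_failures = slots_after_failures(cluster, 2)  # ESXi-03 and one 16-slot host removed
```

Replacing the 32-slot host with a third 16-slot host gives the same result for one host failure, which is why the extra memory buys nothing under this policy.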

Can you avoid large HA slot sizes due to reservations without resorting to advanced settings? That’s the question we get almost daily and the answer is the “Percentage of Cluster Resources Reserved” admission control mechanism.

Percentage of Cluster Resources Reserved

The Percentage of Cluster Resources Reserved admission control policy is one of the most used admission control policies. The simple reason for this is that it is the least restrictive and most flexible. It is also very easy to configure as shown in the screenshot below.

Figure 44 - Setting a different percentage for CPU/Memory

The main advantage of the percentage based Admission Control Policy is that it avoids the commonly experienced slot size issue where values are skewed due to a large reservation. But if it doesn’t use the slot algorithm, what does it use?

When you specify a percentage, and let’s assume for now that the percentage for CPU and memory will be configured equally, that percentage of the total amount of available resources will stay reserved for HA purposes. First of all, HA will add up all available resources to see how much it has available (virtualization overhead will be subtracted) in total. Then, HA will calculate how much resources are currently reserved by adding up all reservations for memory and for CPU for all powered on virtual machines.

For those virtual machines that do not have a reservation, a default of 32 MHz will be used for CPU and a default of 0 MB + memory overhead will be used for Memory. (Amount of overhead per configuration type can be found in the “Understanding Memory Overhead” section of the Resource Management guide.)

In other words:

((total amount of available resources - total reserved virtual machine resources) / total amount of available resources) <= (percentage HA should reserve as spare capacity)

Total reserved virtual machine resources includes the default reservation of 32 MHz and the memory overhead of the virtual machine.

Let’s use a diagram to make it a bit clearer:

Figure 45 - Percentage of cluster resources reserved

Total cluster resources are 24 GHz (CPU) and 96 GB (MEM). This would lead to the following calculations:

((24 GHz - (2 GHz + 1 GHz + 32 MHz + 4 GHz)) / 24 GHz) = approximately 71% available

((96 GB - (1.1 GB + 114 MB + 626 MB + 3.2 GB)) / 96 GB) = approximately 95% available
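The two calculations can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, 1 GB is taken as 1024 MB, and the power-on check follows the inequality as printed above, which may differ in detail from what vCenter does. With the operands as listed, the available capacity evaluates to roughly 71% for CPU and 95% for memory.

```python
def available_fraction(total, reserved):
    """Fraction of cluster resources not claimed by reservations."""
    return (total - sum(reserved)) / total


def power_on_allowed(total, reserved, new_reservation, failover_pct):
    """True if spare capacity stays above the configured percentage."""
    remaining = available_fraction(total, reserved + [new_reservation])
    return remaining > failover_pct / 100


# CPU in MHz: 24 GHz total; 2 GHz, 1 GHz, 32 MHz (default) and 4 GHz reserved.
cpu_reserved = [2_000, 1_000, 32, 4_000]
cpu_available = available_fraction(24_000, cpu_reserved)

# Memory in MB: 96 GB total; reservations include memory overhead.
mem_reserved = [1.1 * 1024, 114, 626, 3.2 * 1024]
mem_available = available_fraction(96 * 1024, mem_reserved)

# With 25% reserved for HA, a further 10 GHz reservation still fits,
# a further 12 GHz reservation does not.
fits = power_on_allowed(24_000, cpu_reserved, 10_000, 25)
too_big = power_on_allowed(24_000, cpu_reserved, 12_000, 25)
```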

As you can see, the amount of memory differs from the diagram. Even if a reservation has been set, the amount of memory overhead is added to the reservation. This example also demonstrates how keeping CPU and memory percentage equal could create an imbalance. Ideally, of course, the hosts are provisioned in such a way that there is no CPU/memory imbalance. Experience over the years has proven, unfortunately, that most environments run out of memory resources first and this might need to be factored in when calculating the correct value for the percentage. However, this trend might be changing as memory is getting cheaper every day.

In order to ensure virtual machines can always be restarted, Admission Control will constantly monitor if the policy has been violated or not. Please note that this Admission Control process is part of vCenter and not of the ESXi host! When one of the thresholds is reached, memory or CPU, Admission Control will disallow powering on any additional virtual machines as that could potentially impact availability. These thresholds can be monitored on the HA section of the Cluster’s summary tab.

Figure 46 - High Availability summary

If you have an unbalanced cluster (hosts with different sizes of CPU or memory resources), your percentage should be equal or preferably larger than the percentage of resources provided by the largest host. This way you ensure that all virtual machines residing on this host can be restarted in case of a host failure.
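As a quick illustration of this rule, the share of the largest host can be computed directly. The function name is hypothetical; the memory sizes reuse the 16/16/32 GB cluster from the slot example earlier in this chapter.

```python
def largest_host_share(host_capacities):
    """Percentage of total cluster capacity contributed by the largest host."""
    return 100 * max(host_capacities) / sum(host_capacities)


# Two hosts with 16 GB and one with 32 GB of available memory.
minimum_memory_pct = largest_host_share([16, 16, 32])
```

For this cluster the memory percentage should be at least 50%; the same calculation would be repeated for CPU, using the larger value if a single percentage is configured for both.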

As explained earlier, this Admission Control Policy does not use slots. As such, resources might be fragmented throughout the cluster. Although DRS is notified to rebalance the cluster, if needed, to accommodate these virtual machines’ resource requirements, a guarantee cannot be given. We recommend selecting the highest restart priority for this virtual machine (of course, depending on the SLA) to ensure it will be able to boot.

The following example and diagram (Figure 47) will make it more obvious: You have 3 hosts, each with roughly 80% memory usage, and you have configured HA to reserve 20% of resources for both CPU and memory. A host fails and all virtual machines will need to fail over. One of those virtual machines has a 4 GB memory reservation. As you can imagine, HA will not be able to initiate a power-on attempt, as there are not enough memory resources available to guarantee the reserved capacity. Instead an event will get generated indicating "not enough resources for failover" for this virtual machine.

Figure 47 - Available resources

Basic design principle: Although HA will utilize DRS to try to accommodate the resource requirements of this virtual machine, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).
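“Doing the math” for the example above could look like the sketch below. The host names and the 16 GB host size are assumptions made purely for illustration; the text only states that each host is roughly 80% used and that the virtual machine has a 4 GB reservation.

```python
def hosts_that_fit(free_mb_per_host, reservation_mb):
    """Names of hosts with enough unreserved memory for one VM's reservation."""
    return [host for host, free in free_mb_per_host.items() if free >= reservation_mb]


# Two surviving hosts, each assumed to have 16 GB with about 20% (3276 MB) free.
free_memory = {"esxi-02": 3276, "esxi-03": 3276}
reservation_mb = 4096

cluster_total_ok = sum(free_memory.values()) >= reservation_mb
candidates = hosts_that_fit(free_memory, reservation_mb)
```

The cluster as a whole has more than 4 GB free, yet the list of candidate hosts is empty, so the restart depends on DRS freeing up a large enough block on one host.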

Failover Hosts

The third option one could choose is to select one or multiple designated Failover hosts. This is commonly referred to as a hot standby.

Figure 48 - Select failover hosts Admission Control Policy

It is “what you see is what you get”. When you designate hosts as failover hosts, they will not participate in DRS and you will not be able to run virtual machines on these hosts! These hosts are literally reserved for failover situations. HA will attempt to use these hosts first to fail over the virtual machines. If, for whatever reason, this is unsuccessful, it will attempt a failover on any of the other hosts. For example, when three hosts would fail, including the hosts designated as failover hosts, HA will still try to restart the impacted virtual machines on the host that is left. Although this host was not a designated failover host, HA will use it to limit downtime.
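The placement order described above can be sketched as a simple ordering of candidate hosts. This is a toy model with hypothetical names, not the actual HA placement logic, which also weighs available resources and compatibility.

```python
def restart_candidates(hosts, failover_hosts, failed):
    """Surviving hosts in the order HA would try them: failover hosts first."""
    alive = [host for host in hosts if host not in failed]
    preferred = [host for host in alive if host in failover_hosts]
    others = [host for host in alive if host not in failover_hosts]
    return preferred + others


hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
failover_hosts = {"esxi-04"}

# A regular host fails: the designated failover host is tried first.
normal_case = restart_candidates(hosts, failover_hosts, failed={"esxi-01"})

# Three hosts fail, including the failover host: the remaining host is used.
worst_case = restart_candidates(
    hosts, failover_hosts, failed={"esxi-01", "esxi-02", "esxi-04"}
)
```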

Figure 49 - Select multiple failover hosts

Decision Making Time

As with any decision you make, there is an impact to your environment. This impact could be positive but also, for instance, unexpected. This especially goes for HA Admission Control. Selecting the right Admission Control Policy can lead to a quicker Return On Investment and a lower Total Cost of Ownership. In the previous section, we described all the algorithms and mechanisms that form Admission Control and in this section we will focus more on the design considerations around selecting the appropriate Admission Control Policy for your or your customer’s environment.

The first decision that will need to be made is whether Admission Control will be enabled. We generally recommend enabling Admission Control as it is the only way of guaranteeing your virtual machines will be allowed to restart after a failure. It is important, though, that the policy is carefully selected and fits your or your customer’s requirements.

Basic design principle

Admission control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it.

Although we already have explained all the mechanisms that are being used by each of the policies in the previous section, we will give a high level overview and list all the pros and cons in this section. On top of that, we will expand on what we feel is the most flexible Admission Control Policy and how it should be configured and calculated.

Host Failures Cluster Tolerates

This option is historically speaking the most used for Admission Control. Most environments are designed with an N+1 redundancy and N+2 is also not uncommon. This Admission Control Policy uses “slots” to ensure enough capacity is reserved for failover, which is a fairly complex mechanism. Slots are based on VM-level reservations and if reservations are not used a default slot size for CPU of 32 MHz is defined and for memory the largest memory overhead of any given virtual machine is used.

Pros:

- Fully automated (when a host is added to a cluster, HA re-calculates how many slots are available).
- Guarantees failover by calculating slot sizes.

Cons:

- Can be very conservative and inflexible when reservations are used as the largest reservation dictates slot sizes.
- Unbalanced clusters lead to wastage of resources.
- Complexity for administrator from calculation perspective.

Percentage of Cluster Resources Reserved

The percentage based Admission Control is based on per-reservation calculation instead of the slots mechanism. The percentage based Admission Control Policy is less conservative than “Host Failures” and more flexible than “Failover Hosts”.

Pros:

- Accurate as it considers actual reservation per virtual machine to calculate available failover resources.
- Cluster dynamically adjusts when resources are added.

Cons:

- Manual calculations needed when adding additional hosts in a cluster and number of host failures needs to remain unchanged.
- Unbalanced clusters can be a problem when chosen percentage is too low and resources are fragmented, which means failover of a virtual machine can’t be guaranteed as the reservation of this virtual machine might not be available as a block of resources on a single host.

Please note that, although a failover cannot be guaranteed, there are few scenarios where a virtual machine will not be able to restart, thanks to the integration HA offers with DRS and the fact that most clusters have spare capacity available to account for virtual machine demand variance. Although this is a corner-case scenario, it needs to be considered in environments where absolute guarantees must be provided.

Specify Failover Hosts

With the “Specify Failover Hosts” Admission Control Policy, when one or multiple hosts fail, HA will attempt to restart all virtual machines on the designated failover hosts. The designated failover hosts are essentially “hot standby” hosts. In other words, DRS will not migrate virtual machines to these hosts when resources are scarce or the cluster is imbalanced.

Pros:

- What you see is what you get.
- No fragmented resources.

Cons:

- What you see is what you get.
- Dedicated failover hosts are not utilized during normal operations.

Recommendations

We have been asked many times for our recommendation on Admission Control and it is difficult to answer as each policy has its pros and cons. However, we generally recommend a Percentage based Admission Control Policy. It is the most flexible policy as it uses the actual reservation per virtual machine instead of taking a “worst case” scenario approach like the number of host failures does. However, the number of host failures policy guarantees the failover level under all circumstances. Percentage based is less restrictive, but offers lower guarantees that in all scenarios HA will be able to restart all virtual machines. With the added level of integration between HA and DRS we believe a Percentage based Admission Control Policy will fit most environments.

Basic design principle: Do the math, and take customer requirements into account. We recommend using a “percentage” based admission control policy, as it is the most flexible.

Now that we have recommended which Admission Control Policy to use, the next step is to provide guidance around selecting the correct percentage. We cannot tell you what the ideal percentage is as that totally depends on the size of your cluster and, of course, on your resiliency model (N+1 vs. N+2). We can, however, provide guidelines around calculating how much of your resources should be set aside and how to prevent wasting resources.

Selecting the Right Percentage

It is a common strategy to select a single host as a percentage of resources reserved for failover. We generally recommend selecting a percentage which is the equivalent of a single or multiple hosts. Let’s explain why and what the impact is of not using the equivalent of a single or multiple hosts.

Let’s start with an example: a cluster consists of 8 ESXi hosts, each containing 70 GB of available RAM. This might sound like an awkward memory configuration but to simplify things we have already subtracted 2 GB as virtualization overhead. Although virtualization overhead is probably less than 2 GB, we have used this number to make calculations easier. This example zooms in on memory but this concept also applies to CPU, of course.

For this cluster we will set the percentage of resources to reserve for both Memory and CPU to 20%. For memory, this leads to a total cluster memory capacity of 448 GB:

(70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB) * (1 - 20%) = 448 GB

A total of 112 GB of memory is reserved as failover capacity.
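The arithmetic above can be written down as a short sketch. The function name and parameters are illustrative only and are not part of any vSphere tooling. The sketch assumes that the per-host figures already exclude virtualization overhead, as in the example.

```python
def usable_and_reserved(host_capacities_gb, reserved_pct):
    """Split total cluster memory into usable and failover-reserved parts."""
    total = sum(host_capacities_gb)
    reserved = total * reserved_pct / 100
    return total - reserved, reserved


# Eight hosts with 70 GB each and 20% reserved for failover.
usable, reserved = usable_and_reserved([70] * 8, 20)
print(usable, reserved)  # 448.0 112.0
```

The same function applies to CPU when the list holds per-host GHz instead of GB.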

Once a percentage is specified, that percentage of resources will be unavailable for virtual machines. It therefore makes sense to set the percentage as close as possible to the value that equals the resources a single host (or multiple hosts) represents. We will demonstrate why this is important in subsequent examples.

In the example above, 20% of resources was reserved in an 8-host cluster. This configuration reserves more resources than a single host contributes to the cluster. HA’s main objective is to provide automatic recovery for virtual machines after a physical server failure. For this reason, it is recommended to reserve resources equal to a single or multiple hosts. When using the per-host level granularity in an 8-host cluster (homogeneous configured hosts), the resource contribution per host to the cluster is 12.5%. However, the percentage used must be an integer (whole number). It is recommended to round up to the value guaranteeing that the full capacity of one host is protected. In this example (Figure 50), the conservative approach would lead to a percentage of 13%.

Figure 50 - Setting the correct value

Aggressive Approach

We have seen many environments where the percentage was set to a value that was less than the contribution of a single host to the cluster. Although this approach reduces the amount of resources reserved for accommodating host failures and results in higher consolidation ratios, it also offers a lower guarantee that HA will be able to restart all virtual machines after a failure. One might argue that this approach will more than likely work as most environments will not be fully utilized; however, it also eliminates the guarantee that after a failure all virtual machines will be recovered. Wasn’t that the reason for enabling HA in the first place?

Adding Hosts to Your Cluster

Although the percentage is dynamic and calculates capacity at a cluster-level, changes to your selected percentage might be required when expanding the cluster. The reason is that the amount of resources reserved for a failover might no longer correspond with the contribution per host and, as a result, lead to resource wastage. For example, adding 4 hosts to an 8-host cluster and continuing to use the previously configured admission control policy value of 13% will result in a failover capacity that is equivalent to roughly 1.5 hosts. Figure 51 depicts a scenario where an 8-host cluster is expanded to 12 hosts. Each host holds eight 2 GHz cores and 70 GB of memory. The cluster was originally configured with admission control set to 13%, which now equals 109.2 GB and 24.96 GHz. If the requirement is to allow a single host failure, 9% would be sufficient in a 12-host cluster, so the remaining 4% (7.68 GHz and 33.6 GB) is “wasted”, as clearly demonstrated in the diagram below.
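The numbers in this scenario can be reproduced with a few lines of arithmetic. This is a sketch only: the function and parameter names are made up for illustration, and the 9% passed in below is an assumption, namely the share of a single host in a 12-host cluster (100 / 12) rounded up to a whole number.

```python
def reservation_waste(hosts, ghz_per_host, gb_per_host, configured_pct, sufficient_pct):
    """Return (reserved, wasted) as (GHz, GB) tuples for a homogeneous cluster."""
    total_ghz = hosts * ghz_per_host
    total_gb = hosts * gb_per_host
    reserved = (total_ghz * configured_pct / 100, total_gb * configured_pct / 100)
    # Everything reserved beyond the sufficient percentage is unusable headroom.
    extra_pct = configured_pct - sufficient_pct
    wasted = (total_ghz * extra_pct / 100, total_gb * extra_pct / 100)
    return reserved, wasted


# 12 hosts, eight 2 GHz cores and 70 GB each, still configured at 13%.
reserved, wasted = reservation_waste(12, 8 * 2.0, 70, configured_pct=13, sufficient_pct=9)
print(reserved)  # (24.96, 109.2)
print(wasted)    # (7.68, 33.6)
```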

Figure 51 - Avoid wasting resources

How to Define Your Percentage?

As explained earlier, it will fully depend on the N+X model that has been chosen. Based on this model, we recommend selecting a percentage that equals the amount of resources a single host (or multiple hosts) represents. So, in the case of an 8-host cluster and N+2 resiliency, the percentage should be set as follows: 2 / 8 * 100 = 25%
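As a rough aid, the rule can be captured in a small helper that rounds up to a whole percentage so that the full capacity of the chosen number of hosts is covered. The helper name is hypothetical and it assumes a cluster of identically sized hosts.

```python
def failover_percentage(hosts, host_failures=1):
    """Whole-number percentage covering `host_failures` hosts in an N+X design."""
    if hosts < 2 or not 0 < host_failures < hosts:
        raise ValueError("need at least 2 hosts and 0 < host_failures < hosts")
    # Integer ceiling division: rounds 12.5 up to 13 and leaves 25 untouched.
    return -(-100 * host_failures // hosts)


print(failover_percentage(8, 1))   # 13
print(failover_percentage(8, 2))   # 25
print(failover_percentage(12, 1))  # 9
```

For clusters with unevenly sized hosts the calculation would need to be based on the largest hosts rather than a simple host count.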

Basic design principle: In order to avoid wasting resources we recommend carefully selecting your N+X resiliency architecture. Calculate the required percentage based on this architecture.

VM and Application Monitoring

VM and Application Monitoring is an often overlooked but really powerful feature of HA. The reason for this is most likely that it is disabled by default and relatively new compared to HA. We have tried to gather all the information we could around VM and Application Monitoring, but it is a pretty straightforward feature that actually does what you expect it would do.

Figure 52 - VM and Application Monitoring

Why Do You Need VM/Application Monitoring?

VM and Application Monitoring acts on a different level from HA. VM/App Monitoring responds to a single virtual machine or application failure as opposed to HA which responds to a host failure. An example of a single virtual machine failure would, for instance, be the infamous “blue screen of death”. In the case of App Monitoring the type of failure that triggers a response is defined by the application developer or administrator.

How Does VM/App Monitoring Work?

VM Monitoring resets individual virtual machines when needed. VM/App monitoring uses a heartbeat similar to HA. If heartbeats, in this case VMware Tools heartbeats, are not received for a specific (and configurable) amount of time, the virtual machine will be restarted. These heartbeats are monitored by the HA agent and are not sent over a network, but stay local to the host.

Figure 53 - VM Monitoring sensitivity

When enabling VM/App Monitoring, the level of sensitivity (Figure 53) can be configured. The default setting should fit most situations. Low sensitivity basically means that the number of allowed “missed” heartbeats is higher and the chances of running into a false positive are lower. However, if a failure occurs and the sensitivity level is set to Low, the experienced downtime will be higher. When quick action is required in the event of a failure, “high sensitivity” can be selected. As expected, this is the opposite of “low sensitivity”. Note that the advanced settings mentioned in the following table are deprecated and listed for educational purposes.

Sensitivity   Failure interval   Max failures   Maximum resets time window
Low           120 seconds        3              7 days
Medium        60 seconds         3              24 hours
High          30 seconds         3              1 hour

It is important to remember that VM Monitoring does not infinitely reboot virtual machines unless you specify a custom policy with this requirement. This is to prevent a problem from repeating itself. By default, when a virtual machine has been rebooted three times within an hour, no further attempts will be taken until the specified time has elapsed. The following advanced settings can be set to change this default behavior or “custom” can be selected as shown in Figure 53.
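The reset limit can be pictured as a sliding time window per virtual machine. The sketch below is a simplified model built from the sensitivity table, not the HA agent's actual implementation. The dictionary layout and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Values taken from the sensitivity table; the structure itself is illustrative.
PRESETS = {
    "low": {"failure_interval_s": 120, "max_failures": 3, "window": timedelta(days=7)},
    "medium": {"failure_interval_s": 60, "max_failures": 3, "window": timedelta(hours=24)},
    "high": {"failure_interval_s": 30, "max_failures": 3, "window": timedelta(hours=1)},
}


def reset_allowed(sensitivity, previous_resets, now):
    """True while fewer than max_failures resets fall inside the time window."""
    preset = PRESETS[sensitivity]
    recent = [t for t in previous_resets if now - t < preset["window"]]
    return len(recent) < preset["max_failures"]


now = datetime(2020, 1, 1, 12, 0)
resets = [now - timedelta(minutes=m) for m in (50, 30, 10)]
print(reset_allowed("high", resets, now))  # False: three resets within the last hour
```

Once the oldest reset drops out of the window, the same call returns True again, which matches the "until the specified time has elapsed" behavior described above.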

Although the heartbeat produced by VMware Tools is reliable, VMware added a further verification mechanism. To avoid false positives, VM Monitoring also monitors I/O activity of the virtual machine. When heartbeats are not received AND no disk or network activity has occurred over the last 120 seconds, per default, the virtual machine will be reset. Changing the advanced setting “das.iostatsInterval” can modify this 120-second interval.

It is recommended to align the das.iostatsInterval with the failure interval selected in the VM Monitoring section of vSphere HA within the Web Client or the vSphere Client.

Basic design principle: Align das.iostatsInterval with the failure interval.

Screenshots

One of the most useful features as part of VM Monitoring is the fact that it takes screenshots of the virtual machine’s console. The screenshots are taken right before VM Monitoring resets a virtual machine. It is a very useful feature when a virtual machine “freezes” every once in a while for no apparent reason. This screenshot can be used to debug the virtual machine operating system when needed, and is stored in the virtual machine’s working directory as logged in the Events view on the Monitor tab of the virtual machine.

Basic design principle: VM and Application monitoring can substantially increase availability. It is part of the HA stack and we strongly recommend using it!

VM Monitoring Implementation Details

VM/App Monitoring is implemented as part of the HA agent itself. The agent uses the “Performance Manager” to monitor disk and network I/O; VM/App Monitoring uses the “usage” counters for both disk and network and it requests these counters once enough heartbeats have been missed that the configured policy is triggered.

As stated before, VM/App Monitoring uses heartbeats just like host-level HA. The heartbeats are monitored by the HA agent, which is responsible for the restarts. Of course, this information is also being rolled up into vCenter, but that is done via the Management Network, not using the virtual machine network. This is crucial to know as this means that when a virtual machine network error occurs, the virtual machine heartbeat will still be received. When an error occurs, HA will trigger a restart of the virtual machine when all three conditions are met:

1. No VMware Tools heartbeat received
2. No network I/O over the last 120 seconds
3. No storage I/O over the last 120 seconds
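The three conditions combine with a logical AND, which can be sketched as follows. This is a simplified model rather than the HA agent's code: the class, field and parameter names are hypothetical, and the defaults assume a 30-second failure interval and the 120-second das.iostatsInterval mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class VmObservation:
    seconds_since_tools_heartbeat: float
    seconds_since_network_io: float
    seconds_since_storage_io: float


def should_reset(obs, failure_interval_s=30, iostats_interval_s=120):
    """Reset only when the heartbeat is missing AND the VM shows no I/O at all."""
    heartbeat_lost = obs.seconds_since_tools_heartbeat >= failure_interval_s
    no_network_io = obs.seconds_since_network_io >= iostats_interval_s
    no_storage_io = obs.seconds_since_storage_io >= iostats_interval_s
    return heartbeat_lost and no_network_io and no_storage_io


# Heartbeats stopped, but the VM wrote to disk 15 seconds ago: no reset.
print(should_reset(VmObservation(90, 300, 15)))   # False
# No heartbeat and no I/O of any kind for several minutes: reset.
print(should_reset(VmObservation(90, 300, 300)))  # True
```

The first case illustrates why the I/O check avoids false positives: a guest whose VMware Tools process has stalled but which is still doing work is left alone.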

Just like with host-level HA, the HA agent works independently of vCenter when it comes to virtual machine restarts.

Timing

The VM/App monitoring feature monitors the heartbeat(s) issued by a guest and resets the virtual machine if there is a heartbeat failure that satisfies the configured policy for the virtual machine. HA can monitor just the heartbeats issued by the VMware Tools process or can monitor these heartbeats plus those issued by an optional in-guest agent.

If the VM monitoring heartbeats stop at time T-0, the minimum time before HA will declare a heartbeat failure is in the range of 91 seconds to 119 seconds, whereas for heartbeats issued by an in-guest application agent, HA will declare a failure in the range of 61 seconds to 89 seconds. Once a heartbeat failure is declared for application heartbeats, HA will attempt to reset the virtual machine. However, for VMware Tools heartbeats, HA will first check whether any I/O has been issued by the virtual machine for the last 2 minutes (by default) and only if there has been no I/O will it issue a reset. Due to how HOSTD publishes the I/O statistics, this check could delay the reset by approximately 20 seconds for virtual machines that were issuing I/O within approximately 1 minute of T-0.

Timing details: the range depends on when the heartbeats stop relative to the HOSTD thread that monitors them. For the lower bound of the VMware Tools heartbeats, the heartbeats stop a second before the HOSTD thread runs, which means that at T+31 the FDM agent on the host will be notified of a tools yellow state, and then at T+61 of the red state, which HA reacts to. HA then monitors the heartbeat failure for a minimum of 30 seconds, leading to the minimum of T+91. The 30-second monitoring period done by HA can be increased using the das.failureInterval policy setting. For the upper bound, the FDM is not notified until T+59 (at T=0 the failure occurs, at T+29 HOSTD notices it and starts the heartbeat failure timer, at T+59 HOSTD reports a yellow state, and at T+89 it reports a red state).

For the heartbeats issued by an in-guest agent, no yellow state is sent, so there is no additional 30-second period.
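The bounds can be recomputed from the walkthrough above. The sketch assumes fixed 30-second steps between HOSTD state changes and a 30-second HA monitoring period (das.failureInterval), as described in the text. The function and constant names are illustrative and do not correspond to anything in HOSTD or FDM.

```python
STATE_STEP_S = 30  # assumed spacing between HOSTD state changes (yellow, red)


def failure_declared_at(noticed_at_s, tools_heartbeat=True, ha_monitor_s=30):
    """Seconds after T-0 at which HA declares the heartbeat failure."""
    t = noticed_at_s           # HOSTD thread notices the missing heartbeats
    if tools_heartbeat:
        t += STATE_STEP_S      # tools yellow state (not sent for in-guest agents)
    t += STATE_STEP_S          # red state, which HA reacts to
    return t + ha_monitor_s    # HA's own monitoring period


for label, tools in (("VMware Tools", True), ("in-guest agent", False)):
    lower = failure_declared_at(1, tools)    # heartbeats stop 1 s before the thread runs
    upper = failure_declared_at(29, tools)   # thread notices only at T+29
    print(f"{label}: T+{lower}s to T+{upper}s")
# VMware Tools: T+91s to T+119s
# in-guest agent: T+61s to T+89s
```

Raising `ha_monitor_s` shifts both bounds by the same amount, which is the effect the das.failureInterval setting has in this model.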

Application Monitoring

Application Monitoring is a part of VM Monitoring. As shown in the screenshot below, Application Monitoring is a feature that partners and/or customers can leverage to increase resiliency from an application point of view rather than from a VM point of view. There is an SDK available to the general public and it is part of the guest SDK.

Figure 54 - VM and Application Monitoring

The Guest SDK is currently primarily used by application developers from partners like Symantec to develop solutions that increase resilience on a different level than VM Monitoring and HA. In the case of Symantec, a simplified version of Veritas Cluster Server (VCS) is used to enable application availability monitoring, including responding to issues. Note that this is not a multi-node clustering solution like VCS itself, but a single node solution.

Symantec ApplicationHA, as it is called, is triggered to get the application up and running again by restarting it. Symantec's ApplicationHA is aware of dependencies and knows in which order services should be started or stopped. If, however, this fails for a certain number of times (a configurable option within ApplicationHA), VMware HA will be requested to take action. This action will be a restart of the virtual machine.

Although Application Monitoring is relatively new and there are only a few partners currently exploring the capabilities, in our opinion, it does add a whole new level of resiliency. Your in-house development team could leverage functionality offered through the API, or you could use a solution developed by one of VMware’s partners. We have tested ApplicationHA by Symantec and personally feel it is the missing link. It enables you as System Admin to integrate your virtualization layer with your application layer. It ensures you as a System Admin that services which are protected are restarted in the correct order and it avoids the common pitfalls associated with restarts and maintenance. Note that VMware also introduced an "Application Monitoring" solution which was based on Hyperic technology. This product, however, has been deprecated and as such will not be discussed in this publication.

Application Awareness API

The Application Awareness API is open for everyone. We feel that this is not the place to do a full deepdive on how to use it, but we do want to discuss it briefly.

The Application Awareness API allows for anyone to talk to it, including scripts, which makes the possibilities endless. Currently there are 6 functions defined:

- VMGuestAppMonitor_Enable(): Enables Monitoring
- VMGuestAppMonitor_MarkActive(): Call every 30 seconds to mark application as active
- VMGuestAppMonitor_Disable(): Disable Monitoring
- VMGuestAppMonitor_IsEnabled(): Returns status of Monitoring
- VMGuestAppMonitor_GetAppStatus(): Returns the current application status recorded for the application
- VMGuestAppMonitor_Free(): Frees the result of the VMGuestAppMonitor_GetAppStatus() call

These functions can be used by your development team; however, App Monitoring also offers a new executable. This allows you to use the functionality App Monitoring offers without the need to compile a full binary. This new command, vmware-appmonitor.exe, takes the following arguments, which are not coincidentally similar to the functions:

- enable
- disable
- markActive
- isEnabled
- getAppStatus

When running the command vmware-appmonitor.exe, which can be found under "VMware-GuestAppMonitorSDK\bin\win32\", the following output is presented:

Usage: vmware-appmonitor.exe {enable | disable | markActive | isEnabled | getAppStatus}

As shown, there are multiple ways of leveraging Application Monitoring to enhance resiliency on an application level.
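A script could wrap the executable along the following lines. This is a minimal sketch under several assumptions: the path below is the relative SDK location quoted above and will need adjusting to where the SDK is unpacked, the `is_healthy` callback is a hypothetical application check supplied by the caller, and nothing is assumed about the tool's output or exit codes beyond the five arguments in its usage line.

```python
import subprocess
import time

# Assumed location; adjust to wherever the Guest SDK is unpacked in the guest.
APPMONITOR = r"VMware-GuestAppMonitorSDK\bin\win32\vmware-appmonitor.exe"
VALID_COMMANDS = ("enable", "disable", "markActive", "isEnabled", "getAppStatus")


def appmonitor(command, run=subprocess.run):
    """Invoke vmware-appmonitor.exe with one of its documented arguments."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return run([APPMONITOR, command], capture_output=True, text=True)


def heartbeat_loop(is_healthy, beats, interval_s=30, run=subprocess.run, sleep=time.sleep):
    """Enable monitoring, mark the application active while healthy, then disable."""
    appmonitor("enable", run)
    try:
        for _ in range(beats):
            if is_healthy():
                appmonitor("markActive", run)
            # Skipping markActive lets HA notice the missing application heartbeat.
            sleep(interval_s)
    finally:
        appmonitor("disable", run)
```

Calling `heartbeat_loop(my_check, beats=10)` would send up to ten heartbeats, 30 seconds apart, for as long as `my_check()` keeps returning True. The `run` and `sleep` parameters exist only so the loop can be exercised without the executable present.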

vSphere HA and...

Now that you know how HA works inside out, we want to explain the different integration points between HA, DRS and SDRS.

HA and Storage DRS

vSphere HA informs Storage DRS when a failure has occurred. This is to prevent the relocation of any HA protected virtual machine, meaning a virtual machine that was powered on, but which failed, and has not been restarted yet due to there being insufficient capacity available. Further, Storage DRS is not allowed to Storage vMotion a virtual machine that is owned by a master other than the one vCenter Server is talking to. This is because in such a situation, HA would not be able to reprotect the virtual machine until the master to which vCenter Server is talking is able to lock the datastore again.

Storage vMotion and HA

If a virtual machine fails while it is in the process of being Storage vMotioned and needs to be restarted by HA, the restart process is not started until vCenter informs the master that the Storage vMotion task has completed or has been rolled back. If the source host fails, however, HA will restart the virtual machine as part of the normal workflow. During a Storage vMotion, the HA agent on the host on which the Storage vMotion was initiated masks the failure state of the virtual machine. If, for whatever reason, vCenter is unavailable, the masking will time out after 15 minutes to ensure that the virtual machine will be restarted.

Also note that when a Storage vMotion completes, vCenter will report the virtual machine as unprotected until the master reports it protected again under the new path.

HA and DRS

HA integrates on multiple levels with DRS. It is a huge improvement and it is something that we wanted to stress as it has changed both the behavior and the reliability of HA.

HA and Resource Fragmentation

When a failover is initiated, HA will first check whether there are resources available on the destination hosts for the failover. If, for instance, a particular virtual machine has a very large reservation and the Admission Control Policy is based on a percentage, it could happen that resources are fragmented across multiple hosts. (For more details on this scenario, see Chapter 7.) HA will ask DRS to defragment the resources to accommodate this virtual machine’s resource requirements. Although HA will request a defragmentation of resources, a guarantee cannot be given. As such, even with this additional integration, you should still be cautious when it comes to resource fragmentation.

Flattened Shares

When custom shares have been set on a virtual machine, an issue can arise when that VM needs to be restarted. When HA fails over a virtual machine, it will power-on the virtual machine in the Root Resource Pool. However, the virtual machine’s shares were those configured by a user for it, and not scaled for it being parented under the Root Resource Pool. This could cause the virtual machine to receive either too many or too few resources relative to its entitlement.

A scenario where and when this can occur would be the following:

VM1 has 1000 shares and Resource Pool A has 2000 shares. However, Resource Pool A has 2 virtual machines and both virtual machines will have 50% of those “2000” shares. The following diagram depicts this scenario:

Figure 55 - Flatten shares starting point

When the host fails, both VM2 and VM3 will end up on the same level as VM1, the Root Resource Pool. However, as a custom shares value of 10,000 was specified on both VM2 and VM3, they will completely blow away VM1 in times of contention. This is depicted in the following diagram:

Figure 56 - Flatten shares host failure

This situation would persist until the next invocation of DRS re-parents the virtual machines VM2 and VM3 to their original Resource Pool. To address this issue HA calculates a flattened share value before the virtual machine is failed over. This flattening process ensures that the virtual machine will get the resources it would have received if it had failed over to the correct Resource Pool. This scenario is depicted in the following diagram. Note that both VM2 and VM3 are placed under the Root Resource Pool with a shares value of 1000.
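The numbers in this scenario can be reproduced with a simple proportional model. To be clear, this is an illustrative assumption about how a flattened value could be derived for a single-level resource pool. The text does not describe HA's actual algorithm, and the function name is made up.

```python
def flattened_shares(pool_shares, vm_shares_in_pool):
    """Scale each VM's custom shares to its slice of the parent pool's shares."""
    total = sum(vm_shares_in_pool.values())
    return {vm: pool_shares * shares / total for vm, shares in vm_shares_in_pool.items()}


# Resource Pool A holds 2000 shares; VM2 and VM3 each have 10,000 custom shares.
flat = flattened_shares(2000, {"VM2": 10_000, "VM3": 10_000})
print(flat)  # {'VM2': 1000.0, 'VM3': 1000.0}

# Share of resources VM1 (1000 shares) gets under contention in the root pool:
unflattened = 1000 / (1000 + 10_000 + 10_000)     # roughly 5%
flattened = 1000 / (1000 + sum(flat.values()))    # one third
print(round(unflattened, 3), round(flattened, 3))  # 0.048 0.333
```

With flattening, VM1 keeps the one-third entitlement it had while VM2 and VM3 were still inside Resource Pool A.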

Figure 57 - Flatten shares after host failure before DRS invocation

Of course, when DRS is invoked, both VM2 and VM3 will be re-parented under Resource Pool A and will again receive the number of shares they had been originally assigned.

DPM and HA

If DPM is enabled and resources are scarce during an HA failover, HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.

If HA strict Admission Control is enabled (default), DPM will maintain the necessary level of powered-on capacity to meet the configured HA failover capacity. HA places a constraint to prevent DPM from powering down too many ESXi hosts if it would violate the Admission Control Policy.

When HA admission control is disabled, HA will prevent DPM from powering off all but one host in the cluster. A minimum of two hosts are kept up regardless of the resource consumption. The reason this behavior has changed is that it is impossible to restart virtual machines when the only host left in the cluster has just failed.

In a failure scenario, if HA cannot restart some virtual machines, it asks DRS/DPM to try to defragment resources or bring hosts out of standby to allow HA another opportunity to restart the virtual machines. Another change is that DRS/DPM will power-on or keep on hosts needed to address cluster constraints, even if those hosts are lightly utilized. Once again, in order for this to be successful DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations and allow the restart of virtual machines to occur.

Use Case: Stretched Cluster

In this part we will be discussing a specific infrastructure architecture and how HA, DRS and Storage DRS can be leveraged and should be deployed to increase availability. Be it availability of your workload or the resources provided to your workload, we will guide you through some of the design considerations and decision points along the way. Of course, a full understanding of your environment will be required in order to make appropriate decisions regarding specific implementation details. Nevertheless, we hope that this section will provide a proper understanding of how certain features play together and how these can be used to meet the requirements of your environment and build the desired architecture.

Scenario

The scenario we have chosen is a stretched cluster, also referred to as a VMware vSphere Metro Storage Cluster solution. We have chosen this specific scenario as it allows us to explain a multitude of design and architectural considerations. Although this scenario has been tested and validated in our lab, every environment is unique and our recommendations are based on our experience and your mileage may vary.

A VMware vSphere Metro Storage Cluster (vMSC) configuration is a VMware vSphere certified solution that combines synchronous replication with storage array based clustering. These solutions are typically deployed in environments where the distance between datacenters is limited, often metropolitan or campus environments.

The primary benefit of a stretched cluster model is to enable fully active and workload-balanced datacenters to be used to their full potential. Many customers find this architecture attractive due to the capability of migrating virtual machines with vMotion and Storage vMotion between sites. This enables on-demand and non-intrusive cross-site mobility of workloads. The capability of a stretched cluster to provide this active balancing of resources should always be the primary design and implementation goal.

Stretched cluster solutions offer the benefit of:

- Workload mobility
- Cross-site automated load balancing
- Enhanced downtime avoidance
- Disaster avoidance

Technical requirements and constraints

Due to the technical constraints of an online migration of VMs, the following specific requirements, which are listed in the VMware Compatibility Guide, must be met prior to consideration of a stretched cluster implementation:

- Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
- The maximum supported network latency between sites for the VMware ESXi management networks is 10 ms round-trip time (RTT).
- vMotion, and Storage vMotion, supports a maximum of 150 ms latency as of vSphere 6.0, but this is not intended for stretched clustering usage.
- The maximum supported latency for synchronous storage replication links is 10 ms RTT. Refer to documentation from the storage vendor because the maximum tolerated latency is lower in most cases. The most commonly supported maximum RTT is 5 ms.
- The ESXi vSphere vMotion network has a redundant network link minimum of 250 Mbps.
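The numeric limits in the list above lend themselves to a simple pre-flight check. The sketch below is illustrative only: the class and function names are hypothetical, the measurements have to come from your own tooling, and the optional vendor limit is a parameter because the text notes that storage vendors often tolerate less than 10 ms.

```python
from dataclasses import dataclass

MAX_MANAGEMENT_RTT_MS = 10.0
MAX_STORAGE_RTT_MS = 10.0
MIN_VMOTION_MBPS = 250.0


@dataclass
class InterSiteLinks:
    management_rtt_ms: float
    storage_replication_rtt_ms: float
    vmotion_bandwidth_mbps: float
    vmotion_link_redundant: bool


def check_stretched_cluster_links(links, vendor_storage_rtt_ms=MAX_STORAGE_RTT_MS):
    """Return a list of violated requirements; an empty list means all checks pass."""
    storage_limit = min(MAX_STORAGE_RTT_MS, vendor_storage_rtt_ms)
    problems = []
    if links.management_rtt_ms > MAX_MANAGEMENT_RTT_MS:
        problems.append("management network RTT exceeds 10 ms")
    if links.storage_replication_rtt_ms > storage_limit:
        problems.append(f"storage replication RTT exceeds {storage_limit} ms")
    if links.vmotion_bandwidth_mbps < MIN_VMOTION_MBPS:
        problems.append("vMotion network below 250 Mbps")
    if not links.vmotion_link_redundant:
        problems.append("vMotion network link is not redundant")
    return problems


links = InterSiteLinks(4.0, 7.0, 1000.0, True)
print(check_stretched_cluster_links(links))                           # []
print(check_stretched_cluster_links(links, vendor_storage_rtt_ms=5))  # one violation
```

The second call shows how a stricter vendor limit of 5 ms turns an otherwise acceptable 7 ms replication link into a violation.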

The storage requirements are slightly more complex. A vSphere Metro Storage Cluster requires what is in effect a single storage subsystem that spans both sites. In this design, a given datastore must be accessible (that is, be able to be read and be written to) simultaneously from both sites. Further, when problems occur, the ESXi hosts must be able to continue to access datastores from either array transparently and with no impact to ongoing storage operations.

This precludes traditional synchronous replication solutions because they create a primary-secondary relationship between the active (primary) LUN where data is being accessed and the secondary LUN that is receiving replication. To access the secondary LUN, replication is stopped, or reversed, and the LUN is made visible to hosts. This “promoted” secondary LUN has a completely different LUN ID and is essentially a newly available copy of a former primary LUN. This type of solution works for traditional disaster recovery-type configurations because it is expected that VMs must be started up on the secondary site. The vMSC configuration requires simultaneous, uninterrupted access to enable live migration of running VMs between sites.

The storage subsystem for a vMSC must be able to be read from and written to at both locations simultaneously. All disk writes are committed synchronously at both locations to ensure that data is always consistent regardless of the location from which it is being read. This storage architecture requires significant bandwidth and very low latency between the sites in the cluster. Increased distances or latencies cause delays in writing to disk and a dramatic decline in performance. They also preclude successful vMotion migration between cluster nodes that reside in different locations.

Uniform versus Non-Uniform

vMSC solutions are classified into two distinct categories. These categories are based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions because this influences design considerations. The following two main categories are as described on the VMware Hardware Compatibility List:

- Uniform host access configuration: ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across a distance.
- Nonuniform host access configuration: ESXi hosts at each site are connected only to storage node(s) at the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.

The following in-depth descriptions of both categories clearly define them from architectural and implementation perspectives.

With uniform host access configuration, hosts in datacenter A and datacenter B have access to the storage systems in both datacenters. In effect, the storage area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster software is an example of uniform storage. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array in datacenter A, all ESXi hosts access that datastore via the array in datacenter A. For ESXi hosts in datacenter A, this is local access. ESXi hosts in datacenter B that are running VMs hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage or an operator-controlled shift of control of the LUN to datacenter B, all ESXi hosts continue to detect the identical LUN being presented, but it is now being accessed via the array in datacenter B.

The ideal situation is one in which VMs access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters to avoid the performance impact of reads traversing the interconnect.

The notion of “site affinity” for a VM is dictated by the read/write copy of the datastore. “Site affinity” is also sometimes referred to as “site bias” or “LUN locality.” This means that when a VM has site affinity with datacenter A, its read/write copy of the datastore is located in datacenter A. This is explained in more detail in the “vSphere DRS” subsection of this section.

Figure 58 - Uniform Configuration

With nonuniform host access configuration, hosts in datacenter A have access only to the array within the local datacenter; the array, as well as its peer array in the opposite datacenter, is responsible for providing access to datastores in one datacenter to ESXi hosts in the opposite datacenter. EMC VPLEX is an example of a storage system that can be deployed as a nonuniform storage cluster, although it can also be configured in a uniform manner. VPLEX provides the concept of a “virtual LUN,” which enables ESXi hosts in each datacenter to read and write to the same datastore or LUN. VPLEX technology maintains the cache state on each array so ESXi hosts in either datacenter detect the LUN as local. EMC calls this solution “write anywhere.” Even when two VMs reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either VM. A key point with this configuration is that each LUN or datastore has “site affinity,” also sometimes referred to as “site bias” or “LUN locality.” In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore will be the only one remaining with read/write access to it. This prevents any data corruption in case of a failure scenario.

Figure 59 - Nonuniform Configuration

Our examples use uniform storage because these configurations are currently the most commonly deployed. Many of the design considerations, however, also apply to nonuniform configurations. We point out exceptions when this is not the case.

Scenario Architecture

In this section we describe the architecture deployed for this scenario. We also discuss some of the basic configuration and behavior of the various vSphere features. For an in-depth explanation of each feature, refer to the HA and DRS sections of this book. We make specific recommendations based on VMware best practices and provide operational guidance where applicable. The failure scenarios explain how these practices prevent or limit downtime.

Infrastructure

The described infrastructure consists of a single vSphere 6.0 cluster with four ESXi 6.0 hosts. These hosts are managed by a single vCenter Server 6.0 instance. The first site is called Frimley; the second site is called Bluefin. The network between Frimley data center and Bluefin data center is a stretched layer 2 network. There is a minimal distance between the sites, as is typical in campus cluster scenarios.

Each site has two ESXi hosts, and the vCenter Server instance is configured with vSphere DRS affinity to the hosts in Bluefin data center. In a stretched cluster environment, only a single vCenter Server instance is used. This is different from a traditional VMware Site Recovery Manager™ configuration in which a dual vCenter Server configuration is required. The configuration of VM-to-host affinity rules is discussed in more detail in the “vSphere DRS” subsection of this document.

Eight LUNs are depicted in the diagram below. Four of these are accessed through the virtual IP address active on the iSCSI storage system in the Frimley data center; four are accessed through the virtual IP address active on the iSCSI storage system in the Bluefin data center.

Figure 60 - Test Environment

Location  Hosts           Datastores  Local Isolation Address
Bluefin   172.16.103.184  Bluefin01   172.16.103.10
          172.16.103.185  Bluefin02   n/a
                          Bluefin03   n/a
                          Bluefin04   n/a
Frimley   172.16.103.182  Frimley01   172.16.103.11
          172.16.103.183  Frimley02   n/a
                          Frimley03   n/a
                          Frimley04   n/a

The vSphere cluster is connected to a stretched storage system in a fabric configuration with a uniform device access model. This means that every host in the cluster is connected to both storage heads. Each of the heads is connected to two switches, which are connected to two similar switches in the secondary location. For any given LUN, one of the two storage heads presents the LUN as read/write via iSCSI. The other storage head maintains the replicated, read-only copy that is effectively hidden from the ESXi hosts.

vSphere Configuration

Our focus in this section is on vSphere HA, vSphere DRS, and vSphere Storage DRS in relation to stretched cluster environments. Design and operational considerations regarding vSphere are commonly overlooked and underestimated. Much emphasis has traditionally been placed on the storage layer, but little attention has been applied to how workloads are provisioned and managed.

The key drivers for using a stretched cluster are workload balance and disaster avoidance. How do we ensure that our environment is properly balanced without impacting availability or severely increasing the operational expenditure? How do we build the requirements into our provisioning process and validate periodically that we still meet them? Ignoring the requirements makes the environment confusing to administer and less predictable during the various failure scenarios for which it should be of help.

Each of these three vSphere features has very specific configuration requirements and can enhance environment resiliency and workload availability. Architectural recommendations based on our findings during the testing of the various failure scenarios are given throughout this section.

vSphere HA

The environment has four hosts and a uniform stretched storage solution. A full site failure is one scenario that must be taken into account in a resilient architecture. VMware recommends enabling vSphere HA admission control. Workload availability is the primary driver for most stretched cluster environments, so providing sufficient capacity for a full site failure is recommended. The hosts are equally divided across both sites. To ensure that all workloads can be restarted by vSphere HA on just one site, configuring the admission control policy to 50 percent for both memory and CPU is recommended.

VMware recommends using a percentage-based policy because it offers the most flexibility and reduces operational overhead. Even when new hosts are introduced to the environment, there is no need to change the percentage and no risk of a skewed consolidation ratio due to possible use of VM-level reservations.
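The 50 percent figure follows directly from the host distribution. The sketch below is only an illustration of that arithmetic, not a VMware tool: the function name is hypothetical, and it assumes that every host contributes the same CPU and memory capacity.

```python
def site_failover_percentage(hosts_per_site):
    """Return the share of cluster capacity (in percent) lost when the largest site fails.

    Assumption: all hosts have identical CPU and memory capacity.
    """
    total_hosts = sum(hosts_per_site.values())
    if total_hosts == 0:
        raise ValueError("cluster has no hosts")
    largest_site = max(hosts_per_site.values())
    # Ceiling division so the reservation never undershoots the lost capacity.
    return -(-100 * largest_site // total_hosts)


# The scenario in this section: two hosts in each site.
print(site_failover_percentage({"Frimley": 2, "Bluefin": 2}))  # 50
```

With an even split the result stays at 50 percent when hosts are added in pairs, which matches the remark that the percentage does not need to change as the cluster grows.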

The screenshot below shows a vSphere HA cluster configured with admission control enabled and with the percentage-based policy set to 50 percent.

Figure 61 - vSphere HA Configuration

vSphere HA uses heartbeat mechanisms to validate the state of a host. There are two such mechanisms: network heartbeating and datastore heartbeating. Network heartbeating is the primary mechanism for vSphere HA to validate availability of the hosts. Datastore heartbeating is the secondary mechanism used by vSphere HA; it determines the exact state of the host after network heartbeating has failed.

If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect if it is merely isolated from its master node or completely isolated from the network. It does this by pinging the default gateway. In addition to this mechanism, one or more isolation addresses can be specified manually to enhance reliability of isolation validation. VMware recommends specifying a minimum of two additional isolation addresses, with each address site local.

In our scenario, one of these addresses physically resides in the Frimley data center; the other physically resides in the Bluefin data center. This enables vSphere HA validation for complete network isolation, even in case of a connection failure between sites. The next screenshot shows an example of how to configure multiple isolation addresses. The vSphere HA advanced setting used is das.isolationaddress. More details on how to configure this can be found in VMware Knowledge Base article 1002117.
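As a rough illustration, the two site-local addresses from the test environment table can be turned into advanced-setting key/value pairs. The helper below is hypothetical, and the numbered key form (das.isolationaddress0, das.isolationaddress1) is an assumption about how multiple addresses are entered; check the Knowledge Base article for the exact keys before applying anything.

```python
import ipaddress


def isolation_address_settings(address_by_site):
    """Build one numbered isolation-address setting per site.

    The numbered suffix on the key is an assumption, not taken from the text.
    """
    settings = {}
    for index, site in enumerate(sorted(address_by_site)):
        address = address_by_site[site]
        # Raises ValueError if the address is malformed.
        ipaddress.ip_address(address)
        settings[f"das.isolationaddress{index}"] = address
    return settings


settings = isolation_address_settings(
    {"Bluefin": "172.16.103.10", "Frimley": "172.16.103.11"}
)
for key, value in settings.items():
    print(key, value)
```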

The minimum number of heartbeat datastores is two and the maximum is five. For vSphere HA datastore heartbeating to function correctly in any type of failure scenario, VMware recommends increasing the number of heartbeat datastores from two to four in a stretched cluster environment. This provides full redundancy for both data center locations. Defining four specific datastores as preferred heartbeat datastores is also recommended, selecting two from one site and two from the other. This enables vSphere HA to heartbeat to a datastore even in the case of a connection failure between sites. Subsequently, it enables vSphere HA to determine the state of a host in any scenario.

Adding an advanced setting called das.heartbeatDsPerHost can increase the number of heartbeat datastores. This is shown in the screenshot below.

Figure 62 - vSphere HA Advanced Settings

To designate specific datastores as heartbeat devices, VMware recommends using Select any of the cluster datastores taking into account my preferences. This enables vSphere HA to select any other datastore if the four designated datastores that have been manually selected become unavailable. VMware recommends selecting two datastores in each location to ensure that datastores are available at each site in the case of a site partition.
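The "two per site, four in total" recommendation can be expressed as a small selection routine. This is a sketch only: the function is hypothetical, it simply takes the first two datastores per site by name, and it uses the datastore names from the test environment.

```python
def pick_heartbeat_datastores(datastores_by_site, per_site=2):
    """Pick `per_site` preferred heartbeat datastores from every site."""
    chosen = []
    for site in sorted(datastores_by_site):
        candidates = sorted(datastores_by_site[site])
        if len(candidates) < per_site:
            raise ValueError(f"site {site} has fewer than {per_site} datastores")
        chosen.extend(candidates[:per_site])
    # The text gives two as the minimum and five as the maximum.
    if not 2 <= len(chosen) <= 5:
        raise ValueError("heartbeat datastore count must be between 2 and 5")
    return chosen


preferred = pick_heartbeat_datastores({
    "Bluefin": ["Bluefin01", "Bluefin02", "Bluefin03", "Bluefin04"],
    "Frimley": ["Frimley01", "Frimley02", "Frimley03", "Frimley04"],
})
print(preferred, "das.heartbeatDsPerHost =", len(preferred))
```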

Figure 63 - Datastore Heartbeating

Permanent Device Loss and All Paths Down Scenarios

As of vSphere 6.0, enhancements have been introduced to enable an automated failover of VMs residing on a datastore that has either an all paths down (APD) or a permanent device loss (PDL) condition. PDL is applicable only to block storage devices.

A PDL condition, as is discussed in one of our failure scenarios, is a condition that is communicated by the array controller to the ESXi host via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition is communicated by the array is when a LUN is set offline. This condition is used in nonuniform models during a failure scenario to ensure that the ESXi host takes appropriate action when access to a LUN is revoked. When a full storage failure occurs, it is impossible to generate the PDL condition because there is no communication possible between the array and the ESXi host. This state is identified by the ESXi host as an APD condition. Another example of an APD condition is where the storage network has failed completely. In this scenario, the ESXi host also does not detect what has happened with the storage and declares an APD.

To enable vSphere HA to respond to both an APD and a PDL condition, vSphere HA must be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). After the creation of the cluster, VMCP must be enabled, as is shown below.

Figure 64 - VM Component Protection

The configuration screen can be found as follows:

1. Log in to VMware vSphere Web Client.
2. Click Hosts and Clusters.
3. Click the cluster object.
4. Click the Manage tab.
5. Click vSphere HA and then Edit.
6. Select Protect against Storage Connectivity Loss.
7. Select individual functionality, as described in the following, by opening Failure conditions and VM response.

The configuration for PDL is basic. In the Failure conditions and VM response section, the response following detection of a PDL condition can be configured. VMware recommends setting this to Power off and restart VMs. When this condition is detected, a VM is restarted instantly on a healthy host within the vSphere HA cluster.

For an APD scenario, configuration must occur in the same section, as is shown in the screenshot below. Besides defining the response to an APD condition, it is also possible to alter the timing and to configure the behavior when the failure is restored before the APD timeout has passed.

Figure 65 - VMCP Detailed Configuration

When an APD condition is detected, a timer is started. After 140 seconds, the APD condition is officially declared and the device is marked as APD timeout. When 140 seconds have passed, vSphere HA starts counting. The default vSphere HA timeout is 3 minutes. When the 3 minutes have passed, vSphere HA restarts the impacted VMs, but VMCP can be configured to respond differently if preferred. VMware recommends configuring it to Power off and restart VMs (conservative).
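Taken together, the two timers mean that with default values roughly 320 seconds (140 + 180) pass between the first detection of an APD condition and the VMCP response. A minimal sketch of that timeline follows; the constant and function names are illustrative only, and the values are the defaults quoted above.

```python
APD_TIMEOUT_SECONDS = 140      # APD condition is declared after this period
VMCP_DELAY_SECONDS = 3 * 60    # default vSphere HA delay that follows the APD timeout


def apd_phase(elapsed_seconds):
    """Describe where a datastore is on the APD timeline after `elapsed_seconds`."""
    if elapsed_seconds < APD_TIMEOUT_SECONDS:
        return "APD detected, timeout not yet declared"
    if elapsed_seconds < APD_TIMEOUT_SECONDS + VMCP_DELAY_SECONDS:
        return "APD timeout declared, vSphere HA delay running"
    return "VMCP response is triggered"


for seconds in (60, 200, 320):
    print(seconds, apd_phase(seconds))
```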

Conservative refers to the likelihood that vSphere HA will be able to restart VMs. When set to conservative, vSphere HA restarts the VM that is impacted by the APD only if it detects that a host in the cluster can access the datastore on which the VM resides. In the case of aggressive, vSphere HA attempts to restart the VM even if it doesn’t detect the state of the other hosts. This can lead to a situation in which a VM is not restarted because there is no host that has access to the datastore on which the VM is located.
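The difference between the two policies comes down to how an unknown state is treated. The sketch below models that decision; it is not VMware's implementation. The function name is hypothetical, `None` stands for "state of the other hosts unknown", and the choice to leave the VM running when it is known that no host has access is an assumption that the text does not spell out.

```python
def vmcp_terminates_vm(policy, other_host_has_access):
    """Decide whether the impacted VM is powered off so it can be restarted elsewhere.

    other_host_has_access: True, False, or None when the state is unknown.
    """
    if policy not in ("conservative", "aggressive"):
        raise ValueError(f"unknown policy: {policy}")
    if other_host_has_access is True:
        return True
    if other_host_has_access is None:
        # Only the aggressive policy acts without knowing the state of other hosts.
        return policy == "aggressive"
    # Assumption: with no reachable datastore anywhere, powering off gains nothing.
    return False


print(vmcp_terminates_vm("conservative", None))
print(vmcp_terminates_vm("aggressive", None))
```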

If the APD is lifted and access to the storage is restored before the timeout has passed, vSphere HA does not unnecessarily restart the VM unless explicitly configured to do so. If a response is desired even when the environment has recovered from the APD condition, Response for APD recovery after APD timeout can be configured to Reset VMs. VMware recommends leaving this setting disabled.

With the release of vSphere 5.5, an advanced setting called Disk.AutoremoveOnPDL was introduced. It is enabled by default. This functionality enables vSphere to remove devices that are marked as PDL and helps prevent reaching, for example, the 256-device limit for an ESXi host. However, if the PDL scenario is solved and the device returns, the ESXi host’s storage system must be rescanned before this device appears. VMware recommends disabling Disk.AutoremoveOnPDL in the host advanced settings by setting it to 0.

Figure 66 - Disk.AutoremoveOnPDL

vSphere DRS

vSphere DRS is used in many environments to distribute load within a cluster. It offers many other features that can be very helpful in stretched cluster environments. VMware recommends enabling vSphere DRS to facilitate load balancing across hosts in the cluster. The vSphere DRS load-balancing calculation is based on CPU and memory use. Care should be taken with regard to both storage and networking resources as well as to traffic flow. To avoid storage and network traffic overhead in a stretched cluster environment, VMware recommends implementing vSphere DRS affinity rules to enable a logical separation of VMs. This subsequently helps improve availability. For VMs that are responsible for infrastructure services, such as Microsoft Active Directory and DNS, it assists by ensuring separation of these services across sites.

vSphere DRS affinity rules also help prevent unnecessary downtime, and storage and network traffic flow overhead, by enforcing preferred site affinity. VMware recommends aligning vSphere VM-to-host affinity rules with the storage configuration; that is, setting VM-to-host affinity rules with a preference that a VM run on a host at the same site as the array that is configured as the primary read/write node for a given datastore. For example, in our test configuration, VMs stored on the Frimley01 datastore are set with VM-to-host affinity with a preference for hosts in the Frimley data center. This ensures that in the case of a network connection failure between sites, VMs do not lose connection with the storage system that is primary for their datastore. VM-to-host affinity rules aim to ensure that VMs stay local to the storage primary for that datastore. This coincidentally also results in all read I/Os staying local.

NOTE: Different storage vendors use different terminology to describe the relationship of a LUN to a particular array or controller. For the purposes of this document, we use the generic term “storage site affinity,” which refers to the preferred location for access to a given LUN.

VMware recommends implementing “should rules” because vSphere HA is allowed to violate these in the case of a full site failure. Availability of services should always prevail. In the case of “must rules,” vSphere HA does not violate the rule set, and this can potentially lead to service outages. In the scenario where a full data center fails, “must rules” do not allow vSphere HA to restart the VMs, because they do not have the required affinity to start on the hosts in the other data center. This necessitates the recommendation to implement “should rules.” vSphere DRS communicates these rules to vSphere HA, and these are stored in a “compatibility list” governing allowed start-up. If a single host fails, VM-to-host “should rules” are ignored by default. VMware recommends configuring vSphere HA rule settings to respect VM-to-host affinity rules where possible. With a full site failure, vSphere HA can restart the VMs on hosts that violate the rules. Availability takes precedence in this scenario.
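The practical difference between “should rules” and “must rules” during a restart can be modeled in a few lines. This is a simplified sketch with hypothetical names, assuming the vSphere HA rule setting to respect VM-to-host affinity rules is enabled; it ignores resource checks and everything else the real placement logic considers.

```python
def restart_candidates(rule_type, preferred_hosts, available_hosts):
    """Return the hosts on which a VM may be restarted under a VM-to-host rule.

    rule_type: "should" or "must".
    """
    if rule_type not in ("should", "must"):
        raise ValueError(f"unknown rule type: {rule_type}")
    preferred = [host for host in available_hosts if host in preferred_hosts]
    if preferred:
        return preferred              # the rule can be respected
    if rule_type == "must":
        return []                     # rule cannot be violated: no restart
    return list(available_hosts)      # "should" rule: availability wins


frimley = {"172.16.103.182", "172.16.103.183"}
bluefin = ["172.16.103.184", "172.16.103.185"]

# Full failure of the Frimley site: only the Bluefin hosts remain.
print(restart_candidates("should", frimley, bluefin))
print(restart_candidates("must", frimley, bluefin))
```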

Figure 67 - HA Affinity Rule Settings

Under certain circumstances, such as massive host saturation coupled with aggressive recommendation settings, vSphere DRS can also violate “should rules.” Although this is very rare, we recommend monitoring for violation of these rules because a violation might impact availability and workload performance.

VMware recommends manually defining “sites” by creating a group of hosts that belong to a site and then adding VMs to these sites based on the affinity of the datastore on which they are provisioned. In our scenario, only a limited number of VMs were provisioned. VMware recommends automating the process of defining site affinity by using tools such as VMware vCenter Orchestrator™ or VMware vSphere PowerCLI™. If automating the process is not an option, use of a generic naming convention is recommended to simplify the creation of these groups. VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to the group with the correct site affinity.
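The periodic validation of group membership is easy to script once an inventory export exists. The sketch below works on plain dictionaries rather than a live vCenter Server connection; the function, the VM names, and the data layout are all hypothetical.

```python
def misgrouped_vms(vm_datastore, datastore_site, vm_group_site):
    """Return VMs whose VM group does not match the site affinity of their datastore.

    The result maps each offending VM to (current group site, expected site).
    """
    issues = {}
    for vm, datastore in sorted(vm_datastore.items()):
        expected = datastore_site[datastore]
        current = vm_group_site.get(vm)   # None when the VM is in no group
        if current != expected:
            issues[vm] = (current, expected)
    return issues


issues = misgrouped_vms(
    vm_datastore={"vm-a": "Frimley01", "vm-b": "Bluefin02", "vm-c": "Frimley03"},
    datastore_site={"Frimley01": "Frimley", "Frimley03": "Frimley", "Bluefin02": "Bluefin"},
    vm_group_site={"vm-a": "Frimley", "vm-b": "Frimley"},
)
print(issues)
```

In this example vm-b sits in the wrong group and vm-c is not in any group, so both are reported.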

The following screenshots depict the configuration used for our scenario. In the first screenshot, all VMs that should remain local to the Bluefin data center are added to the Bluefin VM group.

Figure 68 - VM Group

Next, a Bluefin host group is created that contains all hosts residing in this location.

Figure 69 - Host Group

Next, a new rule is created that is defined as a “should run on rule.” It links the host group and the VM group for the Bluefin location.

Figure 70 - Rule Definition

This should be done for both locations, which should result in two rules.

Figure 71 - VM/Host Rules

Correcting Affinity Rule Violation

vSphere DRS assigns a high priority to correcting affinity rule violations. During invocation, the primary goal of vSphere DRS is to correct any violations and generate recommendations to migrate VMs to the hosts listed in the host group. These migrations have a higher priority than load-balancing moves and are started before them.

vSphere DRS is invoked every 5 minutes by default, but it is also triggered if the cluster detects changes. For instance, when a host reconnects to the cluster, vSphere DRS is invoked and generates recommendations to correct the violation. Our testing has shown that vSphere DRS generates recommendations to correct affinity rule violations within 30 seconds after a host reconnects to the cluster. vSphere DRS is limited by the overall capacity of the vSphere vMotion network, so it might take multiple invocations before all affinity rule violations are corrected.

vSphere Storage DRS

vSphere Storage DRS enables aggregation of datastores to a single unit of consumption from an administrative perspective, and it balances VM disks when defined thresholds are exceeded. It ensures that sufficient disk resources are available to a workload. VMware recommends enabling vSphere Storage DRS with I/O Metric disabled. The use of I/O Metric or VMware vSphere Storage I/O Control is not supported in a vSphere Metro Storage Cluster (vMSC) configuration, as is described in VMware Knowledge Base article 2042596.

Figure 72 - Storage DRS Configuration

vSphere Storage DRS uses vSphere Storage vMotion to migrate VM disks between datastores within a datastore cluster. Because the underlying stretched storage systems use synchronous replication, a migration or series of migrations has an impact on replication traffic and might cause the VMs to become temporarily unavailable due to contention for network resources during the movement of disks. Migration to random datastores can also potentially lead to additional I/O latency in uniform host access configurations if VMs are not migrated along with their virtual disks. For example, if a VM residing on a host at site A has its disk migrated to a datastore at site B, it continues operating but with potentially degraded performance. The VM’s disk reads are now subject to the increased latency associated with reading from the virtual iSCSI IP at site B. Reads are subject to intersite latency rather than being satisfied by a local target.

To control if and when migrations occur, VMware recommends configuring vSphere Storage DRS in manual mode. This enables human validation per recommendation and allows recommendations to be applied during off-peak hours, while retaining the operational benefit and efficiency of the initial placement functionality.

VMware recommends creating datastore clusters based on the storage configuration with respect to storage site affinity. Datastores with a site affinity for site A should not be mixed in datastore clusters with datastores with a site affinity for site B. This enables operational consistency and eases the creation and ongoing management of vSphere DRS VM-to-host affinity rules. Ensure that all vSphere DRS VM-to-host affinity rules are updated accordingly when VMs are migrated via vSphere Storage vMotion between datastore clusters and when crossing defined storage site affinity boundaries. To simplify the provisioning process, VMware recommends aligning naming conventions for datastore clusters and VM-to-host affinity rules.
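The rule that a datastore cluster must not mix site affinities can be checked mechanically. The sketch below is illustrative only: the function and the datastore cluster names are hypothetical, and the site of each datastore is supplied as plain data rather than read from the storage system.

```python
def mixed_site_clusters(cluster_members, datastore_site):
    """Return datastore clusters that contain datastores from more than one site."""
    mixed = {}
    for cluster, members in sorted(cluster_members.items()):
        sites = {datastore_site[datastore] for datastore in members}
        if len(sites) > 1:
            mixed[cluster] = sorted(sites)
    return mixed


datastore_site = {
    "Frimley01": "Frimley", "Frimley02": "Frimley",
    "Bluefin01": "Bluefin", "Bluefin02": "Bluefin",
}
print(mixed_site_clusters(
    {"dsc-frimley": ["Frimley01", "Frimley02"], "dsc-misc": ["Frimley02", "Bluefin01"]},
    datastore_site,
))
```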

Figure 73 - Datastore Clusters

The naming convention used in our testing gives both datastores and datastore clusters a site-specific name, which makes it easy to align vSphere DRS host affinity with VM deployment in the corresponding site.

Failure Scenarios

There are many failures that can be introduced in clustered systems. But in a properly architected environment, vSphere HA, vSphere DRS, and the storage subsystem do not detect many of these. We do not address the zero-impact failures, such as the failure of a single network cable, because they are explained in depth in the documentation provided by the storage vendor of the various solutions. We discuss the following “common” failure scenarios:

- Single-host failure in Frimley data center
- Single-host isolation in Frimley data center
- Storage partition
- Data center partition
- Disk shelf failure in Frimley data center
- Full storage failure in Frimley data center
- Full compute failure in Frimley data center
- Full compute failure in Frimley data center and full storage failure in Bluefin data center
- Loss of complete Frimley data center

We also examine scenarios in which specific settings are incorrectly configured. These settings determine the availability and recoverability of VMs in a failure scenario. It is important to understand the impact of misconfigurations such as the following:

- Incorrectly configured VM-to-host affinity rules
- Incorrectly configured heartbeat datastores
- Incorrectly configured isolation address
- Incorrectly configured PDL handling
- vCenter Server split-brain scenario

Single-Host Failure in Frimley Data Center

In this scenario, we describe the complete failure of a host in Frimley data center. This scenario is depicted below.

Figure 74 - Single-Host Failure Scenario

Result: vSphere HA successfully restarted all VMs in accordance with VM-to-host affinity rules.

Explanation: If a host fails, the cluster’s vSphere HA master node detects the failure because it no longer is receiving network heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host has failed completely, it cannot generate datastore heartbeats; these too are detected as missing by the vSphere HA master node. During this time, a third availability check, pinging the management addresses of the failed host, is conducted. If all of these checks return as unsuccessful, the master declares the missing host as dead and attempts to restart all the protected VMs that had been running on the host before the master lost contact with the host.

The vSphere VM-to-host affinity rules defined on a cluster level are “should rules.” Because the vSphere HA rule settings specify that VM-to-host affinity rules should be respected, all VMs are restarted within the correct site.

However, if the host elements of the VM-to-host group are temporarily without resources, or if they are unavailable for restarts for any other reason, vSphere HA can disregard the rules and restart the remaining VMs on any of the remaining hosts in the cluster, regardless of location and rules. If this occurs, vSphere DRS attempts to correct any violated affinity rules at the first invocation and automatically migrates VMs in accordance with their affinity rules to bring VM placement in alignment. VMware recommends manually invoking vSphere DRS after the cause for the failure has been identified and resolved. This ensures that all VMs are placed on hosts in the correct location to avoid possible performance degradation due to misplacement.

Single-Host Isolation in Frimley Data Center

In this scenario, we describe the response to isolation of a single host in Frimley data center from the rest of the network.

Figure 75 - Single-Host Isolation Scenario

Result: VMs remain running because isolation response is configured to leave powered on.

Explanation: When a host is isolated, the vSphere HA master node detects the isolation because it no longer is receiving network heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host is isolated, it generates datastore heartbeats for the secondary vSphere HA detection mechanism. Detection of valid host heartbeats enables the vSphere HA master node to determine that the host is running but is isolated from the network. Depending on the isolation response configured, the impacted host can power off or shut down VMs or can leave them powered on. The isolation response is triggered 30 seconds after the host has detected that it is isolated.
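The checks the master performs in the host failure and host isolation scenarios can be summarized as a small decision function. This is a simplified model with hypothetical names, not the actual vSphere HA agent logic; the "undetermined" branch covers a combination that the text does not describe.

```python
def classify_host(network_heartbeat, datastore_heartbeat, ping_reply):
    """Classify a host from the master's point of view.

    Each argument is True when that check succeeds.
    """
    if network_heartbeat:
        return "running"
    if datastore_heartbeat:
        # Still updating its heartbeat datastores: running, but cut off from the network.
        return "isolated"
    if not ping_reply:
        # All three checks failed: the master declares the host dead and restarts its VMs.
        return "dead"
    return "undetermined"  # not covered by the scenarios described here


print(classify_host(False, False, False))  # dead
print(classify_host(False, True, False))   # isolated
```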

VMware recommends aligning the isolation response to business requirements and physical constraints. From a best practices perspective, leave powered on is the recommended isolation response setting for the majority of environments. Isolated hosts are rare in a properly architected environment, given the built-in redundancy of most modern designs. In environments that use network-based storage protocols, such as iSCSI and NFS, and where networks are converged, the recommended isolation response is power off. In these environments, it is more likely that a network outage that causes a host to become isolated also affects the host’s ability to communicate to the datastores.

If an isolation response different from the recommended leave powered on is selected and a power off or shut down response is triggered, the vSphere HA master restarts VMs on the remaining nodes in the cluster. The vSphere VM-to-host affinity rules defined on a cluster level are “should rules.” However, because the vSphere HA rule settings specify that the vSphere HA VM-to-host affinity rules should be respected, all VMs are restarted within the correct site under “normal” circumstances.

Storage Partition

In this scenario, a failure has occurred on the storage network between data centers, as is depicted below.

Figure 76 - Storage Partition Scenario

Result: VMs remain running with no impact.

Explanation: Storage site affinity is defined for each LUN, and vSphere DRS rules align with this affinity. Therefore, because storage remains available within the site, no VM is impacted.

If for any reason the affinity rule for a VM has been violated and the VM is running on a host in Frimley data center while its disk resides on a datastore that has affinity with Bluefin data center, it cannot successfully issue I/O following an intersite storage partition. This is because the datastore is in an APD condition. In this scenario, the VM can be restarted because vSphere HA is configured to respond to APD conditions. The response occurs after the 3-minute grace period has passed. This 3-minute period starts after the APD timeout of 140 seconds has passed and the APD condition has been declared.

To avoid unnecessary downtime in an APD scenario, VMware recommends monitoring compliance of vSphere DRS rules. Although vSphere DRS is invoked every 5 minutes, this does not guarantee resolution of all affinity rule violations. Therefore, to prevent unnecessary downtime, rigid monitoring is recommended that enables quick identification of anomalies such as a VM’s compute residing in one site while its storage resides in the other site.

Data Center Partition

In this scenario, the Frimley data center is isolated from the Bluefin data center, as is depicted below.

Figure 77 - Data Center Partition Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, the two data centers are fully isolated from each other. This scenario is similar to both the storage partition and the host isolation scenario. VMs are not impacted by this failure because vSphere DRS rules were correctly implemented and no rules were violated.

vSphere HA follows this logical process to determine which VMs require restarting during a cluster partition:

The vSphere HA master node running in Frimley data center detects that all hosts in Bluefin data center are unreachable. It first detects that no network heartbeats are being received. It then determines whether any storage heartbeats are being generated. This check does not detect storage heartbeats because the storage connection between sites also has failed, and the heartbeat datastores are updated only “locally.” Because the VMs with affinity to the remaining hosts are still running, no action is needed for them. Next, vSphere HA determines whether a restart can be attempted. However, the read/write versions of the datastores located in Bluefin data center are not accessible by the hosts in Frimley data center. Therefore, no attempt is made to start the missing VMs.

Similarly, the ESXi hosts in Bluefin data center detect that there is no master available, and they initiate a master election process. After the master has been elected, it tries to determine which VMs had been running before the failure and it attempts to restart them. Because all VMs with affinity to Bluefin data center are still running there, there is no need for a restart. Only the VMs with affinity to Frimley data center are unavailable, and vSphere HA cannot restart them because the datastores on which they are stored have affinity with Frimley data center and are unavailable in Bluefin data center.

If VM-to-host affinity rules have been violated (that is, VMs have been running at a location where their storage is not defined as read/write by default), the behavior changes. The following sequence describes what would happen in that case:

1. The VM with affinity to Frimley data center but residing in Bluefin data center is unable to reach its datastore. This results in the VM being unable to write to or read from disk.

2. In Frimley data center, this VM is restarted by vSphere HA because the hosts in Frimley data center do not detect the instance running in Bluefin data center.

3. Because the datastore is available only to Frimley data center, one of the hosts in Frimley data center acquires a lock on the VMDK and is able to power on this VM.

4. This can result in a scenario in which the same VM is powered on and running in both data centers.

Figure 78 - Ghost VM

If the APD response is configured to Power off and restart VMs (aggressive), as is recommended in the VM Component Protection section of this white paper, the VM is powered off after the APD timeout and the grace period have passed. This behavior is new in vSphere 6.0.

If the APD response is not correctly configured, two VMs will be running, for the following possible reasons:

The network heartbeat from the host that is running this VM is missing because there is no connection to that site.

The datastore heartbeat is missing because there is no connection to that site.

A ping to the management address of the host that is running the VM fails because there is no connection to that site.

The master located in Frimley data center detects that the VM had been powered on before the failure. Because it is unable to communicate with the VM’s host in Bluefin data center after the failure, it attempts to restart the VM because it cannot detect the actual state.
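The reasoning in the list above can be sketched in a few lines of Python. This is only an illustration of the decision described here, not the FDM implementation: the `HostObservation` class, the state labels and the function names are all hypothetical, and the real agent distinguishes more states than this.

```python
from dataclasses import dataclass


@dataclass
class HostObservation:
    """What the master can still observe about one host (hypothetical model)."""
    network_heartbeat: bool    # network heartbeats still received by the master
    datastore_heartbeat: bool  # heartbeat datastore still being updated by the host
    ping_reply: bool           # management address still answers a ping


def classify_host(obs: HostObservation) -> str:
    """Return a simplified state label for the host (labels are assumptions)."""
    if obs.network_heartbeat:
        return "live"
    if obs.datastore_heartbeat or obs.ping_reply:
        # Some sign of life remains, so the host is not treated as failed.
        return "unreachable-but-alive"
    # No network heartbeat, no datastore heartbeat, no ping reply.
    return "failed"


def master_restarts_vm(obs: HostObservation, was_powered_on: bool) -> bool:
    """The master attempts a restart only for VMs that were powered on
    before the failure and whose host now looks failed."""
    return was_powered_on and classify_host(obs) == "failed"


# Site link is down: all three checks fail from the master's point of view.
print(master_restarts_vm(HostObservation(False, False, False), was_powered_on=True))
```

From the master's side a full site partition is indistinguishable from a host failure, which is exactly why the VM ends up running twice when the APD response is not configured.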

If the connection between sites is restored, a classic “VM split-brain scenario” will exist. For a short period of time, two copies of the VM will be active on the network, with both having the same MAC address. Only one copy, however, will have access to the VM files, and vSphere HA will detect this. As soon as this is detected, all processes belonging to the VM copy that has no access to the VM files will be killed, as is depicted below.

Figure 79 - Tasks and Events

In this example, the downtime equates to a VM having to be restarted. Proper maintenance of site affinity can prevent this. To avoid unnecessary downtime, VMware recommends close monitoring to ensure that vSphere DRS rules align with datastore site affinity.

Disk Shelf Failure in Frimley Data Center

In this scenario, one of the disk shelves in Frimley data center has failed. Both Frimley01 and Frimley02 on storage A are impacted.

Figure 80 - Disk Shelf Failure Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, only a disk shelf in Frimley data center has failed. The storage processor has detected the failure and has instantly switched from the primary disk shelf in Frimley data center to the mirror copy in Bluefin data center. There is no noticeable impact to any of the VMs except for a typical short spike in I/O response time. The storage solution fully detects and handles this scenario. There is no need for a rescan of the datastores or the HBAs because the switchover is seamless and the LUNs are identical from the ESXi perspective.

Full Storage Failure in Frimley Data Center

In this scenario, a full storage system failure has occurred in Frimley data center.

Figure 81 - Full Storage Failure Scenario

Result: VMs remain running with no impact.

Explanation: When the full storage system fails in Frimley data center, a takeover command must be initiated manually. As described previously, we used a NetApp MetroCluster configuration to describe this behavior. This takeover command is particular to NetApp environments; depending on the implemented storage system, the required procedure can differ. After the command has been initiated, the mirrored, read-only copy of each of the failed datastores is set to read/write and is instantly accessible. We have described this process on an extremely high level. For more details, refer to the storage vendor’s documentation.

From the VM perspective, this failover is seamless: The storage controllers handle this, and no action is required from either the vSphere or storage administrator. All I/O now passes across the intersite connection to the other data center because VMs remain running in Frimley data center while their datastores are accessible only in Bluefin data center.

vSphere HA does not detect this type of failure. Although the datastore heartbeat might be lost briefly, vSphere HA does not take action because the vSphere HA master agent checks for the datastore heartbeat only when the network heartbeat is not received for 3 seconds. Because the network heartbeat remains available throughout the storage failure, vSphere HA is not required to initiate any restarts.

Permanent Device Loss

In the scenario shown in the diagram below, a permanent device loss (PDL) condition occurs because datastore Frimley01 has been taken offline for ESXi-01 and ESXi-02. PDL scenarios are uncommon in uniform configurations and are more likely to occur in a nonuniform vMSC configuration. However, a PDL scenario can, for instance, occur when the configuration of a storage group changes, as in the case of this described scenario.

Figure 82 - Permanent Device Loss

Result: VMs are restarted by vSphere HA on ESXi-03 and ESXi-04.

Explanation: When the PDL condition occurs, VMs running on datastore Frimley01 on hosts ESXi-01 and ESXi-02 are killed instantly. They then are restarted by vSphere HA on hosts within the cluster that have access to the datastore, ESXi-03 and ESXi-04 in this scenario. The PDL and killing of the VM world group can be witnessed by following the entries in the vmkernel.log file located in /var/log/ on the ESXi hosts. The following is an excerpt from the vmkernel.log file where a PDL is recognized and appropriate action is taken.

2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):opened by wid 4499(vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
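If you want to spot these events without reading the log by hand, a small parser along the following lines can help. It is a sketch only: the regular expression is derived from the single log line shown above, the exact spacing of that line is an assumption (the pattern tolerates missing spaces), and the function and field names are hypothetical.

```python
import re
from typing import Optional

# Pattern built from the one example line in the text; whitespace is optional
# everywhere because the exact spacing of real log lines is an assumption.
PDL_RE = re.compile(
    r"^(?P<timestamp>\S+Z)\s*cpu\d+:\d+\).*?"
    r"wid\s*(?P<wid>\d+)\((?P<world>[^)]+)\)\s*has\s*Permanent\s*Device\s*Loss\."
    r"\s*Killing\s*world\s*group\s*leader\s*(?P<leader>\d+)"
)


def parse_pdl_event(line: str) -> Optional[dict]:
    """Return the fields of a PDL kill message, or None for any other line."""
    match = PDL_RE.search(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "world": match.group("world"),        # e.g. vmm0:<vm name>
        "wid": int(match.group("wid")),
        "leader": int(match.group("leader")),
    }


SAMPLE = (
    "2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):"
    "opened by wid 4499(vmm0:fri-iscsi-02) has Permanent Device Loss. "
    "Killing world group leader 4491"
)

event = parse_pdl_event(SAMPLE)
print(event["world"], event["leader"])
```

Feeding every line of vmkernel.log through `parse_pdl_event` and keeping the non-`None` results gives a quick list of the VMs that were killed because of a PDL.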

VMware recommends configuring Response for Datastore with Permanent Device Loss (PDL) to Power off and restart VMs. This setting ensures that appropriate action is taken when a PDL condition exists. The correct configuration is shown below.

Figure 83 - APD/PDL Configuration

Full Compute Failure in Frimley Data Center

In this scenario, a full compute failure has occurred in Frimley data center.

Figure 84 - Full Compute Failure Scenario

Result: All VMs are successfully restarted in Bluefin data center.

Explanation: The vSphere HA master was located in Frimley data center at the time of the full compute failure at that location. After the hosts in Bluefin data center detected that no network heartbeats had been received, an election process was started. Within approximately 20 seconds, a new vSphere HA master was elected from the remaining hosts. Then the new master determined which hosts had failed and which VMs had been impacted by this failure. Because all hosts at the other site had failed and all VMs residing on them had been impacted, vSphere HA initiated the restart of all of these VMs. vSphere HA can initiate 32 concurrent restarts on a single host, providing a low restart latency for most environments. The only sequencing of start order comes from the broad high, medium, and low categories for vSphere HA. This policy must be set on a per-VM basis. These policies were determined to have been adhered to; high-priority VMs started first, followed by medium-priority and low-priority VMs.
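To make the ordering concrete, the sketch below queues VMs by restart priority and hands them out in waves of at most 32 per host. It is a deliberately simplified model: real placement also looks at resources and compatibility, and the round-robin host assignment, the function name and the VM names are assumptions made for the example.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
MAX_CONCURRENT_PER_HOST = 32  # per-host concurrency mentioned in the text


def plan_restarts(vms, hosts, per_host_limit=MAX_CONCURRENT_PER_HOST):
    """Group (vm_name, priority) pairs into restart waves.

    Each wave is a list of (vm_name, host) pairs and never assigns more than
    per_host_limit VMs to one host. Host choice is plain round-robin, which
    is an assumption made to keep the sketch short.
    """
    if not hosts:
        raise ValueError("at least one surviving host is required")
    # sorted() is stable, so VMs with equal priority keep their input order.
    ordered = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    wave_size = len(hosts) * per_host_limit
    waves = []
    for start in range(0, len(ordered), wave_size):
        chunk = ordered[start:start + wave_size]
        waves.append(
            [(name, hosts[i % len(hosts)]) for i, (name, _prio) in enumerate(chunk)]
        )
    return waves


# Tiny example with a limit of 1 per host so that two waves are needed.
example = plan_restarts(
    [("db", "low"), ("web", "high"), ("app", "medium")],
    ["ESXi-03", "ESXi-04"],
    per_host_limit=1,
)
print(example)
```

With the default limit of 32 and two surviving hosts, the first 64 VMs in priority order would form the first wave.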

As part of the test, the hosts at the Frimley data center were again powered on. As soon as vSphere DRS detected that these hosts were available, a vSphere DRS run was invoked. Because the initial vSphere DRS run corrects only the vSphere DRS affinity rule violations, resource imbalance was not corrected until the next full invocation of vSphere DRS. vSphere DRS is invoked by default every 5 minutes or when VMs are powered off or on through the use of the vCenter Web Client.

Loss of Frimley Data Center

In this scenario, a full failure of Frimley data center is simulated.

Figure 85 - Full Data Center Failure Scenario

Result: All VMs were successfully restarted in Bluefin data center.

Explanation: In this scenario, the hosts in Bluefin data center lost contact with the vSphere HA master and elected a new vSphere HA master. Because the storage system had failed, a takeover command had to be initiated on the surviving site, again due to the NetApp-specific process. After the takeover command had been initiated, the new vSphere HA master accessed the per-datastore files that vSphere HA uses to record the set of protected VMs. The vSphere HA master then attempted to restart the VMs that were not running on the surviving hosts in Bluefin data center. In our scenario, all VMs were restarted within 2 minutes after failure and were fully accessible and functional again.

NOTE: By default, vSphere HA stops attempting to start a VM after 30 minutes. If the storage team does not issue a takeover command within that time frame, the vSphere administrator must manually start up VMs after the storage becomes available.

Stretched Cluster using VSAN

This question keeps coming up lately: Stretched Cluster using Virtual SAN, can I do it? When Virtual SAN was first released the answer to this question was a clear no; Virtual SAN did not allow a "traditional" stretched deployment using 2 "data" sites and a third "witness" site. A regular Virtual SAN cluster stretched across 3 sites within campus distance, however, was possible. Virtual SAN 6.1, however, introduced support for the "traditional" stretched cluster deployment.

Figure 86 - Stretched Virtual SAN Configuration

Everything learned in this publication also applies to a stretched Virtual SAN cluster, meaning all HA and DRS best practices. At the time of writing there are a couple of differences, though, between a vSphere Metro Storage Cluster and a VSAN Stretched Cluster, and in this section we will call out these differences. Please note that there is an extensive Virtual SAN Stretched Clustering Guide available written by Cormac Hogan, and there is a full Virtual SAN book available written by Cormac Hogan and myself (Duncan Epping). If you want to know more details about Virtual SAN, we would like to refer you to these two publications.

The first thing that needs to be looked at is the network. From a Virtual SAN perspective there are clear requirements:

5 ms RTT latency max between data sites

200 ms RTT latency max between data and witness site

Both L3 and L2 are supported between the data sites. 10 Gbps bandwidth is recommended; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this! Multicast is required, which means that if L3 is used, some form of multicast routing is needed.

L3 is expected between data and the witness sites. 100 Mbps bandwidth is recommended; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this! No multicast is required to the witness site.
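The requirements above lend themselves to a simple pre-deployment check. The sketch below treats the RTT limits and multicast as hard requirements and the bandwidth figures as recommendations, as the list does. The `Link` class, the function name and the message texts are hypothetical; the measurements would have to come from your own tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_RTT_DATA_SITES_MS = 5                  # between the two data sites
MAX_RTT_WITNESS_MS = 200                   # between a data site and the witness
RECOMMENDED_DATA_BANDWIDTH_MBPS = 10_000   # 10 Gbps
RECOMMENDED_WITNESS_BANDWIDTH_MBPS = 100   # 100 Mbps


@dataclass
class Link:
    """Measured characteristics of one inter-site link (hypothetical model)."""
    rtt_ms: float
    bandwidth_mbps: float


def check_network(
    data_link: Link, witness_link: Link, multicast_between_data_sites: bool
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a stretched Virtual SAN network design."""
    errors: List[str] = []
    warnings: List[str] = []
    if data_link.rtt_ms > MAX_RTT_DATA_SITES_MS:
        errors.append("RTT between data sites exceeds 5 ms")
    if witness_link.rtt_ms > MAX_RTT_WITNESS_MS:
        errors.append("RTT to witness site exceeds 200 ms")
    if not multicast_between_data_sites:
        errors.append("multicast is required between the data sites")
    if data_link.bandwidth_mbps < RECOMMENDED_DATA_BANDWIDTH_MBPS:
        warnings.append("less than the recommended 10 Gbps between data sites")
    if witness_link.bandwidth_mbps < RECOMMENDED_WITNESS_BANDWIDTH_MBPS:
        warnings.append("less than the recommended 100 Mbps to the witness site")
    return errors, warnings


print(check_network(Link(4, 10_000), Link(150, 100), True))
```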

When it comes to HA and DRS the configuration is pretty straightforward. There are a couple of things we want to point out, as they are configuration details which are easy to forget about. Some are discussed in-depth above, some are settings you actually do not use with VSAN. We will point this out in the list below:

Make sure to specify additional isolation addresses, one in each site (das.isolationAddress0–1).

Disable the default isolation address if it can’t be used to validate the state of the environment during a partition (if the gateway isn’t available in both sides).

Disable Datastore heartbeating; without traditional external storage there is no reason to have this.

Enable HA Admission Control and make sure it is set to 50% for CPU and Memory.

Keep VMs local by creating “VM/Host” should rules.
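As a memory aid, the list above can be captured in a small data structure that you review before touching the cluster. Note that this is just a checklist in code form: the dictionary layout and the function name are hypothetical and are not a vSphere API payload, and the IP addresses come from the documentation-only ranges.

```python
def stretched_vsan_ha_config(site_a_address, site_b_address, default_gateway_usable):
    """Build a checklist-style description of the HA settings listed above.

    The structure of the returned dictionary is an assumption made for this
    example; only the das.* option names come from the text.
    """
    advanced = {
        "das.isolationAddress0": site_a_address,  # isolation address in site A
        "das.isolationAddress1": site_b_address,  # isolation address in site B
    }
    if not default_gateway_usable:
        # The gateway cannot validate the environment during a partition.
        advanced["das.usedefaultisolationaddress"] = "false"
    return {
        "advanced_settings": advanced,
        "datastore_heartbeating": False,  # no traditional external storage
        "admission_control": {"cpu_percent": 50, "memory_percent": 50},
        "vm_host_rule_type": "should",    # keep VMs local, but allow failover
    }


config = stretched_vsan_ha_config("192.0.2.1", "198.51.100.1", default_gateway_usable=False)
for name, value in sorted(config["advanced_settings"].items()):
    print(name, "=", value)
```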

That covers most of it, summarized relatively briefly compared to the excellent document Cormac developed with all details you can wish for. Make sure to read that if you want to know every aspect and angle of a stretched Virtual SAN cluster configuration.

Advanced Settings

There are various types of KB articles and this KB article explains it, but let me summarize it and simplify it a bit to make it easier to digest.

There are various sorts of advanced settings, but for HA three in particular:

das.* -> Cluster level advanced setting.

fdm.* -> FDM host level advanced setting.

vpxd.* -> vCenter level advanced setting.
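Because the prefix tells you where a setting lives, it is easy to sort a list of option names by the place where each one has to be configured. The helper below is a minimal sketch; the function name and the scope labels are my own.

```python
# Prefix-to-scope mapping taken from the three categories listed above.
SCOPES = {
    "das.": "cluster",
    "fdm.": "FDM host",
    "vpxd.": "vCenter",
}


def setting_scope(name):
    """Return where an advanced setting is configured, or None if the prefix
    is not one of the three HA-related ones."""
    lowered = name.lower()
    for prefix, scope in SCOPES.items():
        if lowered.startswith(prefix):
            return scope
    return None


for option in ("das.isolationaddress0", "das.config.fdm.hostTimeout", "vpxd.example"):
    print(option, "->", setting_scope(option))
```

Note that an option such as das.config.fdm.hostTimeout still starts with das. and is therefore set at the cluster level, even though it influences FDM behavior. The name vpxd.example is a made-up placeholder.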

How do you configure these? Configuring these is typically straightforward, and most of you hopefully know this already. If not, let us go over the steps to help you configure your environment as desired.

Cluster Level

In the Web Client:

1. Click “Hosts and Clusters”

2. Click your cluster object

3. Click the “Manage” tab

4. Click “Settings” and “vSphere HA”

5. Hit the “Edit” button

FDM Host Level

Open up an SSH session to your host and edit “/etc/opt/vmware/fdm/fdm.cfg”

vCenter Level

In the Web Client:

1. Click “vCenter”

2. Click “vCenter Servers”

3. Select the appropriate vCenter Server and click the “Manage” tab

4. Click “Settings” and “Advanced Settings”

In this section we will primarily focus on the ones most commonly used; a full detailed list can be found in KB 2033250. Please note that each bullet details the version which supports this advanced setting.

das.maskCleanShutdownEnabled - 5.0, 5.1, 5.5
Whether the clean shutdown flag will default to false for an inaccessible and poweredOff VM. Enabling this option will trigger VM failover if the VM's home datastore isn't accessible when it dies or is intentionally powered off.

das.ignoreInsufficientHbDatastore - 5.0, 5.1, 5.5, 6.0
Suppress the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is “false”. Can be configured as “true” or “false”.

das.heartbeatDsPerHost - 5.0, 5.1, 5.5, 6.0
The number of required heartbeat datastores per host. The default value is 2; value should be between 2 and 5.

das.failuredetectiontime - 4.1 and prior
Number of milliseconds, timeout time, for isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address a failure detection time of 15000 is recommended.

das.isolationaddress[x] - 5.0, 5.1, 5.5, 6.0
IP address the ESX host uses to check on isolation when no heartbeats are received, where [x] = 0-9 (see screenshot below for an example). VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes.

das.usedefaultisolationaddress - 5.0, 5.1, 5.5, 6.0
Value can be “true” or “false” and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set the “das.isolationaddress0” to a pingable address and disable the usage of the default gateway by setting this to “false”.

das.isolationShutdownTimeout - 5.0, 5.1, 5.5, 6.0
Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.

das.allowNetwork[x] - 5.0, 5.1, 5.5
Enables the use of port group names to control the networks used for VMware HA, where [x] = 0–?. You can set the value to be “Service Console 2” or “Management Network” to use (only) the networks associated with those port group names in the networking configuration. In 5.5 this option is ignored when VSAN is enabled, by the way!

das.bypassNetCompatCheck - 4.1 and prior
Disable the “compatible network” check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is “false”; setting it to “true” disables the check.

das.ignoreRedundantNetWarning - 5.0, 5.1, 5.5
Remove the error icon/message from your vCenter when you don’t have a redundant Service Console connection. Default value is “false”; setting it to “true” will disable the warning. HA must be reconfigured after setting the option.

das.vmMemoryMinMB - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotMemInMB”.

das.slotMemInMB - 5.0, 5.1, 5.5
Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.

das.vmCpuMinMHz - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotCpuInMHz”.

das.slotCpuInMHz - 5.0, 5.1, 5.5
Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
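The difference between the “Min” and the “slot” settings is easiest to see with numbers. The sketch below assumes a simplified model: the slot size is the largest reservation among the powered-on VMs, VMs without a reservation count with the das.vmMemoryMinMB value, and das.slotMemInMB acts as an upper bound. Memory overhead is ignored, and the function name is hypothetical. The same reasoning applies to the CPU pair.

```python
def memory_slot_size_mb(reservations_mb, vm_memory_min_mb=0, slot_mem_in_mb=None):
    """Estimate the memory slot size under the simplified model described above.

    reservations_mb  -- memory reservation per powered-on VM (0 = no reservation)
    vm_memory_min_mb -- stand-in for das.vmMemoryMinMB (value used when a VM
                        has no reservation); treated this way as an assumption
    slot_mem_in_mb   -- stand-in for das.slotMemInMB (upper bound), or None
    """
    effective = [r if r > 0 else vm_memory_min_mb for r in reservations_mb]
    slot = max(effective, default=vm_memory_min_mb)
    if slot_mem_in_mb is not None:
        slot = min(slot, slot_mem_in_mb)
    return slot


# One VM with an 8 GB reservation skews the slot size for the whole cluster.
print(memory_slot_size_mb([0, 1024, 8192], vm_memory_min_mb=256))
# Capping the slot size brings the number of available slots back up.
print(memory_slot_size_mb([0, 1024, 8192], vm_memory_min_mb=256, slot_mem_in_mb=2048))
```

In the first call the single large reservation dictates the slot size; in the second the cap takes over, which is the situation the slot settings are meant for.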

das.perHostConcurrentFailoversLimit - 5.0, 5.1, 5.5
By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover as it adds more stress on the hosts and storage.

das.config.log.maxFileNum - 5.0, 5.1, 5.5
Desired number of log rotations.

das.config.log.maxFileSize - 5.0, 5.1, 5.5
Maximum file size in bytes of the log file.

das.config.log.directory - 5.0, 5.1, 5.5
Full directory path used to store log files.

das.maxFtVmsPerHost - 5.0, 5.1, 5.5
The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.

das.includeFTcomplianceChecks - 5.0, 5.1, 5.5
Controls whether vSphere Fault Tolerance compliance checks should be run as part of the cluster compliance checks. Set this option to false to avoid cluster compliance failures when Fault Tolerance is not being used in a cluster.

das.iostatsinterval (VM Monitoring) - 5.0, 5.1, 5.5, 6.0
The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.

das.config.fdm.deadIcmpPingInterval - 5.0, 5.1, 5.5
Default value is 10. ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This parameter controls the interval (expressed in seconds) between pings.

das.config.fdm.icmpPingTimeout - 5.0, 5.1, 5.5
Default value is 5. Defines the time to wait in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible.

das.config.fdm.hostTimeout - 5.0, 5.1, 5.5
Default is 10. Controls how long a master FDM waits in seconds for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned.

das.config.fdm.stateLogInterval - 5.0, 5.1, 5.5
Default is 600. Frequency in seconds to log cluster state.

das.config.fdm.ft.cleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a vSphere Fault Tolerance VM is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power on of the secondary VM to succeed. If the power on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary VM.

das.config.fdm.storageVmotionCleanupTimeout - 5.0, 5.1, 5.
Default is 900. When a Storage vMotion is done in an HA enabled cluster using pre 5.0 hosts and the home datastore of the VM is being moved, HA may interpret the completion of the Storage vMotion as a failure, and may attempt to restart the source VM. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete. When the Storage vMotion completes or the timer expires, the master will assess whether a failure occurred.

das.config.fdm.policy.unknownStateMonitorPeriod - 5.0, 5.1, 5.5, 6.0
Defines the number of seconds the HA master agent waits after it detects that a VM has failed before it attempts to restart the VM.

das.config.fdm.event.maxMasterEvents - 5.0, 5.1, 5.5
Default is 1000. Defines the maximum number of events cached by the master.

das.config.fdm.event.maxSlaveEvents - 5.0, 5.1, 5.5
Default is 600. Defines the maximum number of events cached by a slave.

That is a long list of advanced settings indeed, and hopefully no one is planning to try them all out on a single cluster, or even on multiple clusters. Avoid using advanced settings as much as possible as it definitely leads to increased complexity, and often to more downtime rather than less.

Summarizing

Hopefully I have succeeded in giving you a better understanding of the internal workings of HA. I hope that this publication has handed you the tools needed to update your vSphere design and ultimately to increase the resiliency and up-time of your environment.

I have tried to simplify some of the concepts to make them easier to understand. Still, I acknowledge that some concepts are difficult to grasp, and the amount of architectural changes that vSphere 5 and the new functionality that vSphere 6 have brought can be confusing at times. I hope though that after reading this everyone is confident enough to make the required or recommended changes.

If there are any questions please do not hesitate to reach out to me via Twitter or my blog, or leave a comment on the online version of this publication. I will do my best to answer your questions.

Changelog

1.0.1 - Minor edits

1.0.2 - Start with VSAN Stretched Cluster in Use case section

1.0.3 - Start with VVol section in VSAN and VVol specifics section

1.0.4 - Update to VVol section and replaced diagram (figure 15)
