Mesos and the State of Application Deploys

Preview:

Citation preview

MESOSANDTHESTATEOFAPPLICATION

DEPLOYS

WHOAMI?KoryBrown.DevOpsEngineerFromTexas.SiteOperationsEngineeratFitbit.

WHATISTHISTALKABOUT?

MESOS!

APPLICATIONDEPLOYMENTS!

MORESPECIFICALLYHowwe,asanindustry,usedtohandleapplicationdeployments.Howwe,asanindustry,currentlyhandleapplicationdeployments.HowIthinkwe'llhandleapplicationdeploymentsinthefuture.

WAIT,WHATABOUTMESOS?We'llgetthere.

ButtounderstandtheproblemsMesossolvesfordeployswemustfirst:

Understandwhatwedotoday.Understandhowwegothere.

Mesos,inthiscase,isanimplementationdetail.

WHYSHOULDYOUCARE?It'ssuperfreakingcoolAndyouknow...DevOps

WHATISDEVOPS?ThisisnotatalkonDevOps......butit'simportantforustohaveacommondefinition.

"DevOpsisaboutrecognizingthatthebackinginfrastructureisnotseparatefromyourapplication,butratheravitalpartof

it."--Me

HOWTHINGSUSEDTOBE

YOUWANTTODEPLOY1. GetaServer.

GETASERVERPutinarequest.Ahumanallocatesaserver.Ahumaninstallsanoperatingsystem.Ahumanensuresthenetworkingiscorrect.Ahumaninstallsallspecifieddependencies.etc,etc,etc

YOUWANTTODEPLOY1. GetaServer.2. Deployyourapplication.

DEPLOYYOURAPPLICATIONAnOpsguylogsintotheserver.

Downloadsyourapplication.Installsyourapplication.Configuresyourapplication.

DEPLOYYOURAPPLICATIONTheapplicationdoesn'tstart.

Theycallyou(probablyinthemiddleofthenight).Afteranhouroftroubleshooting,yourealizetheytypoedtheconfig.Youhateeverything.

YOUWANTTODEPLOY1. GetaServer.2. Deployyourapplication.3. Repeatntimesforscale.

Eachtimeslightlydifferently

SOMETHINGGOESWRONG(Hardwarefailure/OSissues/Maintenance/etc)

GodHelpYou.

SOMETHINGGOESWRONG(Hardwarefailure/OSissues/Maintenence/etc)

Fileaticket.Datacentertechfindsthemachine.Theypulltheharddrive.

Whichisweird,becauseIsaidtheRAMtestedbad.Twoweekslaterthemachineislostinaseaoftickets.Everythingisterrible.

THISSUCKEDNotuncommontomeasureturnaroundinweeks.Littletonoautomation.Incrediblyerrorprone.Requiresapersonatmosteverystep.Generallyleadstofinelycraftedartisanalmachines.

TL;DR:

Horrificallyinefficient,errorprone,andtimeconsuming.

HOWTHINGSARETODAY

YOUWANTTODEPLOY1. Spinupanewcloudinstance.2. Runyourautomationtoolofchoice

(Puppet/Ansible/Chef/etc).3. Repeatntimes.

SOMETHINGGOESWRONG(Hardwarefailure/OSissues/Maintenance/etc)

Spinupanewinstance!

Turnaroundtimesaresolow,whocares?

Birthofthe"TreatserverslikeCattle,notpets"thoughtprocess.

SUCKSWAYLESSTurnaroundmeasuredinminutes.Almostentirelyautomated.Fairlydeterministic.Doesn'thavetoinvolvepeopleatall!

BUTNOTPERFECTStillmanageanentireOSforeachapplication.Thisincludes:

Updates!(BothOSandanyapplications/libraries)Backups!Monitoring/Metrics!etc,etc,etc

BUTNOTPERFECTNotfullyutilizingtheavailablehardware.

Unlessyourapplicationrunsconstantlyat100%CPUandRAM,youarewastingcyclesandmoney!

BUTNOTPERFECTTL;DR:

Good,butnotgreatpastacertainscale.

ENTERMESOS

WHATISMESOS?Opensourcedistributedclustermanagementtool.

WHATISMESOS?Anabstractionlayerforcomputingresources(CPU/RAM/Disk/etc)containedwithinapoolofservers.

MESOSISADISTRIBUTEDKERNEL

AtraditionalKernel(likeLinux!)providesasetofAPIsforinteractingwithavailablehardwareonalocalmachine.

AdistributedKernel(likeMesos!)providesasetofAPIsforinteractingwithavailablehardwareonapoolofservers.

THEDATACENTEROSTheKernelitselfisasmallpartofanOperatingSystem.Manyothercomponentsthatmakeitsomethinguseful.Mostrelevantforusrightnow:

Initsystem--Someprocesstomanagethelifecycleofotherprocesses.Cron--Someprocesstoruntasksonsomespecifiedinterval

Mesosimplementsthisfunctionalitywith"Frameworks".

WHAT'SAFRAMEWORK?AMesos"Application".Mustcontain2components:

Ascheduler,whichregisterswiththeMaster,andreceivesresourceoffers.OneormoreExecutors,whichlaunchestasksonslaves.

MESOSFRAMEWORKSInitSystem:

MarathonAurora

Cron:

ChronosAurora

AQUICKASIDEONFRAMEWORKSTheFrameworkslistedarejustonesthatarerelevantinthe

contextofapplicationdeploys.

AQUICKASIDEONFRAMEWORKSMesosismeanttobeusedasagenericcomputingclustermanager,whichmeansyoucanalsouse,asaframework:

JenkinsHadoopSparkStormKafkaManymorethingsprobably!

Perhapsmoreimportantly,alloftheseframeworkscansharethesameclusterofmachines!

MESOSOFFERSAlistofanagentnode'savailableresources(CPU/RAM/Disk/etc).Two-TierSchedulingSystem.

AgentNodesendsresourcestoMasternode.Masternodeusesanalgorithmcalled"DominantResourceFairness"tofairlypassthatresourceoffertoit'sregisteredFrameworks.

OTHERMESOSRELATEDWORDS

MasterService:RunsonamachinethathasbeendesignatedasaMesosMaster.Managesagents.

Agent:Runstasksthatbelongtoframeworks.

PreviouslyreferredtoasaSlave,butthisisbeingchangedinnewerversions.

Task:AunitofworkscheduledbytheFramework,andexecutedontheAgent.

ZooKeeper:Distributedconsensusframework.

Masterusesforleaderelection.Frameworksusetosharestate.

MESOSARCHITECTURE

Quorumofmasters,keptinsyncbyZooKeeper.

Slavessendresourceofferstotheelectedmaster.

Mastersofferthoseresourcestoit'sregisteredframeworks.

Iftheofferisaccepted,frameworksexecutetasksonslaves.

SHOULDMYAPPRUNONMESOS?

STATELESSAPPLICATIONWebApp(Rails,Django,Play,etc)JenkinsBuildSlavesMemcachedAny12factor-ishapplication

Yes

STATEFULAPPLICATIONSMySQLPostgresSQLJenkinsMaster

No

DOESITADDRESSCURRENTPROBLEMS?ManagingfewerOperatingSystems.Betterutilization.

MANAGINGFEWEROPERATINGSYSTEMS

ManageanOSforeachphysicalhost,NOTeachapplication.

BETTERUTILIZATIONInsteadofoneapplicationrunningperhost...

...webinpackthemalltogether!

Anyresourcesyourapplicationisn'tusingcanbesharedasneeded!

BONUS:REDUNDANCYBAKEDIN!Sincewehavethisbigpoolofsharedresources,ifan

instanceorhostdies,wejustrescheduleitsomewhereelse.

STATICALLYPROVISIONEDCLUSTER

STATICALLYPROVISIONEDCLUSTER

Railsappnowat1/3rdcapacity!

DYNAMICALLYPROVISIONEDCLUSTER

DYNAMICALLYPROVISIONEDCLUSTER

Twonodesdown!

Whocares?

DEPLOYINGWITHMESOS

YOUWANTTODEPLOY1. WriteaJobConfigforyourFrameworkofChoice.

WeuseAurora--it'sprettysweet.2. TellAuroratorunit.

SOMETHINGGOESWRONG(Hardwarefailure/OSissues/Maintenance/etc)

Aurorareschedulesyourapplicationtoanothernode.

HARDLYSUCKSATALL!Turnaroundmeasuredinseconds!Entirelyautomated!Deterministic!FaultTolerant!

DEVELOPERPERSPECTIVE

TESTINGMesoscanbeyourdev,qa,staging,andproductionenvironments.

Aurorasplitspermissionsandallocationsbetweenconfiguredenvironments.Prioritizesproduction.Meaningitwillpreemptjobsrunningin"lesser"environments.

BecomeseasytotestinaProdlikeenvironmentbecauseitISthesameenvironment!

SELF-MANAGEMENTNomorewaitingonOpsforhardware!Easilyautomate-able!

WantHerokustyleddeploys?ThisishowHerokudoesit.

TL;DRWe'vecomealonglongway.Thing'ssuckwaylessnow.Suckinglessandlesseveryday.

QUESTIONS?

Recommended