www.eubra-bigsea.eu|[email protected]|@bigsea_eubr
D3.1: QoS Monitoring System Architecture
Author(s): Ignacio Blanquer (UPV), Guilherme Maluf (UFMG), Walter dos Santos Filho (UFMG)
Status: Draft/Review/Approval/Final
Version: v1.0
Date: 2/05/2016
Dissemination Level: [X] PU: Public; [ ] PP: Restricted to other programme participants (including the Commission); [ ] RE: Restricted to a group specified by the consortium (including the Commission); [ ] CO: Confidential, only for members of the consortium (including the Commission)
Abstract: Europe-Brazil Collaboration of BIG Data Scientific Research through Cloud-Centric Applications (EUBra-BIGSEA) is a medium-scale research project funded by the European Commission under the Cooperation Programme, and the Ministry of Science and Technology (MCT) of Brazil in the frame of the third European-Brazilian coordinated call. The document has been produced with the co-funding of the European Commission and the MCT. The purpose of this report on the QoS Monitoring System Architecture is to define the software components that will collect the execution data from the cloud architecture, as well as the main components that intervene in the full process of deployment, configuration, contextualization and execution.
EUBra-BIGSEA is funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement No 690116.
This project is the result of the 3rd BR-EU Coordinated Call on Information and Communication Technologies (ICT), announced by the Brazilian Ministry of Science, Technology and Innovation (MCTI).
Document identifier: EUBRABIGSEA-WP3-D3.1
Deliverable lead: UPV
Related work package: WP3
Author(s): Germán Moltó (UPV), Ignacio Blanquer (UPV), Guilherme Maluf (UFMG), Walter dos Santos Filho (UFMG)
Contributor(s): Ignacio Blanquer (UPV), Anna Guimarães (UFMG), Wagner Meira (UFMG), Dorgival Guedes (UFMG), Andrey Brito (UFCG), Danilo Ardagna (POLIMI), Daniele Lezzi (BSC), Sandro Fiore (CMCC), Roberto Cascella (TRUST-IT)
Due date: 30/06/2016
Actual submission date: 30/06/2016
Reviewed by: Daniele Lezzi (BSC), Nazareno Andrade (UFCG)
Approved by: PMB
Start date of Project: 01/01/2016
Duration: 24 months
Keywords: Quality of Service, Cloud services, Monitoring
Versioning and contribution history
Version | Date | Authors | Notes
0.1 | 02/05/2016 | Ignacio Blanquer (UPV), Germán Moltó (UPV) | Table of Contents
0.2 | 09/05/2016 | Walter dos Santos Filho (UFMG) | First version of section 5 content
0.3 | 23/05/2016 | Ignacio Blanquer (UPV) | Restructuring section 4
0.4 | 31/05/2016 | Walter dos Santos, Guilherme Maluf (UFMG) | Monitoring section
0.5 | 02/06/2016 | Daniele Lezzi (BSC), Ignacio Blanquer (UPV) | Changes on the architecture diagrams and associated information
0.6 | 08/06/2016 | Ignacio Blanquer (UPV), Germán Moltó (UPV) | Sections 2-4 completed
0.7 | 10/06/2016 | Ignacio Blanquer (UPV) | Executive Summary and conclusions
0.8 | 15/06/2016 | Andrey Brito (UFCG), Walter dos Santos (UFMG), Daniele Lezzi (BSC), Danilo Ardagna (POLIMI) | In-depth review of the document
0.9 | 27/06/2016 | Daniele Lezzi (BSC), Nazareno Andrade (UFCG), Ignacio Blanquer (UPV), Germán Moltó (UPV) | Implementation of the comments from reviewers and final candidate version
Copyright notice: This work is licensed under the Creative Commons CC-BY 4.0 license. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0.
Disclaimer: The content of the document herein is the sole responsibility of the publishers and it does not necessarily represent the views expressed by the European Commission or its services.
While the information contained in the document is believed to be accurate, the author(s) or any other participant in the EUBra-BIGSEA Consortium make no warranty of any kind with regard to this material including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.
Neither the EUBra-BIGSEA Consortium nor any of its members, their officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.
Without derogating from the generality of the foregoing, neither the EUBra-BIGSEA Consortium nor any of its members, their officers, employees or agents shall be liable for any direct or indirect or consequential loss or damage caused by or arising from any information advice or inaccuracy or omission herein.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
1 Introduction
1.1 Scope of the Document
1.2 Target Audience
1.3 Structure
2 Data and execution requirements
3 EUBra-BIGSEA Infrastructure Overview
3.1 EUBra-BIGSEA General Infrastructure
3.2 Procedure to Describe components
4 QoS IaaS
4.1 Application use cases and Lifecycle
4.1.1 Batch jobs
4.1.2 Interactive jobs
4.1.3 Application lifecycle
4.2 OASIS TOSCA
4.3 Resource Management Frameworks
4.3.1 Docker Swarm
4.3.2 Apache Mesos
4.3.3 YARN
4.3.4 Myriad
4.3.5 Image Repository
4.4 Job Execution (Scheduling)
4.4.1 Spark (spark-submit and spark-shell)
4.4.2 Chronos
4.4.3 Marathon
4.5 Configuration and Contextualization (and Orchestration)
4.5.1 Infrastructure Manager
4.6 Reactive elasticity
4.6.1 CLUster Energy SavingS (CLUES)
4.6.2 Elastic Compute Cluster in the Cloud (EC3)
4.6.3 Cloud Virtual machine Automatic Procurement (CloudVAMP)
4.7 Technology Analysis
4.7.1 Requirements Related to Resource Management Frameworks
4.7.2 Requirements Related to Schedulers
4.7.3 Requirements Related to Image Registry
4.7.4 Requirements Related to Orchestration and Deployment
4.7.5 Requirements Related to Reactive Elasticity
4.8 Conclusion
4.8.1 Proposed API
5 Monitoring Service
5.1 Cloud computing monitoring
5.2 Requirements for monitoring service
5.3 Proposed architecture
5.3.1 Minimum supported infrastructure metrics
5.3.2 Supported application metrics and logging
5.4 Technology evaluation
5.4.1 Metric collection, storage and retrieval
5.4.2 InfluxDB
5.4.3 Sensu
5.4.4 OpenStack Monasca
5.4.5 Logs collection and analysis
5.4.6 ELK Stack
5.5 Solution evaluation
5.6 Comparative analysis
5.6.1 Proposed API
5.7 Conclusion
6 TECHNICAL PROCEDURES
7 CONCLUSIONS
8 REFERENCES
TABLE OF FIGURES
Figure 1: High-level view of the EUBra-BIGSEA Architecture
Figure 2: Relations among WP7 and the other technical WPs. How the first set of deliverables is contributing to the application development
Figure 3: Detailed view of the software architecture. The components listed are described in more detail throughout the document
Figure 4: Lifecycle of the applications in EUBra-BIGSEA
Figure 5: Basic sample TOSCA template
Figure 6: Detailed interaction among components at the QoS cloud services layer
Figure 7: JSON job description
Figure 8: High-level view of monitoring system concepts
Figure 9: Monitoring system architecture and components
Figure 10: Sensu/InfluxDB/ELK software architecture
Figure 11: MONASCA software architecture
EXECUTIVE SUMMARY
The EUBra-BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators and a privacy and quality analysis framework, exposed to several programming environments. EUBra-BIGSEA aims at covering the general requirements of multiple application areas, although it will be showcased on the treatment of massive connected-society information, and particularly on traffic recommendation.
The Quality of Service (QoS) architecture is the computational core of the EUBra-BIGSEA platform. The performance of the data analytics applications running on the EUBra-BIGSEA platform is profiled in advance, so that a QoS guarantee can be defined based on the performance requirements. Then, the monitoring service will closely follow the executions so that it can react and request additional resources if needed. This information will be fed back to the performance estimation model so that the allocation of resources for the next execution can be more precise. This way, proactive elasticity can adapt the system in advance so that no drastic actions are needed to guarantee the QoS of the applications, while reactive elasticity works as a second line of action to respond to changes that were not adequately handled by the proactive approach. Users will not need to specify the expected resources: the resource estimation system will decide them within the acceptable boundaries.
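The loop described above (profile in advance, reserve resources proactively, monitor, react, and feed the observed cost back into the estimate) can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than the project's performance model: the class and function names, the exponential-moving-average update with weight `alpha`, the assumption of perfectly linear scaling with cores and the 10% tolerance on the extrapolated finish time.

```python
import math
from dataclasses import dataclass


@dataclass
class ResourceEstimator:
    """Hypothetical profile-based estimator that is refined after every run."""
    core_seconds: float   # profiled cost of one run (cores x seconds)
    alpha: float = 0.3    # weight of the newest observation in the update

    def cores_for_deadline(self, deadline_s: float) -> int:
        """Proactive step: cores to reserve so the run fits the deadline,
        assuming (for illustration only) perfectly linear scaling."""
        return max(1, math.ceil(self.core_seconds / deadline_s))

    def feed_back(self, observed_core_seconds: float) -> None:
        """Feedback step: blend the monitored cost of the last run into the
        profile with an exponential moving average."""
        self.core_seconds = ((1 - self.alpha) * self.core_seconds
                             + self.alpha * observed_core_seconds)


def needs_reactive_action(elapsed_s: float, progress: float,
                          deadline_s: float, tolerance: float = 0.10) -> bool:
    """Reactive step: True if the finish time extrapolated from progress
    (a fraction in (0, 1]) exceeds the deadline by more than the tolerance."""
    if progress <= 0:
        return False  # nothing measured yet, so no basis for acting
    projected_s = elapsed_s / progress
    return projected_s > deadline_s * (1 + tolerance)


# Usage: the first run reserves resources from the profile; the monitored
# cost of that run then refines the estimate used for the next one.
est = ResourceEstimator(core_seconds=3600.0)
first = est.cores_for_deadline(600.0)      # 3600 / 600 -> 6 cores
est.feed_back(observed_core_seconds=4600.0)
second = est.cores_for_deadline(600.0)     # 3900 / 600 = 6.5 -> 7 cores
```

The point of the split is that the proactive and reactive steps share the same profile: a run that needed the reactive step leaves a higher observed cost behind, so the next proactive reservation is larger.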
Mesos will constitute the basic foundation for the management of distributed resources, as it is a widely used component to provide isolation, avoid the fragmentation of data centres and achieve high availability and reliability. YARN will be supported on Mesos through Myriad. Mesos will be enhanced with the capability of automatically increasing resources, so that the powered-on resources actually adapt to the workload, while remaining agnostic to the upper layers. The execution granularity of the applications is defined at the level of containers.
The project identifies three types of workloads (persistent, periodic batch and interactive jobs), which will be served by different schedulers. Persistent jobs will be served by the Marathon scheduler, periodic jobs by the Chronos scheduler, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that will embed the executable services and negotiate the resources with Mesos. Those requests will be intercepted to guarantee the availability of resources in Mesos (e.g. by adding more resources to the system). Therefore, at the level of the cloud services, the project will capture the framework requests and interact with Mesos, as well as deal with the provisioning and management of the base VMs to pursue an efficient use of resources.
QoS cloud services will deal with the reactive elasticity of the Mesos cluster and of the physical (virtual) resources on which Mesos is based. Frameworks could request additional resources in case of application starvation, depending on the profile of the application, and the monitoring system will trigger such actions. The choice for the monitoring system is Monasca, a young OpenStack project for monitoring. It has been selected for its suitability to the EUBra-BIGSEA use case and the opportunities that could arise from the collaboration with OpenStack. Monasca will monitor the resources, services and applications, triggering actions if required.
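The division of work described above can be pictured as a routing table from workload type to scheduler, plus an interception step that checks whether the cluster has enough free capacity before a framework's request is accepted and, if not, computes how many base VMs to add. This is a hypothetical sketch: the function names, the `ClusterState` fields and the rule of provisioning whole VMs of `cores_per_vm` CPUs are assumptions for illustration, and it does not call the Mesos, Marathon or Chronos APIs.

```python
import math
from dataclasses import dataclass

# Scheduler chosen for each workload type, as described in the text.
SCHEDULER_FOR_WORKLOAD = {
    "persistent": "Marathon",
    "periodic": "Chronos",
    "interactive": "interactive shell (e.g. spark-shell)",
}


@dataclass
class ClusterState:
    """Hypothetical snapshot of the cluster capacity."""
    free_cores: float   # unallocated CPUs currently available in the cluster
    cores_per_vm: int   # CPUs contributed by each additional base VM


def route(workload_type: str) -> str:
    """Return the scheduler responsible for a workload type."""
    try:
        return SCHEDULER_FOR_WORKLOAD[workload_type]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload_type!r}") from None


def vms_to_add(state: ClusterState, requested_cores: float) -> int:
    """Intercept a framework request: number of extra VMs needed so that
    the request fits in the cluster (0 if it already fits)."""
    missing = requested_cores - state.free_cores
    if missing <= 0:
        return 0
    return math.ceil(missing / state.cores_per_vm)
```

Keeping the interception separate from the routing reflects the design in the text: the schedulers stay unchanged, and only the capacity check in front of the resource manager decides whether the infrastructure must grow.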
1 INTRODUCTION
1.1 Scope of the Document
This document describes the QoS Monitoring System Architecture as well as the software architecture of other cloud-service-related components and their interactions. The document follows the Software Design Description (SDD) standard (IEEE Std 1016). The document tries to address all the components needed for the execution of the data analytics workload of the project, and the way to report back the progress so that it can be reused in further executions or to react dynamically in real time. Each component is described in terms of its external interfaces and its dependencies on other components.
We will also describe the reasons for the choices made in terms of technologies and services, and a fine-grained work plan for the next six months, leaving the rest of the project with a coarser-grained definition so that it can evolve with technology changes.
1.2 Target Audience
The document is mainly intended for internal use, although it is publicly released. The main target of this document is the global team of technical experts of EUBra-BIGSEA, including WP3, WP4, WP5 and WP6. This document goes beyond QoS monitoring to describe the global architecture of WP3. It describes the software architecture of WP3, the main building blocks, the requirements and the available software, and it proposes an architecture to support the development of policies, the deployment and execution of applications, and the elasticity measures.
1.3 Structure
The rest of the document is structured into 7 main parts. First, section 2 presents and discusses a summary of the requirements from the use cases; the requirements that affect WP3 are outlined and referenced across the document. Then, section 3 describes the high-level EUBra-BIGSEA infrastructure, defining the scope of each one of the main layers and their relation to other documents. Section 4 describes in detail the Quality of Service (QoS) Infrastructure as a Service (IaaS), including the application use cases and the main technologies that will be used as the basis for the development of the WP3 cloud services. Section 5 deals with the monitoring architecture. Section 6 describes the technical procedures for coding, source repositories and continuous integration. Finally, section 7 presents the conclusions and section 8 lists the references.
2 DATA AND EXECUTION REQUIREMENTS
A deeper analysis of the requirements is presented in D7.1 End-Users Requirements Elicitation. This section includes the summary list of all the requirements that should be tackled from the WP3 perspective.
Req # | Description | Priority | WP
RE.1 | Unrestricted Batch jobs | MUST | WP3
RE.2 | Unrestricted Bag of Tasks | MUST | WP3
RE.3 | QoS Batch jobs | MUST | WP3
RE.4 | Deadline-based scheduling | SHOULD | WP3
RE.5 | Self-adapting elasticity | MUST | WP3
RE.6 | Short-jobs | MUST | WP3
Table 1: List of requirements related to WP3.
From those high-level requirements, this document identifies 25 technical-level requirements, described in section 4.7, that will drive the implementation of the system. Those technical requirements refer to very specific details in five main categories: the management of resources, the scheduling of jobs, the management and cataloguing of container images, the orchestration and deployment of services, and the elasticity. These technical requirements will be introduced in sections 4.3 to 4.6.
The next table anticipates those technical requirements and identifies their correlations with the use case requirements.
Use case requirements (columns): RE.1 Unrestricted Batch jobs; RE.2 Unrestricted Bag of Tasks; RE.3 QoS Batch jobs; RE.4 Deadline-based scheduling; RE.5 Self-adapting elasticity; RE.6 Short-jobs.
R01 - The RMF must enable a scheduler to allocate resources. x x x x x
R02 - The RMF must support multiple types of workloads. x x x x x
R03 - An application topology may involve several containers. x x
R04 - The status of the resources should be accessible by an API. x
R05 - RMFs should expose an API to change the number of resources. x x x
R06 - Monitor the usage of resources and the application health. x x x
R07 - A central list of container images should be available. x x x
R08 - Updates should be automated. x x x
R09 - A user should be able to modify their own application. x x x
R10 - Capability of executing container-based batch jobs. x x x
R11 - Integration with the selected CMF. x x
R12 - Capability of executing jobs involving concurrent processes. x
R13 - Support of Spark jobs. x x x x x
R14 - Support of deadline-QoS periodic jobs. x x
R15 - High availability for long-running jobs. x x x
R16 - Support for interactive jobs. x
R17 - Deployment of TOSCA blueprints. x x x x x x
R18 - Update of TOSCA blueprints to reconfigure the system. x
R19 - Support of multiple platforms. x x x x x x
R20 - Automatic scaling up of resources when new job requests arise. x x x
R21 - Automatic deallocation of resources when idle for a given period. x
R22 - Customization policies for elasticity. x
R23 - Reallocation of memory size and number of resources. x
R24 - Transparent management of memory allocation. x
R25 - Automatic reconfiguration of execution kernels. x
Table 2: Relation among the technical requirements and the use case requirements.
3 EUBRA-BIGSEA INFRASTRUCTURE OVERVIEW
3.1 EUBra-BIGSEA General Infrastructure
The EUBra-BIGSEA general infrastructure comprises 4 main blocks:
- QoS Cloud Infrastructure services, which integrate the modelling of the workload, the monitoring of the resources, the implementation of vertical and horizontal elasticity, and the contextualization. This is the main subject of this document.
- Big Data Analytics services, which provide operators to process huge datasets and which can be integrated in the programming models. Analytics services are characterized in the QoS cloud infrastructure models of the underlying layer, which will adjust the resources to the expected workload and its specificities, either automatically or explicitly driven by the analytics services.
- Programming Models, which provide a higher-level programmatic framework (Python, Java, Spark) and are also characterized by the models of the infrastructure. The programming models will ease the parallelisation of the applications developed on top of them.
- Privacy and Security framework, which provides the means to annotate data and processing and ensures the proper protection of privacy and security.
On top of those four blocks, applications are developed using the programming models and the data analytics extensions. Application developers are expected to use the programming models and may use other features of the underlying layers, such as the user-level QoS metrics.
Figure 1: High-level view of the EUBra-BIGSEA Architecture
Figure 1 shows the high-level view of the EUBra-BIGSEA architecture, depicting the interactions among the main blocks. Figure 2 shows the interactions among work packages and the main sources of information at this stage of the project.
Figure 2: Relations among WP7 and the other technical WPs. How the first set of deliverables is contributing to the application development.
Figure 3 shows a more detailed schema of the architecture. More details will be provided in the next sections and in deliverable D5.1 EUBra-BIGSEA Software Architecture.
Figure 3: Detailed view of the software architecture. The components listed are described in more detail throughout the document.
In order to implement the execution lifecycle for each one of the user scenario levels, it is necessary to define:
- Application binary and associated dependencies, embedded in a container. The application dependencies can be coded as a dependency file (à la Dockerfile) or by directly registering the container in a repository. The basic execution unit is therefore the container. Some applications may not need dependencies other than those of the programming models, which are already embedded in the base container images.
- QoS Policies. The different applications will be analysed in terms of dependencies, execution graph path, resource demand and performance. When an application is submitted, an initial estimation of resources will be defined for each executed case. Reactive elasticity of the platform will ensure that enough resources are available for the application to start. The elasticity modules can then use monitoring data to dynamically update the resource allocation so that the QoS is met. For example, the platform can enlarge the memory allocation of a Spark job if it is starving, or give a COMPSs job more physical computing slots if it is not progressing as expected.
- A specific execution description, coded programmatically or as a job description, to be run on the scheduler of the platform.
Most of this information is inherent to the programming model and part of the specific algorithm to be executed on it. Therefore, the user will not need to provide additional information, but the analysis of the requirements should derive this information or the methods to obtain it.
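The three pieces of information listed above (the container, the QoS policy and the execution description) could travel together as a single submission record. The dataclasses below are a hypothetical sketch: the class names, field names, default bounds and the image name `example/spark-app:1.0` are illustrative assumptions, not the project's job description format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class QoSPolicy:
    """Hypothetical QoS bounds attached to a submission."""
    deadline_s: Optional[float] = None   # None for jobs without a deadline
    min_cores: int = 1                   # lower bound kept by the elasticity module
    max_cores: int = 8                   # upper bound the platform may scale to
    memory_mb: int = 2048                # initial memory estimate, may be enlarged


@dataclass
class ApplicationSubmission:
    """Container + QoS policy + execution description, in one record."""
    image: str            # container image with the binary and its dependencies
    command: List[str]    # execution description handed to the scheduler
    policy: QoSPolicy = field(default_factory=QoSPolicy)

    def validate(self) -> None:
        if not self.image:
            raise ValueError("a container image is required")
        if not self.command:
            raise ValueError("an execution description is required")
        if not 1 <= self.policy.min_cores <= self.policy.max_cores:
            raise ValueError("core bounds must satisfy 1 <= min <= max")

    def to_json(self) -> str:
        """Validate and serialise the submission (nested dataclasses included)."""
        self.validate()
        return json.dumps(asdict(self))
```

For a job relying only on the programming model, the image would simply be one of the base images, and most policy fields would be filled from the application profile rather than by the user.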
Additionally, the requirement analysis will provide input to the following components:
- Authentication. The use cases should define whether user identification is needed or not, at which levels it will be needed and whether there is already an existing mechanism.
- Authorisation. The authorisation of access to the data could be done at the level of individuals or groups, and the actions that the authorisation grants may differ. User scenarios have to identify such needs.
- Privacy management. The project does not involve the management of protected personal data, but it has an activity to develop components for this. Therefore, the requirement analysis should identify the information needed to code and protect the level of privacy of the data, either from its acquisition or after its processing. It may happen that raw data will not require data protection, but post-processed data could reveal re-identifiable information, so it could not be stored in the same way.
- Data acquisition. Data sources, data formats, data volumes, data acquisition rate, expected storage needs and data validity.
- Programming models. The project will start from the existing expertise of the data analytics groups, who already have experience with popular execution environments. Application developers and data scientists will express their requirements in the form of programming languages and frameworks.
- Execution patterns. The different use cases identified in this document will have different execution patterns (event-based, bag-of-tasks batch, interactive).
- Logging and monitoring. The use cases will also define metrics that will be logged to feed back and adjust the static and real-time policies. Use cases should define which metrics they can expose (in addition to the basic resource monitoring, such as CPU, memory and disk usage, or the metrics that could be obtained from the scheduler, such as the job waiting time) and which of them are relevant to the use case.
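As an illustration of an application-level metric a use case might expose (such as the job waiting time mentioned above), the sketch below builds one measurement in the shape accepted by the Monasca metrics API (a `name`, a `dimensions` map, a `timestamp` in milliseconds since the epoch and a numeric `value`), Monasca being the monitoring system chosen for the project. The metric name and the dimension keys are hypothetical, and the sketch only builds the payload; it does not contact any Monasca endpoint.

```python
import json
import time
from typing import Dict, Optional


def make_metric(name: str, value: float, dimensions: Dict[str, str],
                timestamp_ms: Optional[int] = None) -> dict:
    """Build one measurement in a Monasca-style shape (name, dimensions,
    timestamp in milliseconds since the epoch, value)."""
    if not name:
        raise ValueError("metric name must not be empty")
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }


# Hypothetical application-level metric: time a job waited in the scheduler queue.
sample = make_metric(
    "bigsea.job.waiting_time_s",   # hypothetical metric name
    42.0,
    {"application": "route-recommender", "scheduler": "chronos"},
    timestamp_ms=1467244800000,
)
payload = json.dumps(sample)
```

Using dimensions rather than encoding the application and scheduler in the metric name keeps one metric per concept, which makes it easier to aggregate or filter per use case later.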
3.2 Procedure to Describe Components
This section includes a template for describing the potential components to be used in EUBra-BIGSEA. The next sections will instantiate this template for the different components identified.
Field | Description
Identification | Name and layer where the component will be applied (a component may be applied to different layers).
Potential usage (in EUBra-BIGSEA) | Section where the component will be used.
Type | Module, subprogram, data file, control procedure, class, framework, service, etc.
License | License model.
Website | URL to get additional information or the code.
Purpose | Function and performance requirements implemented by the design component, including derived requirements that relate to the requirements of the project.
Function | What the component does, the transformation process, the specific processed inputs, the algorithms used, the produced outputs, where the data items are stored, and which data items are modified.
High Level Architecture | The internal structure of the component and the inner interactions that are relevant for the project requirements.
Dependencies | Other components required by the component and how this component is used by other components. Interaction details such as timing, interaction conditions (such as order of execution and data sharing), and responsibility for the creation, duplication, use, storage and elimination of components.
Interfaces | Detailed descriptions of all external and internal interfaces, as well as of any mechanisms for communicating through messages, parameters or common data areas.
Data | Internal data required by the component to work.
Needed Improvement | Description of the improvements to the tool that are foreseen during the EUBra-BIGSEA project in order to fulfil the user requirements stated in the previous section.
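To keep the component descriptions uniform, the template could also be captured as a structured record, for instance to check automatically that no field has been left empty. This is a hypothetical sketch: the class name, the snake_case field names and the `missing_fields` helper are assumptions; only the list of fields comes from the template above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ComponentDescription:
    """One instance of the component template (all fields free text)."""
    identification: str            # name and layer(s) where the component applies
    potential_usage: str           # section where the component will be used
    component_type: str            # module, framework, service, ...
    license: str                   # license model
    website: str                   # URL with more information or the code
    purpose: str
    function: str
    high_level_architecture: str
    dependencies: str
    interfaces: str
    data: str
    needed_improvement: str

    def missing_fields(self) -> List[str]:
        """Names of template fields that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

Filling one record per component (for example for Apache Mesos or Chronos) and calling `missing_fields()` would list the template entries still to be written before the description is complete.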
4 QOS IAAS
The execution requirements described in section 2 mainly address the execution of Batch jobs (RE.1), Bag of Tasks (RE.2), QoS Batch jobs (RE.3), Short jobs (RE.6), Deadline-based jobs (RE.4) and self-adapting elasticity (RE.5). In order to address such requirements, a job execution framework is needed. The Quality of Service IaaS, the main objective of WP3, focuses on the implementation of a platform capable of:
- Deploying and configuring the proper virtual infrastructure to run the data analytics jobs. This implies the specification of the software services, their dependencies, their configuration recipes and their base images.
- Orchestrating the execution of the workload on top of the virtual resources, leveraging data locality and data parallelism.
- Registering the execution performance and logs to feed back and update further runs and to trigger elasticity rules.
- Implementing elasticity at the level of the memory and of the number of resource instances.
Those requirements should be addressed by a set of services, which may come from existing components in the literature and from new developments of the consortium.
The section first describes the major workload cases and the application lifecycle; the rest of the section describes the main components required for managing them.
4.1 Application use cases and Lifecycle
The QoS IaaS addresses two kinds of workload: batch jobs that are submitted through the programming models (e.g. Spark or COMPSs [R7]) and interactive jobs that use the Data Analytics console (e.g. OPHIDIA [R8] or Spark).
4.1.1 Batch jobs
Analytic jobs for the creation and evaluation of the descriptive and predictive models, as well as for checking the quality of the data, will be implemented on the programming models and run as batch jobs. We identify two types of batch workload:
- Scenario 1 (model fitting): Jobs are launched with a pre-specified deadline. Jobs are executed one by one, and an ordered queue is maintained so that jobs can be executed within their deadlines. When a new job is submitted, the queue is reordered so that all job deadlines can be met with the current cluster configuration. If this is not possible, or if the expected makespan (sum of the execution times of all jobs) would exceed a safety margin (e.g. the expected execution time is 10% longer than the estimated one), the system can increase or decrease the number of resources. The system periodically checks job execution by means of the monitoring system or specific calls to the monitoring API. If, considering the data coming from the monitoring infrastructure, the system predicts that the deadline of the current job can be violated, a cluster reconfiguration is triggered and resources are added. Reconfiguration is also re-triggered whenever a new job starts.
- Scenario 2 (model projection): We deal with a sustained submission rate of short jobs (e.g. 300 jobs in 5 minutes, arriving in an irregular pattern of 30 to 90 jobs per minute), each requiring 10 seconds. The system should quickly allocate resources for those short jobs in order to keep the overhead minimal (e.g. less than 5 seconds per job), while keeping the idle time of the resources low (e.g. below 20%) to avoid wasting resources.
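Scenario 1 amounts to keeping the queue ordered by deadline and checking, with the estimated durations, whether every job still finishes on time; Scenario 2 is a sizing question. The sketch below illustrates both under simplifying assumptions that are ours, not the project's: jobs run strictly one after another, durations are known estimates inflated by the 10% safety margin, and earliest-deadline-first (EDF) ordering is used because, for jobs that are all queued now and run sequentially, it meets every deadline whenever some order does. For Scenario 2, the offered load (arrival rate times service time) gives the minimum number of busy workers, and the 20% idle budget caps the pool at load / 0.8. The function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    estimated_s: float   # estimated execution time with the current cluster
    deadline_s: float    # deadline, in seconds from now


def edf_schedule(jobs: List[Job], margin: float = 0.10) -> Tuple[List[Job], List[str]]:
    """Order jobs by earliest deadline and return (order, names_at_risk).

    Each duration is inflated by the safety margin; a job is at risk when its
    projected finish time exceeds its deadline, which would trigger adding
    resources (i.e. shorter estimated durations) and re-checking.
    """
    order = sorted(jobs, key=lambda j: j.deadline_s)
    clock, at_risk = 0.0, []
    for job in order:
        clock += job.estimated_s * (1 + margin)
        if clock > job.deadline_s:
            at_risk.append(job.name)
    return order, at_risk


def workers_for_short_jobs(jobs_per_min: float, job_s: float,
                           max_idle: float = 0.20) -> Tuple[int, int]:
    """Return (min_workers, max_workers) for a stream of short jobs.

    min_workers: offered load in busy workers, rounded up.
    max_workers: largest pool whose idle fraction stays within max_idle.
    If max_workers < min_workers, the idle target cannot be met.
    """
    load = jobs_per_min / 60.0 * job_s
    return math.ceil(load), math.floor(load / (1 - max_idle))
```

With the figures of Scenario 2, the average rate of 60 jobs per minute at 10 seconds each keeps about 10 workers busy, so 10 to 12 workers satisfy both constraints, while the peak rate of 90 jobs per minute needs 15 to 18. The gap between those ranges is what the elasticity mechanisms have to absorb.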
Jobs can also be executed periodically and with deadline QoS. For example, in one of the applications, a routing recommender for public transportation, the information on the use of public transport from the previous day should be processed in order to provide the most accurate information on the routes. If this information is not computed by the peak time, when users start requesting information, the results may be out of date. If the analysis is performed too far in advance, the information may not be up to date enough. So the job should be executed periodically, ending close to the peak hours.
4.1.2 Interactive jobs
A user may launch an interactive console to load and analyse the data. This will be possible for both Spark and OPHIDIA, which provide a console that can be used to execute analytic operations.
We differentiate between two scenarios:
- A user executes a Spark job that loads data in memory and executes data analytic primitives. The tasks are executed on the worker nodes of the cluster. Since the workload is unpredictable, the allocation of resources cannot be tuned a priori. Therefore, the system should be able to react dynamically, providing additional resources if needed. This will be performed transparently to the user, within a minimum and maximum QoS defined a priori.
- A user links a cluster from within the analytic console. This gives the user the ability to create the cluster "a posteriori", from an already active instance. The behaviour could be the same as in the previous case.
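As an illustration of reacting dynamically within bounds fixed a priori, the sketch below applies a proportional scaling rule (the same form as the one documented for the Kubernetes Horizontal Pod Autoscaler, borrowed here only as an example): the desired number of workers is the current number scaled by the ratio of observed to target utilisation, then clamped to the minimum and maximum allowed by the QoS. The 70% target, the bounds and the function name `desired_workers` are assumptions, not part of the EUBra-BIGSEA design.

```python
import math


def desired_workers(current_workers, cpu_utilisation, target=0.7,
                    min_workers=2, max_workers=10):
    """Proportional scaling rule clamped to a-priori QoS bounds.

    cpu_utilisation is the average utilisation (0..1) of the current workers;
    the target, minimum and maximum are illustrative values.
    """
    if current_workers <= 0:
        return min_workers
    wanted = math.ceil(current_workers * cpu_utilisation / target)
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    print(desired_workers(4, 0.95))  # 6: scale out
    print(desired_workers(4, 0.10))  # 2: scale in, but never below the minimum
    print(desired_workers(8, 1.00))  # 10: capped at the maximum
```

In practice the utilisation figure would come from the monitoring system, and some damping (e.g. a cool-down period) would be needed to avoid oscillations.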
4.1.3 Application lifecycle
The applications in EUBra-BIGSEA can be:
- Data analytic batch jobs that can intrinsically exploit parallelism using COMPSs (parallel regions identified by the COMPSs runtime are executed on multiple containers) or Spark. These applications may be characterized by execution policies that provide the expected resource profile for their execution.
- Interactive console analysis instances. These imply access to a Spark or OPHIDIA console that will deploy ad-hoc clusters. This workload is unpredictable, and the infrastructure will mainly react to the resource demand.
Along with the dependencies related to COMPSs, OPHIDIA or Spark, these applications may require additional software libraries and components coming from the programming environments used.
Therefore, the application lifecycle involves:
- Properly defining the virtual resource dependencies. This requires creating container images that can be either uploaded to public repositories or kept on the infrastructure resources. The preferred method will be to use Dockerfiles, Docker Hub and/or GitHub automated builds. These images will be configured on top of the base images of the Spark, COMPSs and OPHIDIA environments and will be created locally through Docker Compose if necessary.
- Defining a TOSCA document that describes the application topology and the parameters that can be customized at runtime.
- Submitting the application through COMPSs, OPHIDIA or Spark, referencing those containers and additional information about the components that compose the application. The TOSCA document will point to the images and the application topology.
- Or, alternatively, deploying the infrastructure for the interactive usage of the analytic framework.
Figure 4: Lifecycle of the applications in EUBra-BIGSEA
The lifecycle of the applications goes through different steps, ranging from the definition of the software architecture (software topology) that defines the application, the description of the dependencies, the deployment on a resource pool, and the configuration and contextualization of the instances, to the actual execution of or access to the application, the monitoring of the execution and the dynamic reconfiguration of the system. Figure 4 shows a basic diagram of the interactions in this lifecycle.
Therefore, this document covers the analysis of the following components:
- The specification of the virtual resources required. This is described in Section 4.2 and makes use of TOSCA templates that describe the application topology.
- The provisioning and orchestration of the resources. Section 4.3 lists the available resource provisioning technologies and analyses the selection.
- The configuration and contextualization of the application. This is described in Section 4.5. It covers three main alternatives: on-the-fly configuration of basic images, deployment of fully-configured images, and intermediate solutions.
- The execution of batch jobs. This is described in Section 4.4.
- The dynamic reconfiguration of the resource pool, described in Section 4.6, addressing both horizontal and vertical elasticity.
- The monitoring and logging of the resources and application services. This was initially the main objective of this deliverable, and it is fully described in Section 5.
4.2 OASIS TOSCA
The EUBra-BIGSEA platform provides an entry point to its functionality via the Orchestrator service, which will feature a RESTful API that receives a TOSCA-compliant description of the application architecture to be deployed. TOSCA (Topology and Orchestration Specification for Cloud Applications) [1] is an OASIS specification for the interoperable description of application and infrastructure cloud services, the relationships between parts of the services, and the operational behavior of these services. TOSCA has been selected as the language for describing applications due to the wide adoption of this standard and the availability of solutions for both OpenNebula [2] and OpenStack [3].
The core TOSCA specification provides a language to describe service components and their relationships using a topology model, and it provides for describing the management procedures that create or modify services using orchestration processes. The combination of topology and orchestration in a Service Template describes what needs to be preserved across deployments in different environments to enable the interoperable deployment of cloud services, and their management throughout the complete lifecycle, when the applications are ported to alternative cloud environments.
A Topology Template consists of a set of nodes and relationships. The nodes form a directed graph that describes all the components of an application. Each node is represented by a node type and a set of properties that define the interfaces to manipulate the component. Custom TOSCA types should be derived from the normative types. Dependencies among TOSCA types enable defining dependencies and requirements in a structured and portable way. TOSCA deployment plans instantiate such TOSCA types with the specific values required.
A TOSCA Topology Template describes the interactions among the components (infrastructure, platform/middleware and application modules). Components are described in YAML [4]. Application architects can model the services, policies and requirements of an application in a TOSCA template, which is extended with additional artifacts by the deployment plans. The TOSCA templates are used to test and deploy the application.
Figure 5 shows a sample TOSCA blueprint.
tosca_definitions_version: tosca_simple_yaml_1_0

imports:
  - custom_types: <<URL to a yaml with generic custom_types.yaml>>

description: >
  Sample TOSCA file

topology_template:
  inputs:
    download_url:
      type: string
      default: <<URL to any specific data required for the installation>>

  node_templates:
    my_app:
      type: <<Existing generic type>>
      requirements:
        - host: my_app_prerequisites
      interfaces:
        Standard:
          configure:
            implementation: <<URL to yml file to install the software>>
            inputs:
              download_url: { get_input: download_url }

    my_app_prerequisites:
      type: <<existing generic type>>
      properties:
        public_ip: yes
      capabilities:
        # Host container properties
        host:
          properties:
            num_cpus: 1
            mem_size: 1 GB
        # Guest Operating System properties
        os:
          properties:
            # host Operating System image properties
            type: linux
            distribution: ubuntu
Figure 5: Basic sample TOSCA template
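The requirements in Figure 5 (`my_app` is hosted on `my_app_prerequisites`) are edges of the directed graph described above, and an orchestrator has to deploy each requirement target before the nodes that depend on it. The minimal Python sketch below assumes the template has already been parsed into dictionaries (for instance with a YAML library, not shown) and uses the standard `graphlib` module (Python 3.9+) to derive such an order. It accepts both the short (`host: node`) and the extended (`host: {node: ...}`) requirement notations; it is an illustration, not the algorithm of any particular TOSCA orchestrator.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Node templates of Figure 5, keeping only their requirements.
node_templates = {
    "my_app": {"requirements": [{"host": "my_app_prerequisites"}]},
    "my_app_prerequisites": {},
}


def deployment_order(templates):
    """Return node template names so that every requirement target comes first."""
    graph = {}
    for name, spec in templates.items():
        targets = set()
        for requirement in spec.get("requirements", []):
            # Each requirement is a one-key mapping, e.g. {"host": "vm"}
            # or, in extended notation, {"host": {"node": "vm"}}.
            for target in requirement.values():
                targets.add(target["node"] if isinstance(target, dict) else target)
        graph[name] = targets  # predecessors of `name`
    # static_order() raises graphlib.CycleError on circular requirements.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(deployment_order(node_templates))  # ['my_app_prerequisites', 'my_app']
```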
4.3 Resource Management Frameworks
A Resource Management Framework (RMF) is a multitenant component that manages distributed resources, allocates them to the requests performed by a scheduler, executes a process and deallocates the resources once the execution finishes. This section describes the components required to manage a pool of resources on which to deploy the application containers that support the execution of the applications. These components deal with the following features:
- Resource pool management. It should manage a set of resources on which to deploy the containers where the applications will run.
- Load balancing. It should identify the free resources where the containers will be deployed.
- Fault tolerance. It should provide the ability to ignore faulty resources and to rearrange the infrastructure.
- Instance deployment and execution (orchestration). This may be achieved through a scheduler system, which could be independent of the RMF.
- Elasticity. It should enable enlarging or decreasing the resources allocated, although this will require external services to automate the process.
The requirements that we have for such frameworks are:
- R01 - The RMF must enable a scheduler to allocate resources to a specific kernel.
- R02 - The RMF must support multiple types of workloads, from containers to Spark jobs.
- R03 - An application topology may involve several containers, which must be deployed in a coordinated way.
- R04 - The status of the resources should be accessible through an API.
- R05 - RMFs should expose an API to change the number of resources allocated.
- R06 - The system should monitor the usage of resources and the application health.
The following components do not individually provide the full set of the above features, but some of them complement each other.
4.3.1 Docker Swarm
Identification Docker Swarm
Potential usage Resource Pool Management and Container Orchestration.
Type Service and client
License Apache 2.0
Website https://docs.docker.com/swarm, https://github.com/docker/swarm
Purpose A native clustering system for Docker. It turns a pool of Docker hosts into a single, virtual Docker host.
Function It enables creating a pool of resources where Docker containers can be run. Docker Swarm provides discovery services, scheduling, and both a CLI and an API.
High Level Architecture
Each host runs a Swarm agent and one host runs a Swarm manager (on small test clusters this host may also run an agent). The manager is responsible for the orchestration and scheduling of containers on the hosts.
Swarm can be run in a high-availability mode using Consul or ZooKeeper to handle fail-over to a back-up manager (there can be multiple managers). There are several different methods for finding hosts and adding them to a cluster, which is known as discovery in Swarm. By default, token-based discovery is used, where the addresses of the hosts are kept in a list stored on Docker Hub.
A Swarm cluster requires an open TCP port on each node for communication with the Swarm manager, Docker installed on each node, and TLS certificates created and managed to secure the cluster.
Dependencies Docker, Docker Hub, and other components for HA, such as Consul [5] or ZooKeeper [6].
Interfaces It uses the same Docker API, thus facilitating the migration from single Docker resources to a pool of resources.
Data Not applicable.
Needed Improvement
More advanced scheduling policies, horizontal and vertical elasticity, multitenancy.
4.3.2 Apache Mesos
Identification Mesos
Potential Usage Main Resource Pool Management.
Type A middleware (set of daemons/services) for cluster management
License Apache 2.0
Website http://mesos.apache.org/
Purpose Mesos provides efficient resource isolation and sharing across distributed applications (frameworks). Mesos can run on top of virtual machines, bare metal and/or Docker containers.
Function Mesos requires computing resources to be assigned to it, so that it can deploy distributed applications on these resources. Mesos provides the following main functionalities:
- Resource allocation/revocation/re-allocation: Mesos implements a pluggable resource allocation module architecture. By default, Mesos includes a strict priority resource allocation module and a modified fair sharing resource allocation module. Advanced scheduling features such as oversubscription are available since version 0.23.
- Performance isolation among framework executors running on the same slave node, through pluggable isolation modules. The default mechanism leverages container technologies.
- Framework authorization, implemented through configurable ACLs in JSON format that allow 1) frameworks to (re-)register with authorized roles; 2) frameworks to launch tasks/executors as authorized users; 3) authorized users to shut down framework(s) through the "/shutdown" HTTP endpoint.
- Framework rate limiting, which allows configuring the maximum number of queries per second for each framework. This feature aims at protecting the throughput of high-SLA frameworks by having the master throttle messages from other (e.g., development, batch) frameworks.
- Resource reservation. In static reservation, resources are reserved for a particular role. Since Mesos 0.24, frameworks can use dynamic reservations to reserve offered resources, so that those resources are only re-offered to the same framework. This is especially useful if a task of the framework stored some state on the slave and needs a guaranteed set of reserved resources, so that it can re-launch a task on the same slave to recover that state.
- Monitoring: Mesos master and slave nodes report a set of statistics and metrics, including details about available resources, used resources, registered frameworks, active slaves, and task state. These metrics are available by querying the HTTP endpoints exposed by the master and slave nodes. Heapster can be used to monitor a Mesos infrastructure.
- Slave recovery: this feature allows 1) executors/tasks to keep running when the slave process is down, and 2) a restarted slave process to reconnect with the running executors/tasks on the slave.
- Native Docker support, which allows users to launch a Docker image as a task or as an executor.
- DNS-based service discovery: it allows applications and services running on Mesos to find each other through the Domain Name System.
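The master metrics mentioned above can be read from the master's `/metrics/snapshot` HTTP endpoint, which returns a flat JSON object. The sketch below derives the fraction of registered CPU and memory currently allocated to tasks, the kind of signal the reactive policies could consume. The master address is an assumption, and the sample values in the usage example are invented for illustration.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(master_url="http://localhost:5050"):
    """Read the flat JSON metrics object exposed by a Mesos master."""
    with urlopen(f"{master_url}/metrics/snapshot", timeout=5) as response:
        return json.load(response)


def utilisation(snapshot):
    """Fraction of the registered CPU and memory currently allocated to tasks."""
    def ratio(used_key, total_key):
        total = snapshot.get(total_key, 0.0)
        return snapshot.get(used_key, 0.0) / total if total else 0.0

    return {
        "cpu": ratio("master/cpus_used", "master/cpus_total"),
        "mem": ratio("master/mem_used", "master/mem_total"),
    }


if __name__ == "__main__":
    # Invented sample values; replace with fetch_snapshot("http://<master>:5050").
    sample = {"master/cpus_total": 16.0, "master/cpus_used": 12.0,
              "master/mem_total": 65536.0, "master/mem_used": 16384.0}
    print(utilisation(sample))  # {'cpu': 0.75, 'mem': 0.25}
```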
High Level Architecture
Mesos comprises two main components:
- Mesos master
- Mesos slave
The Mesos master is a daemon that manages the slave daemons running on each cluster node. High availability and fault tolerance of the master can be achieved using multiple masters and ZooKeeper. The slaves register with the master and offer "resources", i.e. capacity to run tasks.
Mesos uses the concept of frameworks to encapsulate processing engines. Framework creation is triggered by the schedulers registered in the system. Frameworks reserve resources for the execution of tasks, and Mesos can reallocate resources to frameworks dynamically. Resource profiles are described as "roles", which define the amount of resources that can be allocated (somewhat similar to "flavours"). This ensures that there is no physical relation between resources and framework types (e.g. a Spark framework will only consume resources when subscribed to Mesos, and it could have different resources each time it is executed).
The executors run on the Mesos slaves and are responsible for launching tasks. One or more executors from the same framework may run concurrently on the same machine. A dedicated class, "MesosExecutorDriver", is used both to manage the executor's lifecycle and to connect the framework executor to Mesos. There is a lightweight executor for containers of batch jobs called "CommandExecutor".
Mesos provides the Scheduler interface to be implemented by each specific framework; this interface includes methods to register, re-register and unregister with the Mesos master and to accept or reject resource offers.
The master decides how many resources must be offered to each framework scheduler according to a given organizational policy, such as fair sharing or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism.
Mesos supports internal monitoring of frameworks, executors and tasks, and Mesos services can be monitored through a system called Satellite.
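Mesos's default allocator implements Dominant Resource Fairness (DRF), a multi-resource form of fair sharing: the next offer goes to the framework whose largest share of any single resource (its dominant share) is the smallest. The sketch below simulates DRF by progressive filling for two hypothetical frameworks with illustrative figures (9 CPUs and 18 GB in the cluster; tasks of 1 CPU/4 GB for A and 3 CPUs/1 GB for B). It is a simplification: the real Mesos allocator offers the resources of whole agents, which frameworks may accept or decline.

```python
def dominant_share(allocated, totals):
    """Largest fraction of any single resource held by a framework."""
    return max(allocated.get(r, 0.0) / totals[r] for r in totals)


def drf_simulate(demands, totals):
    """Progressive filling: repeatedly give one task to the framework with the
    lowest dominant share until no framework's next task fits."""
    allocations = {fw: {r: 0.0 for r in totals} for fw in demands}
    used = {r: 0.0 for r in totals}
    tasks = {fw: 0 for fw in demands}
    active = set(demands)
    while active:
        # Ties are broken by framework name to keep the result deterministic.
        fw = min(sorted(active), key=lambda f: dominant_share(allocations[f], totals))
        demand = demands[fw]
        if any(used[r] + demand.get(r, 0.0) > totals[r] for r in totals):
            active.discard(fw)  # its next task no longer fits in the cluster
            continue
        for r in totals:
            used[r] += demand.get(r, 0.0)
            allocations[fw][r] += demand.get(r, 0.0)
        tasks[fw] += 1
    return tasks


if __name__ == "__main__":
    totals = {"cpus": 9.0, "mem": 18.0}  # memory in GB
    demands = {"A": {"cpus": 1.0, "mem": 4.0}, "B": {"cpus": 3.0, "mem": 1.0}}
    print(drf_simulate(demands, totals))  # {'A': 3, 'B': 2}
```

Both frameworks end with a dominant share of 2/3 (A of memory, B of CPU), which is the equalisation DRF aims for.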
Dependencies ZooKeeper servers for leader election in high-availability mode.
Interfaces Java, Python and C++ APIs and a Web UI for cluster management.
Data Not applicable.
Needed Improvement
Integration with the reactive policies and the monitoring system.
4.3.3 YARN
Identification Yet Another Resource Negotiator (YARN)
Potential Usage Scheduler for Hadoop-based jobs.
Type Framework of services
License Apache 2.0
Website http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
Purpose A data processing framework based on a resource and application manager.
Function YARN has replaced MapReduce (v1) for the processing of data in Hadoop. It now includes two separate components: the ResourceManager (one per installation) and the ApplicationMaster (one per application).
High Level Architecture
A YARN deployment has a ResourceManager (RM) and a set of slave NodeManagers (NM) that constitute the job environment. The RM manages the resources and schedules the jobs. The ApplicationMaster (AM) negotiates the resources with the RM and the NMs to execute the tasks (a job in YARN can be a DAG). The NM is the per-machine slave, which is responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the RM.
YARN has a CapacityScheduler to run Hadoop applications in a shared, multitenant cluster while maximizing the throughput and the utilization of the cluster. The CapacityScheduler is designed to allow sharing a large cluster while giving each organization capacity guarantees. An added benefit is that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.
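To make the capacity guarantee and the elastic use of excess capacity concrete, the sketch below hands out containers one at a time to the queue that is most under-served relative to its guaranteed share, never beyond its demand or its maximum capacity. This mirrors the idea behind the CapacityScheduler but is a deliberate simplification (no hierarchical queues, user limits or preemption); the function name, queue names and figures are hypothetical.

```python
def allocate(cluster_containers, queues):
    """Hand out containers one at a time.

    queues maps a queue name to (guaranteed_fraction, max_fraction, demand),
    where demand is the number of containers the queue is asking for.
    """
    used = {name: 0 for name in queues}
    for _ in range(cluster_containers):
        candidates = [
            name for name, (_guaranteed, max_fraction, demand) in queues.items()
            if used[name] < demand and used[name] < max_fraction * cluster_containers
        ]
        if not candidates:
            break  # remaining containers stay idle
        # Most under-served queue relative to its guarantee goes first.
        chosen = min(candidates, key=lambda name: (
            used[name] / (queues[name][0] * cluster_containers), name))
        used[chosen] += 1
    return used


if __name__ == "__main__":
    # "prod" is guaranteed 60% but only asks for 40 containers, so "research"
    # (guaranteed 40%, allowed up to 80%) borrows the idle capacity.
    print(allocate(100, {"prod": (0.6, 1.0, 40), "research": (0.4, 0.8, 100)}))
    # {'prod': 40, 'research': 60}
```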
Dependencies It is part of Hadoop 2.x.
Interfaces CLI, Java and REST API.
Needed Improvement N/A
4.3.4 Myriad
Identification Apache Myriad
Potential Usage
Scheduler. It can address the integration of YARN and Mesos if we eventually use both.
Type Scheduler
License Apache 2.0
Website http://myriad.incubator.apache.org/
Purpose Myriad enables the co-existence of Apache Hadoop and Apache Mesos on the same physical infrastructure. By running Hadoop YARN as a Mesos framework, YARN applications and Mesos frameworks can run side by side, dynamically sharing cluster resources.
Function With Apache Myriad, it is possible to:
- Run operational applications (including those running in Docker) side by side with analytic applications.
- Achieve Hadoop multi-tenancy by provisioning logical Hadoop clusters for each user.
- Run YARN as a Mesos framework, with the resource manager and node managers running inside Mesos containers.
- Launch multiple YARN clusters on the same set of nodes.
- Deploy the YARN Resource Manager using Marathon. This feature leverages Marathon's dynamic scheduling, process supervision, and integration with service discovery (Mesos-DNS).
- Run MapReduce v2 and associated libraries such as Hive, Pig, and Mahout.
High Level Architecture
Mesos acts as the main resource manager. Myriad provides a Control Plane that orchestrates the scheduling between the Mesos and YARN schedulers. Mesos spawns YARN NodeManagers as tasks in the Mesos nodes (step 1). Each NodeManager registers its capacity with the YARN ResourceManager (2), and then the ResourceManager can launch the containers (3). The NodeManager capacity can be readjusted to enable other Mesos workloads to profit from unused YARN resources.
The control plane also implements horizontal scaling by getting information about resource starvation from the YARN ResourceManager. YARN clusters running on Mesos can allocate resources in different ways:
- Static: administrators can use an API or a GUI to add or remove node managers or auxiliary services such as the JobHistoryServer.
- Fine-grained: administrators can provision thin node managers that are dynamically resized based on application demand.
Dependencies YARN and Mesos
Interfaces REST API
Needed Improvement
Interface the Control Plane with our solutions and with monitoring for vertical and horizontal elasticity.
4.3.5 Image Repository
One of the workloads to be supported in EUBra-BIGSEA is batch jobs wrapped, along with their dependencies, in containers. The building, registration and downloading of containers should be done efficiently, reducing communication overhead. This registry should be available to all the resources in the system, and updates should be simple and automatic.
Requirements:
- R07 - A central list of container images should be available.
- R08 - Updates should be automated.
- R09 - A user should be able to modify their own application.
We propose using Docker Hub and the automated build process to deal with this.
Identification Docker Hub / local Docker image registry
Potential Usage Docker Hub can be used to store the basic images of Spark, COMPSs, OPHIDIA and the other processing environments needed. The local registry in the user's deployment will store the customised images with the additional software dependencies. Eventually, relevant and shareable images can be pushed to the Docker Hub repository.
Type Service
License Docker Hub is a proprietary solution. The local registry is Apache 2.0.
Website https://hub.docker.com, https://github.com/docker/distribution
Purpose A registry of configured Docker images.
Function Docker images are stored locally where they will be run. A user can customize and store a modified version of a container image locally, without sharing it with any other user. Docker Hub is used to upload and share container images that are used by multiple users and sites. Users "pull" the images that are available in this repository and download them to their local resources. Alternatively, if the source code for a Docker image is on GitHub or Bitbucket, an "Automated build" repository can be used, which is built by the Docker Hub services.
High Level Architecture
Dependencies Docker
Interfaces Native Docker CLI and a REST API
Data N/A
Needed Improvement
These components cannot be modified.
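Whether an image is fetched from Docker Hub or from the local registry is encoded in the image reference itself: when the first path component looks like a host name (it contains a "." or ":", or is `localhost`), it designates a registry; otherwise Docker Hub is assumed, and single-name images resolve to the official `library/` namespace. A simplified Python parser of this rule, ignoring digests (`@sha256:...`), is sketched below; `localhost:5000`, `spark-custom` and the tags are example values, and `image_name` is the placeholder used in this document.

```python
def parse_image_reference(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified: digests (@sha256:...) are not handled.
    """
    registry = "docker.io"
    first, sep, rest = ref.partition("/")
    # The first component names a registry only if it looks like a host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest
    last_component = ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, tag = ref.rsplit(":", 1)
    else:
        repository, tag = ref, "latest"
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official Docker Hub images
    return registry, repository, tag


if __name__ == "__main__":
    print(parse_image_reference("eubrabigsea/image_name"))
    # ('docker.io', 'eubrabigsea/image_name', 'latest')
    print(parse_image_reference("localhost:5000/spark-custom:1.6"))
    # ('localhost:5000', 'spark-custom', '1.6')
```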
4.3.5.1 Automated build
In order to facilitate the maintenance of custom container images with embedded dependencies, an automated build process will be implemented. Although the creation and maintenance of container images is a simple process, when shared repositories are managed this process could lead to incomplete builds or manual synchronization. Therefore, we propose to use the Docker automated build.
For this process, the EUBra-BIGSEA GitHub organization (https://github.com/organizations/EUBra-BIGSEA/) will be used in coordination with the EUBra-BIGSEA Docker Hub organization (https://hub.docker.com/u/eubrabigsea/). EUBra-BIGSEA developers will be invited to join both organizations. Docker Hub enables creating automated builds from a repository in GitHub. The GitHub repository should contain a Dockerfile. Any change committed to the repository will lead to the automatic creation of the Docker image, which will be automatically registered in the proper organization repository of Docker Hub. Users can pull the new images by typing "docker pull eubrabigsea/image_name".
4.4 Job Execution (Scheduling)
The Resource Management Frameworks (RMFs) described in Section 4.3 provide a way to share and allocate resources among multiple executions, even for heterogeneous workloads. On top of these systems, schedulers dispatch jobs on the allocated resources.
According to the type of jobs, EUBra-BIGSEA has the following requirements:
- R10 - Capability of executing container-based batch jobs.
- R11 - Integration with the selected CMF.
- R12 - Capability of executing composed jobs involving multiple concurrent processes.
- R13 - Support of Spark jobs.
- R14 - Support of deadline-QoS periodic jobs.
- R15 - High availability for long-running jobs.
- R16 - Support for interactive jobs.
This section describes several proposals.
4.4.1 Spark (spark-submit and spark-shell)
Identification Apache Spark
Potential Usage
Spark is one of the supported programming models in EUBra-BIGSEA and will be mainly used for data filtering and preparation.
Type Execution Framework
License Apache 2.0
Website http://spark.apache.org
Purpose In-memory engine for large-scale data processing. Apache Spark is a fast and general-purpose cluster computing system and an optimized engine that supports general execution graphs.
Function Development and execution of in-memory data analytic applications based on its own programming paradigm.
High Level Architecture
In a standalone cluster deployment, the cluster manager is a Spark master instance. When using Mesos, the Mesos master replaces the Spark master as the cluster manager.
Spark can be used for batch jobs through spark-submit, which can use local, YARN or Mesos resources, among others. spark-submit can be used to execute binaries remotely.
spark-shell is an interactive Scala console that can use a Mesos cluster as its back-end. This way, one can execute data analytic operations interactively on a remote system.
Dependencies A Hadoop, YARN or Mesos cluster.
Interfaces It provides high-level APIs in Java, Scala, Python and R.
Needed Improvement
Not directly, but through the resource pool management and the orchestration system.
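For the batch scenario, a job can be sent to a Mesos-backed cluster by building the `spark-submit` command line. The sketch below assumes a Mesos master reachable at `mesos-master:5050`, a hypothetical application JAR (`model-fitting.jar`) and main class (`eu.bigsea.ModelFitting`), and caps the total number of cores with `spark.cores.max`; in practice the orchestration system would fill in these values.

```python
import shlex


def spark_submit_argv(app_jar, main_class, mesos_master, cores_max=4,
                      executor_memory="2g", app_args=()):
    """Build a spark-submit command line for a Mesos-managed cluster."""
    return [
        "spark-submit",
        "--master", f"mesos://{mesos_master}",
        "--class", main_class,
        "--executor-memory", executor_memory,
        "--conf", f"spark.cores.max={cores_max}",  # cap on total cores used
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    argv = spark_submit_argv("model-fitting.jar", "eu.bigsea.ModelFitting",
                             "mesos-master:5050", cores_max=8,
                             app_args=("2016-06-30",))
    print(shlex.join(argv))  # subprocess.run(argv, check=True) would launch it
```

Returning an argument list rather than a shell string avoids quoting problems when the list is passed to `subprocess.run`.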
4.4.2 Chronos
Identification A fault-tolerant job scheduler for Mesos
Potential Usage Submission of periodic jobs that need to meet a specific deadline.
Type Scheduler
License Apache 2.0
Website https://mesos.github.io/chronos/
Purpose Chronos is a suitable scheduler for tasks that need to be triggered periodically or when other conditions are met. This is a sound alternative to never-ending tasks, and it fits very well the scenarios of periodic extraction-transformation-load (ETL) jobs, data quality analysis, model tuning, etc.
Function Chronos provides features similar to those of Marathon, but more focused on periodic jobs. It uses SSL authentication, can be used to define grouped jobs, enables job retries and collects the historic performance of the executed jobs. It also supports launching Docker containers.
High Level Architecture
The high-level architecture is quite similar to that of Marathon. It also uses a JSON model to describe the jobs.
Jobs can be required to run on predefined containers, and additional data can be fetched on demand. Jobs run a number of times (even infinitely if required) and can be scheduled to start at a specific time and date.
Dependencies Chronos requires Apache Mesos 0.20.0+. ZooKeeper is also required for high availability.
Interfaces A REST API [https://mesos.github.io/chronos/docs/api.html] and a Web UI.
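For the routing-recommender example of Section 4.1.1, a Chronos job can be timed so that it finishes before the morning peak. Chronos schedules are ISO 8601 repeating intervals (`R/<start>/<period>`); the sketch below computes the start time as the peak time minus the estimated runtime minus a safety margin, and builds the JSON job description. The job name, command, image, owner, resource values and the 07:00 UTC peak are hypothetical, and the resulting payload would be submitted through the Chronos REST API referenced above.

```python
import json
from datetime import datetime, timedelta, timezone


def chronos_job(name, command, image, peak, estimated_runtime,
                margin=timedelta(minutes=30)):
    """Daily Chronos job timed to finish `margin` before the peak hour."""
    if peak.tzinfo is None:
        raise ValueError("peak must be timezone-aware")
    start = (peak - estimated_runtime - margin).astimezone(timezone.utc)
    return {
        "name": name,
        "command": command,
        # ISO 8601 repeating interval: repeat forever (R), from start, every 24h
        "schedule": f"R/{start.strftime('%Y-%m-%dT%H:%M:%SZ')}/PT24H",
        "epsilon": "PT15M",  # still run if the slot is missed by less than 15 min
        "owner": "ops@example.org",
        "cpus": 1.0,
        "mem": 2048,
        "container": {"type": "DOCKER", "image": image},
    }


if __name__ == "__main__":
    job = chronos_job("routes-daily", "python3 /app/update_routes.py",
                      "eubrabigsea/routes-model",
                      peak=datetime(2016, 7, 1, 7, 0, tzinfo=timezone.utc),
                      estimated_runtime=timedelta(hours=2))
    print(json.dumps(job, indent=2))  # schedule: R/2016-07-01T04:30:00Z/PT24H
```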
4.4.3 Marathon
Identification A container orchestration platform for Mesos (Marathon)
Potential Usage
Scheduler to be exposed to WP5-4 applications that require computation. It can benefit from any resource update intervention we could apply to Mesos.
Type Scheduler
License Apache 2.0
Website https://mesosphere.github.io/marathon/
Purpose It interacts with Mesos to submit batch jobs on a set of resources provisioned by Mesos. It can submit basic shell scripts, including the automatic fetching of files, and Docker-based applications. Marathon is especially well suited for long-running service jobs that need high availability.
Function Jobs in Marathon are described in JSON documents that include the hardware requirements, the job execution code, URIs of dependencies and additional information. Jobs can be embedded in containers. Jobs are created and registered in the Marathon platform from their JSON description or using the Web GUI. Jobs can be restarted or stopped as requested. Other interesting features supported by Marathon are:
- SSL authentication of the endpoints.
- Application scaling on demand (both increasing and decreasing).
- Health checks.
- Grouping of applications to manage related applications that depend on each other.
The resource configuration of Marathon jobs can be updated dynamically via the API, and Marathon enables submitting Docker-based jobs.
High Level Architecture
Marathon is implemented as two components: the scheduler and the executors. The scheduler coordinates the executions and the executors control the tasks.
The Mesos master provides the Marathon scheduler with resources (1). Then the executor launches a task (2) on the Mesos slave provided by the master. The executor updates the status of the task (3), which is fed into the Marathon scheduler through the Mesos master (4).
Dependencies Marathon depends on Mesos. It can also use the Docker Registry.
Interfaces A REST API [https://mesosphere.github.io/marathon/docs/rest-api.html], as well as multiple clients (CLI, Java, Python, etc.).
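A Marathon application is a JSON document that is created with `POST /v2/apps`; scaling it on demand amounts to updating its `instances` field with `PUT /v2/apps/<app id>`. The sketch below builds a Docker-based application definition with an HTTP health check and prepares, without sending, the scaling request. The application id, image, Marathon URL and the `/health` endpoint served by the container are assumptions.

```python
import json
from urllib.request import Request


def marathon_app(app_id, image, cpus=1.0, mem=1024, instances=2):
    """Docker-based Marathon application definition with an HTTP health check."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,  # MB
        "instances": instances,
        "container": {"type": "DOCKER", "docker": {"image": image}},
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/health",  # assumed endpoint served by the container
            "portIndex": 0,
            "gracePeriodSeconds": 60,
            "intervalSeconds": 30,
            "maxConsecutiveFailures": 3,
        }],
    }


def scale_request(marathon_url, app_id, instances):
    """Prepare (but do not send) a request that changes the instance count."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return Request(f"{marathon_url}/v2/apps/{app_id.lstrip('/')}", data=body,
                   headers={"Content-Type": "application/json"}, method="PUT")


if __name__ == "__main__":
    print(json.dumps(marathon_app("/bigsea/analytics", "eubrabigsea/analytics:latest"), indent=2))
    req = scale_request("http://marathon:8080", "/bigsea/analytics", 5)
    print(req.get_method(), req.full_url)  # PUT http://marathon:8080/v2/apps/bigsea/analytics
    # urllib.request.urlopen(req) would send it to a running Marathon instance.
```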
4.5 Configuration and Contextualization (and Orchestration)
EUBra-BIGSEA will develop a cloud-based platform for data analytics. EUBra-BIGSEA will produce a set of services to deal with different types of application workloads. One should be able to deploy only those components that are suitable for the application scenario. For example, one can deploy an EUBra-BIGSEA instrumented CMF with COMPSs as the programming model for batch QoS jobs. Another configuration may involve only Spark jobs, including interactive access. Therefore, the deployment of EUBra-BIGSEA should be flexible, modular, platform agnostic and capable of dealing with an unpredictable workload.
The configuration and contextualization process implies setting up the virtual resources to build up the application environment. This involves installing software dependencies, configuring them, setting up cross information (IPs, directories, ports, etc.) that needs to be shared among the application components, creating users and copying files, among other tasks. Contextualization deals with the core configuration of an instance. Finally, orchestration relates to the coordinated deployment and reconfiguration of a set of services that form the topology of an application.
Application dependencies will be embedded in containers. Application-specific code is included as binary executable files compiled with the COMPSs runtime or as JAR files for Spark jobs. A Spark or COMPSs job will expect the Spark or COMPSs runtime dependencies to be automatically available in the container. Moreover, application-specific dependencies must be indicated somehow.
Therefore, there are two complementary alternatives:
- Create application-specific containers for each of the applications to be run. This will require:
  - A registry of the containers created.
  - A collection of basic container images.
  - A procedure to build up containers with application-specific dependencies.
- Configure the containers on the fly from basic images. This will require:
  - A collection of basic container images.
  - Configuration recipes, for example Ansible Galaxy playbooks¹.
  - A DevOps tool to deploy and configure the containers.
The first approach will produce immediate deployment of containers, but it will require users to build up their own images. This can be automated through Dockerfiles and automated build services. It will not prevent additional configuration steps from being applied on top of the container, for example, to configure the network properly. The second approach will reduce the maintenance of the images, which can be reduced to a minimal set, and will facilitate the dynamic reconfiguration of application topologies. On the other hand, it will require a longer deployment time.
In both cases, EUBra-BIGSEA will need an orchestration system that can translate application topology descriptions in TOSCA into local descriptions that can in turn be translated into Cloud Management Framework-specific actions.
The requirements for this specific aspect of the infrastructure are:
- R17 - Deployment of TOSCA blueprints.
- R18 - Update of TOSCA blueprints to reconfigure the system.
- R19 - Support of multiple platforms.
¹ https://galaxy.ansible.com
4.5.1 Infrastructure Manager
The Infrastructure Manager (IM) is a tool that deploys complex and customized virtual infrastructures on IaaS clouds by automating the virtual machine image selection, deployment, configuration, software installation, and the monitoring and update of virtual infrastructures on multiple cloud back-ends.
Identification Infrastructure Manager (IM)
Potential Usage Deployment of TOSCA blueprints with the whole architecture.
Type A service and a client.
License GPL v3 https://github.com/grycap/im/blob/master/LICENSE
URL http://www.grycap.upv.es/im
Purpose The IM is a service for the whole orchestration of virtual infrastructures and the applications deployed on them, including resource provisioning, deployment, configuration, re-configuration and termination.
Function The service manages the complete deployment of virtual infrastructures or of individual components within them. The status of a virtual infrastructure can be:
- pending: launched, but still in the initialization stage;
- running: created successfully and running, but still in the configuration stage;
- configured: running and contextualized;
- unconfigured: running but not correctly contextualized;
- stopped: stopped or suspended;
- off: shut down or removed from the infrastructure;
- failed: an error happened during submission;
- unknown: unable to obtain the status.
The figure on the right shows a state transition diagram for a virtual infrastructure. The following is the set of requirements identified in EUBra-BIGSEA:
- Deploy a TOSCA-specified virtual infrastructure.
- Get information from a deployed infrastructure, including status, specification information, and contextualization logs.
- Reconfigure an existing infrastructure, adding or removing VMs or containers.
- Stop and restart existing infrastructures.
- Terminate an existing deployed infrastructure.
- Interact with individual VMs/containers, with the ability to stop, restart, resize or query the status of individual VMs/containers.
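A client that deploys a TOSCA blueprint typically polls the infrastructure status until it settles. The sketch below groups the states listed above, treating `configured` as success and `unconfigured`, `failed`, `stopped` and `off` as outcomes that need attention; classifying `unknown` as transient is an assumption. `get_state` stands for whichever IM client call returns the state string and is hypothetical here, as are the default timeout and polling interval.

```python
import time

SUCCESS = {"configured"}
FAILURE = {"unconfigured", "failed", "stopped", "off"}
IN_PROGRESS = {"pending", "running", "unknown"}  # "unknown" assumed transient


def wait_until_configured(get_state, timeout=1800, poll_interval=10, sleep=time.sleep):
    """Poll get_state() until the infrastructure is configured.

    Returns True on success, raises RuntimeError on a failure state and
    TimeoutError if the infrastructure does not settle within `timeout` seconds.
    """
    waited = 0
    while waited <= timeout:
        state = get_state()
        if state in SUCCESS:
            return True
        if state in FAILURE:
            raise RuntimeError(f"infrastructure ended in state {state!r}")
        if state not in IN_PROGRESS:
            raise ValueError(f"unexpected state {state!r}")
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"not configured after {timeout} s")


if __name__ == "__main__":
    states = iter(["pending", "running", "running", "configured"])
    print(wait_until_configured(lambda: next(states), sleep=lambda _: None))  # True
```

Injecting `sleep` keeps the helper testable without real waiting.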
High-level architecture: The following figure describes the high-level architecture of the IM, including external dependencies. The IM uses the Virtual Machine Image (VMI) Repository & Catalog (VMRC) to search for VMIs. The IM integrates a cloud selector that selects the most suitable VMI for the application description. Then, a configuration manager based on Ansible configures the VMs/containers deployed by the cloud connector and installs the necessary software. The cloud connector provides independence from the underlying cloud IaaS platform.
Dependencies: The IM service requires two additional components to work: IaaS cloud resources and a VMI repository. It supports multiple back-ends and standards (Amazon EC2, Microsoft Azure, Google Cloud, OpenNebula, OpenStack, OCCI, Fogbow, Docker, Kubernetes), which enables interaction with other resources and back-ends. The IM uses VMRC as the VMI repository, although it may be adapted to other repositories in the context of EUBra-BIGSEA.
Interfaces: The IM service supports two APIs:
- The native one, in XML-RPC.
- A REST interface.
It also includes a command-line Python client.
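The following Python sketch only builds (it does not send) the HTTP requests a client might issue against the IM REST interface to deploy a TOSCA document and query its state. The resource paths (`/infrastructures`, `/state`), the `text/yaml` content type and the layout of the authorization header (one `key = value; ...` entry per cloud or service credential, joined by a literal `\n`) reflect our reading of the IM REST documentation and are assumptions to verify against the deployed IM version; the URLs and credential values are placeholders.

```python
import urllib.request
from typing import Dict, List


def auth_header(entries: List[Dict[str, str]]) -> str:
    """Serialize credentials as 'key = value; ...' lines.

    Assumption: the IM REST API expects the entries separated by a literal
    backslash-n sequence inside a single header value.
    """
    lines = ["; ".join(f"{key} = {value}" for key, value in entry.items())
             for entry in entries]
    return "\\n".join(lines)


def create_infrastructure_request(im_url: str, tosca_yaml: str,
                                  entries: List[Dict[str, str]]) -> urllib.request.Request:
    """POST a TOSCA document to create a new infrastructure (request only, not sent)."""
    return urllib.request.Request(
        im_url.rstrip("/") + "/infrastructures",
        data=tosca_yaml.encode("utf-8"),
        headers={"Authorization": auth_header(entries), "Content-Type": "text/yaml"},
        method="POST",
    )


def state_request(infrastructure_url: str,
                  entries: List[Dict[str, str]]) -> urllib.request.Request:
    """GET the state of an infrastructure whose URL was returned on creation."""
    return urllib.request.Request(
        infrastructure_url.rstrip("/") + "/state",
        headers={"Authorization": auth_header(entries), "Accept": "application/json"},
        method="GET",
    )
```

Sending a request is then a matter of `urllib.request.urlopen(req)`; the response to the creation call is expected to carry the URL of the new infrastructure, which is the input to `state_request`.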
Data: The IM will use three types of information:
- Application descriptions following the OASIS TOSCA representation. For detailed information on its syntax and semantics refer to [R1].
- Information about the cloud provider end-points and associated metadata to be used by the IM.
- Information about the deployed infrastructures (specifications, IDs, status, end-points, etc.), stored in a MySQL database or in a file.
Client and server exchange the data through the parameters of the API calls.
Needed Improvement: The IM already supports TOSCA blueprints, although additional features of the TOSCA documents need to be added to fulfil the requirements of WP7, such as:
- QoS specifications.
- Multi-parametric execution.
- Special types for COMPSs, OPHIDIA and Spark jobs.
4.6 Reactive elasticity
This section elaborates on the capabilities of the cloud services to provision more or fewer resources (instances and memory) according to the execution workload. In the context of EUBra-BIGSEA, applications will be provided with the expected amount of resources needed to fulfil the job execution constraints. However, the initial static allocation may be corrected according to the execution progress.
These adjustments could happen at two levels:
- At the level of the Resource Management Framework (CMF), to adjust the infrastructure to the workload. Multiple jobs may compete for the resources and, eventually, the number of computing resources may be insufficient to deal with the submitted workload. This information is known a priori, as each request will state the expected resources required. Moreover, in a fully elastic environment no resources may be preallocated, so in most cases the CMF will have to readjust the infrastructure.
- At the level of the applications, to guarantee that the QoS is met. The monitoring system can detect a potential QoS violation by means of the different metrics. Then, the reactive rules of the monitoring system should request additional resources to be allocated to the application. This will require dealing with both the scheduler and the CMF.
Both vertical and horizontal elasticity are considered:
- Horizontal elasticity will lead to a change in the number of resources in the CMF cluster.
- Vertical elasticity will lead to a change in the memory and CPU allocation limits of an application running on the system.
Therefore, we define three scenarios:
- The scheduler receives a new job. Then, the system should guarantee that enough resources are provisioned for the execution of the job.
- The system periodically inspects the scheduler queues to decide whether resources can be deallocated to reduce costs.
- The monitoring system detects the need to increase the resources of an application. This can be triggered by a generic metric (CPU, network, memory, etc.) or by an application-specific metric. Associated rules will trigger the resource increase. Resource decrease is less important, as it will be either directly triggered by the reactive system or unnecessary (e.g. CPU or memory allocation in containerized applications does not imply preemption).
The requirements of the system are:
- R20 - Automatic scaling up of resources when new job requests arise.
- R21 - Automatic deallocation of resources when idle for a given period.
- R22 - Customization policies for, at least, waiting time, cooling time and simultaneous power-on.
- R23 - Reallocation of resources for the execution kernels, in terms of memory size and number of resources.
- R24 - Transparent management of memory allocation to enable overprovisioning.
- R25 - Automatic reconfiguration of execution kernels.
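The first two scenarios and requirements R20 to R22 can be illustrated with a small decision function. This is a hypothetical Python sketch, not the CLUES or EC3 implementation: it assumes one queued job per node, interprets "waiting time" as how long a job must stay queued before it triggers a power-on, and uses made-up default values for all policy parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    powered_on: bool
    busy: bool = False
    idle_since: float = 0.0   # seconds; when the node last became idle
    last_change: float = 0.0  # seconds; when the node was last powered on/off


@dataclass
class Policy:
    waiting_time: float = 60.0     # a job must be queued this long to trigger power-on
    cooling_time: float = 120.0    # minimum time between two power actions on a node
    idle_time: float = 600.0       # idle period after which a node is powered off (R21)
    max_simultaneous_on: int = 2   # cap on nodes powered on in a single decision (R22)


def decide(nodes: List[Node], queued_since: List[float], now: float,
           policy: Policy) -> Tuple[List[str], List[str]]:
    """Return (names to power on, names to power off), assuming one job per node."""
    free = [n for n in nodes if n.powered_on and not n.busy]
    waiting = sum(1 for t in queued_since if now - t >= policy.waiting_time)
    budget = min(max(0, waiting - len(free)), policy.max_simultaneous_on)
    candidates = [n.name for n in nodes
                  if not n.powered_on and now - n.last_change >= policy.cooling_time]
    to_on = candidates[:budget]                      # R20: scale up for waiting jobs
    to_off: List[str] = []
    if not queued_since:                             # R21: only shrink with an empty queue
        to_off = [n.name for n in free
                  if now - n.idle_since >= policy.idle_time
                  and now - n.last_change >= policy.cooling_time]
    return to_on, to_off
```

Keeping the decision a pure function of the node list, the queue and the current time makes it easy to test the policy independently of the actual power-on/off mechanism.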
4.6.1 CLUster Energy SavingS (CLUES)
Identification: CLUster Energy SavingS (CLUES)
Potential Usage: Elasticity. Back-end component to manage the automatic power-on/off of virtual machines.
Type: A service and a client application.
License: GPL v3
Website: www.grycap.upv.es/clues
Purpose: CLUES is an energy management system for High Performance Computing (HPC) clusters and Cloud infrastructures.
Function: The main function of the system is to power off internal cluster nodes when they are not being used and, conversely, to power them on when they are needed. The CLUES system integrates with the cluster management middleware, such as a batch-queuing system or a cloud infrastructure management system, by means of different connectors.
High Level Architecture: CLUES also integrates with the physical infrastructure by means of different plug-ins, so that nodes can be powered on/off using the techniques which best suit each particular infrastructure (e.g. wake-on-LAN, the Intelligent Platform Management Interface (IPMI) or Power Distribution Units (PDU)).
Although some batch-queuing systems provide energy-saving mechanisms, some of the most popular choices, such as Torque/PBS, lack this possibility. As far as cloud infrastructure management middleware is concerned, none of the most usual options for scientific environments provides similar features. The additional advantage of the approach taken by CLUES is that it can be integrated with virtually any resource manager, whether or not the manager provides energy-saving features.
Currently, CLUES has connectors for integration with some of the most popular batch-queuing systems (such as Torque/PBS or Sun Grid Engine) and with OpenNebula and OpenStack, which are two of the best-known cloud infrastructure management systems within the scientific community.
Dependencies: An LRMS queue system, such as TORQUE/PBS, CONDOR, SGE or SLURM.
Interfaces: A command-line tool and a service. It seamlessly integrates with the LRMS, so it does not require any additional action from the user.
Needed Improvement: Support of Mesos queues.
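The CLUES plug-in code is not shown in this document; the following Python sketch only illustrates the generic wake-on-LAN technique that a power-on plug-in could rely on. A wake-on-LAN "magic packet" is six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast; the function names, the broadcast address and port 9 are our choices, not CLUES settings.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet: 6 x 0xFF followed by the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network segment."""
    packet = magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The target node must have wake-on-LAN enabled in its firmware and network card, which is why CLUES keeps the power technique behind interchangeable plug-ins (IPMI or PDUs being the alternatives).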
4.6.2 Elastic Compute Cluster in the Cloud (EC3)
Identification: Elastic Compute Cluster in the Cloud (EC3)
Potential Usage: Elasticity. Component to monitor the Mesos queue to provide elasticity.
Type: A client-side tool for the Infrastructure Manager (IM).
License: Apache 2.0
Website: www.grycap.upv.es/ec3
Purpose: EC3 enables the deployment of virtual elastic hybrid clusters across Cloud infrastructures.
Function: It consists of a set of recipes and a command-line interface (CLI) used as a client for the IM in order to deploy a customized front-end node of a virtual cluster that features: i) an instance of an IM to provision additional computing resources (working nodes); ii) CLUES, implementing the elasticity rules considering the state of the Local Resource Management System (LRMS); and iii) the specific configuration of the virtual cluster required for the execution of the applications that will be run on the cluster. EC3 is also offered as a free online service to deploy on-demand elastic virtual clusters on Amazon Web Services, OpenNebula and OpenStack.
High Level Architecture: The following figure summarizes the main architecture of EC3.
In the figure, the CLI of EC3 is used to contact the IM in order to deploy and configure the front-end node of a virtual cluster on a Cloud infrastructure. The front-end node is configured with another instance of the IM together with CLUES, which is in charge of implementing the elasticity rules by intercepting the jobs submitted to the LRMS (e.g. SLURM, PBS/Torque, etc.) and deciding when additional worker nodes should be deployed, depending on the amount and characteristics of the jobs queued up at the LRMS. Optional components in the architecture of EC3 that could be of interest for EUBra-BIGSEA are the checkpoint manager (CKPTManager) and BLCR (Berkeley Lab Checkpoint/Restart), which enable an efficient and cost-effective usage of spot instances, available on Amazon Web Services, by checkpointing the state of the jobs and automatically restarting them on a pool of worker nodes deployed on spot instances, which can be terminated at any time.
Users connect via SSH to the front-end node and submit jobs as usual. The virtual cluster will deploy additional worker nodes as required, and integrate them into the LRMS without user intervention, in order to cope with an increased workload of jobs. Worker nodes will be terminated when they are no longer required.
Dependencies: EC3 depends on the IM to perform resource provisioning on multiple Cloud back-ends (Amazon EC2, Microsoft Azure, Google Cloud, OpenNebula, OpenStack, OCCI, Fogbow, etc.). It also depends on CLUES to manage the elasticity of the virtual cluster.
Interfaces: EC3 provides two different interfaces:
- A command-line interface that acts as a client for the IM service.
- A web interface that enables anyone with a web browser to deploy a virtual elastic cluster on OpenNebula and OpenStack on-premises Cloud Management Platforms and on Amazon Web Services.
Data: EC3 relies on templates/recipes written in the IM's native language, called RADL (Resource & Application Description Language), which are currently available on GitHub. It produces as output the IP address of the front-end node, which can be accessed via SSH.
Needed Improvement: Advance in dealing with allocation requirements from the proactive policies. Support of Mesos beyond the use of Chronos and Marathon.
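CLUES decides on new worker nodes depending on the amount and characteristics of the jobs queued up at the LRMS. One simple way to turn queued job requests into a node count is first-fit decreasing bin packing, sketched below in Python. This is our illustration of the idea, not the policy implemented by CLUES or EC3; it assumes identical worker nodes, jobs described only by CPUs and memory, and, for simplicity, that the running nodes are entirely available to the queued jobs.

```python
from typing import List, Tuple

Job = Tuple[float, int]  # (CPUs, memory in MB) requested by a queued job


def nodes_needed(jobs: List[Job], node_cpus: float, node_mem_mb: int) -> int:
    """Number of identical worker nodes needed to host all jobs (first-fit decreasing)."""
    free: List[Tuple[float, int]] = []  # remaining (CPUs, MB) on each node opened so far
    for cpus, mem in sorted(jobs, reverse=True):  # largest CPU request first, then memory
        if cpus > node_cpus or mem > node_mem_mb:
            raise ValueError(f"job ({cpus} CPUs, {mem} MB) does not fit on a single node")
        for i, (free_cpus, free_mem) in enumerate(free):
            if cpus <= free_cpus and mem <= free_mem:
                free[i] = (free_cpus - cpus, free_mem - mem)
                break
        else:  # no open node had room: open a new one
            free.append((node_cpus - cpus, node_mem_mb - mem))
    return len(free)


def extra_nodes(jobs: List[Job], running_nodes: int,
                node_cpus: float, node_mem_mb: int) -> int:
    """Worker nodes to add on top of the running ones (never negative)."""
    return max(0, nodes_needed(jobs, node_cpus, node_mem_mb) - running_nodes)
```

First-fit decreasing is not optimal, but it is fast and never needs more than roughly 11/9 of the optimal number of bins, which is usually adequate for an elasticity heuristic.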
4.6.3 Cloud Virtual Machine Automatic Memory Procurement (CloudVAMP)
Identification: Cloud Virtual Machine Automatic Memory Procurement (CloudVAMP)
Potential Usage: Elasticity. Back-end component to overprovision memory to the Mesos agents that will run Frameworks.
Type: Agent
License: Apache 2.0
Website: http://www.grycap.upv.es/cloudvamp
Purpose: It enables memory oversubscription for the Virtual Machines running on a CMF and implements memory ballooning and automatic live migration of VMs.
Function: CloudVAMP implements three main functions:
- Dynamic Memory Resize for VMs. Cloud users tend to overestimate the memory requirements of their applications/services, and templates typically represent upper bounds for application requirements, thus wasting memory allocated to the VM. CloudVAMP monitors actual memory usage and dynamically resizes the memory allocated to VMs, relying on the memory ballooning support provided by the KVM hypervisor.
- Enable Oversubscription. The memory reclaimed from the VMs is exposed as free memory to the Cloud Management Platform (CMP). The CMP's scheduler can then decide to allocate additional VMs on the physical hosts.
- Prevent Memory Overload by Live Migration. Oversubscription may result in memory overload of the physical host if no preventive measures are considered. For that, CloudVAMP prevents memory overload via the live migration of VMs across the physical nodes, as supported by both KVM and OpenNebula. No downtime is introduced for VMs and memory overload is prevented, thus maintaining the Level of Service.
High Level Architecture: The architecture of CloudVAMP consists of three components:
- The Cloud Vertical Elasticity Manager (CVEM). An agent that analyzes the amount of memory actually needed by the VMs and dynamically updates the memory allocated to each of them, according to a set of customizable rules.
- The Memory Reporter (MR). An agent that runs in the VMs and reports to a monitoring system the free and used memory and the usage of the swap space by the applications in the VM. This information must be available to the CVEM.
- The Memory Oversubscription Granter (MOG). A system that informs the CMP about the amount of memory that can be oversubscribed from the hosts, to be taken into account by the scheduler of the CMP.
The previous figure depicts the architecture of the system based on an OpenNebula (ONE) implementation. OpenNebula requires a cluster-based installation in which the main services are installed in the front-end node, whereas the VMs are deployed on the internal working nodes, where the KVM hypervisor has to be installed.
Dependencies: OpenNebula and KVM.
Interfaces: Internal services. There is no need to interact with the system once started.
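To make the interplay between the CVEM and the MOG concrete, the following Python sketch applies a hypothetical resizing rule (keep a fixed percentage of headroom above the memory reported as used, bounded by a minimum and by the template maximum) and computes how much memory such a rule would free for oversubscription. The rule, its parameters and the names are our assumptions; CloudVAMP's actual rules are customizable and are not specified in this document.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VMMemory:
    used_mb: int  # memory in use, as a Memory Reporter-like agent would report it
    max_mb: int   # upper bound defined by the VM template


def new_allocation_mb(vm: VMMemory, headroom_percent: int = 20, min_mb: int = 512) -> int:
    """Hypothetical CVEM rule: used memory plus headroom, bounded by min_mb and max_mb."""
    target = vm.used_mb * (100 + headroom_percent) // 100  # integer arithmetic, no rounding noise
    return min(vm.max_mb, max(min_mb, target))


def oversubscribable_mb(vms: Iterable[VMMemory], headroom_percent: int = 20,
                        min_mb: int = 512) -> int:
    """Memory a MOG-like component could report to the CMP scheduler as free."""
    return sum(vm.max_mb - new_allocation_mb(vm, headroom_percent, min_mb) for vm in vms)
```

The larger the headroom, the less memory is exposed for oversubscription and the less often live migration is needed to avoid host overload; this is the trade-off the customizable rules are meant to tune.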
4.7 Technology Analysis
The above sections have analysed a set of existing technologies and some improvements on top of them. Some of the components complement each other, whereas others compete. This section analyses the pros and cons of the above components for building up the cloud execution services of the QoS IaaS.
4.7.1 Requirements Related to Resource Management Frameworks
R# | Description | Docker Swarm | Mesos | YARN | Myriad
R01 | The RMF must enable a scheduler to allocate resources to a specific kernel. | Only Docker-type resources | Any kind | Hadoop-based | Any via Mesos
R02 | The RMF must support different types of workloads, from containers to Spark jobs. | Docker | Any | Spark | Any via Mesos and YARN
R03 | An application topology may involve several containers, which must be deployed in a coordinated way. | Docker-compose | Transparent | Transparent | Transparent
R04 | The status of the resources should be accessible through an API. | REST | REST | REST | REST
R05 | RMFs should expose an API to change the number of resources allocated. | Yes | REST | REST | REST
R06 | The system should monitor the usage of resources and the application health. | REST | Auto/via Satellite | Application status REST | REST
Table 2: Requirements Related to Resource Management Frameworks (section 4.3)
4.7.2 Requirements Related to Schedulers
R# | Description | Marathon | Chronos | Spark-submit/shell
R10 | Capability of executing container-based jobs. | Yes | Yes | Yes
R11 | Integration with the selected CMF. | Mesos | Mesos | Mesos/YARN
R12 | Capability of executing composed jobs involving multiple concurrent processes. | No | Yes | Inner parallelism (e.g. map/reduce)
R13 | Support of Spark jobs. | No | No | Yes
R14 | Support of deadline-QoS periodic jobs. | Yes | Yes | No
R15 | High availability for long-running jobs. | Yes | Partial | Retry
R16 | Support for interactive jobs. | No | No | Yes
Table 3: Requirements Related to Schedulers (section 4.4)
4.7.3 Requirements Related to Image Registry
R# | Description | Docker Hub
R07 | A central list of container images should be available. | Yes
R08 | Updates should be automated. | Automated build
R09 | A user should be able to modify his or her own application. | Automated build of Docker Hub linked to GitHub
Table 4: Requirements Related to Image Registry (section 4.3)
4.7.4 Requirements Related to Orchestration and Deployment
R# | Description | Infrastructure Manager
R17 | Deployment of TOSCA blueprints. | Yes, through TOSCA Parser
R18 | Update of TOSCA blueprints to reconfigure the system. | Yes
R19 | Support of multiple platforms. | ONE, OpenStack, Docker, Kubernetes, VMware, AWS, Azure, Google Cloud
Table 5: Requirements Related to Orchestration and Deployment (section 4.5)
4.7.5 Requirements Related to Reactive Elasticity
R# | Description | CLUES | EC3 | CloudVAMP
R20 | Automatic scaling up of resources when new job requests arise. | Yes | * | *
R21 | Automatic deallocation of resources when idle for a given period. | Yes | Yes | *
R22 | Customization policies for, at least, waiting time, cooling time and simultaneous power-on. | Yes | * | *
R23 | Reallocation of resources for the tasks, in terms of memory size and number of resources. | * | Only number of resources | *
R24 | Transparent management of memory allocation to enable overprovisioning. | * | * | Yes
R25 | Automatic reconfiguration of execution kernels. | * | Yes | *
Table 6: Requirements Related to Reactive Elasticity (section 4.6)
* Not applicable due to the functionality of the component.
4.8 Conclusion
From the previous analysis of the technology, we derive a set of implementation decisions for the cloud execution services of the QoS IaaS:
- Deployment.
  - IM to deal with TOSCA topology documents from EUBra-BIGSEA.
  - Write TOSCA documents for environments dealing with Spark, Mesos, OPHIDIA and COMPSs.
  - Extend TOSCA types to reflect QoS information.
- Execution.
  - Develop a top-level endpoint as a REST service that will interact with the different scheduler mechanisms, depending on the type of job, and provide the QoS capabilities.
  - Deadline-based batch jobs, implemented as container-based jobs executed through Chronos. The Chronos Job Management REST API [https://mesos.github.io/chronos/docs/api.html] enables creating, updating, listing, refreshing and deleting Chronos jobs and groups of jobs. Chronos will be extended with the possibility of scheduling the job start time according to the deadline and the estimation of the execution time for a specific request.
  - Long-running jobs that deal with continuous data retrieval. These will be implemented through Marathon jobs. The Marathon REST API [https://mesosphere.github.io/marathon/docs/rest-api.html] enables creating, updating, listing and deleting Marathon jobs. This will be used as is for long-running jobs.
  - Interactive jobs. Mesos will allocate the resources required to submit the application DAG. The Mesos scheduler API [http://mesos.apache.org/documentation/latest/scheduler-http-api/] could be used to allocate resources and the Mesos executor API [http://mesos.apache.org/documentation/latest/executor-http-api/] to access the resources.
  - In any case, batch jobs can be coded either in Spark or in COMPSs. Submission of Spark-based jobs to YARN will be done through Myriad. Mesos will allocate the resources required to submit the application DAG. Spark natively supports Mesos, and COMPSs could evolve from Docker Swarm to Mesos clusters.
- Elasticity.
  - Combine EC3 and Mesos to provide automatically scalable job submission through Chronos, Marathon and Spark. EC3 will add more nodes to a Mesos cluster if resources become scarce.
  - Instrument Marathon, Chronos and Spark to reallocate more resources (memory and nodes) if required (according to the execution estimation).
  - Directly modify the CPU and memory allocation of an active task by updating its configuration in Chronos, Marathon and Spark, through a direct interaction triggered by the notifications of the monitoring system and acting on the Mesos CMF.
  - Scale up the number of instances in Marathon, through a direct interaction triggered by the notifications of the monitoring system.
  - Use CloudVAMP under the hood to deal with the management of VMs at the level of the Cloud Management Framework. This will be transparent to the application developers, but it will expose more memory than is physically available.
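The planned Chronos extension, deriving the job start time from the deadline and the execution-time estimate, can be sketched as follows. The Python code computes the latest safe start time and wraps it in a Chronos-style job definition whose `schedule` uses the ISO 8601 repeating-interval notation (`R<n>/<start>/<period>`) that Chronos accepts. The safety margin, the one-day period used for a single run and the helper names are our assumptions, and the field set shown (`name`, `command`, `schedule`, `cpus`, `mem`, `disk`, `container`) is a minimal subset to be checked against the Chronos API documentation cited above.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def iso_duration(period: timedelta) -> str:
    """ISO 8601 duration in seconds, e.g. PT86400S for one day."""
    seconds = int(period.total_seconds())
    if seconds <= 0:
        raise ValueError("period must be positive")
    return f"PT{seconds}S"


def latest_start(deadline: datetime, expected: timedelta,
                 margin: timedelta = timedelta(minutes=5)) -> datetime:
    """Latest UTC start time that still meets the deadline, given an estimate and a margin."""
    if deadline.tzinfo is None:
        raise ValueError("deadline must be timezone-aware")
    return (deadline - expected - margin).astimezone(timezone.utc)


def chronos_job(name: str, command: str, image: str, cpus: float, mem_mb: int,
                disk_mb: int, deadline: datetime, expected: timedelta,
                margin: timedelta = timedelta(minutes=5)) -> Dict[str, Any]:
    """Single-run Chronos-style job that starts as late as the deadline allows."""
    start = latest_start(deadline, expected, margin)
    return {
        "name": name,
        "command": command,
        "schedule": f"R1/{start.strftime('%Y-%m-%dT%H:%M:%SZ')}/{iso_duration(timedelta(days=1))}",
        "cpus": cpus,
        "mem": mem_mb,
        "disk": disk_mb,
        "container": {"type": "DOCKER", "image": image},
    }
```

For a deadline of 17:22 at UTC+2 and an expected duration of 10 minutes, the default 5-minute margin yields a start at 15:07 UTC.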
Figure 6 shows the detailed interaction among the components of this part.
Figure 6: Detailed interaction among components at the QoS cloud services layer.
4.8.1 Proposed API
In order to interact with the single submission end-point, we will define a macro-API and a common job description that could be mapped to Marathon, Chronos, Spark or similar schedulers. The API will be a REST API that will interact with the two main entities: the Resource Management Framework and the Scheduler. The rest of the services will use this API to submit jobs, retrieve their status, kill jobs, get the status of resources and reallocate them.
Figure 7 shows a JSON structure proposal to define jobs. The same structure could be used to retrieve the information, extended with additional fields such as the actual memory usage, the actual CPU usage, the application id and the values of the parameters left for the scheduler to choose, such as port allocation. It can also be used to update configurations when reasonable.
{
  "type": "CMD",
  "name": "my_job_name",
  "deadline": "2016-06-10T17:22:00+02:00",
  "periodic": "R24P60M",
  "expectedduration": "10M",
  "container": {
    "type": "DOCKER",
    "image": "eubrabigsea/ubuntu",
    "forcePullImage": true,
    "volumes": [
      {
        "containerPath": "/var/log/",
        "hostPath": "/logs/",
        "mode": "RW"
      }
    ],
    "portMappings": [
      {
        "containerPort": 8000,
        "hostPort": 0,
        "protocol": "tcp"
      }
    ]
  },
  "environmentVariables": [
    { "name": "value" }
  ],
  "cpu": "1.5",
  "mem": "512M",
  "disk": "1G",
  "command": "python -m SimpleHTTPServer 8000"
}
Figure 7: JSON job description
The meaning of the fields of the JSON structure is the following:
- "type". It defines the type of job, in order to select the scheduler. Possible values are "spark", "COMPSs" and "CMD". Spark jobs will be executed through spark-submit, COMPSs jobs will add the COMPSs launcher to the "command" line, and "CMD" jobs will simply run the "command". Mandatory.
- "name". A name given to identify the job. It must be unique. Mandatory.
- "deadline". A date and time expression by which the job should have finished. The expression is YEAR-MONTH-DAYTHOUR:MINUTE:SECOND followed by the time zone (Z is UTC), as in ISO 8601. Optional.
- "periodic". It indicates whether the job has to be repeated or not. Rn indicates 'n' repetitions and 'R' alone means repeat forever. PXX indicates the periodicity (XXM will be minutes, XXH will be hours). Optional (if it is not present, the job will run only once).
- "expectedduration". It describes the expected duration of a job. Optional.
- "container". It describes the configuration features for a container-based job. Mandatory for non-Spark jobs. The fields are: "type", which describes the driver; "image", which indicates the container image; "volumes", which indicates the volume mapping of the container; and "portMappings", which lists the port mappings between the container and the host. No values indicate either no mappings or the automatic selection of ports.
- "environmentVariables". It defines pairs of environment variable name and value. Optional.
- "cpu". It defines the allocation of CPUs (possibly a float number) estimated for the job. The framework should allocate enough resources or queue the job. "mem" and "disk" have a similar meaning for memory and disk, with units in KBytes (K), MBytes (M) and GBytes (G). This information should be provided by the proactive policies system. Mandatory.
- "command". It defines the command to be executed. Optional for CMD jobs (it can be embedded in the container); incompatible with Spark jobs.
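A submission end-point would have to check these rules before forwarding a job to a scheduler. The following Python sketch validates a job description held as a dictionary and parses the "mem"/"disk" sizes and the "periodic" expression. It is one possible reading of the field descriptions above, not an agreed implementation: we treat "cpu", "mem" and "disk" as mandatory, compare job types case-insensitively, and name the helpers freely.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

JOB_TYPES = {"spark", "compss", "cmd"}                  # compared case-insensitively
SIZE_UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}   # multipliers to KBytes
PERIODIC_RE = re.compile(r"^R(\d*)P(\d+)([MH])$")


def parse_size_kb(value: str) -> int:
    """Convert sizes such as '512M' or '1G' to KBytes."""
    if not value or value[-1].upper() not in SIZE_UNITS_KB:
        raise ValueError(f"size needs a K, M or G suffix: {value!r}")
    return int(float(value[:-1]) * SIZE_UNITS_KB[value[-1].upper()])


def parse_periodic(value: str) -> Tuple[Optional[int], int]:
    """'R24P60M' -> (24, 3600); 'RP2H' -> (None, 7200), where None means forever."""
    match = PERIODIC_RE.match(value)
    if match is None:
        raise ValueError(f"malformed periodic expression: {value!r}")
    repetitions = int(match.group(1)) if match.group(1) else None
    unit_seconds = 60 if match.group(3) == "M" else 3600
    return repetitions, int(match.group(2)) * unit_seconds


def validate_job(job: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable errors (empty if the job looks valid)."""
    errors = [f"missing mandatory field '{f}'"
              for f in ("type", "name", "cpu", "mem", "disk") if f not in job]
    job_type = str(job.get("type", "")).lower()
    if "type" in job and job_type not in JOB_TYPES:
        errors.append(f"unknown job type {job['type']!r}")
    if job_type != "spark" and "container" not in job:
        errors.append("'container' is mandatory for non-Spark jobs")
    if job_type == "spark" and "command" in job:
        errors.append("'command' is incompatible with Spark jobs")
    try:
        if "cpu" in job and float(job["cpu"]) <= 0:
            errors.append("'cpu' must be positive")
        for field in ("mem", "disk"):
            if field in job:
                parse_size_kb(job[field])
        if "periodic" in job:
            parse_periodic(job["periodic"])
    except ValueError as exc:
        errors.append(str(exc))
    return errors
```

Returning all errors at once, rather than raising on the first one, lets the REST end-point report every problem of a rejected submission in a single response.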
The proposed API will follow this structure:
Resource path | Description
/v1/scheduler | POST method: submits a job request in JSON.
/v1/scheduler/jobs | GET method: lists all the jobs in the scheduler (a JSON with all the jobs). It can accept a JSON with a deadline that defines the date limit.
/v1/scheduler/job/name | GET method: gets all the information about a specific job; DELETE method: kills the specific job; POST method: reallocates resources.
/v1/resource/slaves | GET method: lists all the resources registered in the master or cluster of masters.
/v1/resource/slave | GET method: provides the status information of a specific slave.
/v1/resource/slave/resource/up | POST method: boots up a new resource.
/v1/resource/slave/resource/down | POST method: powers down a specific resource.
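A client of the proposed end-point could wrap the resource paths above as follows. This Python sketch builds `urllib.request.Request` objects for the scheduler operations and the slave listing; the base URL is a placeholder, and we assume that `name` in `/v1/scheduler/job/name` stands for the job name and that request and response bodies are JSON. Only the `send` helper performs network I/O.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


class SchedulerClient:
    """Request builders for the proposed /v1 scheduler and resource API (sketch)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def _request(self, method: str, path: str,
                 body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        data = None
        headers = {"Accept": "application/json"}
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    @staticmethod
    def _job_path(name: str) -> str:
        return "/v1/scheduler/job/" + urllib.parse.quote(name, safe="")

    def submit(self, job: Dict[str, Any]) -> urllib.request.Request:
        return self._request("POST", "/v1/scheduler", job)

    def list_jobs(self, deadline: Optional[str] = None) -> urllib.request.Request:
        body = None if deadline is None else {"deadline": deadline}
        return self._request("GET", "/v1/scheduler/jobs", body)

    def get_job(self, name: str) -> urllib.request.Request:
        return self._request("GET", self._job_path(name))

    def kill_job(self, name: str) -> urllib.request.Request:
        return self._request("DELETE", self._job_path(name))

    def reallocate(self, name: str, resources: Dict[str, Any]) -> urllib.request.Request:
        return self._request("POST", self._job_path(name), resources)

    def list_slaves(self) -> urllib.request.Request:
        return self._request("GET", "/v1/resource/slaves")

    @staticmethod
    def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
        """Perform the HTTP call and decode the JSON answer (None for an empty body)."""
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = response.read()
        return json.loads(payload) if payload else None
```

Separating request construction from sending keeps the mapping between operations and resource paths testable without a running end-point.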
5 MONITORING SERVICE
5.1 Cloud computing monitoring
Monitoring in a cloud environment differs from the traditional model, as there is a dynamic workload instead of a static set of assets. The traditional infrastructure expected that new servers were rarely installed and old ones only occasionally decommissioned; the whole infrastructure was meant to be "permanent". Cloud computing, in contrast, is a dynamic and fluid environment. Servers are created and destroyed all the time, either to cope with increased traffic on the services they run or to reduce the amount of resources and save money.
A cloud monitoring system should reflect these dynamic aspects while providing essential information that helps providers to properly plan their infrastructure capacity and resource management; ensure SLAs and overall QoS, offered through and by the cloud; identify bottlenecks, failures or problems; and bill resource usage.
5.2 Requirements for the monitoring service
According to [R09], many monitoring systems (MS) have not addressed the rapidly changing and dynamic infrastructure seen in service clouds. Its authors determined the main requirements for monitoring cloud scenarios:
Scalability - it should be able to handle a large number of agents, probes and data flows, and collect, transfer and analyze such volumes of data without impairing service performance.
Elasticity - in order to handle drastic changes of the monitored environments, the MS should support upsizing and downsizing of monitored resources (e.g. agents, probes, etc.).
Migration - virtual resources may be moved from one physical host to another. All previous data associated with these virtual resources must be independent of the physical host.
Adaptability - it should be minimally invasive and adapt itself to various computational and network loads.
Autonomicity - it is desirable that the monitoring system keeps running with minimal human intervention and reconfiguration.
Federation - it should monitor virtual resources residing in different domains of a federated cloud.
Another list of requirements related to monitoring systems can be found in [11]:
Timeliness - events and measurement data should be available in time for their intended use [10]. A consumer may require updated data from a producer in order to execute an action, but the interdependency of Timeliness with other properties of the monitoring system, such as Elasticity, Autonomicity and Adaptability, implies challenges or trade-offs between opposing requirements.
Comprehensiveness - it should support different levels of abstraction. Such levels include monitoring physical and virtual resources and service applications. At each level, different metrics, kinds of data and probes are available. It is desirable to support isolation of environments, through segmented authentication, authorization and resources, as in a multi-tenant system.
Resilience - faulty (non-critical) components should not disturb the monitoring system.
High availability - it should employ a high-availability strategy, such as data replication/redundancy, workload balancing and sharding.
Accuracy - measures provided by the monitoring system are accurate when they are as close as possible to the real value measured and they satisfy what the application considers accurate (application-dependent).
Extensibility - new metrics and probes may be added to the monitoring system, including those defined at the application level.
Besides these requirements, we identify:
Usability - configuration and usage should be as simple as possible, but users should be able to configure advanced options.
Integration - newly allocated computing resources must contain the monitoring system agents and probes in their installation image. Metrics must be exposed by an API (application programming interface) to be consumed by other components of the EUBra-BIGSEA infrastructure, especially those of WP3.
Security - sensitive data must be protected and require authorization.
Alert triggering - if a pre-configured condition is satisfied in the monitoring system, actions may be executed on the monitored resource or system-wide. Actions include the execution of programs, sending messages to operators, dynamically adjusting measurements, etc.
Single sign-on - authentication in any component of the monitoring system must be done using a common user database.
5.3 Proposed architecture
In order to fulfil those requirements, the monitoring system should perform four main jobs:
Alarm processing - watch and check resources; send alerts via handlers; perform actions when thresholds are exceeded or when services are unavailable.
Measurement collection - self-explanatory: gather a steady stream of numbers and/or signals, store them, show them, watch thresholds, etc.
Log collection, storage and analysis - extract information from log text, monitor log levels, search logs by text.
User interface - event display, measurement and log visualization, configurable interface.
Figure 8: High-level view of the monitoring system concepts
The Monasca Monitoring System [12] identifies a set of architectural components that are in accordance with our view of the architecture. Some components may not be included in the first version of the monitoring system, but they are listed here as a reference:
- Monitoring agent: runs on the monitored resource, providing probe results to the monitoring system, and performs actions coordinated by the monitoring system and triggered by some configured criteria. Measurements include system metrics (e.g. CPU, disk, memory usage, etc.), Nagios plugins, statsd and many checks for services (e.g. DBMS, Web servers, custom applications, etc.).
- Monitoring API: an API, preferably RESTful, focused on getting, setting and querying metrics, providing statistics about metrics, alarm management and notifications. This API will be used by some EUBra-BIGSEA components, especially those dealing with WP3 dynamic and static resource allocation.
- Message queue: a component that receives published metrics from the monitoring API and alarm state messages from the threshold engine. The message queue decouples system components and defines a common communication bus between them.
- Persister services: consume metrics and alarms from the message queue and store them in their own database.
- Metrics and alarms database: a component, in general a DBMS, which primarily stores metrics and alarm history. Some NoSQL solutions provide different paradigms regarding how data is logically organized. For instance, the Cassandra NoSQL database allows modeling data in a column-oriented format, organized in such a way that it is very efficient to retrieve data, especially data represented as time series. For unstructured data, such as logs (syslog, application logs), Elasticsearch is used to index text.
- Transform and aggregation engine: transforms metric names and values, for example through delta or time-based derivative calculations, and creates new metrics that are published to the message queue (optional).
- Anomaly and prediction engine: evaluates predictions and anomalies and generates predicted metrics as well as anomaly likelihoods and anomaly scores. In Monasca, it is in a prototype status; for EUBra-BIGSEA, it is a research track.
- Threshold engine: computes thresholds on metrics and publishes alarms to the message queue when they are exceeded.
- Notification engine: consumes alarm state transition messages from the message queue and sends notifications, such as emails or remote external API invocations, for alarms.
- Analytics engine: consumes alarm state transitions and metrics from the message queue and performs anomaly detection and alarm clustering/correlation (optional).
- Configuration database: a component that stores the configuration and other information of the system.
- Dashboard monitoring interface: an application (preferably web-based) that includes configured dashboards and reports. It allows users to access metric values, alarm status and resource allocation, showing data as charts, tables and any other applicable kind of visualization.
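The decoupling provided by the message queue can be illustrated with an in-process Python sketch in which a `queue.Queue` stands in for the message bus, lists stand in for the persisted metrics and alarms, and a callback stands in for the notification engine. Only alarm state transitions (OK to ALARM and back) are published, mirroring the description of the threshold and notification engines above. The class names and the single "greater than threshold" rule are illustrative assumptions, not Monasca's API.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Metric:
    name: str
    value: float
    dimensions: Dict[str, str]


@dataclass
class AlarmDefinition:
    metric_name: str
    threshold: float  # the alarm fires while value > threshold


class MonitoringPipeline:
    """In-process stand-ins for the message queue, persister, threshold and notification engines."""

    def __init__(self, definitions: List[AlarmDefinition],
                 notify: Callable[[dict], None]) -> None:
        self.bus: "queue.Queue[Tuple[str, object]]" = queue.Queue()  # message queue
        self.definitions = definitions
        self.notify = notify                  # notification engine
        self.metrics_db: List[Metric] = []    # persisted metrics
        self.alarms_db: List[dict] = []       # persisted alarm history
        self.state: Dict[tuple, str] = {}     # current state per (definition, dimensions)

    def publish(self, metric: Metric) -> None:
        """Monitoring API: publish a measurement to the message queue."""
        self.bus.put(("metric", metric))

    def process(self) -> None:
        """Drain the queue, dispatching each message to the consumers interested in it."""
        while not self.bus.empty():
            kind, payload = self.bus.get()
            if kind == "metric":
                self.metrics_db.append(payload)   # persister
                self._evaluate(payload)           # threshold engine
            else:
                self.alarms_db.append(payload)    # persister
                self.notify(payload)              # notification engine

    def _evaluate(self, metric: Metric) -> None:
        for d in self.definitions:
            if d.metric_name != metric.name:
                continue
            key = (d.metric_name, d.threshold, tuple(sorted(metric.dimensions.items())))
            new = "ALARM" if metric.value > d.threshold else "OK"
            old = self.state.get(key, "OK")
            if new != old:                        # publish state transitions only
                self.state[key] = new
                self.bus.put(("alarm", {"metric": metric.name,
                                        "dimensions": dict(metric.dimensions),
                                        "old_state": old, "new_state": new}))
```

In a real deployment each consumer would be a separate service subscribed to the bus, so the persister, threshold engine and notification engine can be scaled or replaced independently.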
Other components are Monasca-specific and may be included in further evolutions of the architecture.
The following image shows how the monitoring system components interact with each other:
Figure 9: Monitoring system architecture and components
5.3.1 Minimum supported infrastructure metrics
The monitored resource may be running on different hardware abstractions: bare metal, virtual machine or operating system container; here, "host" refers to any of these abstractions. Independently of the abstraction, the monitoring system should be able to interact with its agents. The monitoring system will support a minimum set of metrics, together with their variations and derivations:
- CPU - discover whether a host's CPU is being heavily used by the kernel, by the application code, or by other processes on the same host. It can be segmented into system and user time.
- Load average - monitor the load average (over predefined intervals, for example 1, 5 or 15 minutes) to know the average system load over a period of time.
- Memory - know when an application is consuming too much memory, and how much of it is being consumed by the operating system. Virtual memory is also considered.
- File system - monitor file systems, detecting lack of space or of available i-nodes, bytes or any other dimension. It may also detect unavailability or mounting state changes (for instance, read-only mounting after a failure).
- Disk - measure input/output operations per second (IOPS) and throughput, in order to identify bottlenecks with greater precision.
- Network - detect network errors, collisions, overruns and dropped packets, in order to diagnose network interface problems. It may also be used to detect offline resources (ping-like tests).
- Uptime - how long the resource has been running.
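As an illustration of how an agent might sample some of these metrics, the following Python sketch reads the load average, file system space and i-node usage, memory and uptime of the local host using only the standard library. It assumes a Unix host for the load average and `statvfs`, and a Linux host for `/proc/meminfo` and `/proc/uptime` (those metrics are simply omitted elsewhere). The metric names are hypothetical; a real deployment would rely on the agent of the chosen monitoring tool rather than on this code.

```python
import os
import shutil
import time
from typing import Dict, Optional


def read_uptime(path: str = "/proc/uptime") -> Optional[float]:
    """Seconds since boot on Linux; None if the file is unavailable."""
    try:
        with open(path) as f:
            return float(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    """MemTotal and MemAvailable in KBytes on Linux; empty dict elsewhere."""
    values: Dict[str, int] = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                if key in ("MemTotal", "MemAvailable"):
                    values[key] = int(rest.split()[0])
    except (OSError, ValueError, IndexError):
        return {}
    return values


def sample_host_metrics(fs_path: str = "/") -> Dict[str, float]:
    """One sample of basic host metrics, keyed by hypothetical metric names."""
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(fs_path)
    fs = os.statvfs(fs_path)                # i-node counters
    metrics: Dict[str, float] = {
        "timestamp": time.time(),
        "load.avg_1_min": load1,
        "load.avg_5_min": load5,
        "load.avg_15_min": load15,
        "fs.total_bytes": disk.total,
        "fs.free_bytes": disk.free,
        "fs.used_percent": 100.0 * disk.used / disk.total if disk.total else 0.0,
        "fs.inodes_total": fs.f_files,
        "fs.inodes_free": fs.f_ffree,
    }
    mem = read_meminfo()
    if "MemTotal" in mem and "MemAvailable" in mem:
        metrics["mem.total_kb"] = mem["MemTotal"]
        metrics["mem.available_kb"] = mem["MemAvailable"]
    uptime = read_uptime()
    if uptime is not None:
        metrics["uptime_seconds"] = uptime
    return metrics
```

Per-interface network error counters and IOPS would require deltas between two samples, which is the kind of derivation the transform and aggregation engine is meant to perform.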
5.3.2 Supported application metrics and logging
Each application should be able to define its own metrics, related to security, performance, capacity, uptime, throughput, service level agreements (SLAs), user metrics or any other internal value. By using the API proposed for the monitoring system, applications can create, remove, update and query metrics, measurements and alerts. Metrics and time series used exclusively by the application itself, or unrelated to WP3, must not be stored in the monitoring system infrastructure.
Some common and generic metrics are supported by the monitoring system at the application level:
- Ping check. Monitors server communication and response times.
- HTTP check. Gets specific content to verify that users can see and interact with applications.
- HTTPS check. Monitors the validity of the Secure Sockets Layer (SSL) certificate.
- SSH check. Confirms that the target server's secure shell (SSH) service is running and accepting requests, and measures the response time.
- TCP port check. Confirms that the target server's TCP port is listening.
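Two of these checks can be sketched with the Python standard library: a TCP port check that reports the connection time, and an SSH check that additionally verifies that the server sends an SSH identification string (which, per RFC 4253, starts with `SSH-`). The function names, the timeout default and returning `None` for a failed check are our choices, not those of a specific monitoring product.

```python
import socket
import time
from typing import Optional


def tcp_port_check(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Seconds taken to open a TCP connection, or None if the port is not accepting."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable or timed out
        return None


def ssh_check(host: str, port: int = 22, timeout: float = 3.0) -> Optional[float]:
    """Like tcp_port_check, but also requires the server's 'SSH-' identification string."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(255)
    except OSError:
        return None
    if not banner.startswith(b"SSH-"):
        return None
    return time.monotonic() - start
```

A listening port that never sends a banner therefore passes the TCP check but fails the SSH check, which distinguishes "port open" from "SSH service answering".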
The Resource Management Framework will also provide information about the status of the resources. For Mesos, Satellite from Two Sigma provides a complete set of metrics to monitor the status of the slaves (see footnote 2).
The monitoring system will support basic management of application logs. Applications can submit logs to a central monitoring system component, responsible for processing, extracting and indexing log records. Each log record must have a set of fields that includes the date/time, severity, name of the application and the log message.
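A minimal way for a Python application to emit records with the four mandatory fields is a custom `logging.Formatter` that serializes each record as JSON. The field names (`timestamp`, `severity`, `application`, `message`) and the use of JSON are assumptions; the transport to the central log component is left out.

```python
import json
import logging
from datetime import datetime, timezone


class CentralLogFormatter(logging.Formatter):
    """Serialize each log record as a JSON object with the mandatory fields."""

    def __init__(self, application: str) -> None:
        super().__init__()
        self.application = application

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,
            "application": self.application,
            "message": record.getMessage(),  # applies %-style arguments
        }
        return json.dumps(entry)


# Usage: attach the formatter to whatever handler ships logs to the central component.
handler = logging.StreamHandler()
handler.setFormatter(CentralLogFormatter("my_application"))
```

Emitting one self-describing JSON object per line lets the central component index records without application-specific parsing rules.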
5.4 Technology evaluation
This section analyses solutions for the monitoring of cloud infrastructures. We first detail tools for the collection, storage and retrieval of monitoring metrics and then discuss approaches for managing logs.
5.4.1 Metric collection, storage and retrieval
5.4.1.1 Zabbix
Identification: Zabbix
Potential Usage: Monitoring the resource consumption of VMs
Type: Monitoring System
License: GNU General Public License (GPL) version 2
Website: http://www.zabbix.com
Purpose: Monitoring system
² https://github.com/twosigma/satellite/blob/master/satellite-slave/resources/metrics_snapshot.json
Function: Zabbix is an agent-based monitoring system that can collect, store and manage a wide range of system metrics related to hosts, virtual machines, applications and services. Zabbix consumes more resources than other environments, but provides a more complete UI and a simpler way to code probes and monitoring KPIs.
High Level Architecture
Dependencies: MySQL, PHP, Apache HTTP Server. Agents must be installed on the monitored resources in order to collect metrics and triggers.
Interfaces: Well-documented API with operations that allow controlling all aspects of the software. Rich user interface with support for custom graphs, triggers, screens, actions, etc.
Data: Configuration, metrics and trigger history are stored in a MySQL database.
Needed Improvement: Zabbix has had an auto-registration feature since v2.0, so Zabbix agents can register themselves without manual intervention. However, auto-registration is not well documented. Resource removal is not automatic and requires interacting with the API.
5.4.2 InfluxDB
InfluxData provides the backend, alerting and visualization components for building custom monitoring solutions for servers, sensors, storage appliances, network infrastructure, apps, logs and more.
Identification: InfluxDB
Potential Usage: Database for data from monitoring metrics.
Type: Time-series database
License: MIT License
Website: https://influxdata.com
Purpose: Store measurements from various probes
Function: InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics. It also includes plugins for other data ingestion protocols such as Graphite, collectd, and OpenTSDB.
High Level Architecture: InfluxDB is a standalone database server; clients write data to it and read data from it.
Many high availability architectures are feasible. The example above illustrates a read/write proxy (for example nginx) in front of both InfluxDB instances.
Dependencies: none
Interfaces: influx command line interface with an expressive SQL-like query language tailored to easily query aggregated data. The CLI communicates with InfluxDB directly by making requests to the InfluxDB HTTP API over port 8086 by default. Built-in web admin interface and an HTTP API.
Needed Improvement: Customization
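The HTTP API mentioned above accepts points in InfluxDB's line protocol (`measurement,tag=value field=value timestamp`). The sketch below assumes the InfluxDB 1.x `/write` and `/query` endpoints on the default port 8086 and only builds the requests without sending them; the database name `monitoring` and the `cpu_load` measurement are hypothetical.

```python
import urllib.parse
import urllib.request


def _escape_key(text):
    # Commas, spaces and equals signs in tag keys, tag values and field keys are backslash-escaped.
    return text.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def _escape_measurement(text):
    # Measurement names only need commas and spaces escaped.
    return text.replace(",", r"\,").replace(" ", r"\ ")


def _format_field(value):
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'  # string fields are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Serialise one point; tags and fields are sorted for a stable output."""
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in sorted(fields.items()))
    line = _escape_measurement(measurement) + tag_part + " " + field_part
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision by default
    return line


def build_write_request(base_url, database, lines):
    """POST /write?db=<database> with newline-separated points (not sent here)."""
    query = urllib.parse.urlencode({"db": database})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=body, method="POST")


def build_query_url(base_url, database, influxql):
    """GET /query?db=<database>&q=<InfluxQL statement>."""
    return f"{base_url}/query?" + urllib.parse.urlencode({"db": database, "q": influxql})
```

Batching several lines per `/write` request, as `build_write_request` allows, reduces HTTP overhead when many probes report at once.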
5.4.3 Sensu
Sensu is an open-source monitoring framework that allows organizations to compose comprehensive monitoring and telemetry solutions that meet their unique requirements. It provides a platform that focuses on what to monitor and measure, rather than how.
Identification: Sensu Core
Potential Usage: Monitoring and telemetry system
Type: Infrastructure and application monitoring and telemetry solution
License: MIT License
Website: http://sensuapp.org/
Purpose: Control and integrate monitoring resources
Function: Integrates monitoring of infrastructure, service and application health, and business KPIs. It supports deployment via Puppet, Chef and Ansible. It is an extensible framework (including a message bus, event processor, monitoring agent and documented APIs) with many available plugins, and it integrates with existing tools and services (e.g. e-mail, PagerDuty alerts, Slack, HipChat, IRC notifications, etc.). Service checks provide status and telemetry data, and event handlers process the results. Hundreds of plugins are available for monitoring the tools and services already in use. Plugins have a very simple specification and can be written in any programming language.
High Level Architecture: Clients execute service checks (plugins/scripts) and feed the broker (RabbitMQ) with data, which is consumed by the server/API and processed by handlers, mutators, etc. These perform specific actions, such as sending data to a time-series database or sending an alert e-mail. Events and check status are stored in and queried from Redis.
Dependencies: Redis, RabbitMQ, Ruby (embedded)
Interfaces: RESTful HTTP API; simple UX dashboard
Needed Improvement: The dashboard interface is quite simple and lacks features such as authentication, roles and custom settings.
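To illustrate the plugin specification described under Function, the sketch below implements a disk usage check in Python. It assumes the Sensu Core convention shared with Nagios plugins: the script prints its output to stdout and reports the state through its exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The check name and the 80%/90% thresholds are hypothetical.

```python
import shutil

# Exit statuses understood by Sensu Core (Nagios plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk_usage(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_status, output) using warn/crit thresholds."""
    if not 0.0 <= used_pct <= 100.0:
        return UNKNOWN, f"CheckDisk UNKNOWN: invalid usage value {used_pct}"
    if used_pct >= crit:
        return CRITICAL, f"CheckDisk CRITICAL: {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"CheckDisk WARNING: {used_pct:.1f}% used"
    return OK, f"CheckDisk OK: {used_pct:.1f}% used"


def run_check(path="/", warn=80.0, crit=90.0):
    """Measure the file system holding `path`, print the check output and return the status."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    status, output = evaluate_disk_usage(used_pct, warn, crit)
    print(output)
    return status
```

Installed as an executable plugin, the script would end with `sys.exit(run_check())` so that sensu-client picks up the status from the exit code.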
5.4.4 OpenStack Monasca
Identification: OpenStack Monasca
Potential Usage: Complete monitoring system
Type: Monitoring-as-a-service solution
License: Apache License, Version 2.0
Website: https://wiki.openstack.org/wiki/Monasca
Purpose: Control and integrate monitoring resources
Function: Full solution for monitoring infrastructure, service and application health, and business KPIs. Integrates with OpenStack components for orchestration, access control and scoping.
High Level Architecture
Dependencies: MySQL, Apache Kafka, InfluxDB, Apache Zookeeper, Apache Storm, OpenStack Keystone, OpenStack Horizon Dashboard, Grafana
Interfaces: Documented API and user interface integrated with the OpenStack Horizon Dashboard and Grafana.
Needed Improvement: The current version does not support collecting and forwarding log data; there are plans to implement it in the future. The user interface is at an early stage and may change soon. The API is not stable, but the main operations are implemented.
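As an example of how a probe could push a measurement, the sketch below builds a request for the Monasca metrics API (`POST /v2.0/metrics`), authenticated with a Keystone token in the `X-Auth-Token` header. The API URL, port 8070, the token and the dimension values are assumptions for illustration; the request is built but not sent.

```python
import json
import time
import urllib.request


def build_metric(name, value, dimensions, timestamp_ms=None, value_meta=None):
    """One Monasca metric: name, dimensions, millisecond timestamp and value."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)  # optional free-form key/value metadata
    return metric


def build_post_metrics_request(api_url, token, metrics):
    """POST <api_url>/v2.0/metrics with a Keystone token (request built, not sent)."""
    return urllib.request.Request(
        f"{api_url}/v2.0/metrics",
        data=json.dumps(metrics).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )
```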
5.4.5 Logs collection and analysis
5.4.5.1 rsyslog
Identification: RSYSLOG is the rocket-fast system for log processing.
Potential Usage: Log retrieval and management
Type: System logging
License: GNU General Public License v3
Website: http://www.rsyslog.com/
Purpose: Centralized log management and storage for further processing and analysis
Function:
- Multi-threading
- TCP, SSL, TLS, RELP
- MySQL, PostgreSQL, Oracle and more
- Filtering of any part of a syslog message
- Fully configurable output format
- Content-based filtering
High Level Architecture: Rsyslog can act in various modes. It is possible to write log messages to the local hard drive and/or forward them over the network to a central log server. On the central log server, messages can be filtered by content and written to different local hard drives or databases.
Log messages are sent via TCP or UDP to the central server, which processes them and can, for example, split them into branches.
Interfaces: Syslog protocol over TCP or UDP
Needed Improvement: Customization of log filtering
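On the application side, forwarding logs to a central rsyslog server can be sketched with Python's standard `logging.handlers.SysLogHandler`. The sketch assumes UDP on port 514 and a hypothetical record layout carrying the fields required in Section 5.3.2 (date/hour, severity, application name and message).

```python
import logging
import logging.handlers
import socket
import time


def make_formatter(app_name):
    """Format records as '<UTC date/hour> <severity> <application>: <message>'."""
    safe_name = app_name.replace("%", "%%")  # keep '%' literal inside the format string
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)s " + safe_name + ": %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    formatter.converter = time.gmtime  # UTC timestamps, matching the trailing 'Z'
    return formatter


def make_remote_logger(app_name, host="127.0.0.1", port=514):
    """Logger that forwards records over UDP syslog to a central rsyslog server."""
    handler = logging.handlers.SysLogHandler(address=(host, port), socktype=socket.SOCK_DGRAM)
    handler.setFormatter(make_formatter(app_name))
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

UDP keeps the application unaffected if the log server is down, at the cost of possible message loss; `socket.SOCK_STREAM` would switch the handler to TCP.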
5.4.6 ELK Stack
Identification: ElasticSearch, Logstash and Kibana stack
Potential Usage: Log analyser
Type: Real-time data analytics tool
License: Apache 2 Open Source License
Website: https://www.elastic.co
Purpose: Parse data with Logstash directly into ElasticSearch; ElasticSearch then handles the data and Kibana visualizes it.
Function:
ElasticSearch
- Distributed, scalable and highly available
- Real-time search and analytics capabilities
- Document-oriented, with full-text search functionality and powerful query options
- Built on top of Apache Lucene
Logstash
- Receives and processes log data or any other time-based data
- Filter options used to transform input data; plugins for custom data sources
- Centralizes data processing of all types
- Normalizes varying schemas and formats
Kibana
- Presents the data stored by Logstash in ElasticSearch
- Customizable interface with histograms and other panels
- Flexible analytics and visualization platform
- Real-time summary and charting of streaming data
- Instant sharing and embedding of dashboards
High Level Architecture
Interfaces: RESTful API; Web UX; CLI
Needed Improvement: Customization
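Once Logstash has indexed the records, they can be queried through the Elasticsearch REST API. The sketch below builds a query DSL body that returns the most recent records of one application with given severities. It assumes Logstash's default `logstash-*` index naming and `@timestamp` field, and hypothetical `application` and `severity` fields mapped as exact-match (keyword) values; the request is built but not sent.

```python
import json
import urllib.request


def build_log_search(application, severities, since="now-1h", size=50):
    """Query DSL body: recent records of one application with the given severities."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest records first
        "query": {
            "bool": {
                # Filter clauses do not affect scoring and can be cached by Elasticsearch.
                "filter": [
                    {"term": {"application": application}},
                    {"terms": {"severity": list(severities)}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
    }


def build_search_request(es_url, index_pattern, body):
    """POST <es_url>/<index_pattern>/_search with a JSON body (not sent here)."""
    return urllib.request.Request(
        f"{es_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```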
5.5 Solution evaluation
This section analyses the different complete architectures and components for monitoring and proposes a solution to be used in EUBra-BIGSEA.
5.5.1.1 Sensu/InfluxDB/ELK
Figure 10: Sensu/InfluxDB/ELK software architecture
The Sensu/ELK solution is mature: all components are feature-rich and perform their intended role very well. In this solution Sensu is the core, responsible for managing and integrating all components.
Configuration is done through JSON files, defined in sensu-server and sensu-client. Check results and measurements are collected via scripts running on the monitored resources. All data are sent to a message queue and consumed by sensu-server, which analyses them, displays information and performs actions (sending e-mail, proxying information, etc.).
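For reference, a Sensu check definition of the kind mentioned above could be generated as follows. The structure (a `checks` object with `command`, `subscribers`, `interval` and `handlers`) follows Sensu Core 1.x; the check name, command and subscription name are hypothetical.

```python
import json


def sensu_check_definition(name, command, subscribers, interval=60, handlers=("default",)):
    """Build a Sensu Core 1.x style check definition (JSON object under "checks")."""
    return {
        "checks": {
            name: {
                "command": command,                # plugin executed by sensu-client
                "subscribers": list(subscribers),  # clients with these subscriptions run the check
                "interval": interval,              # seconds between executions
                "handlers": list(handlers),        # sensu-server handlers that process results
            }
        }
    }


definition = sensu_check_definition("check_disk", "check-disk.py", ["bigsea-vms"])
print(json.dumps(definition, indent=2))
```

The resulting JSON would typically be stored as a file in the Sensu configuration directory (by default `/etc/sensu/conf.d/`).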
In particular, measurements are sent to InfluxDB, which stores them and makes them available for querying and retrieval. Measurement data are displayed with Grafana.
Independently of Sensu/InfluxDB, logs are handled by a combination of rsyslog, ElasticSearch, Logstash and Kibana. Rsyslog sends application logs to the Logstash server, which parses them, extracts meaningful information from the log text and stores it in ElasticSearch for further analysis. Kibana displays the various kinds of analysis that can be done with ElasticSearch.
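The extraction step performed by Logstash can be illustrated with a plain regular expression, in the way a grok pattern works. This Python sketch is not Logstash configuration; it assumes that the syslog header has already been stripped and that records follow a hypothetical `<date/hour> <severity> <application>: <message>` layout.

```python
import re

# Records shaped like "2016-06-01T10:00:00Z ERROR ophidia-server: job 42 failed".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<application>[\w.-]+):\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Extract structured fields; tag the event on failure instead of dropping it."""
    match = LOG_LINE.match(line)
    if match is None:
        # Mirrors the tag Logstash's grok filter adds when no pattern matches.
        return {"message": line, "tags": ["_grokparsefailure"]}
    return match.groupdict()
```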
5.5.1.2 OpenStack Monasca
Figure 11: MONASCA software architecture
Monasca is a full monitoring solution that binds several software components together. It uses a REST API for interaction and for receiving check results and measurement data. All data are put into a message queue and consumed by engines (alarm engine, notification engine, etc.). Processed data are stored in a database for further analysis, visualization and execution of actions. Monasca is developed in Python and Java, and uses Apache Kafka as the message queue, Apache Storm for real-time data computation and Zookeeper to orchestrate components.
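The behaviour of the alarm engine can be approximated as follows: an aggregation function is applied to the measurements of each evaluation period and compared with a threshold, in the spirit of Monasca alarm expressions such as `avg(cpu.idle_perc) < 10 times 3`. This is a simplified sketch, not Monasca's implementation; it ignores dimensions and only models the OK, ALARM and UNDETERMINED states.

```python
import operator
from statistics import mean

FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}
OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def evaluate_alarm(measurements, func, op, threshold, now, period=60, periods=1):
    """Return "ALARM", "OK" or "UNDETERMINED" for (timestamp_s, value) measurements.

    The expression must hold in each of the last `periods` windows of `period`
    seconds for the alarm to fire; a window without data leaves it undetermined.
    """
    outcomes = []
    for i in range(periods):
        end = now - i * period
        start = end - period
        window = [value for ts, value in measurements if start < ts <= end]
        if not window:
            return "UNDETERMINED"
        outcomes.append(OPERATORS[op](FUNCTIONS[func](window), threshold))
    return "ALARM" if all(outcomes) else "OK"
```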
5.6 Comparative analysis
Requirement | Monasca | Sensu/InfluxDB/ELK | Zabbix
Scalability | Yes | Yes | Yes³
Elasticity | Yes, but exclusion of a monitored resource can trigger incorrect alarms | Yes, but exclusion of a monitored resource can trigger incorrect alarms | Partial. Supports auto-discovery of hosts, but it depends on correct configuration. Exclusion of a monitored resource can trigger incorrect alarms.
Migration | Yes | Yes | Yes, if the resource keeps the same IP or DNS name.
Adaptability⁴ | Not supported | Not supported | Static. Monitored items can be configured with different update intervals.
Autonomicity | Yes | Yes | Yes
Federation | Yes | Yes | Yes
Timeliness⁵ | Yes | Yes | Yes (minimum update interval of 1 second)
Comprehensiveness | Yes | Yes | Yes
Resilience | Yes, but depends on the resilience architecture of the building blocks | Yes, but depends on the resilience architecture of the building blocks | Yes, except for the server and database.
High availability | Yes, but depends on the HA of the building blocks | Yes, but depends on the HA of the building blocks | Yes, a scalability configuration also results in high availability.
Accuracy | Yes | Yes | Yes
Extensibility | Yes (open source, documented API) | Yes (open source, documented API) | Yes (open source, documented API)
Usability | Integrates with other OpenStack products (Horizon) | The different product interfaces can confuse users | Good usability, although it uses an old interface design.
Integration | Great, supports Nagios plugins and many others. | Great, supports Nagios plugins and many others. | Great, supports Nagios plugins and many others.
Security | User authentication and authorisation for metric access. | Only user authentication | Yes, for the communication channel (version >= 3.0), but not for the database.
Alert triggering | Yes | Yes | Yes
Single sign-on | Integrates with OpenStack Keystone. | Not supported natively. Can be implemented by the HTTP server. | Basic. Supports HTTP Basic Auth (integrated with Apache HTTP) and LDAP.
Table 7: Requirements Related to Monitoring
³ Zabbix can scale by scaling its underlying components (MySQL, OS, HTTP server). A good discussion is presented in http://blog.zabbix.com/scalable-zabbix-lessons-on-hitting-9400-nvps/2615/.
⁴ Adaptability of agents running on hosts facing heavy load.
⁵ Under normal workload.
All metrics and measurements should be kept in a resilient database. Instead of defining a static period of time for data retention, a better approach would be to establish a disk usage threshold, which promotes a more meaningful use of the available storage. Beyond that limit, old data should be aggregated over wider time spans.
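A minimal sketch of this threshold-based retention policy, assuming the disk usage limit can be approximated by a maximum number of stored points per series and that hourly averages are an acceptable aggregation. Both assumptions, and the function names, are illustrative.

```python
from collections import defaultdict


def downsample(points, bucket_s):
    """Average (timestamp_s, value) points into buckets of `bucket_s` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def enforce_budget(points, max_points, bucket_s=3600):
    """Keep a series within `max_points` by aggregating its oldest data first."""
    points = sorted(points)
    if len(points) <= max_points:
        return points
    bucket_starts = sorted({ts - ts % bucket_s for ts, _ in points})
    # Try cut-offs at successive bucket boundaries; data before the cut-off is aggregated.
    for cutoff in bucket_starts[1:]:
        old = [p for p in points if p[0] < cutoff]
        recent = [p for p in points if p[0] >= cutoff]
        compacted = downsample(old, bucket_s) + recent
        if len(compacted) <= max_points:
            return compacted
    # Aggregating everything is the coarsest result available at this bucket size.
    return downsample(points, bucket_s)
```

Recent data keep full resolution while only the oldest part of the series loses detail, which matches the intent of aggregating old data over wider time spans.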
5.6.1 Proposed API
The monitoring system requires a programming interface (API) for interacting with and managing monitoring-related resources. Such an API should be able to:
- include and remove monitored resources (servers and services)
- activate and deactivate monitored resources
- assign monitored resources to a set of metrics or alarms
- include and remove metric probes and alarm checks
- provide an entry point to query collected data
- collect measurement data
- get the status of alarm checks
- bind actions (alarms, handlers such as e-mail, SMS, etc.) to alarms or metrics based on conditions
The proposed API is a RESTful JSON API and may be used by implementing clients in different programming languages. Clients will use the HTTP or HTTPS protocols, and all requests must be authenticated by using a special HTTP header. Complete documentation of the API, including all technical details, is out of the scope of this document.
Resource path | Description
/v1/metrics | Entry point to query, create, edit and delete metrics.
/v1/metrics/measurements | Allows storing measurement values with a timestamp for a specific metric or a set of metrics, and querying original or aggregated measurement values.
/v1/metrics/statistics | Entry point to calculate and retrieve statistics about metrics (average, minimum and maximum values, sum and count) for a period of time.
/v1/alarms | Entry point to query, create, edit and delete alarms.
/v1/hosts | Operations on monitored hosts. Returns the list of monitored hosts.
/v1/groups | Allows adding monitored resources as group members, and adding metrics and/or events to group members.
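The following sketch shows how a client of the proposed API might look. Since the deliverable leaves the details out of scope, the header name `X-Auth-Token`, the query parameters and the JSON field names are hypothetical; the last function only documents the intended meaning of the statistics listed for `/v1/metrics/statistics`.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical: the deliverable only requires "a special HTTP header" for authentication.
AUTH_HEADER = "X-Auth-Token"


def post_measurements_request(base_url, token, metric, points):
    """POST /v1/metrics/measurements with (timestamp, value) pairs for one metric."""
    payload = {  # hypothetical field names
        "metric": metric,
        "measurements": [{"timestamp": ts, "value": value} for ts, value in points],
    }
    return urllib.request.Request(
        f"{base_url}/v1/metrics/measurements",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", AUTH_HEADER: token},
    )


def get_statistics_request(base_url, token, metric, start, end):
    """GET /v1/metrics/statistics for one metric over [start, end)."""
    query = urllib.parse.urlencode({"name": metric, "start_time": start, "end_time": end})
    return urllib.request.Request(
        f"{base_url}/v1/metrics/statistics?{query}", headers={AUTH_HEADER: token}
    )


def compute_statistics(points, start, end):
    """Reference semantics: average, minimum, maximum, sum and count over [start, end)."""
    values = [value for ts, value in points if start <= ts < end]
    if not values:
        return {"count": 0}
    total = sum(values)
    return {"avg": total / len(values), "min": min(values), "max": max(values),
            "sum": total, "count": len(values)}
```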
5.7 Conclusion
Sensu and Zabbix are mature and working solutions, but choosing Monasca offers a great opportunity to contribute to a worldwide project on cloud platforms and to establish a cloud standard for the EUBra-BIGSEA project, as it will become part of OpenStack. The OpenStack ecosystem is rich in members from diverse areas, academia and industry, who could provide us with good practices and the state of the art in monitoring solutions. The short-term roadmap of the Monasca project also aligns well with EUBra-BIGSEA, as it considers the usage of the ELK framework to manage logs. Monasca also provides standard patterns for creating metrics and extending the whole monitoring system. Support for higher levels of abstraction is provided through the usage of StatsD or by posting directly to its metrics APIs. Monasca Webhook notifications can also be used to trigger actions in the reactive elasticity modules.
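Higher-level metrics can be emitted through StatsD, whose text protocol sends one `<name>:<value>|<type>` line per metric over UDP. The sketch assumes a StatsD-compatible daemon listening on the conventional port 8125 on the same host; the metric names are hypothetical.

```python
import socket

STATSD_TYPES = {"c": "counter", "g": "gauge", "ms": "timer"}


def statsd_line(name, value, metric_type="g"):
    """Format one metric in the plain StatsD text protocol: <name>:<value>|<type>."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(lines, host="127.0.0.1", port=8125):
    """Send each line as its own UDP datagram (fire-and-forget, no acknowledgement)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
```

Because UDP sends do not block on the collector, instrumented applications are not slowed down if the daemon is unavailable.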
Monasca integrates with OpenStack Keystone, a federated authentication and authorization solution, offering the possibility of limiting the visibility of metrics by scoping them to a specific project. Monasca also provides all the essential requirements, such as measurement collection, high availability, extensibility, alarm triggering and a rich interface for creating, configuring and visualizing alarms.
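To obtain the token used when scoping metrics to a project, a client authenticates against the Keystone v3 API. The sketch builds a password-based, project-scoped token request; the Keystone URL, user, password and project name are placeholders, and the request is not sent.

```python
import json
import urllib.request


def keystone_token_request(auth_url, username, password, project, domain_id="default"):
    """Keystone v3 password authentication scoped to one project (not sent here)."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {"name": username, "domain": {"id": domain_id}, "password": password}
                },
            },
            # Scoping the token to a project limits which metrics Monasca exposes.
            "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
        }
    }
    return urllib.request.Request(
        f"{auth_url}/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def token_from_headers(headers):
    """Keystone returns the issued token in the X-Subject-Token response header."""
    return headers["X-Subject-Token"]
```

The token is then sent to Monasca in the `X-Auth-Token` header of each API call.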
6 TECHNICAL PROCEDURES
The project will not impose the migration of components that are already available in a specific repository to a central one. However, the use of a common repository for new developments is encouraged.
Currently, the identified components and their locations are:
- Mesos, https://github.com/apache/mesos
- YARN, https://github.com/apache/hadoop
- MYRIAD, https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=summary
- Marathon, https://github.com/mesosphere/marathon
- Chronos, https://github.com/mesos/chronos
- Spark, https://github.com/apache/spark (project site: http://spark.apache.org/)
- Infrastructure Manager, https://github.com/grycap/im
- TOSCA Parser, https://github.com/openstack/tosca-parser
- CLUES, https://github.com/grycap/clues
- CloudVAMP, https://github.com/grycap/cloudvamp
- EC3, https://github.com/grycap/ec3
- COMPSs, https://www.bsc.es/computer-sciences/grid-computing/comp-superscalar
The project will set up the following public repositories:
- An EUBra-BIGSEA GitHub repository has been created, including forks of the existing repositories (https://github.com/eubra-bigsea).
- An EUBra-BIGSEA Docker Hub organisation can be created to store the corresponding basic container images, and any other image derived from the application analysis (https://hub.docker.com/r/eubrabigsea/).
7 CONCLUSIONS
This document has gone through a thorough review of the components that will form the QoS cloud services architecture for the Big Data analytics platform developed in EUBra-BIGSEA. The deliverable initially focused on the monitoring architecture, but it has been extended to draw the whole layer of services for the execution and monitoring of cloud services.
The document proposes the use of Mesos as the basic foundation for the management of distributed resources, as it is a widely used component that provides isolation, avoids fragmentation of data centres and achieves high availability and reliability. Mesos will be enhanced with the capability of automatically increasing resources, so that the powered-on resources truly adapt to the workload while remaining agnostic to the upper layers. The project identifies three types of workloads: persistent, periodic batch and interactive jobs, which will be served by different schedulers. Persistent jobs will be served by Marathon, periodic jobs by means of Chronos, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that embed the executable services and negotiate resources with Mesos. Those requests will be intercepted to guarantee the availability of resources in Mesos (e.g. by adding more resources to the system).
Therefore, at the level of the cloud services, the project will capture the framework requests and interact with Mesos, as well as deal with the provision and management of base VMs to pursue an efficient use of resources.
QoS cloud services will deal with the reactive elasticity of the Mesos cluster and of the physical (virtual) resources on which Mesos is based. Frameworks could request additional resources in case of application starvation, depending on the profile of the application. The monitoring system will then trigger such actions.
EUBra-BIGSEA will contribute to this ecosystem in several aspects. First, it will provide reactive elasticity to the resource management platform, which will use TOSCA as a standard specification to enhance portability. This will enable Mesos clusters to grow and shrink within data centre boundaries. Second, it will create a single submission point that will decide among the different schedulers based on the job features. Third, it will provide reactive policies connected to the monitoring system to ensure that deadlines are met. Fourth, it will integrate the monitoring and notification systems with the proactive policies to learn from past executions, reducing the need for fine-tuning the reactive policies.
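The single submission point could route jobs according to the three workload types identified in this section. The sketch below is hypothetical: the `JobRequest` fields and the returned scheduler labels are illustrative, and a real implementation would then submit the job through the chosen scheduler's own API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    """Hypothetical job descriptor received by the single submission point."""
    name: str
    interactive: bool = False
    schedule: Optional[str] = None  # e.g. an ISO 8601 repeating interval, as used by Chronos
    long_running: bool = False


def choose_scheduler(job: JobRequest) -> str:
    """Route a job to the scheduler serving its workload type."""
    if job.interactive:
        return "interactive-shell"  # e.g. a Spark shell
    if job.schedule is not None:
        return "chronos"            # periodic batch jobs
    if job.long_running:
        return "marathon"           # persistent services
    raise ValueError(f"cannot classify job {job.name!r}")
```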
The chosen monitoring system is Monasca, a young OpenStack monitoring project. It has been selected for its suitability to the EUBra-BIGSEA use case and for the opportunities that could arise from collaboration with OpenStack. Monasca will monitor the resources, services and applications, triggering actions if required.
8 REFERENCES
[R1] TOSCA (Topology and Orchestration Specification for Cloud Applications), https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca
[R2] OpenNebula, http://opennebula.org
[R3] OpenStack, http://www.openstack.org
[R4] YAML, http://yaml.org/
[R5] Consul, https://www.consul.io/
[R6] Apache ZooKeeper, https://zookeeper.apache.org/
[R7] COMPSs, https://www.bsc.es/computer-sciences/grid-computing/comp-superscalar
[R8] Ophidia, http://ophidia.cmcc.it/
[R9] Clayman, Stuart et al. "Monitoring Service Clouds in the Future Internet." Future Internet Assembly, Apr. 2010: 115-126.
[R10] Wang, Chengwei et al. "A flexible architecture integrating monitoring and analytics for managing large-scale data centers." Proceedings of the 8th ACM International Conference on Autonomic Computing, 14 Jun. 2011: 141-150.
[R11] Aceto, Giuseppe et al. "Cloud monitoring: A survey." Computer Networks 57.9 (2013): 2093-2115.
[R12] "Monasca - OpenStack." 2014. Accessed 13 May 2016. Available at https://wiki.openstack.org/wiki/Monasca.
GLOSSARY
ACL | Access Control Lists | Security
API | Application Programming Interface | Interfacing
BoT | Bag of Tasks | Programming Model
CHRONOS | A fault-tolerant job scheduler for Mesos | Scheduler
CLOUDVAMP | Cloud Virtual Machine Automatic Memory Procurement | Elasticity
CLUES | CLUster Energy SavingS | Elasticity
CMP | Cloud Management Platform (e.g. OpenNebula or OpenStack) | Resource Management
COMPSS | COMP Superscalar (COMPSs) is a programming model which aims to ease the development of applications for distributed infrastructures, such as Clusters, Grids and Clouds | Programming Model
CSV | Comma-Separated Values | Data Type
DBMS | Database Management System | Monitoring Service
DOCKER | An open platform for distributed applications for developers and sysadmins | Scheduler
DOCKERHUB | A cloud-hosted service from Docker that provides registry capabilities for public and private content | Scheduler
EC3 | Elastic Compute Cluster in the Cloud | Elasticity
ELK | Elastic Stack: Elasticsearch, Logstash, Kibana and Beats | Monitoring
GITHUB | A web-based repository hosting service for a Git-based version control system | Repository
HPC | High-Performance Computing | Computing Architecture
IAAS | Infrastructure as a Service | Resource Management
IM | Infrastructure Manager | Resource Management
INFLUXDB | A platform for collecting and managing time-series data | Monitoring Service
JSON | JavaScript Object Notation | Data Type
MARATHON | A container orchestration platform for Mesos | Scheduler
MESOS | A resource management platform that abstracts CPU, memory, storage and other compute resources away from machines | Resource Management
MONASCA | Monasca is an open-source, multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution that integrates with OpenStack | Monitoring Service
MS | Monitoring Systems | Monitoring Service
Myriad | Deploy Apache YARN Applications Using Apache Mesos | Resource Management
NoSQL | Not Only SQL | Database paradigm
OPHIDIA | A CMCC Foundation research project addressing big data challenges for eScience | Database engine
QoS | Quality of Service | Scheduler
REST | REpresentational State Transfer | Interfacing
RMF | Resource Management Framework | Resource Management
RSYSLOG | Rocket-fast SYStem for LOG processing | Monitoring Service
SLA | Service Level Agreement | Scheduler
SPARK | A fast and general engine for large-scale data processing | Programming Model
TOSCA | Topology and Orchestration Specification for Cloud Applications | Resource Management
VM | Virtual Machine | Resource Management
YAML | YAML Ain't Markup Language | Resource Management
YARN | Yet Another Resource Negotiator | Scheduler
ZABBIX | Real-time monitoring solution | Monitoring Service
Zookeeper | A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services | Resource Management