
www.eubra-bigsea.eu | [email protected] | @bigsea_eubr


D3.1: QoS Monitoring System Architecture

Author(s) Ignacio Blanquer (UPV), Guilherme Maluf (UFMG), Walter dos Santos Filho (UFMG)

Status Draft/Review/Approval/Final

Version v1.0

Date 2/05/2016

Dissemination Level PU: Public (X); PP: Restricted to other programme participants (including the Commission); RE: Restricted to a group specified by the consortium (including the Commission); CO: Confidential, only for members of the consortium (including the Commission)

Abstract: Europe-Brazil Collaboration of BIG Data Scientific Research through Cloud-Centric Applications (EUBra-BIGSEA) is a medium-scale research project funded by the European Commission under the Cooperation Programme, and the Ministry of Science and Technology (MCT) of Brazil in the frame of the third European-Brazilian coordinated call. The document has been produced with the co-funding of the European Commission and the MCT. The purpose of this report on the QoS Monitoring System Architecture is to define the software components that will collect the execution data from the cloud architecture, as well as the main components involved in the full process of deployment, configuration, contextualization and execution.

EUBra-BIGSEA is funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement No 690116.

This project results from the 3rd BR-EU Coordinated Call on Information and Communication Technologies (ICT), announced by the Ministry of Science, Technology and Innovation (MCTI).



Document identifier EUBRABIGSEA-WP3-D3.1

Deliverable lead UPV

Related work package WP3

Author(s) Germán Moltó (UPV), Ignacio Blanquer (UPV), Guilherme Maluf (UFMG), Walter dos Santos Filho (UFMG)

Contributor(s) Ignacio Blanquer (UPV), Anna Guimarães (UFMG), Wagner Meira (UFMG), Dorgival Guedes (UFMG), Andrey Brito (UFCG), Danilo Ardagna (POLIMI), Daniele Lezzi (BSC), Sandro Fiore (CMCC), Roberto Cascella (TRUST-IT)

Due date 30/06/2016

Actual submission date 30/06/2016

Reviewed by Daniele Lezzi (BSC), Nazareno Andrade (UFCG)

Approved by PMB

Start date of Project 01/01/2016

Duration 24 months

Keywords Quality of Service, Cloud services, Monitoring

Versioning and contribution history

Version | Date | Authors | Notes
0.1 | 02/05/2016 | Ignacio Blanquer (UPV), Germán Moltó (UPV) | Table of Contents
0.2 | 09/05/2016 | Walter dos Santos Filho (UFMG) | First version of section 5 content
0.3 | 23/05/2016 | Ignacio Blanquer (UPV) | Restructuring section 4
0.4 | 31/05/2016 | Walter dos Santos, Guilherme Maluf (UFMG) | Monitoring section
0.5 | 02/06/2016 | Daniele Lezzi (BSC), Ignacio Blanquer (UPV) | Changes on the architecture diagrams and associated information
0.6 | 08/06/2016 | Ignacio Blanquer (UPV), Germán Moltó (UPV) | Sections 2-4 completed
0.7 | 10/06/2016 | Ignacio Blanquer (UPV) | Executive Summary and conclusions
0.8 | 15/06/2016 | Andrey Brito (UFCG), Walter dos Santos (UFMG), Daniele Lezzi (BSC), Danilo Ardagna (POLIMI) | In-depth review of the document
0.9 | 27/06/2016 | Daniele Lezzi (BSC), Nazareno Andrade (UFCG), Ignacio Blanquer (UPV), Germán Moltó (UPV) | Implementation of the comments from reviewers and final candidate version

Copyright notice: This work is licensed under the Creative Commons CC-BY 4.0 license. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0.

Disclaimer: The content of the document herein is the sole responsibility of the publishers and it does not necessarily represent the views expressed by the European Commission or its services.

While the information contained in the document is believed to be accurate, the author(s) or any other participant in the EUBra-BIGSEA Consortium make no warranty of any kind with regard to this material including, but not limited to the implied warranties of merchantability and fitness for a particular purpose.

Neither the EUBra-BIGSEA Consortium nor any of its members, their officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.

Without derogating from the generality of the foregoing neither the EUBra-BIGSEA Consortium nor any of its members, their officers, employees or agents shall be liable for any direct or indirect or consequential loss or damage caused by or arising from any information advice or inaccuracy or omission herein.



TABLE OF CONTENTS

EXECUTIVE SUMMARY
1 Introduction
1.1 Scope of the Document
1.2 Target Audience
1.3 Structure
2 Data and execution requirements
3 EUBra-BIGSEA Infrastructure Overview
3.1 EUBra-BIGSEA General Infrastructure
3.2 Procedure to Describe components
4 QoS IaaS
4.1 Application use cases and Lifecycle
4.1.1 Batch jobs
4.1.2 Interactive jobs
4.1.3 Application lifecycle
4.2 OASIS TOSCA
4.3 Resource Management Frameworks
4.3.1 Docker Swarm
4.3.2 Apache Mesos
4.3.3 YARN
4.3.4 Myriad
4.3.5 Image Repository
4.4 Job Execution (Scheduling)
4.4.1 Spark (spark-submit and spark-shell)
4.4.2 Chronos
4.4.3 Marathon
4.5 Configuration and Contextualization (and Orchestration)
4.5.1 Infrastructure Manager
4.6 Reactive elasticity
4.6.1 CLUster Energy SavingS (CLUES)
4.6.2 Elastic Compute Cluster in the Cloud (EC3)
4.6.3 Cloud Virtual machine Automatic Procurement (CloudVAMP)
4.7 Technology Analysis
4.7.1 Requirements Related to Resource Management Frameworks
4.7.2 Requirements Related to Schedulers
4.7.3 Requirements Related to Image Registry
4.7.4 Requirements Related to Orchestration and Deployment
4.7.5 Requirements Related to Reactive Elasticity
4.8 Conclusion
4.8.1 Proposed API
5 Monitoring Service
5.1 Cloud computing monitoring
5.2 Requirements for monitoring service
5.3 Proposed architecture
5.3.1 Minimum supported infrastructure metrics
5.3.2 Supported application metrics and logging
5.4 Technology evaluation
5.4.1 Metric collection, storage and retrieval
5.4.2 InfluxDB
5.4.3 Sensu
5.4.4 OpenStack Monasca
5.4.5 Logs collection and analysis
5.4.6 ELK Stack
5.5 Solution evaluation
5.6 Comparative analysis
5.6.1 Proposed API
5.7 Conclusion
6 TECHNICAL PROCEDURES
7 CONCLUSIONS
8 REFERENCES

TABLE OF FIGURES

Figure 1: High-level view of the EUBra-BIGSEA Architecture
Figure 2: Relations among WP7 and the other technical WPs. How the first set of deliverables is contributing to the application development
Figure 3: Detailed view of the software architecture. The components listed are described with more detail along the document
Figure 4: Lifecycle of the applications in EUBra-BIGSEA
Figure 5: Basic sample TOSCA template
Figure 6: Detailed interaction among components at QoS cloud services Layer
Figure 7: JSON job description
Figure 8: High-level of monitoring system concepts
Figure 9: Monitoring system architecture and components
Figure 10: Sensu/InfluxDB/ELK software architecture
Figure 11: MONASCA software architecture



EXECUTIVE SUMMARY

The EUBra-BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators and a privacy and quality analysis framework, exposed to several programming environments. EUBra-BIGSEA aims at covering general requirements of multiple application areas, although it will be showcased in the treatment of massive connected-society information, and particularly in traffic recommendation.

The Quality of Service (QoS) architecture is the computational core of the EUBra-BIGSEA platform. The performance of data analytics applications running on the EUBra-BIGSEA platform is profiled in advance, so a QoS guarantee is defined based on the performance requirements. Then, the monitoring service will closely follow the executions so it can react and request additional resources if needed. This information will be fed back to the performance estimation model so the allocation of resources for the next execution can be more precise. This way, proactive elasticity can adapt the system in advance so that no drastic actions are needed to guarantee the QoS of the applications, while reactive elasticity works as a second line of action to respond to changes that were not adequately handled by the proactive approach. Users will not need to specify the expected resources; the resource estimation system will decide them within the acceptable boundaries.
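The proactive estimation and its feedback from monitoring can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the performance model reduces to a per-application estimate of the work in core-seconds, refined after every run with an exponentially weighted moving average (EWMA). The class name, the EWMA and the core-second unit are hypothetical choices, not the project's estimation model, and the reactive second line of action is not modelled.

```python
import math


class ResourceEstimator:
    """Hypothetical per-application estimator refined by monitoring feedback."""

    def __init__(self, alpha: float = 0.5, default_work: float = 3600.0):
        self.alpha = alpha                # weight of the newest observation (EWMA)
        self.default_work = default_work  # core-seconds assumed for an unprofiled app
        self.work = {}                    # app name -> estimated core-seconds

    def cores_for_deadline(self, app: str, deadline_s: float) -> int:
        # Proactive step: allocate enough cores for the estimated work to fit the deadline.
        work = self.work.get(app, self.default_work)
        return math.ceil(work / deadline_s)

    def record_run(self, app: str, observed_work: float) -> None:
        # Feedback step: blend the work measured by monitoring into the estimate.
        previous = self.work.get(app, self.default_work)
        self.work[app] = self.alpha * observed_work + (1 - self.alpha) * previous


est = ResourceEstimator()
first = est.cores_for_deadline("traffic-reco", deadline_s=600)   # 6 cores
est.record_run("traffic-reco", observed_work=4800)                # estimate -> 4200
second = est.cores_for_deadline("traffic-reco", deadline_s=600)  # 7 cores
```

With the default estimate of 3600 core-seconds and a 600-second deadline, the first run is given 6 cores; if monitoring then reports 4800 core-seconds, the estimate moves to 4200 and the next run is given 7 cores.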

The use of Mesos will constitute the basic foundation for the management of distributed resources, as it is a widely used component to provide isolation, to avoid fragmentation of data centres and to achieve high availability and reliability. YARN will be supported on Mesos through Myriad. Mesos will be enhanced with the capability of automatically increasing resources, to provide a real adaptation of the powered-on resources to the workload, while being agnostic to the upper layers. The execution granularity of the applications is defined at the level of containers.

The project identifies three types of workloads: persistent, periodic batch and interactive jobs, which will be served by different schedulers. Persistent jobs will be served by the Marathon scheduler, periodic jobs by means of the Chronos scheduler, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that will embed the executable services and negotiate the resources with Mesos. Those requests will be intercepted to guarantee the availability of resources in Mesos (e.g. by adding more resources to the system). Therefore, at the level of the cloud services, the project will capture the framework requests and interact with Mesos, as well as deal with the provision and management of base VMs to pursue an efficient use of resources. QoS cloud services will deal with the reactive elasticity of the Mesos cluster and of the physical (virtual) resources that Mesos is based on. Frameworks could request additional resources in case of application starvation, depending on the profile of the application; the monitoring system will then trigger such actions. The choice for the monitoring system is Monasca, a young OpenStack project for monitoring. It has been selected for its suitability to the EUBra-BIGSEA use case and the opportunities that could arise from the collaboration with OpenStack. Monasca will monitor the resources, services and applications, triggering actions if required.
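The split of workloads among schedulers and the interception of framework requests can be sketched as follows. The sketch assumes that an intercepted request is summarised as CPUs and memory and that the cluster grows by adding identical base VMs. `ClusterState`, `route`, `nodes_to_add` and the VM sizes are hypothetical; they do not correspond to the Mesos API, which works with resource offers made by the Mesos master to the frameworks.

```python
import math
from dataclasses import dataclass

# Workload type -> scheduler, following the assignment described above.
SCHEDULERS = {
    "persistent": "Marathon",
    "periodic": "Chronos",
    "interactive": "interactive shell (e.g. spark-shell)",
}


@dataclass
class ClusterState:
    free_cpus: float   # CPUs currently unallocated in the Mesos cluster
    free_mem_mb: int   # memory currently unallocated, in MB
    vm_cpus: float     # CPUs contributed by one additional base VM (hypothetical)
    vm_mem_mb: int     # memory contributed by one additional base VM (hypothetical)


def route(workload_type: str) -> str:
    """Return the scheduler that serves a given workload type."""
    return SCHEDULERS[workload_type]


def nodes_to_add(state: ClusterState, req_cpus: float, req_mem_mb: int) -> int:
    """Base VMs to add before an intercepted framework request can be satisfied."""
    cpu_gap = max(0.0, req_cpus - state.free_cpus)
    mem_gap = max(0, req_mem_mb - state.free_mem_mb)
    # The binding resource (CPU or memory) decides how many VMs are needed.
    return max(math.ceil(cpu_gap / state.vm_cpus), math.ceil(mem_gap / state.vm_mem_mb))


state = ClusterState(free_cpus=2.0, free_mem_mb=4096, vm_cpus=4.0, vm_mem_mb=8192)
print(route("periodic"), nodes_to_add(state, req_cpus=10.0, req_mem_mb=4096))  # Chronos 2
```

Taking the maximum over CPU and memory means a request is only released to Mesos once both resources fit, which is the simplest way to express "guarantee the availability of resources" under these assumptions.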



1 INTRODUCTION

1.1 Scope of the Document

This document describes the QoS Monitoring System Architecture as well as the software architecture of other cloud-service-related components and their interactions. The document follows the Software Design Specification (SDS) standard (IEEE Standard 1016). The document tries to address all the components needed for the execution of the data analytics workload of the project and the way to report back the progress, so that it can be reused in further executions or to react dynamically in real time. Each component is described in terms of its external interfaces and dependencies on other components.

We will also describe the reasons for the choices made in terms of technologies and services used, and a fine-grained work plan for the next six months, leaving the rest of the project with a coarser-grained definition to let it evolve with technology changes.

1.2 Target Audience

The document is mainly intended for internal use, although it is publicly released. The main target of this document is the global team of technical experts of EUBra-BIGSEA, including WP3, WP4, WP5 and WP6. This document goes beyond QoS monitoring to present the global architecture of WP3. It describes the software architecture of WP3, the main building blocks, the requirements and the available software, and it proposes an architecture to support the development of policies, the deployment and execution of applications and the elasticity measures.

1.3 Structure

The rest of the document is structured into seven main parts. First, section 2 presents and discusses a summary of the requirements from the use cases; the requirements that affect WP3 are outlined and referenced across the document. Then, section 3 describes the high-level EUBra-BIGSEA infrastructure, defining the scope of each one of the main layers and their relation to other documents. Section 4 describes in detail the Quality of Service (QoS) Infrastructure as a Service (IaaS), covering the application use cases and the main technologies that will be used as a basis for the developments of WP3 cloud services. Section 5 deals with the monitoring architecture. Section 6 describes the procedures for coding, source repositories and continuous integration, section 7 presents the conclusions, and section 8 lists the references.



2 DATA AND EXECUTION REQUIREMENTS

A deeper analysis of the requirements is presented in D7.1 End-Users Requirements Elicitation. This section includes the summary list of all the requirements that should be tackled from the WP3 perspective.

Req # | Description | Priority | WP
RE.1 | Unrestricted Batch jobs | MUST | WP3
RE.2 | Unrestricted Bag of Tasks | MUST | WP3
RE.3 | QoS Batch jobs | MUST | WP3
RE.4 | Deadline-based scheduling | SHOULD | WP3
RE.5 | Self-adapting elasticity | MUST | WP3
RE.6 | Short jobs | MUST | WP3

Table 1: List of requirements related to WP3.

From those high-level requirements, this document identifies 25 technical-level requirements, described in section 4.7, that will drive the implementation of the system. Those technical requirements refer to very specific details for five main categories: the management of resources, the scheduling of jobs, the management and cataloguing of container images, the orchestration and deployment of services, and the elasticity. These technical requirements will be introduced in sections 4.3 to 4.6.

The next table anticipates those technical requirements and identifies their correlation with the use-case requirements.



Use-case requirements (columns): RE.1 Unrestricted Batch jobs; RE.2 Unrestricted Bag of Tasks; RE.3 QoS Batch jobs; RE.4 Deadline-based scheduling; RE.5 Self-adapting elasticity; RE.6 Short jobs.

Technical requirement | Marks (RE.1 to RE.6)
R01 - The RMF must enable a scheduler to allocate resources. | x x x x x
R02 - The RMF must support multiple types of workloads. | x x x x x
R03 - An application topology may involve several containers. | x x
R04 - The status of the resources should be accessible by an API. | x
R05 - RMFs should expose an API to change the number of resources. | x x x
R06 - Monitor the usage of resources and the application health. | x x x
R07 - A central list of container images should be available. | x x x
R08 - Updates should be automated. | x x x
R09 - A user should be able to modify its own application. | x x x
R10 - Capability of executing container-based batch jobs. | x x x
R11 - Integration with the selected CMF. | x x
R12 - Capability of executing jobs involving concurrent processes. | x
R13 - Support of Spark jobs. | x x x x x
R14 - Support of deadline-QoS periodic jobs. | x x
R15 - High availability for long-running jobs. | x x x
R16 - Support for interactive jobs. | x
R17 - Deployment of TOSCA blueprints. | x x x x x x
R18 - Update of TOSCA blueprints to reconfigure the system. | x
R19 - Support of multiple platforms. | x x x x x x
R20 - Automatic scaling up of resources when new job requests arise. | x x x
R21 - Automatic deallocation of resources when idle for a given period. | x
R22 - Customization policies for elasticity. | x
R23 - Reallocation of memory size and number of resources. | x
R24 - Transparent management of memory allocation. | x
R25 - Automatic reconfiguration of execution kernels. | x

Table 2: Relation among the technical requirements and the use-case requirements.



3 EUBRA-BIGSEA INFRASTRUCTURE OVERVIEW

3.1 EUBra-BIGSEA General Infrastructure

The EUBra-BIGSEA general infrastructure comprises four main blocks:

- QoS Cloud Infrastructure services, which integrate the modelling of the workload, the monitoring of the resources, the implementation of vertical and horizontal elasticity and the contextualization. This is the main part of this document.

- Big Data Analytics services, which provide operators to process huge datasets and which can be integrated in the programming models. Analytics services are characterized in the QoS cloud infrastructure models of the underlying layer, which will automatically (or explicitly, driven by the analytics services) adjust resources to the expected workload, considering its specificities.

- Programming Models, which provide a higher-level programmatic framework (Python, Java, Spark) and are also characterized by the models of the infrastructure. The programming models will ease the parallelisation of the applications developed on top of them.

- Privacy and Security framework, which provides the means to annotate data and processing and ensures the proper protection of privacy and security.

On top of those four blocks, applications are developed using the programming models and the data analytics extensions. Application developers are expected to use the programming models and may use other features of the underlying layers, such as the user-level QoS metrics.

Figure 1: High-level view of the EUBra-BIGSEA Architecture

Figure 1 shows the high-level view of the EUBra-BIGSEA architecture, depicting the interactions among the main blocks. Figure 2 shows the interactions among work packages and the main sources of information at this stage of the project.



Figure 2: Relations among WP7 and the other technical WPs. How the first set of deliverables is contributing to the application development.

Figure 3 shows a more detailed schema of the architecture. More details will be provided in the next sections and in deliverable D5.1 EUBra-BIGSEA Software Architecture.

Figure 3: Detailed view of the software architecture. The components listed are described with more detail along the document.

In order to implement the execution lifecycle for each one of the user scenario levels, it is necessary to define:

- Application binary and associated dependencies, embedded in a container. The application dependencies can be coded as a dependency file ("à la Dockerfile") or by directly registering the container on a repository. The basic execution unit is therefore the container. Some applications may not need dependencies other than those of the programming models, which are already embedded in the base container images.

- QoS Policies. The different applications will be analysed in terms of dependencies, execution graph path, resource demand and performance. When an application is submitted, an initial estimation of resources will be defined for each executed case. Reactive elasticity of the platform will ensure that enough resources are available for the application to start. The elasticity modules can then use monitoring data to dynamically update the resource allocation to ensure that the QoS is met. For example, the platform can enlarge the memory allocation for a Spark job if it is starving, or give a COMPSs job more physical computing slots if it is not progressing as expected.

- A specific execution description, coded programmatically or as a job description, to be run on the scheduler of the platform.
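The reactive part of a QoS policy, using the two examples given above, can be sketched as a simple decision function. The progress metric, the 10% tolerance and the 90% memory threshold are hypothetical values chosen for illustration, not parameters defined by the project.

```python
from dataclasses import dataclass


@dataclass
class JobStatus:
    framework: str            # "spark" or "compss"
    progress: float           # fraction of the work completed, 0..1 (hypothetical metric)
    elapsed_s: float          # time since the job started
    expected_s: float         # initial duration estimate from the QoS policy
    mem_used_fraction: float  # memory in use / memory allocated


def reactive_action(s: JobStatus, tolerance: float = 0.10, mem_threshold: float = 0.90) -> str:
    """Pick a corrective action from monitoring data (illustrative thresholds)."""
    expected_progress = min(1.0, s.elapsed_s / s.expected_s)
    behind_schedule = s.progress < expected_progress - tolerance
    if s.framework == "spark" and s.mem_used_fraction >= mem_threshold:
        return "enlarge_memory"  # Spark job starving for memory
    if s.framework == "compss" and behind_schedule:
        return "add_slots"       # COMPSs job not progressing as expected
    return "none"


print(reactive_action(JobStatus("compss", 0.2, 100.0, 200.0, 0.3)))  # add_slots
```

Comparing observed progress against the progress implied by the initial estimate keeps the rule independent of the job's absolute duration; the tolerance avoids reacting to small fluctuations.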

Most of this information is inherent to the programming model and part of the specific algorithm to be executed on it. Therefore, the user will not need to provide additional information, but the analysis of the requirements should derive this information or the methods to obtain it.

Additionally, the requirement analysis will provide input to the following components:

- Authentication. The use cases should define whether user identification is needed or not, at which levels it will be needed and whether there is already an existing mechanism.

- Authorisation. Authorisation of access to the data could be done at the level of individuals or groups, and the actions that the authorisation grants can differ. User scenarios have to identify such needs.

- Privacy management. The project does not involve the management of protected personal data, but it has an activity to develop components for this. Therefore, the requirement analysis should identify the information needed to code and protect the level of privacy of the data, either from its acquisition or after its processing. It may happen that raw data will not require data protection, but post-processed data could reveal re-identifiable information, so it could not be stored in the same way.

- Data acquisition. Data sources, data formats, data volumes, data acquisition rate, expected storage needs and data validity.

- Programming models. The project will start from the existing expertise of the data analytics groups, who already have experience with popular execution environments. Application developers and data scientists will express their requirements in the form of programming languages and frameworks.

- Execution patterns. Different use cases identified in this document will have different execution patterns (event-based, bag-of-tasks batch, interactive).

- Logging and monitoring. The use cases will also define metrics that will be logged to feed back and adjust the static and real-time policies. Use cases should define which metrics they can expose (in addition to the basic resource monitoring, such as CPU, memory and disk usage, or the metrics that could be obtained from the scheduler, such as the job waiting time) and which are relevant to the use case.
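An application-level metric exposed by a use case could be represented as below. The layout (a `name`, a dictionary of `dimensions`, a `timestamp` in milliseconds since the epoch and a numeric `value`) is modelled on the metric body of the OpenStack Monasca API, the monitoring system chosen in the executive summary; the metric name and the dimension keys are hypothetical, and naming constraints should be checked against the Monasca API reference. Sending the sample to Monasca is not shown.

```python
import json
import time


def app_metric(name, value, application, job, timestamp_ms=None, **dimensions):
    """Build one metric sample in a Monasca-style layout (name, dimensions, timestamp, value)."""
    dims = {"application": application, "job": job}
    dims.update({key: str(val) for key, val in dimensions.items()})
    return {
        "name": name,
        "dimensions": dims,
        # Timestamp in milliseconds since the Unix epoch (current time if not given).
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
        "value": float(value),
    }


# Hypothetical example: the job waiting time reported by the scheduler.
sample = app_metric("bigsea.job.waiting_time", 12.5, "traffic-reco", "job-42",
                    timestamp_ms=1467244800000, scheduler="chronos")
print(json.dumps(sample))
```

Keeping the application and job identifiers as dimensions, rather than encoding them in the metric name, lets the same metric be aggregated per use case or per job.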



3.2 Procedure to Describe components

This section includes a template for describing the potential components to be used in EUBra-BIGSEA. The next sections will instantiate this template for the different components identified.

Field | Description
Identification | Name and layer where the component will be applied (a component may be applied to different layers).
Potential usage (in EUBra-BIGSEA) | Section where the component will be used.
Type | Module, subprogram, data file, control procedure, class, framework, service, etc.
License | License model.
Website | URL to get additional information or the code.
Purpose | Function and performance requirements implemented by the design component, including derived requirements that relate to the requirements of the project.
Function | What the component does, the transformation process, the specific processed inputs, the used algorithms, the produced outputs, where the data items are stored, and which data items are modified.
High Level Architecture | The internal structure of the component and its inner interactions that are relevant for the project requirements.
Dependencies | Other components required by the component and how this component is used by other components. Interaction details such as timing, interaction conditions (such as order of execution and data sharing), and responsibility for creation, duplication, use, storage, and elimination of components.
Interfaces | Detailed descriptions of all external and internal interfaces as well as of any mechanisms for communicating through messages, parameters, or common data areas.
Data | Internal data required by the component to work.
Needed Improvement | Description of the improvements to the tool that are foreseen during the EUBra-BIGSEA project in order to fulfil the user requirements stated in the previous section.
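If component descriptions are also kept in machine-readable form, the template maps naturally onto a record type. The following is a minimal sketch, assuming a Python dataclass whose field names are hypothetical transliterations of the template rows; the Apache Mesos entry fills only the fields that are certain (license and website) and leaves the rest for the component sections.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ComponentDescription:
    """One instance of the component-description template (field names are hypothetical)."""
    identification: str            # name and layer(s) where the component applies
    potential_usage: str           # where it will be used in EUBra-BIGSEA
    component_type: str            # module, framework, service, ...
    license: str
    website: str
    purpose: str = ""
    function: str = ""
    high_level_architecture: str = ""
    dependencies: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)
    data: str = ""
    needed_improvement: str = ""

    def missing_fields(self):
        """Template fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


mesos = ComponentDescription(
    identification="Apache Mesos (QoS IaaS layer)",
    potential_usage="Resource management framework (section 4.3)",
    component_type="Framework",
    license="Apache License 2.0",
    website="http://mesos.apache.org",
)
print(mesos.missing_fields())
```

The `missing_fields` helper makes it easy to check which template rows are still pending for each described component.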



4 QOS IAAS

The execution requirements described in section 2 mainly address the execution of Batch jobs (RE.1), Bag of Tasks (RE.2), QoS Batch jobs (RE.3), Short jobs (RE.6), Deadline-based jobs (RE.4) and self-adapting elasticity (RE.5). In order to address such requirements, a job execution framework is needed. The Quality of Service IaaS, the main objective of WP3, focuses on the implementation of a platform capable of:

- Deploying and configuring the proper virtual infrastructure to run the data analytics jobs. This implies the specification of the software services, their dependencies, their configuration recipes and base images.

- Orchestrating the execution of the workload on top of the virtual resources, leveraging data locality and data parallelism.

- Registering the execution performance and logs to feed back into and update further runs, and to trigger elasticity rules.

- Implementing elasticity at the level of the memory and the number of resource instances.

Those requirements should be addressed by a set of services, which may come from existing components in the literature and new developments from the consortium.

This section first describes the major workload cases and the application lifecycle; the rest of the section describes the main components required for managing them.

4.1 Application use cases and Lifecycle

The QoS IaaS addresses two kinds of workload: batch jobs that are submitted through the programming models (e.g. Spark or COMPSs [R7]) and interactive jobs that use the Data Analytics console (e.g. OPHIDIA [R8] or Spark).

4.1.1 Batch jobs

Analytic jobs for the creation and evaluation of the descriptive and predictive models, as well as for checking the quality of the data, will be implemented on the programming model and run as batch jobs. We identify two types of batch workload:

- Scenario 1 (model fitting): Jobs are launched with a pre-specified deadline. Jobs are executed one by one and an ordered queue is maintained so that jobs can be executed within their deadlines. When a new job is submitted, the queue is reordered so that all job deadlines can be met according to the current cluster configuration. If this is not the case, or the expected makespan (sum of the execution times of all jobs) would exceed a safety margin (e.g. the expected execution time is 10% longer than the estimated one), the system can increase or decrease the number of resources. The system periodically checks job execution by means of the monitoring system or specific calls to the monitoring API. If, considering data coming from the monitoring infrastructure, the system predicts that the current job deadline can be violated, then cluster re-configuration is triggered and resources are added. Re-configuration is also re-triggered whenever a new job starts.

- Scenario 2 (model projection): We deal with a sustained submission rate of short jobs (e.g. 300 jobs in 5 minutes, arriving in an irregular pattern: from 30 to 90 jobs per minute). Each job will require 10 seconds. The system should quickly allocate resources for those short jobs in order to have the minimum overhead (e.g. an overhead < 5 seconds per job). We want to have the minimum idle time in the resources (e.g. < 20%) to avoid wasting resources.
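Both scenarios can be made concrete with a short sketch. For scenario 1, it orders the queue earliest deadline first (EDF), an illustrative choice since the text only requires an order in which all deadlines are met, and checks every deadline after inflating each estimate by the 10% safety margin. For scenario 2, it sizes the pool with Little's law (mean busy slots = arrival rate x time in system), a swapped-in technique that gives mean occupancy only and ignores bursts. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_runtime_s: float  # estimated execution time on the current cluster configuration
    deadline_s: float     # deadline, in seconds from now


def deadlines_met(jobs, safety_margin=0.10):
    """Scenario 1: run jobs one by one in EDF order and check every deadline."""
    finish = 0.0
    for job in sorted(jobs, key=lambda j: j.deadline_s):
        finish += job.est_runtime_s * (1 + safety_margin)
        if finish > job.deadline_s:
            return False  # a deadline would be violated: re-configure (add resources)
    return True


def busy_slots(jobs_per_minute, service_s, overhead_s=0.0):
    """Scenario 2: Little's law, mean busy slots = arrival rate x time in system."""
    return math.ceil(jobs_per_minute / 60.0 * (service_s + overhead_s))


def idle_fraction(jobs_per_minute, service_s, slots):
    """Fraction of a pool of `slots` left idle at a given mean arrival rate."""
    return 1.0 - (jobs_per_minute / 60.0 * service_s) / slots


queue = [Job("a", 100, 300), Job("b", 100, 150)]
print(deadlines_met(queue))                 # True (b finishes at about 110 s, a at about 220 s)
print(busy_slots(60, 10), busy_slots(90, 10), busy_slots(90, 10, 5))  # 10 15 23
print(round(idle_fraction(60, 10, 15), 2))  # 0.33
```

With the scenario 2 figures, the average rate of 60 jobs per minute (300 jobs in 5 minutes) at 10 seconds per job keeps about 10 slots busy; the 90 jobs-per-minute peak needs 15 slots, or 23 if each job also incurs the full 5-second overhead. A static pool of 15 slots would be idle about one third of the time at the average rate, above the 20% target, which is why the number of resources has to follow the arrival rate.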

Jobs can be executed periodically and with deadline QoS. For example, for one of the applications, a routing recommender for public transportation, the information on the use of public transport from the previous day should be processed in order to provide the most accurate information on the routes. If this information is not computed by the peak time, when users start requesting information, the results may be out-dated. If the analysis is performed too far in advance, the information may not be up-to-date enough. So the job should be executed periodically, ending close to the peak hours.
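The timing constraint of the route-recommender example can be sketched as computing the latest daily start time and expressing it as an ISO 8601 repeating interval, the notation used by the `schedule` field of Chronos job definitions (Chronos is the scheduler proposed for periodic jobs). The peak hour, the estimated runtime and the 10% margin are hypothetical, and times are assumed to be in UTC.

```python
from datetime import datetime, timedelta


def latest_daily_start(peak: datetime, est_runtime: timedelta, margin: float = 0.10) -> datetime:
    """Start as late as possible (freshest data) while still finishing before the peak."""
    return peak - est_runtime * (1 + margin)


def chronos_schedule(start: datetime, period: str = "P1D") -> str:
    """ISO 8601 repeating interval with unlimited repetitions ("R"), assuming UTC times."""
    return f"R/{start.strftime('%Y-%m-%dT%H:%M:%SZ')}/{period}"


# Hypothetical figures: users start querying at 07:00 UTC and the job takes about 1 hour.
start = latest_daily_start(datetime(2016, 7, 1, 7, 0), timedelta(hours=1))
print(chronos_schedule(start))  # R/2016-07-01T05:54:00Z/P1D
```

Starting at the latest safe time balances the two risks described above: finishing too late for the peak, or processing data that is older than necessary.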

4.1.2 Interactive jobs

A user may launch an interactive console to load and analyze the data. This will be possible for both Spark and Ophidia, which provide a console that can be used to execute analytic operations on top of them.

We differentiate between two scenarios:

- A user executes a Spark job that will be used to load data in memory and execute data analytic primitives. The execution of the tasks will be done on the worker nodes of the cluster. Since the workload will be unpredictable, the allocation of resources cannot be tuned a priori. Therefore, the system should be able to react dynamically, providing additional resources if needed. This will be performed transparently to the user, based on an a-priori minimum and maximum QoS.

- A user links a cluster within the analytic console. This will provide the user with the ability to create the cluster “a posteriori”, from an already active instance. The behaviour could be the same as in the previous case.
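In both interactive scenarios, the reactive behaviour can be pictured as a rule. The rule sizes the worker pool from the observed backlog and clamps the result to the a-priori minimum and maximum. This is a minimal sketch. It assumes that the monitoring system exposes the number of pending and running tasks, and that each worker runs a fixed number of concurrent tasks. All names are hypothetical.

```python
import math


def desired_workers(pending_tasks: int, running_tasks: int, tasks_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed to run all current tasks at once, bounded by the
    minimum and maximum agreed before the interactive session starts."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))
```

The lower bound keeps a warm pool for the user's next command. The upper bound caps the cost of an unexpectedly heavy query.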

4.1.3 Application lifecycle

The applications in EUBra-BIGSEA can be:

- Data analytic batch jobs that could intrinsically exploit parallelism using COMPSs (parallel regions identified by the COMPSs runtime are executed on multiple containers) or Spark jobs. These applications may be characterized by their execution policies, which provide the expected resource profile for their execution.

- Interactive console analysis instances. These imply accessing a Spark or OPHIDIA console that will deploy ad-hoc clusters. This workload is unpredictable, and the infrastructure will mainly react to the resource demand.

Along with the dependencies related to COMPSs, OPHIDIA or Spark, those applications may require additional software libraries and components coming from the programming environments used.

Therefore, the application lifecycle involves:

- Defining the virtual resource dependencies properly. This will require creating container images that can either be uploaded to public repositories or kept on the infrastructure resources. The preferred method will be to use Dockerfiles, Docker Hub and/or GitHub automated builds. These images will be configured on top of the base images of the Spark, COMPSs and OPHIDIA environments and will be created locally through Docker Compose if necessary.

- Defining a TOSCA document that describes the application topology and the parameters that can be customized at runtime.

- Submitting the application through COMPSs, OPHIDIA or Spark, referencing those containers and additional information about the components that compose the application. The TOSCA document will point to the images and the application topology.

- Or deploying the infrastructure for the interactive usage of the analytic framework.



Figure 4: Lifecycle of the applications in EUBra-BIGSEA

The lifecycle of the applications goes through different steps. These range from the definition of the software architecture (software topology) that defines the application, through the description of the dependencies, the deployment on a resource pool, the configuration and contextualization of the instances and the actual execution of or access to the application, to the monitoring of the execution and the dynamic reconfiguration of the system. Figure 4 shows a basic diagram of interactions for this lifecycle.

Therefore, this document covers the analysis of the following components:

- The specification of the virtual resources required. This is described in section 4.2, and it will make use of TOSCA templates that describe the application topology.

- The provisioning and orchestration of the resources. Section 4.3 includes a list of available resource provisioning technologies and analyses the selection.

- The configuration and contextualization of the application. This is described in section 4.5. It covers three main alternatives: on-the-fly configuration of basic images, deployment of fully-configured images and intermediate solutions.

- The execution of batch jobs. This is described in section 4.4.

- The dynamic reconfiguration of the resource pool, described in section 4.6, addressing both horizontal and vertical elasticity.

- The monitoring and logging of the resources and application services. This was initially the main objective of this deliverable, and it is fully described in section 5.



4.2 OASIS TOSCA

The EUBra-BIGSEA platform provides an entry point to its functionality via the Orchestrator service. This service will feature a RESTful API that receives a TOSCA-compliant description of the application architecture to be deployed. TOSCA (Topology and Orchestration Specification for Cloud Applications) [1] is an OASIS specification for the interoperable description of application and infrastructure cloud services, the relationships between parts of the services, and the operational behavior of these services. TOSCA has been selected as the language for describing applications due to the wide adoption of this standard and the availability of solutions for both OpenNebula [2] and OpenStack [3].

The core TOSCA specification provides a language to describe service components and their relationships using a topology model. It also provides for describing the management procedures that create or modify services using orchestration processes. The combination of topology and orchestration in a Service Template describes what needs to be preserved across deployments in different environments. This enables the interoperable deployment of cloud services, and their management throughout the complete lifecycle, when the applications are ported to alternative cloud environments.

A Topology Template consists of a set of nodes and relationships. The nodes form a directed graph that describes all the components of an application. Each node is represented by a node type and a set of properties that define the interfaces to manipulate the component. Custom TOSCA types should be derived from the normative types. Dependencies among TOSCA types enable defining the dependencies and requirements in a structured and portable way. TOSCA deployment plans instantiate such TOSCA types with the specific values required.

A TOSCA Topology Template describes the interactions among the components (infrastructure, platform/middleware and application modules). Components are described in YAML [4]. Application architects can model the services, policies and requirements of an application in a TOSCA template, which is extended with additional artifacts by the deployment plans. The TOSCA templates are used to test and deploy the application.

Figure 5 shows a sample TOSCA blueprint.

tosca_definitions_version: tosca_simple_yaml_1_0

imports:
  - custom_types: <<URL to a yaml with generic custom_types.yaml>>

description: >
  Sample TOSCA file

topology_template:
  inputs:
    download_url:
      type: string
      default: <<URL to any specific data required for the installation>>

  node_templates:
    my_app:
      type: <<Existing generic type>>
      requirements:
        - host: my_app_prerequisites
      interfaces:
        Standard:
          configure:
            implementation: <<URL to yml file to install the software>>
            inputs:
              download_url: { get_input: download_url }

    my_app_prerequisites:
      type: <<existing generic type>>
      properties:
        public_ip: yes
      capabilities:
        # Host container properties
        host:
          properties:
            num_cpus: 1
            mem_size: 1 GB
        # Guest Operating System properties
        os:
          properties:
            # host Operating System image properties
            type: linux
            distribution: ubuntu

Figure 5: Basic sample TOSCA template
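The node templates and their `requirements` form a directed graph, so a deployment order can be obtained with a topological sort. A node is deployed only after the nodes it requires, such as its `host`. The sketch below hand-copies the requirements of Figure 5 into a Python dictionary instead of parsing the YAML. It uses the standard-library `graphlib` module (Python 3.9+). It illustrates the idea only and is not how any particular orchestrator, such as the IM, implements it.

```python
from graphlib import TopologicalSorter

# node template -> node templates it requires (copied from Figure 5)
requirements = {
    "my_app": {"my_app_prerequisites"},  # requirements: - host: my_app_prerequisites
    "my_app_prerequisites": set(),
}


def deployment_order(reqs):
    """Return node names so that every node comes after the nodes it requires.
    Raises graphlib.CycleError if the topology contains a dependency cycle."""
    return list(TopologicalSorter(reqs).static_order())


order = deployment_order(requirements)  # host first, then the application
```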

4.3 Resource Management Frameworks

A Resource Management Framework (RMF) is a multitenant component with four roles. It manages distributed resources, allocates them to the requests performed by a scheduler, executes a process and deallocates the resources once the execution finishes. This section describes the components required to manage a pool of resources on which to deploy the application containers that support the execution of the applications. Those components deal with the following features:

- Resource pool management. It should manage a set of resources on which to deploy the containers where the applications will run.

- Load balancing. It should identify the free resources where the containers will be deployed.

- Fault tolerance. It should provide the ability to ignore faulty resources and to rearrange the infrastructure.

- Instance deployment and execution (Orchestration). This may be achieved through a scheduler system, which could be independent of the RMF itself.

- Elasticity. It should enable enlarging or decreasing the resources allocated, although this will require external services to automate the process.

The requirements that we have for such frameworks are:

- R01 - The RMF must enable a scheduler to allocate resources to a specific kernel.
- R02 - The RMF must support multiple types of workloads, from containers to Spark jobs.
- R03 - An application topology may involve several containers, which must be deployed in a coordinated way.
- R04 - The status of the resources should be accessible through an API.
- R05 - RMFs should expose an API to change the number of resources allocated.
- R06 - The system should monitor the usage of resources and the application health.

None of the following components provides the full set of the above features, but some of them complement each other.
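Requirements R01, R04, R05 and R06 can be read as a minimal interface that the platform expects from whichever RMF is selected. The sketch below expresses them as a Python `Protocol`, with a toy in-memory implementation. The method names are hypothetical and do not correspond to the API of Docker Swarm, Mesos or YARN. The toy checks capacity in aggregate and ignores per-node fragmentation.

```python
from typing import Dict, Protocol, Tuple


class ResourceManager(Protocol):
    """Operations the platform needs from an RMF (hypothetical names)."""
    def allocate(self, job_id: str, cpus: float, mem_mb: int) -> bool: ...  # R01
    def status(self) -> Dict[str, float]: ...                                # R04, R06
    def resize(self, nodes: int) -> None: ...                                # R05


class InMemoryRMF:
    """Toy pool of identical nodes, used only to exercise the interface."""

    def __init__(self, nodes: int, cpus_per_node: float, mem_per_node_mb: int):
        self.nodes = nodes
        self.cpus_per_node = cpus_per_node
        self.mem_per_node_mb = mem_per_node_mb
        self.allocations: Dict[str, Tuple[float, int]] = {}

    def _used(self) -> Tuple[float, int]:
        cpus = sum(c for c, _ in self.allocations.values())
        mem = sum(m for _, m in self.allocations.values())
        return cpus, mem

    def allocate(self, job_id: str, cpus: float, mem_mb: int) -> bool:
        used_cpus, used_mem = self._used()
        if (used_cpus + cpus > self.nodes * self.cpus_per_node
                or used_mem + mem_mb > self.nodes * self.mem_per_node_mb):
            return False  # not enough free capacity in the pool
        self.allocations[job_id] = (cpus, mem_mb)
        return True

    def status(self) -> Dict[str, float]:
        used_cpus, used_mem = self._used()
        return {"nodes": self.nodes,
                "cpus_total": self.nodes * self.cpus_per_node, "cpus_used": used_cpus,
                "mem_total_mb": self.nodes * self.mem_per_node_mb, "mem_used_mb": used_mem}

    def resize(self, nodes: int) -> None:
        used_cpus, used_mem = self._used()
        if nodes * self.cpus_per_node < used_cpus or nodes * self.mem_per_node_mb < used_mem:
            raise ValueError("cannot shrink below current allocations")
        self.nodes = nodes


rmf: ResourceManager = InMemoryRMF(nodes=2, cpus_per_node=4, mem_per_node_mb=8192)
```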



4.3.1 Docker Swarm

Identification Docker Swarm

Potential usage Resource Pool Management and Container Orchestration.

Type Service and client

License Apache 2.0

Website https://docs.docker.com/swarm, https://github.com/docker/swarm

Purpose A native clustering system for Docker. It turns a pool of Docker hosts into a single, virtual Docker host.

Function It enables creating a pool of resources where Docker containers can be run. Docker Swarm provides discovery services, scheduling, and both a CLI and an API.


Each host runs a Swarm agent, and one host runs a Swarm manager (on small test clusters this host may also run an agent). The manager is responsible for the orchestration and scheduling of containers on the hosts.

Swarm can be run in a high-availability mode using Consul or ZooKeeper to handle fail-over to a back-up manager (there can be multiple managers). There are several methods for finding hosts and adding them to a cluster, a process known in Swarm as discovery. By default, token-based discovery is used, where the addresses of hosts are kept in a list stored on Docker Hub.

A Swarm cluster requires an open TCP port on each node for communication with the Swarm manager, Docker installed on each node, and TLS certificates created and managed to secure the cluster.

Dependencies Docker, Docker Hub, and other components for HA, such as Consul [5] or ZooKeeper [6].

Interfaces It uses the same Docker API, thus facilitating the migration from single Docker resources to a pool of resources.

Data Not applicable.

Needed Improvement

More advanced scheduling policies, horizontal and vertical elasticity, multitenancy.



4.3.2 Apache Mesos

Identification Mesos

Potential Usage Main Resource Pool Management.

Type A middleware (set of daemons/services) for cluster management

License Apache 2.0

Website http://mesos.apache.org/

Purpose Mesos provides efficient resource isolation and sharing across distributed applications (frameworks). Mesos can run on top of virtual machines, bare metal and/or Docker containers.

Function Mesos requires computing resources to be assigned to it, so that it can deploy distributed applications on these resources. Mesos provides the following main functionalities:

- Resource allocation/revocation/re-allocation: Mesos implements a pluggable resource allocation module architecture. By default, Mesos includes a strict-priority resource allocation module and a modified fair-sharing resource allocation module. Advanced scheduling features like oversubscription are available since version 0.23.

- Performance isolation among framework executors running on the same slave node, through pluggable isolation modules. The default mechanism leverages container technologies.

- Framework authorization, implemented through configurable ACLs in JSON format that allow: 1) frameworks to (re-)register with authorized roles; 2) frameworks to launch tasks/executors as authorized users; 3) authorized users to shut down framework(s) through the “/shutdown” HTTP endpoint.

- Framework rate-limiting, which allows configuring the maximum number of queries per second for each framework. This feature aims at protecting the throughput of high-SLA frameworks by having the master throttle messages from other (e.g., development, batch) frameworks.

- Resource reservation. In static reservation, resources are reserved for a particular role. Frameworks can use dynamic reservations (Mesos 0.24 onwards) to reserve offered resources, so that those resources are only re-offered to the same framework. This is especially useful if a task of the framework stored some state on the slave and needs a guaranteed set of reserved resources, so that it can re-launch a task on the same slave to recover that state.

- Monitoring: Mesos master and slave nodes report a set of statistics and metrics, including details about available resources, used resources, registered frameworks, active slaves and task state. These metrics are available by querying the HTTP endpoints exposed by the master and slave nodes. Heapster can be used to monitor a Mesos infrastructure.

- Slave recovery: this feature allows 1) executors/tasks to keep running when the slave process is down, and 2) a restarted slave process to reconnect with the running executors/tasks on the slave.

- Native Docker support, which allows users to launch a Docker image as a Task or as an Executor.



- DNS-based service discovery: it allows applications and services running on Mesos to find each other through the Domain Name System.

High Level Architecture

Mesos comprises two main components:
- Mesos master
- Mesos slave

The Mesos master is a daemon that manages the slave daemons running on each cluster node. High availability and fault tolerance of the master can be achieved using multiple masters and ZooKeeper. The slaves register with the master and offer “resources”, i.e. capacity to run tasks.

Mesos uses the concept of frameworks to encapsulate processing engines. Framework creation is triggered by the schedulers registered in the system. Frameworks reserve resources for the execution of tasks, and Mesos can reallocate resources to frameworks dynamically. Resource profiles are described as “roles”, which define the number of resources that can be allocated (something similar to “flavours”). This ensures that there is no physical relation between resources and framework types. For example, a Spark framework will only consume resources when subscribed to Mesos, and it could have different resources each time it is executed.

The executors run on the Mesos slaves and are responsible for launching tasks. One or more executors from the same framework may run concurrently on the same machine. A dedicated class, “MesosExecutorDriver”, is used both to manage the executor’s lifecycle and to connect the framework executor to Mesos. There is a lightweight executor for containers of batch jobs called “CommandExecutor”.

Mesos provides the Scheduler interface, to be implemented by each specific framework. This interface includes methods to register, re-register and unregister with the Mesos master, and to accept or reject resource offers.

The master decides how many resources must be offered to each framework scheduler according to a given organizational policy, such as fair sharing or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism.

Mesos supports internal monitoring of frameworks, executors and tasks, and Mesos services can be monitored through a system called Satellite.

Dependencies ZooKeeper servers for leader election in high-availability mode.

Interfaces Java, Python and C++ APIs, and a Web UI for cluster management.

Data Not applicable.

Needed Improvement

Integration with the reactive policies and the monitoring system.
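As a first step towards the integration with the monitoring system described above, cluster utilization can be derived from the metrics that the Mesos master exposes over HTTP. The master's `/metrics/snapshot` endpoint (port 5050 by default) returns a flat JSON object whose keys include `master/cpus_total`, `master/cpus_used`, `master/mem_total` and `master/mem_used`. To keep the sketch self-contained, it parses a literal JSON string instead of performing the HTTP request. The sample values are invented.

```python
import json


def utilization(snapshot: dict) -> dict:
    """CPU and memory utilization (0..1) from a Mesos master metrics snapshot."""
    result = {}
    for res in ("cpus", "mem"):
        total = snapshot.get(f"master/{res}_total", 0.0)
        used = snapshot.get(f"master/{res}_used", 0.0)
        result[res] = used / total if total else 0.0  # avoid dividing by zero
    return result


# Invented sample; in a deployment this would be the body of
# GET http://<master>:5050/metrics/snapshot
sample = json.loads('{"master/cpus_total": 16.0, "master/cpus_used": 12.0,'
                    ' "master/mem_total": 32768.0, "master/mem_used": 8192.0}')
usage = utilization(sample)  # {'cpus': 0.75, 'mem': 0.25}
```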



4.3.3 YARN

Identification Yet Another Resource Negotiator (YARN)

Potential Usage Scheduler for the Hadoop-based jobs.

Type Framework of services

License Apache 2.0

Website http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Purpose A data processing framework based on a resource and application manager.

Function YARN has replaced the original MapReduce engine (MRv1) for the processing of data in Hadoop. It now includes two separate components: the ResourceManager (one per installation) and the ApplicationMaster (one per application).


A YARN deployment has a ResourceManager (RM) and a set of slave NodeManagers (NM) that constitute the job environment. The RM manages the resources and schedules the jobs. The ApplicationMaster (AM) negotiates the resources with the RM and works with the NMs to execute the tasks (a job in YARN can be a DAG). The NM is the per-machine slave. It is responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the RM.

YARN has a CapacityScheduler to run Hadoop applications in a shared, multitenant cluster while maximizing the throughput and the utilization of the cluster. The CapacityScheduler is designed to allow sharing a large cluster while giving each organization capacity guarantees. An added benefit is that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.

Dependencies It is part of Hadoop 2.x

Interfaces CLI, Java and REST API.

Needed Improv. N/A
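The combination of capacity guarantees and access to excess capacity in the CapacityScheduler described above can be illustrated with a simplified model. Each queue first receives the smaller of its demand and its guaranteed share. Idle capacity is then lent to queues with unmet demand, up to a per-queue maximum. This mirrors the `capacity` and `maximum-capacity` queue settings of the CapacityScheduler. The model itself is a simplification written for this document: the real scheduler adds hierarchical queues, user limits and preemption. The queue names, percentages and the `elastic_shares` function are hypothetical.

```python
from typing import Dict, Tuple


def elastic_shares(cluster: float, queues: Dict[str, Tuple[float, float]],
                   demand: Dict[str, float]) -> Dict[str, float]:
    """queues: name -> (capacity %, maximum-capacity %) of the cluster.
    Returns the resources each queue obtains for the given demand."""
    # 1) Guarantee: each queue gets its demand, up to its configured capacity.
    share = {q: min(float(demand.get(q, 0.0)), cluster * cap / 100.0)
             for q, (cap, _) in queues.items()}
    idle = cluster - sum(share.values())
    # 2) Elasticity: lend idle capacity to queues with unmet demand, never
    #    beyond their maximum capacity (queues served in name order here).
    for q in sorted(queues):
        limit = cluster * queues[q][1] / 100.0
        extra = min(idle, float(demand.get(q, 0.0)) - share[q], limit - share[q])
        if extra > 0:
            share[q] += extra
            idle -= extra
    return share


queues = {"analytics": (60, 100), "interactive": (40, 60)}
busy_analytics = elastic_shares(100, queues, {"analytics": 90, "interactive": 10})
```

With a light interactive load, the analytics queue borrows the idle 30% and reaches 90. When the interactive demand grows, each queue falls back to its guarantee.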



4.3.4 Myriad

Identification Apache Myriad

Potential Usage

Scheduler. It can address the integration of YARN and Mesos if we eventually use both.

Type Scheduler

License Apache 2.0

Website http://myriad.incubator.apache.org/

Purpose Myriad enables the co-existence of Apache Hadoop and Apache Mesos on the physical infrastructure. By running Hadoop YARN as a Mesos framework, YARN applications and Mesos frameworks can run side-by-side, dynamically sharing cluster resources.

Function With Apache Myriad, it is possible to:

- Run operational applications (including those running in Docker) side-by-side with analytic applications.

- Achieve Hadoop multi-tenancy by provisioning logical Hadoop clusters for each user.

- Run YARN as a Mesos framework, with the resource manager and node managers running inside Mesos containers.

- Launch multiple YARN clusters on the same set of nodes.

- Deploy the YARN Resource Manager using Marathon. This feature leverages Marathon's dynamic scheduling, process supervision, and integration with service discovery (Mesos-DNS).

- Run MapReduce v2 and associated libraries such as Hive, Pig and Mahout.

High Level Arch.

Mesos acts as the main resource manager. Myriad provides a Control Plane that orchestrates the scheduling between the Mesos and YARN schedulers. Mesos will spawn YARN NodeManagers as a task in the Mesos node (step 1). The NodeManager registers its capacity with the YARN ResourceManager (2), and then the ResourceManager can launch the containers (3). The NodeManager capacity can be readjusted so that other Mesos workloads can profit from unused YARN resources. The control plane also implements horizontal scaling by getting information about resource starvation from the YARN ResourceManager. YARN clusters running on Mesos can allocate resources in different ways:

- Static - Administrators can use an API or a GUI to add or remove node managers or auxiliary services like the JobHistoryServer.

- Fine-grained - Administrators can provision thin node managers that are dynamically resized based on application demand.

Dependencies YARN and Mesos

Interfaces REST API

Needed Improv.

Interface the Control Plane with our solutions and monitoring for vertical and horizontal elasticity.
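The horizontal scaling performed through the Myriad control plane, described above, can be pictured as a rule. The rule adds NodeManagers when YARN reports starved container requests and removes them when they stay idle. This is a minimal sketch of such a rule. It assumes that the pending-container and idle-NodeManager figures come from the YARN ResourceManager. The `max_step` damping parameter and the `flex_decision` function are hypothetical and do not reproduce Myriad's actual policy or REST API.

```python
def flex_decision(pending_containers: int, idle_nodemanagers: int,
                  containers_per_nm: int, max_step: int = 2) -> int:
    """Positive: NodeManagers to add; negative: NodeManagers to remove; 0: no change.
    `max_step` limits how many NodeManagers change per decision."""
    if containers_per_nm <= 0:
        raise ValueError("containers_per_nm must be positive")
    if pending_containers > 0:
        needed = -(-pending_containers // containers_per_nm)  # ceiling division
        return min(needed, max_step)
    if idle_nodemanagers > 0:
        return -min(idle_nodemanagers, max_step)
    return 0
```

Limiting each step avoids oscillation when the starvation signal is noisy. Monitoring data could later replace the raw counts.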



4.3.5 Image Repository

One of the workloads to be supported in EUBra-BIGSEA is batch jobs wrapped along with their dependencies in containers. The building, registration and downloading of containers should be done efficiently, reducing communication overhead. The registry should be available to all the resources in the system, and updates should be simple and automatic.

Requirements:

- R07 - A central list of container images should be available.
- R08 - Updates should be automated.
- R09 - A user should be able to modify their own application.

We propose using Docker Hub and the automated build process to deal with this.

Identification Docker Hub / local Docker image registry

Potential Usage Docker Hub can be used to store the basic images of Spark, COMPSs, OPHIDIA and any other processing environments needed. The local registry in the user’s deployment will store the customised images with the additional software dependencies. Eventually, relevant and shareable images can be pushed to the Docker Hub repository.

Type Service

License Docker Hub is a proprietary solution. The local registry is Apache 2.0.

Website https://hub.docker.com, https://github.com/docker/distribution

Purpose A registry of configured Docker images.

Function Docker images are stored locally where they will be run. A user can customize and store a modified version of a container image locally, without sharing it with any other user. Docker Hub is used to upload and share container images that are used by multiple users and sites. Users “pull” the images that are available in this repository and download them onto their local resources. Alternatively, if the source code for a Docker image is on GitHub or Bitbucket, an “Automated build” repository can be used, which is built by the Docker Hub services.




Dependencies Docker

Interfaces Native Docker CLI and a REST API

Data N/A

Needed Improvement

Those components cannot be modified.

4.3.5.1 Automated build

In order to facilitate the maintenance of custom container images with embedded dependencies, an automated build process will be implemented. Although creating and maintaining container images is a simple process, when shared repositories are managed this process could lead to incomplete builds or manual synchronization. Therefore, we propose to use the Docker automated build.

For this process, the EUBra-BIGSEA GitHub organization (https://github.com/organizations/EUBra-BIGSEA/) will be used in coordination with the EUBra-BIGSEA Docker Hub organization (https://hub.docker.com/u/eubrabigsea/). EUBra-BIGSEA developers will be invited to join both organizations. Docker Hub enables creating automated builds from a GitHub repository, which should contain a Dockerfile. Any change committed to the repository will lead to the automatic creation of the Docker image, which will be automatically registered in the proper organization repository of Docker Hub. Users can pull the new images by typing “docker pull eubrabigsea/image_name”.

4.4 Job Execution (Scheduling)

The Resource Management Frameworks (RMFs) described in section 4.3 provide a way to share and allocate resources among multiple executions, even for heterogeneous traffic. On top of these systems, schedulers dispatch jobs on the allocated resources.

According to the type of jobs, EUBra-BIGSEA has the following requirements:
- R10 - Capability of executing container-based batch jobs.
- R11 - Integration with the selected CMF.
- R12 - Capability of executing composed jobs involving multiple concurrent processes.
- R13 - Support of Spark jobs.
- R14 - Support of deadline-QoS periodic jobs.
- R15 - High availability for long-running jobs.
- R16 - Support for interactive jobs.

This section describes several proposals.



4.4.1 Spark (spark-submit and spark-shell)

Identification Apache Spark

Potential Usage

Spark is one of the supported programming models in EUBra-BIGSEA and will be mainly used for data filtering and preparation.

Type Execution Framework

License Apache 2.0

Website http://spark.apache.org

Purpose In-memory engine for large-scale data processing. Apache Spark is a fast and general-purpose cluster computing system, and an optimized engine that supports general execution graphs.

Function Development and execution of in-memory data analytic applications based on its own programming paradigm.


In a standalone cluster deployment, the cluster manager is a Spark master instance. When using Mesos, the Mesos master replaces the Spark master as the cluster manager.

Spark can be used for batch jobs through spark-submit, which can use local, YARN or Mesos resources, among others. Spark-submit can be used to execute binaries remotely.

Spark-shell is a Scala interactive console that can use a Mesos cluster as its back-end. This way, one can execute data analytic operations interactively on a remote system.

Dependencies A Hadoop, YARN or Mesos cluster.

Interfaces It provides high-level APIs in Java, Scala, Python and R.

Needed Improvement

Not directly, but through the resource pool management and the orchestration system.
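The sketch below illustrates the batch submission through spark-submit against a Mesos-managed cluster described above by building the command line. The flags used are standard spark-submit options: `--master mesos://...`, `--deploy-mode`, `--class`, `--executor-memory` and `--total-executor-cores`. Client deploy mode is assumed, because cluster mode on Mesos goes through a separate dispatcher. The host name, main class, jar path and job argument are placeholders. The command is only assembled here. It could then be executed with `subprocess.run(cmd, check=True)`.

```python
from typing import List, Sequence


def spark_submit_cmd(mesos_master: str, main_class: str, app_jar: str,
                     executor_memory: str = "2g", total_cores: int = 4,
                     app_args: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a spark-submit invocation against a Mesos master."""
    return [
        "spark-submit",
        "--master", f"mesos://{mesos_master}",  # Mesos master acts as the cluster manager
        "--deploy-mode", "client",
        "--class", main_class,
        "--executor-memory", executor_memory,
        "--total-executor-cores", str(total_cores),
        app_jar,
        *app_args,
    ]


cmd = spark_submit_cmd("mesos-master.example:5050", "org.example.ModelFitting",
                       "/opt/jobs/model-fitting.jar", app_args=["2016-06-30"])
```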



4.4.2 Chronos

Identification A fault-tolerant job scheduler for Mesos

Potential Usage Submission of periodic jobs that need to meet a specific deadline.

Type Scheduler

License Apache 2.0

Website https://mesos.github.io/chronos/

Purpose Chronos is a suitable scheduler for tasks that need to be triggered periodically or when other conditions are met. It is a sound alternative to never-ending tasks and fits very well the scenarios of periodic extract-transform-load (ETL) jobs, data quality analysis, model tuning, etc.

Function Chronos provides features similar to those of Marathon, but more focused on periodic jobs. It uses SSL authentication, can be used to define grouped jobs, enables job retries and collects the historic performance of the executed jobs. It also supports launching Docker containers.


The high-level architecture is quite similar to that of Marathon. It also uses a JSON model to describe the jobs.

Jobs can be required to run on predefined containers, and additional data can be fetched on demand. Jobs run a given number of times (even indefinitely if required) and can be scheduled to start at a specific time and date.

Dependencies Chronos requires Apache Mesos 0.20.0+. ZooKeeper is also required for high availability.

Interfaces A REST API [https://mesos.github.io/chronos/docs/api.html] and a Web UI.
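The periodic routing-recommender job described earlier could be given to Chronos as a JSON document along the following lines. The field names `name`, `command`, `schedule`, `epsilon`, `cpus`, `mem`, `retries` and `container` follow the Chronos job format, and only a subset of the available fields is shown. `schedule` is an ISO 8601 repeating interval of the form `R/<start>/<period>`, where `R` without a count means "repeat forever". The job name, command, image and start time are hypothetical. The document would be POSTed to the `/scheduler/iso8601` endpoint of the REST API above. Here it is only built and serialized.

```python
import json
from datetime import datetime


def chronos_daily_job(name: str, command: str, image: str, first_run_utc: datetime,
                      cpus: float = 1.0, mem_mb: int = 2048) -> dict:
    """Chronos job description that runs `command` in a Docker image once a day."""
    start = first_run_utc.strftime("%Y-%m-%dT%H:%M:%SZ")  # the time is assumed to be UTC
    return {
        "name": name,
        "command": command,
        "schedule": f"R/{start}/P1D",  # repeat forever, every day, from `start`
        "epsilon": "PT15M",            # a missed run may still start up to 15 minutes late
        "cpus": cpus,
        "mem": mem_mb,
        "retries": 2,
        "container": {"type": "DOCKER", "image": image},
    }


job = chronos_daily_job("routes-previous-day", "python /app/process_routes.py",
                        "eubrabigsea/image_name", datetime(2016, 7, 1, 6, 5))
body = json.dumps(job)  # would be POSTed to /scheduler/iso8601
```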



4.4.3 Marathon

Identification A container orchestration platform for Mesos (Marathon)

Potential Usage

Scheduler to be exposed to WP5-4 applications that require computation. It can benefit from any resource update intervention we could apply to Mesos.

Type Scheduler

License Apache 2.0

Website https://mesosphere.github.io/marathon/

Purpose It interacts with Mesos to submit batch jobs on a set of resources provisioned by Mesos. It can submit basic shell scripts, including the automatic fetching of files, and Docker-based applications. Marathon is especially well suited for long-running service jobs that need high availability.

Function Jobs in Marathon are described in JSON documents that include the hardware requirements, the job execution code, URIs of dependencies and additional information. Jobs can be embedded as containers. Jobs are created and registered in the Marathon platform from their JSON description or using the Web GUI. Jobs can be restarted or stopped as requested. Other interesting features supported by Marathon are:

- SSL authentication of the endpoints.
- Application scaling on demand (both increasing and decreasing).
- Health checks.
- Grouping of applications, to manage related applications that depend on one another.

The resource configuration of Marathon jobs can be updated dynamically via the API, and Marathon enables submitting Docker-based jobs.


Marathon is implemented as two components: the scheduler and the executors. The scheduler coordinates the executions, and the executors control the tasks.

The Mesos master provides the Marathon scheduler with resources (1). Then the executor launches a task (2) on the Mesos slave provided by the master. The executor updates the status of the task (3), which is fed into the Marathon scheduler through the Mesos master (4).

Dependencies Marathon depends on Mesos. It can also use the Docker Registry.

Interfaces A REST API [https://mesosphere.github.io/marathon/docs/rest-api.html], as well as multiple available clients (CLI, Java, Python, etc.).
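For reference, the sketch below builds a Marathon application definition for a long-running, Docker-based service with a health check, plus the body of a scaling request. The field names follow Marathon's app JSON format: `id`, `instances`, `cpus`, `mem`, `container.docker.image`, `portMappings` and `healthChecks`. The definition would be POSTed to `/v2/apps` of the REST API above. A scaling request would be sent as a PUT of `{"instances": n}` to `/v2/apps/<id>`. The application id, image, port and health-check path are hypothetical, and no HTTP call is made here.

```python
import json


def marathon_app(app_id: str, image: str, instances: int = 1,
                 cpus: float = 0.5, mem_mb: int = 512) -> dict:
    """Marathon app definition for a Docker service with an HTTP health check."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mb,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the slave
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        "healthChecks": [{
            "protocol": "HTTP", "path": "/health", "portIndex": 0,
            "gracePeriodSeconds": 60, "intervalSeconds": 30,
            "maxConsecutiveFailures": 3,
        }],
    }


def scale_request(instances: int) -> str:
    """Body for PUT /v2/apps/<id> that changes only the number of instances."""
    if instances < 0:
        raise ValueError("instances must be >= 0")
    return json.dumps({"instances": instances})


app = marathon_app("/bigsea/model-projection", "eubrabigsea/image_name", instances=2)
```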



4.5 Configuration and Contextualization (and Orchestration)

EUBra-BIGSEA will develop a cloud-based platform for data analytics. EUBra-BIGSEA will produce a set of services to deal with different types of application workloads. One should be able to deploy only those components that are suitable for the application scenario. For example, one can deploy an EUBra-BIGSEA-instrumented CMF with COMPSs as the programming model for batch QoS jobs. Another configuration may involve only Spark jobs, including interactive access. Therefore, the deployment of EUBra-BIGSEA should be flexible, modular, platform-agnostic and capable of dealing with an unpredictable workload.

The configuration and contextualization process involves setting up the virtual resources to build the application environment. It involves installing software dependencies and configuring them, setting up cross information (IPs, directories, ports, etc.) that needs to be shared among the application components, creating users and copying files, among other tasks. Contextualization deals with the core configuration of an instance. Finally, orchestration relates to the coordinated deployment and reconfiguration of a set of services that form the topology of an application.

Application dependencies will be embedded in containers. Application-specific code is included as binary executable files compiled with the COMPSs runtime, or as jar files for Spark jobs. A Spark or COMPSs job will expect the Spark or COMPSs runtime dependencies to be automatically available in the container. Moreover, application-specific dependencies must be declared in some way.

Therefore, there are two complementary alternatives:

- Create application-specific containers for each of the applications to be run. This will require:
  - A registry of the containers created.
  - A collection of basic container images.
  - A procedure to build containers with application-specific dependencies.

- Configure the containers on the fly from basic images. This will require:
  - A collection of basic container images.
  - Configuration recipes such as, for example, Ansible Galaxy playbooks¹.
  - A DevOps tool to deploy and configure the containers.

The first approach will produce immediate deployment of containers, but it will require users to build their own images. This can be automated through Dockerfiles and automated build services. It does not prevent additional configuration steps from being applied on top of the container, for example to configure the network properly. The second approach will reduce the maintenance of the images, which can be reduced to a minimal set, and will facilitate the dynamic reconfiguration of application topologies. On the other hand, it will require a longer deployment time.

In both cases, EUBra-BIGSEA will need an orchestration system that can translate TOSCA application topology descriptions into local descriptions, which can in turn be translated into Cloud Management Framework-specific actions.

The requirements for this specific aspect of the infrastructure are:

- R17 - Deployment of TOSCA blueprints.
- R18 - Update of TOSCA blueprints to reconfigure the system.
- R19 - Support of multiple platforms.

¹ https://galaxy.ansible.com



4.5.1 Infrastructure Manager

The Infrastructure Manager (IM) is a tool that deploys complex and customized virtual infrastructures on IaaS Clouds. It automates the Virtual Machine Image selection, deployment, configuration, software installation, monitoring and update of virtual infrastructures on multiple Cloud back-ends.

Identification Infrastructure Manager (IM)

Potential Usage Deployment of TOSCA blueprints with the whole architecture.

Type A service and a client.

License GPL v3 https://github.com/grycap/im/blob/master/LICENSE

URL http://www.grycap.upv.es/im

Purpose The IM is a service for the whole orchestration of virtual infrastructures and the applications deployed on them, including resource provisioning, deployment, configuration, re-configuration and termination.

Function The service manages the complete deployment of virtual infrastructures or individual components within them. The status of a virtual infrastructure can be:

- pending: launched, but still in the initialization stage;
- running: created successfully and running, but still in the configuration stage;
- configured: running and contextualized;
- unconfigured: running but not correctly contextualized;
- stopped: stopped or suspended;
- off: shut down or removed from the infrastructure;
- failed: an error happened during submission;
- unknown: unable to obtain the status.

The figure on the right shows a state transition diagram for a virtual infrastructure. The following is the set of requirements identified in EUBra-BIGSEA:

- Deploy a TOSCA-specified virtual infrastructure.
- Get information from a deployed infrastructure, including status, specification information and contextualization logs.
- Reconfigure an existing infrastructure, adding or removing VMs or containers.
- Stop and restart existing infrastructures.
- Terminate an existing deployed infrastructure.
- Interact with individual VMs/containers, with the ability to stop, restart, resize or query the status of individual VMs/containers.



High-level architecture

The following figure describes the high-level architecture of the IM, including external dependencies. The IM uses the Virtual Machine Image (VMI) Repository & Catalog (VMRC) to search for VMIs. The IM integrates a cloud selector that selects the most suitable VMI for the application description. Then, a configuration manager based on Ansible configures the VMs/containers deployed by the cloud connector and installs the necessary software. The cloud connector makes the IM independent of the cloud IaaS platform.

Dependencies The IM service requires two additional components to work: IaaS cloud resources and a VMI repository. It supports multiple back-ends and standards (Amazon EC2, Microsoft Azure, Google Cloud, OpenNebula, OpenStack, OCCI, Fogbow, Docker, Kubernetes), which enables interacting with other resources and back-ends. The IM uses VMRC as the VMI repository, although it may be adapted to other repositories in the context of EUBra-BIGSEA.

Interfaces The IM service supports two APIs:
- The native one, in XML-RPC.
- A REST interface.

It also includes a command-line Python client.

Data The IM will use three types of information:
- Application descriptions following the OASIS TOSCA representation. For detailed information on its syntax and semantics, refer to [R1].
- Information about the cloud provider end-points and associated metadata to be used by the IM.
- Information about the deployed infrastructures (specifications, IDs, status, end-points, etc.), stored in a MySQL database or in a file.

Client and server exchange the data through the parameters of the API calls.

Needed Improvement

The IM already supports TOSCA blueprints, although additional features of the TOSCA documents need to be added to fulfil the requirements of WP7, such as:

- QoS specifications.
- Multi-parametric execution.
- Special types for COMPSs, OPHIDIA and Spark jobs.
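As a minimal sketch of how a client might consume the infrastructure states listed for the IM above, the Python below polls a status function until the infrastructure leaves the transient `pending` and `running` states. The `get_state` callable is hypothetical (for instance, a thin wrapper around an IM status query), and the timeout and polling interval are arbitrary assumptions. A caller would then treat `configured` as success and `unconfigured` or `failed` as errors.

```python
import time
from enum import Enum
from typing import Callable


class InfState(Enum):
    """Infrastructure states reported by the IM (names as listed above)."""
    PENDING = "pending"
    RUNNING = "running"
    CONFIGURED = "configured"
    UNCONFIGURED = "unconfigured"
    STOPPED = "stopped"
    OFF = "off"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Deployment or contextualization is still in progress in these states.
TRANSIENT = frozenset({InfState.PENDING, InfState.RUNNING})


def wait_until_settled(get_state: Callable[[], str], timeout: float = 600.0,
                       interval: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> InfState:
    """Poll get_state() (hypothetical) until the infrastructure leaves pending/running."""
    waited = 0.0
    while True:
        state = InfState(get_state())
        if state not in TRANSIENT:
            return state
        if waited >= timeout:
            raise TimeoutError(f"still {state.value} after {waited:.0f} s")
        sleep(interval)
        waited += interval
```

Injecting `sleep` keeps the function testable without real delays.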



4.6 Reactive elasticity

This section elaborates on the capabilities of the cloud services to provision more or fewer resources (instances and memory) according to the execution workload. In the context of EUBra-BIGSEA, applications will be provided with the expected amount of resources needed to fulfil the job execution constraints. However, the initial static allocation may be corrected according to the execution progress.

These adjustments could happen at two levels:
- At the level of the Resource Management Framework (CMF), to adjust the infrastructure to the workload. Multiple jobs may compete for the resources, and eventually the number of computing resources may be insufficient to deal with the workload submitted. This information is known a priori, as each request will state the expected resources required. Moreover, in a fully elastic environment, no resources may be preallocated, so in most cases the CMF will have to readjust the infrastructure.

- At the level of the applications, to guarantee that the QoS is met. The monitoring system can detect a potential QoS breach by means of the different metrics. Then, the reactive rules of the monitoring system should request additional resources to be allocated to an application. This will require dealing with both the scheduler and the CMF.

Both vertical and horizontal elasticity are considered:
- Horizontal elasticity will lead to a change in the number of resources in the CMF cluster.
- Vertical elasticity will lead to a change in the memory and CPU allocation limits of an application running on the system.

Therefore, we define three scenarios:
- The scheduler receives a new job. Then, the system should guarantee that enough resources are provisioned for the execution of the job.
- The system periodically inspects the scheduler queues to decide whether resources can be deallocated to reduce costs.
- The monitoring system detects the need to increase the resources of an application. It can be triggered by a generic metric (CPU, network, memory, etc.) or an application-specific metric. Associated rules will trigger the resource increase. Resource decrease is less important, as it will either be triggered directly by the reactive system or be unnecessary (e.g. CPU or memory allocation in containerized applications does not imply preemption).
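The third scenario can be sketched as a small rule evaluator. This is an illustrative assumption, not a defined EUBra-BIGSEA rule language: each hypothetical `ScalingRule` names a generic or application-specific metric, a threshold, and whether the reaction is horizontal (more nodes) or vertical (larger CPU/memory limits). Only scale-up requests are produced, following the observation above that resource decrease is less important.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical reactive rule: fire when `metric` exceeds `threshold`."""
    metric: str       # generic ("cpu", "mem", ...) or application-specific name
    threshold: float
    kind: str         # "horizontal" (add nodes) or "vertical" (raise limits)
    step: float       # nodes to add, or factor applied to CPU/memory limits


def evaluate(rules: List[ScalingRule], metrics: Dict[str, float]) -> List[str]:
    """Return one scale-up request per rule whose threshold is exceeded."""
    requests = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None or value <= rule.threshold:
            continue  # metric not reported yet, or within bounds: no action
        if rule.kind == "horizontal":
            requests.append(f"add {int(rule.step)} node(s) ({rule.metric}={value})")
        elif rule.kind == "vertical":
            requests.append(f"scale CPU/memory limits x{rule.step} ({rule.metric}={value})")
        else:
            raise ValueError(f"unknown rule kind {rule.kind!r}")
    return requests
```

In practice each request would be forwarded to the scheduler or the CMF; that hand-off is not shown.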

The requirements of the system are:
- R20 - Automatic scaling up of resources when new job requests arise.
- R21 - Automatic deallocation of resources when idle for a given period.
- R22 - Customization policies for, at least, waiting time, cooling time and simultaneous power-on.
- R23 - Reallocation of resources for the execution kernels on the memory size and number of resources.
- R24 - Transparent management of memory allocation to enable overprovisioning.
- R25 - Automatic reconfiguration of execution kernels.



4.6.1 CLUster Energy SavingS (CLUES)

Identification CLUster Energy SavingS (CLUES)

Potential Usage

Elasticity. Back-end component to manage the automatic power-on/off of virtual machines.

Type A service and a client application.

License GPL v3

Website www.grycap.upv.es/clues

Purpose CLUES is an energy management system for High Performance Computing (HPC) clusters and Cloud infrastructures.

Function The main function of the system is to power off internal cluster nodes when they are not being used, and conversely to power them on when they are needed. CLUES integrates with the cluster management middleware, such as a batch-queuing system or a cloud infrastructure management system, by means of different connectors.


CLUES also integrates with the physical infrastructure by means of different plug-ins, so that nodes can be powered on/off using the techniques that best suit each particular infrastructure (e.g. wake-on-LAN, the Intelligent Platform Management Interface (IPMI) or Power Distribution Units (PDU)).

Although some batch-queuing systems provide energy-saving mechanisms, some of the most popular choices, such as Torque/PBS, lack this possibility. As far as cloud infrastructure management middleware is concerned, none of the most usual options for scientific environments provide similar features. An additional advantage of the approach taken by CLUES is that it can be integrated with virtually any resource manager, whether or not the manager provides energy-saving features.

Currently, CLUES has connectors for integration with some of the most popular batch-queuing systems (such as Torque/PBS or Sun Grid Engine) and with OpenNebula and OpenStack, which are two of the best-known cloud infrastructure management systems within the scientific community.

Dependencies An LRMS queue system, such as TORQUE/PBS, CONDOR, SGE or SLURM.

Interfaces A command-line tool and a service. It integrates seamlessly with the LRMS, so it does not require any additional action from the user.

Needed Improvement

Support of Mesos queues.
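The power-on/off behaviour described for CLUES can be illustrated with a simplified, hypothetical policy function. This is not CLUES code: the `Node` record, the assumption that each pending job needs exactly one free node, and the `idle_timeout` and `max_power_on` parameters (loosely mirroring the waiting-time and simultaneous power-on policies of R22) are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    powered_on: bool
    busy: bool            # running at least one job
    idle_since: float     # timestamp when the node last became idle


def plan_power_actions(nodes: List[Node], pending_jobs: int, now: float,
                       idle_timeout: float = 600.0,
                       max_power_on: int = 2) -> Tuple[List[str], List[str]]:
    """Return (nodes to power on, nodes to power off) for one evaluation cycle.

    Assumes each pending job needs exactly one free node.
    """
    free = [n for n in nodes if n.powered_on and not n.busy]
    off = [n for n in nodes if not n.powered_on]
    # Power on only the nodes the queue still lacks, limited by the number
    # of simultaneous power-on operations allowed.
    missing = max(0, pending_jobs - len(free))
    power_on = [n.name for n in off[:min(missing, max_power_on)]]
    # Power off free nodes idle for longer than idle_timeout, keeping enough
    # free nodes for the jobs that are already waiting.
    spare = max(0, len(free) - pending_jobs)
    expired = [n.name for n in free if now - n.idle_since >= idle_timeout]
    return power_on, expired[:spare]
```

Calling this periodically against the LRMS queue state reproduces, in miniature, the two directions of the policy: grow when jobs wait, shrink when nodes stay idle.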



4.6.2 Elastic Compute Cluster in the Cloud (EC3)

Identification Elastic Compute Cluster in the Cloud (EC3)

Potential Usage

Elasticity. Component to monitor the Mesos queue to provide elasticity.

Type A client-side tool for the Infrastructure Manager (IM)

License Apache 2.0

Website www.grycap.upv.es/ec3

Purpose EC3 enables the deployment of virtual elastic hybrid clusters across Cloud infrastructures.

Function It consists of a set of recipes and a command-line interface (CLI) used as a client for the IM in order to deploy a customized front-end node of a virtual cluster that features: i) an instance of the IM to provision additional computing resources (working nodes); ii) CLUES, implementing the elasticity rules considering the state of the Local Resource Management System (LRMS); and iii) the specific configuration for the virtual cluster required for the execution of the applications that will be run on the cluster. EC3 is also offered as a free online service to deploy on-demand elastic virtual clusters on Amazon Web Services, OpenNebula and OpenStack.


The following figure summarizes the main architecture of EC3.

In the figure, the CLI of EC3 is used to contact the IM in order to deploy and configure the front-end node of a virtual cluster on a Cloud infrastructure. The front-end node is configured with another instance of the IM together with CLUES, which is in charge of implementing the elasticity rules by intercepting the jobs submitted to the LRMS (e.g. SLURM, PBS/Torque, etc.) and deciding when additional worker nodes should be deployed, depending on the amount and characteristics of the jobs queued up at the LRMS. Optional components in the architecture of EC3 that could be of interest for EUBra-BIGSEA are the checkpoint manager (CKPTManager) and BLCR (Berkeley Lab Checkpoint/Restart), which enable an efficient and cost-effective usage of spot instances, available for Amazon Web Services, by checkpointing the state of the jobs and automatically restarting them on a pool of worker nodes deployed on spot instances, which can be terminated at any time.

Users connect via SSH to the front-end node and submit jobs as usual. The virtual cluster will deploy additional worker nodes as required, and integrate them into the LRMS without user intervention, in order to cope with an increased workload of jobs. Worker nodes will be terminated when they are no longer required.

Dependencies EC3 depends on the IM to perform resource provisioning on multiple Cloud back-ends (Amazon EC2, Microsoft Azure, Google Cloud, OpenNebula, OpenStack, OCCI, Fogbow, etc.). It also depends on CLUES to manage the elasticity of the virtual cluster.

Interfaces EC3 provides two different interfaces:
- A command-line interface that acts as a client for the IM service.
- A web interface that enables anyone with a web browser to deploy a virtual elastic cluster on OpenNebula and OpenStack on-premises Cloud Management Platforms and on Amazon Web Services.

Data EC3 relies on templates/recipes written in the IM's native language, called RADL (Resource & Application Description Language), which are currently available on GitHub. It produces as output the IP of the front-end node, which can be accessed via SSH.

Needed Improvement

Advance in dealing with allocation requirements from the proactive policies. Support of Mesos beyond Chronos and Marathon.

4.6.3 Cloud Virtual Machine Automatic Memory Procurement (CloudVAMP)

Identification Cloud Virtual Machine Automatic Memory Procurement (CLOUDVAMP)

Potential Usage

Elasticity. Back-end component to overprovision memory to Mesos agents that will run frameworks.

Type Agent

License Apache 2.0

Website http://www.grycap.upv.es/cloudvamp

Purpose It enables memory oversubscription for the Virtual Machines running on a CMF and implements memory ballooning and automatic live migration of VMs.

Function CloudVAMP implements three main functions:
- Dynamic Memory Resize for VMs. Cloud users tend to overestimate the memory requirements of their applications/services, and templates typically represent upper bounds for application requirements, thus wasting memory allocated to the VM. CloudVAMP monitors actual memory usage and dynamically resizes the memory allocated to VMs, relying on the memory ballooning support provided by the KVM hypervisor.



- Enable Oversubscription. The memory stolen from the VMs is exposed as free memory to the Cloud Management Platform (CMP). The CMP's scheduler can decide to allocate additional VMs on the physical hosts.

- Prevent Memory Overload by Live Migration. Oversubscription may result in memory overload of the physical host if no preventive measures are considered. For that, CloudVAMP prevents memory overload via the live migration of VMs across the physical nodes, as supported by both KVM and OpenNebula. No downtime is introduced for VMs and memory overload is prevented, thus maintaining the Level of Service.


The architecture of CloudVAMP consists of three components:
- The Cloud Vertical Elasticity Manager (CVEM). An agent that analyzes the amount of memory actually needed by the VMs and dynamically updates the memory allocated to each of them, according to a set of customizable rules.

- The Memory Reporter (MR). An agent that runs in the VMs and reports to a monitoring system the free and used memory, and the usage of the swap space, by the applications in the VM. This information must be available to the CVEM.

- The Memory Oversubscription Granter (MOG). A system that informs the CMP about the amount of memory that can be oversubscribed from the hosts, to be taken into account by the scheduler of the CMP.

The previous figure depicts the architecture of the system based on an OpenNebula (ONE) implementation. OpenNebula requires a cluster-based installation in which the main services are installed in the front-end node, whereas the VMs are deployed on the internal working nodes, where the KVM hypervisor has to be installed.

Dependencies OpenNebula and KVM

Interfaces Internal services. There is no need to interact with the system once started.
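The first and third CloudVAMP functions can be sketched as simple rules over the memory figures that agents such as the Memory Reporter provide. The safety margin, the minimum size and the migration heuristic (move the smallest VM whose departure removes the excess) are assumptions for illustration; CloudVAMP's actual rules are customizable and are not specified in this document.

```python
from typing import Dict, Optional


def resize_target(used_mb: int, template_mb: int,
                  margin: float = 0.25, min_mb: int = 256) -> int:
    """Memory to keep allocated to a VM: observed usage plus a safety margin,
    at least min_mb, and never above the size requested in the template."""
    return min(template_mb, max(min_mb, int(used_mb * (1 + margin))))


def reclaimable_mb(vms: Dict[str, Dict[str, int]], margin: float = 0.25) -> int:
    """Memory taken back from the VMs of one host by ballooning; this is the
    amount that could be reported to the CMP as free (oversubscribable)."""
    return sum(vm["template"] - resize_target(vm["used"], vm["template"], margin)
               for vm in vms.values())


def vm_to_migrate(alloc_mb: Dict[str, int], host_mb: int) -> Optional[str]:
    """Pick a VM to live-migrate when allocations exceed physical memory:
    the smallest VM whose departure removes the excess, else the largest VM.
    Returns None when the host is not overloaded."""
    excess = sum(alloc_mb.values()) - host_mb
    if excess <= 0:
        return None
    sufficient = [vm for vm, mb in alloc_mb.items() if mb >= excess]
    if sufficient:
        return min(sufficient, key=lambda vm: alloc_mb[vm])
    return max(alloc_mb, key=lambda vm: alloc_mb[vm])
```

Preferring the smallest sufficient VM keeps the amount of memory copied during live migration low, which is one reasonable choice among several.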



4.7 Technology Analysis

The above sections have analysed a set of existing technologies and some improvements on top of them. Some of the components complement each other, while others compete. This section analyses the pros and cons of the above components to build up the cloud execution services of the QoS IaaS.

4.7.1 Requirements Related to Resource Management Frameworks

| R# | Description | Docker Swarm | Mesos | YARN | Myriad |
| --- | --- | --- | --- | --- | --- |
| R01 | The RMF must enable a scheduler to allocate resources to a specific kernel. | Only Docker-type resources | Any kind | Hadoop-based | Any via Mesos |
| R02 | The RMF must support different types of workloads, from containers to Spark jobs. | Docker | Any | Spark | Any via Mesos and YARN |
| R03 | An application topology may involve several containers, which must be deployed in a coordinated way. | Docker-compose | Transparent | Transparent | Transparent |
| R04 | The status of the resources should be accessible by an API. | REST | REST | REST | REST |
| R05 | RMFs should expose an API to change the number of resources allocated. | Yes | REST | REST | REST |
| R06 | The system should monitor the usage of resources and the application health. | REST | Auto/via Satellite | Application status REST | REST |

Table 2: Requirements Related to Resource Management Frameworks (section 4.3)

4.7.2 Requirements Related to Schedulers

| R# | Description | Marathon | Chronos | Spark-submit/shell |
| --- | --- | --- | --- | --- |
| R10 | Capability of executing container-based jobs. | Yes | Yes | Yes |
| R11 | Integration with the selected CMF. | Mesos | Mesos | Mesos/YARN |
| R12 | Capability of executing composed jobs involving multiple concurrent processes. | No | Yes | Inner parallelism (e.g. map/reduce) |
| R13 | Support of Spark jobs. | No | No | Yes |
| R14 | Support of deadline-QoS periodic jobs. | Yes | Yes | No |
| R15 | High availability for long-running jobs. | Yes | Partial | Retry |
| R16 | Support for interactive jobs. | No | No | Yes |

Table 3: Requirements Related to Schedulers (section 4.4)



4.7.3 Requirements Related to Image Registry

| R# | Description | Docker Hub |
| --- | --- | --- |
| R07 | A central list of container images should be available. | Yes |
| R08 | Updates should be automated. | Automated build |
| R09 | A user should be able to modify their own application. | Automated build of Docker Hub linked to GitHub |

Table 4: Requirements Related to Image Registry (section 4.3)

4.7.4 Requirements Related to Orchestration and Deployment

| R# | Description | Infrastructure Manager |
| --- | --- | --- |
| R17 | Deployment of TOSCA blueprints. | Yes, through TOSCA Parser |
| R18 | Update of TOSCA blueprints to reconfigure the system. | Yes |
| R19 | Support of multiple platforms. | OpenNebula (ONE), OpenStack, Docker, Kubernetes, VMware, AWS, Azure, Google Cloud |

Table 5: Requirements Related to Orchestration and Deployment (section 4.5)

4.7.5 Requirements Related to Reactive Elasticity

| R# | Description | CLUES | EC3 | CLOUDVAMP |
| --- | --- | --- | --- | --- |
| R20 | Automatic scaling up of resources when new job requests arise. | Yes | * | * |
| R21 | Automatic deallocation of resources when idle for a given period. | Yes | Yes | * |
| R22 | Customization policies for, at least, waiting time, cooling time and simultaneous power-on. | Yes | * | * |
| R23 | Reallocation of resources for the tasks on the memory size and number of resources. | * | Only number of resources | * |
| R24 | Transparent management of memory allocation to enable overprovisioning. | * | * | Yes |
| R25 | Automatic reconfiguration of execution kernels. | * | Yes | * |

Table 6: Requirements Related to Reactive Elasticity (section 4.6)

* Not applicable due to the functionality of the component.



4.8 Conclusion

From the previous analysis of the technology, we conclude a set of implementation decisions for the cloud execution services of the QoS IaaS:
- Deployment.
  - IM to deal with TOSCA topology documents from EUBra-BIGSEA.
  - Write TOSCA documents for environments dealing with Spark, Mesos, OPHIDIA and COMPSs.
  - Extend TOSCA types to reflect QoS information.

- Execution.
  - Develop a top-level endpoint as a REST service that will interact with the different scheduler mechanisms, depending on the type of jobs, and provide the QoS capabilities.
  - Deadline-based batch jobs implemented through container-based jobs executed through Chronos. The Chronos Job Management REST API [https://mesos.github.io/chronos/docs/api.html] enables creating, updating, listing, refreshing and deleting Chronos jobs and groups of jobs. Chronos will be extended with the possibility of scheduling the job start time according to the deadline and the estimated execution time for a specific request.
  - Long-running jobs that deal with continuous data retrieval. These will be implemented through Marathon jobs. The Marathon REST API [https://mesosphere.github.io/marathon/docs/rest-api.html] enables creating, updating, listing and deleting Marathon jobs. It will be used as is for long-running jobs.
  - Interactive jobs. Mesos will allocate the resources required to submit the application DAG. The Mesos scheduler API [http://mesos.apache.org/documentation/latest/scheduler-http-api/] could be used to allocate resources and the Mesos executor API [http://mesos.apache.org/documentation/latest/executor-http-api/] to access the resources.
  - In any case, batch jobs can be coded either in Spark or in COMPSs. Submission of Spark-based jobs to YARN will be done through Myriad. Mesos will allocate the resources required to submit the application DAG. Spark natively supports Mesos, and COMPSs could evolve from Docker Swarm to Mesos clusters.

- Elasticity.
  - Combine EC3 and Mesos to provide automatically scalable job submission through Chronos, Marathon and Spark. EC3 will add more nodes to a Mesos cluster if resources become scarce.
  - Instrument Marathon, Chronos and Spark to reallocate more resources (memory and nodes) if required (according to the execution estimation).
  - Directly modify the CPU and memory allocation of an active task by updating its configuration in Chronos, Marathon and Spark, by means of a direct interaction from the notifications of the monitoring system and acting on the Mesos CMF.
  - Scale up the number of instances in Marathon, by means of a direct interaction from the notifications of the monitoring system.
  - Use CLOUDVAMP under the hood to deal with the management of VMs at the level of the Cloud Management Framework. This will be transparent to the application developers, but it will expose more memory than is physically available.
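The deadline-driven Chronos scheduling foreseen in the Execution item above could start from a helper like the following, which turns the common job description proposed in Section 4.8.1 into a Chronos ISO 8601 repeating-interval schedule (`R[n]/<start>/<period>`). Starting the job `expectedduration` plus a safety margin before the deadline is our assumption, as are reading `expectedduration` with the same M/H suffixes as `periodic` and encoding a single run as `R1`. The Chronos field names used here should be checked against the Chronos REST API documentation cited above, and fields such as the job owner or epsilon are omitted.

```python
import re
from datetime import datetime, timedelta, timezone

_PERIODIC = re.compile(r"^R(\d*)P(\d+)([MH])$")   # e.g. "R24P60M", "RP2H"
_DURATION = re.compile(r"^(\d+)([MH])$")           # e.g. "10M", "2H" (assumed)


def _minutes(amount: str, unit: str) -> int:
    return int(amount) * (60 if unit == "H" else 1)


def chronos_schedule(job: dict, margin_min: int = 5) -> str:
    """ISO 8601 repeating interval whose first run starts early enough to
    finish (expected duration + margin) before the job deadline."""
    deadline = datetime.fromisoformat(job["deadline"].replace("Z", "+00:00"))
    d = _DURATION.match(job.get("expectedduration", "0M"))
    expected = _minutes(*d.groups()) if d else 0
    start = (deadline - timedelta(minutes=expected + margin_min)).astimezone(timezone.utc)
    start_iso = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    p = _PERIODIC.match(job.get("periodic", ""))
    if p:
        reps, amount, unit = p.groups()          # reps == "" means "forever"
        return f"R{reps}/{start_iso}/PT{_minutes(amount, unit)}M"
    return f"R1/{start_iso}/PT1M"                # single run (assumed encoding)


def to_chronos(job: dict) -> dict:
    """Map the common job description onto a (partial) Chronos-style job."""
    size = job["mem"]
    mem_mb = float(size[:-1]) * {"K": 1 / 1024, "M": 1, "G": 1024}[size[-1]]
    chronos = {
        "name": job["name"],
        "command": job.get("command", ""),
        "schedule": chronos_schedule(job),
        "cpus": float(job["cpu"]),
        "mem": mem_mb,
    }
    if "container" in job:
        c = job["container"]
        chronos["container"] = {
            "type": c.get("type", "DOCKER"),
            "image": c["image"],
            "forcePullImage": c.get("forcePullImage", False),
            "volumes": c.get("volumes", []),
        }
    return chronos
```

With the example job of Figure 7 (deadline 17:22 at UTC+2, 10 minutes expected), the first run would be scheduled at 15:07 UTC.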



Figure 6 shows the detailed interaction among components for this part.

Figure 6: Detailed interaction among components at the QoS cloud services layer.

4.8.1 Proposed API

In order to interact with the single submission end-point, we will define a macro-API and a common job description that could be mapped to Marathon, Chronos, Spark or similar schedulers. The API will be a REST API that will interact with the two main entities: the Resource Management Framework and the Scheduler. The rest of the services will use this API to submit jobs, retrieve their status, kill jobs, get the status of resources and reallocate them.

Figure 7 shows a JSON structure proposal to define jobs. The same structure could be used to retrieve job information, extended with additional fields such as the actual memory usage, the actual CPU usage, the application id and the values of the parameters left for the scheduler to choose, such as port allocation. It can also be used to update configurations when reasonable.

```json
{
  "type": "CMD",
  "name": "my_job_name",
  "deadline": "2016-06-10T17:22:00+02:00",
  "periodic": "R24P60M",
  "expectedduration": "10M",
  "container": {
    "type": "DOCKER",
    "image": "eubrabigsea/ubuntu",
    "forcePullImage": true,
    "volumes": [
      {
        "containerPath": "/var/log/",
        "hostPath": "/logs/",
        "mode": "RW"
      }
    ],
    "portMappings": [
      {
        "containerPort": 8000,
        "hostPort": 0,
        "protocol": "tcp"
      }
    ]
  },
  "environmentVariables": [
    { "name": "value" }
  ],
  "cpu": "1.5",
  "mem": "512M",
  "disk": "1G",
  "command": "python -m SimpleHTTPServer 8000"
}
```

Figure 7: JSON job description

The meaning of the fields of the JSON structure is the following:
- "type". It defines the type of job, in order to select the scheduler. Possible values are "spark", "COMPSs" and "CMD". Spark jobs will be executed through spark-submit, COMPSs jobs will add the COMPSs launcher to the "command" line, and "CMD" will simply run the "command". Mandatory.
- "name". A name given to identify the job. It must be unique. Mandatory.
- "deadline". A date and time expression stating when the job should have finished. The expression is YEAR-MONTH-DAY, HOUR:MINUTE:SECOND followed by the time zone (Z is UTC). Optional.
- "periodic". It indicates whether the job has to be repeated or not. Rn indicates 'n' repetitions and 'R' alone means repeat forever. PXX indicates the periodicity (XXM will be minutes, XXH will be hours). Optional (if it is not present, the job will run only once).
- "expectedduration". It describes the expected duration of a job. Optional.
- "container". It describes the configuration features for a container-based job. Mandatory for non-Spark jobs. The fields are: "type", which describes the driver; "image", which indicates the container image; "volumes", which indicates the volume mapping of the container; and "portMappings", which lists the port mappings between the container and the host. No values indicate either no mappings or the automatic selection of ports.
- "environmentVariables". It defines pairs of environment variable name and value. Optional.
- "cpu". It defines the allocation of CPUs (it could be a float number) estimated for the job. The framework should allocate enough resources or queue the job. "mem" and "disk" have a similar meaning for the memory and the disk, with units in KBytes (K), MBytes (M) and GBytes (G). This information should be provided by the proactive policies system. Mandatory.
- "command". It defines the command to be executed. Optional for CMD jobs (it can be embedded in the container), incompatible with Spark jobs.
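A service receiving these descriptions could check the mandatory and optional rules listed above before forwarding a job to a scheduler. The sketch below encodes only the constraints stated in the list (mandatory fields, allowed types, container required for non-Spark jobs, command incompatible with Spark jobs, K/M/G units); the function names and error messages are hypothetical.

```python
import json
from typing import List

JOB_TYPES = {"spark", "COMPSs", "CMD"}
UNITS_KB = {"K": 1, "M": 1024, "G": 1024 ** 2}


def size_kb(value: str) -> int:
    """Convert a size such as '512M' or '1G' to KBytes."""
    if not value or value[-1].upper() not in UNITS_KB:
        raise ValueError(f"size {value!r} must end in K, M or G")
    return int(float(value[:-1]) * UNITS_KB[value[-1].upper()])


def validate_job(text: str) -> List[str]:
    """Return the problems found in a JSON job description (empty if valid)."""
    job = json.loads(text)
    errors = [f"missing mandatory field {f!r}"
              for f in ("type", "name", "cpu", "mem", "disk") if f not in job]
    jtype = job.get("type")
    if jtype is not None and jtype not in JOB_TYPES:
        errors.append(f"unknown job type {jtype!r}")
    if jtype == "spark" and "command" in job:
        errors.append("'command' is incompatible with Spark jobs")
    if jtype != "spark" and "container" not in job:
        errors.append("'container' is mandatory for non-Spark jobs")
    for field in ("mem", "disk"):
        if field in job:
            try:
                size_kb(job[field])
            except ValueError as exc:
                errors.append(f"{field}: {exc}")
    return errors
```

Returning all problems at once, rather than failing on the first one, lets the submitting service report every issue to the user in a single response.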



The proposed API will follow this structure:

| Resource path | Description |
| --- | --- |
| /v1/scheduler | POST method: submits a job request in JSON. |
| /v1/scheduler/jobs | GET method: lists all the jobs in the scheduler (a JSON with all the jobs). It can accept a JSON with a deadline that defines the date limit. |
| /v1/scheduler/job/name | GET method: gets all the information about a specific job; DELETE method: kills the specific job; POST method: reallocates resources. |
| /v1/resource/slaves | GET method: lists all the resources registered in the master or cluster of masters. |
| /v1/resource/slave | GET method: provides the status information of a specific slave. |
| /v1/resource/slave/resource/up | POST method: boots up a new resource. |
| /v1/resource/slave/resource/down | POST method: powers down a specific resource. |
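As a hedged illustration of how other services might call this API, the sketch below builds (but does not send) `urllib` requests for the listed resource paths. The base URL is hypothetical, `name` in `/v1/scheduler/job/name` is taken to be the job name, and the request bodies for reallocation and for booting or powering down resources are assumptions, since their format is not yet defined. `/v1/resource/slave` is left out because how the slave is selected is not specified.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://qos-endpoint.example.org:8080"   # hypothetical end-point


def _req(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request; send it with urllib.request.urlopen(req)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def _job_path(name: str) -> str:
    return "/v1/scheduler/job/" + urllib.parse.quote(name, safe="")


def submit_job(job: dict) -> urllib.request.Request:
    return _req("POST", "/v1/scheduler", job)


def list_jobs(deadline: Optional[str] = None) -> urllib.request.Request:
    return _req("GET", "/v1/scheduler/jobs", {"deadline": deadline} if deadline else None)


def job_info(name: str) -> urllib.request.Request:
    return _req("GET", _job_path(name))


def kill_job(name: str) -> urllib.request.Request:
    return _req("DELETE", _job_path(name))


def reallocate(name: str, resources: dict) -> urllib.request.Request:
    return _req("POST", _job_path(name), resources)    # body format assumed


def list_slaves() -> urllib.request.Request:
    return _req("GET", "/v1/resource/slaves")


def resource_up(spec: dict) -> urllib.request.Request:
    return _req("POST", "/v1/resource/slave/resource/up", spec)      # body assumed


def resource_down(spec: dict) -> urllib.request.Request:
    return _req("POST", "/v1/resource/slave/resource/down", spec)    # body assumed
```

Keeping request construction separate from sending makes the mapping between operations and resource paths easy to check in isolation.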



5 MONITORING SERVICE

5.1 Cloud computing monitoring

Monitoring in a cloud environment differs from the traditional model in that there is a dynamic workload instead of a static set of assets. The old traditional infrastructure expected that new servers were rarely installed and old ones only occasionally decommissioned. The whole infrastructure was meant to be "permanent". Cloud computing, in contrast, is a dynamic and fluid environment. Servers are created and destroyed all the time, either to absorb increased traffic on the services they run or to reduce the amount of resources and save money. A cloud monitoring system should reflect these dynamic aspects while providing essential information that helps providers to properly plan their infrastructure capacity and resource management; ensure the SLAs and overall QoS offered through and by the cloud; identify bottlenecks, failures or problems; and bill resource usage.

5.2 Requirements for monitoring service

According to [R09], many monitoring systems (MS) have not addressed the rapidly changing and dynamic infrastructure seen in service clouds. The authors then determined the main requirements for monitoring cloud scenarios:

Scalability - it should be able to handle a large number of agents, probes and data flows, and collect, transfer and analyze such volumes of data without impairing service performance.

Elasticity - in order to handle drastic changes in monitored environments, the MS should support upsizing and downsizing of monitored resources (e.g. agents, probes, etc.).

Migration - virtual resources may be moved from one physical host to another. All previous data associated with these virtual resources must be independent of the physical host.

Adaptability - it should be minimally invasive and adapt itself to various computational and networkloads.

Autonomicity - it is desirable that the monitoring system keeps running with minimal human intervention and reconfiguration.

Federation - it should monitor virtual resources residing in different domains of a federated cloud.

Another list of requirements related to monitoring systems can be found in [11]:

Timeliness - events and measurement data should be available in time for their intended use [10]. A consumer may require updated data from a producer in order to execute an action, but the interdependency of timeliness with other properties of the monitoring system, such as elasticity, autonomicity and adaptability, implies challenges or trade-offs between opposing requirements.

Comprehensiveness - it should support different levels of abstraction. Such levels include monitoring physical and virtual resources and service applications. At each level, different metrics, kinds of data and probes are available. It is desirable to support the isolation of environments through segmented authentication, authorization and resources, as a multi-tenant system.

Resilience - faulty (non-critical) components should not disturb the monitoring system.



High availability - it should employ a high-availability strategy, such as data replication/redundancy, workload balancing and sharding.

Accuracy - measures provided by the monitoring system are accurate when they are as close as possible to the real value measured and they satisfy what the application considers accurate (application dependent).

Extensibility - new metrics and probes may be added to the monitoring system, including those defined at the application level.

Besides these requirements, we identify:

Usability - configuration and usage should be as simple as possible, but users should be able to configure advanced options.

Integration - newly allocated computing resources must include the monitoring system agents and probes in their installation image. Metrics must be exposed by an API (application programming interface) to be consumed by other components of the EUBra-BIGSEA infrastructure, especially those of WP3.

Security - sensitive data must be protected and require authorization.

Alert triggering - if a pre-configured condition is satisfied in the monitoring system, actions may be executed in the monitored resource or system-wide. Actions include execution of programs, sending messages to operators, dynamically adjusting measurements, etc.

Single sign-on - authentication in any component of the monitoring system must be done using a common user database.

5.3 Proposed architecture

In order to accomplish those requirements, the monitoring system should perform four main jobs:

Alarms processing - watch and check resources; send alerts via handlers; perform actions when thresholds are exceeded or when services are unavailable.

Measurements collection - self-explanatory: gather a steady stream of numbers and/or signals, store them, show them, watch thresholds, etc.

Logs collection, storage and analysis - extract information from log text, monitor log levels, search logs by text.

User interface - events display, measurements and log visualization, configurable interface.



Figure 8: High-level view of monitoring system concepts

The Monasca Monitoring System [12] identifies a set of architectural components that are in accordance with our view of the architecture. Some components may not be included in the first version of the monitoring system, but they are listed here as a reference:

- Monitoring agent: runs on the monitored resource, providing probe results to the monitoring system, and performs actions coordinated by the monitoring system and triggered by configured criteria. Measurements include system metrics (e.g. CPU, disk, memory usage, etc.), Nagios plugins, statsd and many checks for services (e.g. DBMS, web servers, custom applications, etc.).

- Monitoring API: an API, preferably RESTful, focused on getting, setting and querying metrics, providing statistics about metrics, alarm management and notifications. This API will be used by some EUBra-BIGSEA components, especially those dealing with WP3 dynamic and static resource allocation.

- Message queue: a component that receives published metrics from the monitoring API and alarm state messages from the threshold engine. The message queue decouples system components and defines a common communication bus between them.

- Persister services: consume metrics and alarms from the message queue and store them in their own database.

- Metrics and alarms database: a component, in general a DBMS, which primarily stores metrics and alarm history. Some NoSQL solutions provide different paradigms with respect to how data is logically organized. For instance, the Cassandra NoSQL database allows modeling data in a column-oriented format, organised in such a way that retrieving data is very efficient, especially data represented as time series. For unstructured data, such as logs (syslog, application logs), Elasticsearch is used to index text.

- Transform and aggregation engine: transforms metric names and values, such as delta or time-based derivative calculations, and creates new metrics that are published to the message queue (optional).

- Anomaly and prediction engine: evaluates predictions and anomalies and generates predicted metrics as well as anomaly likelihood and anomaly scores. In Monasca, it is at a prototype stage; for EUBra-BIGSEA, it is a research track.

- Threshold engine: computes thresholds on metrics and publishes alarms to the message queue when they are exceeded.



- Notification engine: consumes alarm state transition messages from the message queue and sends notifications, such as emails or invocations of remote external APIs, for alarms.

- Analytics engine: consumes alarm state transitions and metrics from the message queue and performs anomaly detection and alarm clustering/correlation (optional).

- Configuration database: a component that stores the configuration and other information in the system.

- Dashboard monitoring interface: an application (preferably web-based) that includes configured dashboards and reports. It allows users to access metric values, alarm status and resource allocation, showing data as charts, tables and any other applicable kind of visualisation.
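The decoupling provided by the message queue, and the division of work between the threshold and notification engines, can be sketched in a few lines. Here Python's in-process `queue.Queue` stands in for the message bus, which in a real deployment would be a separate broker; the metric name and threshold are hypothetical. The threshold engine publishes only alarm state transitions, so the notification side is not flooded with one message per sample.

```python
import queue
from typing import Dict, List, Tuple

Transition = Tuple[str, str, float]   # (metric name, new state, triggering value)


class ThresholdEngine:
    """Consumes metric samples and publishes alarm state transitions to the bus."""

    def __init__(self, bus: "queue.Queue[Transition]", thresholds: Dict[str, float]):
        self.bus = bus
        self.thresholds = thresholds
        self.state: Dict[str, str] = {}          # metric name -> "OK" | "ALARM"

    def on_metric(self, name: str, value: float) -> None:
        limit = self.thresholds.get(name)
        if limit is None:
            return                               # no alarm defined for this metric
        new_state = "ALARM" if value > limit else "OK"
        if self.state.get(name, "OK") != new_state:
            self.state[name] = new_state
            self.bus.put((name, new_state, value))   # publish only on change


def notification_engine(bus: "queue.Queue[Transition]") -> List[str]:
    """Drain pending transitions and turn each into a notification message."""
    messages = []
    while True:
        try:
            name, state, value = bus.get_nowait()
        except queue.Empty:
            return messages
        messages.append(f"[{state}] {name}={value}")
```

Because both engines only share the bus, either side can be replaced (for example, by a persister that stores the same transitions) without touching the other.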

Other components are Monasca-specific and may be included in further evolutions of the architecture.

The following image shows how the monitoring system components interact with each other:

Figure 9: Monitoring system architecture and components

5.3.1 Minimum supported infrastructure metrics

A monitored resource may be running on different hardware abstractions: bare metal, virtual machine or operating system container; here, "host" refers to any of these abstractions. Regardless of the abstraction, the monitoring system should be able to interact with its agents. There will be a minimum set of metrics, and their variations or derivations, supported by the monitoring system:

- CPU - discover whether a host's CPU is being heavily utilized by the kernel, by running application code, or by other processes on the same host. It can be segmented into system and user time.

- Load average - monitor the load average (over predefined intervals, for example 1, 5 or 15 minutes) to know the average system load over a period of time. As with CPU, it can be divided into user and system time.

- Memory - know when an application is consuming too much memory, and how much of it is being consumed by the operating system. Virtual memory is also considered.

- File system - monitor file systems, detecting lack of space or of available i-nodes, bytes or any other dimension. It may also detect unavailability or mounting state changes (for instance, read-only mounting after a failure).



- Disk - measure input/output operations per second (IOPS) and throughput, in order to identify bottlenecks with greater precision.

- Network - detect network errors, collisions, overruns and dropped packets, in order to diagnose network interface problems. It may also be used to detect offline resources (ping-like tests).

- Uptime - how long the resource has been running.
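Several of these metrics can be read with the Python standard library alone, as the sketch below shows for load average, file system usage, memory and uptime. It assumes a Unix host (`os.getloadavg` is Unix-only), and memory and uptime are read from Linux's `/proc/meminfo` and `/proc/uptime` only when those files exist. The CPU user/system split, disk IOPS and network counters are left to a full monitoring agent, and the metric names are hypothetical.

```python
import os
import shutil
import time
from typing import Dict, Optional


def _read(path: str) -> Optional[str]:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None          # not Linux, or /proc not available


def collect_host_metrics(mount: str = "/") -> Dict[str, float]:
    """Collect a minimal set of host metrics from the local machine."""
    load1, load5, load15 = os.getloadavg()            # Unix only
    fs = shutil.disk_usage(mount)
    metrics = {
        "timestamp": time.time(),
        "load_avg.1m": load1,
        "load_avg.5m": load5,
        "load_avg.15m": load15,
        "fs.used_pct": 100.0 * fs.used / fs.total,
        "fs.free_bytes": float(fs.free),
    }
    meminfo = _read("/proc/meminfo")
    if meminfo:
        # Lines look like "MemTotal:       16318480 kB".
        fields = {line.split(":")[0]: int(line.split()[1])
                  for line in meminfo.splitlines() if line.strip()}
        metrics["mem.total_kb"] = float(fields["MemTotal"])
        if "MemAvailable" in fields:                   # only on newer kernels
            metrics["mem.available_kb"] = float(fields["MemAvailable"])
    uptime = _read("/proc/uptime")
    if uptime:
        metrics["uptime_s"] = float(uptime.split()[0])
    return metrics
```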

5.3.2 Supported application metrics and logging

Each application should be able to define its own metrics, related to security, performance, capacity, uptime, throughput, service level agreements (SLAs), user metrics or any other internal value. By using the API of the proposed monitoring system, applications can create, remove, update and query metrics, measurements and alerts. Metrics and time series used exclusively by the application itself, or unrelated to WP3, must not be stored in the monitoring system infrastructure.

Some common and generic metrics are supported by the monitoring system at the application level:

- Ping check. Monitors server communication and response times.
- HTTP check. Gets specific content to verify that users can see and interact with applications.
- HTTPS check. Monitors the validity of the Secure Sockets Layer (SSL) certificate.
- SSH check. Confirms that the target server's secure shell (SSH) service is running and accepting requests, and measures the response time.
- TCP port check. Confirms that the target server's TCP port is listening.
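The TCP port check, which the SSH check builds on, can be sketched with the standard `socket` module: it succeeds if a TCP connection is accepted within a timeout and reports the connection time. The function name and the 3-second default timeout are assumptions for illustration.

```python
import socket
import time
from typing import Optional


def tcp_port_check(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the connect time in milliseconds if host:port accepts TCP
    connections within `timeout` seconds, or None otherwise."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:          # refused, unreachable, or timed out
        return None
```

A `None` result would map naturally to a failed check, and the returned time to the response-time measurement mentioned for the SSH check.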

The Resource Management Framework will also provide information about the status of the resources. For Mesos, Satellite from Two Sigma provides a complete set of metrics to monitor the status of the slaves.²

The monitoring system will support basic management of application logs. Applications can submit logs to a central monitoring system component, responsible for processing, extracting and indexing log records. Each log record must have a set of fields that includes the date/hour, the severity, the name of the application and the log message.
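One way to guarantee the mandatory fields is to serialize each record as a flat JSON document before submitting it to the central log component. The field names and the severity levels (borrowed from Python's `logging` module) are assumptions, and the transport to the central component is not shown.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

SEVERITIES = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


@dataclass
class LogRecord:
    timestamp: str     # date/hour, ISO 8601
    severity: str
    application: str
    message: str


def make_record(application: str, severity: str, message: str,
                when: Optional[datetime] = None) -> str:
    """Return the JSON for one log record carrying the four mandatory fields."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity {severity!r}")
    when = when or datetime.now(timezone.utc)
    return json.dumps(asdict(LogRecord(when.isoformat(), severity, application, message)))
```

A fixed, flat structure like this keeps indexing by date, severity or application straightforward on the receiving side.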

5.4 Technology evaluation

This section analyses solutions for the monitoring of cloud infrastructures. We first detail tools for the collection, storage and retrieval of monitoring metrics and then discuss approaches for managing logs.

5.4.1 Metric collection, storage and retrieval

5.4.1.1 Zabbix

Identification Zabbix

Potential Usage Monitoring the resource consumption of VMs

Type Monitoring System

License GNU General Public License (GPL) version 2

Website http://www.zabbix.com

Purpose Monitoring system

² https://github.com/twosigma/satellite/blob/master/satellite-slave/resources/metrics_snapshot.json



Function Zabbix is an agent-based monitoring system that can collect, store and manage a wide range of system metrics related to hosts, virtual machines, applications and services. Zabbix consumes more resources than other environments, but provides a more complete UI and a simpler way to code probes and monitoring KPIs.


Dependencies MySQL, PHP, Apache HTTP Server. Agents must be installed on the monitored resources in order to collect metrics and triggers.

Interfaces A well-documented API with operations that allow control of all aspects of the software. A rich user interface with support for custom graphs, triggers, screens, actions, etc.

Data Configuration, metrics and trigger history are stored in a MySQL database.

Needed Improvement

Zabbix has had an auto-registration feature since v2.0: Zabbix agents can register themselves without manual intervention. However, auto-registration is not well documented. Resource removal is not automatic and would require interacting with the API.

5.4.2 InfluxDB

InfluxData provides the alerting, visualization and back-end components for building custom monitoring solutions for servers, sensors, storage appliances, network infrastructure, apps, logs and more.

Identification InfluxDB

Potential Usage Database for monitoring metrics data

Type Time-series database

License MIT License

Website https://influxdata.com

Purpose Store measurements from various probes

Function InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics. It also includes plugins for other data ingestion protocols such as Graphite, collectd, and OpenTSDB.


InfluxDB is a standalone database; clients write data into it and read data from it.

Many high-availability architectures are feasible. The example above illustrates a read/write proxy (for example, nginx) that forwards requests to both InfluxDB instances.

Dependencies none

Interfaces influx command-line interface with an expressive SQL-like query language (InfluxQL) tailored to easily query aggregated data. The CLI communicates with InfluxDB directly by making requests to the InfluxDB HTTP API over port 8086 by default. Built-in web admin interface and an HTTP API.

Needed Improvement

Customization
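Writes to InfluxDB use its text line protocol (`measurement,tags fields timestamp`), and queries are InfluxQL strings sent to the `/query` endpoint of the HTTP API. The Python sketch below builds both without any client library. The measurement and database names are illustrative, the escaping covers the common cases of the 1.x line protocol rather than every corner case, and actually sending the request is left to any HTTP client.

```python
from urllib.parse import urlencode


def _escape(text, chars):
    """Backslash-escape each character of `chars` in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # string fields are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Encode one point in InfluxDB (1.x) line protocol."""
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # tags sorted by key, as the InfluxDB docs recommend
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(k, ',= ')}={_field_value(v)}" for k, v in sorted(fields.items())
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision by default
    return line


def query_url(host, db, influxql, port=8086):
    """URL for a GET on the /query endpoint of the InfluxDB HTTP API."""
    return f"http://{host}:{port}/query?" + urlencode({"db": db, "q": influxql})


if __name__ == "__main__":
    print(to_line_protocol("cpu", {"host": "vm-1", "region": "eu"},
                           {"usage_idle": 97.5, "cores": 4}, 1467244800000000000))
    # cpu,host=vm-1,region=eu cores=4i,usage_idle=97.5 1467244800000000000
    print(query_url("localhost", "bigsea",
                    'SELECT MEAN("usage_idle") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)'))
```

The `GROUP BY time(5m)` clause is an example of the aggregated queries the CLI and HTTP API are tailored for.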

5.4.3 Sensu

Sensu is an open-source monitoring framework that allows organizations to compose comprehensive monitoring and telemetry solutions that meet their unique requirements. It provides a platform that focuses on what to monitor and measure, rather than how.

Identification Sensu Core

Potential Usage Monitoring and telemetry system

Type Infrastructure and application monitoring and telemetry solution

License MIT License

Website http://sensuapp.org/

Purpose Control and integrate monitoring resources

Function Integrates monitoring of infrastructure, service and application health, and business KPIs. It supports deployment via Puppet, Chef and Ansible. It is an extensible framework (including a message bus, event processor, monitoring agent and documented APIs) with many available plugins, and it integrates with existing tools and services (e.g. e-mail, PagerDuty alerts, Slack, HipChat, IRC notifications, etc.). Service checks provide status and telemetry data, and event handlers process the results. Hundreds of plugins are available for monitoring the tools and services already in use. Plugins have a very simple specification and can be written in any programming language.


Clients execute service checks (plugins/scripts) and feed the broker (RabbitMQ) with data, which is consumed by the server/API and processed by handlers, mutators, etc. These perform specific actions, such as sending data to a time-series database or sending an alert e-mail. Events and check status are stored in, and queried from, Redis.

Dependencies Redis, RabbitMQ, Ruby (embedded)

Interfaces RESTful HTTP API; simple UX dashboard

Needed Improvement

The dashboard interface is quite simple and lacks features such as authentication, roles and custom settings.
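The "very simple specification" of Sensu plugins is the Nagios plugin convention: the check prints a status line to standard output and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 or higher UNKNOWN). The Python sketch below is a hypothetical disk-usage check written to that convention; the 80%/90% thresholds and the default path are illustrative choices.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios/Sensu status code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the file system holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:  # e.g. the path does not exist
        return UNKNOWN, f"DISK UNKNOWN: {exc}"
    used_pct = 100.0 * usage.used / usage.total
    status = evaluate(used_pct, warn, crit)
    return status, f"DISK {NAMES[status]}: {path} {used_pct:.1f}% used"


def main(argv):
    """Print the check output (sent to the Sensu server) and return the exit code."""
    code, message = check_disk(argv[0] if argv else "/")
    print(message)
    return code
```

When installed as a Sensu check, the script's entry point would end with `sys.exit(main(sys.argv[1:]))`, so that the status reaches Sensu through the process exit code. Because the same convention is used by Nagios, existing Nagios plugins can be reused unchanged.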

5.4.4 OpenStack Monasca

Identification OpenStack Monasca

Potential Usage Complete monitoring system

Type Monitoring-as-a-service solution

License Apache License, Version 2.0

Website https://wiki.openstack.org/wiki/Monasca

Purpose Control and integrate monitoring resources

Function Full solution for monitoring infrastructure, service and application health, and business KPIs. Integrates with OpenStack components for orchestration, access control and scoping.




Dependencies MySQL, Apache Kafka, InfluxDB, Apache ZooKeeper, Apache Storm, OpenStack Keystone, OpenStack Horizon Dashboard, Grafana

Interfaces Documented API and user interface integrated with the OpenStack Horizon Dashboard and Grafana.

Needed Improvement

The current version does not support collecting and forwarding log data; there are plans to implement it in the future. The user interface is at an early stage and may change soon. The API is not stable, but the main operations are implemented.
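Measurements reach Monasca through its REST API (`POST /v2.0/metrics`, authenticated with a Keystone token in the `X-Auth-Token` header). Each metric is a JSON object with a `name`, a map of `dimensions`, a `timestamp` in milliseconds and a `value`. The Python sketch below only builds and checks that body; the 255-character limit applied to names and dimensions is our reading of the API documentation and should be verified against the deployed version.

```python
import json
import time

MAX_LEN = 255  # assumed limit for names and dimension keys/values


def monasca_metric(name, value, dimensions=None, timestamp_ms=None):
    """Build one metric object for POST /v2.0/metrics."""
    dimensions = dict(dimensions or {})
    if not name or len(name) > MAX_LEN:
        raise ValueError("invalid metric name")
    for key, val in dimensions.items():
        if not key or not val or len(key) > MAX_LEN or len(val) > MAX_LEN:
            raise ValueError(f"invalid dimension {key!r}={val!r}")
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # Monasca expects milliseconds
    return {"name": name, "dimensions": dimensions,
            "timestamp": timestamp_ms, "value": float(value)}


def post_headers(token):
    """Request headers; the token is issued by OpenStack Keystone."""
    return {"Content-Type": "application/json", "X-Auth-Token": token}


if __name__ == "__main__":
    metric = monasca_metric("cpu.idle_perc", 97.5,
                            {"hostname": "vm-1", "service": "spark"},
                            timestamp_ms=1467244800000)
    print(json.dumps([metric]))  # a list of metric objects sent in one request
```

Dimensions such as `hostname` and `service` are what later allow alarms and queries to be scoped to a host, a service or an application.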



5.4.5 Logs collection and analysis

5.4.5.1 rsyslog

Identification RSYSLOG, the rocket-fast system for log processing

Potential Usage Log retrieval and management

Type System logging

License GNU General Public License v3

Website http://www.rsyslog.com/

Purpose Centralized log management and storage for further processing and analysis

Function
- Multi-threading
- TCP, SSL, TLS, RELP
- MySQL, PostgreSQL, Oracle and more
- Filtering of any part of the syslog message
- Fully configurable output format
- Content-based filtering


Rsyslog can act in various modes. It can write log messages to the local hard drive and/or forward them over the network to a central log server. On the central log server, messages can be filtered by content and written to different local hard drives or databases.

Log messages are sent via TCP or UDP to the central server, which processes them and can, for example, split them into separate branches.

Interfaces Syslog protocol over TCP or UDP

Needed Improvement

Customization of log filtering
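Applications can forward their logs to a central rsyslog server with any syslog client. As a minimal sketch, Python's standard `logging.handlers.SysLogHandler` sends records over UDP (the default) or TCP; the application name, the `local0` facility and the message layout are illustrative choices, and 514 is the conventional syslog port. The demo replaces the central server with a local UDP socket so that the emitted datagram can be inspected.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(app_name, host, port=514, use_tcp=False):
    """Return a logger that forwards its records to a (central) syslog server."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
        socktype=socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM,
    )
    # The application name and severity travel inside the message text.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # stand-in for the central rsyslog server
    receiver.settimeout(2)
    log = make_syslog_logger("bigsea-app", "127.0.0.1", receiver.getsockname()[1])
    log.info("job started")
    data, _ = receiver.recvfrom(4096)
    print(data)  # b'<134>bigsea-app: INFO job started\x00'
    receiver.close()
```

The leading `<134>` is the syslog priority (facility `local0` = 16, times 8, plus severity `info` = 6), which rsyslog can use for the content-based filtering described above.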



5.4.6 ELK Stack

Identification Elasticsearch, Logstash and Kibana stack

Potential Usage Log analyser

Type Real-time data analytics tool

License Apache License 2.0

Website https://www.elastic.co

Purpose Parse data with Logstash directly into Elasticsearch; Elasticsearch then handles the data; Kibana visualizes the data.

Function
Elasticsearch
- Distributed, scalable and highly available
- Real-time search and analytics capabilities
- Document-oriented, full-text search functionality with powerful query options
- Built on top of Apache Lucene

Logstash
- Receives and processes log data or any other time-based data
- Filter options used to transform input data; plugins for custom data sources
- Centralized processing of data of all types
- Normalizes varying schemas and formats

Kibana
- Presents the data stored by Logstash in Elasticsearch
- Customizable interface with histograms and other panels
- Flexible analytics and visualization platform
- Real-time summary and charting of streaming data
- Instant sharing and embedding of dashboards


Interfaces RESTful API; web UX; CLI

Needed Improvement

Customization
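In the ELK pipeline, Logstash typically splits each raw log line into fields (usually with grok patterns) before indexing it, and batches are commonly sent to the Elasticsearch `_bulk` API as newline-delimited JSON, with an action line followed by the document. The Python sketch below mimics both steps with a plain regular expression. The raw log layout, the index name `bigsea-logs` and the field names are assumptions, and the action line omits the `_type` that older Elasticsearch releases required.

```python
import json
import re

# Assumed raw layout: "<ISO timestamp> <SEVERITY> <application>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<application>[\w.-]+):\s(?P<message>.*)$"
)


def parse_line(line):
    """Grok-like extraction of the mandatory log fields; None if no match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    doc = match.groupdict()
    doc["severity"] = doc["severity"].lower()
    return doc


def bulk_body(lines, index="bigsea-logs"):
    """Newline-delimited JSON body for POST /_bulk; unparsable lines are skipped."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is not None:
            out.append(json.dumps({"index": {"_index": index}}))
            out.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    raw = ["2016-06-30T12:00:00Z ERROR bigsea-app: task 42 failed",
           "garbage line"]
    print(bulk_body(raw), end="")
```

Once indexed this way, the `severity` and `application` fields become directly usable in Kibana filters and histograms.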



5.5 Solution evaluation

This section analyses the different complete architectures and components for monitoring and proposes a solution to be used in EUBra-BIGSEA.

5.5.1.1 Sensu/InfluxDB/ELK

Figure 10: Sensu/InfluxDB/ELK software architecture

The Sensu/ELK solution is very mature: all of its components have many features and perform their intended function very well. In this solution, Sensu is the core, responsible for managing and integrating all components.

Configuration is done through JSON files defined in sensu-server and sensu-client. Check results and measurements are collected via scripts running on the monitored resources. All data are sent to a message queue and consumed by sensu-server, which analyses them, displays information and performs actions (send e-mail, proxy information, etc.).
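The flow described above (clients publish check results to a message queue, and the server consumes them and runs handlers) can be illustrated with an in-process queue. This is a conceptual Python sketch, not Sensu code: `queue.Queue` stands in for RabbitMQ, and the event fields and handler names (`mailer`, `influxdb`) are hypothetical, only loosely modelled on the check results Sensu exchanges.

```python
import queue

broker = queue.Queue()  # stand-in for the RabbitMQ transport


def publish_result(client, check, status, output, kind="standard"):
    """What a sensu-client does after running a check script."""
    broker.put({"client": client, "check": check, "status": status,
                "output": output, "type": kind})


def route(event):
    """Pick handlers: metrics go to the time-series DB, failures trigger e-mail."""
    handlers = []
    if event["type"] == "metric":
        handlers.append("influxdb")
    elif event["status"] != 0:
        handlers.append("mailer")
    return handlers


def consume(actions):
    """What the server does: drain the queue and dispatch each event."""
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            return
        for name in route(event):
            actions[name](event)


if __name__ == "__main__":
    sent = []
    actions = {"mailer": lambda e: sent.append(("mail", e["check"])),
               "influxdb": lambda e: sent.append(("tsdb", e["output"]))}
    publish_result("vm-1", "disk", 2, "DISK CRITICAL: / 95.0% used")
    publish_result("vm-1", "cpu-metrics", 0, "vm-1.cpu.idle 97.5 1467244800", kind="metric")
    publish_result("vm-2", "disk", 0, "DISK OK: / 40.0% used")
    consume(actions)
    print(sent)  # [('mail', 'disk'), ('tsdb', 'vm-1.cpu.idle 97.5 1467244800')]
```

Decoupling producers and consumers through the queue is what lets clients keep publishing while the server, or several servers, process events at their own pace.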

In particular, measurements are sent to InfluxDB, which stores them and makes them available for querying and retrieval. Measurement data are displayed by Grafana.

Independently from Sensu/InfluxDB, log handling is performed by a combination of rsyslog, Elasticsearch, Logstash and Kibana. Rsyslog sends application logs to the Logstash server, which parses them, extracts meaningful information from the log text and stores it in Elasticsearch for further analysis. Kibana displays the various kinds of analyses that can be done with Elasticsearch.



5.5.1.2 OpenStack Monasca

Figure 11: MONASCA software architecture

Monasca is a full monitoring solution that binds together several software components. It uses a REST API for interaction and for receiving check results and measurement data. All data are put in a message queue and consumed by engines (alarm engine, notification engine, etc.). Processed data are stored in a database for further analysis, visualization and action execution. Monasca is developed in Python and Java, and uses Apache Kafka as the message queue, Apache Storm for real-time data computation and ZooKeeper to orchestrate components.
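Monasca's alarm engine evaluates alarm definitions whose expressions look like `avg(cpu.idle_perc{hostname=vm-1}, 60) < 10 times 3`: a statistic computed over fixed periods, compared with a threshold, and required to hold for several consecutive periods. The Python sketch below reproduces only that evaluation idea for a single sub-expression. It is a simplification for illustration, not the Storm-based engine, and the returned state names mirror Monasca's `OK`/`ALARM`/`UNDETERMINED`.

```python
import operator
from statistics import mean

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
FUNCS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}


def evaluate_alarm(samples, func, op, threshold, period, times, now):
    """Evaluate `func(metric, period) op threshold times N` on (ts, value) samples.

    Simplified semantics: "UNDETERMINED" if any of the last `times` periods has
    no data, "ALARM" if the condition holds in all of them, "OK" otherwise.
    """
    results = []
    for i in range(times, 0, -1):  # oldest period first
        start, end = now - i * period, now - (i - 1) * period
        window = [v for ts, v in samples if start <= ts < end]
        if not window:
            return "UNDETERMINED"
        results.append(OPS[op](FUNCS[func](window), threshold))
    return "ALARM" if all(results) else "OK"


if __name__ == "__main__":
    # cpu.idle_perc samples every 20 s over the last 3 minutes
    idle = [(t, 5.0) for t in range(0, 180, 20)]
    print(evaluate_alarm(idle, "avg", "<", 10, period=60, times=3, now=180))  # ALARM
```

Requiring the condition over several consecutive periods (`times 3`) filters out short spikes that would otherwise trigger elasticity actions needlessly.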



5.6 Comparative analysis

Requirement | Monasca | Sensu/InfluxDB/ELK | Zabbix
Scalability | Yes | Yes | Yes³
Elasticity | Yes, but exclusion of a monitored resource can trigger incorrect alarms | Yes, but exclusion of a monitored resource can trigger incorrect alarms | Partial. Supports auto-discovery of hosts, but depends on correct configuration. Exclusion of a monitored resource can trigger incorrect alarms.
Migration | Yes | Yes | Yes, if the resource keeps the same IP address or DNS name.
Adaptability⁴ | Not supported | Not supported | Static. Monitored items can be configured with different update intervals.
Autonomicity | Yes | Yes | Yes
Federation | Yes | Yes | Yes
Timeliness⁵ | Yes | Yes | Yes (minimum update interval of 1 second)
Comprehensiveness | Yes | Yes | Yes
Resilience | Yes, but depends on the resilience architecture of the building blocks | Yes, but depends on the resilience architecture of the building blocks | Yes, except for the server and database.
High availability | Yes, but depends on the HA of the building blocks | Yes, but depends on the HA of the building blocks | Yes; a scalable configuration also results in high availability.
Accuracy | Yes | Yes | Yes
Extensibility | Yes (open source, documented API) | Yes (open source, documented API) | Yes (open source, documented API)
Usability | Integrates with other OpenStack products (Horizon) | The interfaces of the different products can confuse users | Good usability, although it uses an old interface design.
Integration | Great, supports Nagios plugins and many others. | Great, supports Nagios plugins and many others. | Great, supports Nagios plugins and many others.
Security | User authentication and authorisation for metric access. | Only user authentication | Yes for the communication channel (version >= 3.0), but not for the database.
Alert triggering | Yes | Yes | Yes
Single sign-on | Integrates with OpenStack Keystone. | Not supported natively; can be implemented by the HTTP server. | Basic. Supports HTTP Basic Auth (integrated with Apache HTTP) and LDAP.

Table 7: Requirements Related to Monitoring

³ Zabbix can scale by scaling its underlying components (MySQL, OS, HTTP Server). A good discussion is presented in http://blog.zabbix.com/scalable-zabbix-lessons-on-hitting-9400-nvps/2615/.
⁴ Adaptability of agents running on hosts facing heavy load.
⁵ Under normal workload.

All metrics and measurements should be kept in a resilient database. Rather than defining a static retention period, a better approach would be to establish a disk usage threshold, which makes more meaningful use of the available storage. Once that limit is reached, old data should be aggregated over wider time spans.
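The retention policy proposed above can be sketched as follows: while the estimated storage used by a series exceeds a disk budget, its oldest raw points are merged into coarser buckets (here, their mean over a wider time span). The Python code is a simplified illustration; the per-point storage cost `POINT_BYTES`, the budget and the bucket width are hypothetical parameters, and a real deployment would rely on the database's own downsampling features.

```python
from statistics import mean

POINT_BYTES = 16  # hypothetical storage cost of one stored point


def enforce_budget(points, budget_bytes, bucket_seconds):
    """Downsample the oldest points until the series fits in `budget_bytes`.

    points: list of (timestamp_seconds, value) sorted by timestamp.
    Old points are replaced by one mean value per `bucket_seconds`-wide bucket;
    recent points stay at full resolution. If even full compaction does not
    fit, a wider bucket would be needed (not handled here).
    """
    points = list(points)
    compacted = []  # aggregated (bucket_start, mean) pairs
    while points and (len(points) + len(compacted)) * POINT_BYTES > budget_bytes:
        bucket = points[0][0] // bucket_seconds * bucket_seconds
        members = [v for ts, v in points if bucket <= ts < bucket + bucket_seconds]
        points = [p for p in points if p[0] >= bucket + bucket_seconds]
        compacted.append((bucket, mean(members)))
    return compacted + points


if __name__ == "__main__":
    series = [(t, float(t)) for t in range(10)]
    print(enforce_budget(series, budget_bytes=6 * POINT_BYTES, bucket_seconds=5))
```

Compared with a fixed retention period, this keeps as much recent detail as the disk allows and only sacrifices resolution, not data, for older measurements.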

5.6.1 Proposed API

The monitoring system requires a programming interface (API) for interacting with and managing monitoring-related resources. Such an API should be able to:

- include and remove monitored resources (servers and services)
- activate and deactivate monitored resources
- assign monitored resources to a set of metrics or alarms
- include and remove metric probes and alarm checks
- provide an entry point to query collected data
- collect measurement data
- get the status of alarm checks
- bind actions (alarm handlers such as e-mail, SMS, etc.) to alarms or metrics based on conditions

The proposed API is a RESTful JSON API that may be used by clients implemented in different programming languages. Clients will use the HTTP or HTTPS protocol, and all requests must be authenticated using a special HTTP header. Complete documentation of the API, including all technical details, is out of the scope of this document.
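A client of the proposed API needs little more than an HTTP library. The Python sketch below prepares, but does not send, a request that stores one measurement. The header name `X-Auth-Token` and the JSON field names are hypothetical, since the authentication header and the payload format are not specified here, and the base URL is illustrative.

```python
import json
import urllib.request

AUTH_HEADER = "X-Auth-Token"  # hypothetical name for the "special HTTP header"


def measurement_request(base_url, token, metric, value, timestamp_ms, dimensions=None):
    """Prepare POST /v1/metrics/measurements with a JSON body."""
    body = {  # hypothetical payload layout
        "metric": metric,
        "value": value,
        "timestamp": timestamp_ms,
        "dimensions": dimensions or {},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/metrics/measurements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", AUTH_HEADER: token},
        method="POST",
    )


if __name__ == "__main__":
    req = measurement_request("https://monitor.example.org/", "TOKEN",
                              "cpu.idle_perc", 97.5, 1467244800000, {"hostname": "vm-1"})
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running server.
```

Note that `urllib.request` normalizes header names (`X-Auth-Token` is stored as `X-auth-token`); HTTP header names are case-insensitive, so servers accept either form.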

Resource path | Description
/v1/metrics | Entry point to query, create, edit and delete metrics.
/v1/metrics/measurements | Allows storing measurement values with a timestamp for a specific metric or a set of metrics, and querying original or aggregated values for measurements.
/v1/metrics/statistics | Entry point to calculate and retrieve statistics about metrics (average, minimum and maximum values, sum and count) for a period of time.
/v1/alarms | Entry point to query, create, edit and delete alarms.
/v1/hosts | Operations on monitored hosts. Returns the list of monitored hosts.
/v1/groups | Allows adding monitored resources as group members, and adding metrics and/or events to group members.
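The statistics resource can be served from stored measurements by a simple aggregation over the requested period. The following Python sketch computes the five statistics listed for `/v1/metrics/statistics`; treating the period as the half-open interval `[start, end)` and returning `None` for the average, minimum and maximum of an empty period are our assumptions.

```python
def period_statistics(measurements, start, end):
    """Statistics of the (timestamp, value) pairs with start <= timestamp < end."""
    values = [v for ts, v in measurements if start <= ts < end]
    if not values:
        return {"count": 0, "sum": 0.0, "avg": None, "min": None, "max": None}
    total = sum(values)
    return {
        "count": len(values),
        "sum": total,
        "avg": total / len(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    data = [(0, 2.0), (60, 4.0), (120, 9.0), (180, 1.0)]
    print(period_statistics(data, 0, 180))
    # {'count': 3, 'sum': 15.0, 'avg': 5.0, 'min': 2.0, 'max': 9.0}
```

Using a half-open interval lets consecutive periods tile the time axis without counting a boundary measurement twice.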

5.7 Conclusion

The Sensu and Zabbix solutions are mature and working solutions, but choosing Monasca offers a great opportunity to contribute to a worldwide project on cloud platforms and to establish a cloud standard for the EUBra-BIGSEA project, as it will become part of OpenStack. The OpenStack ecosystem is rich in members from diverse areas, both academia and industry, which could provide us with good practices and the state of the art in monitoring solutions. The short-term roadmap of the Monasca project also aligns well with EUBra-BIGSEA, as it considers the use of the ELK framework to manage logs. Monasca also provides standard patterns for creating metrics and extending the whole monitoring system. Support for higher levels of abstraction is provided through StatsD or by directly posting to its metrics API. Monasca webhook notifications can also be used to trigger actions in the reactive elasticity modules.
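The StatsD route mentioned above needs only one UDP datagram per measurement in the plain StatsD text format `name:value|type` (`g` for gauges, `c` for counters, `ms` for timers). The Python sketch assumes a StatsD-compatible listener, such as the one shipped with the Monasca agent, on the conventional port 8125; the metric name is illustrative, and the demo uses a local socket as the receiver.

```python
import socket

TYPES = {"gauge": "g", "counter": "c", "timer": "ms"}


def statsd_packet(name, value, kind="gauge"):
    """Encode one measurement in the plain StatsD text format."""
    if any(ch in name for ch in ":|@\n"):
        raise ValueError(f"reserved character in metric name {name!r}")
    return f"{name}:{value}|{TYPES[kind]}".encode("ascii")


def send_statsd(name, value, kind="gauge", host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send, as StatsD clients usually do."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_packet(name, value, kind), (host, port))


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # stand-in for the StatsD daemon
    receiver.settimeout(2)
    send_statsd("bigsea.jobs.running", 3, port=receiver.getsockname()[1])
    print(receiver.recvfrom(1024)[0])  # b'bigsea.jobs.running:3|g'
    receiver.close()
```

Because sending is fire-and-forget over UDP, instrumenting an application this way adds negligible latency even when the monitoring service is slow or unavailable.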

Monasca integrates with OpenStack Keystone, a federated authentication and authorization solution, offering the possibility of limiting the visibility of metrics by scoping them to a specific project. Monasca also meets all the essential requirements, such as measurement collection, high availability, extensibility, alarm triggering and a rich interface for creating, configuring and visualizing alarms.
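Project scoping is obtained when the client requests its token from Keystone. With the Identity v3 API, a project-scoped password authentication is a `POST /v3/auth/tokens` whose JSON body is built below; Keystone returns the token in the `X-Subject-Token` response header, and the client then sends it to Monasca as `X-Auth-Token`. The user, project and domain names are placeholders, and the sketch only builds the body.

```python
import json


def scoped_token_request(user, password, project, domain="Default"):
    """Body for a Keystone v3 password authentication scoped to one project."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {"name": user, "domain": {"name": domain}, "password": password}
                },
            },
            # Scoping the token limits the metrics visible to this project.
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(scoped_token_request("monitor", "secret", "eubra-bigsea"), indent=2))
```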



6 TECHNICAL PROCEDURES

The project will not impose the migration of components that are already available in a specific repository to a central one. However, the use of a common repository for new developments is encouraged.

Currently, the components identified and their availability are:

- Mesos, https://github.com/apache/mesos
- YARN, https://github.com/apache/hadoop
- MYRIAD, https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=summary
- Marathon, https://github.com/mesosphere/marathon
- Chronos, https://github.com/mesos/chronos
- Spark, https://github.com/apache/spark (project website: http://spark.apache.org/)
- Infrastructure Manager, https://github.com/grycap/im
- TOSCA Parser, https://github.com/openstack/tosca-parser
- CLUES, https://github.com/grycap/clues
- CloudVAMP, https://github.com/grycap/cloudvamp
- EC3, https://github.com/grycap/ec3
- COMPSs, https://www.bsc.es/computer-sciences/grid-computing/comp-superscalar

The project will set up a set of public repositories:

- An EUBra-BIGSEA GitHub repository has been created, including forks of the existing repositories (https://github.com/eubra-bigsea).

- An EUBra-BIGSEA Docker Hub organisation can be created to store the corresponding basic container images and any other image derived from the application analysis (https://hub.docker.com/r/eubrabigsea/).



7 CONCLUSIONS

This document has presented a thorough review of the components that will form the QoS cloud services architecture for the Big Data analytics platform developed in EUBra-BIGSEA. The deliverable initially focused on the monitoring architecture, but it has been extended to describe the whole layer of services for the execution and monitoring of cloud services.

The document proposes the use of Mesos as the basic foundation for the management of distributed resources, as it is a widely used component that provides isolation, avoids the fragmentation of data centres and achieves high availability and reliability. Mesos will be enhanced with the capability of automatically increasing resources, to provide a real adaptation of the workload to the powered-on resources while remaining agnostic to the upper layers. The project identifies three types of workloads (persistent, periodic batch and interactive jobs), which will be served by different schedulers: persistent jobs by Marathon, periodic jobs by Chronos, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that embed the executable services and negotiate the resources with Mesos. Those requests will be intercepted to guarantee the availability of resources in Mesos (e.g. by adding more resources to the system).
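The workload-to-scheduler mapping above can be expressed as a simple dispatch function. The Python sketch is hypothetical: the job fields (`schedule`, `interactive`) and the classification rules are our assumptions, not the project's submission interface; the example schedule uses the ISO 8601 repeating-interval notation that Chronos accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    schedule: Optional[str] = None  # hypothetical: repeating interval for periodic jobs
    interactive: bool = False       # hypothetical: True for shell sessions (e.g. Spark shell)


def choose_scheduler(job: Job) -> str:
    """Return the scheduler that should serve the job, per the three workload types."""
    if job.interactive:
        return "interactive-shell"  # e.g. a Spark shell negotiating with Mesos
    if job.schedule:
        return "chronos"            # periodic batch jobs
    return "marathon"               # persistent (long-running) jobs


if __name__ == "__main__":
    for job in (Job("web-api"),
                Job("nightly-etl", schedule="R/2016-07-01T02:00:00Z/P1D"),
                Job("exploration", interactive=True)):
        print(job.name, "->", choose_scheduler(job))
```

Checking interactivity first reflects that an interactive session should never be queued behind batch scheduling, whatever other attributes the job has.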

Therefore, at the level of the cloud services, the project will capture the framework requests and interact with Mesos, as well as deal with the provision and management of base VMs to pursue an efficient use of resources.

QoS cloud services will deal with the reactive elasticity of the Mesos cluster and of the physical (virtual) resources on which Mesos is based. Frameworks could request additional resources in case of application starvation, depending on the profile of the application. The monitoring system will then trigger such actions.
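A reactive policy of this kind reduces to deciding how many base VMs to add when pending framework requests cannot be satisfied by the free resources of the cluster. The Python sketch below is a hypothetical policy (ceiling division of the unmet CPU and memory demand by the size of one VM, capped by a maximum cluster size); the sizing rules actually used in EUBra-BIGSEA may differ.

```python
import math


def nodes_to_add(pending_cpus, pending_mem_mb, free_cpus, free_mem_mb,
                 node_cpus, node_mem_mb, current_nodes, max_nodes):
    """Number of VMs to provision so pending requests fit in the Mesos cluster."""
    missing_cpus = max(0.0, pending_cpus - free_cpus)
    missing_mem = max(0.0, pending_mem_mb - free_mem_mb)
    # The scarcer resource (CPU or memory) determines how many VMs are needed.
    needed = max(math.ceil(missing_cpus / node_cpus),
                 math.ceil(missing_mem / node_mem_mb))
    return max(0, min(needed, max_nodes - current_nodes))  # respect the cluster cap


if __name__ == "__main__":
    # 10 CPUs / 8 GB pending, 2 CPUs / 4 GB free, VMs of 4 CPUs / 8 GB
    print(nodes_to_add(10, 8192, 2, 4096, 4, 8192, current_nodes=3, max_nodes=10))
```

In this example 8 CPUs are missing, which requires two 4-CPU VMs, while the 4 GB memory shortfall alone would need only one, so two VMs are requested.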

EUBra-BIGSEA will contribute to this ecosystem in several aspects. First, it will provide reactive elasticity to the resource management platform, which will use TOSCA as a standard specification to enhance portability; this will enable Mesos clusters to grow and shrink within data-centre boundaries. Second, it will create a single submission point that will choose among the different schedulers based on the job features. Third, it will provide reactive policies connected to the monitoring system to ensure that deadlines are met. Fourth, it will integrate monitoring and notification systems with proactive policies that learn from past executions, reducing the need for fine-tuning the reactive policies.

The choice for the monitoring system is Monasca, a young OpenStack project for monitoring. It has been selected for its suitability to the EUBra-BIGSEA use case and for the opportunities that could arise from collaboration with OpenStack. Monasca will monitor the resources, services and applications, triggering actions if required.



8 REFERENCES

[R1] TOSCA (Topology and Orchestration Specification for Cloud Applications), https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca

[R2] OpenNebula, http://opennebula.org

[R3] OpenStack, http://www.openstack.org

[R4] YAML, http://yaml.org/

[R5] Consul, https://www.consul.io/

[R6] Apache ZooKeeper, https://zookeeper.apache.org/

[R7] COMPSs, https://www.bsc.es/computer-sciences/grid-computing/comp-superscalar

[R8] Ophidia, http://ophidia.cmcc.it/

[R9] Clayman, Stuart et al. "Monitoring Service Clouds in the Future Internet." Future Internet Assembly, Apr. 2010: 115-126.

[R10] Wang, Chengwei et al. "A flexible architecture integrating monitoring and analytics for managing large-scale data centers." Proceedings of the 8th ACM International Conference on Autonomic Computing, 14 Jun. 2011: 141-150.

[R11] Aceto, Giuseppe et al. "Cloud monitoring: A survey." Computer Networks 57.9 (2013): 2093-2115.

[R12] "Monasca - OpenStack." 2014. 13 May 2016. Available at https://wiki.openstack.org/wiki/Monasca.



GLOSSARY

Term | Definition | Category
ACL | Access Control Lists | Security
API | Application Programming Interface | Interfacing
BoT | Bag of Tasks | Programming Model
CHRONOS | A fault-tolerant job scheduler for Mesos | Scheduler
CLOUDVAMP | Cloud Virtual Machine Automatic Memory Procurement | Elasticity
CLUES | CLUster Energy SavingS | Elasticity
CMP | Cloud Management Platform (e.g. OpenNebula or OpenStack) | Resource Management
COMPSS | COMP Superscalar (COMPSs) is a programming model which aims to ease the development of applications for distributed infrastructures, such as Clusters, Grids and Clouds | Programming Model
CSV | Comma-Separated Values | Data Type
DBMS | Database Management System | Monitoring Service
DOCKER | An open platform for distributed applications for developers and sysadmins | Scheduler
DOCKERHUB | A cloud-hosted service from Docker that provides registry capabilities for public and private content | Scheduler
EC3 | Elastic Compute Cluster in the Cloud | Elasticity
ELK | Elastic Stack: Elasticsearch, Logstash, Kibana and Beats | Monitoring
GITHUB | A web-based repository hosting service for the Git version control system | Repository
HPC | High-Performance Computing | Computing Architecture
IAAS | Infrastructure as a Service | Resource Management
IM | Infrastructure Manager | Resource Management
INFLUXDB | A platform for collecting and managing time-series data | Monitoring Service
JSON | JavaScript Object Notation | Data Type
MARATHON | A container orchestration platform for Mesos | Scheduler
MESOS | A resource management platform that abstracts CPU, memory, storage and other compute resources away from machines | Resource Management
MONASCA | An open-source, multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution that integrates with OpenStack | Monitoring Service
MS | Monitoring Systems | Monitoring Service
Myriad | Deploy Apache YARN applications using Apache Mesos | Resource Management
NoSQL | Not Only SQL | Database paradigm
OPHIDIA | A CMCC Foundation research project addressing big data challenges for eScience | Database engine
QoS | Quality of Service | Scheduler
REST | REpresentational State Transfer | Interfacing
RMF | Resource Management Framework | Resource Management
RSYSLOG | Rocket-fast System for LOG processing | Monitoring Service
SLA | Service Level Agreement | Scheduler
SPARK | A fast and general engine for large-scale data processing | Programming Model
TOSCA | Topology and Orchestration Specification for Cloud Applications | Resource Management
VM | Virtual Machine | Resource Management
YAML | YAML Ain't Markup Language | Resource Management
YARN | Yet Another Resource Negotiator | Scheduler
ZABBIX | Real-time monitoring solution | Monitoring Service
Zookeeper | A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services | Resource Management