37
How the INDIGO-DataCloud computing platform aims at helping scientific communities RIA-653549 Giacinto DONVITO INDIGO Technical Director INFN Bari [email protected]

How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

HowtheINDIGO-DataCloudcomputingplatformaims

athelpingscientificcommunities

RIA-653549Giacinto DONVITO

INDIGOTechnicalDirectorINFNBari

[email protected]

Page 2: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGO-DataCloud

• AnH2020projectapprovedinJanuary2015intheEINFRA-1-2014call• 11.1M€,30months (fromApril2015toSeptember2017)

• Who:26Europeanpartnersin11Europeancountries• CoordinationbytheItalianNationalInstituteforNuclearPhysics(INFN)• Includingdevelopersofdistributedsoftware,industrialpartners,researchinstitutes,universities,e-infrastructures

• What:developanopensourceCloudplatform forcomputinganddata(“DataCloud”)tailoredtoscience.

• For:multi-disciplinaryscientificcommunities• E.g.structuralbiology, earthscience,physics,bioinformatics, culturalheritage,astrophysics,lifescience,climatology

• Where:deployableonhybrid(publicorprivate)Cloudinfrastructures• INDIGO=INtegratingDistributeddataInfrastructuresforGlobalExplOitation

• Why:answertothetechnologicalneedsofscientistsseekingtoeasilyexploitdistributedCloud/Gridcomputeanddataresources. 2

Page 3: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

FromthePaper“AdvancesinCloud”

• ECExpertGroupReportonCloudComputing,http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf

To reach the full promises of CLOUD computing, major aspects have not yet beendeveloped and realised and in some cases not even researched. Prominent among theseare open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaSlevels. A second issue is managing multitenancy at large scale and in heterogeneousenvironments. A third is dynamic and seamless elasticity from in- house CLOUD to publicCLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is datamanagement in a CLOUD environment: bandwidth may not permit shipping data to theCLOUD environment and there are many associated legal problems concerning securityand privacy. All these challenges are opportunities towards a more powerful CLOUDecosystem.[…] A major opportunity for Europe involves finding a SaaS interoperable solution acrossmultiple CLOUD platforms. Another lies in migrating legacy applications without losingthe benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.

3

Page 4: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGOAddressesCloudGaps

• INDIGOfocusesonusecasespresentedbyitsscientificcommunities toaddressthegapsidentifiedbythepreviouslymentionedECReport,withregardto:• Redundancy/reliability• Scalability(elasticity)• Resourceutilization• Multi-tenancyissues• Lock-in• MovingtotheCloud• Datachallenges:streaming,multimedia,bigdata• Performance

• Reusingexistingopensourcecomponentswhereverpossibleandcontributingtoupstreamprojects (suchasOpenStack,OpenNebula,Galaxy,etc.)forsustainability.

4IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 5: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGOandotherEuropeanProjects• TheINDIGOservicesarebeingdevelopedaccordingtotherequirementscollectedwithinmanymultidisciplinaryscientificcommunities,suchasELIXIR,WeNMR,INSTRUCT,EGI-FedCloud,DARIAH,INAF-LBT,CMCC-ENES,INAF-CTA,LifeWatch-Algae-Bloom,EMSO-MOIST,EuroBioImaging.However,theyareimplementedsothattheycanbeeasilyreusedbyotherusercommunities.• INDIGOhasstrongrelationshipswithcomplementaryinitiatives,suchasEGI-EngageontheoperationalsideandAARCwithrespecttoAuthN/AuthZ policies.UsersofEC-fundedinitiativessuchasPRACE andEUDAT arealsoexpectedtobenefitfromthedeploymentofINDIGOcomponentsinsuchinfrastructures.• SeveralNational/Regionalinfrastructuresarecoveredbythe26INDIGOpartners,locatedin11Europeancountries.• INDIGOismentionedintherecentImportantProjectofCommonEuropeanInterest(IPCEI) fortheexploitationofHPCandHTCresourcesatnational,regionalandEuropeanlevels.

5IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 6: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

WorkPackages

6IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 7: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGO-DataCloudGeneralArchitecture

7

JSAGA/JSAGAAdaptorsFuture GatewayEngineFuture GatewayRESTAPI

OtherScienceGateways

Mobile Apps

OpenMobileToolkit

Ophidpiaplugin

LONIplugin

Taverna,Keplerplugin

AdminPortlets

UserPortlets

DataAnalitics

WorkflowPortlets

SGMonGUIClients

FutureGatewayPortal WorkflowsMobileclientsSupportservices

WP6Services

Kubernetes Cluster

IAM

Service

PaaS

Orchestrator

QoS/SLA

CloudProvider

Ranker

Monitoring

Infrastructure

Manager

TOSCA

TOSCAWP5

Services

Onedata Dynafed

FTSDataServices

REST/CDMI/Wedbav/posix/GridftpOIDC

Accounting

Non-INDIGO

IaaS

NativeIaaS API

Heat/IM

TOSCA

WP4Services

Mesos

ClusterMesos

Cluster

Aut.Scaling

Service

Storage

Service

S3/CDMI/Posix/WebdavGridFTP

Smart

Scheduling

SpotIstances

Native

Docker

QoS Support

Identity

Armonization

Local

Repository

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 8: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

IaaSFeatures(1)

• Improvedschedulingforallocationofresources bypopularopensourceCloudplatforms,i.e.OpenStackandOpenNebula.• Enhancementswilladdressbothbetterschedulingalgorithmsandsupportforspot-instances.Thelatterareinparticularneededtosupportallocationmechanisms similartothoseavailableonpubliccloudssuchasAmazonandGoogle.

• Wewillalsosupportdynamicpartitioningofresourcesamong“traditionalbatchsystems”andCloudinfrastructures(forsomeLRMS).

• SupportforstandardsinIaaSresourceorchestrationengines throughtheuseoftheTOSCAstandard.• ThisovercomestheportabilityandusabilityproblemthatwaysoforchestratingresourcesinCloudcomputingframeworkswidelydifferamongeachother.

• ImprovedIaaSorchestrationcapabilities forpopularopensourceCloudplatforms,i.e.OpenStackandOpenNebula.• EnhancementswillincludethedevelopmentofcustomTOSCAtemplatestofacilitateresourceorchestrationforendusers,increasedscalabilityofdeployedresourcesandsupportoforchestrationcapabilitiesforOpenNebula.

8IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 9: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

IaaSFeatures(2)

• ImprovedQoS capabilitiesofstorageresources.• Bettersupportofhigh-levelstoragerequirementssuchasflexibleallocationofdiskortapestoragespaceandsupportfordatalifecycle.Thisisanenhancementalsowithrespecttowhatiscurrentlyavailableinpublicclouds,suchasAmazonGlacierandGoogleCloudStorage.

• Improvedcapabilitiesfornetworkingsupport.• EnhancementswillincludeflexiblenetworkingsupportinOpenNebula andhandlingofnetworkconfigurationsthroughdevelopmentsoftheOCCIstandardforbothOpenNebula andOpenStack.

• ImprovedandtransparentsupportforDockercontainers.• IntroductionofnativecontainersupportinOpenNebula,developmentofstandardinterfacesusingtheOCCIprotocoltodrivecontainersupportinbothOpenNebulaandOpenStack.

9IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 10: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

PaaSFeatures(1)

• ImprovedcapabilitiesinthegeographicalexploitationofCloudresources.• Endusersneednottoknowwhereresourcesarelocated,becausetheINDIGOPaaSlayerishidingthecomplexityofbothschedulingandbrokering.

• StandardinterfacetoaccessPaaSservices.• Currently,eachPaaSsolutionavailableonthemarketisusingadifferentsetofAPIs,languages,etc.INDIGOwillusetheTOSCAstandardtohidethesedifferences.

• SupportfordatarequirementsinCloudresourceallocations.• Resourcescanbeallocatedwheredataisstored.

• IntegrateduseofresourcescomingfrombothpublicandprivateCloudinfrastructures.• TheINDIGOresourceorchestratoriscapableofaddressingbothtypesofCloudinfrastructuresthroughTOSCAtemplateshandledateitherthePaaSorIaaSlevel.

10IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 11: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

PaaSFeatures(2)

• Distributeddatafederations supportinglegacyapplicationsaswellashighlevelcapabilitiesfordistributedQoS andDataLifecycleManagement.• ThisincludesforexampleremotePosix accesstodata.

• IntegratedIaaSandPaaSsupportinresourceallocations.• Forexample,storageprovidedattheIaaSlayerisautomaticallymadeavailabletohigher-levelallocationresourcesperformedatthePaaSlayer.

• Transparentclient-sideimport/exportofdistributedClouddata.• Thissupportsdropbox-likemechanismsforimportingandexportingdatafrom/totheCloud.ThatdatacanthenbeeasilyingestedbyCloudapplicationsthroughtheINDIGOunifieddatatools.

• Supportfordistributeddatacachingmechanismsandintegrationwithexistingstorageinfrastructures.• INDIGOstoragesolutionsarecapableofprovidingefficientaccesstodataandoftransparentlyconnectingtoPosix filesystemsalreadyavailableindatacenters.

11IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 12: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

PaaSFeatures(3)

• Deployment,monitoringandautomaticscalabilityofexistingapplications.• Forexample,existingapplicationssuchaswebfront-endsorR-Studioserverscanbeautomaticallyanddynamicallydeployedinhighly-availableandscalableconfigurations.

• Integratedsupportforhigh-performanceBigDataanalytics.• ThisincludescustomframeworkssuchasOphidia(providingahighperformanceworkflowexecutionenvironmentforBigDataAnalyticsonlargevolumesofscientificdata)aswellasgeneralpurposeenginesforlarge-scaledataprocessingsuchasSpark,allintegratedtomakeuseoftheINDIGOPaaSfeatures.

• Supportfordynamicandelasticclustersofresources.• ResourcesandapplicationscanbeclusteredthroughtheINDIGOAPIs.Thisincludesforexamplebatchsystemson-demand(suchasHTCondor orTorque)andextensibleapplicationplatforms(suchasApacheMesos)capableofsupportingbothapplicationexecutionandinstantiationoflong-runningservices.

12IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 13: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

AAIFeatures

• Provideanadvancedsetoffeaturesthatincludes:• Userauthentication(supportingSAML,OIDC,X.509)• Identityharmonization(linkheterogeneousAuthN mechanismstoasingleVOidentity)• ManagementofVOmembership(i.e.,groupsandotherattributes)• Managementofregistrationandenrolmentflows• ProvisioningofVOstructureandmembershipinformationtoservices• Management,distributionandenforcementofauthorizationpolicies

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 14: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

StorageQualityofServiceandtheCloud

14

Amazon S3 Glacier

Google Standard DurableReducesAvailability Nearline

HPSS/GPFS CorrespondstotheHPSSClasses(customizable)

dCache Resilient TAPEdisk+tape

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 15: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Nextstep:DataLifeCycle

15

• DataLifeCycleisjustthetimedependentchangeof• StorageQualityofService• OwnershipandAccessControl(PIOwned,noaccess,SiteOwned,Publicaccess)• Paymentmodel:Payasyougo;Payinadvanceforrestoflifetime.• Maybeotherthings

6m 1years 10years

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 16: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

DataFederation

AmazonS3

DNS:p-aws-useast

INFNItaly

DockerOneclient

Docker

AWSUSA

DockerOnezone

VMonezone

DockerOneclient

Docker

NFSServer

VMoneprovider

VMnfs

VMoneclient

POSIXVolume

DockerOneclient

DockerUPVSpain

VM:demo-onedata-upv-provider

DockerOneclient

LaptopOSX

SAMBAExport

boot2docker

20IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 17: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

17

FrontendServices/Toolkit

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 18: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Integration schemas

• WeprovidethegraphicaluserinterfacesintheformofthescientificgatewaysandworkflowsandthewaytoaccesstheINDIGOPaaS servicesandsoftwarestack,andallowdefineandsetuptheon-demandinfrafortheWP2usecases.• Settingupwholeusecaseinfrastructure:Theadministratorwillbeprovidedwiththereadytousereceiptsthathewillbeabletocustomize.Thefinaluserswillbeprovidedwiththeserviceend-pointsandwillnotbeawareofthebackend.

• UsetheINDIGOfeaturesfromtheirownPortals: Usercommunities, havingtheirownScientificGatewaysetup,canexploittheFutureGateway RESTAPItodealwithINDIGOwholesoftwarestack.

• UseoftheINDIGOtoolsandportals, including theFutureGateway,ScientificWorkflowsSystems,BigDataAnalyticsFrameworks(likeOphidia),MobileApplications.InthisscenariothefinalusersaswellasdomainadministratorswillusetheGUItools.Theadministratorwilluseitasdescribedinfirstcase.Inadditiondomainspecificuserswillbeprovidedwithspecificportlets/workflows/apps thatwillallowgraphicalinteractionwiththeirapplicationsrunviaINDIGOsoftwarestack.

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 18

Page 19: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

FromCSGFtoFutureGateway

GridEngine

JSAGA

Portlet Portlet …

ClassicCSGF (before INDIGO)

Liferay/Glassfish

JSAGA

Portlet Portlet …

FutureGateway Approach (INDIGO)

Liferay/Tomcat

Comunication Portlet-GridEngine-JSAGAonly possiblewithJAVAlibraries

APIServer

Comunication Portlet-APIServerviaRESTAPIs,thisallowstoserveexternalapplicationsTheAPIServerinteractsviaJAVA librariestoJSAGA

RESTAPIs

Web/MobileApps

• ThesameRESTAPIscouldbeusedbyMobileApps

• ThoseAPIsmakeeasiertheinteractionwiththePaaS layer

• ThoseRESTAPIsprovideaneasyexploitationofINDIGOCapabilitiestonon-INDIGOApplications

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud19

Page 20: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Ophidia framework

• Ophidia isabigdataanalyticsframeworkforeScience• Primarilyusedfortheanalysisofclimatedata,exploitableinmultipledomains• “Datacube”abstractionandOLAP-basedapproachforbigdata• Supportforarray-baseddataanalysisandscientificdataformats• Parallelcomputingtechniquesandsmartdatadistributionmethods• ~100array-basedprimitivesand~50datacubeoperators

• i.e.:datasub-setting, dataaggregation,array-basedtransformations,datacube roll-up/drill-down,datacubeimport,etc.

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 20

Page 21: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGOmoduleforKepler

• TheKeplerscientificworkflowsystemisanopensourcetoolthatenablescreation,executionandsharingofworkflowsacrossabroadrangeofscientificandengineeringdisciplines.• FirstversionoftheINDIGOmoduledelivered,graduallyadded newfunctionalitiesavailablefortheusers.• INDIGOmodulebased ontheFutureGateway API• Atthemoment,itispossibletobuildworkflowsthatdefinetask,preparesinputsandtriggersexecution.WhileataskisexecutedwithinINDIGO'sinfrastructure,itispossibletocheckitsstatus.• FutureGateway APIclient: https://github.com/indigo-dc/indigoclient• Keplerbasedactors: https://github.com/indigo-dc/indigokepler

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 21

Page 22: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

22

Usecasesexamples

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 23: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Service Deployment andapplication execution

Integrating distributed data infrastructures with INDIGO-DataCloud 21

WearenowworkingonaddingaCaliconetworkconfiguration

Page 24: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Application execution tothe PaaS Layer

• TheINDIGOapproachtotheapplicationdistributionandexecutionis:• BasedonDocker• ExploitsMesos+Chronos• AlltheapplicationexecutionsaredescribedexploitingaTOSCATemplatesviasimpleAPIsorPortlets• Theinput/outputareautomaticallymanagedbythePaaS layer(viaOnedata andexternalendpoints)• Dependencies,retryonfailuressupportedbymeansofChronos• Geographicaldata-awareschedulingprovidebyINDIGOPaaS orchestrator

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 25: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

• chronos_job_1:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule: 'R0/2015-12-25T17:22:00Z/PT5M'• description: 'Executeapp'• command: /bin/bashrun.sh• uris:[]• retries:3• environment_variables:• INPUT_ONEDATA_SPACE: {get_input: InputOnedataSpace}• INPUT_PATH: {get_input: InputPath}• ....• artifacts:• image:• file: indigodatacloud/ambertools_app• type:tosca.artifacts.Deployment.Image.Container.Docker• requirements:• - host:docker_runtime1

• chronos_job_upload:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule: 'R0/2015-12-25T17:22:00Z/PT5M'• description: 'Uploadoutputdata'• command: /bin/bashrun.sh• retries:3

• environment_variables:• PROVIDER_HOSTNAME: <ONEDATA_PROVIDER_IP>• ONEDATA_TOKEN:<ROBOTToken>• ONEDATA_SPACE:<path>• INPUT_FILENAME: <inputfilename>• OUTPUT_FILENAME: <inputfilename-->coincindeswithamber-job-01

OUTPUT_FILENAME>• OUTPUT_PROTOCOL:http(s)|ftp(s)|S3|Swift|WebDav• OUTPUT_URL: <outputURL>• OUTPUT_CREDENTIALS: <e.g.username:password>• artifacts:• image:• file: indigodatacloud/jobuploader• type:tosca.artifacts.Deployment.Image.Container.Docker• requirements:• - host:docker_runtime1• - job_predecessor: chronos_job_1•• docker_runtime1:• type:tosca.nodes.indigo.Container.Runtime.Docker• capabilities:• host:• properties:• num_cpus: 0.5• mem_size: 512MB

Page 26: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

• tosca_definitions_version:tosca_simple_yaml_1_0• imports:• - indigo_custom_types:https://raw.githubusercontent.com/indigo-dc/tosca-

types/master/custom_types.yaml• description:>• TOSCAexamplesforspecifyingaChronos Jobthatrunsanapplicationusing

Onedata storage.• inputs:• input_onedata_token:• type:string• description:Usertokenrequiredtomounttheuser'sINPUTOnedata space• required:yes• output_onedata_token:• type:string• description:Usertokenrequiredtomounttheuser'sOUTPUTOnedata space.

Itcanbethesameastheinputtoken• required:yes• #data_locality:• #type:boolean• #description:FlagthatcontrolstheINPUTdatalocality:ifyestheorchestrator

willselectthebestprovider,ifnotheuserhastospecifytheprovidertobeused• #required:yes• input_onedata_providers:• type:list• description:ListoffavoriteOnedata providerstobeusedtomounttheInput

Onedata space.Ifnotprovided,datalocalityalgo willbeapplied.• entry_schema:• type:string• default:['']• required:no• output_onedata_providers:• type:list• description:ListoffavoriteOnedata providerstobeusedtomountthe

OutputOnedata space.Ifnotprovided,thesameprovider(s)usedtomounttheinputspacewillbeused.

• entry_schema:• type:string• default:['']• required:no• input_onedata_space:• type:string

• required:yes• output_path:• type:string• description:PathtotheoutputdatainsidetheOutputOnedata space• required:yes• output_filenames:• type:list• description:Listoffilenamesgeneratedbytheapplicationrun• entry_schema:• type:string• required:yes• cpus:• type:float• description:AmountofCPUsforthisjob• required:yes• mem:• type:float• description:AmountofMemory(MB)forthisjob• required:yes• topology_template:• node_templates:• chronos_job:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule:'R0/2015-12-25T17:22:00Z/PT5M'• name:'JOB_ID_TO_BE_SET_BY_THE_ORCHESTRATOR'• description:'Executeapp'• command:'/bin/bashrun.sh'• uris:[]• retries:3• environment_variables:• INPUT_ONEDATA_TOKEN: {get_input:input_onedata_token }• OUTPUT_ONEDATA_TOKEN: {get_input:output_onedata_token }• INPUT_ONEDATA_PROVIDERS: {get_input:input_onedata_providers }• OUTPUT_ONEDATA_PROVIDERS: {get_input:output_onedata_providers }• INPUT_ONEDATA_SPACE: {get_input:input_onedata_space }• ....• artifacts:• image:• file:indigodatacloud/ambertools_app• type:tosca.artifacts.Deployment.Image.Container.Docker

Page 27: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

UC:Awebportalthat exploits abatch systemtorunapplications

• Ausercommunitymaintainsa“vanilla”versionofportalandcomputingimageplussomespecificrecipestocustomizesoftwaretoolsanddata• Portalandcomputingarepartofthesameimagethatcantakedifferentroles.• Customizationmayincludecreatingspecialusers,copying(andregisteringintheportal)referencedata,installing(andagainregistering)processingtools.

• Typicallywebportalimagealsohasabatchqueueserverinstalled.

• Alltherunninginstancesshareacommondirectory.• Differentcredentials:end-userandapplicationdeployment.

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 28: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

UCInspiration:Galaxyonthecloud

• Galaxycanbeinstalledonadedicatedmachineorasafront/endtoabatchqueue.• Galaxyexposesawebinterfaceandexecutesalltheinteractions(includingdatauploading)asjobsinabatchqueue.• Requiresashareddirectoryamongtheworkingnodesandthefront/end.• Itsupportsaseparatestorageareafordifferentusers,managingthemthroughtheportal.

28IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 29: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

UC:Awebportalthat exploits abatchsystem torunapplications

1) Thewebportalisinstantiated,installedandconfiguredautomaticallyexploitingAnsible recipesandTOSCATemplates.

2) Aremoteposix shareisautomaticallymountedonthewebportalusingOnedata

3) Thesameposix shareisautomaticallymountedalsoonworkernodesusingOnedata

4) End-userscanseeandaccessthesamefilesviasimplewebbrowsersorsimilar.5) AbatchsystemisdynamicallyandautomaticallyconfiguredviaTOSCA

Templates6) Theportalisautomaticallyconfiguredinordertoexecutejobonthebatch

cluster7) Thebatchclusterisautomaticallyscaledup&downlookingatthejobloadon

thebatchsystem.IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 29

Page 30: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

UC:UseCaseLifecycle

• Preliminary• Theusecaseadministratorcreatesthe“vanilla”imagesoftheportal+computingimage.

• Theusecaseadministrator,withthesupportofINDIGOexperts,writestheTOSCAspecificationoftheportal,queue,computingconfiguration.

• Group-specific• Theusecaseadministrator,withthesupportofINDIGOexperts,writesspecificmodulesforportal-specificconfigurations.

• Theusecaseadministratordeploysthevirtualappliance.• Dailywork• UsersAccesstheportalasifitwaslocallydeployedandsubmitJobstothesystemastheywouldhavebeenprovisionedstatically.

30IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 31: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

UC:AGraphic Overview

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Future GatewayAPIServer

WP6

WP5

Front-EndPublic IP

Provider

User2)Deploy TOSCAwithVanilla VM/Container

1)StageData

5)Mount

6)AccessWebPortal

Galaxy

4)Install /Configure

WNWNWN …

VirtualElastic Cluster

Orchestrator

IM

OpenNebula

WP4

Other PaaSCore Services

CloudSite

OpenStack

HeatClues

IM

31

TOSCADocuments andDockerfiles perUseCase

INDIGO-DataCloudDocker Hub Organization

Champion+JRA

1.a.1)build,push

1.a.2)Dockerfile(commit)

1.b)AutomatedBuild

Page 32: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

ApossiblePhenomenal-INDIGOintegrationscenario

• Phenomenalalreadyrelyonaveryrichset-upexploitingMesos• INDIGOisabletoprovideacustomizableenvironmentwhereanaprioricomplexclustercouldbedeployedinanautomaticway:• UsingaspecificTOSCATemplatebuildwiththeexpertiseoftheINDIGOPaaS developers

• INDIGOcouldprovidetoPhenomenal:• (Automatic)Resourceprovisioningexploitinganykindofcloudenvironment(privateorpublic)

• Reactingonthemonitoringthestatusoftheservicesistantiated• AdvancedandflexibleAAIsolution• Advancedandflexibledatamanagementsolution• Advancedschedulingacrossmanycloudproviderbasedon:

• SLA/QoS,Datalocation,availabilitymonitoringandrankedwithhighlyflexiblerules• Easytousewebinterfacebothfortheendusersandfortheservicesadmin/developers

32IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 33: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Phenomenal exploiting INDIGO

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Future GatewayAPIServer

WP6

WP5

MesosMasters

Public IP

Provider

User2)Deploy TOSCAwithVanilla VM/Container

1)StageData

5)Mount

6)AccessMesosServices

Chronos/Marathon

4)Install /Configure

Workers…

VirtualElastic MesosCluster

Orchestrator

IM

OpenNebula

WP4

Other PaaSCore Services

CloudSite

OpenStack

HeatClues

IM

33

TOSCADocuments andDockerfiles perUseCase

INDIGO-DataCloudDocker Hub Organization

Champion+JRA

1.a.1)build,push

1.a.2)Dockerfile(commit)

1.b)AutomatedBuild

Workers

Page 34: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGOFAQ

• HowdoINDIGOachieveresourceredundancyandhighavailability?• Thisisachievedatmultiplelevels:

• atthedatalevel,redundancycanbeimplemented exploitingthecapabilityofINDIGO'sOnedata ofreplicatingdataacrossdifferentdatacenters.

• atthesitelevel, itispossibletoaskforcopiesofdatatobeforexampleonbothdiskandtapeusingtheINDIGOQoS storagefeatures.

• forservices,theINDIGOarchitectureusesMesos andMarathontoprovideautomaticservicehigh-availabilityandloadbalancing.Thisautomationiseasilyobtainableforstateless services; forstatefulservicesthisisapplication-dependent butitcannormallybeintegratedintoMesos through,forexample,acustomframework(examplesofwhichareprovidedbyINDIGO).

• HowdoINDIGOachieveresourcescalability?• Firstofall,wecandistinguishbetweenvertical(scaleup)andhorizontal(scaleout)scalability.INDIGOprovidesboth:• Mesos andMarathonhandleverticalscalabilitybydeployingDocker containerswithanincreasingamountofresources.

• TheINDIGOPaaS OrchestratorhandleshorizontalscalabilitythroughrequestsmadeattheIaaS leveltoaddresourceswhenneeded. 34

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 35: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

INDIGOFAQ

• HowdoINDIGOachieveresourcescalability?• TheINDIGOsoftwaredoesthisinasmartway,i.e.forexampleitdoesnotlookatCPUloadonly:• InthecaseofadynamicallyinstantiatedLRMS,itchecksthestatusofjobsandqueuesandaccordinglyaddsorremovecomputingnodes.

• InthecaseofaMesos cluster,incasethereareapplicationstostartandtherenofreeresources,INDIGOstartsupmorenodes.ThishappenswithinthelimitsofthesubmittedTOSCAtemplates.Inotherwords,anygivenuserstayswithinthelimitsoftheTOSCAtemplatehehassubmitted;thisistruealsoforwhatregardsaccountingpurposes.

• Howdoyouknowwhenandwhereresourcesareavailable?• WeareextendingtheInformationSystemavailableintheEuropeanGridInfrastructure(EGI)toinformtheINDIGOPaaS orchestratorabouttheavailableIaaSinfrastructuresandabouttheservicestheyprovide.ItisthereforepossiblefortheINDIGOorchestratortooptimallychooseacertainIaaS infrastructuregiven,forexample,thelocationofacertaindataset.

35IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 36: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Conclusions

• Firstofficialreleasewillbe:endofJuly

• Thefirstprototypeisalreadyavailable:• Notalltheservicesandfeaturesareavailable• Thisisforinternalevaluation,butalreadysomeservicescouldbetested

• Alotofimportantdevelopmentarebeingcarriedonwiththeoriginaldeveloperscommunitysothatthecodemantenance isnot(only)inourhands

36 IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Page 37: How the INDIGO-DataCloud computing platform aims at ......INDIGO-DataCloud • An H2020 project approved in January 2015 in the EINFRA-1-2014 call • 11.1M€, 30 months (from April

Thankyou

https://www.indigo-datacloud.euBetterSoftwareforBetterScience.

37IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud