ENVRIplus DELIVERABLE

A document of ENVRIplus project - www.envri.eu/envriplus

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant

D9.2 Service deployment in computing and data e-Infrastructures
Version 2

WORK PACKAGE 9 – Service Validation and Deployment
LEADING BENEFICIARY: EGI Foundation

Author(s): Beneficiary/Institution

Yin Chen    EGI Foundation
Ingemar Haggstrom    EISCAT Science Association
Justin Buck    BODC
Markus Stocker    TIB
Thierry Carval    EuroArgo/Ifremer
Domenico Vitale    University of Tuscia
Robert Huber    University of Bremen
Margareta Hellstrom    ICOS/Lund
Leonardo Candela    CNR-ISTI
Florian Haslinger    ETHZ

Accepted by: Yannick Légre (WP 9 leader)
Deliverable type: [DEMONSTRATOR]
Dissemination level: PUBLIC
Deliverable due date: 31.8.2018/M40
Actual Date of Submission: 31.8.2019/M40


ABSTRACT

This deliverable reports the group efforts of Work Package 9 task T9.1 on service integration and deployment from Sep 2017 (M29) to Aug 2018 (M40). It is an update of the previous deliverable D9.1. It reports the implementation results of community use cases for testing and validating ENVRIplus Theme 2 service solutions.

Project internal reviewer(s): Beneficiary/Institution

Paul Martin    University of Amsterdam
Vito Vitale    CNR-ISAC
Anca Hienola    FMI

Document history:

Date    Version
1.6.2018    Draft for comments
30.7.2018    Completed version for internal review
31.8.2018    Accepted by Theme 2 leaders

DOCUMENT AMENDMENT PROCEDURE

Amendments, comments and suggestions should be sent to the authors (Yin [email protected])

TERMINOLOGY

A complete project glossary is provided online here: https://envriplus.manageprojects.com/s/text-documents/LFCMXHHCwS5hh

PROJECT SUMMARY

ENVRIplus is a Horizon 2020 project bringing together Environmental and Earth System Research Infrastructures, projects and networks with technical specialist partners to create a more coherent, interdisciplinary and interoperable cluster of Environmental Research Infrastructures across Europe. It is driven by three overarching goals: 1) promoting cross-fertilization between infrastructures, 2) implementing innovative concepts and devices across RIs, and 3) facilitating research and innovation in the field of environment for an increasing number of users outside the RIs.

ENVRIplus aligns its activities to a core strategic plan where sharing multi-disciplinary expertise will be most effective. The project aims to improve Earth observation monitoring systems and strategies, including actions to improve harmonization and innovation, and generate common solutions to many shared information technology and data related challenges. It also seeks to harmonize policies for access and provide strategies for knowledge transfer amongst RIs. ENVRIplus develops guidelines to enhance transdisciplinary use of data and data-products supported by applied use-cases involving RIs from different domains. The project coordinates actions to improve communication and cooperation, addressing Environmental RIs at all levels, from management to end-users, implementing RI-staff exchange programs, generating material for RI personnel, and proposing common strategic developments and actions for enhancing services to users and evaluating the socio-economic impacts.

ENVRIplus is expected to facilitate structuring and improve quality of services offered both within single RIs and at the pan-RI level. It promotes efficient and multi-disciplinary research, offering new opportunities to users, new tools to RI managers and new communication strategies for environmental RI communities. The resulting solutions, services and other project outcomes are made available to all environmental RI initiatives, thus contributing to the development of a coherent European RI ecosystem.


TABLE OF CONTENTS

ABSTRACT
DOCUMENT AMENDMENT PROCEDURE
TERMINOLOGY
PROJECT SUMMARY
TABLE OF CONTENTS
EXECUTIVE SUMMARY
1 Introduction
  1.1 Scope and Purpose
  1.2 Approach
  1.3 Integration and Validation
  1.4 Deployment
  1.5 Collaborations and Impacts
  1.6 Final Science Demonstrators
2 Science Demonstrators
  2.1 Science Demonstrator 1: Support EISCAT_3D Users to Reprocess Data Using User’s Algorithms (Use Case IC_3)
  2.2 Science Demonstrator 2: The Eddy Covariance Fluxes of GHGs (Use Case IC_13)
  2.3 Science Demonstrator 3: SOS & SSN Ontology Based Data Acquisition & Near Real Time Quality Control (Use Case IC_14)
  2.4 Science Demonstrator 4: EuroArgo Data Subscription Service (Use Case TC_2)
  2.5 Science Demonstrator 5: Sensor Registry (Use Case TC_4)
  2.6 Science Demonstrator 6: New particle formation event analysis on interoperable infrastructure (Use Case TC_17)
  2.7 Science Demonstrator 7: gCube-based VRE for Mosquito Diseases Study (Use Case SC_2)


EXECUTIVE SUMMARY

This document reports on the implementation results of T9.1 use cases for testing and validating ENVRIplus Theme 2 service solutions. It is an update of the previous report D9.1, which was submitted in M28. In D9.1 we explained the approach to service integration and validation. The driver was to define appropriate use cases that address the community’s real needs, integrate ENVRIplus Theme 2 services as part of the service solutions, and implement those use cases following an agile approach.

This document describes 7 well-developed use cases that are selected as the final Science Demonstrators. By Science Demonstrator we mean “a showcase of a service solution illustrated through a prototype implementation, which serves as proof or evidence that the Theme 2 services can bring added value for supporting ENVRIplus community to deliver scientific research”.

Science Demonstrator 1 addresses a requirement of the EISCAT RI community, namely to allow individual scientists to process their experimental data using their own algorithms. The challenge is common to many ENVRIplus RIs, where data is often processed using standard models and methods. As researchers want to use different analysis models, easily modify parameters or algorithms, and collaborate with each other, they need a Virtual Research Environment (VRE). This demo showcases a model making use of the D4Science gCube platform developed by T7.1, which enables scientific researchers to re-process data by implementing and adapting algorithms and parameters from other sources.

Science Demonstrator 2 showcases a novel implementation of a computationally efficient tool for processing Eddy Covariance (EC) data, which offers users the possibility to calculate EC fluxes through the EddyPro® software (LI-COR Biosciences, 2017; Fratini and Mauder, 2014) according to 4 processing schemes resulting from different combinations of existing methods. To reduce the computational runtime required, the 4 processing schemes were implemented and executed in parallel mode. The whole service setup, including a metadata management algorithm, was implemented and tested in the D4Science gCube Virtual Research Environment provided by Task 7.1. The final computational runtime for Near Real Time (NRT) processing (i.e., flux estimates based on raw data collected the previous day) is about 4 minutes, similar to that required for a standard run involving only a single processing scheme.
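The parallel execution of the four processing schemes can be illustrated with a minimal sketch. It is not the demonstrator's implementation: the scheme labels, the `run_scheme` placeholder and the input file name are hypothetical, and no real EddyPro call is made. A thread pool is used here for brevity; a CPU-bound run would more plausibly use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scheme labels; the report does not name the four schemes.
SCHEMES = ["scheme_1", "scheme_2", "scheme_3", "scheme_4"]


def run_scheme(scheme: str, raw_file: str) -> dict:
    """Placeholder for one flux-processing run (not a real EddyPro call)."""
    return {"scheme": scheme, "input": raw_file, "status": "done"}


def run_all_schemes(raw_file: str) -> list:
    """Run every scheme concurrently and return results in scheme order."""
    with ThreadPoolExecutor(max_workers=len(SCHEMES)) as pool:
        # pool.map preserves the order of SCHEMES in its results.
        return list(pool.map(lambda s: run_scheme(s, raw_file), SCHEMES))


results = run_all_schemes("raw_data_previous_day.csv")
```

Because the four runs are independent, the wall-clock time of the batch approaches that of the slowest single run, which is consistent with the runtime reported above.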

Science Demonstrator 3 addresses a common problem for ENVRIplus RIs (specifically observatories that build on environmental sensor networks): data acquisition services, in particular the preparation of data transfer prior to data transmission, are often not yet sufficiently standardized. This hinders the operation of efficient cross-RI data processing routines, e.g., for data quality checking. The demonstrator showcases a service prototype that allows submitting and publishing raw observational (non-geophysical) environmental time series data in common standard formats (T-SOS XML and SSNO JSON). A messaging API (EGI ARGO) is used to perform Near Real Time (NRT) quality control procedures by an Apache Storm NRT QC Topology, which publishes the quality controlled and labelled data via a messaging output queue.
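The flow of reading observations from an input queue, labelling them and publishing them to an output queue can be sketched as follows. This is an illustrative stand-in only: it uses in-memory Python queues rather than the EGI ARGO messaging API or Apache Storm, and the range check, its thresholds and the record fields are assumptions, not the QC procedures used in the demonstrator.

```python
from queue import Queue

# Hypothetical plausible range for a water temperature sensor (degrees C).
VALID_RANGE = (-2.5, 40.0)


def label_observation(obs: dict) -> dict:
    """Return a copy of the observation with a quality flag attached."""
    low, high = VALID_RANGE
    flag = "good" if low <= obs["value"] <= high else "bad"
    return {**obs, "qc_flag": flag}


def process_queue(incoming: Queue, outgoing: Queue) -> None:
    """Drain the input queue, label each observation, publish to output."""
    while not incoming.empty():
        outgoing.put(label_observation(incoming.get()))


incoming, outgoing = Queue(), Queue()
incoming.put({"time": "2018-05-14T00:00:00Z", "value": 12.3})
incoming.put({"time": "2018-05-14T00:10:00Z", "value": 99.9})
process_queue(incoming, outgoing)
labelled = [outgoing.get() for _ in range(outgoing.qsize())]
```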

Science Demonstrator 4 describes the EuroArgo Data Subscription Service (DSS) that allows researchers to subscribe to customized views of Argo data, selecting specific regions and time-spans, and to choose the frequency of updates. Tailored updates are then provided on schedule to researchers’ private storage. The demo showcases an integration solution that combines the EuroArgo community data portal with e-Infrastructure services (EUDAT B2SAFE, EGI FedCloud, etc.), and uses the DRIP service developed by T7.2 for optimised service deployment. The pilot activity was initiated by the marine research community; however, the possibility to receive regular transmissions of data, especially in near-real time, directly from the organisation responsible for data collection and (pre-)processing, is very important to many large initiatives. ENVRIplus RIs can benefit from the subscription services, e.g., to create more elaborate data products by requesting data from other sources, and can optimise their internal workflows by signing up for automatic updates.
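A subscription of this kind can be thought of as a filter over region and time span plus an update frequency. The sketch below illustrates that idea under stated assumptions: the `Subscription` fields, the bounding-box representation and the sample profiles are hypothetical and do not reflect the actual DSS data model, and bounding boxes crossing the antimeridian are not handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Subscription:
    """Hypothetical subscription record: bounding box, time span, frequency."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start: date
    end: date
    update_every_days: int


def matches(sub: Subscription, lat: float, lon: float, day: date) -> bool:
    """True if a profile at (lat, lon) on `day` falls inside the subscription."""
    return (sub.lat_min <= lat <= sub.lat_max
            and sub.lon_min <= lon <= sub.lon_max
            and sub.start <= day <= sub.end)


sub = Subscription(30.0, 46.0, -6.0, 36.0, date(2018, 1, 1), date(2018, 6, 30), 7)
profiles = [(38.5, 15.0, date(2018, 3, 2)), (60.0, -20.0, date(2018, 3, 2))]
selected = [p for p in profiles if matches(sub, *p)]
```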

Science Demonstrator 5 showcases a “sensor registry” that aims at supporting the management of sensors deployed for in-situ measurements. Common sensors or families of sensors are used across different research infrastructures, for example, oxygen optodes that are mounted on platforms in multiple research infrastructures. The goal of this work is to define common methods to access the sensor metadata in such cases. The sensor registry applies the design principle of the data catalogue developed in WP8, and uses data technologies and standards from the OGC Sensor Web Enablement family including SensorML, Observations and Measurements (O&M), and Sensor Observation Service (SOS). It brings together a marine domain implementation of these standards (the Marine SWE profile) developed by several European projects, demonstrating the viability for future sensor and observation activities. The service can be integrated into various types of platforms: deep-sea observatories (e.g., EMSO), marine gliders (e.g., EuroGOOS), as well as solid earth (e.g., EPOS) or atmosphere observations (e.g., ICOS). It can also be used to track usage of specific sensor models (e.g., CO2) across the RIs’ observation networks.
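Tracking where a given sensor model is used amounts to grouping registry entries by model. The following sketch shows that grouping on a toy in-memory list; the entries, model names and platform identifiers are invented for illustration and the real registry is accessed through the OGC standards named above, not through this structure.

```python
from collections import defaultdict

# Hypothetical registry entries: (sensor model, platform, research infrastructure).
REGISTRY = [
    ("oxygen-optode-A", "float-001", "EuroArgo"),
    ("oxygen-optode-A", "mooring-17", "EMSO"),
    ("co2-sensor-B", "tower-03", "ICOS"),
]


def usage_by_model(entries):
    """Map each sensor model to the sorted list of RIs where it is deployed."""
    usage = defaultdict(set)
    for model, _platform, ri in entries:
        usage[model].add(ri)
    return {model: sorted(ris) for model, ris in usage.items()}


usage = usage_by_model(REGISTRY)
```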

Science Demonstrator 6 describes a service prototype that supports aerosol scientists in studying new atmospheric particle formation events by moving data analysis from local computing environments to interoperable infrastructures, thus harmonizing data analysis itself and, more importantly, the syntax and semantics of data derived from analysis. As researchers interpret primary data and thus gain information and transfer information into knowledge, we are studying and advancing in particular some technical aspects of a knowledge infrastructure, i.e., a robust network of scientists, artefacts such as virtual research environments and research data, and institutions such as research infrastructures and e-Infrastructures that acquire, maintain and share scientific knowledge about the natural world. The science demonstrator showcases a possible architecture of a socio-technical infrastructure that “transforms data into knowledge.” The proposed approach highlights a range of novel possibilities, in particular enabling researchers to focus on data analysis and interpretation while leaving data access and transformation from and to systems to interoperable infrastructure. It significantly contributes to implementing the global agenda of FAIR data by promoting the notion of “FAIR by Design”, weaving data FAIRness into the fabric of infrastructures. It builds on the principle not to leave making data FAIR to researchers but to guarantee it by design of well-engineered infrastructures. The demonstrator is first and foremost of interest to a specific scientific community, namely the various aerosol research groups that study new particle formation events.

Science Demonstrator 7 illustrates how a LifeWatch researcher can easily upload and integrate an analysis algorithm in D4Science, and share it with other researchers in a VRE. The use case proposed an integration solution that links the D4Science/gCube VRE to the LifeWatch RI and to the EGI e-Infrastructure. This integration, for example, enables individual researchers to repeat and reuse algorithms at will, run trend analysis, and add new parameters and custom data. The VRE provides provenance registration that improves reproducibility and also allows retention of computation results in the user’s workspace. This facilitates editing and adaptation of algorithms, features that are not provided by the existing LifeWatch ICT.


1 Introduction

1.1 Scope and Purpose

The aim of ENVRIplus WP9 is to validate the results of the ENVRIplus Theme 2 developments by showcasing software deployments onto computing and data infrastructure and through investigating how to operate developed services within RIs. WP9 defined two tasks:

● T9.1 focuses on the technical issues of software validation, integration and release management, and of deploying developed results on computing and data infrastructure.

● T9.2 focuses on tracking the actual usability and operationality of the ENVRIplus services, once these are deployed under real world conditions.

This deliverable reports Task T9.1 activities during the project period M29 to M40. It is an update of the previously submitted deliverable D9.1, and describes the implementation results of the use cases identified in that report.

1.2 Approach

In D9.1 [1] we explained the approach to service integration and validation. The driver was to define appropriate use cases that address the community’s real needs, integrate ENVRIplus Theme 2 services as part of the service solutions, and implement those use cases following an agile approach.

FIGURE 1. THE THREE MAIN STEPS OF THE USE CASE COLLECTION PROCESS: 1) USE CASE COLLECTION, 2) USE CASE EVALUATION AND 3) AGILE ACTIVITIES

The process from use case identification to implementation is summarized in Figure 1, and was described in more detail in D9.1. In Step 1) Use Case Collection, we opened a call to all Theme 2 partners for use case submissions. We requested that the use cases be identified together with ENVRIplus RIs, and that each case description include the investigation goals, proposed solutions for integrating Theme 2 services, and a feasible work plan to achieve the goals. In Step 2) Use Case Evaluation, each submitted use case was evaluated by two reviewers on scientific significance, demonstration of ENVRIplus value, and feasibility for implementation and deployment. The finally selected use cases are listed in Table 1. Finally, in Step 3) Agile Activities, each selected use case was implemented following the agile methodology. An implementation team ('agile team') was formed and team leaders were identified. The agile teams started by refining the case descriptions, then worked according to defined plans, and reported back at ENVRI weeks. By M28, when D9.1 was written, the formation of agile teams was completed. Since M29, T9.1 activities have focused on use case implementations.

[1] ENVRIplus deliverable D9.1: http://www.envriplus.eu/wp-content/uploads/2015/08/D9.1-Service-deployment-in-computing-and-internal-e-Infrastructures.pdf

TABLE 1. SELECTED USE CASES, WITH HIGHLIGHTED THEME 2 FINAL SCIENCE DEMONSTRATORS

| No. | USE CASES | INTEGRATED SERVICES [2] | AGILE GROUP LEADERS |
|---|---|---|---|
| IC_1 | Dynamic data citation, identification & citation | Identification & citation (WP6) | Alex Vermeulen & Margareta Hellstrom (LU) |
| IC_2 | Provenance Implementation case | Provenance (WP8) | Barbara Magagna (EAA) |
| IC_3 | User support to re-process data using their own algorithms | gCube Data Analytics facility (WP7), D4Science VRE | Ingemar Haggstrom (EISCAT), Leonardo Candela (ISTI-CNR) |
| IC_9 | Quantitative accounting of Open Data use | Identification & Citation (WP6) | Markus Fiebig (NILU) |
| IC_10 | Domain extension of existing thesauri | Semantic Linking (WP5) | Barbara Magagna (EAA) |
| IC_11 | Semantic Linking Framework | ENVRI RM (WP5), Catalogue (WP8), Semantic Linking (WP5) | Zhiming Zhao & Paul Martin (UvA), Barbara Magagna (EAA) |
| IC_12 | Implementation of ENVRI(plus) RM for EUFAR and LTER | ENVRI RM (WP5) | Barbara Magagna (EAA) |
| IC_13 | The eddy covariance fluxes of GHGs | gCube Data Analytics facility (WP7), D4Science VRE | Dario Papale & Domenico Vitale (UNITUS) |
| IC_14 | SOS & SSN ontology based Data Acquisition and NRT Data Quality checking services | Semantic Linking (WP5), Sensor Observation Service, EGI ARGO messenger (WP9), EGI FedCloud (WP9), EGI storm (WP9) | Robert Huber & Markus Stocker (UniHB) |
| TC_2 | EuroArgo Data subscription service | DRIP (WP7), EUDAT B2FIND (WP9), EUDAT Data Subscription Service (WP9), EGI FedCloud (WP9) | Thierry Carval (IFREMER), Yin Chen (EGI) |
| TC_4 | Sensor registry | Data Catalogue, Provenance (WP8) | Justin Buck (BODC) |
| TC_16 | Description of a National Marine Biodiversity Data Archive Centre | ENVRI RM (WP5) | Dan Lear (MBA), Abraham Nieva & Alex Hardisty (CU) |
| TC_17 | New particle formation event analysis on interoperable infrastructure | gCube Data Analytics facility (WP7), D4Science VRE, EGI JupyterLab (WP9), gCube/D4Science Catalogue (WP8), ENVRI RM (WP5) | Markus Stocker (TIB) |
| SC_3 | gCube-based VRE for Mosquito disease study | gCube Data Analytics facility (WP7), Provenance (WP8) | Matthias Obst (UGOT), Baptiste Grenier (EGI) |

[2] For services directly stemming from Theme 2, the work package is reported in parentheses.

1.3 Integration and Validation

Use cases are implemented following an agile development approach. Agile development refers to a set of software development methods in which requirements and solutions evolve through collaboration between self-organising, cross-functional teams. Establishing short-lived, agile multidisciplinary task forces to address specific issues is key to developing the depth of understanding and commitment to make progress [3]. The agile teams are responsible for testing technologies, service integration and deployment. Most of the agile team members are researchers directly involved in ENVRIplus RIs, with good knowledge of respective community requirements. They bring information and feedback from their RIs and at the same time introduce ENVRIplus services and development results to their communities.

Table 1 also provides the mapping of the use cases and tested Theme 2 services. The detailed integration solutions are discussed in each Science Demonstrator in Section 2. During ENVRI weeks we organised sessions or workshops where agile teams discussed their use cases and got feedback from the communities.

● 1st ENVRIweek, 16-20 Nov 2015, Prague: A plenary session was organized to involve all ENVRIplus RIs and WPs to jointly identify use cases that would be of interest for community researchers and benefit their daily work. This also opened up conversations among Themes and WPs to establish collaborations and to avoid duplication of effort.

● 2nd ENVRIweek, 9-13 May 2016, Zandvoort: A use case session was organized involving both Theme 1 and Theme 2 members. Representatives of 14 reviewed and selected use cases gave presentations on their investigation plans. The purpose was to form agile teams involving both domain scientists from Theme 1 and technology experts from Theme 2.

● 3rd ENVRIweek, 14-18 Nov 2016, Prague: To contribute to the Theme 2 main discussions at that time on architecture design and integration of e-Infrastructure services, we organised a session during the Theme 2 mini workshop that was attended by Theme 2 and e-Infrastructure representatives. Use cases that tested e-Infrastructure services were presented (including the TC_2 EuroArgo data subscription service, the IC_13 ICOS data processing use case, the IC_3 EISCAT-3D use case on supporting individual researchers for data processing, and the SC_2 LifeWatch use case on mosquito study). The discussions focused on using e-Infrastructure technology and architecture issues. Theme 2 developers and e-Infrastructure service providers gave feedback on service integration and e-Infrastructure solutions.

● 4th ENVRIweek, 15-19 May 2017, Grenoble: A Theme 2 showcase workshop was organized, targeting the whole ENVRIplus community. Four well-developed use cases (IC_3, IC_13, IC_14, and TC_2) were demonstrated and feedback from RI communities was received. Following that, a discussion session was organised on how to make the approach more generic so that it would also support other ENVRIplus RIs, taking the four use cases as examples. We also identified RIs that could be further involved to test and validate the service solutions.

● 5th ENVRIweek, 6-10 Nov 2017, Malaga: In line with the European Open Science Cloud (EOSC) discussions at that time, the session agenda included presentations from selected use cases (IC_3, IC_9, IC_13, IC_14, TC_4 and TC_17) that addressed EOSC issues (e.g., open access, FAIRness, interoperability, user-driven, using e-Infrastructure resources and services, etc.) and discussions on how to connect ENVRIplus services with EOSC. Positive feedback was received from EOSC (EOSCpilot and EOSC-hub) project representatives, which showed that these pilot investigations were among the pioneers implementing the EOSC vision and that they could serve as examples to help ENVRIplus RIs connect to EOSC. On the other hand, the use cases also identified the requirements of the environmental science community for EOSC, thus benefitting EOSC developments.

● 6th ENVRIweek, 14-18 May 2018, Zandvoort: Due to the tight schedule, instead of holding a dedicated WP9 session, we organised a series of WP9 webinars afterwards to catch up on the missed discussions. Each online webinar focused on one use case, which gave sufficient time for detailed discussions. In particular, the use cases served as vehicles to reach RIs by asking the use case leaders to invite Theme 1 colleagues and end users. These webinars attracted good attention, and the use cases benefitted from various feedback and inputs.

[3] M. Atkinson, M. Carpené, E. Casarotti, et al.: VERCE delivers a productive e-Science environment for seismology research. In Proc. IEEE eScience 2015. DOI 10.1109/eScience.2015.38

In order to provide quantitative measures of validation information and make them available to service developers, service providers and end users, T9.1 supports T9.2 in proposing and developing a new approach that defines a soundness evaluation criteria matrix and develops a service evaluation tool, with special focus on user friendliness, aspects covered and possibilities for customising both the user interface and the output formats. The detailed description will be provided in the upcoming WP9 deliverable D9.4.

1.4 Deployment

WP9 provides e-infrastructure environments for service deployments. Two leading European e-Infrastructures participate in WP9, namely EGI [4] and EUDAT [5]. EGI provides a European and global federation of computing and storage resources. EUDAT offers an interoperable layer of common data management services for researchers and research communities.

[4] EGI: www.egi.eu
[5] EUDAT: www.eudat.eu

EGI Federated Cloud (FedCloud) resources are offered for testing and deploying Theme 2 services and the pilot implementations from use cases. EGI FedCloud [6] is a grid of clouds with a harmonized operational behavior. EGI FedCloud federates institutional resource providers offering an Infrastructure-as-a-Service (IaaS) solution composed of 22 providers from 14 National Grid Initiatives (NGI) running different Cloud Middleware Frameworks (CMF): 2/3 OpenStack, 1/3 OpenNebula, and 1 Synnefo site. Around 7000 cores are available (as of August 2018).

In the federation, the Clouds and their interconnections are based on open technologies such as the OpenNebula, OpenStack and Synnefo Cloud Middleware Frameworks (CMF), and on open standards such as the Open Cloud Computing Interface (OCCI) for Virtual Machine (VM) management and the Cloud Data Management Interface (CDMI) for object storage. A common authentication and authorization layer using x509 and VOMS is used and OpenID Connect integration is ongoing. Operational tools for accounting, monitoring and ticketing are operated centrally.

The data analytics platform (DataMiner) envisaged in T7.1 is operated by the D4Science infrastructure according to the “full platform as-a-Service” model. It is made available by VREs [7] and configured to transparently exploit computing resources from EGI FedCloud by the D4Science.org Virtual Organisation that is currently supported by CESGA, GeoGRID, IISAS-FedCloud, RECAS-BARI and UPV-GRyCAP. A Service Level Agreement (SLA) has been signed between EGI and these resource providers that guarantees the provision of the resources and the quality of the service.

The Dynamic Real-time Infrastructure Planner (DRIP) developed in T7.2 optimises the use of virtualized e-Infrastructures (i.e., Clouds) by automatically planning and provisioning dedicated infrastructure resources on which to run application workflows, and autonomously installing and initialising application components on those resources. DRIP is able to independently scale the infrastructure deployment based on the quality of service requirements of the application. DRIP has been deployed and tested on the EGI FedCloud, and is available to the public [8] under the Apache 2.0 open source licence. Additional micro-services dealing with specific scenarios in e-Infrastructure provisioning and customisation can be added to the DRIP service suite based on results generalised from the ENVRIplus test/implementation use cases, with the published components updated accordingly after unit and integration testing.
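Scaling a deployment to meet a quality of service requirement can be illustrated with a simple capacity calculation. The sketch below is not DRIP's planning algorithm or API; the function name, the throughput figure, the deadline and the VM cap are all assumptions chosen only to show how a backlog and a deadline translate into a VM count.

```python
import math


def vms_needed(pending_requests: int, requests_per_vm_hour: float,
               deadline_hours: float, max_vms: int = 10) -> int:
    """Smallest VM count that clears the backlog before the deadline."""
    if pending_requests <= 0:
        return 0
    # Work one VM can complete within the deadline.
    capacity_per_vm = requests_per_vm_hour * deadline_hours
    return min(max_vms, math.ceil(pending_requests / capacity_per_vm))


plan = vms_needed(pending_requests=120, requests_per_vm_hour=20, deadline_hours=2)
```

With these illustrative numbers, each VM handles 40 requests within the deadline, so 120 pending requests call for 3 VMs; the cap keeps a large backlog from requesting unbounded resources.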

Several use cases integrate e-Infrastructure services as part of the service solutions, and deploy the final systems onto the e-Infrastructure resources provided by T9.1. For example, in use case IC_14, the Near Real Time (NRT) Quality Control (QC) service is implemented in Java on EGI virtual machines with a corresponding Apache Storm installation. In use case TC_2, accumulating monthly EuroArgo datasets are pushed to the EUDAT e-Infrastructure (B2SAFE [5]), and synchronized between EUDAT and EGI FedCloud. The process was executed by DRIP, dynamically deploying and managing as many Virtual Machines as required to cope with the load in order to be able to process the subscription requests in a timely manner.

[6] EGI FedCloud: https://www.egi.eu/services/cloud-compute/
[7] Several VREs have been created to provide users with data analytics facilities, including: https://services.d4science.org/group/envriplus (for everyone willing to experiment with the facility), https://services.d4science.org/group/eiscat (for supporting the IC_3 case), https://services.d4science.org/group/icoseddycovarianceprocessing (for supporting the IC_13 case), https://services.d4science.org/group/particleformation (for supporting the TC_17 case).
[8] DRIP: https://github.com/QCAPI-DRIP/

1.5 Collaborations and Impacts

Apart from the technical contributions, agile teams largely support the collaboration across Work Packages, ENVRIplus RIs and organisations. Table 2 shows the involvement of WPs and ENVRIplus RIs in each use case, illustrating the collaborations established among the ENVRIplus community.

TABLE 2. INVOLVEMENT OF WORK PACKAGES, ENVRIPLUS RIS AND ORGANIZATIONS

| NO. | WPS | ENVRIPLUS RIS | ORGANISATIONS |
|---|---|---|---|
| IC_1 | WP6, WP9 | ICOS, ANAEE, ACTRIS, LTER-EUROPE, IAGOS, EMSO | ICOS, ANAEE, IAGOS, ACTRIS, UNIHB/PANGAEA, EAA |
| IC_2 | WP6, WP8 | LTER-EUROPE, ANAEE, IS-ENES, EPOS, ICOS, ACTRIS | EAA, EUDAT, UVA, INRA, DLRZ, CNRS, ING, LU, CINECA |
| IC_3 | WP7, WP9 | EISCAT-3D | EISCAT, ISTI-CNR |
| IC_9 | WP6, WP8 | ACTRIS, ICOS, EPOS | NILU, ICOS, ETHZ |
| IC_10 | WP5 | LTER-EUROPE, LIFEWATCH ITALY, EMBRC | EAA, UNIVERSITÀ DI SALENTO |
| IC_11 | WP5 | ALL ENVRIPLUS RIS | UVA, EAA |
| IC_12 | WP5 | LTER-EUROPE, EUFAR | EAA, DLR |
| IC_13 | WP7, WP9 | ICOS | ICOS-ETC (UNIVERSITY OF TUSCIA, VITERBO, ITALY), ISTI-CNR |
| IC_14 | WP3, WP9 | EMSO, FIXO3, EPOS, SIOS, ANAEE, SEADATANET, EUROARGO | UNIHB, IFREMER, CNRS, CNR, EGI, EAA, PANGAEA |
| TC_2 | WP7, WP8, WP9 | EUROARGO | IFREMER, EGI, EUDAT, UVA |
| TC_4 | WP8, WP5, WP9 | EMSO, EUROARGO, EPOS, ICOS, SIOS, ANAEE | IFREMER, CU, PLOCAN, NERC.BODC, LOCEAN, IPGP, CNR, IPSL, 52°NORTH, MARUM, CSEM, INRN |
| TC_16 | WP5, WP9 | DASSH, MEDIN | MBA, CU, UVA |
| TC_17 | WP7, WP9 | ICOS, SMEAR, ACTRIS, D4SCIENCE, EGI | UNIHB/PANGAEA, ISTI-CNR |
| SC_3 | WP7, WP9 | LIFEWATCH-SW, EGI, D4SCIENCE, BIOVEL | UGOT, EGI, ISTI-CNR |

* SC: Science Case; TC: Test Case; IC: Implementation Case. Refer to D9.1 for detailed information.

Using international conferences, workshops, and working groups (in various organisations and community networks) as vehicles, the agile teams have made broad impacts beyond ENVRI; a few selected examples are listed below:

● RDA: The Research Data Alliance (RDA) was launched as a community-driven organization in 2013. Today it has 7,000 members from 137 countries, with increasing global impact on building the social and technical infrastructure to enable open sharing of data. Individuals from several agile teams are active members of RDA: e.g., the IC_1 agile leader co-chairs the RDA Data Citation Working Group (DCWG), and the IC_9 leader is a member of the research data collections working group [9].

● EOSC: The European Open Science Cloud (EOSC) opens up a new era for European researchers and innovators to discover, access, use and reuse a broad spectrum of resources for advanced data-driven research. A number of agile teams (IC_1, IC_3 and TC_2) got involved as Competence Centers in the EOSC-hub project.

● DI4R: Digital Infrastructures for Research (DI4R) [10] is one of the high-impact conferences for e-Infrastructures, jointly organised by leading European e-Infrastructures and projects such as EOSC-hub, GEANT, PRACE, OpenAIRE, EGI, EUDAT, etc. Two workshops proposed by WP9 were accepted and organised in the past DI4R conferences (2016 in Krakow, 2017 in Brussels). Various use cases (IC_1, IC_3, and TC_2) were presented as ENVRIplus project results.

● EGU: The European Geosciences Union (EGU) conferences are among Europe’s premier geosciences conferences, attracting thousands of participants each year in Vienna. Agile teams such as IC_1, IC_9, IC_10, and TC_17 actively participate in the EGU network.

1.6 Final Science Demonstrators

Thanks to the agile leaders and their teams who have been actively testing Theme 2 services and various technologies, a subset [11] of use cases made excellent progress and were selected as final Science Demonstrators for Theme 2 services. By Theme 2 Science Demonstrator, we mean “a showcase of a service solution illustrated through a prototype implementation, which serves as proof or evidence that the Theme 2 services can bring added value for supporting the ENVRIplus community to conduct scientific research.” This deliverable reports the final implementation results of those Science Demonstrators (listed in Table 3).

TABLE 3. LIST OF THE FINAL SCIENCE DEMONSTRATORS

# | SCIENCE DEMONSTRATOR | LINK TO THE DEMO

Science Demonstrator 1
Title: Support EISCAT_3D Users to Reprocess Data Using User’s Algorithms (Use Case IC_3)
Author: Ingemar Haggstrom, EISCAT Scientific Association
Link: https://youtu.be/YEEMUvnSHUM

Science Demonstrator 2
Title: The eddy covariance fluxes (Use Case IC_13)
Authors:
• Domenico Vitale, University of Tuscia
• Dario Papale, University of Tuscia
• Leonardo Candela, CNR-ISTI
• Gianpaolo Coro, CNR-ISTI
Link: https://youtu.be/hod2WksKzV8

Science Demonstrator 3
Title: SOS & SSN ontology based Data Acquisition and NRT Data Quality checking services (Use Case IC_14)
Author: Robert Huber, University of Bremen
Link: https://youtu.be/p3UQZkRRWlw

Science Demonstrator 4
Title: EuroArgo Data subscription service (Use Case TC_2)
Authors:
• Thierry Carval, IFREMER
• Glenn Judeau, IFREMER
• Jani Heikkinen, CSC
• Baptiste Grenier, EGI
• Zhiming Zhao, UvA
• Paul Martin, UvA
• Spiros Koulouzis, UvA
Link: https://youtu.be/PKU_JcmSskw

Science Demonstrator 5
Title: Sensor registry (Use Case TC_4)
Authors:
• Justin Buck, BODC
• Simon Jirka, 52°North
Link: https://youtu.be/4QxTZ2iiznk

Science Demonstrator 6
Title: New particle formation event analysis on interoperable infrastructure (Use Case TC_17)
Authors:
• Markus Stocker, TIB and MARUM/PANGAEA
• Markus Fiebig, NILU
• Leonardo Candela, CNR
• Giuseppe La Rocca, EGI
• Enol Fernandez, EGI
• Alex Hardisty, CU
Link: https://youtu.be/ra9W7b5DbgI

Science Demonstrator 7
Title: gCube-based VRE for Mosquito Diseases Study (Use Case SC_2)
Authors:
• Baptiste Grenier, EGI
• Matthias Obst, Swedish Lifewatch
• Leonardo Candela, CNR-ISTI
• Gianpaolo Coro, CNR-ISTI
Link: https://youtu.be/XtcCxkFX98I

[9] RDA Working Group on Research Data Collections: https://www.rd-alliance.org/groups/research-data-collections-wg.html
[10] DI4R conferences: https://www.digitalinfrastructures.eu/
[11] Other use cases that are not included in this report will continue to be developed, and their results will be described in other reports or documents.

The remainder of the document provides detailed information about the seven science demonstrators, compiled by the relevant agile teams.


2 Science Demonstrators

2.1 Science Demonstrator 1: Support EISCAT_3D Users to Reprocess Data Using User’s Algorithms (Use Case IC_3)

Overview

Often the data products from an RI, derived from lower-level/raw data, are pre-cooked. That means they are produced by applying some default processing algorithms and parameters, such as spatial and temporal resolution, the use of model parameters, and model and process selections. IC_3 demonstrates a model that enables scientific researchers to re-process data by defining other selections of these inputs from other sources.

Scientific Objectives

EISCAT_3D is an environmental research infrastructure on the European Strategy Forum on Research Infrastructures (ESFRI) roadmap. Once assembled, it will be a world-leading international research infrastructure to study the high atmosphere in the Fenno-Scandinavian Arctic and to investigate how the Earth's atmosphere is coupled to space.

In general, EISCAT data products (including future EISCAT_3D data) are not raw data as sampled in the receivers, but have already undergone several steps of processing (along the chain voltages -> filtering -> time averaged spectral data -> inversion of physical parameters) by standard analysis algorithms in the EISCAT ICT system. This also implies that data are processed with a predefined set of parameters, e.g. spatial and temporal resolution, inversion model selection, model parameters and allowed parameter ranges.

This use case addressed a requirement of the EISCAT user community, namely to allow individual scientists to process the original experimental data using their own algorithms. The challenge in this use case is how to make use of ENVRIplus services to demonstrate that EISCAT scientists can re-process data by defining other selections of parameters and algorithms.

Description

A typical usage scenario is that users select data together with an algorithm and invoke a workflow with tuned parameters. The data used in the test case are from the present EISCAT facilities, and the processing software is provided by EISCAT and was originally written in Matlab. In this use case, we chose a Matlab processing package that could be converted into open source software. The process generates a graphical visualisation of the experimental data. Figure 2 shows the EISCAT Real Time Graph (RTG), which is primarily developed to follow the radar run in real-time and to produce a display of basic parameters, such as returned power spectra.


FIGURE 2. EISCAT REAL TIME GRAPH PRODUCES VISUALISATION OF THE PROCESSING OF RAW EISCAT RADAR DATA WITH BASIC PARAMETERS.

FIGURE 3. IN GCUBE, EISCAT USERS CAN CREATE PROCESSING ALGORITHMS AND SHARE THEM WITH OTHERS.

The RTG was modified to run under GNU Octave [12], and to produce the plots in batch mode with some selected input parameters. The whole setup was installed into the D4Science gCube Virtual Research Environment [13], provided by ENVRIplus WP7. The D4Science gCube Data Analytics is executed via a web-dashboard-based GUI. The ticketing system of D4Science has been effectively used for tracking the implementation and communicating with WP7 developers. Some initial results are shown in Figure 3, which demonstrates that in the gCube platform, the EISCAT RTG algorithms are translated into Octave, and compiled and published in an ENVRIplus community space. In the same way, an algorithm created by a single EISCAT user can be shared and reused by others.

[12] GNU Octave: https://www.gnu.org/software/octave/
[13] The VRE is available at https://services.d4science.org/group/eiscat

Advantages

For many ENVRIplus RIs, the ultimate goal is to provide quality-checked observation data to their user communities. The ease of user access to data and analysis resources has a direct impact on the users’ ability to deliver new research results and is thus directly linked with the impact and value of the RIs. This issue is therefore given increased emphasis when RIs design their ICT systems. In recent years, Virtual Research Environments (VREs) have emerged as an important approach to provide web-based systems helping researchers to collaborate. WP7 (T7.1) has set up a VRE for the ENVRIplus community using the D4Science platform. D4Science supports a flexible and agile application development model based on the notion of Platform as a Service (PaaS), in which components may be bound instantly at the time they are needed. In this way, it enables user communities to define their own research environments by selecting the constituents (the services, the data collections, the machines) among the pools of resources made available through the D4Science e-Infrastructure.

Access to a VRE for user-driven data analysis is a common request, and not only in the environmental sciences community. The implementation from this use case can be easily extended to support the needs of other domains; for example, it would benefit space and solar-terrestrial physics, solar system physics (meteors and asteroids), and astronomy. The pilot experiences can be easily shared with the whole EISCAT community, which covers member countries including Finland, France, Norway, Sweden, USA and UK, and whose global collaborations reach China, Japan and Korea.

Link to the Demonstrator

Link: https://youtu.be/YEEMUvnSHUM


The service prototype can be accessed at:

https://services.d4science.org/group/rprototypinglab/data-miner

Simply request a user account and you will be able to log in to the VRE. There, under ‘Executes an experiment’, select ENVRI and WebtgV2.

Contributors

● Ingemar Häggström, EISCAT Scientific Association

2.2 Science Demonstrator 2: The Eddy Covariance Fluxes of GHGs (Use Case IC_13)

Overview

Eddy covariance (EC) flux calculation involves a complex set of data processing steps that, beyond knowledge of the technique, requires a considerable amount of computational resources. This might constitute a constraint for RIs (e.g. ICOS) that aim to simultaneously process raw data sampled at multiple sites in Near Real Time (NRT) mode (i.e. provide each day flux estimates relative to the previous day). This demonstrator showcases a service solution that integrates the gCube service to optimize the processing of EC data based on 4 different processing schemes, resulting from a combination of block average (ba) or linear detrending (ld) and double rotation (dr) or planar fit (pf) processing options, and makes this available to other RIs.

Scientific Objectives

The ambitious goal of this pilot investigation is to provide a computationally efficient tool able to process EC raw data and to offer users the possibility to calculate EC fluxes according to the multiple processing scheme described by Sabbatini et al. (2018) [14]. The ultimate aim is to establish a service that can be used by RIs that use this micrometeorological technique to measure exchanges of greenhouse gases and energy between terrestrial ecosystems and the atmosphere (e.g. ICOS, but also other RIs using eddy covariance methods, such as LTER or ANAEE).

Description

The EC technique involves high-frequency sampling (e.g. 10 or 20 Hz) of wind speed and scalar atmospheric concentration data, and yields vertical turbulent fluxes. EC fluxes are computed within a finite averaging time (normally 30 mins) from the covariance estimates between instantaneous deviations in vertical wind speed and gas concentration (e.g. CO2) from their respective mean values, multiplied by the mean air density (see Aubinet et al., 2012) [15].
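The flux definition above can be illustrated with a minimal Python sketch. It is not the EddyPro® implementation: it assumes block averaging (deviations taken from the plain mean of the averaging period), and the function name `block_average_flux`, the argument names and the units are illustrative assumptions.

```python
from statistics import fmean


def block_average_flux(w, c, rho_air):
    """Flux over one averaging period: rho_air * mean(w' * c').

    w       -- vertical wind speed samples (assumed m/s)
    c       -- gas concentration samples (assumed mixing ratio, e.g. mol/mol)
    rho_air -- mean air density over the period (assumed mol/m^3)
    """
    if len(w) != len(c) or not w:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean, c_mean = fmean(w), fmean(c)
    # Covariance of the instantaneous deviations from the period means.
    covariance = fmean((wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c))
    return rho_air * covariance
```

At 10 Hz, one 30-minute averaging period corresponds to 18,000 samples per variable. Corrections such as coordinate rotation, detrending and spectral corrections are deliberately left out of this sketch.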

Despite the simplicity of this idea, a number of practical difficulties arise in transforming high-frequency data into reliable half-hourly flux measurements. To cope with these issues, here we used the tools implemented by the EddyPro® Fortran code (LI-COR Biosciences, 2017; Fratini and Mauder, 2014) [16], an open source software application available for free download at https://www.licor.com/env/products/eddy_covariance/eddypro.html. The choice of EddyPro® software is motivated by i) the availability of different methods for data quality control and processing (e.g. coordinate rotation, time series detrending, time lag determination, spectral corrections, flux random uncertainty quantification, etc.), ii) the availability of the source code and iii) the fact that the software is based on a community developed set of tools.

[14] Sabbatini, S. et al. (2018). Eddy covariance raw data processing for CO2 and energy fluxes calculation at ICOS ecosystem stations. International Agrophysics.
[15] Aubinet, M., Vesala, T., & Papale, D. (Eds.) (2012). Eddy covariance: a practical guide to measurement and data analysis. Springer Science & Business Media.

Processing EC raw data with the EddyPro® software requires 1) metadata about the EC system setup and the raw data file structure, and 2) the choice of a suitable combination of processing options.

Concerning 1), users have to provide a standardized metadata file in .csv format (metadata.csv, see Table 4). This file is the input of an R script, developed ad hoc for this exercise, that automatically builds the mandatory files ingested into the EddyPro® software (i.e. the .metadata and .eddypro files). The organization and naming of the metadata variables are based on an international standard (BADM) that is also used in the USA network AmeriFlux. The format of the csv, instead, has been designed as a template that is easy to prepare for individual scientists and organized RIs. In the case of NRT data processing, 5 additional configuration files are needed to perform part of the flux corrections (i.e. spectral corrections and planar fit): planar_fit.txt, spectral_assessment_badr.txt, spectral_assessment_lddr.txt, spectral_assessment_bapf.txt and spectral_assessment_ldpf.txt. They can be obtained by specific EddyPro® runs based on long periods of data (at least one month of data is usually required for consistent parameter estimation). The above files have to be placed together with the EC raw-data files in an archive folder (data.zip), which constitutes the input file of the current implementation (see Figure 4).

TABLE 4. DESCRIPTION OF METADATA TO PROVIDE IN THE METADATA.CSV FILE

COLUMN | VARIABLE LABEL (FILE HEADER) | DESCRIPTION (UNITS)
1 | SITEID | Official EC station code following the FLUXNET standards (CC-Xxx)
2 | LATITUDE | Geographic latitude ([-90, 90] from S to N, decimal)
3 | LONGITUDE | Geographic longitude ([-180, 180] from W to E, decimal)
4 | ALTITUDE | Altitude of ecosystem under study (m)
5 | CANOPY_HEIGHT | Distance between the ground and the top of the plant canopy (m)
6 | SA_MANUFACTURER | Manufacturer of the sonic anemometer (currently only gill)
7 | SA_MODEL | Model of the SA (currently only SA-Gill HS-50 or -100)
8 | SA_SW_VERSION | Embedded software version of the SA
9 | SA_WIND_DATA_FORMAT | The format of wind data (currently only uvw)
10 | SA_NORTH_ALIGNEMENT | Specify whether the SA's axes are aligned to transducers (axis) or spars (spar)
11 | SA_HEIGHT | The vertical distance between the ground and the center of the device sampling volume (m)
12 | SA_NORTH_OFFSET | Specify the SA's yaw offset with respect to local magnetic north (degree positive eastward)
13 | GA_MANUFACTURER | Manufacturer of the gas analyzer (currently only licor)
14 | GA_MODEL | Model of the GA (currently only GA_CP-LI-COR LI7200)
15 | GA_SW_VERSION | Embedded software version of the GA
16 | GA_NORTHWARD_SEPARATION | The distance between the center of sample volume of the GA and the SA as measured horizontally along the north-south axis (cm)
17 | GA_EASTWARD_SEPARATION | The distance between the center of sample volume of the GA and the SA as measured horizontally along the east-west axis (cm)
18 | GA_VERTICAL_SEPARATION | The distance between the center of sample volume of the GA and the SA as measured along the vertical axis (cm)
19 | GA_TUBE_DIAMETER | The inside diameter of the intake tube (mm)
20 | GA_FLOWRATE | The flow rate of the intake tube (l/min)
21 | GA_TUBE_LENGTH | The length of the intake tube (cm)
22 | FILE_DURATION | The time span covered by each raw file (min)
23 | ACQUISITION_FREQUENCY | The number of records per second in raw files (10 or 20 Hz)
24 | FILE_FORMAT | Specify the format of raw files (ASCII or BIN)
25 | FILE_EXTENSION | Specify the raw files extension (e.g. .csv, .txt, .dat)
26 | LN | Logger number (from 1 to 10)
27 | FN | Number of the file generated by the logger (from 1 to 10)
28 | EXTERNAL_TIMESTAMP | 0 or 1 if the timestamp in the file name refers to the beginning or the end of averaging period, respectively
29 | INTERNAL_TIMESTAMP | 1 if there is a timestamp internal to raw files, otherwise 0
30 | EOL | Specify the end of line of raw files (e.g. lf)
31 | SEPARATOR | The character that separates individual values in raw files
32 | MISSING_DATA_STRING | Specify the character string used for missing data in raw files (e.g. NA, NaN, -9999)
33 | NROW_HEADER | The number of rows in the header of the raw file
33+1 | COLNAMES_1 | Variable name in the first column of the raw data file
33+j | COLNAMES_j | Variable name in the j-th column of the raw data file
33+N | COLNAMES_N | Variable name in the last column of the raw data file

[16] Fratini, G., & Mauder, M. (2014). Towards a consistent eddy-covariance processing: an intercomparison of EddyPro and TK3. Atmospheric Measurement Techniques, 7(7), 2273-2281.

It is important to note that in the current implementation only a few sensors are supported (those used in ICOS), but the structure has been designed so that new sensors and new processing methods, options and combinations can be added.
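Some of the constraints listed in Table 4 can be checked before a metadata file is submitted. The sketch below is illustrative only and is not part of the R script mentioned above: it assumes that metadata.csv is comma-separated with the variable labels as a header row followed by one data row per configuration, and the function name `check_metadata` and the selection of checks are assumptions.

```python
import csv
import io

# Value constraints taken from Table 4; which fields to check is our choice.
RANGE_CHECKS = {
    "LATITUDE": (-90.0, 90.0),
    "LONGITUDE": (-180.0, 180.0),
}
CHOICE_CHECKS = {
    "ACQUISITION_FREQUENCY": {"10", "20"},
    "FILE_FORMAT": {"ASCII", "BIN"},
    "EXTERNAL_TIMESTAMP": {"0", "1"},
    "INTERNAL_TIMESTAMP": {"0", "1"},
}


def check_metadata(csv_text):
    """Return a list of human-readable problems found in metadata.csv text."""
    problems = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(rows, start=1):
        for label, (low, high) in RANGE_CHECKS.items():
            try:
                value = float(row.get(label) or "")
            except ValueError:
                problems.append(f"row {row_no}: {label} is missing or not numeric")
                continue
            if not low <= value <= high:
                problems.append(f"row {row_no}: {label}={value} outside [{low}, {high}]")
        for label, allowed in CHOICE_CHECKS.items():
            if row.get(label) not in allowed:
                problems.append(
                    f"row {row_no}: {label}={row.get(label)!r} not in {sorted(allowed)}"
                )
    return problems
```

An empty list means that none of the checked fields violates its constraint; fields not listed in the two dictionaries are not examined.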

The use of different processing options leads to different flux estimates. Discrepancies in flux estimates are caused by systematic errors introduced by the methods used in the raw-data processing stage. Since there are no tools to establish a priori which combination of processing options provides unbiased flux estimates, the viable solution, proposed by Sabbatini et al. (2018) and implemented here, involves a multiple processing scheme where EC flux data are calculated according to different combinations of methods.

In particular, EC fluxes are calculated according to four different processing schemes resulting from a combination of block average (ba) or linear detrending (ld) and double rotation (dr) or planar fit [17] (pf) processing options (for details see Aubinet et al., 2012) [18]. All other processing options remain unchanged: the maximum cross-covariance method for time lag determination, the spectral correction method proposed by Fratini et al. (2012) [19], the statistical tests by Vickers and Mahrt (1997) [20] and by Foken and Wichura (1996) [21] for data quality control, and the method by Finkelstein and Sims (2001) [22] to estimate random uncertainty.

To reduce the computational runtime, the four processing schemes above are run in parallel in the gCube Virtual Research Environment (VRE). The processing path is defined as in Figure 4. When using EC raw data from a single observation tower, the estimated computational time required for an NRT run is about 4 minutes, similar to that required for the run of a single processing scheme.
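The combination of options and their parallel execution can be sketched as follows. This is a minimal illustration, not the gCube implementation: `run_scheme` is a hypothetical placeholder for one EddyPro® run, and a local thread pool stands in for whatever mechanism the VRE actually uses. Only the scheme abbreviations and the spectral assessment file names come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

DETRENDING = ("ba", "ld")  # block average, linear detrending
ROTATION = ("dr", "pf")    # double rotation, planar fit


def scheme_names():
    """All detrending x rotation combinations, e.g. 'badr'."""
    return [d + r for d, r in product(DETRENDING, ROTATION)]


def run_scheme(scheme):
    """Hypothetical stand-in for one processing run; returns its configuration."""
    return {
        "scheme": scheme,
        "spectral_file": f"spectral_assessment_{scheme}.txt",
    }


def run_all_schemes(max_workers=4):
    """Run the four schemes concurrently and return results in scheme order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_scheme, scheme_names()))
```

Because the four runs are independent of each other, the total wall-clock time is bounded by the slowest single run, which is consistent with the runtime reported above.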

FIGURE 4. EC DATA PROCESSING PATH

[17] Wilczak, J. M., Oncley, S. P., & Stage, S. A. (2001). Sonic anemometer tilt correction algorithms. Boundary-Layer Meteorology, 99(1), 127-150.
[18] Aubinet, M., Vesala, T., & Papale, D. (Eds.) (2012). Eddy covariance: a practical guide to measurement and data analysis. Springer Science & Business Media.
[19] Fratini, G., Ibrom, A., Arriga, N., Burba, G., & Papale, D. (2012). Relative humidity effects on water vapour fluxes measured with closed-path eddy-covariance systems with short sampling lines. Agricultural and Forest Meteorology, 165, 53-63.
[20] Vickers, D., & Mahrt, L. (1997). Quality control and flux sampling problems for tower and aircraft data. Journal of Atmospheric and Oceanic Technology, 14(3), 512-526.
[21] Foken, T., & Wichura, B. (1996). Tools for quality assessment of surface-based flux measurements. Agricultural and Forest Meteorology, 78(1-2), 83-105.
[22] Finkelstein, P. L., & Sims, P. F. (2001). Sampling error in eddy correlation flux measurements. Journal of Geophysical Research: Atmospheres, 106(D4), 3503-3509.


Advantages

The implementation of a multiple processing scheme as illustrated above, and the direct management and use of metadata according to the international standard of the eddy covariance community, constitute a novelty in the context of EC data analysis. The main advantage of the multiple processing is twofold. On the one hand, it offers the possibility of an extensive evaluation of the effect each method has on flux data estimation. On the other, by combining the output results as described by Sabbatini et al. (2018), it is possible to obtain more consistent estimates of the uncertainty associated with EC fluxes. The direct use of metadata, in turn, ensures the flexibility needed for wide use of the tool as new sensors are added to the system.

The efficiency of parallel computing implemented in the VRE drastically reduces the computational runtime required to obtain flux estimates from different processing option schemes. This constitutes a clear advantage for any user and, in particular, for RIs aiming at routinely analyzing large amounts of data.

Although we selected only 4 processing option schemes here, the efficiency of parallel computing implemented in the VRE offers the possibility to increase the number of processing schemes suitable for EC data processing and also for post-processing steps.

This might considerably improve our understanding of the performance of methods developed for EC raw-data processing and of the interpretation of the resulting fluxes.

Link to the Demonstrator

Link: https://youtu.be/hod2WksKzV8

Contributors

● Domenico Vitale, University of Tuscia
● Dario Papale, University of Tuscia
● Leonardo Candela, CNR-ISTI
● Gianpaolo Coro, CNR-ISTI


2.3 Science Demonstrator 3: SOS & SSN Ontology Based Data Acquisition & Near Real Time Quality Control (Use Case IC_14)

Overview

The service allows users to submit and publish raw observational (non-geophysical) environmental time series data in common standard formats (T-SOS XML and SSNO JSON) via a messaging API (EGI ARGO [23]). The submitted messages are consumed by an Apache Storm NRT QC topology, which performs Near Real Time (NRT) quality control procedures and publishes the quality controlled and labelled data via a messaging output queue.

Scientific Objectives

Research Infrastructures, specifically observatories that build on environmental sensor networks, share a common problem: data acquisition services and, in particular, the preparation of data transfer prior to data transmission are often not yet sufficiently standardized. This hinders the operation of efficient, cross-RI data processing routines, e.g. for data quality checking.

The overall objective of this implementation case is to move the standardization level close to the sensors of RIs, thus allowing the implementation of common, generic data processing routines, e.g. for Near Real Time (NRT) Quality Control (QC).

FIGURE 5. MOVING STANDARDS CLOSER TO THE SENSOR. TODAY MANY RIS USE PROPRIETARY PROTOCOLS AND FORMATS WITHIN THEIR DATA FLOW (BEFORE). STANDARDS SUCH AS SSNO (NOT SHOWN HERE), SOS (SENSOR OBSERVATION SERVICE) OR O&M (OBSERVATION & MEASUREMENT) FREQUENTLY ARE ONLY USED TO PUBLISH DATA. OUR APPROACH WAS TO USE THESE STANDARDS AS EARLY AS POSSIBLE (AFTER): USE PUCK TO EXPOSE SENSORML METADATA, TRANSMIT DATA AS T-SOS INSERTRESULT XML AND ISSUE THESE DATA AT POTENTIALLY MULTIPLE SOS SERVERS AS WELL AS FOR CONSUMPTION BY A NRT QC SERVICE.

A further objective is to contribute to the harmonization of data transmission formats and protocols.

This science demonstrator use case aims to evaluate standardized data transmission using OGC SWE Transactional SOS (Sensor Observation Service) as a priority standard as well as using the Semantic Sensor Network (SSN) ontology. Both are implemented and tested. It will identify and implement common generic NRT QC routines suitable for multiple RIs (e.g. EMSO, EuroARGO, ANAEE, etc.) and deploy these on appropriate scalable cloud-based technologies on own and/or EGI platforms.

[23] EGI ARGO: https://wiki.egi.eu/wiki/ARGO, http://argo.egi.eu/


Description

FIGURE 6. ARCHITECTURE OF THE NRT QC SERVICE

The service is based on two main cloud-based components: 1) an Apache Storm data processing unit responsible for near real time data quality control of a given real-time time series, and 2) an EGI ARGO messaging component responsible for the management and queuing of data sent from the sensor as well as for the delivery of quality controlled data.

As illustrated in Figure 6, the data processing and quality control builds on Apache Storm to support scalable NRT QC on streamed standardized sensor data. Apache Storm [24] is a distributed real-time computation system. It specializes in reliable processing of data streams and is designed to support real-time analytics and continuous computation. Central to Apache Storm is the notion of a Storm topology. Topology nodes are either spouts or bolts. Edges are streams. A stream is an unbounded sequence of tuples. Tuples are data packages. A spout is a source of streams in a topology. Bolts perform computations (processing) on tuples.

Data from a sensor system using transactional SOS is typically sent at distinct intervals. We therefore chose to use EGI’s ARGO messaging system, which is built upon Kafka, to manage such incoming messages. ARGO provides a well-documented REST API which is easy to use and based on JSON envelopes for messages, in which base64 [25] encoded content can be sent. We have set up one message queue (topic) for incoming messages and another queue for delivery of quality labelled data.
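The idea of a JSON envelope carrying base64 encoded content can be sketched as follows. The envelope field names (`messages`, `attributes`, `data`) and both function names are assumptions made for illustration; the ARGO REST API documentation should be consulted for the actual message schema. No network call is made.

```python
import base64
import json


def wrap_message(payload_text, attributes=None):
    """Build a JSON envelope whose content is the base64 encoded payload."""
    data = base64.b64encode(payload_text.encode("utf-8")).decode("ascii")
    # Field names below are assumed, not taken from the ARGO documentation.
    envelope = {"messages": [{"attributes": attributes or {}, "data": data}]}
    return json.dumps(envelope)


def unwrap_messages(envelope_json):
    """Decode every payload in an envelope back to text."""
    envelope = json.loads(envelope_json)
    return [
        base64.b64decode(message["data"]).decode("utf-8")
        for message in envelope["messages"]
    ]
```

Encoding the payload as base64 lets an XML document such as a T-SOS request travel inside a JSON string without any escaping of angle brackets or quotes.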

[24] Apache Storm: http://storm.apache.org/
[25] base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.

Supported data messages are either base64 encoded T-SOS XML strings or SSNO based JSON strings. To ease processing, we decided to transform these data messages into individual atomic observation objects based on SSNO.

Within the Storm topology we have implemented several nodes:

MessageReader is a BaseRichSpout which continuously reads messages from the ARGO message queue. As ARGO can send multiple messages within one API response, the spout splits these messages into individual message objects and emits these as tuples for further processing within the topology.


MessageAtomizer Bolt is a BaseRichBolt which collects sensor metadata, such as sensor-specific measurement ranges, from the sensor URLs given in the T-SOS or SSNO message. It recognizes the sent data format (SSNO or T-SOS), splits the message objects into atomic observation object tuples and emits these tuples.

RangeCheckController Bolt is a BaseRichBolt which takes these emitted tuples and checks whether each numeric value is within the measurement range of the sensor specification. It adds a qualityOfObservation value (0 = passed, 1 = failed) to each atomic value and emits these as tuples.

OutlierController Bolt is a BaseWindowedBolt which collects a given number of atomic value tuples and performs a simple outlier check based on a modified z-score. The bolt again adds a qualityOfObservation value to each checked atomic value and emits these as tuples.

QualityControlledMessagePacker Bolt is a BaseWindowedBolt which collects a given number of quality checked atomic values and adds these to a JSON array, which is then sent as payload to the ARGO messaging queue for QC’d messages, where it is available for further consumption by an appropriate domain-specific service to update or clean raw data holdings.
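The two quality checks described above can be sketched outside Storm as plain Python functions. This is an illustration under assumptions, not the code of the bolts: the modified z-score is computed in its common form (based on the median and the median absolute deviation, scaled by 0.6745), and the threshold of 3.5 is a widely used default that the text does not confirm. Only the qualityOfObservation codes (0 = passed, 1 = failed) come from the description.

```python
from statistics import median

PASSED, FAILED = 0, 1  # qualityOfObservation codes given in the text


def range_check(value, lower, upper):
    """Flag a value that lies outside the sensor's measurement range."""
    return PASSED if lower <= value <= upper else FAILED


def modified_z_scores(values):
    """Modified z-score of each value in a window (median/MAD based)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the window is identical; scores are undefined,
        # so this sketch reports no deviation rather than dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def outlier_check(values, threshold=3.5):
    """Return one qualityOfObservation flag per value in the window."""
    return [
        FAILED if abs(z) > threshold else PASSED
        for z in modified_z_scores(values)
    ]
```

For a window such as `[10.0, 10.1, 9.9, 10.2, 9.8, 50.0]`, only the last value is flagged. Using the median rather than the mean keeps a single spike from masking itself by inflating the spread estimate.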

Advantages

Near real time quality control is a common problem for Research Infrastructures. Whereas domain-specific NRT routines and standards exist, such as those for ICOS, ARGO or IAGOS, we have shown (see ENVRIplus deliverable D3.3 [26]) that clear commonalities exist among those RIs with respect to NRT QC routines.

FIGURE 7. COMMONLY USED NRT QUALITY ROUTINES WITHIN ENVRIPLUS RIS

Most commonly used are simple tests such as outlier or spike detection, and gradient or stuck value tests. A domain-independent service able to perform these tests would therefore be an added value for all ENVRIplus RIs. RIs which have their own services in place could use it for cross-validation of their QC results, and it would give RIs that do not yet have their own routines in place the opportunity to perform routine NRT QC checks.

[26] ENVRIplus deliverable D3.3: http://www.envriplus.eu/wp-content/uploads/2015/08/D3.3.pdf


Unfortunately, the data transmission formats used within RIs are very diverse. In general, data transmission at the sensor as well as the platform level largely depends on community-specific needs and habits or simply on manufacturer specifications. It is therefore difficult to offer generic, cross-RI processing services in general, and services which allow NRT QC in particular. It is thus clearly advantageous for the scientific community to have access to services that support standards. Further, such services can potentially strongly promote the use of these standards, which is one of the main objectives of this use case.

Link to the Demonstrator

● Video

Link: https://youtu.be/p3UQZkRRWlw

● GitHub repository: https://github.com/ab-e/lightning/tree/demo

Contributors

● Robert Huber, University of Bremen

2.4 Science Demonstrator 4: EuroArgo Data Subscription Service (Use Case TC_2)

Overview

The EuroArgo Data Subscription Service (DSS) allows researchers to subscribe to customizedviews on Argo data, selecting specific regions and time-spans, and choosing the frequency ofupdates.Tailoredupdatesarethenprovidedonscheduletoresearchers’privatestorage.

Scientific Objectives

An extensive number of Research Infrastructures need to publish and give access to datasets that may accumulate over time and need to remain available for download by researchers. The accumulated datasets are queried and analyzed, leading to new data results. Keeping an eye on the accumulated and result datasets at the Argo data center is time consuming, thus a subscription model was adopted to facilitate researchers’ needs. The subscription model does not require direct interactions between the researcher and possibly time-consuming analysis actions, thus allowing more flexible design and integration of the system components. In practice this means that even when time consuming actions can be optimized to operate within time-constrained requirements, the adopted model alleviates the effects of long-running actions on the data, especially when the size of accumulated datasets increases.

The objective of this use case is to develop and integrate a system to access, download, and subscribe to EuroArgo datasets. The EuroArgo community aggregates the marine domain datasets into a community repository from which the data is pushed to the EUDAT B2SAFE service. The newly developed service allows data registration through data identification. Optionally, the community registers the subscription actions and their parameters in B2SAFE. Data can be requested through interfaces provided by the EUDAT and IFREMER web sites. Subscriptions are managed by EUDAT and actions are processed using EGI FedCloud. Users can select different attributes for their subscription, such as localization, stations, parameter, update frequency and more.

Description

Architecture

As shown in Figure 8, the Data Subscription Service (DSS) involves the following basic components: 1) a data selection portal as front end, 2) the Global Data Assembly Center (GDAC) of EuroArgo, 3) EUDAT B2SAFE storage, 4) DRIP, 5) EGI FedCloud resources, and 6) a subscription service component for managing the subscriptions registered via the data selection portal.

FIGURE 8. SYSTEM ARCHITECTURE FOR EUROARGO DATA SUBSCRIPTION SERVICE.


IFREMER provided accumulated monthly marine domain datasets to be pushed to the EUDAT e-Infrastructure (B2SAFE). The datasets were then synchronized between EUDAT and EGI Cloud resources on demand. EUDAT provides services for storage and data transfer, while EGI FedCloud provides the services for computing data products for each subscription.

A Web Portal has been developed for users to select and subscribe to the data of interest. Data selection is made through different criteria (platform type, measure type, parameter, platform Id, time period and more). A Python script has been provided by IFREMER to extract data according to the user’s criteria.

DRIP (the Dynamic Real-time Infrastructure Planner, developed by WP7, Task T7.2) is integrated to execute the data selection process using parallel computation. DRIP can dynamically deploy and manage as many Virtual Machines as required to cope with the load, so that subscriptions are processed in a timely manner. Once results were available, they were pushed to B2SAFE and the user was notified by email.

Workflow

The typical workflow is as follows: users interact with the DSS via the portal, registering toreceiveupdatesforspecificareasandtimerangesforselectedparameterssuchastemperature,salinity, andoxygen levels. TheGDAC receivesnewdatasets fromregional centresandpushesthemtotheB2SAFEdataservice.TheDSSmaintainsrecordsofsubscriptionsincludingselectedparametersandassociatedactions.DRIPplans,provisions,deploys,scalesandcontrolsthedatafiltering application. EGI FedCloud provides cloud resources to host the application. Theapplicationitselfiscomposedofamasternodeandasetofworkernodes.

When new data are available to the GDAC, it pushes them to the B2SAFE service, triggering a notification to the DSS, which consequently initiates actions on the new data. If the application is not yet deployed to FedCloud, DRIP provisions the necessary VMs and network so that the application may be deployed. Next, the deployment agent installs all the necessary dependencies along with the application, including configurations to access the Argo data. The DSS signals to the application master node the availability of the input parameters to be processed, whereupon the master partitions the input tasks into sub-tasks and distributes them to the workers. If the input parameters include deadlines, the master prioritises them accordingly. The monitoring process keeps track of each running task and passes that information to the DRIP controller. If the programmed threshold is passed, the controller requests more resources from the provisioner. Finally, the results of each task are pushed back to the B2SAFE service, triggering a notification to the subscription service, which then notifies the user.
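The master-node behaviour described above (partitioning input into sub-tasks, prioritising by deadline, and requesting more resources once a threshold is passed) can be illustrated with a minimal sketch. The function names, the chunking rule and the "tasks per worker" threshold are assumptions made for illustration only; they are not taken from the DRIP or DSS code.

```python
import heapq
import math
from datetime import datetime


def partition(items, n_workers):
    """Split items into at most n_workers contiguous, near-equal chunks."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not items:
        return []
    size = math.ceil(len(items) / n_workers)
    return [items[i:i + size] for i in range(0, len(items), size)]


def prioritise(tasks):
    """Order (name, deadline) pairs by earliest deadline; no deadline goes last."""
    heap = []
    for order, (name, deadline) in enumerate(tasks):
        # 'order' breaks ties so that names are never compared directly.
        key = (deadline is None, deadline or datetime.max, order)
        heapq.heappush(heap, (key, name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def workers_to_add(pending_tasks, tasks_per_worker, current_workers):
    """Hypothetical scaling rule: keep the load per worker within a threshold."""
    required = math.ceil(pending_tasks / tasks_per_worker)
    return max(0, required - current_workers)


chunks = partition(list(range(10)), 3)
order = prioritise([
    ("no-deadline", None),
    ("late", datetime(2019, 1, 2)),
    ("early", datetime(2019, 1, 1)),
])
extra = workers_to_add(pending_tasks=25, tasks_per_worker=10, current_workers=2)
```

In this sketch the scaling decision is a pure function of the monitored load, which keeps it easy to test separately from the provisioning calls.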

User Interface

As shown in Figure 9, users can use the DSS web portal to subscribe to datasets of interest. A typical subscription task is made up of a set of inputs: a) an area expressed as a bounding box; b) a time range; c) a list of parameters required in data products (e.g. temperature); and d) optionally, a deadline.
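As an illustration of the subscription inputs listed above, the following sketch models a subscription as a small data structure and checks whether a single observation falls within it. The class, field names and the example values are hypothetical and are not part of the DSS portal or the IFREMER extraction script; the bounding box check is deliberately simple and does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Subscription:
    # Bounding box as (min_lon, min_lat, max_lon, max_lat) in degrees.
    bbox: Tuple[float, float, float, float]
    start: datetime
    end: datetime
    parameters: Tuple[str, ...]
    deadline: Optional[datetime] = None  # optional, as in the portal inputs


def matches(sub, lon, lat, when, parameter):
    """Return True if one observation satisfies all subscription criteria."""
    min_lon, min_lat, max_lon, max_lat = sub.bbox
    return (
        min_lon <= lon <= max_lon
        and min_lat <= lat <= max_lat
        and sub.start <= when <= sub.end
        and parameter in sub.parameters
    )


sub = Subscription(
    bbox=(-10.0, 40.0, 0.0, 50.0),
    start=datetime(2018, 1, 1),
    end=datetime(2018, 12, 31),
    parameters=("temperature", "salinity"),
)
inside = matches(sub, -5.0, 45.0, datetime(2018, 6, 1), "temperature")
outside = matches(sub, 5.0, 45.0, datetime(2018, 6, 1), "temperature")
```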


FIGURE 9. DSS WEB PORTAL

Advantages

The pilot activity was initiated by the marine research community. However, the possibility to receive regular transmissions of data, especially in near-real time, directly from the organisation responsible for the data collection and (pre-)processing is very important to many large initiatives. Generic initiatives will themselves be interested in operating subscription services for their outputs, based around a trusted repository hosting synchronised versions of their data collections. Such a service may also allow the provision of new features to end users, generating more visibility.

Research Infrastructures can benefit from subscription services in several ways. They may set one up to serve their own dedicated end user communities, either by pushing new or updated datasets to their "customers", or by configuring a system that automatically advertises the existence of new data, e.g. to those who downloaded previous versions. RIs that require data from other sources, for example to create more elaborate data products, can optimise their internal workflows by signing up to receive automatic updates.

End users can certainly benefit from signing up to services that automatically advertise the existence of new versions or updates to data that they have downloaded previously, for example annually receiving the observational data from a station in which they have a long-standing scientific interest. Typically, the greatest interest here would be in accessing finalised and aggregated data products created by an RI at its data centre. However, subscription services can also be configured to allow customised processing, bringing an opportunity for research groups to benefit from large-scale cluster computations at the data centre side.

Data subscription services are expected to play an increasing role in the future, as the number of data producers and their respective output continue to increase rapidly. The mechanism tested and implemented by this demonstrator could contribute both to the development of a common standard for input streams and to enhancements of digital collaborative spaces for researchers and data providers.


Link to the Demonstrator

Link: https://youtu.be/PKU_JcmSskw

Contributors

● Thierry Carval, IFREMER
● Glenn Judeau, IFREMER
● Jani Heikkinen, CSC
● Baptiste Grenier, EGI
● Zhiming Zhao, UvA
● Paul Martin, UvA
● Spiros Koulouzis, UvA

2.5 Science Demonstrator 5: Sensor Registry (Use Case TC_4)

Overview

The “sensor registry” aims at supporting the management of sensors deployed for in-situ measurements. Common sensors or families of sensors are used across different research infrastructures, for example, oxygen optodes that are mounted on platforms in multiple research infrastructures. The goal of this work is to define common methods to access the sensor metadata in such cases.

Four sub-use cases are considered:

1. List and discover specifications of sensors and hardware on the market.
2. Manage the park of sensors by owner, manage maintenance (e.g. calibrations), loan, etc.
3. Edit deployments, enable traceability from observation data back to the sensor and procedures used for acquisition (link with implementation case on provenance IC_2).
4. Discover infrastructures (observation network, equipped experiment sites, etc.) and enable their citation.


The use case is applicable to the management of various types of platforms: deep-sea observatories (e.g., EMSO), marine gliders (e.g., EuroGOOS gliders [27]) as well as solid earth (e.g., EPOS) or atmosphere observations (e.g., ICOS).

This can also be used to track usage of specific sensor models (e.g., CO2) across the RIs' observation networks.

Scientific Objectives

The objective of standardised sensor repositories is to enable:

● Easy discovery of sensors and their metadata
● Sensors and sensor observations that are discoverable, accessible and usable via the web through standardised services
● Seamless integration of sensors from one network with sensors from other networks

The sensor registry is based on OGC Sensor Web Enablement (SWE) technologies that have been developed and implemented by a range of national and European projects and activities. The Marine SWE profile was developed as part of the FP7 and H2020 projects ODIP, ODIP2, the Oceans of Tomorrow (OoT) call and the BRIDGES project. The Marine SWE profile is a marine specialisation of the OGC SensorML (for metadata) and OGC Observations & Measurements (O&M) standards. A significant advance within these templates was the use of controlled vocabularies to describe terms and values within the SensorML documents. The NERC Vocabulary Server (NVS, part of the European SeaDataNet infrastructure) was used to host the vocabularies. The vocabularies are the Wxx family of vocabularies and are accessible at https://www.bodc.ac.uk/resources/vocabularies/vocabulary_search/. In addition to the human interface, the NVS has machine-readable access methods including SPARQL and SOAP, with outputs including RDF, XML and JSON. The standards developed in the OoT projects are summarised in the SenseOcean project joint deliverable D7.8 [28], which also included recommendations for future work.

Key outcomes of the Marine SWE profile are:

● Standardized web services will exist for accessing sensor information and sensor observations
● Sensor systems will be capable of real-time mining of observations to find phenomena of immediate interest (event stream processing)
● ISO/OGC O&M as data format and the OGC SOS as data access interface enable EC INSPIRE directive compliance with the data format
● The associated OGC Sensor Observation Service (SOS) would be the natural way to achieve full INSPIRE compliance (there exists additional INSPIRE technical guidance that describes how to use the SOS as INSPIRE download service).

[27] EuroGOOS Gliders: http://eurogoos.eu/gliders-task-team/
[28] SenseOcean project deliverable D7.8: http://www.senseocean.eu/senseocean/sites/senseocean/files/documents/Deliverable%20D7.8%20Policy%20Document%20Sensor%20Development%20for%20the%20Ocean%20of%20Tomorrow_r.pdf

Our goal in this test case is to demonstrate which elements of the standardised interface to data are available and how they could potentially be integrated. The development has happened across a number of projects, and the alignment of the results is not fully achieved because each project has different requirements and deliverables. Consequently, examples will be shown from different activities demonstrating each goal in the introduction. The outcomes of the WP9 Test Case TC_4 Sensor registry include:

● Proposed goal to attempt to link repositories to users for specific examples
● Demonstrate viability of the technology and test interoperability
● Act as a precursor to future projects with broader linkages and harvesting

Description

1. List and discover specifications of sensors and hardware on the market.

For the RIs SIOS, EMSO, ANAEE, ICOS and CSEM, examples of sensor, site, station or platform descriptions have been collected in their native formats. The information models and tools used to manage the descriptions have also been shared.

FIGURE 10. SENSOR DEPLOYMENT GRAPHICAL EDITOR AND SENSOR MODEL DATABASE FOR EMSO

FIGURE 11. DATA MODEL FOR ANAEE AND ICOS SENSOR DESCRIPTIONS

Regarding standards, a sample SIOS station description has been encoded in OGC/SensorML format. Examples of SSN ontology encoding and translation from SensorML to SSN have been provided from FixO3 project inputs.

One of the commonly used software packages for providing OGC SWE services in European projects is the 52°North Sensor Observation Service (SOS). This provides an interoperable web-based interface for inserting and querying sensor data and sensor descriptions. It aggregates observations from live in-situ sensors as well as historical data sets (time series data). This server component is complemented by an extensible Web-based JavaScript Sensor Web viewer which allows the visualisation of different types of sensor observation data: the 52°North Helgoland Viewer. More information is available from: https://52north.org/.

These concepts have been used by the Oceanids Command and Control project, along with the outcomes of the Marine SWE profile, to store and expose the metadata for ocean glider deployments to the web. These metadata are then used to automate the curation and conversion of raw glider data to the EGO NetCDF (http://archimer.ifremer.fr/doc/00239/34980/) format, which is the data exchange format within the Ocean Glider Network. Oceanids uses a database that supports both SSN and SensorML metadata representations, so it is strongly aligned with this demonstrator. The same database is being used for historical EMSO data (specifically the PAP site) that is held by BODC, and development is ongoing to expose the metadata in SensorML and the data in O&M formats. This is scheduled to be complete in late 2018.

2. Manage the park of sensors by owner, manage maintenance (e.g. calibrations), loan, etc.

The recording of a sensor history is now technically possible and facilitated by the SensorML template produced within the Marine SWE profile. To date this has not been fully implemented, to the authors' knowledge. BODC will be introducing the linking of documentation to SensorML records in the development scheduled prior to March 2019.

3. Edit deployments, enable traceability from observation data back to the sensor and procedures used for acquisition (link with implementation case on provenance IC_3).

For users of the 52°North software (one of the primary open source OGC SWE software suites) there is a SensorML editor, SMLE (pronounced "smilee"), that enables users to interact with and edit SensorML documents held in OGC-compliant SOS servers via a graphical user interface (editing is only supported for SOS servers supporting the transactional SOS operations). This is freely available from: https://github.com/52North/smle

4. Discover infrastructures (observation network, equipped experiment sites, etc.) and enable their citation.

The discovery of infrastructures requires the recently developed federated sensor observation services. These have not been trialled or investigated in the marine domain, to the authors' knowledge at the time of writing. They may need refinement for the specific requirements of the use case, akin to the Marine SWE profile activity. Work on evaluation of these services was a key recommendation of the OoT projects (see the SenseOcean deliverable referenced above).

The citation of sensors is an ongoing development activity in a newly formed Research Data Alliance working group on the persistent identification of instruments (https://www.rd-alliance.org/groups/persistent-identification-instruments-wg).

Advantages

Standardised machine-readable sensor metadata links directly to the catalogue and provenance work in WP8 of ENVRIplus: it provides a machine-readable document that can be linked to data, filling a current capability gap in the provenance trace.


Common metadata standards allow the sharing of metadata and federation between RIs, especially as the scientific community moves toward observation requirements that span multiple RIs.

The service has potential users spanning the data lifecycle and value chain. The standards are applied at the time of observation and can be embedded within sensors. At the other end of the chain they will enable users to dynamically harvest and discover data for use cases such as oil spill response. Using standards such as OGC SOS and ISO/OGC O&M will even result in the INSPIRE-compliant provision of observation data.

Usage scenarios include:

● Embedding the standards within sensors or data processing systems to allow the automated processing and dissemination of data
● Standardised machine-readable recording and linking of metadata to a data provenance trace
● Inclusion of data in OGC SWE services and endpoints
● The standardised sharing of sensor metadata between organisations and RIs
● The citation of specific sensors (dependent on the results of the RDA working group)

Link to the Demonstrator

Video

Link: https://youtu.be/4QxTZ2iiznk

Link to ESONET yellow pages: https://www.esonetyellowpages.com/

Link to selected 52°North instances:
● 52°North SOS server used for publishing data collected in the NeXOS project: http://nexos.demo.52north.org/52n-sos-nexos-test/sos?request=GetCapabilities&service=SOS
● NeXOS Sensor Web Viewer based on the 52°North Helgoland client: http://nexos.demo.52north.org/client/

Example SensorML output that follows the Marine SWE profile:
● BODC
○ A model of an Aanderaa oxygen optode: http://linkedsystems.uk/system/prototype/TOOL0969/current/
○ An instance of an oxygen optode: http://linkedsystems.uk/system/instance/TOOL0969_prospect/current/
● OGS
○ An instance of a Wind Monitor-JR: http://europa.ogs.trieste.it/OGS_SOS/SensorML_3_0/Sensor_V3_E2M3A_WIND.xml
○ An instance of an SBE 37-SMP-ODO MicroCAT high-accuracy conductivity and temperature recorder: http://europa.ogs.trieste.it/OGS_SOS/SensorML_3_0/Sensor_V3_E2M3A_CT.xml

Contributors

● Justin Buck, BODC
● Simon Jirka, 52°North

2.6 Science Demonstrator 6: New particle formation event analysis on interoperable infrastructure (Use Case TC_17)

Overview

For the scientific community in aerosol sciences that studies atmospheric new particle formation events (NPFEs), this service aims to prototype how the scientific community can be deeply integrated with interoperable Research Infrastructures and e-Infrastructures (unless specified otherwise, henceforth referred to as infrastructures). The result is a knowledge infrastructure [29], i.e., a robust network of scientists, artefacts such as virtual research environments and research data, and institutions such as research infrastructures and e-Infrastructures that acquire, maintain and share scientific knowledge about the natural world.

The service demonstrates how data analysis can be exposed to researchers as a Web-based service while interoperable infrastructures orchestrate everything else, specifically: (1) loading primary observational data into computing environments for subsequent analysis by researchers; (2) representation of data derived in analysis using data models that employ domain-relevant community vocabularies and capture machine-readable data semantics, i.e., information [30] (“meaningful data”); (3) systematic and automated acquisition of derivative information in infrastructures; (4) registration of derivative information in catalogues.

[29] Edwards, Paul N. 2010. A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. MIT Press.
[30] Floridi, L.: The Philosophy of Information. Oxford University Press (2011)

Scientific Objectives

This demonstrator aims to prototype the future of scientific data analysis on interoperable infrastructure. It showcases how well-engineered infrastructures can provide data analysis as a service to science communities while taking care of everything else, e.g., data conversions and curating data derived in analysis, as described in this section.

Here, the scientific community focuses on what they are most interested in and most enjoy doing (i.e., addressing scientific hypotheses with data analysis and interpretation) and the infrastructure guarantees that their data are FAIR [31], i.e., findable and accessible by systematic and automated acquisition and cataloguing; interoperable by using a formal, accessible, shared, and broadly applicable language for knowledge representation and vocabularies that follow FAIR principles; and reusable by rich description of data using domain-relevant community vocabularies and by their release with a clear data usage license.

An important objective is to entirely erase the need for manual data download and upload by researchers. The download of data from research infrastructures is “considered harmful” in most cases [32]. Indeed, the practice of downloading data perpetuates the infrastructural discontinuity between local computing environments (e.g., researchers' workstations) and engineered infrastructures. Such discontinuity makes it difficult or impossible for engineered infrastructures to monitor workflows and executed activities, to retain information about the primary and derivative data involved, and to systematically acquire derivative data. We want to demonstrate what a knowledge infrastructure may look like when manual download and upload is not an option.

A second objective is to unravel what occurs in the data use phase of the research data lifecycle. Using a concrete use case in aerosol science involving infrastructures and the relevant scientific community, we analyse the details of a scientific data analysis workflow and the roles of both the scientific community and infrastructures as elements of knowledge infrastructures. The demonstrator showcases how primary observational data acquired, curated and published by a research infrastructure are analysed by the scientific community in the data use phase and how such analysis generates derivative data. Traditionally, derivative data are poorly standardized in the community and generally reside on the workstations of researchers. The demonstrator shows how, when data analysis is performed on interoperable infrastructures (thus avoiding the aforementioned infrastructural disconnect), infrastructures can guarantee systematic and automated acquisition of derivative data, thus ensuring a strong link between the data use phase and the (derivative) data acquisition phase of the research data lifecycle. The demonstrator emphasises that there indeed is a cycle for research data from primary data to scientific knowledge communicated in scholarly literature.

A third objective is to connect, i.e., deeply integrate, a scientific community with well-engineered infrastructures. In the future, this social objective needs to be given more emphasis. While there continue to be issues to iron out, the technical elements of knowledge infrastructures are today mature enough to move into full-scale operation. The experiences collected with this use case underscore that the social elements are lagging behind and are at best only slowly starting to be receptive to the novel approaches developed here, at least in the earth and environmental sciences and especially for the long tail of science, i.e., for communities that do “small science” with “little data”. This demonstrator is a contribution to integrating the social and technical elements of knowledge infrastructures.

[31] Wilkinson, M.D. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3 (Mar 2016). https://doi.org/10.1038/sdata.2016.18
[32] Atkinson, M., Filgueira, R., Spinuso, A., Trani, L.: Download considered harmful (2018), manuscript in preparation.

Description

Usage Scenarios

Jaana and Mikko are two fresh graduate students of a Finnish research group that, among other things, studies new particle formation events. New particle formation events are atmospheric events whereby aerosol particles form and grow over the course of a day at specific spatial locations. These events are studied to increase our understanding of the formation process and to quantify the formed aerosol.

Prior to Jaana and Mikko, earlier generations of students developed Python software and published the code as a Jupyter Notebook on GitHub. This code has become the de facto standard software used by the scientific community. Jaana and Mikko are instructed by their supervisor to use this code for their analysis and have obtained further instructions from their postdoc colleagues on how to execute the analysis on e-Infrastructures.

FIGURE 12. JUPYTER NOTEBOOK FOR NEW PARTICLE FORMATION EVENT CLASSIFICATION AS SEEN FROM THE D4SCIENCE VRE. THE VRE INCLUDES THE EGI JUPYTERLAB COMPUTING ENVIRONMENT.

The two graduate students create an account on D4Science and are given access to the relevant Virtual Research Environment (VRE). The VRE gives them access to JupyterLab, serviced by EGI. The students use the JupyterLab Terminal to clone the required Jupyter Notebook from GitHub into their own working space. Now the students are ready to analyse primary data (i.e., particle size distribution data) in order to detect and describe new particle formation events that may have occurred at specific places and days. Figure 12 displays the Jupyter Notebook as seen by Jaana and Mikko. All they need to do is to select a day and place and interpret the corresponding visualization of primary data. For days and places at which an event occurred, the result of primary data interpretation is a description of the event, recording in particular the beginning and end times as well as the classification of the event, which follows a scheme accepted by the relevant scientific community.

Architecture

Figure 13 provides an overview of the architectural design of the service implementation. Researchers access JupyterLab operated on the EGI e-Infrastructure (provided by WP9) in order to analyse primary data for the purpose of new particle formation event detection and description. JupyterLab is accessible from the corresponding D4Science Virtual Research Environment [33] (VRE). Having cloned the required Jupyter Notebook [34] from GitHub, researchers can start to analyse primary data to detect and describe new particle formation events.

FIGURE 13. ARCHITECTURAL DESIGN OF THE SERVICE IMPLEMENTATION.

Workflow and Interfaces

The analysis consists of two main steps. Both are implemented as D4Science Data Miner algorithms and are accessed programmatically from within the Jupyter Notebook via a WPS (OGC Web Processing Service) interface. Given a day and place, as configured by the researcher, the first step fetches and visualizes primary data. The primary data are published by SmartSMEAR [35], a “data visualization and download tool for the database of continuous atmospheric, flux, soil, tree physiological and water quality measurements at SMEAR research stations of the University of Helsinki.” SmartSMEAR is developed and provided in collaboration with CSC (https://www.csc.fi/home), the Finnish national supercomputing center, which also hosts the SMEAR data. SmartSMEAR is thus a (software) artifact of the SMEAR [36] (Station for Measuring Ecosystem-Atmosphere Relations) research infrastructure (RI). SmartSMEAR provides an API for data access. The primary data can thus be fetched and loaded into Python data structures in a programmatic manner.

[33] https://services.d4science.org/group/particleformation/
[34] https://github.com/markusstocker/pynpf-d4science/blob/master/classification.ipynb

Given the primary data, the Data Miner algorithm creates a visual representation (see the Jupyter Notebook in Figure 12). This visualization is used by researchers to decide whether a new particle formation event occurred on the selected day and place, as well as to describe the properties of the event, e.g., beginning and end times and classification, among others. The visualization is a conventional PNG image which is made accessible by D4Science. The Data Miner algorithm returns to the Jupyter Notebook the URL for the location of the image. The notebook then displays the image by retrieving it from the given location.
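The programmatic WPS access described above can be sketched as the construction of a WPS 1.0.0 Execute request in key-value-pair form. The endpoint, the algorithm identifier and the input names below are placeholders and do not correspond to the actual D4Science Data Miner deployment, which may also require an access token; the sketch only builds and decodes the URL and does not contact any service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute URL; inputs are joined as 'name=value;...'."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    }
    return endpoint + "?" + urlencode(query)


# Placeholder endpoint and algorithm name, for illustration only.
url = wps_execute_url(
    "https://dataminer.example.org/wps/WebProcessingService",
    "org.example.FetchAndPlotPrimaryData",
    {"day": "2013-04-04", "place": "Hyytiälä"},
)

# Decoding the query shows what the server would receive.
decoded = parse_qs(urlsplit(url).query)
```

A notebook cell would typically send such a request with an HTTP client and read the image URL from the response document; that part depends on the service and is left out here.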

Assuming an event occurred on the selected day and place, the result of interpreting the visualization is a description of the event. Since researchers here study new particle formation events, in this context an event description is information, i.e., meaningful data. Such data are thus rich in semantics.

In the second step, an additional Data Miner algorithm records the event description. The researcher merely records the day (e.g., 2013-04-04), place (e.g., Hyytiälä), beginning (e.g., 11:00), end (e.g., 12:30) and classification (e.g., Class Ia). Rather than recording these strings into a row of a table, the algorithm creates an RDF (Resource Description Framework) description of the event. The meaning of the strings is thus recorded as well. The result is a self-describing information object (see Figure 15). The current implementation uses the LODE [37] ontology, which provides a concept Event and relations for time and space. We are currently working with the scientific community to develop a more appropriate concept of new particle formation events [38]. This concept will be part of the Environment Ontology [39]. When this process is completed, we will modify the implementation to reflect the conceptualization developed by the scientific community.

Finally, the RDF description is registered as a resource on the CKAN-based D4Science catalogue. This is done automatically by the Data Miner algorithm. The data derived in analysis are thus automatically catalogued with corresponding metadata that support search. Figure 14 shows a number of catalogued resources for Hyytiälä (FI) on various days. Figure 15 shows the metadata of a selected resource, specifically the one for the description of the new particle formation event that occurred at Hyytiälä on April 4, 2013. Here, users are also given the location (URL) from which the event description can be accessed. Finally, Figure 16 shows the RDF description (in Turtle syntax) of the new particle formation event that occurred at Hyytiälä on April 4, 2013, obtained by accessing the URL on D4Science.

[35] https://avaa.tdata.fi/web/smart
[36] http://www.atm.helsinki.fi/SMEAR/
[37] http://linkedevents.org/ontology/
[38] https://github.com/EnvironmentOntology/envo/issues/602
[39] http://www.obofoundry.org/ontology/envo.html

The RDF description is machine-readable, uses formal languages for knowledge representation (RDF, RDFS, OWL), and represents the semantics of the data (the day, place, beginning, end and classification) derived in analysis using domain-relevant community vocabularies. Specifically, beginning and end data elements (10:30 and 12:00, respectively) are described as an OWL-Time [40] Interval with beginning and end OWL-Time Instants, relating to the respective timestamps as XSD DateTime. Also notable is that the place is identified as a GeoNames [41] resource (http://sws.geonames.org/656888/) for Hyytiälä, Finland. The data derived in analysis are thus richly annotated. Indeed, the description adopts Linked Data principles to relate to resources defined elsewhere (here, GeoNames.org).
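The translation from a plain data tuple to such an RDF description can be approximated with the following sketch, which emits Turtle using the LODE and OWL-Time vocabularies named above and the GeoNames identifier for Hyytiälä. It is not the Data Miner implementation: the `ex:` namespace, the event identifier scheme and the `ex:classification` property are hypothetical, timestamps are written without a time zone, and the example values are the illustrative ones from the workflow description.

```python
from datetime import date, time

PREFIXES = """@prefix lode: <http://linkedevents.org/ontology/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/npf/> .
"""


def event_to_turtle(day, place_uri, begin, end, classification):
    """Render one event description as Turtle (illustrative structure only)."""
    event_id = f"ex:event-{day.isoformat()}"  # hypothetical identifier scheme
    start = f"{day.isoformat()}T{begin.isoformat()}"
    stop = f"{day.isoformat()}T{end.isoformat()}"
    return PREFIXES + f"""
{event_id} a lode:Event ;
    lode:atPlace <{place_uri}> ;
    lode:atTime [
        a time:Interval ;
        time:hasBeginning [ a time:Instant ;
            time:inXSDDateTime "{start}"^^xsd:dateTime ] ;
        time:hasEnd [ a time:Instant ;
            time:inXSDDateTime "{stop}"^^xsd:dateTime ]
    ] ;
    ex:classification "{classification}" .
"""


turtle = event_to_turtle(
    day=date(2013, 4, 4),
    place_uri="http://sws.geonames.org/656888/",
    begin=time(11, 0),
    end=time(12, 30),
    classification="Class Ia",
)
```

Keeping the translation in one function mirrors the idea that researchers only supply the five values, while the structure and vocabulary choices stay inside the infrastructure.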

FIGURE 14. NEW PARTICLE FORMATION EVENT DESCRIPTIONS REGISTERED AS RESOURCES IN THE CKAN-BASED D4SCIENCE CATALOGUE.

[40] https://www.w3.org/TR/owl-time/
[41] http://www.geonames.org/

FIGURE 15. METADATA AND ACCESS URL OF THE RESOURCE DESCRIBING THE NEW PARTICLE FORMATION EVENT THAT OCCURRED AT HYYTIALA (FI) ON APRIL 4, 2013.

FIGURE 16. THE RDF DESCRIPTION OF THE NEW PARTICLE FORMATION EVENT THAT OCCURRED AT HYYTIALA ON APRIL 4, 2013 AS RETRIEVED FROM THE D4SCIENCE CATALOGUE.

Advantages

The idea of transforming data into knowledge is popular among research infrastructures. Among others, the Integrated Carbon Observation System (ICOS) research infrastructure uses the tagline "knowledge through observations" [42]. The European Multidisciplinary Seafloor and water column Observatory (EMSO) suggests that the research infrastructure plays "a major role in supporting the European marine sciences and technology [...] to enter a new paradigm of knowledge in the XXI Century" [43]. As an example beyond research infrastructures, the European Open Science Cloud (EOSC) is envisioned to be an environment that enables turning ever increasing amounts of data "into knowledge as renewable, sustainable fuel for innovation in turn to meet global challenges" [44].

Beyond the specifics of the developed use case in aerosol science, this demonstrator is a clear contribution to this idea. It demonstrates a possible architecture of an infrastructure that “transforms data into knowledge”. Essential factors of such knowledge infrastructures are (1) the deep integration of science communities with research and e-Infrastructures; and, as an important technical factor, (2) the curation of formal (i.e., machine-readable) data semantics. The deep integration of science communities is essential because, never mind the Age of Artificial Intelligence, in science it is researchers who transform data into knowledge. As this demonstrator underscores, deep integration with infrastructures allows for a range of novel possibilities. In particular, it enables researchers to focus on data analysis and interpretation while leaving to infrastructures the data access and transformation from and to systems, the representation of data and their semantics following community standards, the capture of provenance information, and other infrastructural aspects.

The curation of data semantics is an additional essential technical factor. Information is inherently semantic and becomes knowledge through learning (i.e., internalization). When researchers interpret primary data, the resulting derivative information is thus rich in meaning (relative to the context of data analysis). As demonstrated here, the data tuple (2013-04-04, Hyytiälä, 11:00, 12:30, Class Ia) has a specific semantic. In the Age of Semantics, it is paramount to move on from data structures that merely capture such a tuple of values to data models that also support the machine-readable representation of the meaning of these values. We understand that researchers cannot be expected, on a broad disciplinary scale, to translate (manually or, to this date, even programmatically) data tuples such as the one above into a description as shown in Figure 16. Indeed, a key point of this demonstrator is to showcase the possibility of engineering such translation into infrastructures so that the whole is entirely invisible to researchers. Such invisibility reflects well the nature of infrastructure [45].

Analysing 17 ENVRIplus research infrastructures, Hardisty et al. [46] reported that at the time (2016) only three identified core competencies in the data use phase of the research lifecycle. As per the definition of the ENVRI Reference Model [47], data use is the phase in which researchers use data, potentially producing new research data. This description best describes the activity supported by this demonstrator, which thus focuses on the data use phase. While only a few ENVRIplus research infrastructures identify core competencies in the data use phase, it is surely correct that all research infrastructures serve the data use phase. Indeed, their very existence is to serve this phase, that is, to serve researchers' data use. A key kind of use is arguably data analysis and interpretation. Again, beyond the specifics of the developed use case in aerosol science, we argue that this demonstrator showcases one possible interplay between research infrastructures and e-Infrastructures in the data use phase. We argue that the developed architecture is applicable to other (ENVRIplus) research infrastructures and the science communities they serve. Naturally, the specifics (e.g., the vocabularies, the notebook, the cataloguing, etc.) need to be adapted to meet the requirements of other scientific communities. The architecture and implementation principles (e.g., the design and technologies used) are, however, broadly applicable and thus of relevance to other, probably most if not all, (ENVRIplus) research infrastructures.

[42] https://twitter.com/ICOS_RI/status/803156982349729793
[43] http://www.emsodev.eu/work_packages.html
[44] https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf
[45] Star, S.L.: The ethnography of infrastructure. American Behavioral Scientist 43(3), 377–391 (1999). https://doi.org/10.1177/00027649921955326
[46] Hardisty, A. et al. (2016). A definition of the ENVRIplus Reference Model. ENVRIplus Deliverable 5.2. http://www.envriplus.eu/wp-content/uploads/2015/08/D5.2-A-definition-of-the-ENVRIplus-Reference-Model.pdf

Since FAIR data is on the global agenda of infrastructures, funders and other institutions, we underscore that this demonstrator significantly contributes to implementing this agenda by promoting the notion of “FAIR by Design”, that is, weaving data FAIRness into the fabric of infrastructures. The demonstrator builds on the principle of not leaving it to researchers to make data FAIR, but guaranteeing it through the design of well-engineered infrastructures. We argue that removing the manual download and upload of data from and to systems is a crucial factor to this effect.

Naturally, the demonstrator is of primary interest to a specific scientific community, namely the one consisting of the various aerosol research groups that study new particle formation events. To the best of our knowledge, the globally most renowned research group in this area is the one led by Prof. Markku Kulmala at the University of Helsinki48. Prof. Kulmala and some of the postdocs in his group have been involved in the development of this demonstrator. Most importantly, postdocs have been actively involved in the development of a conceptualization of new particle formation events and a corresponding concept of the Environment Ontology. In its current stage the demonstrator is a prototype to showcase to the scientific and infrastructure communities what is possible using state-of-the-art interoperable infrastructures. A transition in practice from how data analysis is currently done to such infrastructures as demonstrated here requires further work as well as further acceptance by the scientific community. While we believe we have reached an important milestone with this demonstrator, we cannot claim to know if and when such a transition will occur, for this scientific community or beyond. Clearer, however, is the imperative of the transition toward a practice as delineated by this demonstrator.

Link to the Demonstrator

● Instructions: https://github.com/markusstocker/pynpf-d4science/blob/master/README.md

● Virtual Research Environment: https://services.d4science.org/group/particleformation/

47 https://envri.eu/rm
48 https://www.helsinki.fi/en/inar-institute-for-atmospheric-and-earth-system-research


● Blog post: http://markusstocker.com/data-analysis-on-interoperable-infrastructure/

● Video: https://youtu.be/ra9W7b5DbgI

Contributors

● Markus Stocker, TIB and MARUM/PANGAEA
● Markus Fiebig, NILU
● Leonardo Candela, CNR
● Giuseppe La Rocca, EGI
● Enol Fernandez, EGI
● Alex Hardisty, CU

2.7 Science Demonstrator 7: gCube-based VRE for Mosquito Diseases Study (Use Case SC_2)

Overview

This demonstration illustrates how a LifeWatch researcher can easily upload and integrate an R-based algorithm in D4Science, making it available to other researchers, in particular members of the VRE in which the algorithm was published. Once published, researchers can discover the algorithm and use it with their own data. It is also possible to adapt the algorithm and to share improved versions. When processing data-intensive analysis algorithms, the computation can be outsourced to federated resources, such as those provided by the EGI e-Infrastructure.

Scientific Objectives

The scientific vision of this use case is to enable a more efficient management of mosquito-borne diseases and nuisance mosquitoes. Mosquito-borne infections are among the most important new and emerging diseases globally and in Europe, and statistical correlation approaches are used to predict disease transmission areas.
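As a rough illustration of what a statistical correlation approach involves at its simplest, the sketch below computes a Pearson correlation coefficient between an environmental variable and mosquito abundance. The use case does not state which statistic or variables its algorithms rely on, so the choice of Pearson correlation, the variable names, and the toy values are assumptions; the numbers are invented for illustration and are not project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented toy values, one pair per hypothetical sampling site.
soil_moisture = [0.10, 0.20, 0.30, 0.40, 0.50]
abundance = [3, 5, 9, 12, 15]

r = pearson(soil_moisture, abundance)
print(f"correlation between soil moisture and abundance: {r:.3f}")
```

A coefficient close to 1 or -1 would suggest that the environmental variable is a candidate predictor of abundance; a real analysis would of course need many sites, several variables, and a proper model.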

LifeWatch RI provides advanced ICT, such as BioVel, supporting biodiversity research. However, it currently only provides standard algorithms for data processing. There is a need to support individual researchers’ requests, e.g., to import a new set of hydrological data layers into the analysis or to add new algorithms that handle presence/absence to the analysis, and a need for access to Cloud resources, e.g., to execute a large number of analytical cycles for many species under different climate scenarios.
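The need for Cloud resources follows from the shape of the workload: every combination of species and climate scenario is an independent analytical cycle. The sketch below shows that fan-out pattern with a local thread pool standing in for remote resources. The function `run_cycle`, the species and scenario labels, and the use of `ThreadPoolExecutor` are assumptions for illustration only; they do not represent the D4Science or EGI interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_cycle(species, scenario):
    """Placeholder for one analytical cycle; a real run would fit and evaluate a model here."""
    return {"species": species, "scenario": scenario, "status": "done"}


def run_all(species_list, scenarios, max_workers=4):
    """Run one cycle per (species, scenario) pair and return the results in job order."""
    jobs = list(product(species_list, scenarios))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the jobs were submitted.
        return list(pool.map(lambda job: run_cycle(*job), jobs))


# Hypothetical labels; the use case mentions 40 species and several climate scenarios.
results = run_all(["species_a", "species_b"], ["scenario_1", "scenario_2", "scenario_3"])
print(f"{len(results)} cycles completed")
```

Because the number of cycles is the product of the number of species and the number of scenarios, the job list grows quickly, which is why outsourcing to a large-scale e-Infrastructure becomes attractive.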

These objectives should be achieved following the technical vision of supporting researchers in combining biological and hydrological data in a collaborative and evolving Virtual Research Environment (VRE) that allows intensive statistical computations: researchers should be able to easily share and use algorithms that they can adapt and use with their own data.

Description

Architecture

The proposed service architecture is shown in Figure 17. It combines different infrastructures: at the lower layer is the LifeWatch RI, containing the Swedish LifeWatch Portal that provides high-quality biological data for mosquito species, and the community data repositories that preserve environmental information and a series of ecological modelling algorithms. Datasets to be exploited include species data (95,730 abundance measurements from Sweden, Denmark, and Germany for 40 disease-carrying species in 2016), and hydrological data (generated by a regional hydrological model using 15 land use types and 8 soil types).

FIGURE 17. USE CASE ARCHITECTURE INCLUDES THREE LAYERS: 1) PHYSICAL INFRASTRUCTURE, 2) E-INFRASTRUCTURE, AND 3) VRE

At the middle layer is the EGI e-infrastructure, which provides Cloud computation and storage resources supporting data-intensive workflow executions.

At the top layer are the D4Science VRE and the Biodiversity Virtual e-Laboratory (BioVel) portal, which provide high-level user interfaces. BioVel49 is a software environment that assists scientists in collecting, organising, and sharing data processing and analysis tasks in biodiversity and ecological research. The service components of the platform include a Biodiversity Catalogue (a library with well annotated data and analysis services), the data processing environments (such as RStudio for creating R programs), a workbench (for assembling data access and analysis pipelines), the myExperiment workflow library (that stores existing workflows), and the BioVel Portal (that allows researchers and collaborators to execute and share workflows).

49 BioVel: https://www.biovel.eu/

The existing BioVel platform can generate environmental values from species occurrences; however, it only provides standard analysis algorithms. Integrating the D4Science and gCube-based VRE can enrich the functionality of the LifeWatch ICT to allow dynamic modeling.

User Interface

The D4Science/gCube-based VRE for the mosquito disease study has been set up with support from T7.1. The interfaces are shown in Figure 18. It provides a programming environment (shown in Figure 18, b) that allows biodiversity researchers to develop and compile their own customised analysis algorithms using R, CLI etc. A researcher can decide to share his/her data, algorithms, or workflows by publishing them in the group area (shown in Figure 18, a), which enables social communication via messages, comments, etc.

a. VRE AREA FOR SHARING DATA, ALGORITHMS, AND WORKFLOWS

b. VRE AREA FOR DEVELOPING AN ANALYSIS ALGORITHM

FIGURE 18. VRE INTERFACES FOR MOSQUITO DISEASE STUDY

Advantages


Using the VRE, there is no more need for manual sharing of data and algorithms. Information is always synchronized, and data and algorithms are joined in a single place. Users can enjoy an easy and user-friendly access interface. The D4Science/gCube-based VRE has an interface to EGI Cloud/HTC resources. If needed, it can outsource the computation to the large-scale e-Infrastructure, which can handle computation in parallel and store and share large volumes of data.

The integration service can bring added value to the LifeWatch community. It makes it possible for individual researchers to repeat and reuse algorithms at will, run trend analysis, and add new parameters and custom data. The VRE provides provenance registration that improves reproducibility. The VRE also allows retention of computation results in the user’s workspace. This makes it possible to edit and adapt algorithms.
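To indicate why provenance registration helps reproducibility, the following minimal sketch records which algorithm, version, parameters and input data went into a run, so that two runs can later be compared. The record layout, the field names, and the helper functions are hypothetical; this is not the provenance format or API of the D4Science/gCube VRE.

```python
import hashlib
from datetime import datetime, timezone


def provenance_record(algorithm, version, parameters, input_bytes):
    """Describe one run: what was executed, with which settings, on which data."""
    return {
        "algorithm": algorithm,
        "version": version,
        "parameters": dict(parameters),
        # A checksum identifies the input without storing the data itself.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "executed_at": datetime.now(timezone.utc).isoformat(),
    }


def same_run(first, second):
    """True if two records describe the same algorithm, settings and input (time is ignored)."""
    keys = ("algorithm", "version", "parameters", "input_sha256")
    return all(first[key] == second[key] for key in keys)


# Hypothetical algorithm name, parameters and input data.
run_1 = provenance_record("abundance_model", "1.0", {"layers": 3}, b"site,count\nA,12\n")
run_2 = provenance_record("abundance_model", "1.0", {"layers": 3}, b"site,count\nA,12\n")
run_3 = provenance_record("abundance_model", "1.0", {"layers": 3}, b"site,count\nA,13\n")

print(same_run(run_1, run_2), same_run(run_1, run_3))
```

With such records kept alongside the results in the workspace, a researcher who adapts an algorithm can tell whether a changed result stems from the new code, new parameters, or different input data.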

The integration service also brings added value to the ENVRIplus community. The need to enable individual researchers to share data and/or algorithms is common to many ENVRIplus RIs, where data is currently processed using standard models. Researchers want to use different analysis models and they need a VRE to work together.

This pilot investigation tested and validated WP7 technology. The demo illustrates the integration solutions of linking the gCube VRE to the LifeWatch RI and to the EGI e-Infrastructure. There are also some lessons learned from the pilot activities: the D4Science/gCube VRE is easy to use for simple algorithms, but complicated algorithms need integration effort, which requires domain researchers to have the technical skills to work with different technologies.

Link to the Demonstrator

This demonstrator illustrates a proof of concept of the proposed architecture that uses D4Science to set up a community-centric VRE, allowing researchers to share a simple algorithm with some data and to run computations on the EGI FedCloud e-Infrastructure. The data was gathered and produced by the involved RIs and the dynamic computation was executed on the EGI Federation’s resources.

Link: https://youtu.be/XtcCxkFX98I


Contributors

● Baptiste Grenier, EGI
● Matthias Obst, Swedish LifeWatch
● Leonardo Candela, CNR-ISTI
● Gianpaolo Coro, CNR-ISTI