13
The NOMAD (Novel Materials Discovery) Laboratory – a European Centre of Excellence Deliverable 6.2 Final report on the HPC platform Deliverable No: 6.2 Expected Delivery Date: 31/10/2018, M36 Actual Delivery Date: 20/11/2018, M37 Lead Beneficiary: CSC - IT Centre for Science Ltd. Contributing Beneficiaries: Max Planck Society (MPG), Barcelona Supercomputing Centre (BSC), Leibniz-Rechenzentrum (LRZ), Humboldt-Universität zu Berlin (HUB), Aalto Univeristy (AALTO) Authors: Atte Sillanpää (CSC), T. Zastrow, H. Lederer (MPG-MPCDF), Lauri Himanen (Aalto), Ruben García-Hernández (LRZ)

The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

TheNOMAD(NovelMaterialsDiscovery)

Laboratory–aEuropeanCentreofExcellence

Deliverable6.2

FinalreportontheHPCplatform

DeliverableNo:6.2

ExpectedDeliveryDate:31/10/2018,M36ActualDeliveryDate:20/11/2018,M37

LeadBeneficiary:CSC-ITCentreforScienceLtd.

ContributingBeneficiaries:MaxPlanckSociety(MPG),BarcelonaSupercomputingCentre(BSC),Leibniz-Rechenzentrum(LRZ),Humboldt-UniversitätzuBerlin(HUB),AaltoUniveristy

(AALTO)

Authors:AtteSillanpää(CSC),T.Zastrow,H.Lederer(MPG-MPCDF),LauriHimanen(Aalto),RubenGarcía-Hernández(LRZ)

Page 2: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

Copyright 2016/2017/2018 by theNOMADConsortium. The information in this document is proprietary totheNOMADConsortium.This document contains preliminary information and is not subject to any license agreement or any otheragreementwiththeNOMADConsortium.Thisdocumentcontainsonlyintendedstrategies,developments,andfunctionalitiesandisnotintendedtobebinding upon to any particular course of business, product strategy, and/or development oftheNOMADConsortium.TheNOMADConsortiumassume no responsibility for errors or omissions in this document. Furthermore,theNOMADConsortiumdoes not warrant the accuracy or completeness of the information, text, graphics,links,orotheritemscontainedwithinthismaterial.Thisdocumentisprovidedwithoutawarrantyofanykind,eitherexpressor implied, includingbutnot limitedtothe impliedwarrantiesofmerchantability, fitnessforaparticular purpose, or non-infringement. TheNOMADConsortiumshall have noliabilityfor damages of anykind includingwithout limitationdirect, special, indirect,orconsequentialdamages thatmayresult fromtheuse of these materials. This limitation shall not apply in cases of intent or gross negligence. Thestatutoryliabilityforpersonalinjuryanddefectiveproductsisnotaffected.Inaddition,thematerialspresentedandviewsexpressedherearetheresponsibilityoftheauthor(s)only.TheEUCommissiontakesnoresponsibilityforanyusemadeoftheinformationsetout.D6.2FinalreportontheHPCplatform 2

ExecutiveSummaryThroughworkpackage (WP)6, theNOMADtechnologyplatformhasbeendesigned, implemented,deployedandputintooperation.Afterdefininganddevelopingthenecessaryoveralldataworkflow,adevelopmentplatformwasfirstputinplaceforallpartnerstostartwiththeirspecificdevelopmentwork. After development of the first NOMAD services from otherWPs, a production platform forservingtheenduserwasinstalledandoperatedbyWP6.Thisproductionplatformhasbeenfurtheroptimizedinlinewithuserneeds,securityandredundancyrequirements.AnessentialfeatureoftheNOMAD technology platform is the centralized usermanagementwith single sign on capability. Amaster instance of the NOMAD technology platform is installed and operated at MPG-MPCDF inGermany,whileacloneofselectedpartsoftheNOMADtechnologyplatformisoperatedatCSC inFinland. The modular design of the NOMAD technology platform allows for easy installation andoperationofadditionalinstancesatfurtherlocations.

Page 3: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 3

TABLEOFCONTENTS1 Introduction................................................................................................................4

2 TechnologyPlatformanditsEvolution........................................................................42.1 EvolutionofthePlatform.................................................................................................42.2 TheFinalTechnologyPlatform.........................................................................................4

2.2.1 VirtualMachinesandContainers....................................................................................62.2.2 Storage............................................................................................................................72.2.3 Authentication&authorization.......................................................................................82.2.4 Monitoring.....................................................................................................................102.2.5 DeploymentoftheNOMADproductionplatform.........................................................10

2.3 TechnologyPlatformoptimization..................................................................................102.3.1 LoweringtheBarrierforAccessingNOMADServices...................................................102.3.2 SecurityAspects............................................................................................................112.3.3 SoftwareDeploymentAutomation...............................................................................112.3.4 ExtendedComputeservices..........................................................................................122.3.5 ImprovingtheEncyclopediaIngestingProcess.............................................................12

3 LessonsLearnedandConclusion................................................................................13

RevisionHistory

Version1.0,submitted20/11/2018

Originalversion

Page 4: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 4

1 IntroductionTheobjectivesofWP6wereto:

• define and develop the necessary overall data workflow for theNOMAD Laboratory CoE,withinputfromtherespectivescientificpartners,

• deployandoptimizeitsrespectiveapplicationswithafocusonHPCsystemsandmapthemtoanadequateinfrastructure,sothateffortandresourcesinWP6thussupportallscientificWPs,and

• operatetheunderlyinginfrastructurefortheMaterialsSciencecommunity.

Thisdeliverable,D6.2‘FinalreportonHPCplatform’,describesthetechnologyplatformdeliveredattheendoftheproject.Itexploreshowtheplatformchangedandevolvedduringtheprojecttomeetuserneeds,andwhatlessonscanbelearntforfutureprojectsinsimilardomains.

2 TechnologyPlatformanditsEvolution

2.1 EvolutionofthePlatform

Thedesignand implementationof thedevelopmentplatformwas initiatedat theprojectstartandcarried out as described in detail in D6.1 ‘HPC platform requirements and architecture report’.Essentialrequirementsandfeaturesweretosupportthe:

a) needsofWP1forthecreationofthenormalizeddatafromtheRepositorydata,b) implementationoftheNOMADEncyclopediabyWP2,c) integrationofvisualizationservicesbyWP3,andd) Big-Dataanalytics,includingmachine-learningapproaches,byWP4.

Accordingtothisdesign,themastersetupofthecompletedevelopmentplatformwasimplementedbyanddeployedatMPG-MPCDF.Thissetupishighlyflexibleandcanbeeasilyadaptedtoincreasingneedsboth for increased resourceandsoftware requirementsand -optional -extensions toothercomputingcenters.

2.2 TheFinalTechnologyPlatform

The final technology platform evolved as a production platform from the development platform,according to the requirements. The production platform was implemented in addition to thedevelopment platform according to Figure 1. The development platform has been maintainedalongsidetheproductionplatformforon-goingdevelopmentandtestingpurposes.

Page 5: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 5

Figure1:Evolutionoftheproductionplatformfromthedevelopmentplatform.

NOMAD services are directly accessed with a web browser without the need for any local clientinstallations. Inaddition, it isalsopossibletoaccesse.g. theEncyclopediaAPIprogrammaticallyviatheRESTfulinterfaceandtheNOMADRepositoryandArchivecontentsdirectly.Someoftheservicesare accessible without authentication, while some require it. The common user management forusingNOMADservicesisdescribedinsection2.2.3.

Thus,fromauserperspective,mostoftheservicescanbeaccessedeitherfromthemainwebsiteorsites linkedto it,while thebackendallows forusingdistributedresources fordeployment.The fullset of services has been put into production at MPG-MPCDF where the largest part of NOMADhardware resourceshasbeenprovided. Through the flexibledesign, the system is ready for setupalsoatotherNOMADsides,asawholeoronlyinparts.Thisgivesadditionalflexibilitytosettingupthe services locally, reduces the capacity requirements, adds redundancyand resilienceand finallylowersthebarriertodeploytheservices.ThisalsomakesiteasierforscientistsoutsideofNOMADtotakesometoolsforlocaldevelopments.ThecurrentsituationofdeploymentofdifferentservicesisshowninFigure2.

Page 6: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 6

Figure2:CurrentInfrastructuresetup.Themainwebpageisanentrypointtoallservices.TheEncyclopediaAPIbackend runsbothatMPG-MPCDFandatCSC. The loadbalancer redirectsqueries to them.Theblueservicesareonline,thegreyservicesareon"standby",i.e.thefunctionalityexists,butresourceshavenotbeenallocatedandgettingthemonlinerequiressomemanualwork.

Theproductionplatformrunsthefollowingservices:

• TheNOMADRepository -thedatabaseincludingtheuploadedmaterialssciencecalculationinputandoutputs,theGUIandAPItoqueryanduploadthedataandthelinksforwardtotheWP1workflow.

• The parsing infrastructure (WP1) - the scientific code-specific libraries that convert theuploadsintocode-independentdataintheNOMADArchive.

• TheArchive-databaseprovidingaccesstothecode-independentmaterialsdata.• TheEncyclopediaGUIandAPI(WP2),includingtheloadbalancer.• The Remote Visualization service (WP3) providing GPGPU powered visualization service of

local high volume data directly via a browser and functionality to convert data for VRviewing.

• TheAnalyticsToolkit(WP4)withBig-DataAnalyticsNotebooksforqueryingandanalyzingthematerialsdatafromtheArchive,and

• IdentityProvider-userregistrationandauthenticationservice.

Due to the number of services and distributed character of the NOMAD Laboratory CoEinfrastructure,a"SingleSignOn”(SSO)Systemforexternalusersbecamenecessary(see2.2.3).AnSSOhandlesuseraccountsinacentralservice("IdentityProvider"(IDP))andvalidatesuserloginsforanyNOMADservice("ServiceProvider"(SP)).

2.2.1 VirtualMachinesandContainers

Therequirementforeasyscalabilityhasbeenmetbyextensiveusageofvirtualmachines(VMs)andcontainersorchestratedwithKubernetes,OpenStack,Ansibleandotherscripts.Figure3presentsthedeploymentofdifferentNOMADservices.TheRepositoryrunsdirectlyonVMs,whereastheparsinginfrastructure (WP1), the remote visualization (WP3) and the Analytics Toolkit (WP4) all run onDockercontainers,managedbyasharedKubernetescluster.Wheneverauserlogsin,asetofnew

WP1Parsing

WP2Encyclopedia

WP3Remote

Visualization

WP4BigDataAnalysis

CentOS VMsMPG-MPCDF

vGPUs

User,browser

www.nomad-coe.eu

Repository

WP4BigDataAnalysis

WP2Encyclopedia

WP3Remote

Visualization

WP2Encyclopedia

WP3Remote

Visualization

WP4BigDataAnalysis

CentOS VMsCSC

vGPUs

Load balancer

IdPSSO

Inoperation

Stand by

Repository(Raw data)

Archive(Normalized)

Page 7: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 7

containersaredeployedbyKubernetes.TheEncyclopediausesacombinationofbothtechnologies:itis runondedicatedVMs,where itsdifferentservicesareencapsulated inDockercontainers.Theseservices are the GUI-webserver, the API, and the database for each production system, which isextendedbythedataprocessingforthedevelopmentsystems.

Containers: The use of Kubernetes as an orchestration engine forDocker containers allows for aneasy future development and deployment path. It has become widely used and is likely to besupported in the future. During this reporting period also the Encyclopedia, VR-conversion scriptsand Remote Visualization Services have been migrated to run on Docker containers, furthersimplifying the Infrastructure. This flexibility allows for optimal use of existing heterogeneousresourcesatparticipatingHPCsiteswithouttheneedforallsitessetupan identical infrastructure.FromanHPC-center point of view, the cloud (VMs) and container solutions are also preferred, asthey enable dynamic scaling of the committed resources, thus minimizing idle time and makingefficientuseofresources.

VMs:ThefirstNOMADVMsrunningatMPG-MPCDFwerebasedonSLES.ThefirstservicesrunningatCSCcloudinfrastructurewerebuiltonCentOSVMs.Toharmonizethebuildandmanagementscripts,allVMsweremigrated to supportCentOS. It is awell-supported flavorwithout license constraintsandthereforecanbeusedatanysitewithoutadditionalcosts.

Figure3:NOMADLaboratoryCoEInfrastructurePlatform.TheplatformisbasedonVMs.TheRepositoryrunsdirectly on VMs, whereas the Encyclopedia services (WP2) run on containers on separate VMs, and theparsing infrastructure (WP1), the Remote Visualization (WP3) and theAnalytics Toolkit (WP4) all run onDockercontainers,managedbyasharedKubernetescluster.

2.2.2 Storage

Adifferentiationbetweentwokindsofstoragehasbeenmade:

• Storage for developers on the development platform: storage resources for storing andcreating the data on which theNOMAD Laboratory CoE is based on; this includes sharedstoragesystemsaswellaslocalstorages,mounteddirectlytoserversorVMs,and

• Storagefortheproductionplatform:datamadeavailablethroughtheproductionplatformisseparate from the storage of the development platform (MPG-MPCDF) or a separate filesystem(CSC)andhasread-onlyaccessviathepublicNOMADservices;throughtheNOMADtools (for analysis, visualizations etc.), the data produced by external users is available forfurtherNOMADservicesonaseparatefilesystem.

The200TiBGPFSstorageatMPG-MPCDFhasbeensuitableasthesharedstorageofthetechnologyplatform.Localstorageonthededicatedservershasbeenincreaseduponrequeststoaround10TBper server. Special storage solutions have also been used; for example, SSD disk has been

CentOS VMs

Kubernetes Cluster

vGPUs

WP1Container

WP4Container

WP4Container

WP2GUIContainer

RepositoryVM

WP3Container

SLESVMs CentOS VMs

WP1Container

WP4Container

WP4Container

…WP2APIContainer

CentOS VMs CentOS VMs

Page 8: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 8

implementedfortheEncyclopediadatabaseontheproductionplatformtospeedupthequerytimes.Additional storage capacity was added to the platforms to address the significant growth of theNOMADRepositoryandtheotherdatabasescreatedfromtheRepository.

Forexample,thelargestdatacollectionistheRepository.Thenumberofcontributionsuploadedtoithas grown from 634,001 (01 Oct 2015) to 1,235,389 (30 Oct 2015 orM1) to 3,407,035 (M18) to6.240.929 (M36). The disk footprint of the uploaded data has grown from 0.8 TB to 9.9 TB,respectively.Asdescribed in theDataManagementPlan,WPs1 - 4use a slightly augmentedanddifferentlycompressedversionofthisdata.TheEncyclopediahasgrownaccordingly.

2.2.3 Authentication&authorization

Authentication for developers on the development platform has been handled by the alreadyexistinglocalusermanagementsystemsofthecomputingcenters.Theproductionplatformrequireda separate user management solution and several already existing Single Sign On (SSO) solutionswere considered and evaluated. The only solution that could cater to all developer needs andNOMADuserrequirementswasbasedonMPASSid1,whichinturnisbasedonShibbolethandSAML.TheMPASSidIdPproxywasusedtobuildtheNOMADusermanagementsystem.

Figure4:NOMADIdentityProviderlandingpageforlogginginormanagingtheaccount.

TheimplementedSSOsystemconsistsoftwoparts:(i)anIdentifyProvider(IdP),whichisresponsiblefor the whole user account management: registration and login procedure, and (ii) an arbitrarynumberofServiceProviders(SP).TheuserinterfaceisshowninFigure4.TheseSPsaretheNOMAD

1https://mpass.fi/english/

Page 9: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 9

applications,whichusetheIdPforauthenticationandauthorizationoftheusersasshowninFigure5.

Figure5WorkflowinaSSOsystem

The NOMAD IdP was developed, operated and modified according to the needs of WPs 2 - 4,althoughtheintegrationoftheRepositoryusermanagementwiththeIdPisstillinprogress.TheIdPcomeswitha linked localLDAPforstoringtheuserdata.Aself-managedsystemgives flexibility tocreatenewuserattributesasneeded.AsMPASSidisbasedonShibbolethitiscompatiblewithmostacademicauthenticationfederations(e.g.HAKA2andSwitch3)andcanlaterbelinkedtoEuropeanornationalAAIinfrastructures.Italsoallowslinkingtosocialmediaaccountsmakingiteasierforuserstoauthenticate.

Uponrequest fromWP4developers, separatecontainerandVMdeploymentsof the IdPcompletewith its own LDAP to manage user data were created. The motivation was firstly to helpdevelopmentworkbyallowingaccess toanon-production IdPserver.Suchan IdPcouldbesetupinside the local development infrastructure, thereby removing the need to configure tunnelingaccess outside the development system firewalls. Another benefit was that this further facilitatessettinguptheNOMADsystematanewsite(e.g.inaprivatecompany)thatdoesnotwanttoaccessacommon IdP for intellectualproperty (IP)orsecurity reasons.The IdPcontainer ismanagedwithAnsiblescriptswhichhelpskeepingthemaintenanceandmanagementworklow.

Useraccountcreationandmanagementareself-serviceviaemailconfirmation,butattherequestofthe IndustryAdvisory Committee (IAC) and other industry colleagues, a guestmodehas also beenconfigured.Theguestmodeallowsaccesswithoutdisclosinganypersonalinformationtothesystem.Aftersuccessfulauthentication,theIdPwillasserttheuseridentitytotherequestingserviceusingaSAMLtoken.IndividualNOMADrequests,thatmostlytakeplaceviatheRESTAPI,includethetokenintheirmessages,therebyachievinguseridentificationandauthentication.AllNOMADservicescanutilize the same token, sonoadditional action from theuser isneeded toaccess the full rangeofNOMADservices.TheIdPrunsonavirtualmachineintheCSCcloudplatformandwhenneeded,aredundantIdPcaneasilybeaddedatanothersiteorserver.

2https://wiki.eduuni.fi/display/CSCHAKA/Federation3https://www.switch.ch/aai/about/federation/

Page 10: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 10

2.2.4 Monitoring

The involved computing centers monitor the availability of the NOMAD infrastructure. In case offutureregulationfortheaccesstoNOMADresources,monitoringatservicelevelwillbeusedasthebasisforaccounting.

Regarding the actual services, themost important resource tomonitor has been the storage. ThecomputingcapacityonthedeployedserversatMPG-MPCDFhassofarbeenadequatefortheusage.

Accountingandlimitingofresourcesforexternalusagecanbedoneintwoways:

• usersaregrantedaccesstoaplatformwheretheNOMADservicesareinstalledandaregivenadedicatedamountofresources(CPUs,RAM,storage,etc),or

• the NOMAD applications themselves track resource use of the user and potentially limitfurtherusageaboveasetquota.

2.2.5 DeploymentoftheNOMADproductionplatform

TheNOMADRepositoryisthemainsourcefordataintheNOMADLaboratoryCoE.Currently,itistheonly placewhereusers canuploadmaterials sciencedata in theNOMADecosystem. TheNOMADRepository isoperatedatMPG-MPCDF.Thedataof theNOMADRepository isnormalizedandthenmadeavailablefortheNOMADCoEapplications–theNOMADLaboratory.ThemasterinstallationintheNOMADCoEwasrealizedatMPG-MPCDFwhileasecondinstancehasbeendeployedatCSC.

Both normalized data and metadata can be synchronized by the respective service operatorsbetweenthetwositesviaconventionalcommandlinetoolsasshowninFigure6.

Figure 6: Data flow between the Master site (MPG-MPCDF) and a remote site (CSC) in the distributedNOMADinfrastructure.

2.3 TechnologyPlatformoptimization

2.3.1 LoweringtheBarrierforAccessingNOMADServices

StepwiseaccesstoNOMADservicesforeaseofuse:

MPCDF,dedicated VMs

EncyclopediaDBMaster

Repository(raw data,ZIP

Archive)Master

CSC,OpenStack

EncyclopediaDBCopy

NOMADArchive(Normalized)

copy

NOMADArchive(normalized)

Master

rsync

Manual (batch)Transfer

WP1,parsing

WP2,manual

User data,local User data,local

Page 11: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 11

1) Thefirststep,tryingoutthewebservice,canbedonewithananonymousaccount.Onlyawebbrowserisrequired.

2) Moreservicesandresourcescanbeaccessedandstoredeasilyandwithoutcostbycreatinga NOMAD user account. Only a working email address is required. Users can also accesstheseadditionalservicesthroughawebbrowseralone,asalltheprocessesrunonNOMADservers.

3) ThemodularandportabledesignofthetechnologyplatformenablesuserstosetupselectedNOMADservices locallyat theirownsite.This improves IP securityandenables theuseofexisting computational resources on site. However, few institutions would invest in theinstallationprocesswithout firstbeingconvincedthattheservice isuseful (i.e. through1-2above).

4) TheNOMADIdPallowscreatingadditionalattributes,e.g.forVIPstatusforresearcherswhoneed extensive computational resources, but do not want a local installation. This statuscouldallowadditionalresourcestobedeployed,perhapsforafee.

2.3.2 SecurityAspects

TheNOMADLaboratoryCoEservicesareweb-basedandaccessiblefromanywherebyanypotentialuser.Thissetuprequiresahighattentiononsecurity-relatedissues.

The computing centers host the NOMAD Laboratory CoE services. At the level of hardware andoperating systems, the computing centerswill take careofupcoming security issues. This includesregular checks for necessary updates of the operating system and the installed basic software.BackupcapabilitieshavebeenprovidedforallNOMADrelateddata,configurationsandapplicationcode. For example, MPG-MPCDF offers backup functionality on the GPFS cluster it operates forNOMADRepositoryandotherNOMADservices:a fileset ("/nomad/backup")which isautomaticallybackedupontapesviaTivoliStorageManager.

At the application level, the responsibility is on the developers of theNOMAD services. From theinfrastructurepointofview,usingcontinuousdeploymentandlargelyautomateddeploymentscriptsfacilitate frequent security updates and rapid fixing of reported vulnerabilities, aswell as periodiccleaninstallsfromthesource.

2.3.3 SoftwareDeploymentAutomation

The provisioning of VMs, configurations and application setups in the NOMAD InfrastructurePlatformhasbeenautomated,wherepossible,with toolssuchasAnsible,which isa free-softwareplatform for configuring and managing servers.4 Ansible scripts have been used for different usecases:

• Basic level (throughcomputingcenters): setupofoperating systemandbasic tools;willbeused to duplicate (virtual) machines across computing centers, set up and configureKubernetesclustersortheIdPfunctionality.

• Applicationlevel(throughNOMADservicedevelopers):installationandconfigurationofuserspaceapplicationslikeKubernetes,databases,applicationserversetc.

4http://docs.ansible.com/

Page 12: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 12

During theproject,MPG-MPCDFhasoffered itsGitLabplatform forusebyallNOMADdevelopers.GitLab is in firstplacea versioning system fordistributedcodedevelopmenton thebasisofGit, aversioncontrolsystemthatcanbeusedforsoftwaredevelopmentandotherversioncontroltasks.Besidesthisbasicfunctionality,theGitLabsoftwareoffersindividualwikis,issuetrackersandabroadanddeep integrationwith furtherdevelopment tools (e.g. IntegratedDevelopmentEnvironments),whichwereallutilizedintheCoE.ContinuousintegrationwasalsoconfiguredtobepossibledirectlyfromwithinGitLab(seeFig.7).Somedevelopmentteamslinkedthisupwithfurthercommunicationtools. For example, a push to the master branch of the Encyclopedia GUI repository causes anautomatic deploy to the developmentwebserver, and at the same time a notification is sent to achannel inSlack,sothatotherdevelopersare immediately informedthat therehasbeenachangetheymightwanttocheckout.

Figure7:GitLabserviceatMPG-MPCDF

2.3.4 ExtendedComputeservices

In addition to the handling of computing needs through the NOMAD computing centers, HPCresourceswereofferedbyPRACEtobeabletofillingapsinthesimulationdataintheRepository,ifrequired. Inparticular, intheearlyphasesoftheparserdevelopment,therewasfrequentneedforheavy re-parsing of NOMAD data. MPG-MPCDF provided direct HPC access that could be usedinsteadofthesmallerscalededicatedresources,whicharemoresuitableforthereactiveparsingofnewlyuploadeddata into theRepository. Ifneeded in future, the resourcesatNOMADcomputingcenters can be made available for parsing the data in a geographically distributed way andsynchronizetheresultstotheNOMADArchive.

2.3.5 ImprovingtheEncyclopediaIngestingProcess

Inthefirsthalfoftheproject,theEncyclopediaingestionprocesswasidentifiedasabottleneck.Thefirst solution to improve the performance was to restructure the database schema to allow forparallelizing the process. This improved the performance, but with ever growing data volumes,eventuallyadifferentapproachwasneeded.

Page 13: The NOMAD (Novel Materials Discovery) Laboratory – a ... · containers orchestrated with Kubernetes, OpenStack, Ansible and other scripts. Figure 3 presents the deployment of different

NOMAD–ProjectNo.676580

D6.2FinalreportontheHPCplatform 13

ThehighlyhierarchicalnatureoftheNOMADMetaInfo,frequentupdatesonthedatabaseschemaandhighinsertionvolumesmadetheoriginalrelationalPostgreSQL-basedsolutioninconvenientfortheEncyclopediadevelopmentteam.Toremedytheseproblems,WP6hasassistedtheEncyclopediateam in transitioning from an SQL-based database solution to a schemaless NoSQL environmentbasedonMongoDB.

Population of the Encyclopedia database is done by systematically crawling through the user-uploaded calculations stored in theNOMAD Archive. As described above, this taskwas efficientlyparallelized,butthefollowinginsertionstepspeedoftheSQLdatabasedidnotscalesufficientlyasmoreprocesseswereaddedtothistask.Thisbottleneckcouldbeavoidedbystoringtheinformationas single nested data entries in themore suitableNoSQL alternative.Nested JSON-documents arethusfastertostorebutalsotoserve.MongoDBalsosimplifiestheservingofEncyclopediacontentsas the documents are stored in the JSON format that can be directly served through a web-API.Additionally,theschemalessapproachprovidedbyMongoDBallowsmorerapidprototypingasthereisnoneedtoupdatethedatabaseschemaforeverychangeinthedatalayout.

ThemigrationtotheNoSQLenvironmenthasbeengradualbyfirstmakingaparallelimplementationthatworksalongsidetheolddatabase.Theproductionenvironment isscheduledtobeupdatedtousethenewdatabaseassoonasotherpartsoftheservice,includingtheAPI,havebeenupdatedtofullysupportthechanges.

3 LessonsLearnedandConclusionAt the start of the project, the first aimwas to try to design a full set ofweb-based services in adistributedtechnologyplatform.Overthecourseoftheimplementationofthedifferentpartsofthetechnology platform, it turned out that only parts of the NOMAD infrastructure needed to bedistributedoverNOMADcomputingcenters.AmasterinstanceoftheNOMADtechnologyplatformwas implementedascloseaspossibletothe largestdatabase,theNOMADRepository,withatightintegration of essential resources. Further instances of the NOMAD infrastructure can be easilyclonedtoothercomputingcentersforeaseofusefortheirlocalusers.OnecloneofessentialpartsoftheNOMADtechnologyplatformhasbeensuccessfullyinstalledatCSC.

WP6hasproventobeofessentialimportanceforthedesigning,implementationanddeploymentoftheNOMADtechnologyplatformthroughacollaborativeeffortofthetechnicalexpertsinvolved.