9

Click here to load reader

A Vision for a European e-Infrastructure for the 21st Century

Embed Size (px)

Citation preview

Page 1: A Vision for a European e-Infrastructure for the 21st Century

CER

N-O

PEN

-201

3-01

821

/05/

2013

EIROforumITWorkingGroup21thMay2013

1 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 1 

A Vision for a European e‐Infrastructure for the 21st Century 

Executive Summary Over the past decade Europe has developed world‐leading expertise in building andoperating very large scale federated and distributed e‐Infrastructures, supportingunprecedentedscalesof international collaboration inscience,bothwithinandacrossdisciplines. We have the opportunity now to capitalize on that investment andexperience, to build the next generation infrastructure to enable innovation andopportunitiesforEuropeanscienceandeducation,industryandentrepreneurs.Wearenowinaperiodofexplosivedatagrowth.Thefoundationsforhandlingthe“DataTsunami” or “Big Data” have been laid in the last 20 years as we have moved fromsimplecommoditycomputing(“Farms”), tocommoditydistributedcomputing(“Grid”)andthencommoditycomputingservices(“Cloud”).Thesehavepreparedthegroundforhandling the large amounts of data being produced today. The era of “Data IntensiveScience”hasbegun.Toaddressthesechallengesforthediverse,emerging“longtailofscience”conductedbyresearchers that do not have access to significant in‐house computing resources andskills, we propose creating a common platform for the future that builds on theexperienceofthelastdecadeandisflexibleenoughtoadapttotechnologicalandserviceinnovations. Suchaplatformmustprovidetheunderlying layersofcommonservices,but must be adaptable to the very different and evolving needs of the researchcommunities. A key feature should be that established services be operated byEuropean industry, while development of new servicesmay be publicly funded. Theproposalhas3distinctlayersofservices:

1. European and international networks; services for identity management andfederation across all European research and education institutions andintegratedwithotherregionsoftheworld;

2. A small number of facilities toprovide cloud anddata servicesof general andwidespreadusage.

3. Software services and tools to provide value‐added abilities to the researchcommunities,inamanagedrepository:

a. The tools to provide those research communities that have access tolarge sets of resources the ability to federate and integrate thoseresourcesandtooperatethemfortheircommunity,potentiallysharingwithothercommunities;

b. Tools to help build applications: e.g. tools to manage data, storage,workflows,visualisationandanalysislibraries,etc.

c. Toolsandservices toallowresearchers to integrateeverydayactivitieswith the e‐Infrastructure: collaborative tools and services; office

Page 2: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

2 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 2 

automation, negotiated licensing agreements etc. Services would beoperatedbyindustryoronthefacilitiesinlayer2above;

d. Toolstohelpresearchcommunitiesengagethegeneralpublicascitizenscientists.

Theselayerswouldbesupplementedbyinvestmentinapplicationsoftwareinordertobuildandshareexpertiseinensuringthatapplicationsarecapableofexploitingevolvingcomputingarchitectures.The expectation is that a continuum of financial models is appropriate ranging fromsponsoredresourcesforpeer‐reviewedscientificcasestocommunitieswhowouldpayfor the services they receive, thus the services they receivemust be appropriate andprovide a clear value. The governance of the platform would be created byrepresentationfromtheusercommunities.

Contents 

ExecutiveSummary............................................................................................................................1

Introduction..........................................................................................................................................3

Proposalforane‐Infrastructure...................................................................................................4

Asetofbasicinfrastructureservices:...................................................................................5

Cloudservices..................................................................................................................................5

DataFacilities...................................................................................................................................5

Distributedinfrastructure..........................................................................................................6

Softwareservicesandtools.......................................................................................................6

InvestmentinSoftware...............................................................................................................6

RelationshipwiththeHPCCommunity................................................................................7

Buildingthedatacontinuum.....................................................................................................7

ProvidingLeadership........................................................................................................................7

Fundingmodels...............................................................................................................................8

Governance.......................................................................................................................................9

This document has been prepared by the IT department of CERN on behalf of theEIROforumITworkinggroup.  

Page 3: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

3 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 3 

Introduction Looking forward over the coming 10‐15 years, there are exciting challenges ahead tocapture,manage,andprocessthevastamountsofdatalikelytobegenerated,notonlyby the established fundamental researchdomainsbut in a growing range of scientificdisciplines, large and small. This new frontier in computation will be driven by theneedsofdata‐drivenscience,simulation,modellingandstatisticalanalysisinareasfromclimate change to life sciences, art and linguistics. All will see incredible growth andacceleratedbreakthroughsduetounprecedentedaccesstodataandthecomputationalabilitytoprocessit.Europemust preserve its intellectual capital and provide the opportunity for it to benurtured, developed and to grow. Major advances in technology that have taken theworld by storm, from Linux to the World Wide Web, have often been conceived inEuropebutultimatelyexploitedelsewhere.This is a loss toEurope, in termsof skills,employmentandbusiness.Wemust take the step tomakeunprecedented scales of IT resources available to thenextgenerationofemergingscientists,researchersandentrepreneurs,nurturingthemfrom education to start‐up activities and then sustainable businesses or researchcommunities. As well as serving the direct needs of computing, the opportunity toinnovate and explore new technologies is essential. The correct environment forinnovationwillallowmanyideastobetested,andexplored,whichwillleadtothetrulyunexpected and ground‐breaking discoveries and inventions that will shape thiscentury.Over the last decade, drivenwith sustained funding from the EC, the e‐InfrastructurelandscapeacrossEuropehasgrownfromregionalprototypestoasetofpan‐Europeanproduction resources. But to go forward much more coordinated effort is needed.Today’s efforts leave gaps in the overall strategy, and suffer in part from inadequatefundingbythestakeholders.CERN, in collaboration with the EC, national funding agencies and the High EnergyPhysics community has successfully built, and today operates, the world’s largestscientifice‐infrastructure. ThisworldwideinfrastructureisindailyoperationandhasbeenusedtoproduceresultsfromthehugevolumesofdatadeliveredbytheLHCanditsdetectors. The development of this distributed grid federating resources around theworld took close to 10 years from conception through to production use at thenecessary scale, and requirednovel developments in termsof physical infrastructure,middleware, application software, and policy development. This experience alsohighlights a key aspect of the future e‐infrastructure model if it is to become theinfrastructure of choice for the European Research Area: there must a long‐termcommitmentbyall thestakeholders tomake thee‐Infrastructure themeansbywhichthey will provide/use production IT services. The future research infrastructurescurrently in construction, such as FAIR, XFEL, ELIXIR, SKA, ITER and upgrades to ILLand ESRF, need to be convinced that the e‐Infrastructure will exist and continue to

Page 4: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

4 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 4 

evolve throughout theirconstructionandoperationphases if theyare to take the riskandinvestinitscreationandexploitation.In considering theway forward, it is important thatwe foreseean infrastructure thatsupportsallofthescientificandacademicneedsoftheEuropeancommunity,includingthe“longtailofscience” conductedbyresearchersthatdonothaveaccesstosignificantin‐housecomputingresourcesandskills.Consequentlythisshouldnotbethoughtofasaone‐size‐fits‐allsolution.Ratherabroadbutcoherentsetofservicesandtoolswhichmust be available to allow the specific needs of each community to be met. Thiscommonplatform should alsobe able to act as the incubator for newbusinesses andscientific activities. It is essential that European industry engage with the scientificcommunityinbuildingandprovidingsuchservices,butitisalsoimportantthattheusercommunityhaveastrongvoiceinthegovernance.ThisviewhasbeendocumentedinarecentResponsetoEC(DGCNECT)Paper“ResearchDatae‐infrastructures:FrameworkforActioninH2020”producedbytheEIROforumITWorkingGroup.

Proposal for an e‐Infrastructure WhilethegridmodelhasbeenextremelysuccessfulforHighEnergyPhysicsandsimilarhigh‐throughputcomputingapplications(suchasastrophysics),itisnotsuitedformanyother sciences, which have very different requirements. Technological advances arecontinuous, and so during the time of the development of the grid, distributedcomputing and the underlying technologies have advanced significantly, in academiaandhavebeen adoptedbymanybusiness sectors.Cloud computing technologies, andthe huge increase in available networking capabilities are leading examples. Thegrowingcomputingneedsofsciences,inparticularthosethathaveneverbeforeneededlargescalecomputing,willbenefitfrommanyofthoseadvances.Thusitisvitalforthefuture needs of scientific e‐infrastructures, that a model be adopted encompassing awidespectrumof facilitiesandtools thatcanbeofdirectbenefit toarangeofscienceand research use cases. Rather than build such a structure as a single integrated e‐infrastructure (like the grid) it will be far more advantageous to provide a set ofcollaborating core infrastructure services, and a variety of facilities, together with abroadsetofeasilyadaptabletools. Thiswillallowtheresearchcommunitiestoselectthe services or tools that they require, andonly those,without additional complexity.Oneof the lessons fromthegridexperience is thatunnecessarycomplexityshouldbeavoidedintheinfrastructurelayers.TherearemanyexistingeffortswithinEuropethatcanbedrawnontofulfilavisionforthefuture.These“pathfinder”initiativeshaveprototypedmanyaspectsofwhatwillbeneeded in the future. This includesmuch of thework in the grid projects, but alsoprojectssuchasEUDAT,CRISP,HelixNebula,OpenAIRE,thematicdataprojects,suchasTransplantandmanyothers.In order that such an effort be sustainable and permit maximum flexibility acrossdomains,aswellasbeingabletofulfilthegoalsofworkingtogetherwithindustryandkeyglobalplayers,itisessentialthatthefutureserviceinfrastructureandtoolsbefullybasedonopenstandards,opensoftware,andprovideopenaccesstothedata.

Page 5: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

5 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 5 

This integratedmodel representing a common platform for the futuremust have thefollowingkeycomponents:

A set of basic infrastructure services:   The core network; Building on today’s GEANT/NREN, and commercially

operatednetworkstoprovideexcellentconnectivityandoperationalservicestoallscientificinstitutions,andfullyextendingtocountriesontheedgesofEurope,aswellasensuringtheinternationalandglobalconnectivityrequiredbytoday’ssciences;

Federated identity management services, allowing existing identities ofresearcherstobeusedacrossthefullsetofe‐Infrastructureservices.Bearinginmind the substantial achievements of the ORCID project, it must supportpersistent digital identifiers that uniquely distinguish every researcherthroughouttheirentirecareer,providingintegrationinkeyresearchworkflowssuchasmanuscriptandgrantsubmission,supportsautomatedlinkagesbetweenprofessional activities and ensuring the researcher’s work is recognised. TheFIM4R document1produced by representatives from a range of researchdisciplinesprovidesdetailedrequirementsforsuchservices.

Cloud services A(smallnumberof)publiclyprovidedresearchcommunitycloudfacilitiesthatcanbeusedforapplicationsthatrequiretheinstantiationofafewlong‐livedservices,oraccesstocomputeorstorageresourcesforarelativelyshorttime.Suchacloudresourcecouldbe outsourced to commercial providers, or be the product of a public‐privatepartnership.TheHelixNebulaprojectprovidesanexampleforwhatmaybeinvolvedinputtingthisinplace.

Data Facilities A general data storage facility (for example public science archives). Not onlywouldsuch a resource be of immense value to data producers, providing a sustainable,dependable and accessible archive; but alsowould provide an unrivalled opportunityfordatasharingbetweenscienceswiththeintegrationofdifferenttypesandsourcesofdata. The EUDATproject is demonstrating some of the candidate technologies in thisarea. Thesefacilitieswouldprovideopenaccesstothedata,andwouldbeafocusfordata preservation activities to ensure the long‐term guardianship of the data. Thefacilities would also provide persistent identifiers for data objects at an appropriategranularity, as well as metadata services. They would allow secure data sharingbetween sciences capable of supporting certification requirements of Data AccessCommittees. As such the e‐infrastructure will offer a knowledgebase consisting of acollection of data, organizational methods, standards, analysis tools, and interfacesrepresentingabodyofknowledge.

1 Federated Identity Management for Research Collaborations, http://cds.cern.ch/record/1442597

Page 6: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

6 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 6 

Distributed infrastructure A set of high‐level software services that allow research communities to implement afederatedanddistributedcomputinginfrastructureinordertointegrateresourcesoftenexplicitly provided for those applications. These are typically useful where thecomputingandstoragerequirementsarelarge,wherethereisaneedtocollaborate,andwhere the occupation of the resources is very high. These services would be ageneralisationoftoday’sgridservices,butshouldfocusonthemovetomoreopenandstandard implementations, and may benefit directly from cloud implementations. Bydemocratizingaccesstodataandcomputationalresources,theserviceswillenableanylaboratoryorproject,regardlessofsize,toparticipateinatransformativecommunity‐wide effort for advancing science and accelerating the pace toward its exploitation.Thus, the e‐infrastructures services will facilitate building a broader scientificcommunity thatwill contribute to fundamental sciencewithin theEuropeanResearchArea.

Software services and tools 1. A set of software services that allow researchers to integrate e‐infrastructure

with their everyday activities and personal devices, for example a “dropbox”functionality, office automation and collaborative tools and services. Many ofthesewouldbehostedontheinfrastructuresdescribedabove.Todaytherearemanytoolsavailablebutmostarenotwidelyknownorused.Thisactionwouldalso provide a means by which software licenses (which today represent asignificant and rapidly growing cost) could be potentially negotiated andmanagedonbehalfoftheentirescientificcommunity.

2. A set of tools ofwide andgeneral utility that canbe usedby the applications.Thiswouldincludeasetoftoolstomanagedatatransfer,storage,andotherdatarelatedactivities,aswellascoordinatingarepositoryforusefulsoftwaretools.TheideasoutlinebySciencePAD2couldplayarolehere.

3. A set of tools to allow scientific communities to build a citizen‐cybersciencefacilitywherethatisappropriateanduseful.

Investment in Software In addition to the above, an organisation put in place to coordinate such a set ofcoherent services and activities would also be the natural way to broker newcollaborations.Oneareathatwillbeofstrategicimportanceinthecomingyearswillbeasignificantinvestmentinsoftwarecapabilitythatwillbeabsolutelyessentialtoobtainthe best performance from current and future computer and storage architectures.Many sciences today benefit from commodity CPU and storage, and this is likely tochange as the consumer market shifts from PC’s to tablets and smartphones. Thisinvestment insoftwareisessential tomaintainEuropeancompetitiveness inthisarea,and should include coordination of existing expertise to the benefit of diversecommunities.

2 http://www.sciencepad.org

Page 7: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

7 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 7 

Theremay also be traditional software and tools at the application layer that wouldbenefitfromaEuropean‐widecollaboration. Examplesheremayincludeamechanismto obtain better licensing conditions, or collaborations to build specific applicationsoftwareofgeneralbenefittoabroadcommunity.

Relationship with the HPC Community Therelationshipwiththesupercomputingcommunity(HPCapplications)shouldalsobere‐defined.Therearetwoaspectstoconsider.Thefirstappliestothefrontier‐sciencechallengesthatneedthemostsignificantHPCresources. InthiscasetheHPCfacilitiesshouldbeviewedasscientificinstrumentsintheirownrightthatproducesciencedatafor their application communities. Today such large‐scale simulations produce hugevolumes of data. Those application communities are then naturally users of an e‐infrastructureonwhichtodistributeandanalysetheir(supercomputerproduced)data,and those applications would also be scientific stakeholders in a general scientific e‐infrastructure.ThereareotheraspectsofHPCfacilitiesthatarecomplementarytothecloudandHighThroughputresources.SomeapplicationsthatrequiremodestlevelsofanHPCresourcemay well be deployable on suitably configured cloud or data intensive computingresources. There are also workflows that cross HPC and data intensive computingresourcesandwouldbenefitfromanintegratedserviceenvironment.The HPC facilities and their scientific communities would also benefit from theunderlying technologies mentioned above (the networks, federated identities, policywork,etc.).

Building the data continuum Adata continuum, that is to say a systemcapableofnavigating thedata evolutionbylinking the different stages of the data lifecycle, from raw data to publication isnecessary to accelerate the rate of scientific discovery and increase the impact ofresearch on society. Elements of the data continuum exist and a range of projects,including openAIRE3where CERN provides the Invenio4software technology thatsupports this open access repository andmanymore around theworld, have createdrepositories for initially, publications, and now extending to data that can give goodexamplesofwhatcanbeachieved.Buttheseremainindependentprojectsandhavenotbeenintegratedintotheoveralle‐infrastructurelandscape.

Providing Leadership In order to build such a long term and broadly scoped e‐infrastructure to benefit theentireEuropean community,wemust leverage the tremendous assets that havebeenbuilt up during the last decade: in particular the knowledge and skills, aswell as theworkingprototypesofeachofthecoreservicesnotedabove.3 https://www.openaire.eu/ 4 http://invenio-software.org/

Page 8: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

8 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 8 

Whatwe envisage is a continuum addressing the needs of education and speculativeinnovationthroughtogrowingandestablishedentities.Theremustbethereforearangeofinfrastructureandservicestosatisfytheneedsofsuchabroadrangeofmaturityofactivities.Thiscontinuummustcoverthedifferentaxesoffinancialmodels(user‐pays,provider‐pays) as well as infrastructure (industry supplied and in‐house). Userrepresentation in governance is paramount and we envisage a user activity for e‐infrastructuresasawell‐definedactivity.Thedevelopmentofnewandnovelservicesandsoftwaremustbepubliclyfunded,butas these become production ready and commoditised they should move into theindustrialserviceoperation.Opensourceandopenstandardsareessentialtoensuremaximaladoption,andtoallownewentrantstobeabletoleveragetheinnovationmadewithpublicfunding.Theplatformshouldalsoenablebusinessinnovationfornewservicesandnewuserstocreatewealthandemployment.

Funding models Pastexperiencehasshownthatasthenumberofcommunitiesandactivitiesthatcouldbenefit fromEuropeane‐infrastructurecontinuestogrowandevolve, there isno“onesizefitsall”solutionthatisappropriate.Withinthelifecycleofagivenactivity, itmaybeappropriatetohavesupplierfinancedresources fora timeandthensomesubsidisedresourcesas theactivityevolves.Fullypaidresourcesbytheactivitymaycomeatalaterstage.Intermsofe‐infrastructureitisimportantthatactivitiescanrelyontheresourcesandservicessoitiscriticallyimportantthattimeframesforthee‐infrastructuresbelongandthefundingstable.Where it is expected that users should pay for the services, those services must berelevantandattractive inorder for that tohappen. Theaddedvalueof theproposedservices must be made absolutely clear. In order to remain attractive to the usercommunitiestheofferingmustevolveandadapttothechangingneeds.Thisevolutioncanbeginwith a set ofmanaged federated services that are recognised as a commonneed,with new services being added as commonalities are explored and understood.The proposed platformmust provide solutions in a timely and relevantway. This isanotherreasonwhytheusercommunitystakeholdersmustbedirectlyinvolvedintheoverallgovernance.Amodelformovingfrominnovativedevelopment(whereafundedactivitymaydevelopa service) to industrial operation is essential to avoid unproductive competitionbetween publicly funded activities and commercial offerings. Organisations such asCERNhavespecialistknowledgeoflargescaletenderingandcoordinationoftendering

Page 9: A Vision for a European e-Infrastructure for the 21st Century

EIROforumITWorkingGroup21thMay2013

9 This document produced by Members of the EIROforum and is licensed under a Creative Commons Attribution 3.0 Unported License.Permissions beyond the scope of this license may be available at http://www.eiroforum.org/

  Page 9 

andbrokeringthatwouldenablecostefficienciestobegainedthroughtheapplicationofscale.

Governance It is important that a future European e‐infrastructure be driven by the scientificstakeholders. Somekey strategic research communities could be selected thatwoulddrive the frontiersof thetechnologies inseveraldifferentbutcomplementaryaspects.Thisiscoveredinaseparatedocument5.

5 David Foster, Bob Jones, “Science Strategy and Sustainable Solutions; A Collaboration on the Directions of e-Infrastructure for Science”, CERN-OPEN-2013-017, http://cds.cern.ch/record/1545615/files/CERN-OPEN-2013-017.pdf