Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015
Patrick Fuhrmann
On behalf of the project team
Supporting the scientific data lifecycle
Content
• How are software features selected?
• How are software features funded?
• Hardening new features.
• Exploring new communities.
• Responding to new technologies (HW and SW).
• Something about INDIGO-DataCloud.
• Essentially a random walk focusing on things I thought might be interesting.
Some words on why and when dCache does what it does.
How are software features selected?
• Scientific communities believe that Open Source software grows on trees.
• Consequently, they are not willing to contribute to development and software management at all.
• They assume that complaints are very valuable contributions.
• A further consequence is that Open Source teams mainly implement the software features required by the labs where the core team members are hosted.
• To explore new communities and satisfy their software requirements, Open Source projects need external money.
How are new features funded?
• This is where “National” and “European” projects come into play.
• For dCache, this:
  – was EMI
  – is the German national LSDMA project
  – and will be INDIGO-DataCloud
• The drawback: they tell you what they want to see in your code.
Funded features are not necessarily those you need?
• However, dCache has some invariant objectives:
  – The master plan (last slide of this presentation).
  – Be up to date on new technologies, either software or hardware.
  – Attract new communities, as their specific requirements, if they can be fulfilled, make dCache even better.
• It can be a bit tricky to steer the funding projects exactly in the direction of our objectives.
• So, let’s see how dCache managed (and manages) that…
Funding influences dCache development topics
[Timeline figure]
• 2010 to 2013: Standardization (NFS 4.1/pNFS, HTTP/WebDAV); contributing to the Dynamic Federation.
• In between: deploying new technologies into production and exploring new communities.
• 2015 to 2018 (INDIGO-DataCloud): data life cycle, multi-tier storage, quality of service, migration, archiving, AAI.
From 2013 to now, between two very demanding development projects (EMI and INDIGO-DataCloud), we slowed down development in order to:
• Deploy newly implemented technologies into production.
• Explore new communities and learn about their needs.
Deploying NFS into production
• CMS Grid infrastructure @ DESY
• The DESY Cloud
• Fermilab (various Intensity Frontier experiments)
• And the issues
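New production systems based on dCache NFS
[Diagram: the dCache backend storage layer and the access paths built on it]
• NFS 4.1/pNFS: direct low-latency access for worker nodes and HPC.
• Wide area: FTS, Globus (Online).
• Sync & Share: laptops and mobile devices.
See Paul’s presentation on Thursday.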
CMS Tier II @ DESY
• Slowly migrating CMS Grid worker nodes to NFS 4.1 data access.
• Good experience as long as the network is stable.
Job Efficiency (NFS – dCap)
[Plot: job efficiency (CPU time / wall time) versus execution time (hours), for NFS 4.1/pNFS and dCap.]
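The efficiency metric on the plot is the ratio of CPU time to wall-clock time. A minimal sketch of that arithmetic follows; the function name and the sample values are illustrative assumptions, not measurements from the talk.

```python
def job_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Return CPU time divided by wall-clock time for one job."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Illustrative values only: a 2 h job that spent 1 h waiting on I/O.
print(job_efficiency(cpu_seconds=3600.0, wall_seconds=7200.0))
```

A job that never waits for data approaches an efficiency of 1, which is why the metric is a convenient way to compare data-access protocols.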
As with all new specs, there are issues
• Network problems cause the system to behave unpredictably.
• Data servers behind firewalls.
• Weak clients on VMs.
• Specification violation
  – Infinite state recovery with the Linux kernel.
Exploring new communities
• Jülich – Aachen Research Association, JADE
  – “Supercomputing and Modeling for the Human Brain (SMHB)”, associated with the European Human Brain Project (plenary by KH Meier).
• MoSGrid
  – Scientific gateway for molecular simulation.
• VAVID
  – Data gateway for analyzing wind energy infrastructures.
JADE
[Diagram labels: Aachen, Jülich.]
Projects in HPC
[Diagram: HPC jobs on the supercomputer.]
HPC jobs get access to dCache storage.
With the start of INDIGO-DataCloud, its money and a larger team (8+3), we can continue to explore new horizons. (Back to development mode.)
Responding to new technologies
• New disk technologies
  – Open Ethernet Disks (HGST)
• New object-store back-ends
  – CEPH
• New European projects (INDIGO-DataCloud)
  – Focusing on data Quality of Service and
  – Data Lifecycle Management
HGST Open Ethernet Disks
• Small ARM CPU with Ethernet, piggybacked on a regular disk.
• Spec:
  – Any Linux (Debian on demo)
  – CPU: 32-bit ARM, 512 Level 2
  – 2 GB DDR3 DRAM memory
    • 1792 MB available
  – Block storage driver as SCSI sda
  – Ethernet network driver as eth0
HGST Open Ethernet Disks (cont.)
• The additional CPU is not used by the disk itself and can run an arbitrary customer OS.
• The disk is seen as a regular block device.
• Not yet on the market.
• dCache got 5 disks, and we are evaluating running pool nodes on the disk itself.
• See talk on Thursday.
Response to CEPH
• CEPH complements dCache perfectly.
  – Simplifies operating dCache disks.
  – dCache already accesses data as an object store anyway.
• dCache is evaluating a ‘two-step approach’.
  – Each pool sees its own object space in CEPH.
  – All pools have access to the entire space, which is a slight change of dCache pool semantics.
• Would merge CEPH and dCache advantages:
  – Multi-tier (tape, disk, SSD).
  – Multi-protocol support for a common namespace.
    • All protocols see the same namespace.
  – All the dCache AAI features.
    • Support for X.509, Kerberos, username/password.
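One way to picture the two steps is as a change in which objects a pool may see. The sketch below is a toy model only: the class and method names are hypothetical, an in-memory dict stands in for CEPH, and nothing here reflects the actual dCache or CEPH interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for a shared object store (hypothetical)."""
    objects: dict = field(default_factory=dict)  # (owner_pool, key) -> bytes

    def put(self, pool: str, key: str, data: bytes) -> None:
        self.objects[(pool, key)] = data

    def visible_keys(self, pool: str, shared: bool) -> set:
        """Step 1 (shared=False): a pool sees only its own object space.
        Step 2 (shared=True): every pool sees the entire space."""
        return {
            key for (owner, key) in self.objects
            if shared or owner == pool
        }


store = ObjectStore()
store.put("pool-a", "file-1", b"x")
store.put("pool-b", "file-2", b"y")

print(store.visible_keys("pool-a", shared=False))  # step 1: own objects only
print(store.visible_keys("pool-a", shared=True))   # step 2: the whole space
```

In this model the second step changes only the visibility rule, not where the data is stored, which mirrors the "slight change of pool semantics" described on the slide.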
INDIGO-DataCloud Cheat Sheet
• Horizon 2020 project starting in April or May.
• Budget: 11.1 million euros (800,000 for dCache).
• 26 partners.
• Duration: 30 months.
• The project aims for an Open Source data and computing platform targeted at scientific communities, deployable on multiple hardware, and provisioned over private and public e-infrastructures.
See Ludek’s presentation on Wednesday.
INDIGO in a nutshell
1. Self-service, on-demand
2. Access through the network
3. Resource pooling
4. Elasticity (with infinite resources)
5. Pay as you go
In the end, applications rule.
Stolen from Davide Salomoni (Project Director).
dCache involvement in INDIGO
• dCache is mostly involved in WP4, which is about virtual infrastructures (IaaS).
• For storage systems like dCache, this essentially means SDS (Software-Defined Storage), which according to Wikipedia is:
  – Software-defined storage (SDS) is an evolving concept for computer data storage software to manage policy-based provisioning and management of data storage independent of hardware.
SDS according to dCache
• User/PaaS-defined “Quality of Service” management
  – User/PaaS-defined “Access Latency”
    • SSD or tape, depending on application requirements.
  – User/PaaS-defined “Data Protection”
    • On one disk, two disks or three tapes, depending on how precious your data is.
  – User/PaaS-defined “Data Migration Policies”
    • Like Amazon Glacier versus S3.
• Automatic storage-tier migration
  – Based on access profile.
• None of this would be needed if SSDs were cheap and 100% reliable.
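The quality-of-service attributes on this slide could be expressed as a small policy record that a user or PaaS layer hands to the storage system. This is a hedged sketch: the field names, media names and the validation rule are assumptions made for illustration, not dCache configuration or API.

```python
from dataclasses import dataclass
from enum import Enum


class Latency(Enum):
    ONLINE = "online"      # data must be readable immediately
    NEARLINE = "nearline"  # a staging delay is acceptable


@dataclass(frozen=True)
class QosPolicy:
    access_latency: Latency
    disk_copies: int = 1
    tape_copies: int = 0


def placement(policy: QosPolicy) -> list:
    """Translate a policy into a list of media, one entry per replica."""
    if policy.access_latency is Latency.ONLINE and policy.disk_copies < 1:
        raise ValueError("online access needs at least one disk copy")
    return ["disk"] * policy.disk_copies + ["tape"] * policy.tape_copies


# Precious data with fast access: two disk replicas plus three tape copies.
precious = QosPolicy(Latency.ONLINE, disk_copies=2, tape_copies=3)
print(placement(precious))

# Archive-style data where a staging delay is acceptable.
archive = QosPolicy(Latency.NEARLINE, disk_copies=0, tape_copies=1)
print(placement(archive))
```

Keeping latency and protection as separate fields reflects the slide's split between "Access Latency" and "Data Protection"; a migration policy would then amount to replacing one such record with another over time.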
dCache is well prepared
Historically, dCache supports multi-tier storage and the corresponding transitions.
[Diagram: a virtual file-system layer, accessed through NFS/pNFS, GridFTP, HTTP/WebDAV and xRootd/dCap, sits above three media tiers (SSDs; spinning disks; tape, Blu-ray, …) with automatic and manual media transitions.]
Recently added
We optimized the handling of the ‘small file’ problem in disk <-> tape transitions.
[Diagram labels: Containers, Tape System.]
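A common way to ease the small-file problem on tape is to bundle many small files into larger containers before migration. The following is a minimal sketch of such packing under that assumption; the first-fit-decreasing strategy, the function name and the size limit are illustrative and are not taken from dCache.

```python
def pack_into_containers(file_sizes: dict, container_limit: int) -> list:
    """Group files into containers whose total size stays within
    container_limit (first-fit decreasing). A file larger than the
    limit ends up in a container of its own."""
    containers = []  # each entry: {"names": [...], "size": int}
    # Place the largest files first, each into the first container with room.
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        for c in containers:
            if c["size"] + size <= container_limit:
                c["names"].append(name)
                c["size"] += size
                break
        else:
            # No existing container has room: open a new one.
            containers.append({"names": [name], "size": size})
    return containers


sizes = {"a.dat": 40, "b.dat": 70, "c.dat": 30, "d.dat": 60}
for c in pack_into_containers(sizes, container_limit=100):
    print(c["names"], c["size"])
```

With the sample sizes above, the four files fit into two full containers, so the tape system would see two large writes instead of four small ones.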
What’s missing
• Mainly a common agreement (standard) on how to trigger transitions. (Protocol, API?)
• We have some experience with SRM; however, it does not seem to be suitable for this purpose.
• Another candidate is CDMI (SNIA), which is an industry standard.
• Migration policies are already discussed, documented and implemented within RDA (Practical Policy Working Group).
• Details will only be available after the INDIGO kickoff meeting at the end of April ’15.
Summary
Magically, up to now, at the right moment there was always an EU or national project funding dCache for exactly those features or activities that dCache was planning to do anyway, and with that they helped us follow our master plan:
The support of the complete Scientific Big Data Life Cycle Management.
Scientific Data Lifecycle
• High-speed data ingest
• Fast analysis: NFS 4.1/pNFS
• Wide area transfers (Globus Online, FTS) by GridFTP
• Visualization & sharing by WebDAV, ownCloud
Don’t forget
Upcoming dCache Workshop
The END
Further reading: www.dCache.org