The Science DMZ
Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory
Building the Modern Research Data Portal
GlobusWorld
Chicago, IL
April 20, 2016


Outline

• Science DMZ in brief
• Context – Science DMZ in the community
• Science DMZ and Data Portals
• This assumes you already have a Science DMZ
  – If you don’t have one, we can chat about how you might build one
  – If it would be helpful, I can talk to your systems and networking folks
  – Or check out the fasterdata knowledge base:
    • http://fasterdata.es.net/science-dmz/

Science DMZ Design Pattern (Abstract)

[Diagram: WAN, Border Router, Science DMZ Switch/Router, Enterprise Border Router/Firewall, Site / Campus LAN, and a high-performance Data Transfer Node with high-speed storage, connected by 10GE links, with three perfSONAR hosts. Annotations: per-service security policy control points; clean, high-bandwidth WAN path; Site / Campus access to Science DMZ resources.]

3 – ESnet Science Engagement ([email protected]) - 4/26/16 ©2015,EnergySciencesNetwork

Supercomputer Center Deployment

• High-performance networking is assumed in this environment
  – Data flows between systems, between systems and storage, wide area, etc.
  – Global filesystem often ties resources together
    • Portions of this may not run over Ethernet (e.g. IB)
    • Implications for Data Transfer Nodes
• “Science DMZ” may not look like a discrete entity here
  – By the time you get through interconnecting all the resources, you end up with most of the network in the Science DMZ
  – This is as it should be – the point is appropriate deployment of tools, configuration, policy control, etc.
• Office networks can look like an afterthought, but they aren’t
  – Deployed with appropriate security controls
  – Office infrastructure need not be sized for science traffic

HPC Center

[Diagram: WAN, Border Router, Core Switch/Router (routed), Firewall, Offices, two front-end switches, Data Transfer Nodes, Supercomputer, and Parallel Filesystem, with three perfSONAR hosts.]

HPC Center Data Path

[Diagram: the same HPC center layout (WAN, Border Router, Core Switch/Router, Firewall, Offices, front-end switches, Data Transfer Nodes, Supercomputer, Parallel Filesystem, three perfSONAR hosts), with two paths highlighted: the high-latency WAN path and the low-latency LAN path.]

Context: Science DMZ Adoption

• DOE National Laboratories
  – HPC centers, LHC sites, experimental facilities
  – Both large and small sites
• NSF CC* programs have funded many Science DMZs
  – Significant investments across the US university complex
  – Big shoutout to the NSF – these programs are critically important
• Other US agencies
  – NIH
  – USDA Agricultural Research Service
• International
  – Australia https://www.rdsi.edu.au/dashnet
  – Brazil
  – UK

Strategic Impacts

• What does this mean?
  – We are in the midst of a significant cyberinfrastructure upgrade
  – Enterprise networks need not be unduly perturbed :)
• Significantly enhanced capabilities compared to 3 years ago
  – Terabyte-scale data movement is much easier
  – Petabyte-scale data movement possible outside the LHC experiments
    • ~3.1 Gbps = 1 PB/month
    • ~14 Gbps = 1 PB/week
  – Widely-deployed tools are much better (e.g. Globus)
• Metcalfe’s Law of Network Utility
  – Value of Science DMZ proportional to the number of DMZs
    • n² or n(log n) doesn’t matter – the effect is real
  – Cyberinfrastructure value increases as we all upgrade
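The petabyte-scale rates quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes), a 30-day month, and a perfectly steady transfer with no protocol overhead; none of these assumptions is stated on the slide. Under them the monthly figure comes out near 3.1 Gbps and the weekly figure near 13 Gbps, so the slide's ~14 Gbps appears to be rounded with some headroom.

```python
SECONDS_PER_DAY = 86_400
PETABYTE_BITS = 8 * 10**15  # assumption: decimal petabyte, 10^15 bytes


def sustained_gbps(petabytes: float, days: float) -> float:
    """Average rate in Gbps needed to move `petabytes` within `days`."""
    return petabytes * PETABYTE_BITS / (days * SECONDS_PER_DAY) / 1e9


if __name__ == "__main__":
    print(f"1 PB in 30 days: {sustained_gbps(1, 30):.1f} Gbps")
    print(f"1 PB in 7 days:  {sustained_gbps(1, 7):.1f} Gbps")
```

Real transfers need a faster link than these averages suggest, because throughput is rarely constant for a whole week or month.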


Next Steps – Building On The Science DMZ

• Enhanced cyberinfrastructure substrate now exists
  – Wide area networks (ESnet, GEANT, Internet2, Regionals)
  – Science DMZs connected to those networks
  – DTNs in the Science DMZs
• What does the scientist see?
  – Scientist sees a science application
    • Data transfer
    • Data portal
    • Data analysis
  – Science applications are the user interface to networks and DMZs
• The underlying cyberinfrastructure components (networks, Science DMZs, DTNs, etc.) are part of the instrument of discovery
• Large-scale data-intensive science requires that we build larger structures on top of those components

Science Data Portals

• Large repositories of scientific data
  – Climate data
  – Sky surveys (astronomy, cosmology)
  – Many others
  – Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
  – Single-web-server design
  – Data browse/search, data access, user awareness all in a single system
  – All the data goes through the portal server
    • In many cases by design
    • E.g. embargo before publication (enforce access control)

Legacy Portal Design

[Diagram: WAN, Border Router, Firewall, Enterprise network, Portal Server, and Filesystem (data store), with 10GE links and two perfSONAR hosts. Path legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]

• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren’t scalable
• What does architectural change mean?

Example of Architectural Change – CDN

• Let’s look at what Content Delivery Networks did for web applications
• CDNs are a well-deployed design pattern
  – Akamai and friends
  – Entire industry in CDNs
  – Assumed part of today’s Internet architecture
• What does a CDN do?
  – Store static content in a separate location from dynamic content
    • Complexity isn’t in the static content – it’s in the application dynamics
    • Web applications are complex, full-featured, and slow
      – Databases, user awareness, etc.
      – Lots of integrated pieces
    • Data service for static content is simple by comparison
  – Separation of application and data service allows each to be optimized

Classical Web Server Model

[Diagram: a Browser on Residential Broadband reaches the Web Server at a Hosting Provider across a Transit Network; the web path is long distance / high latency.]

• Web browser fetches pages from web server
  – All content stored on the web server
  – Web applications run on the web server
    • Web server may call out to local database
    • Fundamentally all processing is local to the web server
  – Web server sends data to client browser over the network
    • Perceived client performance changes with network conditions
      – Several problems in the general case
      – Latency increases time to page render
      – Packet loss + latency cause problems for large static objects
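The point about packet loss plus latency can be illustrated with the widely cited Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS / RTT) · (C / √p). This formula is not on the slide; it is brought in here only as a rough model, and the MSS, round-trip times, and loss rate below are illustrative assumptions rather than measurements.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on a single TCP flow's throughput in bits/s.

    Uses the Mathis et al. model with C = sqrt(3/2); real stacks and
    modern congestion control algorithms will differ from this estimate.
    """
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed values: 1460-byte MSS, 0.01% packet loss, two example RTTs.
    for label, rtt_s in [("short path, 1 ms RTT", 0.001), ("long path, 50 ms RTT", 0.050)]:
        mbps = mathis_throughput_bps(1460, rtt_s, 1e-4) / 1e6
        print(f"{label}: about {mbps:.0f} Mbps")
```

In this model the same loss rate costs fifty times more throughput on the 50 ms path than on the 1 ms path, which is why large objects suffer most over long distances.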

Solution: Place Large Static Objects Near Client

[Diagram: a Browser on Residential Broadband fetches web content from the Web Server at a Hosting Provider across a Transit Network (long distance / high latency), and fetches data from a CDN Data Server over a short-distance / low-latency path.]

• CDN provides static content “close” to client
  – Latency goes down
    • Time to page render goes down
    • Static content performance goes up
  – Load on web server goes down (no need to serve static content)
  – Web server still manages complex behavior
    • Local reasoning / fast changes for application owner
• Significant win for web application performance
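One common way a web application hands static objects to a CDN is to rewrite their URLs when it renders a page, so the browser fetches them from a nearby data server. The sketch below is a minimal illustration of that idea, not a description of any particular CDN product; the host name `cdn.example.net` and the suffix list are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values chosen for illustration only.
CDN_HOST = "cdn.example.net"
STATIC_SUFFIXES = (".jpg", ".png", ".css", ".js", ".mp4")


def asset_url(url: str) -> str:
    """Point static objects at the CDN host; leave dynamic URLs unchanged."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_SUFFIXES):
        return urlunsplit((parts.scheme, CDN_HOST, parts.path, parts.query, parts.fragment))
    return url


if __name__ == "__main__":
    print(asset_url("https://www.example.org/img/logo.png"))
    print(asset_url("https://www.example.org/search?q=climate"))
```

The web server keeps all of the dynamic logic, and only the simple, bulky part of the workload moves.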

Client Simply Sees Increased Performance

• Client doesn’t see the CDN as a separate thing
  – Web content is all still viewed in a browser
    • Browser fetches what the page tells it to fetch
    • Different content comes from different places
    • User doesn’t know/care
• CDNs provide an architectural solution to a performance problem
  – Not brute-force
  – Work smarter, not harder

[Diagram: two views of a Browser reaching a Web Server across the ’Net. In one, the Browser also fetches data from a CDN Data Server; the Web Server is labeled “Rich, Slow” and the CDN Data Server “Simple, Fast”.]

Architectural Examination of Data Portals

• Common data portal functions (most portals have these)
  – Search/query/discovery
  – Data download method for data access
  – GUI for browsing by humans
  – API for machine access – ideally incorporates search/query + download
• Performance pain is primarily in the data handling piece
  – Rapid increase in data scale eclipsed legacy software stack capabilities
  – Portal servers often stuck in enterprise network
• Can we “disassemble” the portal and put the pieces back together better?
  – Use Science DMZ as a platform for the data piece
  – Avoid placing complex software in the Science DMZ

Legacy Portal Design

[Diagram repeated from the earlier Legacy Portal Design slide: WAN, Border Router, Firewall, Enterprise network, Portal Server, and Filesystem (data store), with 10GE links and two perfSONAR hosts. Path legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]

Next-Generation Portal Leverages Science DMZ

[Diagram: WAN, Border Router, Science DMZ Switch/Router, Firewall, Enterprise network, Portal Server, four DTNs labeled “API DTNs (data access governed by portal)”, and Filesystem (data store), with 10GE links and three perfSONAR hosts. Portal server applications: web server, search, database, authentication. Path legend: data transfer path (data path); portal query/browse path (browsing path, query path).]

Put The Data On Dedicated Infrastructure

• We have separated the data handling from the portal logic
• Portal is still its normal self, but enhanced
  – Portal GUI, database, search, etc. all function as they did before
  – Query returns pointers to data objects in the Science DMZ
  – Portal is now freed from ties to the data servers (run it on Amazon if you want!)
• Data handling is separate, and scalable
  – High-performance DTNs in the Science DMZ
  – Scale as much as you need to without modifying the portal software
• Outsource data handling to computing centers
  – Computing centers are set up for large-scale data
  – Let them handle the large-scale data, and let the portal do the orchestration of data placement
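The idea that a query “returns pointers to data objects” can be sketched as a portal search function that hands back metadata and DTN URLs, never the bytes themselves. Everything named below is hypothetical: the base URL, the catalog entries, and the `search` function are made up for illustration and do not describe any real portal or the Globus API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical DTN address in the Science DMZ; not a real service.
DTN_BASE_URL = "https://dtn.example.org/data"


@dataclass(frozen=True)
class DataObject:
    name: str
    path: str        # location on the filesystem the DTNs mount
    size_bytes: int


# Made-up catalog entries standing in for the portal's database.
CATALOG = [
    DataObject("climate-run-001", "climate/run001.nc", 42 * 10**9),
    DataObject("sky-survey-tile-17", "sky/tile17.fits", 7 * 10**9),
]


def search(term: str) -> List[Dict[str, object]]:
    """Return metadata plus a pointer to each match; the portal moves no data."""
    return [
        {"name": obj.name, "size_bytes": obj.size_bytes, "url": f"{DTN_BASE_URL}/{obj.path}"}
        for obj in CATALOG
        if term.lower() in obj.name.lower()
    ]


if __name__ == "__main__":
    for hit in search("climate"):
        print(hit["name"], "->", hit["url"])
```

Because the portal only produces pointers, the DTN pool behind those URLs can grow without any change to the portal code.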


Ecosystem Is Ready For This

• Science DMZs are deployed at Labs, Universities, and computing centers
  – XSEDE sites
  – DOE HPC facilities
  – Many campus clusters
• Globus DTNs are present in many of those Science DMZs
  – XSEDE sites
  – DOE HPC facilities
  – Many campus clusters
• Architectural change allows data placement at scale
  – Submit a query to the portal, Globus places the data at an HPC facility
  – Run the analysis at the HPC facility
  – The results are the only thing that ends up on a laptop or workstation

Links and Lists

– ESnet fasterdata knowledge base
  • http://fasterdata.es.net/
– Science DMZ paper
  • http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list
  • [email protected] with subject "subscribe esnet-sciencedmz"
– perfSONAR
  • http://fasterdata.es.net/performance-testing/perfsonar/
  • http://www.perfsonar.net
– Globus
  • https://www.globus.org/

Thanks!

[email protected] (ESnet) Lawrence Berkeley National Laboratory

http://fasterdata.es.net/

http://my.es.net/

http://www.es.net/

Extra Slides

DTN Cluster Detail

[Diagram: WAN, Border Router, Science DMZ Switch/Router, Firewall, Enterprise network, four “sealed” DTNs (Globus only, no shell access) marked “Configure as DTN Cluster”, Filesystem, cluster head/login nodes, and cluster compute nodes, with 10GE links and three perfSONAR hosts.]

DTN Cluster Design

• Configure all four DTNs as a single Globus endpoint
  – Globus has docs on how to do this
  – https://support.globus.org/entries/71011547-How-do-I-add-multiple-I-O-nodes-to-a-Globus-endpoint-
• Recent options for increased performance
  – Use additional parallel connections
  – Distribute transfers across multiple DTNs (Globus I/O Nodes)
  – Critical – only do this when all DTNs in the endpoint mount the same shared filesystem
• Use the Globus CLI command endpoint-modify
  – Use the --network-use option
  – Adjusts concurrency and parallelism
  – More info at globus.org (http://dev.globus.org/cli/reference/endpoint-modify/)
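The shared-filesystem requirement is easier to see with a toy model of spreading one transfer's files across several I/O nodes. The sketch below is not how Globus schedules work internally, and the node names and `assign_round_robin` helper are hypothetical; it only shows that any node may be handed any file, which is why every DTN must see the same paths.

```python
from itertools import cycle
from typing import Dict, List


def assign_round_robin(files: List[str], dtns: List[str]) -> Dict[str, List[str]]:
    """Deal files out to DTNs in turn; each DTN may receive any path."""
    plan: Dict[str, List[str]] = {dtn: [] for dtn in dtns}
    for path, dtn in zip(files, cycle(dtns)):
        plan[dtn].append(path)
    return plan


if __name__ == "__main__":
    files = [f"/data/run/f{i}.dat" for i in range(6)]   # made-up paths
    nodes = ["dtn1", "dtn2", "dtn3", "dtn4"]            # made-up node names
    for node, assigned in assign_round_robin(files, nodes).items():
        print(node, assigned)
```

If one node mounted a different filesystem, the files dealt to it would be missing or wrong, and the transfer would fail or silently diverge.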


Security Footprint of a Globus Transfer

[Diagram: Lab1 DTN in the Lab1 Science DMZ and Lab2 DTN in the Lab2 Science DMZ, connected through the Lab1 and Lab2 Border Routers and ESnet routers over 10GE and 100GE links. Data moves between the DTNs on TCP ports 50000-51000. Orchestration, shown at Amazon AWS, uses TCP ports 443, 2811, and 7512 toward each DTN. Each lab applies DTN security filters. Legend: logical data path, physical data path, logical control path, physical control path.]
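The port numbers on this slide define a small, auditable security footprint. A minimal sketch of that footprint as a filter rule is shown below. The port values come from the slide; the `permitted` function and the decision to model only protocol and destination port are simplifying assumptions, since a real DTN filter would normally also restrict source addresses and direction.

```python
# Port values taken from the slide.
CONTROL_PORTS = frozenset({443, 2811, 7512})   # orchestration / control
DATA_PORTS = range(50000, 51001)               # data channel, 50000-51000 inclusive


def permitted(protocol: str, dst_port: int) -> bool:
    """Return True if a flow toward the DTN matches the slide's port list.

    Simplified model: checks only protocol and destination port.
    """
    if protocol.lower() != "tcp":
        return False
    return dst_port in CONTROL_PORTS or dst_port in DATA_PORTS


if __name__ == "__main__":
    for proto, port in [("tcp", 2811), ("tcp", 50500), ("tcp", 22), ("udp", 443)]:
        verdict = "allow" if permitted(proto, port) else "deny"
        print(f"{proto}/{port}: {verdict}")
```

Keeping the rule set this short is what makes it practical to enforce per-service policy on the Science DMZ without a general-purpose firewall in the data path.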


Security Footprint of a Globus DTN

[Diagram: a local DTN in the Science DMZ, behind DTN security filters and the Site / Campus Border Router, exchanging data with remote DTNs in the wider world on TCP ports 50000-51000, over 10GE and 100GE links. Orchestration, shown at Amazon AWS, uses TCP ports 443, 2811, and 7512. Legend: logical data path, physical data path, logical control path, physical control path.]