
Moving more data, faster (yours!) – the Science DMZ as an enabler

Presented at DIRISA National Research Data Workshop

by Kasandra Pillay – Senior Engineer, SANReN

[email protected]

Overview

Architectures and tools for optimising big data transfers, especially for science and research – how fast can you go?

•  Science DMZ network architecture
•  Data transfer nodes and tools
•  perfSONAR monitoring toolkit
•  Motivation for the Science DMZ
•  What is SANReN doing?

Science DMZ

“A Network Design Pattern for Data-Intensive Science”

•  Trademark of the Energy Sciences Network (ESnet – USA)
•  Built at or near the lab or campus network perimeter
•  Optimised for high-performance scientific applications, not general-purpose / everyday business computing
•  Addresses common network performance problems
•  Tailored to high-performance science applications and high-volume bulk data transfer

The following slides are used with permission… (http://fasterdata.es.net/science-dmz/science-dmz-community-presentation/)

Science DMZ Design Pattern (Abstract)

[Diagram: a Border Router provides a clean, high-bandwidth WAN path to a Science DMZ switch/router over 10GE links, which serves a high-performance Data Transfer Node with high-speed storage. The Enterprise Border Router/Firewall connects the WAN to the Site/Campus LAN, which has access to Science DMZ resources. perfSONAR nodes sit at the WAN edge, in the Science DMZ and on the campus side; security is applied at per-service policy control points.]

© 2015, The Regents of the University of California, through Lawrence Berkeley National Laboratory; licensed under CC BY-NC-ND 4.0.

Key components of the Science DMZ

•  Dedicated network enclave
•  Dedicated software and systems for data transfer
•  Integrated performance measurement and monitoring (perfSONAR)
•  Tailored, performant security


Motivation

•  Networks are an essential part of data-intensive science
   –  Connect data sources to data analysis
   –  Connect collaborators to each other
   –  Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale
•  Performance is critical
   –  Exponential data growth
   –  Constant human factors
   –  Data movement and data analysis must keep up
•  Effective use of wide-area (long-haul) networks by scientists has historically been difficult

“but we’ve always shipped disks!”

Data Mobility in a Given Time Interval

This table is available at: http://fasterdata.es.net/fasterdata-home/requirements-and-expectations/

1 TB of data vs network speed:

  10 Mbps    300 hrs (12.5 days)  (≈ DHL: time to ship…)
  100 Mbps   30 hrs
  1 Gbps     3 hrs
  10 Gbps    20 mins

•  The disk subsystem can also be a bottleneck
•  Parallel streams help
•  Don’t try to saturate the network – be nice
•  Rule of thumb: use 1/4 to 1/3 of a shared path that has nominal background load
   –  E.g. on a 1 Gbps host: target 150–200 Mbps (20–25 MB/s); see the sketch below
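To make these numbers concrete, here is a minimal Python sketch (the helper name transfer_hours is ours). It assumes decimal units (1 TB = 10^12 bytes) and zero protocol or host overhead, so it comes out a little faster than the table above, which presumably allows for real-world overhead:

    # Idealised transfer time: assumes decimal units (1 TB = 10**12 bytes,
    # 1 Gbps = 10**9 bit/s) and zero protocol/host overhead, so real
    # transfers (and the table above) come out somewhat slower.
    def transfer_hours(size_tb: float, link_gbps: float, share: float = 1.0) -> float:
        """Hours to move size_tb over link_gbps using a fraction `share`
        of the link (rule of thumb: 1/4 to 1/3 of a shared path)."""
        bits = size_tb * 10**12 * 8
        rate_bps = link_gbps * 10**9 * share
        return bits / rate_bps / 3600

    for gbps in (0.01, 0.1, 1.0, 10.0):       # 10 Mbps .. 10 Gbps
        print(f"1 TB at {gbps:g} Gbps: {transfer_hours(1, gbps):.2f} h")
    # Rule of thumb on a shared 1 Gbps path, using a 1/4 share:
    print(f"1 TB at 1 Gbps (1/4 share): {transfer_hours(1, 1, 0.25):.1f} h")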

A small amount of packet loss makes a huge difference in TCP performance

[Chart: measured TCP throughput vs distance – Local (LAN), Metro Area, Regional, Continental, International – for TCP Reno (measured and theoretical), HTCP, and a loss-free path.]

With loss, high performance beyond metro distances is essentially impossible.

[Map: the PTA–CPT (Pretoria–Cape Town) path – 10 G core, with access links at 1 G or less.]
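One way to see why is the well-known Mathis et al. throughput bound for TCP Reno: rate ≤ (MSS/RTT) · C/√p, with C ≈ 1.22. A minimal sketch; the per-distance RTTs and the 0.01% loss rate are assumptions chosen for illustration:

    # Mathis et al. bound for TCP Reno: throughput <= (MSS/RTT) * C/sqrt(p).
    # The RTTs per distance class and the 0.01% loss rate are assumptions.
    from math import sqrt

    MSS_BITS = 1460 * 8   # 1460-byte segments
    C = 1.22              # ~ sqrt(3/2) for standard TCP

    def reno_mbps(rtt_ms: float, loss: float) -> float:
        return MSS_BITS / (rtt_ms / 1000.0) * C / sqrt(loss) / 1e6

    for label, rtt_ms in [("Local (LAN)", 1), ("Metro Area", 5),
                          ("Regional", 20), ("Continental", 80),
                          ("International", 200)]:
        print(f"{label:14s} RTT {rtt_ms:3d} ms: <= {reno_mbps(rtt_ms, 1e-4):8.1f} Mbps")

Even at one lost packet in ten thousand, the bound collapses from roughly 1.4 Gbps on a LAN to under 10 Mbps at international RTTs – exactly the cliff the chart shows.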

The Science DMZ Design Pattern

•  Dedicated Systems for Data Transfer – Data Transfer Node
   –  High performance, configured specifically for data transfer
   –  Proper tools: GridFTP/Globus, etc. (see the sketch below)
•  Performance Testing & Measurement – perfSONAR
   –  Enables fault isolation; verify correct operation
   –  Widely deployed in ESnet and other networks, as well as sites and facilities
•  Network Architecture – Science DMZ
   –  Dedicated network location for high-speed data resources
   –  Appropriate security
   –  Easy to deploy – no need to redesign the whole network
•  Engagement with Network Users
   –  Resources and knowledge base
   –  Partnerships
   –  Education and consulting
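As an illustration of the “proper tools” point, a minimal sketch of submitting a transfer with the Globus Python SDK (globus_sdk, v3-style API). The token, endpoint UUIDs and paths are placeholders, and the full OAuth flow is omitted – see the Globus documentation:

    # Minimal sketch: DTN-to-DTN transfer via the Globus Python SDK (v3-style).
    # TOKEN, the endpoint UUIDs and the paths below are placeholders.
    import globus_sdk

    TOKEN = "..."                       # transfer-scoped access token (placeholder)
    SOURCE_ID = "source-endpoint-uuid"  # e.g. a source DTN (hypothetical)
    DEST_ID = "dest-endpoint-uuid"      # e.g. a destination DTN (hypothetical)

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))

    tdata = globus_sdk.TransferData(
        source_endpoint=SOURCE_ID,
        destination_endpoint=DEST_ID,
        label="SANReN POC test transfer",
        sync_level="checksum",          # only copy files that differ
    )
    tdata.add_item("/data/chunk01.tar", "/ingest/chunk01.tar")

    task = tc.submit_transfer(tdata)
    print("Submitted, task id:", task["task_id"])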

Why Science DMZ?

•  Performance – improve performance of data-intensive research; high-speed access to cloud resources
•  Usability – software to enable high-speed data transfer, e.g. Globus
•  Cost – delay expensive firewall upgrades; high-speed switches rather than expensive routers
•  Security – layered security is maintained, applied on both network and host

PERT – Performance Enhancement Response Team

“Performance Enhancement Response Teams (PERTs) provide an investigation and consulting service to academic and research users on their network performance issues.”
Source: GEANT eduPERT

SANReN Proof of Concept

Three locations have been chosen and DTNs deployed for the first phase:

§  Wits University (Johannesburg)
§  CSIR (Pretoria)
§  Teraco Data Centre (Rondebosch, Cape Town)

§  The CHPC has its own Globus node, which is operational and has been tested with the POC nodes.
§  Test DTN in the SANReN lab

[Figure: SANReN coverage map]

Typical Use Cases for the Service

§  CSIR LandSAT generates 40TB of raw data and sends it in 1TB chunks for processing to the CHPC. Processed data results in 160TB of output data which needs to be transferred back to CSIR.

§  H3BioNet transfers > 100TB or human genome data nationally and internationally - regularly.

§  South African Weather Service transfers TB’s of data regularly between their systems and the CHPC for processing.

§  And many, many others.

Initial results?

§  All DTNs are connected to the SANReN network at 10 Gb/s.
§  In real tests we are achieving between 1 Gb/s and 6 Gb/s of real throughput nationally.
§  And around 1 Gb/s internationally.
§  For example, a 500 GB file was transferred from the CSIR to Wits in 36 minutes at an effective throughput of ~1.7 Gb/s – and this was at 1:20 pm, when SANReN is typically highly loaded (see the check below).
§  A 100 GB file was transferred from ESnet in 21 minutes at an effective throughput of 1 Gb/s – also in the middle of the work day.
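As a quick sanity check on such figures, effective throughput is just size × 8 divided by elapsed time; a short sketch (assuming decimal gigabytes):

    # Effective throughput from file size and wall-clock time (decimal GB assumed).
    def effective_gbps(size_gb: float, minutes: float) -> float:
        return size_gb * 8 / (minutes * 60)

    print(f"500 GB in 36 min: {effective_gbps(500, 36):.2f} Gb/s")  # ~1.85 Gb/s

This is in the same ballpark as the quoted ~1.7 Gb/s; the exact figure depends on whether decimal or binary gigabytes are counted and on what overhead is included.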

What’s next?

§  Engage with potential users to determine specific requirements, use cases and required tools.
§  Set up additional data transfer tools on the DTNs based on user/project requirements (using Globus).
§  Monitor the POC.
§  Engage with users and potential sites for more DTNs.
§  Workshop additional requirements with users of the service.
§  International tests.

Thanks!

Questions?

For more information and resources contact [email protected]