ESnet - Connecting the USA DOE Labs to the
World of Science
Eli Dart, Network Engineer, Network Engineering Group
Chinese American Network Symposium, Indianapolis, Indiana, October 20, 2008
Energy Sciences Network, Lawrence Berkeley National Laboratory
Networking for the Future of Science
Overview
• ESnet and the DOE Office of Science need for high performance networks
• ESnet4 architecture
• The network as a tool for science – performance
• Enabling Chinese-American science collaborations
DOE Office of Science and ESnet – the ESnet Mission
• ESnet’s primary mission is to enable the large-scale science that is the mission of the Office of Science (SC) and that depends on:
– Sharing of massive amounts of data
– Supporting thousands of collaborators world-wide
– Distributed data processing
– Distributed data management
– Distributed simulation, visualization, and computational steering
– Collaboration with the US and International Research and Education community
• ESnet provides network and collaboration services to Office of Science laboratories and many other DOE programs in order to accomplish its mission
• ESnet is the sole provider of high-speed connectivity to most DOE national laboratories
The “New Era” of Scientific Data
• Modern science is completely dependent on high-speed networking
– As the instruments of science get larger and more sophisticated, the cost goes up to the point where only a very few are built (e.g. one LHC, one ITER, one James Webb Space Telescope, etc.)
• The volume of data generated by these instruments is going up exponentially
– These instruments are mostly based on solid-state sensors and so follow the same Moore’s Law as computer CPUs, though the technology refresh cycle for instruments is 10-20 years rather than 1.5 years for CPUs
– The data volume is at the point where modern computing and storage technology are at their very limits trying to manage the data
• It takes world-wide collaborations of large numbers of scientists to conduct the science and analyze the data from a single instrument, so the data from the instrument must be distributed all over the world
– The volume of data generated by such instruments has reached the level of many petabytes/year – the point where dedicated 10-100 Gb/s networks that span the country and the world are required to distribute the data
Networks for The “New Era” of Scientific Data
• Designing and building networks and providing suitable network services to support science data movement has pushed R&E networks to the forefront of network technology: there are currently no commercial networks that handle individual data flows of the size generated by modern science
– The aggregate of small flows in commercial networks is, of course, much larger – but not by as much as one might think: the Google networks transport only about 1000x the amount of data that ESnet transports
• What do the modern systems of science look like?
– They are highly distributed and bandwidth intensive
• The LHC will be the largest scientific experiment, and will generate more data, than the scientific community has ever tried to manage
• The data management model involves a world-wide collection of data centers that store, manage, and analyze the data and that are integrated through network connections with typical speeds in the 10+ Gbps range – closely coordinated and interdependent distributed systems that must have predictable intercommunication to function effectively
• CMS is one of two major LHC experiments – each generates comparable amounts of data
The “new era” of science data will likely tax network technology
• Individual labs now fill 10G links – Fermilab (an LHC Tier 1 Data Center) has 5 x 10 Gb/s links to ESnet hubs in Chicago and can easily fill one or more of them for sustained periods of time
• The “casual” increases in overall network capacity are less likely to easily meet future needs
[Figure: log plot of experiment-generated data (bytes) vs. time, historical and estimated, spanning the 1 Petabyte to 1 Exabyte range. Data courtesy of Harvey Newman, Caltech, and Richard Mount, SLAC.]
Planning the Future Network
1) Data characteristics of instruments and facilities
– What data will be generated by instruments coming on-line over the next 5-10 years (including supercomputers)?
2) Examining the future process of science
– How and where will the new data be analyzed and used – that is, how will the process of doing science change over 5-10 years?
3) Observing traffic patterns
– What do the trends in network patterns predict for future network needs?
Motivation for Overall Capacity: ESnet Traffic has Increased by 10X Every 47 Months, on Average, Since 1990
[Figure: log plot of ESnet monthly accepted traffic (terabytes/month), January 1990 – January 2008. Milestones: 100 GBy/mo in Aug. 1990; 1 TBy/mo in Oct. 1993 (38 months later); 10 TBy/mo in Jul. 1998 (57 months); 100 TBy/mo in Nov. 2001 (40 months); 1 PBy/mo in Apr. 2006 (53 months).]
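The average growth rate implied by the milestones above can be checked with a short calculation (intervals read off the plot):

```python
# Months between successive 10x milestones in ESnet monthly traffic,
# from the log plot above: 1990 -> 1993 -> 1998 -> 2001 -> 2006.
intervals_months = [38, 57, 40, 53]

avg_months_per_10x = sum(intervals_months) / len(intervals_months)
print(avg_months_per_10x)  # 47.0 -> "10X every 47 months, on average"

# Equivalent annual growth factor: 10^(12 / 47) per year
annual_growth = 10 ** (12 / avg_months_per_10x)
print(round(annual_growth, 2))  # roughly 1.8x per year
```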
The International Collaborators of DOE’s Office of Science Drive ESnet Design for International Connectivity
[Map: the R&E sources and destinations of ESnet’s top 100 traffic sites, all of which are R&E; the DOE Lab end of each flow is not shown.]
Currently most of ESnet’s traffic (>85%) goes to or comes from outside of ESnet. This reflects the highly collaborative nature of large-scale science, which is one of the main focuses of DOE’s Office of Science.
A small number of large data flows now dominate the network traffic – this motivates virtual circuits as a key network service.
[Figure: ESnet total traffic, TBy/mo, January 1990 – April 2008, with LHC Tier 1 site flows highlighted; FNAL outbound traffic data courtesy of Phil DeMar, Fermilab.]
Requirements from Scientific Instruments and Facilities
• Bandwidth
– Adequate network capacity to ensure timely movement of data produced by the facilities
• Connectivity
– Geographic reach sufficient to connect users and analysis systems to SC facilities
• Services
– Guaranteed bandwidth, traffic isolation, end-to-end monitoring
– Network service delivery architecture
• Service Oriented Architecture / Grid / “Systems of Systems”
ESnet Architecture - ESnet4
• ESnet4 was built to address specific Office of Science program requirements. The result is a much more complex and much higher capacity network.
ESnet3, 2000 to 2005:
• A routed IP network with sites singly attached to a national core ring
• Very little peering redundancy
ESnet4 in 2008:
• The new Science Data Network (blue) is a switched network providing guaranteed bandwidth for large data movement
• All large science sites are dually connected on metro area rings, or dually connected directly to the core ring, for reliability
• Rich topology increases the reliability of the network
ESnet4 – IP and SDN
• ESnet4 is one network with two “sides”
– The IP network is a high-capacity (10G) best-effort routed infrastructure
• Rich commodity peering infrastructure ensures global connectivity
• Diverse R&E peering infrastructure provides full global high-bandwidth connectivity for scientific collaboration
• High performance – 10G of bandwidth is adequate for many scientific collaborations
• Services such as native IPv6 and multicast
– The Science Data Network (SDN) is a virtual circuit infrastructure with bandwidth guarantees and traffic engineering capabilities
• Highly scalable – just add more physical circuits as demand increases
• Interoperable – compatible with virtual circuit infrastructures deployed by Internet2, CANARIE, GEANT and others
• Guaranteed bandwidth
• The interdomain demarcation is a VLAN tag – virtual circuits can be delivered to sites or other networks even when end-to-end reservations are not possible
ESnet4 Backbone Projected for December 2008
[Map: ESnet4 backbone as projected for December 2008. The ESnet IP core and Science Data Network core rings (including SDN core segments on existing NLR links) connect hubs in Seattle, Portland, Boise, Sunnyvale, LA, San Diego, Las Vegas, Salt Lake City, Denver, El Paso, Albuquerque, Tulsa, Kansas City, Houston, Baton Rouge, Chicago, Nashville, Atlanta, Jacksonville, Raleigh, Cleveland, Pittsburgh, Philadelphia, Washington DC, New York City, and Boston, with core links at 20G. Lab sites (e.g. LLNL, LANL, GA, SDSC, ORNL, FNAL, BNL, PNNL) attach via MAN rings, lab-supplied links, or independent dual connections; international IP connections include LHC/CERN, StarLight, MAN LAN (AofA), and the USLHC network. Legend distinguishes IP switch-only hubs, IP switch/router hubs, SDN switch hubs, and layer 1 optical nodes (eventual ESnet points of presence vs. nodes not currently in ESnet plans).]
ESnet4 As Planned for 2010
[Map: ESnet4 as planned for 2010. Same topology, hubs, lab sites, and international connections as the December 2008 map, with core ring capacity upgraded to 50G and individual segments at 30-50G (mostly 40G and 50G).]
Traffic Engineering on SDN – OSCARS
• ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS) http://www.es.net/oscars/
• Provides edge to edge layer 2 or layer 3 virtual circuits across ESnet
– Guaranteed bandwidth
– Advance reservation
• Interoperates with many other virtual circuit infrastructures to provide end-to-end guaranteed-bandwidth service for geographically dispersed scientific collaborations (see next slide)
– Interoperability is critical, since science traffic flows cross many administrative domains in the general case
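To illustrate the advance-reservation idea, here is a simplified sketch (not the actual OSCARS implementation or API): admitting a guaranteed-bandwidth circuit means verifying that, at every instant in the requested window, already-reserved bandwidth plus the new request stays within link capacity.

```python
# Hypothetical advance-reservation admission check for a single link.
# OSCARS itself schedules across paths and domains, which this ignores.

def admissible(existing, capacity_gbps, start, end, bw_gbps):
    """existing: list of (start, end, bw_gbps) reservations, end exclusive.
    Returns True if the new reservation fits within capacity at all times."""
    # Bandwidth in use only increases at reservation start boundaries, so
    # checking the window start and each start inside the window suffices.
    points = {start} | {s for s, e, _ in existing if start <= s < end}
    for t in points:
        in_use = sum(bw for s, e, bw in existing if s <= t < e)
        if in_use + bw_gbps > capacity_gbps:
            return False
    return True

# A 10G link with a 6G circuit already reserved for hours 2-8:
booked = [(2, 8, 6.0)]
print(admissible(booked, 10.0, 0, 4, 5.0))   # False: 6 + 5 > 10 during 2-4
print(admissible(booked, 10.0, 8, 12, 9.0))  # True: window starts after 2-8
```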
OSCARS Interdomain Collaborative Efforts
– Terapaths
• Inter-domain interoperability for layer 3 virtual circuits demonstrated (3Q06)
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– LambdaStation
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– Internet2 DCN/DRAGON
• Inter-domain exchange of control messages demonstrated (1Q07)
• Integration of OSCARS and DRAGON has been successful (1Q07)
– GEANT2 AutoBAHN
• Inter-domain reservation demonstrated at SC07 (4Q07)
– DICE
• First draft of topology exchange schema formalized in collaboration with NMWG (2Q07); interoperability test demonstrated 3Q07
• Initial implementation of reservation and signaling messages demonstrated at SC07 (4Q07)
– Nortel
• Topology exchange demonstrated successfully 3Q07
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– UVA
• Demonstrated token-based authorization concept with OSCARS at SC07 (4Q07)
– OGF NML-WG
• Actively working to combine work from NMWG and NDL
• Documents and UML diagram for base concepts have been drafted (2Q08)
– GLIF GNI-API WG
• In the process of designing a common API and reference middleware implementation
The Network As A Tool For Science
• Science is becoming much more data intensive
– Data movement is one of the great challenges facing many scientific collaborations
– Getting the data to the right place is important
– Scientific productivity follows data locality
• Therefore, a high-performance network that enables high-speed data movement as a functional service is a tool for enhancing scientific productivity and enabling new scientific paradigms
• Users often do not know how to use the network effectively without help – in order to be successful, networks must provide usable services to scientists
Some user groups need more help than others
• Collaborations with a small number of scientists typically do not have network tuning expertise
– They rely on their local system and network admins (or grad students)
– They often don’t have much data to move (typically <1TB)
– Therefore, they avoid using the network for data transfer if possible
• Mid-sized collaborations have a lot more data, but similar expertise limitations
– More scientists per collaboration, much larger data sets (10s to 100s of terabytes)
– Most mid-sized collaborations still rely on local system and networking staff, or supercomputer center system and networking staff
• Large collaborations (HEP, NP) are big enough to have their own internal software shops
– Dedicated people for networking, performance tuning, etc
– Typically need much less help
– Often held up (erroneously) as an example to smaller collaborations
• These groupings are arbitrary and approximate, but this taxonomy illustrates some points of leverage (e.g. data sources, supercomputer centers)
[Chart: rough user grouping by collaboration data set size. X-axis: approximate data set size (up to 1 TB, 1-10 TB, 10-500 TB, 0.5-10 PB). As data set size grows, the number of collaborations falls from high to low while the number of scientists per collaboration rises from low to high. Groupings: small-data instrument science (Light Source users, Nanoscience Centers, Microscopy); supercomputer simulations (Climate, Fusion, Bioinformatics); large-data instrument science (HEP, NP) – a few large collaborations with internal software and networking groups.]
Bandwidth necessary to transfer Y bytes in X time

           1H              8H             24H            7 Days         30 Days
10PB       25,020.0 Gbps   3,127.5 Gbps   1,042.5 Gbps   148.9 Gbps     34.7 Gbps
1PB        2,502.0 Gbps    312.7 Gbps     104.2 Gbps     14.9 Gbps      3.5 Gbps
100TB      244.3 Gbps      30.5 Gbps      10.2 Gbps      1.5 Gbps       339.4 Mbps
10TB       24.4 Gbps       3.1 Gbps       1.0 Gbps       145.4 Mbps     33.9 Mbps
1TB        2.4 Gbps        305.4 Mbps     101.8 Mbps     14.5 Mbps      3.4 Mbps
100GB      238.6 Mbps      29.8 Mbps      9.9 Mbps       1.4 Mbps       331.4 Kbps
10GB       23.9 Mbps       3.0 Mbps       994.2 Kbps     142.0 Kbps     33.1 Kbps
1GB        2.4 Mbps        298.3 Kbps     99.4 Kbps      14.2 Kbps      3.3 Kbps
100MB      233.0 Kbps      29.1 Kbps      9.7 Kbps       1.4 Kbps       0.3 Kbps
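The table values follow from a one-line calculation. Note that the sizes appear to use binary units (1TB = 2^40 bytes), which reproduces the figures above:

```python
# Required bandwidth (bits/sec) to move size_bytes within `seconds`.
def required_bps(size_bytes, seconds):
    return size_bytes * 8 / seconds

TiB = 2**40  # the table's "1TB" matches binary terabytes

# 1TB in 24 hours -> ~101.8 Mbps, matching the table
print(round(required_bps(TiB, 24 * 3600) / 1e6, 1))

# 10PB in 1 hour -> ~25,020 Gbps
print(round(required_bps(10 * 2**50, 3600) / 1e9, 1))
```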
How Can Networks Enable Science?
• Build the network infrastructure with throughput in mind
– Cheap switches often have tiny internal buffers and cannot reliably carry high-speed flows over long distances
– Fan-in is a significant problem that must be accounted for
– Every device in the path matters – routers, switches, firewalls, whatever
– Firewalls often cause problems that are hard to diagnose (in many cases, routers can provide equivalent security without degrading performance)
• Provide visibility into the network
– Test and measurement hosts are critical
– Many test points in the network enable better problem isolation
– If possible, buy routers that can count packets reliably, because sometimes this is the only way to find the problem
– perfSONAR is being widely deployed for end-to-end network monitoring
• Work with the science community
– Don’t wait for users to figure it out on their own
– Work with major resources to help tune data movement services between dedicated hosts
– Remember that data transfer infrastructures are systems of systems – success usually requires collaboration between LAN, WAN, storage, and security teams
– Provide information to help users – e.g. http://fasterdata.es.net/
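The buffering point above can be made concrete with the bandwidth-delay product: a long-distance flow keeps roughly bandwidth x round-trip time in flight, and devices along the path must be able to absorb it (the 250 ms figure below is illustrative of a trans-Pacific path, not from the slides):

```python
# Bandwidth-delay product: bytes "in flight" on a path, and hence the
# rough buffering a sender, receiver, or bottleneck device must handle.
def bdp_bytes(bandwidth_bps, rtt_seconds):
    return bandwidth_bps * rtt_seconds / 8

# A 10 Gb/s flow over a 250 ms trans-Pacific round trip:
print(round(bdp_bytes(10e9, 0.250) / 2**20))  # ~298 MiB in flight
```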
Enabling Chinese-American Science Collaborations
• There are several current collaborations between US DOE laboratories and Chinese institutions
– LHC/CMS requires data movement between IHEP and Fermilab
– The Daya Bay Neutrino Experiment requires data movement between the detectors at Daya Bay and NERSC at Lawrence Berkeley National Laboratory and Brookhaven National Laboratory
– EAST Tokamak – collaboration with US Fusion Energy Sciences sites such as General Atomics
– Others to come, I’m sure
• Getting data across the Pacific can be difficult (250-millisecond round-trip times are common)
• However, we know this can be done because others have succeeded
– 1 Gbps host-to-host network throughput between Brookhaven and KISTI in South Korea – this is expected to be 3-5 hosts wide in production
– 60 MB/sec per data mover from Brookhaven to CCJ in Japan (typically 6 hosts wide, for a total of 360 MB/sec, or about 2.8 Gbps)
• We look forward to working together to enable the scientific collaborations of our constituents!
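Why 250 ms round trips make transfers hard can be quantified with the well-known Mathis et al. TCP throughput model, throughput <= MSS / (RTT * sqrt(loss)). This is standard tuning background rather than material from the slides:

```python
import math

# Mathis model: upper bound on single-stream TCP throughput (bits/sec)
# given segment size, round-trip time, and packet loss probability.
def mathis_bps(mss_bytes, rtt_seconds, loss_prob):
    return (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_prob))

# A standard 1460-byte segment across the Pacific (250 ms RTT) with even
# a tiny loss rate of 1 packet in 100,000:
print(round(mathis_bps(1460, 0.250, 1e-5) / 1e6, 1))  # ~14.8 Mbps
```

This is why trans-Pacific transfers typically run many hosts or streams wide, as in the Brookhaven examples above: a single default TCP stream cannot fill the path.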
Questions?
• http://www.es.net/
• http://fasterdata.es.net/