ESnet - Connecting the USA DOE Labs to the
World of Science
Eli Dart, Network Engineer, Network Engineering Group
Chinese American Network Symposium, Indianapolis, Indiana, October 20, 2008
Energy Sciences Network, Lawrence Berkeley National Laboratory
Networking for the Future of Science
Overview
• ESnet and the DOE Office of Science need for high performance networks
• ESnet4 architecture
• The network as a tool for science – performance
• Enabling Chinese-American science collaborations
DOE Office of Science and ESnet – the ESnet Mission
• ESnet’s primary mission is to enable the large-scale science that is the mission of the Office of Science (SC) and that depends on:
– Sharing of massive amounts of data
– Supporting thousands of collaborators world-wide
– Distributed data processing
– Distributed data management
– Distributed simulation, visualization, and computational steering
– Collaboration with the US and International Research and Education community
• ESnet provides network and collaboration services to Office of Science laboratories and many other DOE programs in order to accomplish its mission
• ESnet is the sole provider of high-speed connectivity to most DOE national laboratories
The “New Era” of Scientific Data
• Modern science is completely dependent on high-speed networking
– As the instruments of science get larger and more sophisticated, the cost goes up to the point where only a very few are built (e.g. one LHC, one ITER, one James Webb Space Telescope, etc.)
• The volume of data generated by these instruments is going up exponentially
– These instruments are mostly based on solid-state sensors and so follow the same Moore’s Law as computer CPUs, though the technology refresh cycle for instruments is 10-20 years rather than 1.5 years for CPUs
– The data volume is at the point where modern computing and storage technology are at their very limits trying to manage the data
• It takes world-wide collaborations of large numbers of scientists to conduct the science and analyze the data from a single instrument, so the data from the instrument must be distributed all over the world
– The volume of data generated by such instruments has reached the level of many petabytes/year – the point where dedicated 10-100 Gb/s networks that span the country and the world are required to distribute the data
Networks for The “New Era” of Scientific Data
• Designing and building networks and providing suitable network services to support science data movement has pushed R&E networks to the forefront of network technology: there are currently no commercial networks that handle individual data flows of the size generated by modern science
– The aggregate of small flows in commercial networks is, of course, much larger – but not by as much as one might think: the Google networks transport only about 1000x the amount of data that ESnet transports
• What do the modern systems of science look like?
– They are highly distributed and bandwidth intensive
• The LHC will be the largest scientific experiment, and will generate more data, than the scientific community has ever tried to manage
• The data management model involves a world-wide collection of data centers that store, manage, and analyze the data and that are integrated through network connections with typical speeds in the 10+ Gbps range – closely coordinated and interdependent distributed systems that must have predictable intercommunication to function effectively
• CMS is one of two major LHC experiments – each generates comparable amounts of data
The “new era” of science data will likely tax network technology
• Individual labs now fill 10G links – Fermilab (an LHC Tier 1 Data Center) has 5 x 10 Gb/s links to ESnet hubs in Chicago and can easily fill one or more of them for sustained periods of time
• The “casual” increases in overall network capacity are less likely to easily meet future needs
[Figure: log plot of experiment-generated data (bytes) vs. time, historical and estimated, spanning the 1 Petabyte to 1 Exabyte range. Data courtesy of Harvey Newman, Caltech, and Richard Mount, SLAC.]
Planning the Future Network
1) Data characteristics of instruments and facilities
– What data will be generated by instruments coming on-line over the next 5-10 years (including supercomputers)?
2) Examining the future process of science
– How and where will the new data be analyzed and used – that is, how will the process of doing science change over 5-10 years?
3) Observing traffic patterns
– What do the trends in network patterns predict for future network needs?
Motivation for Overall Capacity: ESnet Traffic has Increased by 10X Every 47 Months, on Average, Since 1990
[Figure: log plot of ESnet monthly accepted traffic (terabytes/month), January 1990 – January 2008. Milestones: 100 GBy/mo in Aug. 1990; 1 TBy/mo in Oct. 1993 (38 months later); 10 TBy/mo in Jul. 1998 (57 months); 100 TBy/mo in Nov. 2001 (40 months); 1 PBy/mo in Apr. 2006 (53 months).]
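The average growth rate implied by the milestones above can be checked with a short calculation (intervals read off the plot):

```python
# Months between successive 10x milestones in ESnet monthly traffic,
# from the log plot above: 1990 -> 1993 -> 1998 -> 2001 -> 2006.
intervals_months = [38, 57, 40, 53]

avg_months_per_10x = sum(intervals_months) / len(intervals_months)
print(avg_months_per_10x)  # 47.0 -> "10X every 47 months, on average"

# Equivalent annual growth factor: 10^(12 / 47) per year
annual_growth = 10 ** (12 / avg_months_per_10x)
print(round(annual_growth, 2))  # roughly 1.8x per year
```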
The International Collaborators of DOE’s Office of Science Drive ESnet Design for International Connectivity
[Map: the R&E sources and destinations of ESnet’s top 100 traffic sites, all of which are R&E; the DOE Lab end of each flow is not shown.]
Currently most of ESnet’s traffic (>85%) goes to or comes from outside of ESnet. This reflects the highly collaborative nature of large-scale science, which is one of the main focuses of DOE’s Office of Science.
A small number of large data flows now dominate the network traffic – this motivates virtual circuits as a key network service.
[Figure: ESnet total traffic, TBy/mo, January 1990 – April 2008, with LHC Tier 1 site flows highlighted; FNAL outbound traffic data courtesy of Phil DeMar, Fermilab.]
Requirements from Scientific Instruments and Facilities
• Bandwidth
– Adequate network capacity to ensure timely movement of data produced by the facilities
• Connectivity
– Geographic reach sufficient to connect users and analysis systems to SC facilities
• Services
– Guaranteed bandwidth, traffic isolation, end-to-end monitoring
– Network service delivery architecture
• Service Oriented Architecture / Grid / “Systems of Systems”
ESnet Architecture - ESnet4
• ESnet4 was built to address specific Office of Science program requirements. The result is a much more complex and much higher capacity network.
ESnet3, 2000 to 2005:
• A routed IP network with sites singly attached to a national core ring
• Very little peering redundancy
ESnet4 in 2008:
• The new Science Data Network (blue) is a switched network providing guaranteed bandwidth for large data movement
• All large science sites are dually connected on metro area rings, or dually connected directly to the core ring, for reliability
• Rich topology increases the reliability of the network
ESnet4 – IP and SDN
• ESnet4 is one network with two “sides”
– The IP network is a high-capacity (10G) best-effort routed infrastructure
• Rich commodity peering infrastructure ensures global connectivity
• Diverse R&E peering infrastructure provides full global high-bandwidth connectivity for scientific collaboration
• High performance – 10G of bandwidth is adequate for many scientific collaborations
• Services such as native IPv6 and multicast
– The Science Data Network (SDN) is a virtual circuit infrastructure with bandwidth guarantees and traffic engineering capabilities
• Highly scalable – just add more physical circuits as demand increases
• Interoperable – compatible with virtual circuit infrastructures deployed by Internet2, CANARIE, GEANT and others
• Guaranteed bandwidth
• The interdomain demarcation is a VLAN tag – virtual circuits can be delivered to sites or other networks even when end-to-end reservations are not possible
ESnet4 Backbone Projected for December 2008
[Map: ESnet4 backbone as projected for December 2008. The ESnet IP core and Science Data Network core rings (including SDN core segments on existing NLR links) connect hubs in Seattle, Portland, Boise, Sunnyvale, LA, San Diego, Las Vegas, Salt Lake City, Denver, El Paso, Albuquerque, Tulsa, Kansas City, Houston, Baton Rouge, Chicago, Nashville, Atlanta, Jacksonville, Raleigh, Cleveland, Pittsburgh, Philadelphia, Washington DC, New York City, and Boston, with core links at 20G. Lab sites (e.g. LLNL, LANL, GA, SDSC, ORNL, FNAL, BNL, PNNL) attach via MAN rings, lab-supplied links, or independent dual connections; international IP connections include LHC/CERN, StarLight, MAN LAN (AofA), and the USLHC network. Legend distinguishes IP switch-only hubs, IP switch/router hubs, SDN switch hubs, and layer 1 optical nodes (eventual ESnet points of presence vs. nodes not currently in ESnet plans).]
ESnet4 As Planned for 2010
[Map: ESnet4 as planned for 2010. Same topology, hubs, lab sites, and international connections as the December 2008 map, with core ring capacity upgraded to 50G and individual segments at 30-50G (mostly 40G and 50G).]
Traffic Engineering on SDN – OSCARS
• ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS) http://www.es.net/oscars/
• Provides edge to edge layer 2 or layer 3 virtual circuits across ESnet
– Guaranteed bandwidth
– Advance reservation
• Interoperates with many other virtual circuit infrastructures to provide end-to-end guaranteed-bandwidth service for geographically dispersed scientific collaborations (see next slide)
– Interoperability is critical, since science traffic flows cross many administrative domains in the general case
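To illustrate the advance-reservation idea, here is a simplified sketch (not the actual OSCARS implementation or API): admitting a guaranteed-bandwidth circuit means verifying that, at every instant in the requested window, already-reserved bandwidth plus the new request stays within link capacity.

```python
# Hypothetical advance-reservation admission check for a single link.
# OSCARS itself schedules across paths and domains, which this ignores.

def admissible(existing, capacity_gbps, start, end, bw_gbps):
    """existing: list of (start, end, bw_gbps) reservations, end exclusive.
    Returns True if the new reservation fits within capacity at all times."""
    # Bandwidth in use only increases at reservation start boundaries, so
    # checking the window start and each start inside the window suffices.
    points = {start} | {s for s, e, _ in existing if start <= s < end}
    for t in points:
        in_use = sum(bw for s, e, bw in existing if s <= t < e)
        if in_use + bw_gbps > capacity_gbps:
            return False
    return True

# A 10G link with a 6G circuit already reserved for hours 2-8:
booked = [(2, 8, 6.0)]
print(admissible(booked, 10.0, 0, 4, 5.0))   # False: 6 + 5 > 10 during 2-4
print(admissible(booked, 10.0, 8, 12, 9.0))  # True: window starts after 2-8
```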
OSCARS Interdomain Collaborative Efforts
– Terapaths
• Inter-domain interoperability for layer 3 virtual circuits demonstrated (3Q06)
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– LambdaStation
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– Internet2 DCN/DRAGON
• Inter-domain exchange of control messages demonstrated (1Q07)
• Integration of OSCARS and DRAGON has been successful (1Q07)
– GEANT2 AutoBAHN
• Inter-domain reservation demonstrated at SC07 (4Q07)
– DICE
• First draft of topology exchange schema formalized in collaboration with NMWG (2Q07); interoperability test demonstrated 3Q07
• Initial implementation of reservation and signaling messages demonstrated at SC07 (4Q07)
– Nortel
• Topology exchange demonstrated successfully 3Q07
• Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07)
– UVA
• Demonstrated token-based authorization concept with OSCARS at SC07 (4Q07)
– OGF NML-WG
• Actively working to combine work from NMWG and NDL
• Documents and UML diagram for base concepts have been drafted (2Q08)
– GLIF GNI-API WG
• In the process of designing a common API and reference middleware implementation
The Network As A Tool For Science
• Science is becoming much more data intensive
– Data movement is one of the great challenges facing many scientific collaborations
– Getting the data to the right place is important
– Scientific productivity follows data locality
• Therefore, a high-performance network that enables high-speed data movement as a functional service is a tool for enhancing scientific productivity and enabling new scientific paradigms
• Users often do not know how to use the network effectively without help – in order to be successful, networks must provide usable services to scientists
Some user groups need more help than others
• Collaborations with a small number of scientists typically do not have network tuning expertise
– They rely on their local system and network admins (or grad students)
– They often don’t have much data to move (typically <1TB)
– Therefore, they avoid using the network for data transfer if possible
• Mid-sized collaborations have a lot more data, but similar expertise limitations
– More scientists per collaboration, much larger data sets (10s to 100s of terabytes)
– Most mid-sized collaborations still rely on local system and networking staff, or supercomputer center system and networking staff
• Large collaborations (HEP, NP) are big enough to have their own internal software shops
– Dedicated people for networking, performance tuning, etc
– Typically need much less help
– Often held up (erroneously) as an example to smaller collaborations
• These groupings are arbitrary and approximate, but this taxonomy illustrates some points of leverage (e.g. data sources, supercomputer centers)
[Chart: rough user grouping by collaboration data set size. X-axis: approximate data set size (up to 1 TB, 1-10 TB, 10-500 TB, 0.5-10 PB). As data set size grows, the number of collaborations falls from high to low while the number of scientists per collaboration rises from low to high. Groupings: small-data instrument science (Light Source users, Nanoscience Centers, Microscopy); supercomputer simulations (Climate, Fusion, Bioinformatics); large-data instrument science (HEP, NP) – a few large collaborations with internal software and networking groups.]
Bandwidth necessary to transfer Y bytes in X time

           1H              8H             24H            7 Days         30 Days
10PB       25,020.0 Gbps   3,127.5 Gbps   1,042.5 Gbps   148.9 Gbps     34.7 Gbps
1PB        2,502.0 Gbps    312.7 Gbps     104.2 Gbps     14.9 Gbps      3.5 Gbps
100TB      244.3 Gbps      30.5 Gbps      10.2 Gbps      1.5 Gbps       339.4 Mbps
10TB       24.4 Gbps       3.1 Gbps       1.0 Gbps       145.4 Mbps     33.9 Mbps
1TB        2.4 Gbps        305.4 Mbps     101.8 Mbps     14.5 Mbps      3.4 Mbps
100GB      238.6 Mbps      29.8 Mbps      9.9 Mbps       1.4 Mbps       331.4 Kbps
10GB       23.9 Mbps       3.0 Mbps       994.2 Kbps     142.0 Kbps     33.1 Kbps
1GB        2.4 Mbps        298.3 Kbps     99.4 Kbps      14.2 Kbps      3.3 Kbps
100MB      233.0 Kbps      29.1 Kbps      9.7 Kbps       1.4 Kbps       0.3 Kbps
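The table values follow from a one-line calculation. Note that the sizes appear to use binary units (1TB = 2^40 bytes), which reproduces the figures above:

```python
# Required bandwidth (bits/sec) to move size_bytes within `seconds`.
def required_bps(size_bytes, seconds):
    return size_bytes * 8 / seconds

TiB = 2**40  # the table's "1TB" matches binary terabytes

# 1TB in 24 hours -> ~101.8 Mbps, matching the table
print(round(required_bps(TiB, 24 * 3600) / 1e6, 1))

# 10PB in 1 hour -> ~25,020 Gbps
print(round(required_bps(10 * 2**50, 3600) / 1e9, 1))
```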
How Can Networks Enable Science?
• Build the network infrastructure with throughput in mind
– Cheap switches often have tiny internal buffers and cannot reliably carry high-speed flows over long distances
– Fan-in is a significant problem that must be accounted for
– Every device in the path matters – routers, switches, firewalls, whatever
– Firewalls often cause problems that are hard to diagnose (in many cases, routers can provide equivalent security without degrading performance)
• Provide visibility into the network
– Test and measurement hosts are critical
– Many test points in the network enable better problem isolation
– If possible, buy routers that can count packets reliably, because sometimes this is the only way to find the problem
– perfSONAR is being widely deployed for end-to-end network monitoring
• Work with the science community
– Don’t wait for users to figure it out on their own
– Work with major resources to help tune data movement services between dedicated hosts
– Remember that data transfer infrastructures are systems of systems – success usually requires collaboration between LAN, WAN, storage, and security teams
– Provide information to help users – e.g. http://fasterdata.es.net/
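The buffering point above can be made concrete with the bandwidth-delay product: a long-distance flow keeps roughly bandwidth x round-trip time in flight, and devices along the path must be able to absorb it (the 250 ms figure below is illustrative of a trans-Pacific path, not from the slides):

```python
# Bandwidth-delay product: bytes "in flight" on a path, and hence the
# rough buffering a sender, receiver, or bottleneck device must handle.
def bdp_bytes(bandwidth_bps, rtt_seconds):
    return bandwidth_bps * rtt_seconds / 8

# A 10 Gb/s flow over a 250 ms trans-Pacific round trip:
print(round(bdp_bytes(10e9, 0.250) / 2**20))  # ~298 MiB in flight
```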
Enabling Chinese-American Science Collaborations
• There are several current collaborations between US DOE laboratories and Chinese institutions
– LHC/CMS requires data movement between IHEP and Fermilab
– The Daya Bay Neutrino Experiment requires data movement between the detectors at Daya Bay and NERSC at Lawrence Berkeley National Laboratory and Brookhaven National Laboratory
– EAST Tokamak – collaboration with US Fusion Energy Sciences sites such as General Atomics
– Others to come, I’m sure
• Getting data across the Pacific can be difficult (250-millisecond round-trip times are common)
• However, we know this can be done because others have succeeded
– 1 Gbps host-to-host network throughput between Brookhaven and KISTI in South Korea – this is expected to be 3-5 hosts wide in production
– 60 MB/sec per data mover from Brookhaven to CCJ in Japan (typically 6 hosts wide, for a total of 360 MB/sec, or about 2.8 Gbps)
• We look forward to working together to enable the scientific collaborations of our constituents!
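Why 250 ms round trips make transfers hard can be quantified with the well-known Mathis et al. TCP throughput model, throughput <= MSS / (RTT * sqrt(loss)). This is standard tuning background rather than material from the slides:

```python
import math

# Mathis model: upper bound on single-stream TCP throughput (bits/sec)
# given segment size, round-trip time, and packet loss probability.
def mathis_bps(mss_bytes, rtt_seconds, loss_prob):
    return (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_prob))

# A standard 1460-byte segment across the Pacific (250 ms RTT) with even
# a tiny loss rate of 1 packet in 100,000:
print(round(mathis_bps(1460, 0.250, 1e-5) / 1e6, 1))  # ~14.8 Mbps
```

This is why trans-Pacific transfers typically run many hosts or streams wide, as in the Brookhaven examples above: a single default TCP stream cannot fill the path.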
Questions?
• http://www.es.net/
• http://fasterdata.es.net/