FABRIC Meeting, Poznan, Poland, 25 Sep 2006
Broadband Protocols
WP 1.2.1
IP protocols, Lambda switching, multicasting
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
Protocols Document
Protocols Document 1
- “Protocol Investigation for eVLBI Data Transfer”, document JRA-WP1.2.1.001; Jodrell & Manchester folks, with hard work from Matt; completed and on the EXPReS WIKI
- Introduces e-VLBI and its networking requirements:
  - Continuously streamed data; individual packets are not particularly valuable
  - Maintenance of the data rate is important
  - Quite different from applications where bit-wise correct transmission is required, e.g. file transfer
- Forms a valuable use case for the GGF GHPN-RG
- Presents the actions required in order to make an informed decision and to implement suitable protocols in the European VLBI Network. A strategy document.
Protocols Document 2
- Protocols considered for investigation include: TCP/IP, UDP/IP, DCCP/IP, VSI-E (RTP/UDP/IP), Remote Direct Memory Access, TCP Offload Engines
- Very useful discussions at the Haystack VLBI meeting; agreement to make joint Haystack-Jodrell tests using the ESLEA 1 Gbit transatlantic link
- Work in progress, with links to ESLEA (UK e-Science):
  - vlbi_udp (Simon): UDP/IP stability and the effect of packet loss on correlations
  - tcpdelay (Stephen): TCP/IP and CBR data
tcpdelay
tcpdelay: VLBI Application Protocol
- Want to examine how TCP moves Constant Bit Rate (CBR) data
- tcpdelay is a test program: an instrumented TCP sender that emulates sending CBR data (a minimal sketch follows below)
  - Parameters: message size n bytes, number of packets, wait time between sends
  - Records the relative 1-way delay of each message
  - Records TCP stack activity with Web100
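The mechanism is simple enough to sketch in code. Below is a minimal, illustrative C version of the tcpdelay idea, not the actual tool; the message layout is my own, and the size and wait time are set to match the 1448-byte / 22 µs / ~525 Mbit/s figures used in the tests later in this talk:

```c
/* cbr_send.c - minimal sketch of the tcpdelay idea: emulate Constant
 * Bit Rate data over TCP by sending fixed-size messages at fixed
 * intervals, each carrying a send timestamp so the receiver can plot
 * the relative 1-way delay.  Illustrative only, not the real tcpdelay.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define MSG_BYTES 1448   /* message size n                    */
#define WAIT_US   22     /* 1448 bytes / 22 us ~ 525 Mbit/s   */

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); return 1; }

    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons((uint16_t)atoi(argv[2])) };
    inet_pton(AF_INET, argv[1], &dst.sin_addr);
    if (connect(s, (struct sockaddr *)&dst, sizeof dst) < 0) {
        perror("connect"); return 1;
    }

    char msg[MSG_BYTES] = { 0 };
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (uint32_t seq = 0; ; seq++) {
        /* stamp each message: sequence number + send time */
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        memcpy(msg, &seq, sizeof seq);
        memcpy(msg + sizeof seq, &now, sizeof now);

        if (write(s, msg, sizeof msg) != sizeof msg) { perror("write"); break; }

        /* pace against absolute time, so blocking in write() - e.g.
         * while TCP recovers from a loss - does not silently lower
         * the offered CBR rate                                      */
        next.tv_nsec += WAIT_US * 1000L;
        if (next.tv_nsec >= 1000000000L) { next.tv_nsec -= 1000000000L; next.tv_sec++; }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    return 0;
}
```

The receiving side timestamps each message on arrival; the difference between the two stamps gives the relative 1-way delay plotted on the following slides.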
VLBI Application Protocol
[Diagram: the sender timestamps each message (Timestamp1 + Data1, Timestamp2 + Data2, ...) before it passes through TCP & the network to the receiver; packet loss on the path shows up as delayed data.]
VLBI data is produced at a Constant Bit Rate.
Visualising the Results
- When packet loss is detected, TCP reduces Cwnd and halves the sending rate
- So expect a delay in the message arrival time
[Sketch: arrival time vs message number; at a packet loss the measured arrival-time curve steps away from the expected CBR arrival-time line, a delay in the stream.]
Stephen Kershaw
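In symbols (the notation is mine, not the slide's): for messages of n bytes at constant bit rate R, the expected arrival time of message i is

\[ t_{\mathrm{expect}}(i) = t_0 + i\,\frac{8n}{R} \]

so when a loss halves the sending rate, the measured arrival times climb away from this straight line and only rejoin it once Cwnd has recovered and the backlog has been sent.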
Arrival Times: UKLight JB-JIVE-Manc
- Message size: 1448 bytes; wait time: 22 µs; data rate: 525 Mbit/s
- Route: JB-UKLight-JIVE-UKLight-Man, RTT ~27 ms
- TCP buffer 32 Mbytes
- BDP @ 512 Mbit/s: 1.8 Mbytes
- Estimate: catch-up possible if loss < 1 in 1.24M (worked check below)
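A quick check of these figures (my arithmetic, not from the slide):

\[ \mathrm{BDP} = 512\times10^{6}\ \mathrm{bit/s} \times 27\times10^{-3}\ \mathrm{s} \approx 1.4\times10^{7}\ \mathrm{bit} \approx 1.7\ \mathrm{Mbyte} \]

which matches the quoted ~1.8 Mbytes. After a loss halves Cwnd, linear recovery takes roughly W/2 RTTs, with W ≈ BDP/MSS ≈ 1250 segments: about 625 × 27 ms ≈ 17 s. At ~45,000 messages/s that is of order 10^6 messages per recovery cycle, the right scale for the quoted 1-in-1.24M catch-up threshold.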
[Plot: "Effect of loss rate on message arrival time". Time (s) vs message number (0 to 10×10^4), with curves for drop rates of 1 in 5k, 10k, 20k, 40k, and no loss.]
TCP Web100: JB-Manc – Large buffer
- Message size: 1448 bytes; wait time: 22 µs; data rate: 525 Mbit/s
- Route: JB-UKLight-JIVE-UKLight-Man, RTT ~27 ms
- Standard TCP; TCP buffer 930k; drop 1 in 40,000 packets
- Classic Cwnd behaviour, limited by ssthresh. TCP requires much care!
[Web100 plots vs time (ms): DataBytesOut (Delta), DataBytesIn (Delta) and CurCwnd; number of duplicate ACKs; packet re-transmits.]
iBOB
Prototype iBOB with two sampler boards attached: an FPGA-based signal-processing board from UC Berkeley
iBOB block diagram (Bryan Anderson)
[Diagram: Station Board and RAM feed the iBOB; input via VSI or headstack; 10GE CX4 output through a CX4 - fibre media converter to a disk-based system.]
- 10 Gigabit Ethernet now available; a UDP/IP module exists
- Use for a demonstration of FPGA-driven IP networking
- Link to a PC NIC for diagnostics (see the sketch below)
- Test over GÉANT: Onsala - Jodrell
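For the PC-NIC diagnostics, the receiver mainly needs to verify the packet stream from the FPGA: count packets and detect loss and reordering from a per-packet sequence number. A minimal sketch in C; the UDP port and the packet layout (a 32-bit network-order sequence number at the start of each datagram) are assumptions for illustration, as the real iBOB UDP/IP module's format may differ:

```c
/* udp_diag.c - sketch of a PC-side diagnostic receiver for the iBOB
 * UDP stream: counts received, lost and out-of-order packets using a
 * per-packet sequence number.  Port and packet layout are assumed.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(5000) }; /* assumed port */
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind"); return 1;
    }

    uint8_t buf[9000];                /* jumbo-frame sized buffer */
    uint32_t expect = 0;
    uint64_t recvd = 0, lost = 0, ooo = 0;

    for (;;) {
        ssize_t n = recv(s, buf, sizeof buf, 0);
        if (n < 4) continue;          /* too short to carry a sequence no. */

        uint32_t seq;
        memcpy(&seq, buf, sizeof seq);
        seq = ntohl(seq);             /* assumed: network byte order */

        if (seq == expect)      expect = seq + 1;            /* in order   */
        else if (seq > expect)  { lost += seq - expect;      /* gap = loss */
                                  expect = seq + 1; }
        else                    ooo++;                       /* reordered  */

        if (++recvd % 100000 == 0)
            printf("recvd %llu  lost %llu  out-of-order %llu\n",
                   (unsigned long long)recvd, (unsigned long long)lost,
                   (unsigned long long)ooo);
    }
}
```

Per-packet receive timestamps would extend this to the jitter and 1-way-delay measurements listed on the "What do we want to do?" slide.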
Multi-Gigabit Trials on GÉANT
Collaboration with Dante
- What is inside GÉANT2? Why is the collaboration interesting?
- 10 Gigabit Ethernet: UDP memory-to-memory flows; TCP flows with allocated bandwidth
- Options using the GÉANT Development Network (10 Gbit SDH network)
- Options using the GÉANT Lightpath Service
- PoP locations for network tests
GÉANT2 Topology
GÉANT2: The Convergence Solution
[Diagram: NREN access via existing IP routers into GÉANT2 PoPs A and B; managed lambdas over dark fibre between 1626 LM DWDM nodes; L2 and TDM matrices in the 1678 MCC nodes; EXPReS PCs attach at 10 GE.]
From PoS to Ethernet
- More economical architecture
- Highest overall network availability
- Flexibility (VLAN management)
- Highest network performance (latency)
[Diagram: router IP links carried as 1/10 Gigabit Ethernet VLANs over VC-4-nv channels through the L2 and TDM matrices of a 1678 MCC transport node.]
What do we want to do?
- Set up a 4 Gigabit lightpath between GÉANT PoPs
- Collaboration with Dante: PCs in their PoPs with 10 Gigabit NICs
- VLBI tests: UDP performance
  - Throughput, jitter, packet loss, 1-way delay, stability
  - Continuous (days-long) data flows with vlbi_udp, and multi-Gigabit TCP performance with current kernels
  - Experience for FPGA Ethernet packet systems
- Dante interests: multi-Gigabit TCP performance; the effect of (Alcatel) buffer size on bursty TCP when using bandwidth-limited lightpaths
- Need a collaboration agreement
Options Using the GÉANT Development Network
- 10 Gigabit SDH backbone, Alcatel 1678 MCC
- Node locations: London, Amsterdam, Paris, Prague, Frankfurt
- Can do traffic routing, so long-RTT paths can be made
- Available Dec/Jan 07
- Less pressure, so suited to long-term tests
Options Using the GÉANT Lightpath Service
- Set up a 4 Gigabit lightpath between GÉANT PoPs
- Collaboration with Dante: PCs in Dante PoPs
- 10 Gigabit SDH backbone, Alcatel 1678 MCC
- Node locations: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
- Can do traffic routing, so long-RTT paths can be made
- Ideal: London - Copenhagen
4 Gigabit GÉANT LightPath
Example of a 4 Gigabit lightpath between GÉANT PoPs: PCs in Dante PoPs; 26 × VC-4 gives 4180 Mbit/s
PCs and Current Tests
Test PCs Have Arrived
- Boston/Supermicro X7DBE
- Two dual-core Intel Xeon Woodcrest 5130 CPUs, 2 GHz
- Independent 1.33 GHz front-side buses
- 533 MHz FB memory (serial)
- Chipsets: Intel 5000P MCH (PCIe & memory); ESB2 (PCI-X, GE etc.)
- PCI: three 8-lane PCIe buses; 3 × 133 MHz PCI-X
- 2 × Gigabit Ethernet; SATA
Lab Tests: 10 Gigabit Ethernet
- 10 Gigabit test lab being set up in Manchester: Cisco 7600; cross-campus λ, <1 ms; server-quality PCs; Neterion NICs; Myricom & Chelsio being purchased
- Back-to-back performance so far, SuperMicro X6DHE-G2; kernel (2.6.13) & driver dependent!
  - One iperf TCP data stream: 4 Gbit/s
  - Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s
  - UDP disappointing
- Propose to install Fedora Core 5 with kernel 2.6.17 on the new Intel dual-core PCs
Any Questions?
Backup Slides
Our Long-Term Vision: Bandwidth on Demand
[Diagram: applications (e.g. GRID) over Ethernet send bandwidth requests to Research Activity policy middleware and a Network Resource Manager, which drives the 1678 MCC transport nodes via UNI-C commands and GMPLS.]
10 Gigabit Ethernet: UDP Throughput
- 1500-byte MTU gives ~2 Gbit/s
- Used a 16144-byte MTU, max user length 16080 bytes (sanity check below)
- DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPUs, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire-rate throughput 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPUs, FSB 400 MHz, PCI-X mmrbc 4096 bytes; wire rate 5.7 Gbit/s
- SLAC Dell PCs: dual 3.0 GHz Xeon CPUs, FSB 533 MHz, PCI-X mmrbc 4096 bytes; wire rate 5.4 Gbit/s
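A sanity check on these rates (my arithmetic): the receive wire rate is the payload divided by the inter-frame spacing,

\[ R \approx \frac{8L}{t}, \qquad \frac{8\times16080\ \mathrm{byte}}{22.6\ \mu\mathrm{s}} \approx 5.7\ \mathrm{Gbit/s} \]

(user payload only, ignoring UDP/IP/Ethernet overhead). It also shows why the large MTU matters: at 1500 bytes the fixed per-packet costs are paid roughly 11 times as often for the same volume of data.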
[Plot "an-al 10GE Xsum 512kbuf MTU16114 27Oct03": received wire rate (Mbit/s, 0-6000) vs spacing between frames (µs, 0-40), one curve per packet size from 1472 to 16080 bytes.]
10 Gigabit Ethernet: Tuning PCI-X
- 16080-byte packets every 200 µs, Intel PRO/10GbE LR adapter
- PCI-X bus occupancy vs mmrbc
- Measured times, and times based on PCI-X signals from the logic analyser
- Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
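The mmrbc dependence is essentially a burst-count effect (a back-of-envelope view; the per-burst overhead is left symbolic): a 16080-byte packet crosses the PCI-X bus in

\[ n_{\mathrm{bursts}} = \left\lceil \frac{16080}{\mathrm{mmrbc}} \right\rceil = 32\ (512\ \mathrm{B}),\ 16\ (1024\ \mathrm{B}),\ 8\ (2048\ \mathrm{B}),\ 4\ (4096\ \mathrm{B}) \]

read bursts, so each increase in mmrbc strips fixed per-burst costs (arbitration, attribute phase) out of the transfer time, raising the achievable rate toward the measured 5.7 Gbit/s.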
[Plots (Kernel 2.6.1#17, HP Itanium, Intel 10GE, Feb 04): PCI-X transfer time (µs) vs max memory read byte count for the HP Itanium and DataTAG Xeon 2.2 GHz systems, showing measured rate, rate from expected time, and max PCI-X throughput; logic-analyser traces show the CSR access, PCI-X sequence, data transfer, and interrupt & CSR update phases for mmrbc of 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096).]
Bandwidth Challenge wins Hat Trick
- The maximum aggregate bandwidth was >151 Gbit/s: 130 DVD movies in a minute, or serving 10,000 MPEG2 HDTV movies in real time
- 22 × 10 Gigabit Ethernet waves to the Caltech & SLAC/FERMI booths
- In 2 hours transferred 95.37 TBytes; in 24 hours moved ~475 TBytes
- Showed real-time particle event analysis
- SLAC/FERMI/UK booth:
  - 1 × 10 Gbit Ethernet to UK over NLR & UKLight: transatlantic HEP disk-to-disk, VLBI streaming
  - 2 × 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
  - 4 × 10 Gbit links to Fermi: dCache data transfers
[Plot: SC2004, 101 Gbit/s in to / out of the booth, with traffic on the SLAC-ESnet, FermiLab-HOPI, SLAC-ESnet-USN, FNAL-UltraLight and UKLight links.]
SC|05 Seattle-SLAC 10 Gigabit Ethernet
- 2 lightpaths: routed over ESnet; Layer 2 over Ultra Science Net
- 6 Sun V20Z systems per λ
- dCache remote disk data access: 100 processes per node; each node sends or receives; one data stream is 20-30 Mbit/s
- Used Neterion NICs & Chelsio TOE
- Data also sent to StorCloud using Fibre Channel links
- Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk
10 Gigabit Ethernet: TCP Data Transfer on PCI-X
- Sun V20Z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
- XFrame II NIC, PCI-X mmrbc 4096 bytes, 66 MHz
- Two 9000-byte packets back-to-back: average rate 2.87 Gbit/s
- Bursts of packets, length 646.8 µs; gap between bursts 343 µs; 2 interrupts per burst
[Logic-analyser trace: CSR access and data transfer phases.]
10 Gigabit Ethernet: UDP Data Transfer on PCI-X
- Sun V20Z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
- XFrame II NIC, PCI-X mmrbc 2048 bytes, 66 MHz
- One 8000-byte packet: 2.8 µs for CSRs, 24.2 µs data transfer; effective rate 2.6 Gbit/s (check below)
- 2000-byte packets, wait 0 µs: ~200 ms pauses
- 8000-byte packets, wait 0 µs: ~15 ms between data blocks
[Logic-analyser trace: CSR access (2.8 µs) and data transfer phases.]
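The quoted effective rate is simply the payload over the data-transfer time (my arithmetic):

\[ \frac{8000\times8\ \mathrm{bit}}{24.2\ \mu\mathrm{s}} \approx 2.6\ \mathrm{Gbit/s} \]

counting the 2.8 µs of CSR accesses as well would give roughly 2.4 Gbit/s.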