
FABRIC Meeting, Poznan, Poland, 25 Sep 2006

Broadband Protocols

WP 1.2.1

IP protocols, Lambda switching, multicasting

Richard Hughes-Jones, The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”


Protocols Document


Protocols Document 1

“Protocol Investigation for eVLBI Data Transfer”, Document JRA-WP1.2.1.001
Jodrell & Manchester folks, with hard work from Matt
Completed and on the EXPReS WIKI

Introduces e-VLBI and its networking requirements:
Continuously streamed data
Individual packets are not particularly valuable; maintenance of the data rate is important
Quite different to applications where bit-wise correct transmission is required, e.g. file transfer
Forms a valuable use case for GGF GHPN-RG

Presents the actions required in order to make an informed decision and to implement suitable protocols in the European VLBI Network. Strategy document.


Protocols Document 2

Protocols considered for investigation include:
TCP/IP
UDP/IP
DCCP/IP
VSI-E (RTP/UDP/IP)
Remote Direct Memory Access (RDMA)
TCP Offload Engines (TOE)

Very useful discussions at the Haystack VLBI meeting:
Agreement to make joint Haystack-Jodrell tests
Use of the ESLEA 1 Gbit transatlantic link

Work in progress – links to ESLEA (UK e-Science):
vlbi_udp – Simon: UDP/IP stability & the effect of packet loss on correlations
tcpdelay – Stephen: TCP/IP and CBR data


tcpdelay


tcpdelay: VLBI Application Protocol

Want to examine how TCP moves Constant Bit Rate (CBR) data.
tcpdelay is a test program:
An instrumented TCP program that emulates sending CBR data
Records the relative 1-way delay of each message
Records TCP stack activity with web100
A sketch of the sending loop follows the diagram below.

[Diagram: tcpdelay sends messages of n bytes, repeated for a set number of packets, with a fixed wait time between sends, plotted against time]
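A minimal sketch of such a CBR sending loop, assuming fixed-size messages carrying an illustrative 12-byte header (sequence number plus send timestamp); the real tcpdelay framing and its web100 instrumentation are not shown:

```python
import socket, struct, time

MSG_SIZE = 1448      # bytes per message (as in the UKLight tests below)
WAIT_S   = 22e-6     # spacing between sends: 1448 B / 22 us ~ 525 Mbit/s
N_MSGS   = 100_000

def send_cbr(host: str, port: int) -> None:
    """Send fixed-size messages over TCP at a constant bit rate,
    stamping each with a sequence number and send time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    payload = bytearray(MSG_SIZE)
    next_send = time.perf_counter()
    for seq in range(N_MSGS):
        # illustrative framing: 4-byte sequence number + 8-byte send timestamp
        struct.pack_into("!Id", payload, 0, seq, time.time())
        sock.sendall(payload)
        next_send += WAIT_S
        while time.perf_counter() < next_send:   # busy-wait to hold the CBR pacing
            pass
    sock.close()

if __name__ == "__main__":
    send_cbr("192.0.2.10", 5001)   # placeholder receiver address
```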


VLBI Application Protocol

[Diagram: data blocks Data1-Data4 pass from the sender through TCP & the network to the receiver, with timestamps 1-5 recorded along the way; a packet loss delays the later blocks]

VLBI data is produced at Constant Bit Rate


Visualising the Results

When packet loss is detected, TCP reduces Cwnd and halves the sending rate

Expect a delay in the message arrival time

[Sketch: arrival time vs message number; after a packet loss the arrival times fall behind the expected arrival time at the constant bit rate, leaving a delay in the stream]

Stephen Kershaw
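A rough sketch of the receive-side bookkeeping behind such a plot, assuming the 1448-byte messages and 22 µs spacing of the tests on the next slide and the illustrative header from the sender sketch above:

```python
import socket, struct, time

MSG_SIZE = 1448
WAIT_S   = 22e-6

def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP connection (messages can span segments)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("sender closed the connection")
        buf += chunk
    return buf

def arrival_offsets(port: int = 5001):
    """Return (sequence number, arrival time - expected CBR arrival time) pairs;
    positive offsets mean the stream is falling behind the constant bit rate."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", port))
    srv.listen(1)
    conn, _ = srv.accept()
    t0, offsets = None, []
    while True:
        try:
            msg = recv_exact(conn, MSG_SIZE)
        except ConnectionError:
            break
        now = time.perf_counter()
        seq, _sent = struct.unpack_from("!Id", msg, 0)
        if t0 is None:
            t0 = now
        offsets.append((seq, now - (t0 + seq * WAIT_S)))
    return offsets
```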


Arrival Times: UKLight JB-JIVE-Manc

Message size: 1448 bytes
Wait time: 22 µs
Data rate: 525 Mbit/s
Route: JB-UKLight-JIVE-UKLight-Man, RTT ~27 ms
TCP buffer: 32 Mbytes
BDP @ 512 Mbit/s: 1.8 Mbyte
Estimate: catch-up possible if loss < 1 in 1.24M
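The quoted bandwidth-delay product follows directly from the line rate and round-trip time; a quick check using the figures on this slide:

```python
rate_bps = 512e6      # 512 Mbit/s CBR stream
rtt_s    = 0.027      # ~27 ms JB-JIVE-Manchester round trip
bdp = rate_bps * rtt_s / 8
print(f"BDP = {bdp / 1e6:.2f} Mbyte")   # ~1.73 Mbyte, i.e. the ~1.8 Mbyte above
# The 32 Mbyte TCP buffer is many times the BDP,
# which is what makes catching up after a loss possible at all.
```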

[Plot: effect of loss rate on message arrival time; time (s) vs message number (x 10^4), for no loss and for drops of 1 in 40k, 20k, 10k and 5k packets]


TCP Web100: JB-Manc – Large buffer

Message size: 1448 bytes
Wait time: 22 µs
Data rate: 525 Mbit/s
Route: JB-UKLight-JIVE-UKLight-Man, RTT ~27 ms
Standard TCP, TCP buffer 930 kbytes
Drop 1 in 40,000 packets
Classic Cwnd behaviour, limited by ssthresh!
TCP requires much care!!
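To illustrate the "classic Cwnd behaviour" in the Web100 trace, here is a toy Reno-style simulation, with parameters taken from this slide but the dynamics heavily simplified (no slow start or timeouts): each loss halves the window (setting ssthresh), and cwnd then grows by one segment per round trip, capped by the socket buffer.

```python
MSS        = 1448        # bytes per segment
BUF_BYTES  = 930_000     # socket buffer caps the window
LOSS_EVERY = 40_000      # one dropped packet per 40,000

def cwnd_trace(n_packets: int = 400_000):
    """Toy AIMD trace: halve cwnd (to ssthresh) on each periodic loss,
    then add one MSS per RTT, never exceeding the socket buffer."""
    cwnd, trace, sent = 2 * MSS, [], 0
    while sent < n_packets:
        for _ in range(max(1, cwnd // MSS)):     # packets sent this RTT
            sent += 1
            if sent % LOSS_EVERY == 0:
                ssthresh = max(2 * MSS, cwnd // 2)
                cwnd = ssthresh                  # multiplicative decrease
        cwnd = min(cwnd + MSS, BUF_BYTES)        # additive increase, buffer-capped
        trace.append((sent, cwnd))
    return trace

if __name__ == "__main__":
    for sent, cwnd in cwnd_trace()[::100]:
        print(sent, cwnd)
```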

[Web100 plots against time (ms): DataBytesOut (Delta), DataBytesIn (Delta) and CurCwnd (Value); number of duplicate ACKs; packet re-transmits]


iBOB


Prototype iBOB with two sampler boards attached

FPGA-based signal processing board from UC Berkeley


Bryan Anderson

[iBOB block diagram: station board, iBOB with RAM, VSI input, 10GE CX4 output via a CX4-to-fibre media converter, to a disk-based system, VSI or headstack]

10 Gigabit Ethernet now available; a UDP/IP module exists
Use for demonstration of FPGA-driven IP networking
Link to a PC NIC for diagnostics
Test over GÉANT, Onsala - Jodrell
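For the PC-NIC diagnostics, a minimal receive-side check could look like the sketch below; it assumes, purely for illustration, that the FPGA places a 64-bit sequence number at the start of each UDP datagram (the actual iBOB UDP module format may differ):

```python
import socket, struct, time

def udp_diagnose(port: int = 10000, bufsize: int = 9000,
                 report_every: int = 100_000) -> None:
    """Count datagrams, detect gaps in the leading sequence number
    (missing packets) and report the achieved receive rate."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024 * 1024)
    sock.bind(("", port))
    expected = None
    received = lost = bytes_in = 0
    t0 = time.perf_counter()
    while True:
        data, _ = sock.recvfrom(bufsize)
        (seq,) = struct.unpack_from("!Q", data, 0)
        if expected is not None and seq > expected:
            lost += seq - expected          # gap in the sequence numbers
        expected = seq + 1
        received += 1
        bytes_in += len(data)
        if received % report_every == 0:
            gbps = bytes_in * 8 / (time.perf_counter() - t0) / 1e9
            print(f"rx {received} pkts, lost {lost}, {gbps:.2f} Gbit/s")

if __name__ == "__main__":
    udp_diagnose()
```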


What is inside GÉANT2? Why is the collaboration interesting?
10 Gigabit Ethernet: UDP memory-to-memory flows; TCP flows with allocated bandwidth
Options using the GÉANT Development Network (10 Gbit SDH network)
Options using the GÉANT Lightpath Service; PoP locations for network tests

Multi-Gigabit Trials on GEANT

Collaboration with Dante.


GÉANT2 Topology


GÉANT2: The Convergence Solution

[Diagram: GÉANT2 PoPs A and B linked by dark fibre and managed lambdas (1626 LM), with 1678 MCC nodes providing L2 and TDM switching matrices; existing IP routers give NREN access, and EXPReS PCs with 10 GE NICs attach at each PoP]


From PoS to Ethernet

• More Economical Architecture
• Highest Overall Network Availability
• Flexibility (VLAN management)
• Highest Network Performance (Latency)

[Diagram: 1678 MCC transport node with L2 and TDM matrices; router IP links carried as 1/10 Gigabit Ethernet VLANs over VC-4-nv channels]


What do we want to do?

Set up a 4 Gigabit Lightpath between GÉANT PoPs
Collaboration with Dante: PCs in their PoPs with 10 Gigabit NICs

VLBI Tests: UDP Performance

Throughput, jitter, packet loss, 1-way delay, stability
Continuous (days) data flows – VLBI_UDP and multi-Gigabit TCP performance with current kernels
Experience for FPGA Ethernet packet systems

Dante interests: multi-Gigabit TCP performance
The effect of (Alcatel) buffer size on bursty TCP when using bandwidth-limited Lightpaths

Need A Collaboration Agreement


Options Using the GÉANT Development Network

10 Gigabit SDH backbone, Alcatel 1678 MCC
Node locations: London, Amsterdam, Paris, Prague, Frankfurt
Can do traffic routing, so long-RTT paths can be made
Available Dec/Jan 07
Less pressure for long-term tests


Options Using the GÉANT Lightpaths
Set up a 4 Gigabit Lightpath between GÉANT PoPs
Collaboration with Dante: PCs in Dante PoPs
10 Gigabit SDH backbone, Alcatel 1678 MCC
Node locations: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
Can do traffic routing, so long-RTT paths can be made
Ideal: London - Copenhagen


4 Gigabit GÉANT LightPath

Example of a 4 Gigabit Lightpath between GÉANT PoPs
PCs in Dante PoPs
26 × VC-4s: 4180 Mbit/s


PCs and Current Tests


Test PCs Have Arrived

Boston/Supermicro X7DBE
Two Dual-Core Intel Xeon Woodcrest 5130 CPUs at 2 GHz
Independent 1.33 GHz front-side buses
530 MHz FD Memory (serial)
Chipsets: Intel 5000P MCH (PCIe & memory); ESB2 (PCI-X, GE etc.)
PCI: 3 × 8-lane PCIe buses; 3 × 133 MHz PCI-X
2 × Gigabit Ethernet; SATA


Lab Tests: 10 Gigabit Ethernet
10 Gigabit test lab being set up in Manchester
Cisco 7600; cross-campus λ, <1 ms
Server-quality PCs
Neterion NICs; Myricom & Chelsio NICs being purchased
B2B performance so far (SuperMicro X6DHE-G2): kernel (2.6.13) & driver dependent!
One iperf TCP data stream: 4 Gbit/s
Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s
UDP disappointing
Propose to install Fedora Core 5 (kernel 2.6.17) on the new Intel dual-core PCs


Any Questions?


Backup Slides


Bandwidth on Demand: Our Long-Term Vision

[Diagram: applications (e.g. GRID) and the research activity issue bandwidth requests to policy middleware and a Network Resource Manager, which drives the 1678 MCC transport nodes via UNI-C commands and GMPLS to set up Ethernet paths]


10 Gigabit Ethernet: UDP Throughput

1500-byte MTU gives ~2 Gbit/s; used 16144-byte MTU, max user length 16080 bytes
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire-rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s
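These throughput curves come from sweeping the spacing between frames for a range of packet sizes (plot summarised below). A stripped-down sketch of such a paced UDP sender, with illustrative parameter names; it reports user-data rate, whereas the wire rate in the plot additionally counts Ethernet, IP and UDP overheads:

```python
import socket, time

def udp_sweep(host: str, port: int,
              sizes=(1472, 8000, 16080),          # user payload sizes in bytes
              waits_us=range(0, 41, 5),           # spacing between frames
              n_frames: int = 10_000) -> None:
    """Send n_frames UDP datagrams of each size with a fixed wait between
    sends and print the achieved user-data rate for every (size, wait) point."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect((host, port))
    for size in sizes:
        payload = b"\x00" * size
        for wait_us in waits_us:
            wait_s = wait_us * 1e-6
            t0 = time.perf_counter()
            next_send = t0
            for _ in range(n_frames):
                sock.send(payload)
                next_send += wait_s
                while time.perf_counter() < next_send:   # pace the frames
                    pass
            elapsed = time.perf_counter() - t0
            gbps = n_frames * size * 8 / elapsed / 1e9
            print(f"{size} bytes, wait {wait_us} us: {gbps:.2f} Gbit/s")
```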

[Plot 'an-al 10GE Xsum 512kbuf MTU16114 27Oct03': received wire rate (Mbit/s) vs spacing between frames (µs), for packet sizes from 1472 to 16080 bytes]


10 Gigabit Ethernet: Tuning PCI-X

16080-byte packets every 200 µs
Intel PRO/10GbE LR Adapter
PCI-X bus occupancy vs mmrbc
Measured times, and times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s, measured 5.7 Gbit/s
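A rough model of why mmrbc matters: each packet crosses the PCI-X bus as a series of memory-read bursts of at most mmrbc bytes, and every burst carries a fixed arbitration and addressing overhead. The figures below are assumed for illustration (they are not the logic-analyser measurements), but they reproduce the shape of the effect:

```python
# Illustrative PCI-X model: transfer time vs max memory read byte count (mmrbc)
BUS_BYTES_PER_US  = 1064    # 133 MHz x 64-bit PCI-X peak ~ 1064 Mbyte/s
BURST_OVERHEAD_US = 0.6     # assumed per-burst arbitration/address overhead

def transfer_time_us(packet_bytes: int, mmrbc: int) -> float:
    """Time to move one packet across PCI-X, split into mmrbc-sized bursts."""
    bursts = -(-packet_bytes // mmrbc)     # ceiling division
    return bursts * BURST_OVERHEAD_US + packet_bytes / BUS_BYTES_PER_US

for mmrbc in (512, 1024, 2048, 4096):
    t = transfer_time_us(16080, mmrbc)
    gbps = 16080 * 8 / t / 1000            # back-to-back packet rate on the bus
    print(f"mmrbc {mmrbc:4d}: {t:5.1f} us/packet, ~{gbps:.1f} Gbit/s")
```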

[Logic-analyser traces of the PCI-X sequence (CSR access, data transfer, interrupt & CSR update) for mmrbc of 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096); plots of PCI-X transfer time (µs), measured rate and rate from expected time against max memory read byte count, for the HP Itanium (kernel 2.6.1#17, Feb 04) and the DataTAG Xeon 2.2 GHz systems]


Bandwidth Challenge wins hat trick
The maximum aggregate bandwidth was >151 Gbit/s
130 DVD movies in a minute; serve 10,000 MPEG2 HDTV movies in real time
22 × 10 Gigabit Ethernet waves; Caltech & SLAC/FERMI booths
In 2 hours transferred 95.37 TByte; in 24 hours moved ~475 TBytes
Showed real-time particle event analysis
SLAC/Fermi/UK booth:
1 × 10 Gbit Ethernet to UK over NLR & UKLight: transatlantic HEP disk-to-disk, VLBI streaming
2 × 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
4 × 10 Gbit links to Fermi: dCache data transfers

[Plot: traffic in to and out of the booth on the SLAC-ESnet, FermiLab-HOPI, SLAC-ESnet-USN, FNAL-UltraLight and UKLight links; the SC2004 figure of 101 Gbit/s is marked]


SC|05 Seattle-SLAC 10 Gigabit Ethernet
2 Lightpaths: routed over ESnet; Layer 2 over Ultra Science Net
6 Sun V20Z systems per λ
dCache remote disk data access: 100 processes per node; each node sends or receives; one data stream is 20-30 Mbit/s
Used Neterion NICs & Chelsio TOE
Data also sent to StorCloud using Fibre Channel links
Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk


10 Gigabit Ethernet: TCP Data transfer on PCI-X

Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
Connected via a 6509; XFrame II NIC, PCI-X mmrbc 4096 bytes, 66 MHz
Two 9000-byte packets back-to-back: average rate 2.87 Gbit/s
Burst of packets, length 646.8 µs
Gap between bursts 343 µs; 2 interrupts per burst
[Logic-analyser trace: CSR access and data transfer phases on the PCI-X bus]


10 Gigabit Ethernet: UDP Data Transfer on PCI-X
Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
Connected via a 6509; XFrame II NIC, PCI-X mmrbc 2048 bytes, 66 MHz
One 8000-byte packet: 2.8 µs for CSRs, 24.2 µs data transfer, effective rate 2.6 Gbit/s
2000-byte packets, wait 0 µs: ~200 ms pauses
8000-byte packets, wait 0 µs: ~15 ms between data blocks
[Logic-analyser trace: CSR access (2.8 µs) and data transfer phases on the PCI-X bus]
