TCP and ATLAS T/DAQ Dec 2002 R. Hughes-Jones Manchester TCP/IP and ATLAS T/DAQ With help from: Richard, HansPeter, Bob, & …

TCP/IP and ATLAS T/DAQ


Page 1: TCP/IP and ATLAS T/DAQ

TCP and ATLAS T/DAQ Dec 2002 R. Hughes-Jones Manchester

TCP/IP and ATLAS T/DAQ

With help from:

Richard, HansPeter, Bob, & …

Page 2: TCP/IP and ATLAS T/DAQ


Micro Introduction to TCP/IP (1)

TCP was designed for reliable, bit-wise correct data transfer over slow, unreliable Wide Area Networks
Stream orientated – user has to ensure they have ALL the message!
Uses sliding window to control the data flow:
– Transmit buffer size
– Available space in the receive buffer
– Congestion window, cwnd

[Diagram: TCP sliding window. Labels:]
– Data sent and ACKed
– Sent data, buffered waiting for ACK
– Unsent data, may be transmitted immediately
– Data to be sent, waiting for window to open
– Sending host advances marker as data is transmitted
– Received ACK advances trailing edge
– Receiver’s advertised window advances leading edge
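The window bookkeeping in the diagram can be sketched in a few lines of Python. This is only an illustration, not a real TCP stack: the class name `SenderWindow`, its field names and all the byte values are assumptions made up for the example. The usable window is taken as the smallest of the three limits listed on the slide.

```python
from dataclasses import dataclass


@dataclass
class SenderWindow:
    """Byte counters for one TCP-like sender (illustrative only)."""
    acked: int = 0          # trailing edge: data below this is sent and ACKed
    next_to_send: int = 0   # marker that advances as data is transmitted
    cwnd: int = 4000        # congestion window in bytes (assumed value)
    advertised: int = 8000  # receiver's advertised window in bytes (assumed)
    tx_buffer: int = 16000  # transmit buffer size in bytes (assumed)

    def window(self) -> int:
        # The usable window is bounded by all three limits.
        return min(self.cwnd, self.advertised, self.tx_buffer)

    def can_send(self) -> int:
        # Unsent bytes that may be transmitted immediately.
        in_flight = self.next_to_send - self.acked
        return max(0, self.window() - in_flight)

    def send(self, nbytes: int) -> int:
        # Transmit as much as the window allows and advance the marker.
        sent = min(nbytes, self.can_send())
        self.next_to_send += sent
        return sent

    def on_ack(self, ack_no: int, advertised: int) -> None:
        # A received ACK advances the trailing edge; the advertised
        # window carried with it moves the leading edge.
        self.acked = max(self.acked, min(ack_no, self.next_to_send))
        self.advertised = advertised
```

With the default values, `send(10000)` transmits only 4000 bytes because cwnd is the tightest limit; the rest waits for the window to open.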

Page 3: TCP/IP and ATLAS T/DAQ


Micro Introduction to TCP/IP (2)

TCP Phases

Slow start: cwnd initially 1, then increased by 1 MTU for each ACK received
– exponential growth
– Send 1st packet, get 1 ACK, increase cwnd to 2
– Send 2 packets, get 2 ACKs, increase cwnd to 4
– …

Congestion avoidance: cwnd increased by 1/cwnd (in MTU units) for each ACK, i.e. about 1 MTU per round trip – linear increase in rate

Slow start to congestion avoidance transition is determined by ssthresh

Fast Retransmit & Fast Recovery
SACKs

[Plot: cwnd (0 to 22) versus Round Trip Times (0 to 10)]
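The two growth phases can be reproduced with a small simulation. This is a simplified sketch: cwnd is counted in packets, every packet is assumed to be ACKed within one round trip, and the function name `cwnd_trace` and the ssthresh value used below are illustrative assumptions, not values taken from the plot.

```python
from typing import List


def cwnd_trace(rtts: int, ssthresh: float, cwnd: float = 1.0) -> List[float]:
    """Return cwnd (in packets) at the start of each round trip."""
    trace = [cwnd]
    for _ in range(rtts):
        acks = int(cwnd)  # one ACK for each packet sent this round trip
        for _ in range(acks):
            if cwnd < ssthresh:
                cwnd += 1.0          # slow start: +1 per ACK, doubles per RTT
            else:
                cwnd += 1.0 / cwnd   # congestion avoidance: ~+1 per RTT
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    for rtt, w in enumerate(cwnd_trace(10, ssthresh=16)):
        print(f"RTT {rtt:2d}: cwnd = {w:5.2f}")
```

With `ssthresh=16` the window doubles each round trip (1, 2, 4, 8, 16) and then grows by just under one packet per round trip.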

Page 4: TCP/IP and ATLAS T/DAQ


Micro Introduction to TCP/IP (3)

TCP takes packet loss as an indication of congestion!
Lost packets are detected by 2 methods:
– Timeout: the sender just doesn’t get an ACK back. [ Timeout = RTT + 3*σ(rtt) ]
– 3 duplicate ACKs received by the sender

Send / recv 1 2 3 4 5

ACK 1 2 2 2

Re-transmit 3

ACK 5
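The duplicate-ACK mechanism in the example above can be sketched as follows. The code follows the slide's numbering, where an ACK carries the number of the highest packet received in order (real TCP ACKs carry the next expected byte). The function names are hypothetical. Note that the five-packet example on the slide produces only two duplicates, so the usage below adds a sixth packet to reach the threshold of three.

```python
def receiver_acks(arrivals):
    """Cumulative ACK sent after each arriving packet."""
    received = set()
    highest_in_order = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        # Advance over every packet that is now contiguous.
        while highest_in_order + 1 in received:
            highest_in_order += 1
        acks.append(highest_in_order)
    return acks


def fast_retransmit_point(acks, dup_threshold=3):
    """Return (ack_index, packet_to_resend) at the dup_threshold-th
    duplicate ACK, or None if the threshold is never reached."""
    dups = 0
    last = None
    for i, ack in enumerate(acks):
        if ack == last:
            dups += 1
            if dups == dup_threshold:
                return i, ack + 1
        else:
            last, dups = ack, 0
    return None


if __name__ == "__main__":
    acks = receiver_acks([1, 2, 4, 5, 6])   # packet 3 is lost
    print(acks)                             # cumulative ACKs seen by sender
    print(fast_retransmit_point(acks))      # when and what to re-transmit
```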

Action on packet loss:

Timeout
– Enter slow start: set cwnd to 1

3 duplicate ACKs
– Set ssthresh to half cwnd, so enter the congestion avoidance phase (keep sending when duplicate ACKs arrive)
– Set cwnd to half its original value

Lose 1 packet at 1 Gbit/s between CERN and the US and it takes 27 min to recover!
There is a difference between what the protocol says and what the implementation gives you.
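The recovery time can be estimated on the back of an envelope: after one loss cwnd is halved, and congestion avoidance then adds about one packet per round trip. The sketch below assumes a 200 ms round trip time and 1500 byte packets; neither value is stated on the slide, and the function name is made up for the example. Under those assumptions the estimate comes out close to the quoted 27 min.

```python
def recovery_time_s(link_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Rough time to grow cwnd back to full rate after one loss."""
    # Window (in packets) needed to fill the link for one round trip.
    window_pkts = link_bps * rtt_s / (mss_bytes * 8)
    # cwnd restarts at half that and regains ~1 packet per round trip.
    rtts_needed = window_pkts / 2
    return rtts_needed * rtt_s


if __name__ == "__main__":
    t = recovery_time_s(link_bps=1e9, rtt_s=0.2)  # assumed RTT of 200 ms
    print(f"{t:.0f} s = {t / 60:.1f} min")
```

Because the estimate scales with the square of the round trip time, the same loss on a LAN with a sub-millisecond RTT recovers in well under a second.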

Page 5: TCP/IP and ATLAS T/DAQ


An ATLAS TDAQ Candidate Architecture. Message Flows:

L2PU request to ROBs.
SFI request to ROBs.
Super to L2PU. Low rate ~ 230 Hz / L2PU
Super to DFM. Grouped accept+reject from L2 ~ 2 kHz, 1 Super to 1 DFM
DFM to SFI. Low rate ~ 20 Hz / SFI
DFM to ROB. Mcast clears

Page 6: TCP/IP and ATLAS T/DAQ


What does TCP/IP mean for T/DAQ?

Properties of T/DAQ data transfers:
– Many logical links are involved
– Links remain alive for a long time – days!
– Mainly request-response: 1 packet request, generally 1-2 packet response
– No continuous high-rate flows, i.e. no streaming

TCP 3-way handshake is not an important time constraint

TCP slow start is not important
– Fragments small: within / close to slow-start capability

BW limitation due to congestion avoidance is not important
– Fragments small: halving of cwnd is not an issue

Packet loss recovery
– You can get it out of the box!

Page 7: TCP/IP and ATLAS T/DAQ


Event Building: messages SFI - ROB

Each SFI processes 15 events/s, so repeat every ~ 66 ms on average.

Doubled no. of EB frames out of the SFI & into the ROB; increased total frames by 3/2
– Extra ROB I/O is used
– ROB has to compute the ACK

Assume we lose an SFI request
– TCP won’t time out and re-try for ~ 35 ms – a long time cf the RTT

Assume we lose a ROB response
– SFI won’t get the ACK, so the SFI will time out and re-send the request.
– ROB won’t get its ACK, so TCP will think about timing out and re-try.
– Both ends re-try!!

Assume we lose the ACK from the SFI
– TCP in the ROB will resend before the next request.
– ROB resends data you don’t want!! You have it already.

[Diagram: message sequence across SFI Application, TCP, Network, TCP, ROB Application. Labels in order: Req. Event; Got Data; Send Data; Response, 1-2 kbyte, in ~ 100 us; Need to ACK; Req. Event; Need to ACK, piggyback ACK]
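The frame counts and the repeat period quoted on this slide can be checked with simple arithmetic. The sketch assumes that the response fits in one frame, that the ACK of the request rides on the response, and that the ACK of the response goes out as a separate frame. The function name and dictionary keys are illustrative.

```python
def frames_per_event(with_tcp_ack: bool) -> dict:
    """Frames exchanged between one SFI and one ROB for one request."""
    sfi_to_rob = 1  # the event request
    rob_to_sfi = 1  # the data response (carries the piggybacked ACK)
    if with_tcp_ack:
        sfi_to_rob += 1  # stand-alone ACK of the response
    return {
        "sfi_to_rob": sfi_to_rob,
        "rob_to_sfi": rob_to_sfi,
        "total": sfi_to_rob + rob_to_sfi,
    }


events_per_s = 15
period_ms = 1000 / events_per_s  # ~66.7 ms between requests on average

if __name__ == "__main__":
    plain = frames_per_event(with_tcp_ack=False)
    tcp = frames_per_event(with_tcp_ack=True)
    print("SFI -> ROB frames ratio:", tcp["sfi_to_rob"] / plain["sfi_to_rob"])
    print("total frames ratio:", tcp["total"] / plain["total"])
    print(f"request period: {period_ms:.1f} ms")
```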

Page 8: TCP/IP and ATLAS T/DAQ


Event Building: messages DFM – ROB - SFI

Each DFM/SFI processes 15 events/s, so repeat every ~ 66 ms on average.

Doubled no. of EB frames on the network

ROB does extra work
– Even more extra ROB I/O is used
– ROB has to compute the ACK to send
– ROB has to compute the ACK received

Assume we lose any ACK
– TCP resends data you don’t want!! You have it already.

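[Diagram: message sequence across DFM Application, TCP, Network, TCP, ROB Application, with the SFI Application receiving the data. Labels in order: Req. Event be sent; SFI Application gets Data; Send Data; Response, 1-2 kbyte; ACK; Req. Event; ACK]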

Page 9: TCP/IP and ATLAS T/DAQ


Level2: messages L2PU – ROB (1)

Individual L2PU - ROB req. rates:
– Hi Lumi. Calo: 1 ROB req. rate 6 kHz; L2PU requests 1 every ~ 62 ms
– Many other rates

L2PU accesses 20-40 ROBs per event
– In most cases the ACK from the ROB will piggyback on the response.
– In many cases TCP will generate an ACK from L2PU to ROB.
– Like SFI-ROB: doubled no. of EB frames out of the SFI & into the ROB
– Extra ROB I/O is used
– Extra ROB CPU to compute the ACK

Assume we lose an L2PU request. Just an example!

TCP will re-try:
– After the ~35 ms timeout / after the next 2 requests to the same ROB
– Not what you want!
– TCP re-try gives a long delay cf the 10 ms processing time of the L2PU
– Blocks all comms between that L2PU and the ROB until the lost packet is received
– Other worker threads may stall

[Diagram: message sequence across L2PU Application, TCP, Network, TCP, ROB Application. Labels in order: Req. Event; Got Data; Send Data; Response, 1-2 kbyte, in ~ 100 us; Need to ACK; Req. Event; Need to ACK, piggyback ACK]
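The blocking effect of a lost request can be illustrated with a toy model of in-order delivery on one connection. Only the timeout path is modelled (not the re-try triggered by later requests), and the send times, the zero network delay used in the usage line and the function name are assumptions chosen for the example. The ~35 ms timeout is the figure from the slide.

```python
def delivery_times_ms(send_times_ms, lost_index, rto_ms=35.0, one_way_ms=0.05):
    """Times at which the ROB application sees each request when the
    request at lost_index is lost once (None means no loss)."""
    arrivals = []
    for i, t in enumerate(send_times_ms):
        if i == lost_index:
            t = t + rto_ms  # only the re-transmission gets through
        arrivals.append(t + one_way_ms)

    delivered = []
    latest = 0.0
    for a in arrivals:
        # TCP hands data to the application strictly in order, so a
        # request cannot be delivered before the ones sent ahead of it.
        latest = max(latest, a)
        delivered.append(latest)
    return delivered


if __name__ == "__main__":
    # Three requests on the same connection, 5 ms apart; the first is lost.
    print(delivery_times_ms([0.0, 5.0, 10.0], lost_index=0, one_way_ms=0.0))
```

In this model the two later requests arrive on time but are held back until the re-transmitted first request is delivered, which is how other worker threads sharing the connection can stall.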

Page 10: TCP/IP and ATLAS T/DAQ


LAN Tests – Req. every 66ms (used 2.4.19-SMP)

64 byte Req.

1400 byte Response

ACK of Req.: Piggyback

ACK of Response: Extra packet

Page 11: TCP/IP and ATLAS T/DAQ


Well, what do we conclude?

TCP/IP is easy to maintain (but so is UDP/IP)
What you get is what is in the box!
There is a difference between what the protocol says and what the implementation gives you.

TCP/IP is probably useful for
– Super to L2PU
– Super to DFM
– DFM to SFI

TCP/IP (or the implementation) does things behind your back.
TCP ACKs will generate extra traffic on already loaded links.

TCP does packet loss recovery – good
– But sometimes when it has done it you no longer want the data!

TCP does timeouts just like the applications do now for UDP/IP or Raw, but much more crudely!

TCP is doing what you are doing anyway, but T/DAQ loses fine control of the network transfers and thread operation/timing.

TCP probably will do the job for all cases. But you can also wear a fur coat at sea level on the equator!!
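For comparison, an application-level timeout over UDP, of the kind the conclusions refer to, might look roughly like the sketch below. It is not the T/DAQ code: the function name, the 5 ms timeout and the retry count are assumptions, and a real implementation would also match each response to its request, for example with an event identifier.

```python
import socket


def request_with_retry(sock, addr, payload: bytes,
                       timeout_s: float = 0.005, max_tries: int = 3) -> bytes:
    """Send a request datagram and wait for one response, re-sending
    the request if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    for _ in range(max_tries):
        sock.sendto(payload, addr)
        try:
            data, _sender = sock.recvfrom(65535)
            return data
        except socket.timeout:
            continue  # the application decides when and whether to re-try
    raise TimeoutError(f"no response from {addr} after {max_tries} tries")
```

It would be called with a datagram socket created as `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`. The point of the sketch is that the timeout and the retry policy stay in the application's hands, so they can be tuned to the processing time rather than to TCP's ~35 ms.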