TCP CONGESTION SIGNATURES - SIGCOMMTCP’s RTT Congestion Signatures • Flows experiencing...

w w w .caida.org

TCP CONGESTION SIGNATURES

Srikanth Sundaresan (Princeton Univ.)Amogh Dhamdhere (CAIDA/UCSD)

kc Claffy (CAIDA/UCSD)Mark Allman (ICSI)

w w w .caida.or

Typical Speed Tests Don’t Tell Us Much

w w w .caida.or

• Upload and download throughput measurements: no information beyond that

w w w .caida.or

What type of congestion did the TCP flow experience?

w w w .caida.or

Two Potential Sources of Congestion in the End-to-end Path

w w w .caida.or

• Self-induced congestion

- Clear path, the flow itself induced congestion

- eg: last-mile access link

w w w .caida.or

• External congestion

- Flow starts on an already congested path

- eg: congested interconnect

w w w .caida.or

• External congestion

- Flow starts on an already congested path

- eg: congested interconnect

Distinguishing the two cases has implications for users / ISPs / regulators

w w w .caida.or

How can we distinguish the two?

• Cannot distinguish using just throughput numbers

- Access plan rates vary widely, and are typically not available to content / speed test providers

- eg: Speed test reports 5 Mbps – is that the access link rate (DSL), or a congested path?

w w w .caida.or

How can we distinguish the two?

• Cannot distinguish using just throughput numbers

- Access plan rates vary widely, and are typically not available to content / speed test providers

- eg: Speed test reports 5 Mbps – is that the access link rate (DSL), or a congested path?

We can use the dynamics of TCP’s startup phase, i.e., Congestion Signatures

w w w .caida.or

TCP’s RTT Congestion Signatures

w w w .caida.or

• Flows experiencing self-induced congestion fill up an empty buffer during slow start

- Hence increase the TCP flow RTT

w w w .caida.or

• Externally congested flows encounter an already full buffer

- Less potential for RTT increases

w w w .caida.or

• Self-induced congestion therefore has higher RTT variance compared to external congestion

w w w .caida.or

• Self-induced congestion therefore has higher RTT variance compared to external congestion

We can quantify this using Max-Min and CoV of RTT

w w w .caida.or

Example Controlled Experiment• 20 Mbps “access” link

with 100 ms buffer

• 1 Gbps “interconnect” link with 50 ms buffer

• Self-induced congestion flows have higher values for both metrics and are clearly distinguishable

Max-Min RTT

CoV RTT

101 1020.0

ExternalSelf

10−2 10−1 1000.0

ExternalSelf

w w w .caida.or

Example Controlled Experiment• 20 Mbps “access” link

with 100 ms buffer

• 1 Gbps “interconnect” link with 50 ms buffer

• Self-induced congestion flows have higher values for both metrics and are clearly distinguishable

The two types of congestion exhibit widely contrasting behaviors

Max-Min RTT

CoV RTT

101 1020.0

ExternalSelf

10−2 10−1 1000.0

ExternalSelf

w w w .caida.or

• Max-min and CoV of RTT derived from RTT samples during slow start

• We feed the two metrics into a simple Decision Tree

- We control the depth of the tree to a low value to minimize complexity

• We build the decision tree classifier using controlled experiments and apply it to real-world data

w w w .caida.or

Validating the Method: Step 1- Controlled Experiments

Internet

Server 1Server 2

Server 3

Server 4

100 Mbps

Shaped “access”

1 Gbps

w w w .caida.or

Internet

Server 1Server 2

Server 3

Server 4

100 Mbps

Shaped “access”

1 Gbps

Background cross-traffic

Interconnectcross-traffic

w w w .caida.or

Internet

Server 1Server 2

Server 3

Server 4

100 Mbps

Shaped “access”

1 Gbps

Throughput tests

w w w .caida.or

It’s Real

w w w .caida.or

It’s Real

FantasticCablingeffort

Post-itdefined

networking

w w w .caida.or

• Emulated access link + “core” link

- Wide range of access link throughputs, buffer sizes, loss rates, cross-traffic (background and congestion-inducing)

- Can accurately label flows in training data as “self ” or “externally” congested

Internet

Server 1Server 2

Server 3

Server 4

100 Mbps

Shaped “access”

1 Gbps

Throughput tests

w w w .caida.or

Internet

Server 1Server 2

Server 3

Server 4

100 Mbps

Shaped “access”

1 Gbps

Throughput tests

High accuracy: precision and recall > 80%robust to model settings

w w w .caida.or

Validating the Method: Step 2

• From Ark VP in ISP A identified congested link with ISP B using TSLP*

12*Luckie et al. “Challenges in Inferring Internet Interdomain Congestion”, IMC 2014

Ark VP

w w w .caida.or

• From Ark VP in ISP A identified congested link with ISP B using TSLP*

12*Luckie et al. “Challenges in Inferring Internet Interdomain Congestion”, IMC 2014

congested link

Ark VP

w w w .caida.or

M-lab NDT server

congested link

Ark VP

• Periodic NDT tests from Ark VP to M-Lab NDT server “behind” the congested interdomain link

w w w .caida.or

Validation of the Method: Step 2

10 15 20 25 30

02/18 02/25 03/04 03/11

10 20 30 40 50 60 70

02/18 02/25 03/04 03/11TSLP

(far s

Strong correlation between throughput and TSLP latency: flows during elevated TSLP latency

labeled as “externally” congested

w w w .caida.or

10 15 20 25 30

02/18 02/25 03/04 03/11

10 20 30 40 50 60 70

02/18 02/25 03/04 03/11TSLP

(far s

“Externally”congested

w w w .caida.or

10 15 20 25 30

02/18 02/25 03/04 03/11

10 20 30 40 50 60 70

02/18 02/25 03/04 03/11TSLP

(far s

“Externally”congested

“self”congested

w w w .caida.or

10 15 20 25 30

02/18 02/25 03/04 03/11

10 20 30 40 50 60 70

02/18 02/25 03/04 03/11TSLP

(far s

75%+ accuracy in detecting external congestion, 100% accuracy for self-induced congestion

w w w .caida.or

• We use Measurement Lab’s NDT test data for real-world validation

• Cogent interconnect issue in late 2013/early 2014

- NDT tests to Cogent servers saw significant drops in throughput during peak hours

- Several major U.S. ISPs were affected, except Cox

- The problem was identified as congested interconnects

w w w .caida.or

Using the M-lab Data

January 2014

April 2014

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

w w w .caida.or

January 2014

Drop in peak-hour throughput for for Comcast, TWC, Verizon

April 2014

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

w w w .caida.or

January 2014

Cox not affected

April 2014

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

w w w .caida.or

January 2014

Cox not affected

April 2014

Interconnection dispute resolved; no diurnal effect

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

w w w .caida.or

Peak hour tests inJan/Feb 2014 are likely “externally” congested

Off-peak tests in Mar/Apr 2014 are likely

“self” congested

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

0 5 10 15 20

Hour of day (local)

ComcastCox

TimeWarnerVerizon

w w w .caida.or

But didn’t you just say it’s hard to infer congestion using throughput tests??

w w w .caida.or

• Yes :)

w w w .caida.or

• Yes :)

• For that reason, our labeling is broad and coarse. All tests labeled “external” may not be traversing congested interconnects

w w w .caida.or

• Yes :)

• We do not expect the technique to identify all peak hour tests as externally congested, and vice versa

- Looking for qualitative differences

w w w .caida.or

• Yes :)

• We do not expect the technique to identify all peak hour tests as externally congested, and vice versa

- Looking for qualitative differences

• The general observations about congestion were verified by other sources, e.g., CAIDA’s TSLP measurements

w w w .caida.or

Applying the Model to M-lab data

Comcast

TimeWarnerVerizon Cox

Comcast

Cogent (LAX) Cogent (LGA) Level3 (ATL)

Jan-Feb Mar-Apr

w w w .caida.or

Comcast

Jan-Feb Mar-Apr

Much lower incidences of self-induced congestion for Cogent in Jan/Feb 2014 as

compared to Mar/Apr

w w w .caida.or

Comcast

Jan-Feb Mar-Apr

Level3 does not show significant differences, was not affected by

interconnection disputes

w w w .caida.or

Comcast

Jan-Feb Mar-Apr

Cox does not show significant differences, was not affected by interconnection disputes

w w w .caida.or

Looking at Throughput

• What throughput should we observe for “self ” and “external” congested flows?

• With congested interconnects affecting many flows, both “self ” and “external” should see similar throughput

• Without congested interconnects affecting many flows, “self ” congested throughput should follow access link speeds, generally higher than “externally” congested

w w w .caida.or

• Avg. throughput of self-induced congestion flows significantly higher than externally congested in Mar-Apr (no interconnection disputes)

Comcast TimeWarner Verizon Cox0

Jan-Feb SelfJan-Feb External

Mar-Apr SelfMar-Apr External

w w w .caida.or

Both “self”and

“external”get similarthroughput

w w w .caida.or

“Self” gethigher

throughputthan

“external”

w w w .caida.or

Takeaways

• It is possible to distinguish two kinds of congestion: self-induced vs. externally congested

• The difference is important to identify the solution

- Upgrade service plan? Or talk to ISP?

- Also for regulatory purposes

• Simple, accurate technique using RTT during TCP slow start dynamics

- Can be easily computed using packet captures or other tools such as Web100 (future work)

w w w .caida.or

Limitations

• Relies on buffering effect

- May not work on TCP variants that minimize buffer occupancy, e.g., BBR

• Only uses slow start dynamics

- Might be confounded by flows that perform one way during slow start but differently afterward

• Real-world validation relies on coarsely labeled data

- It would be great to validate on more real-world data!

w w w .caida.or

Thanks!Questions?

TCP CONGESTION SIGNATURES - SIGCOMMTCP’s RTT Congestion Signatures • Flows experiencing...

Documents

Tcp congestion control (1)

Chapter 6 TCP Congestion Control

TCP Congestion Control

EE 122 :TCP Congestion Control

Improving TCP Congestion Control with Machine Intelligence › ... › slides › netai › (pm0435)TCP-n… · Other TCP congestion control schemes • Mechanism-driven instead of

TCP Congestion Control - Simon Fraser Universityljilja/ENSC835/Spring08/Projects/najiminaini/... · 2009-02-06 · Various TCP Congestion Control Algorithms TCP Tahoe (1988, FreeBSD

Internet congestion control: TCP

TCP TCP Congestion Control & Quality of Service

Tcp congestion avoidance algorithm identification

Congestion Control in TCPpages.cs.wisc.edu/~suman/courses/640/s18/tcp-congcontrol.pdf · TCP Congestion Control • The idea of TCP congestion control is for each source to determine

TCP Congestion Control - KAIST

Computer Networks : TCP Congestion Control1 TCP Congestion Control

TCP: Congestion Control (part II)

Sliding window+congestion controlcongestion_control.pdf · Summary: TCP Congestion Control (Reno)TCP Congestion Control (Reno) When CongWin is below Threshold, sender in slow-start

Different TCP Flavors CSCI 780, Fall 2005. TCP Congestion Control Slow-start Congestion Avoidance Congestion Recovery Tahoe, Reno, New-Reno SACK

Recall from last time TCP congestion controlcs.wellesley.edu/~cs242/lectures/12_tcp_congestion_handouts.pdf · TCP congestion control 12-7 TCP congestion control 12-8 Slow start o

TCP and congestion control

TCP Congestion Control

Discussion 6: TCP, Congestion Control

TCP TCP flow control, congestion avoidance · TCP flow control, congestion avoidance 2 Übersicht • Einleitung • Sliding Window • Delayed Acknowledgements • Nagle Algorithm