
Listen and Whisper: How to verify BGP route updates?

Lakshmi

Joint work with:

Volker Roth, Ion Stoica, Scott Shenker, Randy Katz

A short BGP primer

• The Internet is composed of 14,000 autonomous systems (ASes)

• ASes exchange route advertisements using BGP.

• Features of BGP
– Path vector protocol
– Uses local preference and hop count as the distance metric
– Supports policy routing

Route Verification Problem

• BGP assumes that the routes advertised by neighboring nodes are correct

• What if this assumption is violated?
– An AS propagates spurious routes to a neighbor!

• Potential causes
– Accidental router misconfigurations
– Malicious behavior

• What are the effects?
– Drop packets and render a destination unreachable
– Eavesdrop on the traffic to a given destination
– Impersonate the destination

Why bother?

• Router misconfigurations are a common occurrence [Mahajan02]
– Two major outages, in April 1997 and April 2001

• Router break-ins also occur regularly [Rob Thomas]
– Many routers have open telnet interfaces

• “Evil” effects of a compromised node
– Impersonation of an online banking system
– Blackhole attack on root DNS servers

Causes and Effects

Effect Accidental Malicious

Blackhole

Eavesdrop

Impersonate

Implication: Accidental problems can potentially be detected in the data plane

Goals and Assumptions

• Goal: Verify the correctness of BGP route updates
– Minimize the harmful effects of spurious updates
– Incrementally deployable, lightweight
– Minimal modifications to BGP

• Assumptions
– No PKI or other key-distribution infrastructure
– Shared keys allowed across peering links
– No dependence on a central authority (like ICANN)

“Listen”: Addressing router misconfigurations

Data Plane vs Control Plane

• Router misconfigurations occur every day
– Previous solutions mostly deal with the control plane
– Few of them impact reachability [Mahajan02]
– Some of them can cause serious outages lasting hours (April 1997, April 2001, Sept 2002)

• Need a data plane component
– Fast detection of reachability problems for popular prefixes
– Stale routes: the control plane is correct but the data plane is not
  • UUNet not forwarding route advertisements

Listen: Passive TCP Probing

• A router passively observes a TCP flow for SYN and DATA packets
– If both are seen, the sender must have received an ACK for its SYN => the route to the destination is verifiable

• Does not work for malicious nodes
– Malicious nodes can send ACKs for SYN and DATA packets

• Advantages
– No modifications to BGP
– Lightweight
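The passive check can be sketched as a small per-flow state machine. This is a minimal illustration, not the authors' implementation: the class name `ListenMonitor`, the flow-key layout, and the use of longest-prefix matching over a given prefix list are all assumptions. Like Listen itself, it offers no protection against a malicious node that fakes the handshake.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port).
FlowKey = tuple[str, str, int, int]


@dataclass
class ListenMonitor:
    """Sketch of passive TCP probing: a prefix is marked verifiable once a
    flow toward it shows a SYN followed by a DATA packet."""

    prefixes: list[ipaddress.IPv4Network]
    syn_seen: set[FlowKey] = field(default_factory=set)
    verified: set[ipaddress.IPv4Network] = field(default_factory=set)

    def _longest_match(self, dst_ip: str):
        """Most specific prefix covering dst_ip, or None if none covers it."""
        addr = ipaddress.ip_address(dst_ip)
        matches = [p for p in self.prefixes if addr in p]
        return max(matches, key=lambda p: p.prefixlen, default=None)

    def observe(self, flow: FlowKey, is_syn: bool, payload_len: int) -> None:
        if is_syn:
            self.syn_seen.add(flow)
        elif payload_len > 0 and flow in self.syn_seen:
            # DATA after SYN: the sender must have received the SYN-ACK,
            # so the path to the destination prefix appears to work.
            prefix = self._longest_match(flow[1])
            if prefix is not None:
                self.verified.add(prefix)
```

A flow that only ever sends SYNs never marks its prefix, which is exactly the port-scanner gap discussed next.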

What about port scanners?

• Port scanners may generate either SYN packets alone or SYN+DATA packets.

• Case 1: SYN+DATA
– Active drop: Randomly drop a DATA packet and check for retransmissions.
– Retransmit check: Check the number of retransmitted packets in a window.
– Alternative: Delay packets at routers

• Case 2: Only SYNs
– Step 1: Try alternative routes
– Step 2: If no other source generates genuine TCP connections, the prefix is either unused or unreachable.

Results: Data from a Tier-1 ISP

• The time to detect reachability problems for popular prefixes varies between 15 seconds and 1 minute
– Only 700 prefixes are popular

• How many routes are verifiable?
– A typical routing table has 130–140K entries, of which only 10K are active within a one-hour period
– 3K are active within a 5-minute period

• How often do routes change?
– 99% of routes are stable for more than 1 hour
– Only a few flows need to be verified every hour
– Specific prefixes are extremely unstable

Local Testbed Results

Number of machines: 28
Probing period: 40 days
Number of prefixes: 11,141 (9% of the routing table)
Verifiable prefixes: 9,711
Prefixes with incomplete connections
Perennial problems: 42
Number of failed connections: 15,321 (3,433 unique)

Detected Problems (verified using active probing)

• Specific examples
– Two local outages lasting more than one hour
– 207.126.224.0/20 (Yahoo NET) experienced regular problems

• Routing loops (detected using traceroute)
– 51 different prefixes

• One prefix is perennially down: 193.148.15.0/24

• Forwarding problems: no entries in the routing table
– 64 different prefixes

• Generic routing problems
– 87 different prefixes

False Negatives

• Outbound connections
– 63.5% are false negatives
– Primary sources:
  • Server not responding to HTTP connections
  • Buggy BGP daemon script

• Inbound connections
– 91.83% are false negatives
– Primary sources:
  • NetBIOS worm
  • Port-80 scanners
  • SQL Server vulnerability on port 1433

Listen: Summary

• Strengths
– Reachability problems for popular prefixes can be detected within a short period of time
– Low overhead
– Non-popular prefixes can be verified, with a higher false positive ratio

• Limitations
– False negatives occur often due to worms
  • Need to be conservative in determining when routes are not verifiable

“Whisper”: Containment and Isolation of Malicious Nodes

Reality…

• Data plane solutions do not work!
– Malicious nodes can always impersonate the behavior of genuine nodes

• Triggering alarms vs. identification
– Without authentication, a node cannot distinguish between malicious and genuine speakers!

• Our goals:
– Detect route inconsistencies
– Containment: A malicious node should not be able to harm more than a small set of nodes.

What do we mean?

• Route consistency test: A router compares two routes R and S to a destination D:
– If R and S are genuine routes, they should be consistent
– If R is genuine and S is spurious, they should be inconsistent
– If R and S are both spurious, they may be either consistent or inconsistent

• What does the route consistency check give us?
– An alarm is triggered if any node generates a spurious update.

• What does containment mean?
– A malicious node should not have the capability to affect more than a few destinations
– A malicious node attempting to cause widespread damage should be detected and isolated

Consistency test requirements

• Property 1: Malicious node should not be able to invent “spurious” advertisements that are also consistent.

• Property 2: A route advertisement modified by a malicious node should be inconsistent with genuine routes.

From Consistency to Containment

If Verifier V notices multiple spurious routes from M, V can avoid routes through M.

How to check for consistency?

Example Problem

Solution: A uses nonce x.

What about this?

Using hash chains

Figure: secret = x at the source; downstream chain values h(h(x)) and h(h(h(x)))

End result: A malicious node N hops away from source S can generate a spurious route of path length N. If the malicious node advertises a shorter path, the hash values will not match. However, the verifier cannot tell which of the two routes is incorrect.
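The consistency test above can be sketched with an ordinary one-way hash. This assumes SHA-256 as h and that a genuine route of length k carries h^k(x); the function names are hypothetical. The verifier hashes the shorter route's value forward and compares it with the longer route's value.

```python
import hashlib


def h(value: bytes) -> bytes:
    """One application of the one-way hash (SHA-256, as an assumption)."""
    return hashlib.sha256(value).digest()


def h_n(value: bytes, n: int) -> bytes:
    """Apply h to value n times."""
    for _ in range(n):
        value = h(value)
    return value


def advertise(secret: bytes, path_len: int) -> bytes:
    """Value carried by a genuine route of length path_len: h^path_len(x)."""
    return h_n(secret, path_len)


def consistent(len_r: int, sig_r: bytes, len_s: int, sig_s: bytes) -> bool:
    """Hash the shorter route's value forward to the longer route's claimed
    length, then compare the two values."""
    if len_r > len_s:
        len_r, sig_r, len_s, sig_s = len_s, sig_s, len_r, sig_r
    return h_n(sig_r, len_s - len_r) == sig_s
```

A node 3 hops away holds h^3(x): it can extend the chain to claim a longer path, but claiming length 1 would require inverting h, so that claim fails the test against any genuine route. The check only reports a mismatch; it does not say which route lied.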

Embed path in hash-chains?

h1 = h(x, S)
h2 = h(h1, A)
h3 = h(h2, B)
h4 = h(h1, U)

End result: A single malicious node cannot lie. However, two colluding malicious nodes can fake a link.

Figure: secret = x; nodes S, A, B, C, D
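The chain values h1 to h4 above can be reproduced with a short sketch, assuming h(prev, node) is SHA-256 over the previous value concatenated with the node's ID (an assumption; the slide does not fix the hash). It only shows how each hop folds its identity into the chain; how a verifier compares two such chains is not detailed here.

```python
import hashlib


def h(prev: bytes, node: str) -> bytes:
    """h(prev, node): hash of the previous chain value and the node's ID."""
    return hashlib.sha256(prev + node.encode()).digest()


def chain(secret: bytes, path: list[str]) -> list[bytes]:
    """Chain values h1, h2, ... produced as the route travels along path."""
    values = []
    value = secret
    for node in path:
        value = h(value, node)
        values.append(value)
    return values


x = b"secret"
h1, h2, h3 = chain(x, ["S", "A", "B"])  # route S -> A -> B
_, h4 = chain(x, ["S", "U"])            # alternate branch S -> U
```

Because each node's ID enters the hash, the same upstream value h1 yields different downstream values on the A and U branches.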

Implementing in BGP

• Use community attributes
– Require two signature attributes
  • Seed value for the hash (512-bit, 1024-bit or 2048-bit)
  • Hash signature (512-bit, 1024-bit or 2048-bit)
– Each community attribute uses 32 bits
– Split each signature attribute across multiple community attributes

• Our implementation:
– Hash library uses RSA-like signatures built on top of the OpenSSL library
– Whisper library integrated with the Zebra version 0.93b bgpd implementation
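Splitting a wide signature into 32-bit communities is plain bit slicing. The sketch below assumes big-endian chunk order; the function names are hypothetical and the exact encoding used by the Whisper library may differ. A 1024-bit value needs 1024 / 32 = 32 communities, so two 1024-bit attributes need 64.

```python
def split_into_communities(value: int, bits: int) -> list[int]:
    """Split a bits-wide integer into 32-bit community values,
    most significant chunk first."""
    if bits % 32 != 0:
        raise ValueError("bits must be a multiple of 32")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given width")
    count = bits // 32
    return [(value >> (32 * (count - 1 - i))) & 0xFFFFFFFF for i in range(count)]


def join_communities(parts: list[int]) -> int:
    """Reassemble the integer from its 32-bit community values."""
    value = 0
    for part in parts:
        value = (value << 32) | part
    return value


def as_text(community: int) -> str:
    """Conventional 'high16:low16' rendering of a 32-bit community."""
    return f"{community >> 16}:{community & 0xFFFF}"
```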

Effect of Simple Hashing

RSA-based Hashing

• A single malicious node has no effect
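One way to build a chain that anyone can extend but nobody except the origin can shorten is repeated RSA public-key exponentiation. The sketch assumes each hop maps y to y^e mod N; it is an illustration of that idea, not necessarily the exact construction in the Whisper library, and `update`/`consistent` are hypothetical names. The modulus uses two well-known primes so it runs instantly and is deliberately insecure.

```python
# Toy parameters, NOT secure: two known primes (2**61 - 1 and 2**31 - 1)
# so the example runs instantly. A real deployment would use an RSA
# modulus whose factorization only the origin knows.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537


def update(value: int, hops: int = 1) -> int:
    """Advance the signature by `hops` hops: value -> value**E mod N per hop."""
    for _ in range(hops):
        value = pow(value, E, N)
    return value


def consistent(len_r: int, sig_r: int, len_s: int, sig_s: int) -> bool:
    """Anyone can move a signature forward with the public E, but moving it
    backward needs the private exponent, so a shortened path fails."""
    if len_r > len_s:
        len_r, sig_r, len_s, sig_s = len_s, sig_s, len_r, sig_r
    return update(sig_r, len_s - len_r) == sig_s
```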

Cost of RSA-based operations

Operation    512-bit      1024-bit     2048-bit
VerifySign   0.18 msec    0.45 msec    1.42 msec
UpdateSign   0.25 msec    0.6 msec     1.94 msec
GenSign      0.4 sec      8.0 sec      68 sec

• For 1024-bit keys, the processing rate exceeds 100,000 adv/minute

• The maximum BGP update rate is 9,300 adv/min (average = 130 adv/min)
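The headroom claim follows from the 1024-bit column of the table. A back-of-the-envelope sketch, assuming a single core and a fixed cost per advertisement: charging only VerifySign gives roughly 133,000 adv/min, and charging VerifySign plus UpdateSign still gives roughly 57,000 adv/min, both well above the 9,300 adv/min peak.

```python
# Per-operation costs for 1024-bit keys, taken from the table (milliseconds).
VERIFY_MS = 0.45
UPDATE_MS = 0.6

# Peak BGP update rate quoted on the slide (advertisements per minute).
BGP_PEAK_PER_MIN = 9300


def per_minute(cost_ms: float) -> float:
    """Advertisements processed per minute at a fixed per-advertisement cost."""
    return 60_000 / cost_ms


verify_only = per_minute(VERIFY_MS)
verify_and_update = per_minute(VERIFY_MS + UPDATE_MS)

print(f"verify only:       {verify_only:,.0f} adv/min")
print(f"verify + update:   {verify_and_update:,.0f} adv/min")
print(f"headroom vs. peak: {verify_and_update / BGP_PEAK_PER_MIN:.1f}x")
```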

Conclusions

• We identified two causes of spurious route advertisements
– Misconfigurations, malicious behavior

• Harmful effects
– Blackhole, impersonation, eavesdropping

• Remedies
– Misconfigurations: TCP probing
– Malicious behavior: Whisper protocols with penalty functions

Thanks. Questions?

Backup slides

Vulnerability metric

Figure legend: affected node, malicious node, unaffected node

How much harm can one malicious node do?

affect(D, M) = (# affected nodes) / (# nodes)

Compute the distribution of affect(D,M) over all D
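The metric can be computed on a small graph with breadth-first search. This sketch rests on simplifying assumptions not stated on the slide: M falsely claims a direct link to D, every node picks the shortest advertised path (ignoring BGP local preference), and ties go to the genuine route. The function names are hypothetical.

```python
import math
from collections import deque


def bfs_dist(graph: dict[str, list[str]], src: str) -> dict[str, int]:
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def affect(graph: dict[str, list[str]], d: str, m: str) -> float:
    """affect(D, M) = (# affected nodes) / (# nodes) under the assumptions above."""
    to_d = bfs_dist(graph, d)
    to_m = bfs_dist(graph, m)
    affected = 0
    for v in graph:
        if v in (d, m):
            continue
        spurious = to_m.get(v, math.inf) + 1  # path via M's fake link to D
        genuine = to_d.get(v, math.inf)
        if spurious < genuine:
            affected += 1
    return affected / len(graph)


def affect_distribution(graph: dict[str, list[str]], m: str) -> list[float]:
    """affect(D, M) for every destination D other than M."""
    return [affect(graph, d, m) for d in graph if d != m]
```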

Avoid Detection (Path embedded)

Graph Containment Problem

Model: A graph with a core and multiple satellites

Problem: A malicious node in a satellite should not be able to affect good nodes in other satellites.

What if hashes mismatch?

If the hashes of routes R and S do not match, penalize both R and S.

Penalty(route R): for every vertex x in R (including the endpoints), increment penalty m(x) by 1.
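The simple penalty rule translates directly into a counter update. A minimal sketch; the function names are hypothetical.

```python
from collections import Counter


def penalize_route(penalty: Counter, route: list[str]) -> None:
    """Increment m(x) by 1 for every vertex x on the route, endpoints included."""
    for x in route:
        penalty[x] += 1


def on_mismatch(penalty: Counter, r: list[str], s: list[str]) -> None:
    """Hashes of routes R and S disagree: penalize both routes."""
    penalize_route(penalty, r)
    penalize_route(penalty, s)


m = Counter()
on_mismatch(m, ["P", "A", "B", "D"], ["P", "C", "M", "D"])
```

The shared endpoints P and D collect the largest penalty even though neither need be malicious, which shows how blame spreads beyond the actual culprit under this rule.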

Problems with simple penalty

Figure legend: malicious node, nodes that appear malicious, first probe

A malicious node can make many other nodes appear to be malicious

Penalize sub-path

Identify sub-paths where loop tests cannot be performed, and penalize only those sub-paths (e.g., R and S)

Renormalize penalties


For P, it is hard to determine which of A, B and C is malicious. However, P can deduce that D and E may not be malicious.


Effect of Single Malicious Node

Effect of Mis-configurations

Route Consistency Test


Split Whisper Loop Whisper

Is X=Y?

Avoid Detection - Weak Split
