Listen and Whisper: How to verify BGP route updates? Lakshmi Joint work with: Volker Roth, Ion Stoica, Scott Shenker, Randy Katz

Listen and Whisper: How to verify BGP route updates?

Lakshmi

Joint work with:

Volker Roth, Ion Stoica, Scott Shenker, Randy Katz

A short BGP primer

• The Internet is composed of 14,000 autonomous systems (AS’s)

• AS’s exchange route advertisements using BGP.

• Features of BGP
  – Path vector protocol
  – Uses local preference and hop-count as the distance metric
  – Supports policy routing

Route Verification problem?

• BGP assumes that the routes advertised by neighboring nodes are correct

• What if this assumption is violated?
  – An AS propagates spurious routes to a neighbor!

• Potential Causes
  – Accidental router mis-configurations
  – Malicious behavior

• What are the effects?
  – Drop packets and render a destination unreachable
  – Eavesdrop on the traffic to a given destination
  – Impersonate the destination

Why bother?

• Router mis-configurations are a common occurrence [Mahajan02]
  – Two major outages, in April 1997 and April 2001

• Router break-ins also occur regularly [Rob Thomas]
  – Many routers have open telnet interfaces

• “Evil” effects of a compromised node
  – Impersonation of an online banking system
  – Blackhole attack on root DNS servers

Causes and Effects

Effect        Accidental   Malicious
Blackhole
Eavesdrop
Impersonate

Implication: Accidental problems can be potentially detected in the data plane

Goals and Assumptions

• Goal: Verify the correctness of BGP route updates
  – Minimize the harmful effects of spurious updates
  – Incrementally deployable, lightweight
  – Minimal modifications to BGP

• Assumptions
  – No PKI or any key distribution
  – Shared keys allowed across peering links
  – No dependence on a central authority (like ICANN)

“Listen”: Addressing router misconfigurations

Data plane vs Control Plane

• Router misconfigurations occur every day
  – Previous solutions mostly deal with the control plane
  – Few misconfigurations impact reachability [Mahajan02]
  – Some can cause serious outages lasting hours (April 97, April 01, Sept 02)

• Need a data plane component
  – Fast detection of reachability problems of popular prefixes
  – Stale routes: control plane is correct but data plane is not
    • UUNet not forwarding route advertisements

Listen: Passive TCP-Probing

• A router passively observes a TCP flow for SYN and DATA packets
  – If both are seen, the ACK has been received by the sender => the route to the destination is verifiable

• Does not work for malicious nodes
  – Malicious nodes can send ACKs for SYN, DATA packets

• Advantages
  – No modifications to BGP
  – Lightweight
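The passive check above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name ListenMonitor, the packet labels "SYN" and "DATA", and the longest-prefix lookup are hypothetical and not taken from the slides, and a real router would work on parsed TCP headers rather than pre-labelled packets.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class ListenMonitor:
    """Hypothetical passive monitor: marks a prefix verified once a flow
    towards it shows a SYN followed by a DATA packet."""
    prefixes: list                       # routing-table prefixes (ip_network objects)
    syn_seen: set = field(default_factory=set)
    verified: set = field(default_factory=set)

    def _prefix_of(self, dst):
        # Longest-prefix match over the known prefixes (assumed lookup).
        addr = ip_address(dst)
        matches = [p for p in self.prefixes if addr in p]
        return max(matches, key=lambda p: p.prefixlen) if matches else None

    def observe(self, src, sport, dst, dport, kind):
        flow = (src, sport, dst, dport)
        if kind == "SYN":
            self.syn_seen.add(flow)
        elif kind == "DATA" and flow in self.syn_seen:
            # DATA after SYN: the sender must have received the ACK.
            prefix = self._prefix_of(dst)
            if prefix is not None:
                self.verified.add(prefix)
```

A flow that only ever shows SYNs leaves its prefix unverified, which is the case the port-scanner discussion deals with.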

What about port scanners?

• Port scanners may generate either merely SYN or SYN+DATA packets.

• Case 1: SYN+DATA
  – Active drop: Randomly drop a DATA packet and check for retransmissions.
  – Retransmit check: Check for number of retransmitted packets in a window.
  – Alternative: Delay packets at routers

• Case 2: only SYNs
  – Step 1: Try other alternative routes
  – Step 2: If no other source generates genuine TCP connections, the prefix is either unused or unreachable.
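The "active drop" idea in Case 1 can be sketched as follows. This is an illustrative sketch only: the class ActiveDropCheck, its drop probability, and the use of the TCP sequence number as the retransmission key are assumptions, not details given in the slides.

```python
import random


class ActiveDropCheck:
    """Hypothetical per-flow check: drop one DATA packet at random and
    wait for the same sequence number to be retransmitted."""

    def __init__(self, drop_probability=0.1, rng=None):
        self.drop_probability = drop_probability
        self.rng = rng or random.Random()
        self.dropped_seq = None      # sequence number of the packet we dropped
        self.confirmed = False       # True once a retransmission is seen

    def on_data(self, seq):
        """Return True to forward the packet, False to drop it."""
        if self.dropped_seq is None:
            if self.rng.random() < self.drop_probability:
                self.dropped_seq = seq
                return False
            return True
        if seq == self.dropped_seq:
            # The sender noticed the loss and retransmitted: a genuine TCP peer.
            self.confirmed = True
        return True
```

A sender that never retransmits the dropped packet (for example a scanner blindly sending SYN+DATA) never sets `confirmed`, so its flow would not count towards verifying the route.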

Results: Data from Tier-1 ISP

• Time to detect reachability problems for popular prefixes varies between 15 seconds and 1 minute
  – Only 700 prefixes are popular

• How many routes are verifiable?
  – Typical routing table has 130-140K entries, of which only 10K are active within a period of one hour
  – 3K over periods of 5 minutes

• Frequency of route changes?
  – 99% of the routes are stable for >1 hour
  – Need to verify only a few flows every hour
  – Specific prefixes are extremely unstable

Local Testbed Results

Number of Machines                      28
Probing Period                          40 days
Number of Prefixes                      11141 (9% of Table)
Verifiable Prefixes                     9711
Prefixes with incomplete connections    1460
Perennial problems                      42
Number of Failed Conn                   15321 (3433 unique)

Detected Problems (verified using Active probing)

• Specific Examples
  – Two local outages lasting more than one hour
  – 207.126.224.0/20 (Yahoo NET) observed regular problems

• Routing loops (detected using traceroute)
  – 51 different prefixes
    • One prefix is perennially down: 193.148.15.0/24

• Forwarding problem: No entries in routing table
  – 64 different prefixes

• Generic routing problems
  – 87 different prefixes

False Negatives

• Outbound connections
  – 63.5% are false negatives
  – Primary sources:
    • Server not responding to HTTP connections
    • Buggy BGP daemon script

• Inbound connections
  – 91.83% are false negatives
  – Primary sources:
    • NetBIOS worm
    • Port-80 scanners
    • SQL Server vulnerability on port 1433

Listen: Summary

• Strengths
  – Reachability problems for popular prefixes can be detected within a short period of time
  – Low overhead
  – Non-popular prefixes can be verified with a higher false positive ratio

• Limitations
  – False negatives do occur often due to worms
    • Need to be conservative in determining when routes are not verifiable

“Whisper”: Containment and Isolation of Malicious Nodes

Reality…

• Data plane solutions do not work!
  – Malicious nodes can always impersonate the behavior of genuine nodes

• Triggering Alarms vs Identification
  – Without authentication, a node cannot distinguish between malicious and genuine speakers!

• Our Goals:
  – Detect route inconsistencies
  – Containment: A malicious node should not harm more than a small set of nodes.

What do we mean?

• Route Consistency Test: A router compares two routes R and S to a destination D:
  – If R and S are genuine routes, they should be consistent
  – If R is genuine and S is spurious, they should be inconsistent
  – If R and S are both spurious, they may be either consistent or inconsistent

• What does the route consistency check give?
  – Trigger an alarm if any node generates a spurious update.

• What does containment mean?
  – A malicious node should not have the capability to affect more than a few destinations
  – A malicious node attempting to cause widespread damage should be detected and isolated

Consistency test requirements

• Property 1: A malicious node should not be able to invent “spurious” advertisements that are also consistent.

• Property 2: A route advertisement modified by a malicious node should be inconsistent with genuine routes.

From Consistency to Containment

[Figure: topology with verifier V, nodes A, B, C, D, E, F, and malicious node M; edge label "A,B,C"]

If Verifier V notices multiple spurious routes from M, V can avoid routes through M.

How to check for consistency?

[Figure: nodes A, B, M, C; advertisement labels "A" and "A?"]

Example Problem

[Figure: nodes A, B, M, C; advertisement labels "A,x" and "A,y"; value x; check "x=y?"]

Solution: A uses nonce x.

[Figure: nodes A, B, M, C; both advertisements labelled "A,x"; value x]

What about this?

Using hash chains

[Figure: hash chain between nodes S and D; Secret = x; successive values h(x), h(h(x)), h(h(h(x)))]

End-result:

• A malicious node N hops away from source S can generate a spurious route of path length N

• If the malicious node generates a shorter path, the hash values will not match

• Which of the two routes is incorrect remains unknown
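The hash-chain comparison can be sketched as below. It assumes SHA-256 as h purely for illustration (the slides later describe RSA-like signatures), and the function names chain and consistent are hypothetical. Each hop applies h once, so two routes are consistent when hashing the shorter route's value forward by the difference in path lengths reproduces the longer route's value.

```python
import hashlib


def h(value: bytes) -> bytes:
    # Stand-in one-way function; the choice of SHA-256 is an assumption.
    return hashlib.sha256(value).digest()


def chain(secret: bytes, hops: int) -> bytes:
    """Value carried by an advertisement after `hops` applications of h."""
    value = secret
    for _ in range(hops):
        value = h(value)
    return value


def consistent(len_r: int, sig_r: bytes, len_s: int, sig_s: bytes) -> bool:
    """Compare two routes given their claimed path lengths and hash values."""
    if len_r > len_s:
        len_r, sig_r, len_s, sig_s = len_s, sig_s, len_r, sig_r
    value = sig_r
    for _ in range(len_s - len_r):
        value = h(value)
    return value == sig_s
```

A node three hops from the source only knows h(h(h(x))), so if it claims a path of length 1 the comparison against a genuine route fails; the test does not say which of the two routes is the spurious one.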

Embed path in hash-chains?

[Figure: nodes D, S, A, B, C, U, V; label "Secret = x"]

h1 = h(x, S)
h2 = h(h1, A)
h3 = h(h2, B)
h4 = h(h1, U)

End-result: One malicious node cannot lie. However, two colluding malicious nodes can fake a link.
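The path-embedded chain h1 = h(x, S), h2 = h(h1, A), and so on is a fold of h over the node names on the path. The sketch below only shows that fold; SHA-256, the "|" separator and the function name embed_path are assumptions, and how a verifier compares two such values is not specified on the slide.

```python
import hashlib


def h(prev: bytes, node: str) -> bytes:
    # Two-argument hash from the slide; the byte encoding is an assumption.
    return hashlib.sha256(prev + b"|" + node.encode()).digest()


def embed_path(secret: bytes, path: list) -> bytes:
    """Fold h over the path, e.g. ["S", "A", "B"] gives h(h(h(x, S), A), B)."""
    value = secret
    for node in path:
        value = h(value, node)
    return value
```

Because every node name is mixed into the value, the branch through U yields a different value from the branch through A even though both start from the same h1.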

Implementing in BGP

• Use Community attributes
  – Require two signature attributes
    • Seed value for the hash (512-bit, 1024-bit or 2048-bit)
    • Hash Signature (512-bit, 1024-bit or 2048-bit)
  – Each Community attribute uses 32 bits
  – Split each Signature attribute between multiple community attributes

• Our Implementation:
  – Hash library uses RSA-like signatures built on top of the OpenSSL library
  – Whisper library integrated with Zebra version 0.93b bgpd implementation
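Splitting a signature across 32-bit community values might look like the sketch below. The function names, the big-endian byte order, and the assumption that the receiver sees the values in their original order are all illustrative; the slides do not give the actual encoding used in the Zebra integration.

```python
def split_signature(signature: int, bits: int) -> list:
    """Split a `bits`-wide signature into 32-bit community-sized values."""
    raw = signature.to_bytes(bits // 8, "big")
    return [int.from_bytes(raw[i:i + 4], "big") for i in range(0, len(raw), 4)]


def join_signature(communities: list) -> int:
    """Reassemble the signature, assuming the original order is preserved."""
    raw = b"".join(c.to_bytes(4, "big") for c in communities)
    return int.from_bytes(raw, "big")
```

Under this layout a 1024-bit value occupies 32 community values, and a 2048-bit value occupies 64.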

Effect of Simple Hashing

RSA-based Hashing

• A single malicious node can have no effect

Cost of RSA-based operations

              512-bit     1024-bit    2048-bit
VerifySign    0.18 msec   0.45 msec   1.42 msec
UpdateSign    0.25 msec   0.6 msec    1.94 msec
GenSign       0.4 sec     8.0 sec     68 sec

• For 1024-bit keys, process rate >100,000 adv/minute

• BGP maximum update rate is 9300 adv/min (avg=130)
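As a rough check of the throughput figures, the per-operation costs for 1024-bit keys can be converted to operations per minute. The sketch assumes a single thread doing nothing but that one operation, which is an assumption; the slides do not say how the process rate was derived.

```python
MICROSECONDS_PER_MINUTE = 60 * 1_000_000

# Per-operation costs for 1024-bit keys, taken from the table (in microseconds).
COST_US_1024 = {"VerifySign": 450, "UpdateSign": 600}

# Peak BGP update rate quoted on the slide (advertisements per minute).
BGP_PEAK_ADV_PER_MIN = 9300


def ops_per_minute(cost_us: int) -> int:
    """Whole operations that fit in one minute at the given cost."""
    return MICROSECONDS_PER_MINUTE // cost_us


rates = {name: ops_per_minute(cost) for name, cost in COST_US_1024.items()}
headroom = {name: rate / BGP_PEAK_ADV_PER_MIN for name, rate in rates.items()}
```

Under that assumption either operation sustains on the order of 100,000 per minute, more than ten times the quoted peak update rate.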

Conclusions

• We identified 2 causes for spurious route advertisements
  – Mis-configurations, malicious behavior

• Harmful effects
  – Blackhole, impersonation, eavesdrop

• Remedies
  – Mis-configurations: TCP probing
  – Malicious behavior: Whisper protocols with penalty functions

Thanks. Questions?

Backup slides

Vulnerability metric

[Figure: example topology with node labels M, D, s, r, A, B, C; legend: Affected Node, Malicious, Unaffected]

affect(D,M) = # affected / # nodes

How much harm can one malicious node do?

Compute the distribution of affect(D,M) over all D
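One way to compute affect(D,M) on a small topology is sketched below. The slides do not define when a node counts as affected, so the sketch makes loud assumptions: routing is by hop count only, M falsely announces itself as the origin of D's prefix, a node is affected when M is strictly closer than D, and the denominator counts all nodes other than D and M.

```python
from collections import deque
from fractions import Fraction


def hop_counts(graph: dict, start: str) -> dict:
    """Breadth-first hop counts from `start` over an adjacency-list graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def affect(graph: dict, d: str, m: str) -> Fraction:
    """Fraction of other nodes that would prefer M's route (assumed model)."""
    to_d = hop_counts(graph, d)
    to_m = hop_counts(graph, m)
    others = [n for n in graph if n not in (d, m)]
    inf = float("inf")
    affected = [n for n in others if to_m.get(n, inf) < to_d.get(n, inf)]
    return Fraction(len(affected), len(others))
```

Repeating the call for every choice of D with M fixed gives the distribution the slide asks for.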

Avoid Detection (Path embedded)

Graph Containment Problem

[Figure: graph with a Core and Satellites; malicious node M and good nodes G]

Problem: A malicious node in a satellite should not be able to affect good nodes in other satellites.

Model: A graph with a core and multiple satellites

What if hashes mismatch?

[Figure: routes R and S; node labels a, b, v, d]

If the hashes of routes R and S do not match: penalize both R and S

Penalty (route R):
  For every vertex “x” in R (inclusive of end-points)
    Increment penalty m(x) by 1
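The simple penalty rule can be sketched directly. The function names and the use of a Counter for m(x) are illustrative choices, not taken from the slides.

```python
from collections import Counter


def penalize(penalties: Counter, route: list) -> None:
    """Increment m(x) by 1 for every vertex x on the route, end-points included."""
    for vertex in route:
        penalties[vertex] += 1


def on_mismatch(penalties: Counter, route_r: list, route_s: list) -> None:
    """Hashes of R and S do not match: penalize both routes."""
    penalize(penalties, route_r)
    penalize(penalties, route_s)
```

Under this rule a vertex that lies on both routes is incremented twice per mismatch.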

Problems with simple penalty

[Figure legend: Malicious, Appear Malicious, First probe]

A malicious node can make many other nodes appear to be malicious

Penalize sub-path

[Figure: sub-paths R and S; node labels P, Q, A]

Identify sub-paths where loop-tests cannot be performed

Penalize sub-paths alone (e.g. R and S)

Renormalize penalties

[Figure: two panels, each showing nodes A, B, C, D, E and node P]

For P, it is hard to differentiate between A, B and C as to which node is malicious. However, P can deduce that D and E may not be malicious.

Effect of Single Malicious Node

Effect of Mis-configurations

Route Consistency Test

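[Figure: two panels, "Split Whisper" and "Loop Whisper", each over nodes A, B, C, D, E, F, with candidate malicious nodes marked M; values X and Y are compared: "Is X=Y?"]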

Avoid Detection - Weak Split