
Free Riding Multicast

Sylvia Ratnasamy (Intel Research)

Andrey Ermolinskiy (U.C. Berkeley)

Scott Shenker (U.C. Berkeley and ICSI)

ACM SIGCOMM 2006

Berkeley SysLunch (10/10/06)

Talk Outline

Introduction
  Overview of the IP Multicast service model
  Challenges of multicast routing

Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation

Internet Routing – a High-Level View

The Internet is a packet-switched network
Each routable entity is assigned an IP address
Routing protocols (BGP, OSPF) establish forwarding state in routers
Routers forward packets towards their recipients, e.g. C1: Send(Packet, C2Addr);

[Diagram: clients C1–C4 exchanging unicast packets across the routing infrastructure]

Traditionally, the Internet routing infrastructure offers a one-to-one (unicast) packet delivery service

Problem: some applications require one-to-many packet delivery
  Streaming media delivery
  Digital conferencing
  Online multiplayer games

[Diagram: a source S must deliver the same packet to many group members G]

IP Multicast Service Model

In 1990, Steve Deering proposed IP Multicast: an extension to the IP service model for efficient one-to-many packet delivery

Group-based communication:
  Join (IPAddr, GrpAddr);
  Leave (IPAddr, GrpAddr);
  Send (Packet, GrpAddr);

Multicast routing problem: set up a dissemination tree rooted at the source with the group members as leaves

[Diagram: source S and group members G joined by a dissemination tree]
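On end hosts this service model surfaces through the standard sockets API. As a rough illustration (not part of the talk), a minimal Python sketch of Join/Send/Leave; the group address and port are hypothetical:

```python
import socket
import struct

GROUP = "239.1.2.3"   # hypothetical administratively scoped group address
PORT = 5000

# Join(IPAddr, GrpAddr): subscribe this host's interface to the group.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Send(Packet, GrpAddr): any host may send; the network replicates the packet.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 16)
sender.sendto(b"hello group", (GROUP, PORT))

# Leave(IPAddr, GrpAddr): unsubscribe.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
```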

IP Multicast Routing

Why multicast routing is hard:
  New members must find the tree
  The tree changes with new members and sources
  The tree changes with network failures
  Administrative boundaries and policies matter
  Forwarding state grows with the number of groups and sources

[Diagram: a new member asks "join G?"; routers must decide where to graft it onto the tree]

IP Multicast – a Brief History

Extensively researched, limited deployment
  Implemented in routers, supported by OS vendors
  Some intra-domain/enterprise usage
  Virtually no inter-domain deployment

Why? Too complex? PIM-SM, PIM-DM, MBGP, MSDP, BGMP, IGMP, etc.

FRM goal: make inter-domain multicast simple

Talk Outline

Introduction
  Overview of the IP Multicast service model
  Challenges of multicast routing

Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation

FRM Overview

Free Riding Multicast: a radical restructuring of inter-domain multicast

Key design choice: decouple group membership discovery from multicast route construction

Principal trade-off: avoidance of distributed route computation at the expense of optimal efficiency

FRM Approach

Group membership discovery
  Extension to BGP: augment route advertisements with group membership information

Multicast route construction
  Centralized computation at the origin border router
  Exploit knowledge of unicast BGP routes
  Eliminate the need for a separate routing algorithm

Group Membership Discovery

[Diagram: ASes P, Q, R, T, V, X, Y, Z; X originates a.b.*.*, Y originates c.d.e.*, Z originates f.g.*.*, T originates h.i.*.*]

Augment BGP with per-prefix group membership information.

Step 1: Domain X joins G1
  The border router at X re-advertises its prefix and attaches an encoding of its active groups:
  BGP UPDATE:  Dest a.b.*.*   AS Path X   FRM group membership {G1}

Step 2: BGP disseminates the membership change
  The advertisement a.b.*.* {G1} propagates, and border routers maintain the membership information as part of the per-prefix state in the BGP RIB:

  Prefix     AS Path    Active Groups
  a.b.*.*    V Q P X    {G1}
  c.d.e.*    V Q P Y
  f.g.*.*    V R Z
  h.i.*.*    V Q T

Step 3: Domains Y and Z join G1
  Their advertisements c.d.e.* {G1} and f.g.*.* {G1} propagate the same way:

  Prefix     AS Path    Active Groups
  a.b.*.*    V Q P X    {G1}
  c.d.e.*    V Q P Y    {G1}
  f.g.*.*    V R Z      {G1}
  h.i.*.*    V Q T
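A minimal sketch of the per-prefix state a border router keeps under this scheme, following the slides' model; the class, function names, and stand-in encoding are illustrative, not from the paper:

```python
class RibEntry:
    """One BGP RIB entry, extended with FRM's active-groups set."""
    def __init__(self, as_path):
        self.as_path = as_path          # e.g., ["V", "Q", "P", "X"]
        self.active_groups = set()      # groups advertised for this prefix

rib = {
    "a.b.*.*": RibEntry(["V", "Q", "P", "X"]),
    "c.d.e.*": RibEntry(["V", "Q", "P", "Y"]),
    "f.g.*.*": RibEntry(["V", "R", "Z"]),
    "h.i.*.*": RibEntry(["V", "Q", "T"]),
}

def encode_groups(groups):
    return sorted(groups)  # stand-in for the Bloom-filter encoding shown later

def send_bgp_update(prefix, encoding):
    print(f"UPDATE {prefix} active-groups={encoding}")  # stand-in for BGP

def on_local_join(local_prefix, local_groups, group):
    """A host in our domain joins `group`: re-advertise our prefix with
    the updated group encoding attached to the BGP UPDATE."""
    local_groups.add(group)
    send_bgp_update(local_prefix, encode_groups(local_groups))

def on_bgp_update(prefix, groups):
    """A remote domain re-advertised its prefix: record its active groups
    alongside the usual per-prefix BGP state."""
    rib[prefix].active_groups = set(groups)

local_groups = set()
on_local_join("a.b.*.*", local_groups, "G1")   # domain X joins G1
on_bgp_update("a.b.*.*", ["G1"])               # a remote router learns of it
```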

Packet Forwarding

Domain V: Send(G1, Pkt)

Step 1: Lookup. The origin border router at V scans its RIB for prefixes whose Active Groups contain G1:

  Prefix     AS Path    Active Groups
  a.b.*.*    V Q P X    {G1}
  c.d.e.*    V Q P Y    {G1}
  f.g.*.*    V R Z      {G1}
  h.i.*.*    V Q T

Step 2: The union of the matching AS paths forms the dissemination tree rooted at V:

  V → Q → P → X
          P → Y
  V → R → Z

Step 3: V forwards the packet to its children on the tree, attaching an encoding of each child's subtree in a "shim" header: (G1, SubtreeQ) goes to Q and (G1, SubtreeR) goes to R.

Step 4: Transit routers inspect the FRM header and forward the packet to their children on the tree: Q tests each neighbor edge against SubtreeQ (T: no; P: yes) and forwards only to P; P forwards to X and Y; R forwards to Z.

[Diagram: the packet fanning out from V along the tree, carrying the shim header]
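A sketch of this centralized tree computation at the origin border router, directly following the slides' description (scan the RIB, union the AS paths of member prefixes, hand each tree neighbor its own subtree); data structures and names are illustrative:

```python
# Minimal RIB: prefix -> (AS path as seen at V, active groups)
rib = {
    "a.b.*.*": (["V", "Q", "P", "X"], {"G1"}),
    "c.d.e.*": (["V", "Q", "P", "Y"], {"G1"}),
    "f.g.*.*": (["V", "R", "Z"],      {"G1"}),
    "h.i.*.*": (["V", "Q", "T"],      set()),
}

def build_dissemination_tree(rib, group):
    """Union of AS paths to all prefixes with `group` active: each
    consecutive AS pair on a path is an edge of the tree."""
    edges = set()
    for as_path, groups in rib.values():
        if group in groups:
            edges.update(zip(as_path, as_path[1:]))
    return edges

def subtrees_by_neighbor(edges, source_as):
    """Split the tree into one edge set per child of the source AS;
    each child receives only its own subtree in the shim header."""
    subtrees = {child: set() for (parent, child) in edges if parent == source_as}
    for child, sub in subtrees.items():
        frontier = {child}
        while frontier:                     # terminates: AS paths are loop-free
            layer = {(p, c) for (p, c) in edges if p in frontier}
            sub.update(layer)
            frontier = {c for (_, c) in layer}
    return subtrees

edges = build_dissemination_tree(rib, "G1")
shims = subtrees_by_neighbor(edges, "V")
# shims == {"Q": {("Q","P"), ("P","X"), ("P","Y")}, "R": {("R","Z")}}
# In practice the per-neighbor encodings are cached per group (the shim
# header cache), since a full RIB scan per packet would be too expensive.
```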

FRM Details

Encoding group membership
  Simple enumeration is hard to scale
  Border routers encode the locally active groups using a Bloom filter
  The encoding is transmitted in a new path attribute of the BGP UPDATE message

Encoding the dissemination tree
  Edges are encoded into a shim header using a Bloom filter

Tree computation is expensive
  Border routers maintain a shim-header cache
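A minimal sketch of the Bloom-filter mechanics behind both encodings; the filter size and hash construction are illustrative, not the paper's parameters:

```python
import hashlib

class BloomFilter:
    """Set membership with false positives but no false negatives. FRM uses
    this idea both for GRP_BF (active groups) and for the shim-header tree
    encoding (AS-level edges)."""
    def __init__(self, num_bits=1024, num_hashes=5):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Shim header: encode tree edges. A transit router forwards to exactly those
# neighbors whose edge tests positive; a false positive causes an occasional
# redundant transmission, never a lost packet.
tree_bf = BloomFilter()
for edge in [("Q", "P"), ("P", "X"), ("P", "Y")]:
    tree_bf.add(edge)

my_as, neighbors = "Q", ["T", "P"]
forward_to = [n for n in neighbors if (my_as, n) in tree_bf]   # ["P"]
```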

Talk Outline

Introduction

Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
    Router storage requirements
    Forwarding bandwidth overhead (in paper)
  Design tradeoffs
  Implementation

FRM Overhead – Router Storage

At the origin border router:
  1. Source forwarding state (per-group, line card memory)
  2. Group membership state (per-prefix, BGP RIB)

At a transit router:
  3. Transit forwarding state (per-neighbor, line card memory)

[Diagram: the example AS topology with the origin and transit border routers highlighted]

Forwarding State (Source Border Router)

[Plot: shim-header cache size (MB) vs. number of groups with active sources (A), for A from 100 to 1M]

256 MB of line card memory enables fast-path forwarding for ~200,000 active groups

Assumptions:
  A = number of groups with sources in the local domain
  Zipfian group popularity with a minimum of 8 domains per group
  25 groups have members in every domain (global broadcast)


Group Membership State Requirements

Model:
  A multicast groups in total
  Domains of prefix length p have 2^(32−p) users
  Each user chooses and joins k distinct groups from the A groups
  10 false positives per prefix allowed

Result: 1M simultaneously active groups and 10 groups per user require ~3 GB of route processor memory (not on the fast path)


Forwarding State (Transit Router)

Number of forwarding entries = number of neighbor ASes
Independent of the number of groups!

  90% of ASes: 10 forwarding entries
  99% of ASes: 100 forwarding entries
  Worst case: 2400 forwarding entries

[Diagram: a transit AS with one forwarding entry per neighbor AS]

Talk Outline

Introduction

Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation

FRM Design Tradeoffs

Benefits:

Protocol simplicity
  Can be implemented as a straightforward extension to BGP
  Centralized route construction (the tree is computed at the source border router from existing unicast routes)

Ease of configuration
  Management within the familiar BGP framework
  Avoids rendezvous-point selection

Enables ISP control over sources and subscribers
  To block traffic for an undesired group, drop it from the BGP advertisement
  The source controls the dissemination tree, which facilitates source-based charging [Express]

Costs:

Group membership state maintenance
  Membership information is disseminated more widely

Nontrivial bandwidth overhead (see paper for results)
  Per-packet shim header
  Redundant packet transmissions

New packet forwarding techniques
  Full scan of the BGP RIB at the source border router
  Bloom filter lookups at transit routers

FRM Implementation

A proof-of-concept prototype on top of Linux 2.4 and the eXtensible Open Router Platform (http://www.xorp.org)

Functional components:
  FRM kernel module (3.5 KLOC of new Linux kernel code)
    Interfaces with the Linux kernel IP layer and implements the packet forwarding plane
  FRM user-level component (1.9 KLOC of new code)
    Extension to the XORP BGP daemon
    Implements tree construction and group membership state dissemination
  Configuration and management tools (1.4 KLOC of new code)

Summary

Free Riding Multicast is a very different approach to inter-domain multicast routing

FRM makes use of the existing unicast routing infrastructure for group membership discovery and route construction

It reduces protocol complexity via aggressive use of router resources

Thank you

Challenges and Future Work

Incremental deployment

Legacy BGP routers rate-limit their path advertisements (30 seconds), thus delaying dissemination of group membership state

Large group Bloom filters that exceed the maximum BGP UPDATE message size (4 KB) require fragmentation and reassembly

Explore alternative tree-encoding techniques to reduce per-packet bandwidth overhead

Backup Slides

FRM Overhead – Redundant Transmissions

Total number of transmissions required to transfer a single packet to all group members (FRM header size = 100 bytes)

  Ideal Mcast: precisely 1 packet is transmitted along each edge
  Per-AS Unicast: the source unicasts to each member AS individually

For all group sizes, the overall bandwidth consumed by FRM is close to that of Ideal Mcast (within 2.4%)

[Plot: number of packet transmissions vs. group size (1,000 to 10M) for Per-AS Unicast, FRM, and Ideal Mcast]

FRM Overhead – Redundant Transmissions

Number of transmissions per AS-level link required to transfer a single packet to all group members (FRM header size = 100 bytes)

Per-AS Unicast with 10M users:
  6% of links see redundant transmissions
  Worst case: 6950 transmissions per link

FRM with 10M users:
  Less than 0.5% of links see redundant transmissions
  Worst case: 157 transmissions per link
  Worst case with optimization (see paper): 2 transmissions per link

Encoding Group Membership State

Simple enumeration is hard to scale.

Border routers encode the set of locally active groups using a constant-size Bloom filter (GRP_BF) of length L:

  {G1, G2, G3, G4, …} → K hash functions → GRP_BF: 011011011010…

BGP speakers communicate their GRP_BF state as part of their regular route advertisements (BGP UPDATE message) using a new path attribute.
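Since each group maps to K bit positions, a single join/leave can be advertised as just those positions; this is what the bandwidth estimate later on these slides prices at 5 positions × 24 bits ≈ 15 bytes. A sketch, with an illustrative hash construction:

```python
import hashlib

L = 2**24          # filter length chosen so positions fit in 24-bit values
K = 5              # hash functions, matching the bandwidth slide

def grp_bf_positions(group_addr):
    """The K GRP_BF bit positions a group maps to."""
    return [int.from_bytes(hashlib.sha1(f"{i}:{group_addr}".encode()).digest()[:4],
                           "big") % L
            for i in range(K)]

# Payload of one join/leave event: K positions, 3 bytes each on the wire.
update_payload = grp_bf_positions("G1")
```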

Encoding Group Membership State

The use of Bloom filters introduces the possibility of false positives: a domain may on occasion receive traffic for a group it has no interest in.

To deal with unwanted traffic, the recipient domain can install explicit filter rules in the upstream provider's network.

For a given number of available upstream filters f, the recipient computes the maximum tolerable false positive rate r and chooses its filter length L accordingly:

  r = min(1, f / (A − G)),  where A = size of the group address space and G = number of groups to be encoded
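Combining the slide's bound on r with the standard Bloom-filter analysis (an assumption on my part; the slide does not show the sizing step) gives the required length:

```latex
r = \min\!\Bigl(1,\ \frac{f}{A - G}\Bigr)
\qquad
% standard false-positive rate with K hashes, G entries, length L:
r \approx \bigl(1 - e^{-KG/L}\bigr)^{K}
\qquad
% at the optimal K = (L/G)\ln 2 this inverts to:
L \ \ge\ \frac{G\,\lvert \ln r \rvert}{(\ln 2)^{2}}
```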

Summary

Free Riding Multicast is a very different approach to inter-domain multicast routing

FRM makes use of the existing unicast routing infrastructure for group membership discovery and route construction

It reduces protocol complexity via aggressive use of router resources

It might be interesting to consider the viability of this approach in a broader context

Group Membership Bandwidth Overhead

For GRP_BFs with 5 hash functions and bit positions represented as 24-bit values, the payload of a membership update message for a single group join/leave event is approx. 15 bytes.

Assuming 200,000 prefixes in the BGP RIB and 1 group membership event per second per prefix, the aggregate rate of incoming GRP_BF update traffic at a border router is approx. 3 MB/s.
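The arithmetic behind those two figures, from the slide's own numbers:

```latex
% per-event payload: K positions of 24 bits each
5 \times 24\ \text{bits} = 120\ \text{bits} = 15\ \text{bytes}
\qquad
% aggregate at a border router:
2 \times 10^{5}\ \text{prefixes} \times 1\ \tfrac{\text{event}}{\text{s}} \times 15\ \text{B}
  = 3 \times 10^{6}\ \text{B/s} = 3\ \text{MB/s}
```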

Why IP Multicast?

Technical feasibility aside, now might be a good time to revisit the desirability question

Multicast applications are now more widespread: IP-TV, MMORPGs, digital conferencing

Better understanding of ISP requirements

Bottom line: a simple multicast design might open the door to more widespread adoption
