Packet Caching on Network Links - Universitetet i Oslo
Piotr Srebrny
Part I: Multicasting in the Internet Basics & critique
Part II: CacheCast Internet redundancy Packet caching systems CacheCast design CacheCast evaluation ▪ Efficiency ▪ Computational complexity ▪ Environmental impact
Presenter
Presentation Notes
In the Beginning there was the Internet, and the Internet was connectionless. It provided best-effort services, i.e. services without any guarantee on reliability, timing, or even the consistency of a datagram itself! It had two very simple tasks: to build end-to-end routes and to forward packets according to their destination addresses.
Presenter
Presentation Notes
And the Internet was multicast-less. It could only transfer a single packet to a single destination. Therefore, if a single source wanted to send the same packet to many destinations, it had to transmit it many times, which was obviously inefficient, especially if we were to build Internet radio or TV.
Presenter
Presentation Notes
Therefore, researchers thought that they would improve the Internet by allowing it to grow communication trees.
Presenter
Presentation Notes
So the first idea to enable multicast communication in the Internet was simply to attach all destination addresses to the packet header. However, the packet size is limited by the MTU of the transmission path; thus, the more addresses we put into the packet, the less space is left for the message. Eventually, there is no space for a message at all if there are too many addresses. Obviously, such an approach does not scale well with the number of receivers. In addition, this type of message requires processing on each hop.
Vera Goebel
Daniel Rodríguez Fernández
Ellen Munthe-Kaas
Kjell Åge Bringsrud
Hi…
Presenter
Presentation Notes
Meanwhile, another idea emerged. Stephen Deering proposed to use a group name in the destination field of a packet and let the Internet resolve to whom the packet should be delivered. The problem of limited space for destinations was resolved immediately! One could send a message to an unlimited number of receivers without constraints.
Hi, I would like to invite you to a presentation about IP Multicast.
Scalability was like the Holy Grail for the multicast community.
Presenter
Presentation Notes
So the desire was to have multicast that would be able to handle a huge number of receivers. It would be perfect if it behaved like broadcast TV! Not only one-to-ten or one-to-a-hundred, but one-to-a-million and beyond. That was the thinking that drove the IP Multicast design, and it still impacts the major design decisions.
DMMS Group
Presenter
Presentation Notes
Thus, the first design of IP Multicast was called Any Source Multicast. Regardless of how frightening it may sound, ASM means exactly that: any source can send a message to any group, without even being a member of that group. Moreover, anyone can listen to any group. Lay down all the good properties and the remaining goals here.
The number of available multicast channels is approx. 268 million.
Presenter
Presentation Notes
Even if a source were aware of the path capacities to all of the receivers, it would still be a pain in the neck to create a reasonable control mechanism.
Presenter
Presentation Notes
AAA – authentication, authorization, and accounting
… and after 10 years of non-random mutations
Presenter
Presentation Notes
Many PhDs were devoted to solving the above-mentioned issues. A lot of effort was put in, resulting in many scientific publications; routing protocols were actually a very good PhD topic. ASM evolved into SSM, so instead of growing complexity we see a reduction.
Presenter
Presentation Notes
What is so creative about the SSM model? SSM solves the problem of unauthorized senders, since there can be only one sender, which sits at the root of the multicast tree. Because the tree is unidirectional, there is no problem with routing algorithms, which are basically reverse path forwarding. The source can be found via a directory service located somewhere in the Internet. What does it fix when compared to ASM?
Presenter
Presentation Notes
And still, the main principle was to make it enormously scalable!! What does that entail? No control over receivers, because the number of receivers might be so huge that it is impossible to control them all! Therefore, receivers join and leave a multicast tree themselves. How do we enforce authorization? As a result, a source does not know who is listening to it. Can a source rely on the goodwill of receivers? Most of you have heard about ACK feedback implosion, feedback suppression…
DVMRP, PIM-DM, PIM-SM, PIM-SSM, MOSPF, MSDP, MBGP, BGP4+, IGMPv2, IGMPv3, IGMP Snooping, MLDv2, MLDv3, AMT, PGM, MAAS, MADCAP, MASC, CGMP, RGMP
Presenter
Presentation Notes
AMT – Automatic Multicast Tunneling
MBGP – Multiprotocol BGP Extensions for IP Multicast
PGM – Pretty Good Multicast, or Pragmatic General Multicast
CGMP – Cisco Group Management Protocol
RGMP – Router-port Group Management Protocol
MASC – Multicast Address-Set Claim
MAAS – Multicast Address Allocation Server
MADCAP – Multicast Address Dynamic Client Allocation Protocol
MSDP – Multicast Source Discovery Protocol
MOSPF – Multicast Open Shortest Path First
PIM-SM – Protocol Independent Multicast - Sparse Mode
PIM-SSM – Protocol Independent Multicast - Source-Specific Multicast
PIM-DM – Protocol Independent Multicast - Dense Mode
DVMRP – Distance Vector Multicast Routing Protocol
MLDv[23] – Multicast Listener Discovery (works with IPv6)
IGMPv[23] – Internet Group Management Protocol (works with IPv4)
IGMP Snooping – link-layer mechanism snooping IGMP messages to optimise link-layer transport
Another thing it entails is that any change or upgrade to the system must be deployed in the network, since the network is the mediator between a source and receivers. IP Multicast alone will never fulfil the task of removing redundant data transmission for single-source multiple-destination transfers. It requires support from the underlying technology, regardless of whether that is Ethernet, ATM, MPLS, or some VPN. However, this support must be provided in the way IP Multicast imposes it – i.e. a connection-oriented distribution tree. For example, the Ethernet standard, whose usage has grown recently, saw a Cisco experiment that introduced state for the multicast tree on Ethernet switches.
Part II
Internet redundancy Packet caching systems CacheCast design CacheCast evaluation Efficiency Computational complexity Environmental impact
Key facts:
- The Internet is becoming content-centric. We see the proliferation of IP TV, IP radio, video conferencing, and content distribution networks, both video and audio.
- Therefore, the importance of a single-source multiple-destination data transport mechanism is growing, since it is the paramount transport mechanism for content delivery.
- In the absence of efficient multicast, this data is distributed using unicast connections, which creates redundancy in the network layer.
Single source multiple destination transport mechanism becomes fundamental!
At present, the Internet does not provide an efficient multi-point transport mechanism
“Datagram routing for internet multicasting”, L. Aguilar, 1984 – explicit list of destinations in the IP header
“Host groups: A multicast extension for datagram internetworks”, D. Cheriton and S. Deering, 1985 – destination address denotes a group of hosts
“A case for end system multicast”, Y.-H. Chu et al., 2000 – application layer multicast
Presenter
Presentation Notes
The problem of efficient single-source multiple-destination transfer was also addressed at the application layer. However, this class of solutions only distributes the transmission burden among a server and the receivers; it does not avoid redundancy at the network layer.
A server transmitting the same data to multiple destinations is wasting Internet resources: the same data traverses the same path multiple times.
[Figure: server S sends the same payload P to destinations A, B, C, and D]
Consider two packets A and B that carry the same content and traverse the same few hops
[Animation over several slides: packet A's payload P is stored at each hop it traverses; packet B, carrying the same payload, is truncated to its header and P is restored from the hop caches]
In practice:
How do we determine whether a packet payload is in the next hop's cache?
How do we compare packet payloads?
What size should the cache be?
Network elements:
Link – a medium transporting packets
Router – switches data packets to links
Presenter
Presentation Notes
The fundamentals of our design.
Link: a logical point-to-point connection. Highly robust and very deterministic; throughput is limited per bit [bps]. It is beneficial to avoid redundant payload transmission over a link.
Router: a switching node. It performs three elementary tasks per packet:
▪ TTL update
▪ Checksum recalculation
▪ Destination IP address lookup
Throughput is limited per packet [pps]; switching packets with redundant payload does not impact the router performance.
Presenter
Presentation Notes
Switching packets with redundant payload does not impact the router performance, as long as switching does not involve moving the data. If the packet data must be moved during the switching process, the packet size impacts the router performance. Such data moves do not occur in routers based on a shared memory architecture.
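Two of these per-packet tasks are cheap enough to show concretely. As a hedged aside (an illustration, not CacheCast code), the classic RFC 1141-style incremental update patches the IPv4 checksum after a TTL decrement without recomputing it over the whole header:

    #include <stdint.h>
    #include <arpa/inet.h>
    #include <netinet/ip.h>

    /* Decrement the TTL and patch the IPv4 header checksum incrementally
     * (RFC 1141 style). The TTL is the high byte of its 16-bit word, so
     * that word decreases by 0x0100 and the one's-complement checksum
     * grows by the same amount; the shift folds the end-around carry. */
    static void ip_decrement_ttl(struct iphdr *ip)
    {
        uint32_t sum;

        ip->ttl--;
        sum = ip->check + htons(0x0100);
        ip->check = (uint16_t)(sum + (sum >> 16));
    }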
Caching is done on a per-link basis. The Cache Management Unit (CMU) removes payloads that are already stored at the link exit; the Cache Store Unit (CSU) restores payloads from a local cache.
Presenter
Presentation Notes
This is the first fundamental design decision: cache packets on a per-link basis, which results in a very simple caching mechanism.
Link cache processing must be simple: there are ~72 ns to process a minimum-size packet on a 10 Gbps link, while a modern memory r/w cycle takes ~3-10 ns.
Link cache size must be minimised: at present, a link queue is scaled to 250 ms of the link traffic; for a 10 Gbps link that is already 315 MB. Difficult to build!
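A quick sanity check of both figures (assuming a 64 B minimum Ethernet frame plus the 8 B preamble and 12 B inter-frame gap):

$\frac{(64 + 8 + 12)\,\mathrm{B} \times 8}{10\,\mathrm{Gbps}} \approx 67\,\mathrm{ns}$ per packet, in line with the ~72 ns above; and $10\,\mathrm{Gbps} \times 250\,\mathrm{ms} / 8 \approx 312.5\,\mathrm{MB} \approx 315\,\mathrm{MB}$ of queue memory.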
A source of redundant data must support link caches!
1. A server can transmit packets carrying the same data within a minimum time interval.
2. A server can mark its redundant traffic.
3. A server can provide additional information that simplifies link cache processing.
A CacheCast packet carries an extension header describing the packet payload: payload ID, payload size, and index. Only packets with the header are cached.
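A minimal sketch of how such an extension header might be laid out in C; the field names and widths here are assumptions for illustration, not the published CacheCast wire format:

    #include <stdint.h>

    /* Hypothetical CacheCast extension header: describes the payload so
     * that link caches never need to inspect the payload bytes. */
    struct cachecast_hdr {
        uint64_t payload_id;    /* identifier of the payload carried by the train */
        uint16_t payload_size;  /* payload length in bytes                        */
        uint16_t index;         /* slot in the next hop's cache store (CSU)       */
    } __attribute__((packed));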
Packet train: only the first packet carries the payload; the remaining packets are truncated to the header.
Packet train duration time: it is sufficient to hold a payload in the CSU for the packet train duration. What is the maximum packet train duration time?
Back-of-the-envelope calculations: ~10 ms caches are sufficient.
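The slide leaves out the arithmetic; assuming the cache, like the queue, is scaled to the link traffic over its hold time, a 10 ms cache on a 10 Gbps link needs only

$10\,\mathrm{Gbps} \times 10\,\mathrm{ms} / 8 = 12.5\,\mathrm{MB}$

versus the ~315 MB of a 250 ms queue.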
Two components of the CacheCast system:
Server support
A distributed infrastructure of small link caches
[Figure: server S sends a packet train towards destinations A-D; CMU/CSU pairs on the links carry the payload P only once]
• Cache miss
[Figure: CMU table and CSU cache store, indices 0-2. The arriving packet's payload ID P1 is not in the CMU table, so the payload is forwarded in full and written into the cache store]
• Cache hit
[Figure: the arriving packet's payload ID P2 is present in the CMU table, so the CMU strips the payload and the CSU restores it from the cache store]
• Cache miss
[Figure: payload P1 is not in the CMU table; it is assigned index 0, the table and the cache store are updated, and the full packet is forwarded]
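The walkthrough above can be condensed into a small sketch of the per-packet CMU decision. This is a hedged illustration assuming a FIFO-managed table that mirrors the CSU slots; the names and sizes are illustrative only (the real CacheCast elements are Click modules):

    #include <stdint.h>

    #define CACHE_SLOTS 3  /* the walkthrough uses indices 0-2 */

    struct cmu_table {
        uint64_t payload_id[CACHE_SLOTS];  /* payload IDs mirrored from the CSU */
        unsigned next;                     /* next slot to overwrite (FIFO)     */
    };

    /* Decide what happens to a packet carrying payload `pid`: on a hit the
     * payload is already cached downstream, so the caller truncates the
     * packet to header + index; on a miss the payload travels in full and
     * is stored at the returned slot. */
    static int cmu_process(struct cmu_table *t, uint64_t pid, int *hit)
    {
        for (unsigned i = 0; i < CACHE_SLOTS; i++) {
            if (t->payload_id[i] == pid) {
                *hit = 1;
                return (int)i;
            }
        }
        unsigned slot = t->next;
        t->next = (t->next + 1) % CACHE_SLOTS;
        t->payload_id[slot] = pid;
        *hit = 0;
        return (int)slot;
    }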
What can go wrong?
• Cache hit
[Figure: the CMU table still lists payload P1, but the corresponding cache store slot has meanwhile been overwritten, so the truncated packet can no longer be restored correctly]
How to protect against this error?
Tasks: batch transmissions of the same data to multiple destinations; build the CacheCast headers; transmit packets in the form of a packet train.
One system call to transmit data to all destinations: msend()
msend() system call: implemented in Linux, simple API.

int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len)

fds_write – a set of file descriptors representing connections to data clients
fds_written – a set of file descriptors representing connections to clients that the data was transmitted to
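A hedged usage sketch, assuming connected sockets collected into an fd_set (error handling omitted); send_chunk_to_all is an illustrative helper, not part of the API:

    #include <sys/select.h>

    int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len);

    /* Send one chunk to every connected client with a single system call.
     * On return, `written` holds the descriptors the data was sent to. */
    void send_chunk_to_all(fd_set *clients, char *buf, int len)
    {
        fd_set written;

        FD_ZERO(&written);
        msend(clients, &written, buf, len);
    }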
Presenter
Presentation Notes
A break to digest the knowledge
OS network stack Connection endpoints represented as sockets Transport layer (e.g. TCP, UDP, or DCCP) Network layer (e.g. IP) Link layer (e.g. Ethernet) Network card driver
msend() execution
Two aspects of the CacheCast system:
I. Efficiency – how much redundancy does CacheCast remove?
II. Computational complexity – can CacheCast be implemented efficiently with present technology?
CacheCast and ‘perfect multicast’: ‘perfect multicast’ delivers data to multiple destinations without any overhead. CacheCast overheads:
I. A unique packet header per destination
II. Finite link cache size, resulting in payload retransmissions
III. Partial deployment
Presenter
Presentation Notes
In order to evaluate CacheCast's efficiency, we compared it with ‘perfect multicast’, by which we mean a system that delivers the same data to multiple destinations without incurring any overhead. ‘Perfect multicast’ thus marks the maximum efficiency. We relate CacheCast to ‘perfect multicast’ and measure how close it is to the optimum. Using analysis and simulation, we assessed these overheads as follows.
The metric expresses the reduction in traffic volume:

$\delta_m = 1 - \frac{L_m}{L_u}$

$L_m$ – the total amount of multicast links
$L_u$ – the total amount of unicast links

Example: $L_u = 9$, $L_m = 5$, so $\delta_m = 1 - \frac{5}{9} \approx 44\%$
Presenter
Presentation Notes
We use the following metric to compare the ‘perfect multicast’ and CacheCast systems. The efficiency metric shows the reduction in network traffic when using multicast transmission instead of unicast transmission. It is one minus the total amount of multicast links divided by the total amount of unicast links used to deliver the same data to multiple destinations. The closer the efficiency is to 1, the bigger the gains from using multicast; when the metric approaches 0, there is no benefit from multicast. Resource consumption is measured as the total number of links that a data chunk traverses to reach all destinations; the metric shows the reduction in resource consumption when using multicast instead of unicast.
A CacheCast packet has a unicast header part ($s_h$) and a multicast payload part ($s_p$). Thus:

$\delta_{CC} = 1 - \frac{s_h L_u + s_p L_m}{(s_h + s_p)\,L_u}$

Combining this with $\delta_m = 1 - \frac{L_m}{L_u}$ gives:

$\delta_{CC} = \frac{1}{1 + r}\,\delta_m, \qquad r = \frac{s_h}{s_p}$

E.g.: using packets where $s_p$ = 1436 B and $s_h$ = 64 B, CacheCast achieves 96% of the ‘perfect multicast’ efficiency.
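Checking the arithmetic behind the 96% figure: $r = \frac{64}{1436} \approx 0.045$, so $\frac{1}{1 + r} \approx 0.957$, i.e. $\delta_{CC} \approx 0.96\,\delta_m$.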
Presenter
Presentation Notes
When we apply this metric to CacheCast, we have to take into account that in the CacheCast system packet headers are unicast while packet payloads are multicast which is reflected in the modified efficiency metric.
Single link cache efficiency is related to the amount of redundancy that is removed. Traffic volumes:

$\eta = 1 - \frac{V_c}{V}$

$V$ – traffic volume without CacheCast
$V_c$ – traffic volume with CacheCast

For a packet train of $n$ packets:

$V = n\,(s_p + s_h), \qquad V_c = s_p + n\,s_h$
Presenter
Presentation Notes
We focus on packet trains
Link cache efficiency:

$\eta = 1 - \frac{s_p + n\,s_h}{n\,s_p + n\,s_h}$

Thus:

$\eta = \left(1 - \frac{1}{n}\right) C, \qquad C = \frac{1}{1 + r}, \quad r = \frac{s_h}{s_p}$
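As a worked example under the packet sizes used earlier ($s_p$ = 1436 B, $s_h$ = 64 B, so $r \approx 0.045$), a train to $n = 100$ destinations yields

$\eta = \left(1 - \frac{1}{100}\right) \cdot \frac{1}{1.045} \approx 0.95$

i.e. a single link cache removes roughly 95% of the bytes that plain unicast would carry redundantly.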
Presenter
Presentation Notes
Consider a transmission to 100 destinations: the link cache efficiency does not decrease much even when the payload is retransmitted over a link every 8th packet.
The more destinations, the higher the efficiency. E.g. 512 Kbps – 8 headers in 10 ms, e.g. 12 destinations. Slow sources transmitting to many destinations cannot achieve the maximum efficiency.
[Figure: a packet train in which the payload P must be retransmitted part-way through the train]
System efficiency δm for 10 ms caches
[Animation, steps I-IV]
Presenter
Presentation Notes
Packet trains spread out in time…
How could we improve?
CMU and CSU deployed partially
[Animation: caches are deployed hop by hop (1, 2, … 6) outward from the server S]
Considering unique packet headers: CacheCast can achieve up to 96% of the ‘perfect multicast’ efficiency.
Considering finite cache size: 10 ms link caches can remove most of the redundancy generated by fast sources.
Considering partial deployment: CacheCast deployed over the first five hops from a server already achieves half of the maximum efficiency.
Presenter
Presentation Notes
The impact of partial deployment will not be discussed here.
Computational complexity may render CacheCast inefficient
Implementations:
Link cache elements – implemented as processing elements in the Click Modular Router software
Server support – a Linux system call and an auxiliary shell command tool
CacheCast can be deployed as a software update (Click Modular Router software).
[Figure: CacheCast router model]
Presenter
Presentation Notes
We measure how these two routers perform while forwarding traffic with redundant data. The traffic consists of packet trains of different sizes, carrying payloads of different lengths.
Router configuration: CSU – first element, CMU – last element.
Packet drops occur at the output link queue, i.e. before CMU processing.
Due to the CSU and CMU elements, the CacheCast router cannot forward packet trains at line rate.
When compared with a standard router, the CacheCast router can forward more data.
Presenter
Presentation Notes
The original router forwards traffic with redundant payloads while the CacheCast router avoids the redundant payload transmission. To understand the benefits of the CacheCast router we compare the volume of traffic forwarded by both routers within a given time unit. However, we count the CacheCast traffic volume as if it carried the redundant payloads. We refer to it as the effective throughput. This table shows the ratio of the effective throughputs of the CacheCast router and the original router.
The efficiency of the server support is related to the efficiency of the msend() system call.
Basic metric: time to transmit a single packet.
We compare the msend() system call with the standard send() system call.
Measure: time to transmit a single packet. Transmit to group sizes: 10, 100, 1000.

// Start measurement
msend(fds_write, fds_written, buf, len);
// Stop measurement

// Start measurement
for each fd in fds_write {
    send(fd, buf, len);
}
// Stop measurement

t = measured time / number of transmitted packets
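As an illustration of this methodology (not the original benchmark code), one might time the send() loop like this, assuming connected sockets and CLOCK_MONOTONIC; the msend() case is timed analogously around a single call:

    #include <time.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len);

    /* Time n send() calls and report the per-packet cost in microseconds. */
    double time_send_loop(int *fd, int n, char *buf, int len)
    {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n; i++)
            send(fd[i], buf, len, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* t = measured time / number of transmitted packets */
        return ((t1.tv_sec - t0.tv_sec) * 1e6 +
                (t1.tv_nsec - t0.tv_nsec) / 1e3) / n;
    }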
A server transmitting to 100 destinations using a loop of send() system calls vs a single msend() system call: the msend() system call outperforms the standard send() system call when transmitting to multiple destinations.
Presenter
Presentation Notes
The msend() system call's execution time does not depend on the transmitted payload size. We compare the execution times.
Paraslash audio streaming software
while (written < len) {
    size_t num = len - written;
    if (num > DCCP_MAX_BYTES_PER_WRITE)
        num = DCCP_MAX_BYTES_PER_WRITE;
    msend(&dss->client_fds, &fdw, buf, num);
    written += num;
}
/* We drop chunks that we don't manage to send */
http://paraslash.systemlinux.org/
dccp_send.c
Setup: Paraslash clients located at machines A and B gradually request an audio stream from the server S.
The original paraslash server can handle only 74 clients. The CacheCast paraslash server can handle 1020 clients and more, depending on the chunk size. Server load is reduced when using large chunks.
Internet congestion avoidance relies on communicating end-points that adjust transmission rate to the network conditions
“Packet caches on routers: the implications of universal redundant traffic elimination”, Ashok Anand, Archit Gupta, Aditya Akella, Srinivasan Seshan, and Scott Shenker, SIGCOMM’08:
Fine-grained redundancy detection ▪ 10-50% of redundancy removed
A new redundancy-aware routing protocol ▪ a further 10-25% of redundancy removed
Large caches ▪ caching ~10 s of the traffic traversing a link
IP Multicast: based on the host group model to achieve great scalability; breaks the end-to-end model of Internet communication; operates only in ‘walled gardens’.
CacheCast: only removes redundant payload transmission and preserves end-to-end connections; can achieve near-multicast bandwidth savings; is incrementally deployable; preserves fairness in the Internet; requires server support.
P. Srebrny, T. Plagemann, V. Goebel, and A. Mauthe, “CacheCast: Eliminating Redundant Link Traffic for Single Source Multiple Destination Transfer,” ICDCS’2010.
A. Anand, A. Gupta, A. Akella, S. Seshan, and S. Shenker, “Packet caches on routers: the implications of universal redundant traffic elimination,” SIGCOMM’08
J. Santos and D. Wetherall, “Increasing effective link bandwidth by suppressing replicated data,” USENIX’98