Packet Caching on Network Links - Universitetet i Oslo
Piotr Srebrny
Part I: Multicasting in the Internet Basics & critique
Part II: CacheCast Internet redundancy Packet caching systems CacheCast design CacheCast evaluation ▪ Efficiency ▪ Computational complexity ▪ Environmental impact
Presenter
Presentation Notes
In the Beginning there was the Internet, and the Internet was connectionless. It provided best-effort services, i.e. services without any guarantee on reliability, timing, or even the consistency of a datagram itself! It had two very simple tasks: to build end-to-end routes and to forward packets according to their destination addresses.
Presenter
Presentation Notes
And the Internet was multicast-less. It could only transfer a single packet to a single destination. Therefore, if a single source wanted to send the same packet to many destinations, it had to transmit it many times, which was obviously inefficient, especially if we were to build Internet radio or TV.
Presenter
Presentation Notes
Therefore, researchers thought that they would improve the Internet by allowing it to grow communication trees.
Presenter
Presentation Notes
So the first idea to enable multicast communication in the Internet was simply to attach all destination addresses to the packet header. However, the packet size is limited by the MTU of the transmission path; thus, the more addresses we put into the packet, the less space is left for the message. Eventually, there is no space for a message at all if there are too many addresses. Obviously, such an approach does not scale well with the number of receivers. In addition, this type of message requires processing on each hop.
Vera Goebel
Daniel Rodríguez Fernández
Ellen Munthe-Kaas
Kjell Åge Bringsrud
Hi…
Presenter
Presentation Notes
Meanwhile, another idea emerged. Stephen Deering proposed to use a group name in the destination field of a packet and let the Internet resolve to whom the packet should be delivered. The problem of limited space for destinations was resolved immediately! One could send a message to an unlimited number of receivers without constraints.
Hi, I would like to invite you to a presentation about IP Multicast.
Scalability was like the Holy Grail for the multicast community.
Presenter
Presentation Notes
So the desire was to have multicast that would be able to handle a huge number of receivers. It would be perfect if it behaved like broadcast TV! Not only one-to-ten or one-to-a-hundred, but one-to-a-million and beyond. That was the thinking that drove the IP Multicast design, and it still impacts the major design decisions.
DMMS Group
Presenter
Presentation Notes
Thus, the first design of IP Multicast was called Any Source Multicast. Regardless of how frightening it may sound, ASM means exactly that: any source can send a message to any group, without even being a member of that group. Moreover, anyone can listen to any group. Lay down all the good properties and the remaining goals here.
The number of available multicast channels is approx. 268 million.
Presenter
Presentation Notes
Even if a source were aware of the path capacities to all of the receivers, it would still be a pain in the neck to create a reasonable control mechanism.
Presenter
Presentation Notes
AAA – authentication, authorization, and accounting
… and after 10 years of non-random mutations
Presenter
Presentation Notes
Many PhDs were devoted to solving the above-mentioned issues. A lot of effort was put in, resulting in many scientific publications; routing protocols were actually a very good PhD topic. ASM evolved into SSM, so instead of growing complexity we see a reduction.
Presenter
Presentation Notes
What is so creative about the SSM model? SSM solves the problem of unauthorized senders, since there can be only one sender, which sits at the root of the multicast tree. Because the tree is unidirectional, there is no problem with routing algorithms, which are basically reverse path forwarding. The source can be found via a directory service located somewhere in the Internet. What does it fix when compared to ASM?
Presenter
Presentation Notes
And still, the main principle was to make it enormously scalable!! What does that entail? No control over receivers, because the number of receivers might be so huge that it is impossible to control them all! Therefore, receivers join and leave a multicast tree themselves. How do we enforce authorization? As a result, a source does not know who is listening to it. Can a source rely on the goodwill of receivers? Most of you have heard about ACK feedback implosion, feedback suppression…
DVMRP, PIM-DM, PIM-SM, PIM-SSM, MOSPF, MSDP, MBGP, BGP4+, IGMPv2, IGMPv3, IGMP Snooping, MLDv2, MLDv3, AMT, PGM, MAAS, MADCAP, MASC, CGMP, RGMP
Presenter
Presentation Notes
AMT – Automatic Multicast Tunneling
MBGP – Multiprotocol BGP Extensions for IP Multicast
PGM – Pretty Good Multicast, or Pragmatic General Multicast
CGMP – Cisco Group Management Protocol
RGMP – Router-port Group Management Protocol
MASC – Multicast Address-Set Claim
MAAS – Multicast Address Allocation Server
MADCAP – Multicast Address Dynamic Client Allocation Protocol
MSDP – Multicast Source Discovery Protocol
MOSPF – Multicast Open Shortest Path First
PIM-SM – Protocol Independent Multicast - Sparse Mode
PIM-SSM – Protocol Independent Multicast - Source-Specific Multicast
PIM-DM – Protocol Independent Multicast - Dense Mode
DVMRP – Distance Vector Multicast Routing Protocol
MLDv[23] – Multicast Listener Discovery (works with IPv6)
IGMPv[23] – Internet Group Management Protocol (works with IPv4)
IGMP Snooping – link-layer mechanism snooping IGMP messages to optimise link-layer transport
Another thing it entails is that any change or upgrade to the system must be deployed in the network, since the network is the mediator between a source and receivers. IP Multicast alone will never fulfil the task of removing redundant data transmission for single-source multiple-destination transfers. It requires support from the underlying technology, regardless of whether that is Ethernet, ATM, MPLS, or some VPN. However, this support must be provided in the way IP Multicast imposes it – i.e. a connection-oriented distribution tree. For example, the Ethernet standard, whose usage has grown recently, saw a Cisco experiment that introduced state for the multicast tree on Ethernet switches.
Part II
Internet redundancy Packet caching systems CacheCast design CacheCast evaluation Efficiency Computational complexity Environmental impact
Key facts:
- The Internet is becoming content-centric. We see the proliferation of IP TV, IP radio, video conferencing, and content distribution networks, both video and audio.
- Therefore, the importance of a single-source multiple-destination data transport mechanism is growing, since it is the paramount transport mechanism for content delivery.
- In the absence of efficient multicast, this data is distributed using unicast connections, which creates redundancy in the network layer.
Single source multiple destination transport mechanism becomes fundamental!
At present, the Internet does not provide an efficient multi-point transport mechanism
“Datagram routing for internet multicasting”, L. Aguilar, 1984 – explicit list of destinations in the IP header
“Host groups: A multicast extension for datagram internetworks”, D. Cheriton and S. Deering, 1985 – destination address denotes a group of hosts
“A case for end system multicast”, Y.-H. Chu et al., 2000 – application layer multicast
Presenter
Presentation Notes
The problem of efficient single-source multiple-destination transfer was also addressed at the application layer. However, this class of solutions only distributes the transmission burden among a server and the receivers; it does not avoid redundancy at the network layer.
A server transmitting the same data to multiple destinations is wasting Internet resources: the same data traverses the same path multiple times.
[Figure: server S sends the same payload P to destinations A, B, C, and D]
Consider two packets A and B that carry the same content and traverse the same few hops
[Animation over several slides: packet A's payload P is stored at each hop it traverses; packet B, carrying the same payload, is truncated to its header and P is restored from the hop caches]
In practice:
How do we determine whether a packet payload is in the next hop's cache?
How do we compare packet payloads?
What size should the cache be?
Network elements:
Link – a medium transporting packets
Router – switches data packets to links
Presenter
Presentation Notes
The fundamentals of our design.
Link: a logical point-to-point connection. Highly robust and very deterministic; throughput is limited per bit [bps]. It is beneficial to avoid redundant payload transmission over a link.
Router: a switching node. It performs three elementary tasks per packet:
▪ TTL update
▪ Checksum recalculation
▪ Destination IP address lookup
Throughput is limited per packet [pps]; switching packets with redundant payload does not impact the router performance.
Presenter
Presentation Notes
Switching packets with redundant payload does not impact the router performance, as long as switching does not involve moving the data. If the packet data must be moved during the switching process, the packet size impacts the router performance. Such data moves do not occur in routers based on a shared memory architecture.
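Two of these per-packet tasks are cheap enough to show concretely. As a hedged aside (an illustration, not CacheCast code), the classic RFC 1141-style incremental update patches the IPv4 checksum after a TTL decrement without recomputing it over the whole header:

    #include <stdint.h>
    #include <arpa/inet.h>
    #include <netinet/ip.h>

    /* Decrement the TTL and patch the IPv4 header checksum incrementally
     * (RFC 1141 style). The TTL is the high byte of its 16-bit word, so
     * that word decreases by 0x0100 and the one's-complement checksum
     * grows by the same amount; the shift folds the end-around carry. */
    static void ip_decrement_ttl(struct iphdr *ip)
    {
        uint32_t sum;

        ip->ttl--;
        sum = ip->check + htons(0x0100);
        ip->check = (uint16_t)(sum + (sum >> 16));
    }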
Caching is done on a per-link basis. The Cache Management Unit (CMU) removes payloads that are already stored at the link exit; the Cache Store Unit (CSU) restores payloads from a local cache.
Presenter
Presentation Notes
This is the first fundamental design decision: cache packets on a per-link basis, which results in a very simple caching mechanism.
Link cache processing must be simple: there are ~72 ns to process a minimum-size packet on a 10 Gbps link, while a modern memory r/w cycle takes ~3-10 ns.
Link cache size must be minimised: at present, a link queue is scaled to 250 ms of the link traffic; for a 10 Gbps link that is already 315 MB. Difficult to build!
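A quick sanity check of both figures (assuming a 64 B minimum Ethernet frame plus the 8 B preamble and 12 B inter-frame gap):

$\frac{(64 + 8 + 12)\,\mathrm{B} \times 8}{10\,\mathrm{Gbps}} \approx 67\,\mathrm{ns}$ per packet, in line with the ~72 ns above; and $10\,\mathrm{Gbps} \times 250\,\mathrm{ms} / 8 \approx 312.5\,\mathrm{MB} \approx 315\,\mathrm{MB}$ of queue memory.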
A source of redundant data must support link caches!
1. A server can transmit packets carrying the same data within a minimum time interval.
2. A server can mark its redundant traffic.
3. A server can provide additional information that simplifies link cache processing.
A CacheCast packet carries an extension header describing the packet payload: payload ID, payload size, and index. Only packets with the header are cached.
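A minimal sketch of how such an extension header might be laid out in C; the field names and widths here are assumptions for illustration, not the published CacheCast wire format:

    #include <stdint.h>

    /* Hypothetical CacheCast extension header: describes the payload so
     * that link caches never need to inspect the payload bytes. */
    struct cachecast_hdr {
        uint64_t payload_id;    /* identifier of the payload carried by the train */
        uint16_t payload_size;  /* payload length in bytes                        */
        uint16_t index;         /* slot in the next hop's cache store (CSU)       */
    } __attribute__((packed));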
Packet train: only the first packet carries the payload; the remaining packets are truncated to the header.
Packet train duration time: it is sufficient to hold a payload in the CSU for the packet train duration. What is the maximum packet train duration time?
Back-of-the-envelope calculations: ~10 ms caches are sufficient.
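The slide leaves out the arithmetic; assuming the cache, like the queue, is scaled to the link traffic over its hold time, a 10 ms cache on a 10 Gbps link needs only

$10\,\mathrm{Gbps} \times 10\,\mathrm{ms} / 8 = 12.5\,\mathrm{MB}$

versus the ~315 MB of a 250 ms queue.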
Two components of the CacheCast system:
Server support
A distributed infrastructure of small link caches
[Figure: server S sends a packet train towards destinations A-D; CMU/CSU pairs on the links carry the payload P only once]
• Cache miss
[Figure: CMU table and CSU cache store, indices 0-2. The arriving packet's payload ID P1 is not in the CMU table, so the payload is forwarded in full and written into the cache store]
• Cache hit
[Figure: the arriving packet's payload ID P2 is present in the CMU table, so the CMU strips the payload and the CSU restores it from the cache store]
• Cache miss
[Figure: payload P1 is not in the CMU table; it is assigned index 0, the table and the cache store are updated, and the full packet is forwarded]
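The walkthrough above can be condensed into a small sketch of the per-packet CMU decision. This is a hedged illustration assuming a FIFO-managed table that mirrors the CSU slots; the names and sizes are illustrative only (the real CacheCast elements are Click modules):

    #include <stdint.h>

    #define CACHE_SLOTS 3  /* the walkthrough uses indices 0-2 */

    struct cmu_table {
        uint64_t payload_id[CACHE_SLOTS];  /* payload IDs mirrored from the CSU */
        unsigned next;                     /* next slot to overwrite (FIFO)     */
    };

    /* Decide what happens to a packet carrying payload `pid`: on a hit the
     * payload is already cached downstream, so the caller truncates the
     * packet to header + index; on a miss the payload travels in full and
     * is stored at the returned slot. */
    static int cmu_process(struct cmu_table *t, uint64_t pid, int *hit)
    {
        for (unsigned i = 0; i < CACHE_SLOTS; i++) {
            if (t->payload_id[i] == pid) {
                *hit = 1;
                return (int)i;
            }
        }
        unsigned slot = t->next;
        t->next = (t->next + 1) % CACHE_SLOTS;
        t->payload_id[slot] = pid;
        *hit = 0;
        return (int)slot;
    }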
What can go wrong?
• Cache hit
[Figure: the CMU table still lists payload P1, but the corresponding cache store slot has meanwhile been overwritten, so the truncated packet can no longer be restored correctly]
How to protect against this error?
Tasks: batch transmissions of the same data to multiple destinations; build the CacheCast headers; transmit packets in the form of a packet train.
One system call to transmit data to all destinations: msend()
msend() system call: implemented in Linux, simple API.

int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len)

fds_write – a set of file descriptors representing connections to data clients
fds_written – a set of file descriptors representing connections to clients that the data was transmitted to
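A hedged usage sketch, assuming connected sockets collected into an fd_set (error handling omitted); send_chunk_to_all is an illustrative helper, not part of the API:

    #include <sys/select.h>

    int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len);

    /* Send one chunk to every connected client with a single system call.
     * On return, `written` holds the descriptors the data was sent to. */
    void send_chunk_to_all(fd_set *clients, char *buf, int len)
    {
        fd_set written;

        FD_ZERO(&written);
        msend(clients, &written, buf, len);
    }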
Presenter
Presentation Notes
A break to digest the knowledge
OS network stack Connection endpoints represented as sockets Transport layer (e.g. TCP, UDP, or DCCP) Network layer (e.g. IP) Link layer (e.g. Ethernet) Network card driver
msend() execution
Two aspects of the CacheCast system:
I. Efficiency – how much redundancy does CacheCast remove?
II. Computational complexity – can CacheCast be implemented efficiently with present technology?
CacheCast and ‘perfect multicast’: ‘perfect multicast’ delivers data to multiple destinations without any overhead. CacheCast overheads:
I. A unique packet header per destination
II. Finite link cache size, resulting in payload retransmissions
III. Partial deployment
Presenter
Presentation Notes
In order to evaluate CacheCast's efficiency, we compared it with ‘perfect multicast’, by which we mean a system that delivers the same data to multiple destinations without incurring any overhead. ‘Perfect multicast’ thus marks the maximum efficiency. We relate CacheCast to ‘perfect multicast’ and measure how close it is to the optimum. Using analysis and simulation, we assessed these overheads as follows.
The metric expresses the reduction in traffic volume:

$\delta_m = 1 - \frac{L_m}{L_u}$

$L_m$ – the total amount of multicast links
$L_u$ – the total amount of unicast links

Example: $L_u = 9$, $L_m = 5$, so $\delta_m = 1 - \frac{5}{9} \approx 44\%$
Presenter
Presentation Notes
We use the following metric to compare the ‘perfect multicast’ and CacheCast systems. The efficiency metric shows the reduction in network traffic when using multicast transmission instead of unicast transmission. It is one minus the total amount of multicast links divided by the total amount of unicast links used to deliver the same data to multiple destinations. The closer the efficiency is to 1, the bigger the gains from using multicast; when the metric approaches 0, there is no benefit from multicast. Resource consumption is measured as the total number of links that a data chunk traverses to reach all destinations; the metric shows the reduction in resource consumption when using multicast instead of unicast.
A CacheCast packet has a unicast header part ($s_h$) and a multicast payload part ($s_p$). Thus:

$\delta_{CC} = 1 - \frac{s_h L_u + s_p L_m}{(s_h + s_p)\,L_u}$

Combining this with $\delta_m = 1 - \frac{L_m}{L_u}$ gives:

$\delta_{CC} = \frac{1}{1 + r}\,\delta_m, \qquad r = \frac{s_h}{s_p}$

E.g.: using packets where $s_p$ = 1436 B and $s_h$ = 64 B, CacheCast achieves 96% of the ‘perfect multicast’ efficiency.
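Checking the arithmetic behind the 96% figure: $r = \frac{64}{1436} \approx 0.045$, so $\frac{1}{1 + r} \approx 0.957$, i.e. $\delta_{CC} \approx 0.96\,\delta_m$.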
Presenter
Presentation Notes
When we apply this metric to CacheCast, we have to take into account that in the CacheCast system packet headers are unicast while packet payloads are multicast which is reflected in the modified efficiency metric.
Single link cache efficiency is related to the amount of redundancy that is removed. Traffic volumes:

$\eta = 1 - \frac{V_c}{V}$

$V$ – traffic volume without CacheCast
$V_c$ – traffic volume with CacheCast

For a packet train of $n$ packets:

$V = n\,(s_p + s_h), \qquad V_c = s_p + n\,s_h$
Presenter
Presentation Notes
We focus on packet trains
Link cache efficiency:

$\eta = 1 - \frac{s_p + n\,s_h}{n\,s_p + n\,s_h}$

Thus:

$\eta = \left(1 - \frac{1}{n}\right) C, \qquad C = \frac{1}{1 + r}, \quad r = \frac{s_h}{s_p}$
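As a worked example under the packet sizes used earlier ($s_p$ = 1436 B, $s_h$ = 64 B, so $r \approx 0.045$), a train to $n = 100$ destinations yields

$\eta = \left(1 - \frac{1}{100}\right) \cdot \frac{1}{1.045} \approx 0.95$

i.e. a single link cache removes roughly 95% of the bytes that plain unicast would carry redundantly.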
Presenter
Presentation Notes
Consider a transmission to 100 destinations: the link cache efficiency does not decrease much even when the payload is retransmitted over a link every 8th packet.
The more destinations, the higher the efficiency. E.g. 512 Kbps – 8 headers in 10 ms, e.g. 12 destinations. Slow sources transmitting to many destinations cannot achieve the maximum efficiency.
[Figure: a packet train in which the payload P must be retransmitted part-way through the train]
System efficiency δm for 10 ms caches
[Animation, steps I-IV]
Presenter
Presentation Notes
Packet trains spread out in time…
How could we improve?
CMU and CSU deployed partially
[Animation: caches are deployed hop by hop (1, 2, … 6) outward from the server S]
Considering unique packet headers: CacheCast can achieve up to 96% of the ‘perfect multicast’ efficiency.
Considering finite cache size: 10 ms link caches can remove most of the redundancy generated by fast sources.
Considering partial deployment: CacheCast deployed over the first five hops from a server already achieves half of the maximum efficiency.
Presenter
Presentation Notes
The impact of partial deployment will not be discussed here.
Computational complexity may render CacheCast inefficient
Implementations:
Link cache elements – implemented as processing elements in the Click Modular Router software
Server support – a Linux system call and an auxiliary shell command tool
CacheCast can be deployed as a software update (Click Modular Router software).
[Figure: CacheCast router model]
Presenter
Presentation Notes
We measure how these two routers perform while forwarding traffic with redundant data. The traffic consists of packet trains of different sizes, carrying payloads of different lengths.
Router configuration: CSU – first element, CMU – last element.
Packet drops occur at the output link queue, i.e. before CMU processing.
Due to the CSU and CMU elements, the CacheCast router cannot forward packet trains at line rate.
When compared with a standard router, the CacheCast router can forward more data.
Presenter
Presentation Notes
The original router forwards traffic with redundant payloads while the CacheCast router avoids the redundant payload transmission. To understand the benefits of the CacheCast router we compare the volume of traffic forwarded by both routers within a given time unit. However, we count the CacheCast traffic volume as if it carried the redundant payloads. We refer to it as the effective throughput. This table shows the ratio of the effective throughputs of the CacheCast router and the original router.
The efficiency of the server support is related to the efficiency of the msend() system call.
Basic metric: time to transmit a single packet.
We compare the msend() system call with the standard send() system call.
Measure: time to transmit a single packet. Transmit to group sizes: 10, 100, 1000.

// Start measurement
msend(fds_write, fds_written, buf, len);
// Stop measurement

// Start measurement
for each fd in fds_write {
    send(fd, buf, len);
}
// Stop measurement

t = measured time / number of transmitted packets
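As an illustration of this methodology (not the original benchmark code), one might time the send() loop like this, assuming connected sockets and CLOCK_MONOTONIC; the msend() case is timed analogously around a single call:

    #include <time.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int msend(fd_set *fds_write, fd_set *fds_written, char *buf, int len);

    /* Time n send() calls and report the per-packet cost in microseconds. */
    double time_send_loop(int *fd, int n, char *buf, int len)
    {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n; i++)
            send(fd[i], buf, len, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* t = measured time / number of transmitted packets */
        return ((t1.tv_sec - t0.tv_sec) * 1e6 +
                (t1.tv_nsec - t0.tv_nsec) / 1e3) / n;
    }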
A server transmitting to 100 destinations using a loop of send() system calls vs a single msend() system call: the msend() system call outperforms the standard send() system call when transmitting to multiple destinations.
Presenter
Presentation Notes
The msend() system call's execution time does not depend on the transmitted payload size. We compare the execution times.
Paraslash audio streaming software
while (written < len) {
    size_t num = len - written;
    if (num > DCCP_MAX_BYTES_PER_WRITE)
        num = DCCP_MAX_BYTES_PER_WRITE;
    msend(&dss->client_fds, &fdw, buf, num);
    written += num;
}
/* We drop chunks that we don't manage to send */
http://paraslash.systemlinux.org/
dccp_send.c
Setup: Paraslash clients located at machines A and B gradually request an audio stream from the server S.
The original paraslash server can handle only 74 clients. The CacheCast paraslash server can handle 1020 clients and more, depending on the chunk size. Server load is reduced when using large chunks.
Internet congestion avoidance relies on communicating end-points that adjust transmission rate to the network conditions
“Packet caches on routers: the implications of universal redundant traffic elimination”, Ashok Anand, Archit Gupta, Aditya Akella, Srinivasan Seshan, and Scott Shenker, SIGCOMM’08:
Fine-grained redundancy detection ▪ 10-50% of redundancy removed
A new redundancy-aware routing protocol ▪ a further 10-25% of redundancy removed
Large caches ▪ caching ~10 s of the traffic traversing a link
IP Multicast: based on the host group model to achieve great scalability; breaks the end-to-end model of Internet communication; operates only in ‘walled gardens’.
CacheCast: only removes redundant payload transmission and preserves end-to-end connections; can achieve near-multicast bandwidth savings; is incrementally deployable; preserves fairness in the Internet; requires server support.
P. Srebrny, T. Plagemann, V. Goebel, and A. Mauthe, “CacheCast: Eliminating Redundant Link Traffic for Single Source Multiple Destination Transfer,” ICDCS’2010.
A. Anand, A. Gupta, A. Akella, S. Seshan, and S. Shenker, “Packet caches on routers: the implications of universal redundant traffic elimination,” SIGCOMM’08
J. Santos and D. Wetherall, “Increasing effective link bandwidth by suppressing replicated data,” USENIX’98