Design and Implementation of Dynamic
Routing Tables
Author: Dung-Jiun Lin Advisor: Yeim-Kuan Chang
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan, R.O.C.
Abstract
In the last few years, various schemes for high-performance IP address lookup have been
proposed. These schemes can be broadly classified into two categories: schemes that use
precomputation to build static routing tables, and schemes that build dynamic routing
tables. Precomputation usually simplifies the data structure of the routing table and thus
improves the lookup speed and reduces the memory requirement. Its disadvantage is that when
a single prefix is added or deleted, the entire data structure may need to be rebuilt.
Rebuilding the routing table seriously degrades the update performance of a backbone router.
Schemes that build the routing table dynamically are therefore better suited to such routers.
In this thesis, we develop a data structure called the Most Specific Prefix Tree (MSPT)
that is suitable for dynamic routing tables. MSPT is a balanced binary search tree built
from the most specific prefixes, that is, the prefixes that do not cover any other prefix in
the routing table. The remaining (non-most-specific) prefixes are allocated to the enclosure
sets of the most specific prefix nodes in MSPT. Based on MSPT, the search, insertion, and
deletion operations can be performed in O(log N) time, where N is the number of prefixes
in the routing table. We compare MSPT with schemes that also build dynamic routing tables,
such as PBOB (prefix binary tree on binary tree) and MRT (multiway range tree), and with
several precomputation-based schemes. MSPT performs better than PBOB and MRT, and its lookup
speed and memory requirement are close to those of the precomputation-based schemes.
Moreover, the proposed scheme scales well to IPv6 and to large routing tables.
Keywords: IP address lookup, dynamic routing table, precomputation
Table of Contents
Chapter 1 Introduction
  1.1 Motivation
  1.2 Overview of the Thesis
Chapter 2 Background
  2.1 The Challenges to IP Address Lookup
  2.2 IPv6 Addressing Architecture
    2.2.1 IPv6 Address Syntax
    2.2.2 Type of IPv6 Address
      2.2.2.1 Unicast Address
      2.2.2.2 Anycast Address
      2.2.2.3 Multicast Address
Chapter 3 Review of the Previous Works
  3.1 Trie base scheme
    3.1.1 1-bit ( Binary ) trie
    3.1.2 Patricia Trie
    3.1.3 Multibit Trie
    3.1.4 Level Compressed Trie
  3.2 End-Point Array Scheme
    3.2.1 IP Lookups Using Multiway and Multicolumn Search
    3.2.2 An Efficient IP Routing Lookup by Using Routing Interval
  3.3 Sets of Equal-Length Prefixes Scheme
    3.3.1 Scalable high-speed IP routing lookups
  3.4 Search Tree Base Scheme
    3.4.1 Enhanced Interval Tree for Dynamic IP Router-Tables
    3.4.2 An O(log n) Dynamic Router-Table Design
    3.4.3 Multiway Range Tree
  3.5 Hybrid Base Scheme
    3.5.1 Lulea compressed trie
    3.5.2 Huang’s Compact Algorithms
  3.6 Summary
Chapter 4 Propose IP Lookup Scheme
  4.1 Preliminaries
  4.2 Most Specific Prefix Tree – MSPT
    4.2.1 Most Specific Prefix Tree
    4.2.2 Data Structure for MSPT
  4.3 Finding the Longest Prefix Match
  4.4 Update
    4.4.1 Inserting a Prefix
    4.4.2 Deleting a Prefix
    4.4.3 Rotations Problem
  4.5 Enhance Our IP Address Lookup Scheme
  4.6 Migrate to IPv6
Chapter 5 Performance
  5.1 Simulation Environment
  5.2 Simulation Result for IPv4
  5.3 Simulation Result for IPv6
Chapter 6 Conclusion
References
List of Tables
Table 3.1 The representation of the LC-trie in Figure 3.4(c).
Table 3.2 Summary for precomputation-based IP address lookup schemes.
Table 3.3 Summary for non precomputation-based IP address lookup schemes.
Table 4.1 A sample prefix set.
Table 4.2 Data structure analysis for MSPT.
Table 5.1 BGP routing tables.
Table 5.2 The statistics of memory requirement (in KB) for IPv4.
Table 5.3 Data structure analysis for non precomputation-based schemes.
Table 5.4 The statistics of search time (in Microsecond) for IPv4.
Table 5.5 The statistics for 16-bits segmentation table.
Table 5.6 The statistics of update time (in Microsecond) for IPv4.
Table 5.7 The statistics of memory requirement (in KB) for IPv6.
Table 5.8 Data structure analysis for IPv6 prefix databases.
Table 5.9 The statistics of search and update time (in Microsecond) for IPv6.
List of Figures
Figure 2.1 An IP Address Lookup example.
Figure 2.2 Aggregatable global unicast address format.
Figure 2.3 The three-level structure of an aggregatable global unicast address.
Figure 2.4 The IPv6 multicast address format.
Figure 3.1 A 1-bit trie example.
Figure 3.2 A Patricia Trie example.
Figure 3.3 An example multibit trie of 3-bit stride.
Figure 3.4 (a) Binary trie. (b) The Path-Compressed version of (a). (c) The level-compressed version of (b).
Figure 3.5 (a) Representation of the prefixes and ranges. (b) End point array.
Figure 3.6 Routing interval example.
Figure 3.7 Graphical representation of binary search on prefix length.
Figure 3.8 (a) A possible PTST. (b) An example RST for range(20).
Figure 3.9 The bit vector for range(20).
Figure 3.10 (a) Base interval tree (BIT). (b)~(f) prefix tree for P1~P5.
Figure 3.11 An example of multiway range tree.
Figure 3.12 Basic concept of Huang’s scheme.
Figure 4.1 The relationships for a set of prefixes.
Figure 4.2 A binary trie for a set of prefixes.
Figure 4.3 A most specific prefix tree (MSPT) example for Table 4.1.
Figure 4.4 The enclosure set of node b in Figure 4.3.
Figure 4.5 Algorithm to find longest prefix match.
Figure 4.6 Algorithm of enclosure(x).search(d, key(x), port).
Figure 4.7 Algorithm to insert a prefix.
Figure 4.8 Algorithm to delete a prefix.
Figure 4.9 Algorithm to maintain MSPT enclosure constraint following a delete.
Figure 4.10 (a) An unbalanced MSPT after inserting a new prefix P6. (b) A rebalanced MSPT after performing a rebalancing rotation.
Figure 4.11 Correct MSPT after adjusting.
Figure 4.12 LL and RR rotations.
Figure 4.13 LR and RL rotations.
Figure 4.14 An MSPT example for IPv6.
Figure 5.1 16-bits segmentation table.
Figure 5.2 Total memory requirement (in KB) for IPv4.
Figure 5.3 Search time (in Microsecond) for IPv4.
Figure 5.4 Update time (in Microsecond) for IPv4.
Figure 5.5 Total Memory requirement (in KB) for IPv6.
Figure 5.6 The mean times of search and update for IPv6.
Chapter 1 Introduction
1.1 Motivation
Owing to the exponential growth of Internet traffic, backbone links of several gigabits per
second, such as OC-192 (10 Gbps) and OC-768 (40 Gbps), are commonly deployed. To handle such
traffic rates, backbone routers must be able to forward millions of packets per second at
each port. IP address lookup in the router is a critical task in reaching this forwarding
capability. Moreover, the Internet host count is also increasing rapidly, and the scarcity
of IPv4 addresses has led to the classless addressing scheme called Classless Inter-Domain
Routing (CIDR) [6]. With CIDR, routers aggregate forwarding information by storing address
prefixes, each of which represents a group of addresses reachable through the same
interface. Each route entry (prefix) in the routing table can have an arbitrary length from
1 to 32 bits, instead of the 8, 16, or 24 bits of the classful address scheme. When a router
receives a packet, it uses the destination address in the packet’s header to look up the
routing database. More than one route entry in the routing table may match the destination
address, so the router may have to compare the address against every route entry to
determine the longest match. The longest of the matching entries is called the longest
prefix match (LPM). IP address lookup thus becomes a longest prefix matching problem, which
makes router design even more difficult.
To design a good IP address lookup scheme, we should consider several key requirements:
lookup speed, storage requirement, scalability, and update time. We discuss each of these
requirements in turn.
• Lookup speed: To handle the increased traffic, the IP address lookup scheme should decide
quickly where each incoming packet is to be sent next. This is important so that lookup
does not become a bottleneck in the Internet.
• Storage requirement: Memory-efficient schemes can also lead to good search times because
compact data structures fit in fast but expensive static RAM (SRAM).
• Scalability: Owing to the fast growth of the Internet and increasing address needs, prefix
databases are expected to keep growing, and the address prefix length will increase
significantly with the switch to IPv6. IPv6 has been gaining wider acceptance as the
replacement for its predecessor, IPv4, and has seen early deployment in Europe, Asia,
and North America [9]. Therefore, an IP address lookup scheme must have the
capacity to handle large routing tables and longer addresses.
• Update: Currently, the Internet has a peak of a few hundred BGP updates per second.
Address lookup schemes with fast update times are therefore desirable to avoid routing
instabilities. These updates should interfere little with the normal address lookup
operation.
In the last few years, various algorithms for high-performance IP address lookup
have been proposed. The survey paper [17] classifies a large variety of routing lookup
algorithms and compares their worst-case complexities for lookup, update, and memory
references. Among them, one category of algorithms is based on the trie structure. On top of
this primitive data structure, a set of prefix compression and transformation techniques is
used either to make the whole data structure small enough to fit in a cache or to speed up
the tree traversal. IP lookup in the BSD kernel uses the Patricia data structure [16], a
variant of a compressed binary trie. This scheme takes O(W) memory accesses per lookup,
where W is the address length. LC tries [15] speed up longest prefix matching by reducing
the height of the trie. Degermark et al. [2] proposed a three-level tree structure for the
routing table. Their data structure, called the Lulea scheme, is essentially a three-level
fixed-stride trie in which trie nodes are compressed using a bitmap. Based on a two-level
variable-stride data structure, Huang’s compact algorithm [8] uses a compaction technique to
build the entire forwarding information (Compressed-Next-Hop-Array and Code-Word-Array).
This compaction technique is similar to the technique used in [2].
In [18], a binary search is conducted over a set of hash tables, where prefixes of the same
length are organized in one hash table. With this scheme, the longest prefix match can be
found in O(log W) expected time. Lampson et al. [11] proposed an IP address lookup mechanism
in which the longest prefix match is found by a simple binary search on an ordered array
that stores the end points of the ranges defined by the prefixes. This scheme finds the
longest prefix match in O(log N) time, but insertion and deletion take O(N) time, where N is
the number of prefixes in the routing table.
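The end-point idea can be illustrated with a minimal Python sketch. It is a simplified variant written for this discussion, not the exact structure of [11]: the names `prefix_range`, `build_endpoint_array`, and `lookup`, the bit-string prefix encoding, and the 5-bit sample prefix set are all assumptions. Every prefix contributes the start of its range and the address just past its end; the answer for each elementary interval is precomputed, so a lookup is one binary search, while any insertion or deletion rebuilds the array.

```python
import bisect

def prefix_range(bits, width):
    """Inclusive address range [lo, hi] covered by a bit-string prefix."""
    pad = width - len(bits)
    lo = (int(bits, 2) << pad) if bits else 0
    return lo, lo | ((1 << pad) - 1)

def build_endpoint_array(prefixes, width):
    """Precompute a sorted end-point array and the LPM of each interval.

    prefixes maps a bit-string prefix to its next hop. This naive build is
    quadratic; it only illustrates that an update forces a rebuild.
    """
    points = {0}
    for bits in prefixes:
        lo, hi = prefix_range(bits, width)
        points.add(lo)
        if hi + 1 < (1 << width):
            points.add(hi + 1)
    points = sorted(points)
    answers = []
    for p in points:
        best = None
        for bits, hop in prefixes.items():
            lo, hi = prefix_range(bits, width)
            if lo <= p <= hi and (best is None or len(bits) > len(best[0])):
                best = (bits, hop)
        answers.append(best[1] if best else None)
    return points, answers

def lookup(points, answers, addr):
    """Binary search for the interval that contains addr."""
    return answers[bisect.bisect_right(points, addr) - 1]

# Hypothetical 5-bit prefix set, for illustration only.
PREFIXES = {"": "P1", "0101": "P2", "100": "P3", "1001": "P4", "10111": "P5"}
POINTS, ANSWERS = build_endpoint_array(PREFIXES, 5)
print(lookup(POINTS, ANSWERS, 0b10011))
```

Because the set of matching prefixes cannot change inside an elementary interval, storing one answer per interval is sufficient for this sketch.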
Sahni and Kim [10] developed a data structure called a collection of red-black trees
(CRBT) that supports the three operations of a routing table (longest prefix match, prefix
insertion, and prefix deletion) in O(log N) time each. In [12], Lu and Sahni developed a
data structure called BOB (binary tree on binary tree) for dynamic routing tables. Based on
BOB, the related structures PBOB (prefix BOB) and LMPBOB (longest matching prefix BOB) were
proposed for highest-priority prefix matching and longest prefix matching, respectively. On
practical routing tables, LMPBOB and PBOB permit longest prefix matching in O(W) and
O(log N) time, respectively. Insertion and deletion take O(log N) time in both structures.
Suri et al. [21] proposed a B-tree-based data structure called the multiway range tree. This
scheme achieves the optimal lookup time of binary search and can also be updated in
logarithmic time when a prefix is inserted or deleted.
Despite the intense research of recent years, we think that a good IP address lookup scheme
should strike a balance between lookup speed, memory requirement, update time, and
scalability. Schemes such as [2], [8], [11], [15], and [18] perform a lot of precomputation
to speed up lookups and reduce the memory requirement. This precomputation may force the
entire data structure to be rebuilt when a single prefix is added or deleted, which
seriously degrades the update performance of a backbone router. Such schemes are therefore
usually not suitable for dynamic routing tables. On the other hand, schemes based on trie
data structures, such as the binary trie, the multibit trie, and the Patricia trie [16], do
not use precomputation; however, their lookup cost grows linearly with the address length,
so they do not scale well to IPv6 or to large routing tables.
Fast update is a capability that today’s IP address lookup schemes often lack. Although
[10] and [21] overcome the update problem, their complex data structures increase the
memory requirement and reduce lookup performance. In this thesis, we develop the Most
Specific Prefix Tree (MSPT), a data structure that is suitable for representing dynamic
routing tables. Based on MSPT, the search, insertion, and deletion operations can be
finished in O(log N) time for a real routing table. We compare MSPT with schemes that are
suitable for dynamic routing tables, such as PBOB [12] (prefix binary tree on binary tree)
and MRT [21] (multiway range tree), and with several precomputation-based schemes. MSPT
performs better than PBOB and MRT, and its lookup speed and memory requirement are close to
those of the precomputation-based schemes. Moreover, the proposed scheme scales well to
IPv6 and to large routing tables.
1.2 Overview of the Thesis
The rest of the thesis is organized as follows. Chapter 2 gives the background of the IP
address lookup problem. We first explain why IP address lookup is difficult in today’s
environment. Then, because we later extend the proposed scheme to the next-generation
Internet protocol, IPv6, we introduce the IPv6 address format. Chapter 3 classifies the
existing IP address lookup schemes into five categories and briefly reviews them in terms
of lookup speed, memory requirement, scalability, and update overhead. Chapter 4 describes
the basic data structure and the detailed operations (search, insert, and delete) of the
proposed IP address lookup scheme. Chapter 5 presents performance comparisons using real
routing tables. Finally, concluding remarks are given in the last chapter.
Chapter 2 Background
2.1 The Challenges to IP Address Lookup
As the Internet has evolved and grown, it faces two serious scaling problems:
• Exhaustion of the IP address space: Though the 32-bit address space of IPv4 supports
about 4 billion IP devices, the IPv4 addressing scheme is not optimal, as described in
RFC 3194 [5].
• Routing information overload: As the number of networks on the Internet has increased,
the size and growth rate of the routing tables in Internet routers have exceeded the
ability to manage them efficiently.
CIDR is a mechanism that slows the growth of routing tables and allows more efficient
allocation of IP addresses than the old class A, B, and C address scheme. CIDR allows
address aggregation at several levels, and an address is divided into two portions, the
network identifier and the host identifier. A route is written in the format
<route prefix / prefix length>, where the prefix length ranges from 1 to 32 bits for IPv4
and from 1 to 128 bits for IPv6. An IP address might match several prefixes in a routing
table. The matching prefix with the longest length is the valid route and is called the
longest prefix match (LPM). Figure 2.1 shows a simple IP address lookup example with five
prefixes. The incoming packet’s destination address (140.116.82.25) matches three entries
(entry 1: 140.0.0.0/8, entry 2: 140.116.0.0/16, and entry 3: 140.116.82.0/24). Since entry 3
(140.116.82.0/24) has the longest prefix length, the packet is forwarded through next hop
R3. Determining the longest prefix match therefore involves not only comparing the bit
pattern itself but also finding the appropriate length, which makes the IP address lookup
operation more complex and difficult.
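The lookup of Figure 2.1 can be reproduced with a short Python sketch that scans every entry and keeps the longest match. This linear scan is only a baseline that makes the LPM definition concrete; it is not the scheme proposed in this thesis. The function name `longest_prefix_match` is an assumption, while the table entries are those of Figure 2.1.

```python
import ipaddress

# Forwarding table of Figure 2.1: (prefix, next hop).
FORWARDING_TABLE = [
    ("140.0.0.0/8", "R1"),
    ("140.116.0.0/16", "R2"),
    ("140.116.82.0/24", "R3"),
    ("140.116.246.0/24", "R4"),
    ("140.118.0.0/24", "R5"),
]

def longest_prefix_match(destination, table):
    """Return (prefix, next_hop) of the longest matching entry, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        # An entry matches when the address falls inside the prefix range;
        # among the matching entries, the longest prefix length wins.
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else (str(best[0]), best[1])

print(longest_prefix_match("140.116.82.25", FORWARDING_TABLE))
```

Each lookup in this sketch costs one comparison per route entry, which is the cost that the schemes reviewed in Chapter 3 try to avoid.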
Destination address: 140.116.82.25

Forwarding table
Entry Number   Prefix             Next-Hop
1              140.0.0.0/8        R1
2              140.116.0.0/16     R2
3              140.116.82.0/24    R3
4              140.116.246.0/24   R4
5              140.118.0.0/24     R5

Figure 2.1 An IP Address Lookup example.
2.2 IPv6 Addressing Architecture
2.2.1 IPv6 Address Syntax
IPv6 uses 16-bit hexadecimal number fields separated by colons (:) to represent the
128-bit address, which makes the address representation less cumbersome. Here is an
example of a valid IPv6 address: 2001:0400:13F0:0000:0000:09C0:876A:130B. Some
types of addresses contain long sequences of zeros. To further simplify the representation
of IPv6 addresses, IPv6 uses the following conventions:
• Leading zeros in an address field are optional and can be omitted. For example, the
following address can be written in a compressed format:
2001:0400:13F0:0000:0000:09C0:876A:130B (original form)
=> 2001:400:13F0:0:0:9C0:876A:130B (compressed form)
• A pair of colons (::) represents successive fields of zeros. However, the pair of colons
is allowed only once in a valid IPv6 address.
2001:0400:13F0:0000:0000:09C0:876A:130B (original form)
=> 2001:400:13F0::9C0:876A:130B (compressed form)
The IPv6 prefix is the part of the address formed by the left-most bits, which have a
fixed value and represent the network identifier. An IPv6 prefix is written in the
IPv6-prefix/prefix-length format, just as an IPv4 prefix is written in Classless
Inter-Domain Routing (CIDR) notation.
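Both compression conventions and the prefix notation can be checked with Python's standard `ipaddress` module, as in the sketch below. The helper names `compress` and `expand` and the /48 prefix are chosen here for illustration only. Note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress

def compress(text):
    """Drop leading zeros and replace the longest run of zero fields by '::'."""
    return ipaddress.IPv6Address(text).compressed

def expand(text):
    """Restore the full eight-field form with four hex digits per field."""
    return ipaddress.IPv6Address(text).exploded

full = "2001:0400:13F0:0000:0000:09C0:876A:130B"
print(compress(full))
print(expand("2001:400:13F0::9C0:876A:130B"))

# Prefix notation: an illustrative /48 prefix that covers the example address.
prefix = ipaddress.IPv6Network("2001:400:13F0::/48")
print(ipaddress.IPv6Address(full) in prefix)
```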
2.2.2 Type of IPv6 Address
There are three major types of IPv6 addresses: unicast, anycast, and multicast.
2.2.2.1 Unicast Address
A unicast address is an address for a single interface. There are three types of unicast
addresses: aggregatable global unicast addresses, site-local unicast addresses, and
link-local unicast addresses. Site-local and link-local unicast addresses are used within a
LAN, so routers do not deal with these two kinds of addresses. Therefore, we only introduce
the format of the aggregatable global unicast address. Figure 2.2 shows the structure of an
aggregatable global unicast address. The fields in the aggregatable global unicast address are:
001      TLA ID    Res      NLA ID    SLA ID    Interface ID
3 bits   13 bits   8 bits   24 bits   16 bits   64 bits

Figure 2.2 Aggregatable global unicast address format.
• TLA ID – Top-Level Aggregation Identifier. The size of this field is 13 bits. The TLA
ID identifies the highest level in the routing hierarchy.
• RES – Bits that are reserved for future use in expanding the size of either the TLA ID
or the NLA ID.
• NLA ID – Next-Level Aggregation Identifier. The NLA ID is used to identify a
specific customer site.
• SLA ID – Site-Level Aggregation Identifier. The SLA ID is used by an individual
organization to identify subnets within its site.
• Interface ID – uses the IEEE EUI-64 identifier to indicate the interface on a specific
subnet.
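The field widths of Figure 2.2 can be turned into a small Python sketch that splits a 128-bit address into its fields. The function name `split_fields` is an assumption, and the label FP (format prefix) for the leading 001 bits follows RFC 2374 rather than the text above. The sample address is the one from Section 2.2.1; it is used only to exercise the bit arithmetic, so its reserved bits happen to be nonzero.

```python
import ipaddress

# (field name, width in bits), from left to right as in Figure 2.2.
FIELDS = [
    ("FP", 3),
    ("TLA ID", 13),
    ("RES", 8),
    ("NLA ID", 24),
    ("SLA ID", 16),
    ("Interface ID", 64),
]

def split_fields(text):
    """Split an IPv6 address into the fields of Figure 2.2."""
    value = int(ipaddress.IPv6Address(text))
    remaining = 128
    fields = {}
    for name, width in FIELDS:
        remaining -= width
        # Shift the field down to bit 0 and mask off everything above it.
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields

for name, field in split_fields("2001:0400:13F0:0000:0000:09C0:876A:130B").items():
    print(f"{name}: {field:#x}")
```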
The fields within the aggregatable global unicast address create the three-level structure
shown in Figure 2.3. The public topology is the collection of larger and smaller ISPs that
provide access to the IPv6 Internet. The site topology is the collection of subnets within
an organization’s site.

001 | TLA ID | Res | NLA ID    SLA ID          Interface ID
48 bits                        16 bits         64 bits
Public Topology                Site Topology

Figure 2.3 The three-level structure of an aggregatable global unicast address.
2.2.2.2 Anycast Address
An anycast address is a global unicast address that is assigned to a set of interfaces
that typically belong to different nodes. Hence an anycast address identifies multiple
interfaces. A packet sent to an anycast address is delivered to the closest interface.
Anycast addresses are syntactically indistinguishable from global unicast addresses because
they are allocated from the global unicast address space.
2.2.2.3 Multicast Address
In IPv6, multicast traffic operates in the same way as it does in IPv4. An IPv6
multicast address is an IPv6 address that has the prefix FF00::/8. It is easy to recognize
as multicast because it always begins with “FF”. Figure 2.4 gives the IPv6 multicast address
format. The fields in the multicast address are:
11111111   Flags    Scope    Group ID
8 bits     4 bits   4 bits   112 bits

Figure 2.4 The IPv6 multicast address format.
• Flags – Only the low-order bit of the Flags field is used. When it is set to 0, the
multicast address is a permanently assigned (well-known) multicast address. When it is set
to 1, the multicast address is a transient (non-permanently-assigned) address.
• Scope – indicates the scope of the IPv6 internetwork for which the multicast traffic is
intended.
• Group ID – identifies the multicast group and is unique within the scope.
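The multicast layout of Figure 2.4 can be decoded in the same way. The sketch below is illustrative: the function name `parse_multicast` and the two sample addresses are assumptions, and only the low-order flag bit is interpreted, as described above.

```python
import ipaddress

def parse_multicast(text):
    """Decode the Flags, Scope, and Group ID fields of an IPv6 multicast address."""
    value = int(ipaddress.IPv6Address(text))
    if value >> 120 != 0xFF:                 # the first 8 bits must be 11111111
        raise ValueError("not an IPv6 multicast address")
    flags = (value >> 116) & 0xF             # next 4 bits
    scope = (value >> 112) & 0xF             # next 4 bits
    group_id = value & ((1 << 112) - 1)      # low 112 bits
    return {
        "flags": flags,
        "transient": bool(flags & 0x1),      # low-order flag bit: 1 means transient
        "scope": scope,
        "group_id": group_id,
    }

print(parse_multicast("FF02::1"))     # a permanently assigned address
print(parse_multicast("FF15::1234"))  # a transient address
```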
Chapter 3 Review of the Previous Works
Many algorithms for longest prefix matching have been proposed in recent years. In this
chapter, we present a survey of IP address lookup algorithms and compare their performance
in terms of lookup speed, memory requirement, scalability, and update overhead.
3.1 Trie base scheme
3.1.1 1-bit ( Binary ) trie
A trie is a tree-based data structure that organizes prefixes on a digital basis by using
the bits of the prefixes to direct the branching. Figure 3.1 shows an example of a 1-bit
trie, also referred to as a binary trie. The 1-bit trie is a basic and simple data structure
used in IP lookup algorithms, in which each node contains two pointers, a 0-pointer (pointer
to the left child) and a 1-pointer (pointer to the right child). It is in effect a binary
tree that uses the bit value (0 or 1) to guide the search to the left or the right part of
the tree.
A binary trie may contain long sequences of one-child nodes. These one-way branch nodes
consume additional memory. Moreover, since these nodes must be inspected, the search time
can be longer than necessary in some cases. In a binary trie we potentially traverse a
number of nodes equal to the address length, so the search complexity is O(W), where W is
the address length. An update essentially requires a search, so the update complexity is
also O(W). The memory consumption for a set of N prefixes has complexity O(WN).
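A binary trie with insertion and longest prefix lookup can be sketched in a few lines of Python. The class and method names are assumptions made for this sketch; the prefixes are the prefix set of Figure 3.1, written as bit strings, and addresses are 5-bit strings for brevity.

```python
class TrieNode:
    """One node of a 1-bit trie: a 0-pointer, a 1-pointer, and an optional next hop."""
    def __init__(self):
        self.child = [None, None]
        self.next_hop = None

class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        """Walk or create one node per prefix bit: O(W) steps."""
        node = self.root
        for bit in prefix_bits:
            index = int(bit)
            if node.child[index] is None:
                node.child[index] = TrieNode()
            node = node.child[index]
        node.next_hop = next_hop

    def lookup(self, address_bits):
        """Follow the address bits and remember the last prefix seen: O(W) steps."""
        node = self.root
        best = node.next_hop
        for bit in address_bits:
            node = node.child[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

trie = BinaryTrie()
for bits, hop in [("", "P1"), ("0101", "P2"), ("100", "P3"), ("1001", "P4"), ("10111", "P5")]:
    trie.insert(bits, hop)
print(trie.lookup("10011"))
```

In this sketch the path to P5 passes through three nodes that hold no prefix, which illustrates the one-way branches mentioned above.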
12
Figure 3.1 A 1-bit trie example.
3.1.2 Patricia Trie
A binary trie consumes a lot of space to store prefixes. In order to reduce space and time
complexity, a technique called path compression can be used. Path compression consists of
collapsing one-way branch nodes. The Patricia trie was the first scheme to use this
technique. It is a variation of the trie and is also called the BSD radix trie [16]. A Patricia
trie is a full binary tree (every internal node has exactly two children). It has exactly N
external nodes and N-1 internal nodes, so the space complexity is O(N), which is better
than the O(WN) of the binary trie. For example, four entries a, b, c and d correspond to
Figure 3.2. If we search for the LPM of 01001, we find that bit 0 is off and then compare
the address with prefix a, which is the correct result. If we search for the LPM of 10111,
we first find that bit 0 is on, then that bit 2 is on, and then that bit 4 is on; finally we
compare the address with prefix d and find that it does not match. When this situation
occurs, we must recursively backtrack, and we find that entry b is the correct answer.
According to the example above, if backtracking does not occur, the lookup complexity
is O(W). When backtracking occurs, the lookup complexity degrades to O(W^2). Hence,
backtracking is the major problem of the Patricia trie.
[Figure 3.1 content: binary trie with prefix nodes P1 to P5 for the prefix set P1 = *, P2 = 0101*, P3 = 100*, P4 = 1001*, P5 = 10111.]
Figure 3.2 A Patricia Trie example.
3.1.3 Multibit Trie
A binary trie needs many memory accesses to find the LPM. To reduce the number of
memory accesses, we can use a multibit trie, which matches several bits at a time.
The depth of the subtrees combined to form a single multibit trie node is called the stride.
If all nodes at the same level of a multibit trie have the same stride, the trie is called a
fixed-stride trie; otherwise, it is a variable-stride trie. Figure 3.3 shows a multibit trie with
a 3-bit stride. If we convert a binary trie into a multibit trie with an m-bit stride, the
number of memory accesses is reduced from W to ⌈W/m⌉.
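[Figure 3.2 content: Patricia trie for the prefix set a = 0*, b = 10*, c = 10110, d = 10101*; the internal nodes test bit 0, bit 2, and bit 4, and the external nodes are a, b, c, and d.]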
Figure 3.3 An example multibit trie of 3-bit stride.
[Figure 3.3 panel label: Expanding the binary trie.]
3.1.4 Level Compressed Trie
Path compression is a good way to remove unnecessary paths and decrease space
complexity. The LC-trie [15] extends this idea and reduces the depth of the tree further.
For example, the binary trie of six prefixes is shown in Figure 3.4(a) and the
path-compressed version of this binary trie is shown in Figure 3.4(b). Finally, Figure 3.4(c)
shows the level-compressed version of the path-compressed trie. The construction of an
LC-trie for N prefixes takes O(N log N) time. Each node (including internal nodes) is
represented by three fields. The first 5 bits store the branch value, i.e., the number of
address bits used for branching at the node; the branching factor is therefore always a
power of 2, and the maximum branching factor is 2^31. The next 7 bits store the skip
value, so skip values range from 0 to 127. The remaining 20 bits serve as the pointer to the
leftmost child. Table 3.1 shows the representation of the LC-trie in Figure 3.4(c). For
example, to search for the LPM of 1110110, we start at the root, node number 1. The
branch value is 2 and the skip value is 0, so we extract the first two bits from the string.
These 2 bits have the value 3, which is added to the pointer, leading to position 5 in
Table 3.1. At this node, the branch value is 1 and the skip value is 3, so we skip three bits
and extract the sixth bit. Its value is 1, and when we add 1 to the pointer, we arrive at
position 9. At this node, the branch value is 0, which indicates a leaf, and the pointer refers
to entry f, which is the correct answer. In summary, the LC-trie compresses the levels of
the tree, which means that less space is needed to store the forwarding table.
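The 5/7/20-bit node layout and the bit extraction used in the search example can be illustrated as follows. This is a sketch under our own assumptions: the field order inside the 32-bit word (branch in the highest bits, pointer in the lowest) and the helper names pack_node, unpack_node, and extract_bits are chosen for illustration and are not taken from [15].

```python
BRANCH_BITS, SKIP_BITS, POINTER_BITS = 5, 7, 20  # 5 + 7 + 20 = 32 bits per node


def pack_node(branch, skip, pointer):
    """Pack the three fields into one 32-bit word (assumed order: branch|skip|pointer)."""
    if not (0 <= branch < 1 << BRANCH_BITS):
        raise ValueError("branch does not fit in 5 bits")
    if not (0 <= skip < 1 << SKIP_BITS):
        raise ValueError("skip does not fit in 7 bits")
    if not (0 <= pointer < 1 << POINTER_BITS):
        raise ValueError("pointer does not fit in 20 bits")
    return (branch << (SKIP_BITS + POINTER_BITS)) | (skip << POINTER_BITS) | pointer


def unpack_node(word):
    """Inverse of pack_node: return (branch, skip, pointer)."""
    branch = word >> (SKIP_BITS + POINTER_BITS)
    skip = (word >> POINTER_BITS) & ((1 << SKIP_BITS) - 1)
    pointer = word & ((1 << POINTER_BITS) - 1)
    return branch, skip, pointer


def extract_bits(addr, pos, count, width):
    """Return `count` bits of a `width`-bit address, starting at bit `pos` (0 = leftmost)."""
    return (addr >> (width - pos - count)) & ((1 << count) - 1)


addr = 0b1110110                               # the 7-bit search string from the example
first = extract_bits(addr, 0, 2, 7)            # root: branch 2, skip 0
second = extract_bits(addr, 2 + 3, 1, 7)       # next node: skip 3 bits, branch 1
```

Here first is the value of the first two bits and second is the sixth bit of the search string, the two values that the example adds to the child pointers.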
Figure 3.4 (a) Binary trie. (b) The Path-Compressed version of (a). (c) The
level-compressed version of (b).
Table 3.1 The representation of the LC-trie in Figure 3.4(c).
3.2 End-Point Array Scheme
3.2.1 IP Lookups Using Multiway and Multicolumn Search
Lampson, Srinivasan, and Varghese [11] have proposed a data structure in which the
end points of the ranges defined by the prefixes are stored in ascending order in an array.
The longest prefix match is found by performing a simple binary search on this ordered
array. They also propose using an initial array indexed by the first X bits of the address,
together with a multiway search with six-way branching that takes advantage of the cache
line size. With this scheme, the longest prefix match can be determined in O(log N) time
by performing a binary search. Updating the range end-point array after the insertion or
deletion of a prefix takes O(N) time, where N is the number of prefixes in the router table.
Figure 3.5(a) shows an example set of five prefixes together with their start and finish
points; the distinct range end points are stored in ascending order in Figure 3.5(b).
Figure 3.5 (a) Representation of the prefixes and ranges. (b) End point array.
(a) Prefix set of 5-bit address
Prefix        Range Start   Range Finish
P1  *         0             31
P2  01*       8             15
P3  1*        16            31
P4  10*       16            23
P5  0011*     6             7
(b) End point array
End Point   >     =
0           P1    P1
6           P5    P5
7           P1    P5
8           P2    P2
15          P1    P2
16          P4    P4
23          P3    P4
31          -     P3
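As an illustration of the binary search on the end-point array, the following Python sketch uses the values of Figure 3.5(b). The list names END_POINTS, GT, and EQ and the function lookup are our own; the two precomputed columns are copied from the figure, not recomputed here.

```python
from bisect import bisect_right

# End points and precomputed answers from Figure 3.5(b).
END_POINTS = [0, 6, 7, 8, 15, 16, 23, 31]
GT = ["P1", "P5", "P1", "P2", "P1", "P4", "P3", None]   # ">" column
EQ = ["P1", "P5", "P5", "P2", "P2", "P4", "P4", "P3"]   # "=" column


def lookup(addr):
    """Return the longest matching prefix for a 5-bit address."""
    i = bisect_right(END_POINTS, addr) - 1   # largest end point <= addr
    if i < 0:
        return None
    # Use the "=" answer on an exact hit, otherwise the ">" answer.
    return EQ[i] if END_POINTS[i] == addr else GT[i]
```

The search itself takes O(log N) time, but both answer columns depend on all prefixes, which is why an insertion or deletion may require the array to be rebuilt.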
3.2.2 An Efficient IP Routing Lookup by Using Routing Interval
Based on the observation that the number of possible next hops for a segment is always
much smaller than the total number of ports, Wang et al. [19] propose a routing concept
named "Routing Interval" that reduces the longest prefix match problem to a much simpler
search problem. By sorting the routing prefixes by length, the scheme builds a next-hop
array in which each element maps to an IP address interval and is filled with the related
next hop. An example with the three prefixes 140.116.0.0/255.255.0.0/NH1,
140.116.3.0/255.255.255.0/NH2, and 140.116.215.0/255.255.255.0/NH3 is shown in
Figure 3.6. Moreover, to achieve higher performance, this scheme, unlike the cache line
alignment described in the previous scheme [11], focuses on the characteristics of the
memory bus between the L1 and L2 caches to speed up the search. With this
scheme, the search, memory and update complexities are all the same as those in [11].
Figure 3.6 Routing interval example.
Routing Prefixes
140.116.0.0/255.255.0.0/NH1
140.116.3.0/255.255.255.0/NH2
140.116.215.0/255.255.255.0/NH3
Routing Intervals (Next-Hop Array)
140.116.0.0~140.116.2.255/NH1
140.116.3.0~140.116.3.255/NH2
140.116.4.0~140.116.214.255/NH1
140.116.215.0~140.116.215.255/NH3
140.116.216.0~140.116.255.255/NH1
3.3 Sets of Equal-Length Prefixes Scheme
3.3.1 Scalable high-speed IP routing lookups
Waldvogel et al. [18] have proposed a data structure that determines the longest prefix
match by performing a binary search on prefix length. The routing table is partitioned by
prefix length into up to 32 parts for IPv4 and 128 parts for IPv6. The scheme performs a
binary search over the prefix lengths and uses a hash function at each length to look for a
match. However, markers must be created for each prefix to ensure that the search runs
correctly.
For example, suppose we have three prefixes a:0*, b:10*, c:110*, and we want to find
the LPM of 1101. Figure 3.7 shows the graphical representation of this example. The
binary search starts at length 2; because the address does not match b, the search does not
continue toward longer lengths, although prefix c is the correct answer. To handle this
situation, we must add a marker 11 to the length-2 table together with b, so that the
algorithm runs correctly. The need for markers is one problem of this method. On the
other hand, when the binary search stops, the result may be a marker rather than a real
prefix. Traditionally, we would have to backtrack recursively to find the correct answer,
which costs a lot of time. The authors use another trick to solve this problem:
precomputation, i.e., precomputing the correct answer for each marker. As a result, the
scheme needs a large amount of space to store all markers, and it is difficult to support
fast updates.
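The role of the marker 11 and of the precomputed answers can be seen in the following Python sketch. It is a simplified illustration under our own assumptions, not the data structure of [18]: Python dictionaries stand in for the hash tables, prefixes are bit strings, and the names build_tables and lookup are hypothetical.

```python
def build_tables(prefixes):
    """prefixes maps a bit string such as "110" (for 110*) to a prefix name."""
    lengths = sorted({len(p) for p in prefixes})
    tables = {n: {} for n in lengths}

    def best_match(bits):
        # Precomputation: longest real prefix that is a prefix of `bits`, or None.
        for n in range(len(bits), -1, -1):
            if bits[:n] in prefixes:
                return prefixes[bits[:n]]
        return None

    for p in prefixes:
        # Follow the binary search path that a lookup for p would take.
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            n = lengths[mid]
            if n > len(p):
                hi = mid - 1
                continue
            # n < len(p): leave a marker; n == len(p): store the prefix itself.
            tables[n].setdefault(p[:n], best_match(p[:n]))
            if n == len(p):
                break
            lo = mid + 1
    return lengths, tables


def lookup(lengths, tables, address_bits):
    """Binary search on prefix length; a hit means 'try longer lengths'."""
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        n = lengths[mid]
        key = address_bits[:n]
        if key in tables[n]:
            if tables[n][key] is not None:
                best = tables[n][key]   # precomputed answer, no backtracking
            lo = mid + 1
        else:
            hi = mid - 1
    return best


lengths, tables = build_tables({"0": "a", "10": "b", "110": "c"})
```

For the three prefixes of the example, the length-2 table contains b and the marker 11; the marker carries no matching prefix of its own, so a lookup that ends on it returns no match instead of backtracking.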
Figure 3.7 Graphical representation of binary search on prefix length.
[Figure content: one hash table per prefix length; length 1 holds a = 0*, length 2 holds b = 10* and the marker 11, and length 3 holds c = 110*.]
3.4 Search Tree Base Scheme
3.4.1 Enhanced Interval Tree for Dynamic IP Router-Tables
Lu and Sahni [12] have proposed a data structure called BOB (binary tree on binary
tree) for dynamic router tables. The first level is a red-black tree called the point
search tree (PTST). Each node z in the PTST stores a point, point(z), and a subset of
ranges, range(z). The points in the left subtree of node z are < point(z) and those in its
right subtree are > point(z). Let R be the set of ranges stored in the PTST. All ranges r ∈ R
such that start(r) ≤ point(z) ≤ finish(r) are stored in range(z) of the node z. All r ∈ R such
that finish(r) < point(z) are stored in the left subtree of z, and the remaining ranges of R
are stored in the right subtree of z. For every node z in the PTST, range(z) is represented
as a balanced binary search tree called the range search tree (RST). The RST in each node
is called a second-level tree. Figure 3.8(a) is an example of a possible PTST for a range set
and Figure 3.8(b) is an example RST for range(20).
Figure 3.8 (a) A possible PTST. (b) An example RST for range(20).
Prefix set of 5-bit address
Prefix        Range       Length
P1  *         [0, 31]     0
P2  001*      [4, 7]      3
P3  1*        [16, 31]    1
P4  10*       [16, 23]    2
P5  1000*     [16, 17]    4
P6  100*      [16, 19]    3
P7  101*      [20, 23]    3
(a) PTST points: 6, 12, 16, 20; ranges shown as (range, length): ([0, 31], 0), ([16, 31], 1), ([16, 23], 2), ([20, 23], 3), ([4, 7], 3), ([16, 17], 4), ([16, 19], 3).
(b) RST nodes for range(20): P4 ([16, 23], 2), P3 ([16, 31], 1), P7 ([20, 23], 3).
To further reduce memory usage, the RST stored in each node of the PTST can be
replaced by a W-bit vector. Let bit(z)[i] denote the ith bit of the bit vector stored in node z
of the PTST; bit(z)[i] = 1 iff range(z) has a prefix whose length is i. Figure 3.9 shows the
bit vector for range(20) in the PTST. With this scheme, it takes O(W) time to determine the
LPM and O(log N) time to handle an insertion or deletion.
Figure 3.9 The bit vector for range(20).
3.4.2 An O(log n) Dynamic Router-Table Design
Sahni and Kim [10] propose the use of a collection of red-black trees (CRBT) to
determine the longest prefix match. The CRBT comprises a front-end data structure called
the basic interval tree (BIT) and a back-end data structure called a collection of prefix trees
(CPT). Each external node of the BIT that represents a basic interval x points to the
nonheader node that represents x in the prefix tree for next(x). The prefix tree for
prefix p comprises a header node plus one internal node for every prefix or basic interval x
such that next(x) = p. For each prefix or basic interval x, next(x) is defined as the
smallest-range prefix whose range includes the range of x. The basic interval tree and the
prefix trees for the five prefixes are shown in Figure 3.10(a)-(f).
[Figure 3.9 content: bit vector for range(20) with the bits for P3, P4, and P7 set; bit number = prefix length.]
Figure 3.10 (a) Basic interval tree (BIT). (b)~(f) Prefix trees for P1~P5.
The search for the LPM begins with a search of the BIT for the basic interval that
matches the destination address of the incoming packet, followed by a check of whether the
destination address equals the left (right) end point of the matching basic interval. If it
does not, the basic interval pointer stored in the external node of the BIT is used to reach
the header node of the prefix tree that corresponds to the LPM. When a CRBT is used, the
longest matching prefix can be found, and a prefix can be inserted or deleted, in O(log N)
time, where N is the number of prefixes in the router table.
Prefix set of 5-bit address for Figure 3.10
Prefix        Range Start   Range Finish
P1  *         0             31
P2  001*      3             7
P3  100*      16            19
P4  1001*     18            19
P5  10111     23            23
(a) BIT over the end points 0, 3, 7, 16, 18, 19, 23, 31 with basic intervals r1 to r7.
(b)~(f) Prefix trees for P1 to P5.
3.4.3 Multiway Range Tree
Since ordinary binary search on the end point array relies on precomputation [11],
Suri et al. [21] have proposed a data structure called the multiway range tree for dynamic
router tables, which achieves the optimal lookup time of binary search but can also be
updated quickly when a prefix is added or deleted. The main idea behind this scheme is
that each prefix maps to a range in the address domain. A set of n prefixes partitions the
address line into at most 2n intervals. The scheme builds a tree whose leaves correspond to
the end points of these intervals. Figure 3.11 shows an example. The search for the longest
matching prefix is done by searching the tree for the matching interval. Moreover, by
increasing the arity of the tree and taking advantage of the cache line size, the height of the
tree can be reduced to improve search time. With this scheme, finding the longest prefix
match and inserting or deleting a prefix each take O(log N) time, where N is the number of
prefixes in the routing table.
Figure 3.11 An example of multiway range tree.
Prefix   Start Point   Finish Point
P1       a             o
P2       b             g
P3       h             j
P4       k             n
P5       c             d
P6       e             f
P7       i             j
P8       k             l
P9       m             n
[Figure content: the leaves of the tree correspond to the end points a to o; the internal node keys shown in the figure include c, f, i, k, and m.]
3.5 Hybrid Base Scheme
3.5.1 Lulea compressed trie
Degermark et al. [2] have proposed the use of a three-level trie in which the strides of
the levels are 16, 8 and 8. They also propose encoding the nodes of this trie using a bit
vector and three other small arrays that store the information needed to calculate the index
into the first-level array. Those three arrays, the 1-D base array, the 1-D codeword array,
and the 2-D maptable array, reduce memory requirements.
Levels two and three of the data structure consist of chunks. A chunk covers a
subtree of height 8 and can contain at most 2^8 = 256 heads. There are three varieties of
chunks, depending on how many heads the imaginary bit vector contains. With 1-8 heads
the chunk is sparse, with 9-64 heads it is dense, and with 65-256 heads it is very dense.
Dense and very dense chunks are searched analogously to the first level. For
sparse chunks, a linear scan is used to obtain the routing information.
3.5.2 Huang’s Compact Algorithms
Nen-Fu Huang proposed an address lookup scheme with a two-level memory
organization based on the variable-stride trie concept in [8]. The first 16 bits of the
destination IP address are used as an index into the first-level array. Each element of the
first-level array carries a length field that indicates the size of the next-level array, called
the next hop array (NHA). The NHA is further compressed by using a Code Word Array
(CWA) and a compressed NHA (CNHA). Figure 3.12 shows the basic concept of Huang’s
scheme. With this compression technique, the forwarding table is small enough to fit into
fast SRAM and can be implemented using a hardware pipeline to improve the speed of
address lookup.
Figure 3.12 Basic concept of Huang’s scheme.
[Figure labels: Segment (16 bit) and offset (k bit); Segmentation Table; four Next Hop Arrays with sizes given in terms of k0, k1, k2, and k3; Code Word consisting of a 16-bit Map and a 16-bit Base.]
3.6 Summary
Current IP lookup schemes can be broadly classified into two categories: schemes that
use precomputation to build static routing tables, and schemes that build dynamic routing
tables. Schemes such as the "LC-trie" [15], "multiway and multicolumn search" [11],
"routing interval search" [19], "binary search on prefix length" [18], the "Lulea
compressed trie" [2] and "Huang’s compact algorithm" [8] perform a lot of
precomputation. Precomputation usually simplifies the data structure constructed by the
IP address lookup algorithm and thereby improves lookup speed and memory
requirement. However, the downside of precomputation is that when a single prefix is
added or deleted, the entire data structure may need to be rebuilt, which seriously affects
the update performance of a backbone router. Thus, precomputation-based schemes are
usually not suitable for dynamic routing tables. Table 3.2 shows the comments and
characteristics of those precomputation-based schemes.
On the other hand, schemes based on the trie data structure, such as the "binary trie",
the "Patricia trie" [16], and the "multibit trie", do not use precomputation. However, their
search and update costs grow linearly with the address length, and thus these schemes do
not scale well when switching to IPv6 or to large routing tables. CRBT (collection of
red-black search trees [10]) and MRT (multiway range tree [21]) are schemes for dynamic
routing tables, but their complex data structures may inflate the memory requirement or
reduce lookup speed. The comments and characteristics of those non
precomputation-based schemes are shown in Table 3.3.
Table 3.2 Summary for precomputation-based IP address lookup schemes.
Schemes, each followed by its comments and characteristics:
LC-trie [15]
    This scheme extends the path compression idea and reduces the depth of the tree further. The authors use an array to represent the tree. Each entry in the array represents a node of the LC-trie and contains routing information. An update requires rebuilding the entire data structure. Since this scheme is trie-based, switching to IPv6 may inflate the memory requirement and degrade search and update performance.
Multiway and multicolumn search [11]
    Lookup speed (O(log N)), memory requirement (O(N)) and scalability depend on the number of distinct end points stored in the ordered array. The routing information corresponding to the end points must be precomputed. Because of this precomputation, the whole data structure must be rebuilt when a prefix is inserted or deleted.
Routing interval search [19]
    Lookup speed, update, memory requirement and scalability are similar to those in [11]. The authors state that this scheme takes less time than [11] to rebuild the entire data structure when performing an update operation.
Binary search on prefix length [18]
    Choosing a good hash function is a critical task. With a perfect hash, the LPM can be found in O(log W) expected time, but a lot of memory is needed. On an update, the LPMs of the markers need to be recomputed.
Lulea compressed trie [2] and Huang’s compact algorithm [8]
    Both schemes use bit vectors to represent the nodes stored in the trie. For IPv4, both achieve good memory requirement and lookup speed. When switching to IPv6, the first level (16-bit segmentation table) is no longer suitable: the longer address length may cause the stride framework to require a lot of memory and may reduce lookup speed. The precomputation also forces the whole data structure to be rebuilt when performing an update operation.
Table 3.3 Summary for non precomputation-based IP address lookup schemes.
Schemes, each followed by its comments and characteristics:
Binary trie
    Search time and update performance depend on the address length (O(W)). The long sequences of one-child nodes consume additional memory (O(NW)). All performance measures become noticeably worse when switching to IPv6.
Multibit trie
    The choice of strides affects search time, update and memory requirement. When switching to IPv6, the memory requirement increases sharply.
Patricia trie [16]
    Path compression makes a lot of sense when a binary trie is sparse, but as the number of prefixes increases and the trie gets denser, path compression has little benefit.
BOB [12], CRBT [10], and Multiway range tree [21]
    These three schemes can handle dynamic routing tables. With these schemes, the three operations for IP lookup (search, insertion and deletion) can be performed in O(log N) time on a real routing table. Due to their complex data structures, [10] and [21] need more memory to store the routing information and take more time to finish an update operation. In contrast, the simpler data structure described in [12] achieves better memory requirement and update performance. As for lookup speed, since the tree height in [21] is the lowest, it may have the best performance among the three schemes. Comparing the three schemes in terms of lookup speed, memory requirement, update and scalability, we think that BOB [12] may be a good choice for finding the LPM.
Chapter 4 Proposed IP Lookup Scheme
In this chapter, we first introduce the notation and terminology used in this chapter.
Second, we present our proposed IP address lookup scheme (MSPT) and the details of its
search, insertion, and deletion operations. Finally, we migrate our scheme to the
next-generation Internet protocol, IPv6.
4.1 Preliminaries
Definition 1 (prefix representation): A prefix P is a range r of addresses from b to
e. It can be represented as P = r = [b, e], b ≤ e, where b is the start address of prefix P and
e is the finish address of prefix P. The prefix P matches the contiguous interval [b, e] of
addresses.
Definition 2 (relation): Let A = [b, e] and B = [u, v] be the ranges of two prefixes.
(a) Disjoint: A and B are said to be disjoint iff they have no address in common, i.e.,
A ∩ B = Ø.
(b) Enclosure: A and B are enclosure iff the address space covered by one range is a subset
of that covered by the other, i.e., B ⊇ A or A ⊇ B.
(c) Intersecting: A and B are intersecting iff A and B have a nonempty intersection and
neither encloses the other, i.e., b < u ≤ e < v or u < b ≤ v < e.
(d) The relation A < B or A > B is defined only when A and B are disjoint: A < B
iff e < u (the finish address of A is smaller than the start address of B), and A > B
iff b > v (the start address of A is larger than the finish address of B).
For example, assume the address length is five and let P1 = 1111* = [30, 31], P2 = 0101* =
[10, 11], P3 = 100* = [16, 19], P4 = 1001* = [18, 19]. Among these four prefixes, P1, P2
and P3/P4 are pairwise disjoint and P1 > P3/P4 > P2. Moreover, P3 and P4 are enclosure
(P3 encloses P4).
Lemma 1: Any two different prefixes A = [b, e] and B = [u, v] in a routing table
are either enclosure or disjoint (i.e., they cannot intersect).
Proof: When the prefix length of A is equal to the prefix length of B, the two prefixes
differ in at least one specified bit, so the addresses matched by A and B are different.
Therefore, the ranges covered by A and B are disjoint. When the prefix lengths differ,
assume without loss of generality that the prefix length of A is larger than the prefix
length of B. If B is not a prefix of A (i.e., A and B differ in one of the bits specified by B),
then the ranges covered by A and B are disjoint. Otherwise, B is a prefix of A and we have
u ≤ b ≤ e ≤ v. Consequently, A and B are enclosure.
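Definitions 1 and 2 and Lemma 1 can be checked on the four example prefixes with a short Python sketch. The function names prefix_range and relation are our own, and prefixes are written as bit strings without the trailing asterisk.

```python
from itertools import combinations

W = 5  # address length used in the example above


def prefix_range(bits):
    """Definition 1: map a prefix such as "100" (for 100*) to its range [b, e]."""
    free = W - len(bits)                       # number of unspecified bits
    start = int(bits, 2) << free if bits else 0
    return start, start + (1 << free) - 1


def relation(a, b):
    """Definition 2: classify two ranges as disjoint, enclosure, or intersecting."""
    (a_start, a_finish), (b_start, b_finish) = a, b
    if a_finish < b_start or b_finish < a_start:
        return "disjoint"
    if (b_start <= a_start and a_finish <= b_finish) or \
       (a_start <= b_start and b_finish <= a_finish):
        return "enclosure"
    return "intersecting"


ranges = {name: prefix_range(bits)
          for name, bits in [("P1", "1111"), ("P2", "0101"),
                             ("P3", "100"), ("P4", "1001")]}

# Lemma 1: ranges that come from prefixes never intersect.
pair_relations = {(p, q): relation(ranges[p], ranges[q])
                  for p, q in combinations(ranges, 2)}
```

Arbitrary ranges such as [0, 10] and [5, 15] can intersect, but no pair in pair_relations does, as Lemma 1 states.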
4.2 Most Specific Prefix Tree – MSPT
4.2.1 Most Specific Prefix
Definition 3: A prefix is called a most specific prefix if it does not enclose any other
prefix in the routing table. Otherwise, it is called a non-most specific prefix.
An example set of prefixes is as follows: P1 (1*), P2 (0101*), P3 (100*), P4
(1001*), and P5 (10111). P2 does not overlap with any other prefix, P4 has the
longest prefix length in the overlapping prefix set (P1, P3, P4), and P5 has the longest
prefix length in the overlapping prefix set (P1, P5). Therefore, P2, P4 and P5 are the most
specific prefixes, and P1 and P3 are non-most specific prefixes. Figure 4.1 shows the
relationships of those prefixes. To illustrate the definition further, if we use a binary trie to
represent all prefixes in a routing table, the prefixes at the external nodes of the binary trie
are exactly the most specific prefixes, and the remaining prefixes (the non-most specific
prefixes) are at internal nodes. Figure 4.2 is the binary trie for the above example.
Figure 4.1 The relationships for a set of prefixes.
Prefix name   Prefix   Start Address   Finish Address
P1            1*       16              31
P2            0101*    10              11
P3            100*     16              19
P4            1001*    18              19
P5            10111    23              23
[Figure content: the ranges of P1 to P5 drawn on an address line marked at 0, 10, 11, 16, 18, 19, 23, and 31.]
• P2 does not overlap with any other prefixes.
• P4 and P5 have the longest prefix length in the overlapping prefix sets (P1, P3, P4) and (P1, P5).
• P2, P4 and P5 are the most specific prefixes; P1 and P3 are non-most specific prefixes.
Figure 4.2 A binary trie for a set of prefixes.
Lemma 2: Let R be the set of all most specific prefixes in a routing table. Any two
different prefixes a, b ∈ R must be disjoint (a ∩ b = Ø).
Proof: By Lemma 1, a and b are either enclosure or disjoint. Since a most specific prefix
does not enclose any other prefix, a and b do not enclose each other. As a result, a and b
are disjoint. Thus, the lemma follows.
MSPT is a balanced binary search tree. Each node in the MSPT represents a most
specific prefix, and all most specific prefixes in the MSPT are pairwise disjoint. Moreover,
the MSPT has an additional enclosure constraint for placing the non-most specific prefixes,
defined as follows.
Definition 4 (MSPT Enclosure Constraint): Each non-most specific prefix p is allocated
to the enclosure set of the most specific prefix node x that is nearest to the root node of
the MSPT among the nodes whose prefixes are enclosed by p.
[Figure 4.2 content: binary trie with prefix nodes P1 to P5.]
• P2, P4 and P5 are the most specific prefixes and they are the external nodes of this binary trie.
• P1 and P3 are non-most specific prefixes and they are the internal nodes of this binary trie.
4.2.2 Data Structure for MSPT
We classify all prefixes in a routing table into two types: most specific prefixes and
non-most specific prefixes. Since all the most specific prefixes in a real routing table are
disjoint, the prefix comparison rule described in Definition 2(d) can be used to construct a
balanced binary search tree, called the Most Specific Prefix Tree (MSPT). In the MSPT,
each node x represents a most specific prefix, prefix(x), and stores a key value, key(x), a
length value, length(x), a port value, port(x), and an enclosure set, enclosure(x). The key
value (we use the start address of the most specific prefix) and the length value together
represent the most specific prefix, and the port value is its output interface. All key values
in the left subtree of node x are smaller than key(x) and those in the right subtree are
larger than key(x).
Let R be the set of all non-most specific prefixes in a routing table. The enclosure
set, enclosure(x), of a node x in the MSPT stores those prefixes a ∈ R that enclose prefix(x).
All prefixes a ∈ R that are disjoint from prefix(x) with a < prefix(x) are stored in the left
subtree of node x, and the remaining prefixes of R are stored in the right subtree of node x.
This allocation rule is applied recursively to the left and right subtrees of the MSPT. In
fact, a non-most specific prefix stored in the enclosure set can be represented by its prefix
length and the key value stored in the node of the MSPT (i.e., every non-most specific
prefix stored in enclosure(x) is a prefix of the most specific prefix of node x, prefix(x)).
Therefore, we need to store only the prefix length and port number of each non-most
specific prefix in an enclosure set. Moreover, since all non-most specific prefixes stored in
the same enclosure set enclose prefix(x), they are pairwise enclosure and have different
lengths. We can therefore also organize each enclosure set as a balanced binary search tree
keyed on prefix length, in which each node represents a non-most specific prefix.
Figure 4.3 shows an MSPT example for the prefix set in Table 4.1. Each node in Figure
4.3 represents a most specific prefix. The enclosure set structure of node b is shown in
Figure 4.4. Each node in Figure 4.4 represents a non-most specific prefix.
Table 4.1 A sample prefix set.
Prefix          Port
P1   0*         A
P2   0101*      B
P3   100*       C
P4   1001*      D
P5   10111      E
P6   11*        F
P7   0001*      G
P8   01*        H
P9   00111      I
P10  001*       J
P11  0011*      K
P2, P4, P5, P6, P7, P9 are the most specific prefixes. P1, P3, P8, P10, P11 are non-most specific prefixes.
Figure 4.3 A most specific prefix tree (MSPT) example for Table 4.1.
[Figure content: MSPT nodes a to f hold the most specific prefixes P2 : 0101*, P4 : 1001*, P5 : 10111, P6 : 11*, P7 : 0001*, and P9 : 00111, with P4 at the root node a and P9 at node b; enclosure(a) holds P3 : 100*, enclosure(b) holds P1 : 0*, P10 : 001*, and P11 : 0011*, and enclosure(e) holds P8 : 01*.]
Figure 4.4 The enclosure set of node b in Figure 4.3.
[Figure content: a balanced binary search tree on prefix length with root g = P10 and children h = P1 and i = P11.]
4.3 Finding the Longest Prefix Match
The longest prefix that matches the destination address d is found by following a path
from the root node toward a leaf node of the MSPT. Figure 4.5 gives the algorithm. In
function LPM(d), when we encounter a node x of the MSPT, we first check whether the
destination address d is enclosed by prefix(x), the most specific prefix represented by node
x. If prefix(x) matches the address d, the search stops. Otherwise, the search executes
function enclosure(x).search(d, key(x), port) to check whether any non-most specific
prefix in enclosure(x) matches the address d, and then continues toward a leaf node of the
MSPT.
Since the non-most specific prefixes stored in an enclosure set are organized as a
balanced binary search tree, the algorithm of function enclosure(x).search(d, key(x), port),
shown in Figure 4.6, is similar to that of LPM(d). Consider the non-most specific prefixes
stored in enclosure(x). If the non-most specific prefix represented by a node y matches
the address d, the search continues in the right subtree of node y to see whether
another non-most specific prefix with a longer prefix length also matches the
address d. Otherwise, it goes to the left subtree of node y to see whether a
non-most specific prefix with a shorter prefix length matches the address d.
For example, suppose we want to find the LPM for the address Dst = 00110 in Table 4.1.
First, in Figure 4.3, the most specific prefix P4 represented by the root node a of the
MSPT, prefix(a), does not match Dst (prefix(a) does not enclose Dst), so we check the
enclosure set of node a, enclosure(a), to see whether any non-most specific prefix in it (P3)
encloses Dst. None does. Comparing Dst with key(a), since 00110 < key(a) = 10010, the
search continues to node b of the MSPT. The most specific prefix P9 represented by node
b, prefix(b), does not match Dst either, so we check the enclosure set of node b,
enclosure(b). In enclosure(b), we first find that the non-most specific prefix P10
represented by the root node g matches Dst. Prefix P10 is temporarily stored and the search
continues to node i. The prefix P11 represented by node i also matches Dst. Since P11 has
a longer prefix length than P10, P11 replaces P10 as the stored prefix. The search ends after
we find that the leaf node d of the MSPT does not match Dst and that the enclosure set of
node d is empty. Therefore, the prefix P11 is the LPM for Dst.
The time required to find the LPM can be determined by analyzing the functions LPM(d)
and enclosure(x).search(d, key(x), port). The time complexity of enclosure(x).search(d,
key(x), port) is readily seen to be O(height(enclosure(x))) = O(log(maximum number of
non-most specific prefixes stored in enclosure(x))). Since all the non-most specific prefixes
in enclosure(x) are enclosure, they have different lengths. Thus, the maximum number of
non-most specific prefixes stored in enclosure(x) is bounded by the address length W,
which is 32 for IPv4 and 128 for IPv6, and one enclosure search takes O(log W) time.
Since the MSPT is balanced, LPM(d) visits O(log N) nodes. Consequently, the longest
prefix match can be found in O((log N)(log W)) time, where N is the number of prefixes in
the routing table and W is the address length.
37
Figure 4.5 Algorithm to find the longest prefix match.
Algorithm LPM(d, root) {
    /* prefix(x) is the most specific prefix represented by node x in MSPT.
       key(x) is the key value (usually the start point of the most specific prefix).
       length(x) is the length of the most specific prefix represented by node x.
       port(x) is the port information of the most specific prefix represented by node x.
       enclosure(x) is the enclosure set of node x in MSPT. */
    x = root; /* root node of the MSPT */
    port = Default;
    while ( x ≠ null ) {
        /* first check whether the most specific prefix represented by node x
           encloses the address d */
        if ( d ⊆ prefix(x) ) return port(x);
        port = enclosure(x).search( d, key(x), port );
        if ( d > key(x) )
            x = RightChild(x);
        else
            x = LeftChild(x);
    }
    return port;
}
Figure 4.6 Algorithm of enclosure(x).search(d, key(x), port).
Algorithm enclosure(x).search( d, key(x), port ) {
    /* length(y) is the length of the non-most specific prefix represented by node y
       in enclosure(x). port(y) is the port information of the non-most specific
       prefix represented by node y in enclosure(x). */
    y = root; /* root node of the enclosure set */
    temp_port = port;
    while ( y ≠ null ) {
        /* >> is the right shift operator */
        if ( (key(x) >> (32-length(y))) == (d >> (32-length(y))) ) {
            temp_port = port(y);
            y = RightChild(y);
        } else
            y = LeftChild(y);
    }
    return temp_port;
}
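The two search algorithms above can be tried out with a small Python sketch. It is an illustration only, under several assumptions that are not part of the thesis: the MSPT is built by hand and is not rebalanced, each enclosure set is a plain list of (length, port) pairs in ascending length order that is scanned linearly instead of being a balanced search tree, and the helper names (Node, matches, addr, node) are hypothetical. The sample tree is the corrected MSPT of Figure 4.11, with the 5-bit example addresses left-aligned in a 32-bit word.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

W = 32  # address width in bits (IPv4)


@dataclass
class Node:
    key: int      # start address of the most specific prefix
    length: int   # length of the most specific prefix
    port: str
    # non-most specific prefixes as (length, port), ascending by length;
    # each of them shares its leading bits with key
    enclosure: List[Tuple[int, str]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def matches(key: int, length: int, d: int) -> bool:
    """True if the first `length` bits of d equal those of key."""
    shift = W - length
    return (key >> shift) == (d >> shift)


def enclosure_search(x: Node, d: int, port):
    """Linear-scan stand-in for enclosure(x).search(d, key(x), port)."""
    for length, p in x.enclosure:
        if not matches(x.key, length, d):
            break            # nested prefixes: longer ones cannot match either
        port = p
    return port


def lpm(root: Optional[Node], d: int, default=None):
    x, port = root, default
    while x is not None:
        if matches(x.key, x.length, d):   # d is enclosed by prefix(x)
            return x.port
        port = enclosure_search(x, d, port)
        x = x.right if d > x.key else x.left
    return port


def addr(bits: str) -> int:
    """Left-align a bit string in a W-bit word, e.g. '0010' -> 0x20000000."""
    return int(bits, 2) << (W - len(bits))


def node(bits: str, port: str, enclosure=()) -> Node:
    return Node(addr(bits), len(bits), port, list(enclosure))


# The MSPT of Figure 4.11: root b = (P2, 0010*), children a = (P1, 0001*), c = (P6, 100*)
root = node("0010", "P2", [(1, "P4"), (2, "P3")])
root.left = node("0001", "P1", [(3, "P5")])
root.right = node("100", "P6")

print(lpm(root, addr("00111")))
```

With this tree, the lookup for Dst = 00111 discussed in Section 4.4.3 falls through node b and node c and returns the port stored for P3 = 00*.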
4.4 Update
4.4.1 Inserting a Prefix
Figure 4.7 shows the algorithm to insert a prefix P. In the while loop, we find the node
x nearest to the root of the MSPT such that prefix P encloses prefix(x) or prefix P is
enclosed by prefix(x). If such a node x exists, we act according to the relation between
prefix(x) and P. If prefix P encloses prefix(x), we insert P into enclosure(x) (i.e., P is a
non-most specific prefix). Otherwise, if prefix P is enclosed by prefix(x), prefix(x) is
inserted into enclosure(x) and P replaces prefix(x) at node x (i.e., prefix(x) becomes a
non-most specific prefix and P becomes a most specific prefix).
If there is no node x such that prefix P encloses prefix(x) or is enclosed by prefix(x),
we must insert a new node y into the MSPT, define its key value key(y), port value
port(y) and length value length(y), and initialize its enclosure set enclosure(y). This
node y represents prefix P in the MSPT (i.e., P is a most specific prefix).
The new node is inserted by the function Insert_node(P), which uses the node insertion
algorithm of the balanced binary search tree. However, inserting a new node may leave
the balanced binary search tree unbalanced. The rebalancing procedure requires at least
one rotation and may lead to a violation of the MSPT enclosure constraint. We discuss the
rebalancing rotation problem in Section 4.4.3.
Excluding the time required for the tasks associated with a rebalancing
rotation, the time required to insert a prefix is O(height(MSPT) + height(enclosure())) =
O(log N + log W).
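The three insertion cases described above (P is enclosed by prefix(x), P encloses prefix(x), or P is disjoint from every most specific prefix on the search path) can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation: the tree is an ordinary binary search tree without rebalancing, enclosure sets are sorted lists of (length, port) pairs, and the names Node, encloses, add_to_enclosure and addr are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

W = 32  # address width in bits


@dataclass
class Node:
    key: int
    length: int
    port: str
    enclosure: List[Tuple[int, str]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def encloses(k1: int, l1: int, k2: int, l2: int) -> bool:
    """True if prefix (k1, l1) encloses prefix (k2, l2)."""
    return l1 <= l2 and (k1 >> (W - l1)) == (k2 >> (W - l1))


def add_to_enclosure(x: Node, length: int, port: str) -> None:
    if all(l != length for l, _ in x.enclosure):
        x.enclosure.append((length, port))
        x.enclosure.sort()   # keep ascending order of prefix length


def insert(root: Optional[Node], key: int, length: int, port: str) -> Node:
    """Insert prefix P = (key, length, port); returns the root of the tree."""
    if root is None:
        return Node(key, length, port)
    x = root
    while True:
        if encloses(x.key, x.length, key, length):   # P is enclosed by prefix(x)
            if length != x.length:                    # skip an exact duplicate
                add_to_enclosure(x, x.length, x.port)
                x.key, x.length, x.port = key, length, port
            return root
        if encloses(key, length, x.key, x.length):   # P encloses prefix(x)
            add_to_enclosure(x, length, port)
            return root
        # P is disjoint from prefix(x): descend by start address
        side = "right" if key > x.key else "left"
        child = getattr(x, side)
        if child is None:                             # stand-in for Insert_node(P)
            setattr(x, side, Node(key, length, port))
            return root
        x = child


def addr(bits: str) -> int:
    return int(bits, 2) << (W - len(bits))


root = None
for bits, port in [("0", "P4"), ("00", "P3"), ("000", "P5"),
                   ("0001", "P1"), ("0010", "P2"), ("100", "P6")]:
    root = insert(root, addr(bits), len(bits), port)
```

Inserting the six example prefixes in this order yields a right-leaning chain a, b, c with P3, P4 and P5 in enclosure(a), which corresponds to the unbalanced tree of Figure 4.10(a).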
Figure 4.7 Algorithm to insert a prefix.
Algorithm insert (P, root) { /* insert a new prefix P, starting at the root node */
    /* prefix(x) is the most specific prefix represented by node x in MSPT */
    x = root; /* root node of MSPT */
    while ( x ≠ null ) {
        if ( P ⊆ prefix(x) ) { /* prefix(x) encloses prefix P */
            if ( P = prefix(x) ) return; /* P is already in the MSPT */
            /* insert prefix(x) into enclosure(x) and then use prefix P to
               replace prefix(x) */
            insert prefix(x) into enclosure(x);
            prefix(x) = P;
            return;
        } else if ( prefix(x) ⊆ P ) { /* prefix P encloses prefix(x) */
            insert P into enclosure(x);
            return;
        }
        /* if prefix P neither encloses prefix(x) nor is enclosed by prefix(x),
           prefix P must be disjoint from prefix(x) */
        if ( P > prefix(x) ) /* the start address of P is larger than key(x) */
            x = RightChild(x);
        else /* the start address of P is smaller than key(x) */
            x = LeftChild(x);
    }
    /* P is disjoint from all the most specific prefixes in MSPT:
       create a new node and insert it into MSPT */
    Insert_node(P);
}
4.4.2 Deleting a Prefix
Deleting a prefix is more complex than inserting a prefix into a
prefix set. Figure 4.8 gives our algorithm to delete a prefix P. First, we have to determine
whether P is a non-most specific prefix stored in an enclosure set or a most specific
prefix represented by a node x in the MSPT. In the former case, we simply remove
P from the enclosure set. Otherwise, P is a most specific prefix represented by a node x
in the MSPT; when x is an internal node, we have to perform a delete_internal_prefix_node
operation to maintain the MSPT enclosure constraint. Figure 4.9 gives the
steps of the method delete_internal_prefix_node.
Note that when a most specific prefix represented by a node x is deleted,
node x may be a leaf node or an internal node of the MSPT. If node x is a leaf node and
enclosure(x) is empty, node x is deleted from the MSPT, and a rotation is performed as
described in Section 4.4.3 if the MSPT becomes unbalanced. If node x is a leaf node and
enclosure(x) is not empty, we replace prefix(x) with the longest non-most specific prefix
stored in enclosure(x) and delete that prefix from enclosure(x).
When node x is an internal node, the delete_internal_prefix_node function is necessary
to maintain the MSPT enclosure constraint. If the degree of node x is 1, we insert all
non-most specific prefixes stored in enclosure(x) into the subtree of node x and then
delete node x by the binary tree node deletion algorithm. If the degree of node x is 2,
let y be the node with the largest key in the left subtree of node x or the node with the
smallest key in the right subtree of node x, let temp be a temporary memory space, and
let P1 to Ps be the nodes on the path from y to x, so that P1 = y and Ps = x. First, we
store all prefixes of enclosure(x) in temp, and node y replaces node x when the balanced
binary search tree deletion algorithm is performed. Second, we insert all prefixes stored
in temp at node y. Third, for i = 1 to s, we delete from enclosure(Pi) the prefixes that
enclose prefix(y) and add them to enclosure(y) to satisfy the MSPT enclosure constraint.
Apart from the time required by rotations, the complexity of deleting a prefix
P from a prefix set is the O(log N) time needed to find the node x such that prefix(x) is
equal to P or is enclosed by P, plus O(W log N) time to perform the
delete_internal_prefix_node function. Consequently, a deletion operation may take
O(W log N) time.
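The leaf case with a nonempty enclosure set can be illustrated with a short Python sketch. It assumes, as a simplification not taken from the thesis, that an enclosure set is a list of (length, port) pairs in ascending length order and that the key of a node is the start address of its prefix; promote_longest is a hypothetical helper name.

```python
W = 32  # address width in bits


def promote_longest(key, enclosure):
    """Replace a deleted most specific prefix by the longest enclosing prefix.

    key is the start address of the deleted prefix; enclosure is a list of
    (length, port) pairs in ascending order of length. Returns the new
    (key, length, port, remaining_enclosure), or None if the enclosure set
    is empty and the node itself has to be removed from the tree.
    """
    if not enclosure:
        return None
    length, port = enclosure[-1]                  # longest enclosing prefix
    mask = ((1 << length) - 1) << (W - length)    # keep the leading `length` bits
    return key & mask, length, port, enclosure[:-1]


# Deleting P1 = 0001* from a leaf whose enclosure set holds 0*, 00* and 000*
result = promote_longest(0b0001 << 28, [(1, "P4"), (2, "P3"), (3, "P5")])
```

In this example the node continues to exist and now represents P5 = 000*, while P4 and P3 stay in its enclosure set.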
Figure 4.8 Algorithm to delete a prefix.
Algorithm delete (P, root) { /* delete a prefix P */
    x = root; /* root of MSPT */
    while ( x ≠ null ) {
        if ( P = prefix(x) ) { /* prefix(x) equals prefix P */
            if ( x is a leaf node ) {
                if ( enclosure(x) is not empty ) {
                    delete the prefix A which is the longest prefix in enclosure(x);
                    prefix(x) = A;
                } else
                    Delete_node(x); /* enclosure(x) is empty */
            } else
                delete_internal_prefix_node(x); /* x is an internal node */
            return;
        } else if ( prefix(x) ⊆ P ) { /* prefix P encloses prefix(x) */
            if ( (enclosure(x) is not empty) && (P exists in enclosure(x)) )
                delete P from enclosure(x);
            else
                report that prefix P does not exist in MSPT;
            return;
        }
        /* if prefix P neither encloses prefix(x) nor is enclosed by prefix(x),
           prefix P must be disjoint from prefix(x) */
        if ( P > prefix(x) ) /* prefix P is larger than prefix(x) */
            x = RightChild(x);
        else /* prefix P is smaller than prefix(x) */
            x = LeftChild(x);
    }
    report that prefix P does not exist in MSPT;
}
Figure 4.9 Algorithm to maintain MSPT enclosure constraint following a delete.
Algorithm delete_internal_prefix_node(x) { /* delete the prefix of an internal node */
    if ( the degree of node x is 1 ) {
        if ( enclosure(x) is empty )
            Delete_node(x);
        else { /* enclosure(x) is not empty */
            /* let y be the only child of x */
            for each prefix q in enclosure(x)
                insert(q, y);
            Delete_node(x);
        }
    } else { /* the degree of node x is 2 */
        /* let y be the node with the largest key value in the left subtree of node x
           or the node with the smallest key value in the right subtree of node x.
           node y will replace node x.
           temp temporarily stores all prefixes of enclosure(x). */
        temp = enclosure(x);
        Delete_node(x); /* replace x with y */
        for each prefix q in temp
            insert(q, y);
        /* let P1 to Ps be the nodes on the path from y to x, so P1 = y, Ps = x */
        for i = 1 to s,
            delete the prefixes that enclose prefix(y) in enclosure(Pi), and then add
            those prefixes into enclosure(y) to satisfy the MSPT enclosure constraint.
    }
    return;
}
4.4.3 Rotations Problem
The MSPT is a balanced binary search tree. When a node is inserted into or deleted from
the MSPT and the tree becomes unbalanced, at least one rotation is required to rebalance
it. These rotations may violate the non-most specific prefix allocation rule and cause
the search operation to fail. For example, Figure 4.10 shows a rebalanced MSPT after
inserting a new prefix P6. The rebalanced MSPT (Figure 4.10(b)) conflicts with the
non-most specific prefix allocation rule. As described in Section 4.2.2, all non-most
specific prefixes allocated to the left subtree of a most specific prefix node x must be
disjoint from prefix(x) and smaller than prefix(x), and those allocated to the right
subtree of node x must be disjoint from prefix(x) and larger than prefix(x). In Figure
4.10(b), however, the non-most specific prefixes P3 and P4 stored in enclosure(a) are not
disjoint from prefix(b): both enclose prefix(b). Consider finding the LPM for the address
Dst = 00111 in this tree. At the root node b, prefix(b) does not match Dst and
enclosure(b) is empty. Comparing key(b) with Dst, we find that key(b) is smaller than
Dst, so the search continues to node c. At node c, prefix(c) does not match Dst either
and enclosure(c) is empty. Thus, the search finds no LPM for Dst. This result is
incorrect, since the LPM should be P3 = 00*. Therefore, to ensure that the search
operation is performed correctly, we have to remove the two non-most specific prefixes
P3 and P4 from enclosure(a) and insert them into enclosure(b). Figure 4.11 shows the
correct MSPT after this adjustment.
Figure 4.10 (a) An unbalanced MSPT after inserting a new prefix P6. (b) A rebalanced MSPT after performing a rebalancing rotation.
[Figure: nodes a = (P1, 0001*), b = (P2, 0010*) and c = (P6, 100*). In both (a) and (b), enclosure(a) = {(P3, 00*), (P4, 0*), (P5, 000*)}.]
Figure 4.11 Correct MSPT after adjusting.
[Figure: root b with children a and c; enclosure(a) = {(P5, 000*)}, enclosure(b) = {(P3, 00*), (P4, 0*)}.]
A balanced binary search tree can be implemented as a red-black tree or an AVL tree. The
LL and RR rotations used to rebalance a red-black tree following an insert or delete
operation are shown in Figure 4.12. An AVL tree, besides LL and RR rotations,
also uses LR and RL rotations. We may view an LR rotation as an RR rotation followed by
an LL rotation, and an RL rotation as an LL rotation followed by an RR rotation.
Figure 4.13 shows these two rotation types.
In Figure 4.12, we observe that the relative positions of node a and node b in the
balanced binary search tree change after an LL or RR rotation: node a becomes a
node in the subtree of node b. To avoid a violation of the non-most specific prefix
allocation rule, we have to find a set S such that, after the rebalancing rotation,
enclosure(b) = enclosure(b) ∪ S and enclosure(a) = enclosure(a) - S, where S = { p | p ∈
enclosure(a) and p encloses prefix(b) } (i.e., we delete the non-most specific prefixes
that are stored in enclosure(a) and enclose prefix(b), and then insert them into
enclosure(b)). The time required to perform an LL or RR rotation depends on the time to
determine the set S, remove S from enclosure(a), and insert S into enclosure(b). For LR
and RL rotations, the time is roughly twice that of LL and RR rotations.
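The definition of S can be checked against the example of Figures 4.10 and 4.11 with a few lines of Python. In this sketch prefixes are written as bit strings, so "p encloses q" is simply "q starts with p"; the function name move_enclosing and the list representation of enclosure sets are illustrative assumptions.

```python
def move_enclosing(prefix_b, enclosure_a, enclosure_b):
    """Return the enclosure sets of a and b after a rotation that puts a below b.

    Prefixes are bit strings such as '00' for 00*. S holds the entries of
    enclosure(a) that enclose prefix(b); they move to enclosure(b).
    """
    s = [p for p in enclosure_a if prefix_b.startswith(p[0])]
    new_a = [p for p in enclosure_a if not prefix_b.startswith(p[0])]
    new_b = sorted(enclosure_b + s, key=lambda p: len(p[0]))
    return new_a, new_b


# Figure 4.10(b): a = 0001*, b = 0010*, and enclosure(a) still holds P4, P3, P5
enc_a, enc_b = move_enclosing(
    "0010",
    [("0", "P4"), ("00", "P3"), ("000", "P5")],
    [],
)
```

After the call, enc_a holds only P5 = 000* and enc_b holds P4 = 0* and P3 = 00*, which is the arrangement of Figure 4.11.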
All prefixes stored in an enclosure set are nested. Therefore, to find the set S, we only
need to find the prefix pMax, the longest prefix in enclosure(a) that encloses prefix(b).
Moreover, since each enclosure set is a balanced binary search tree ordered by prefix
length, pMax can be found in O(height(enclosure(a))) time by following a path from the
root to a leaf node. If pMax exists, we can use the split operation [7] to extract the
prefixes that belong to S from enclosure(a). We first separate enclosure(a) into a
red-black tree pSmall that holds all prefixes shorter than pMax and a red-black tree pBig
that holds all prefixes longer than pMax, and then we use the join operation [7] to
combine the red-black tree pSmall, the prefix pMax, and the red-black tree enclosure(b)
into a single red-black tree. Thus, after the rebalancing rotation, enclosure(a) = pBig
and enclosure(b) = join(pSmall, pMax, enclosure(b)). Although the split and join
operations of [7] need to be modified slightly, the modification does not affect their
complexity. So, the complexity of adjusting the enclosure sets for an LL or RR rotation
(and likewise for LR and RL rotations) is O(log W).
A rebalancing rotation can thus be done in O(log W) time, and Sections 4.4.1 and 4.4.2
showed that, without counting rebalancing rotations, the time required to insert and to
delete a prefix is O(log N + log W) and O(W log N), respectively. So the overall insert
and delete times are still O(log N + log W) and O(W log N + log W), respectively.
Figure 4.12 LL and RR rotations. (a) LL rotation. (b) RR rotation.
Figure 4.13 LR and RL rotations. (a) LR rotation (an RR rotation followed by an LL rotation). (b) RL rotation (an LL rotation followed by an RR rotation).
4.5 Enhance Our IP Address Lookup Scheme
By analyzing several real routing tables obtained from [1], [14], we observe that about
91% to 93% of the prefixes in a routing table are most specific prefixes and the
remainder are non-most specific prefixes. In our scheme, all of the most specific
prefixes in a routing table are used to construct a balanced binary search tree, and the
remaining prefixes (the non-most specific prefixes) are allocated to the enclosure sets
of the most specific prefix nodes in the MSPT. Table 4.2 further shows that about 91% to
93% of the enclosure sets are empty. Excluding the empty enclosure sets, the average size
of the nonempty enclosure sets (the average number of non-most specific prefixes in a
nonempty enclosure set) is small, between 1.07 and 1.10, and the maximum size of a
nonempty enclosure set is 4 or 5. Thus, for a real routing table, the enclosure set of
each most specific prefix node in the MSPT is either empty or stores only a few non-most
specific prefixes.
Table 4.2 Data structure analysis for MSPT.
Database                                   AS6447   AS6447   AS6447   AS7660   AS2493
(year-month)                               2000-4   2002-4   2005-4   2005-4   2005-4
# of prefixes                               79560   124824   163574   159816   157154
# of repeated prefixes                         25       21       32      120      133
# of most specific prefixes                 73900   114745   150245   145849   143684
# of non-most specific prefixes              5635    10058    13297    13847    13337
# of empty enclosure sets                   68639   105561   138114   133297   131520
# of nonempty enclosure sets                 5261     9184    12131    12552    12164
Max size of nonempty enclosure sets             4        5        4        4        4
Average size of nonempty enclosure sets      1.07     1.09     1.09     1.10     1.10
In our implementation, we use the key value and length value stored in each node of
the MSPT to represent a most specific prefix. To use memory more efficiently, the
prefix representation method and comparison rule described in [3] can be adopted.
Following [3], a prefix can be represented by a key value alone, without storing any
length information in each node of the MSPT. Moreover, our statistical analysis of
several routing tables in Table 4.2, together with the fact that a prefix is enclosed by
at most six prefixes in a real routing table [12], [13], suggests that better performance
can be obtained by using a simpler structure for an enclosure set than the balanced
binary search tree described in Section 4.2.2. We therefore replace the balanced binary
search tree in each enclosure set with an array of pairs of the form (prefix length,
port). The pairs in this array are in ascending order of prefix length.
We have described that inserting or deleting a prefix may cause a rotation that
redistributes some non-most specific prefixes stored in the enclosure sets. The time
required to perform a rotation and redistribute the non-most specific prefixes depends on
the data structure used to represent the enclosure set and on the number of non-most
specific prefixes stored in the enclosure set. As noted earlier, a prefix is enclosed by
at most six prefixes in practice. So, in practice, each MSPT operation (search,
insertion, deletion) takes O(log N) time.
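With the array representation, the redistribution required by a rotation reduces to cutting the sorted array at one position. The following Python sketch shows one way this could be done; it assumes 32-bit keys that hold the start address of each most specific prefix, and the function names are hypothetical. Because every entry of enclosure(a) shares its leading bits with key(a), an entry encloses prefix(b) exactly when its length does not exceed both length(b) and the number of leading bits that key(a) and key(b) have in common.

```python
from bisect import bisect_right

W = 32  # address width in bits


def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits shared by two W-bit addresses."""
    return W - (a ^ b).bit_length()


def split_for_rotation(key_a, enclosure_a, key_b, length_b, enclosure_b):
    """Redistribute array-based enclosure sets when node a moves below node b.

    Each enclosure set is a list of (prefix length, port) pairs in ascending
    order of prefix length. Returns (new enclosure(a), new enclosure(b)).
    """
    limit = min(common_prefix_len(key_a, key_b), length_b)
    lengths = [length for length, _ in enclosure_a]
    cut = bisect_right(lengths, limit)       # entries before cut enclose prefix(b)
    moved, kept = enclosure_a[:cut], enclosure_a[cut:]
    return kept, sorted(enclosure_b + moved)


# a = 0001*, b = 0010*: the keys share two leading bits, so 0* and 00* move to b
kept, merged = split_for_rotation(
    0b0001 << 28, [(1, "P4"), (2, "P3"), (3, "P5")],
    0b0010 << 28, 4, [],
)
```

Since an enclosure array holds only a handful of entries in practice, building the list of lengths costs little here; a tuned implementation could search the array in place.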
4.6 Migrate to IPv6
In this section, we sketch an extension of our scheme to IPv6. While IPv4 uses 32-bit
IP addresses, the next generation, IPv6, uses 128-bit addresses. On a 32-bit machine,
we need four words (one word is 32 bits) to represent an IPv6 address, so accessing a
single 128-bit address requires four memory accesses, which can slow down lookups.
Moreover, IPv6 addressing is hierarchical. How to use this characteristic to make
routing more efficient is an important problem.
Since we need four words to represent an IPv6 prefix and IPv6 addressing is
hierarchical, we divide each IPv6 address into four parts of 32 bits (one word) each. In
other words, we build a four-level MSPT structure in which the MSPT of each level is
built on one 32-bit word. In this four-level structure, the MSPT of each level
represents only one segment of the IPv6 addresses, which reduces the tree height and
improves the performance. The construction procedure is as follows. We first set W = 32
and build a level-one MSPT L1 on the first words of the prefixes in a prefix database S.
The prefixes of S fall into two categories: (1) those with prefix length less than or
equal to W are treated as normal prefixes; (2) those with prefix length greater than W
are treated at this level as prefixes of length 32, and their remaining bits are inserted
into the next level. The procedures for constructing levels two, three and four are
similar to that for level one, but the value of W is changed to 64, 96 and 128,
respectively. Figure 4.14 gives an example with seven IPv6 prefixes. Since the maximum
prefix length among these seven IPv6 prefixes is 48, we do not build the level-three and
level-four MSPTs in this example.
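The per-level decomposition can be sketched with Python's standard ipaddress module. The function split_levels is a hypothetical helper written for this illustration; it returns, for each level that the prefix reaches, the 32-bit word seen by that level and the number of significant bits within it.

```python
import ipaddress


def split_levels(prefix: str, word_bits: int = 32):
    """Split an IPv6 prefix into (word, length) pairs, one per MSPT level.

    A prefix longer than the bits covered so far contributes a full word to
    the current level and continues in the next level; the last pair carries
    the remaining length.
    """
    net = ipaddress.IPv6Network(prefix)
    address = int(net.network_address)
    remaining = net.prefixlen
    levels = []
    for level in range(128 // word_bits):
        shift = 128 - word_bits * (level + 1)
        word = (address >> shift) & ((1 << word_bits) - 1)
        length = min(remaining, word_bits)
        levels.append((word, length))
        remaining -= length
        if remaining == 0:
            break
    return levels


p5 = split_levels("3FFE:0608:0002::/48")   # P5 of Figure 4.14
p4 = split_levels("3FFE:0100::/24")        # P4 of Figure 4.14
```

For P5 the result has two entries, (3FFE:0608, 32) for level one and (0002:0000, 16) for level two, which agrees with the entries shown in Figure 4.14; P4 stays entirely in level one.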
Figure 4.14 An MSPT example for IPv6.
[Figure: a two-level MSPT built from the seven prefixes listed below. The level-one MSPT (nodes a to d) holds (2001:0200, 32), (3FFE:0100, 32), (3FFE:0608, 32) and P7 (3FFE:1100, 24, G), with P4 stored in enclosure(a). The level-two MSPTs hold P1 (0000:0000, 8, A), P2 (0500:0000, 8, B), P3 (0600:0000, 8, C), P5 (0002:0000, 16, E) and P6 (1002:0000, 16, F).]
Prefix                        Port
P1  2001:0200:0000::/40       A
P2  2001:0200:0500::/40       B
P3  3FFE:0100:0600::/40       C
P4  3FFE:0100::/24            D
P5  3FFE:0608:0002::/48       E
P6  3FFE:0608:1002::/48       F
P7  3FFE:1100::/24            G
Chapter 5 Performance
In this chapter, we first introduce the simulation environment and then present the
simulation results in two parts: the results for IPv4 and the results for IPv6. In each
part, we compare the different IP address lookup schemes in terms of memory
requirement, lookup speed and update time.
5.1 Simulation Environment
All tested schemes are implemented in C and run on a 2.4 GHz Pentium 4 processor with
8 KB L1 and 256 KB L2 caches and 768 MB of main memory, running Red Hat Linux 9.0. The
gcc 3.2.2 compiler with optimization level -O4 is used. Moreover, a special instruction
called RDTSC (read time stamp counter) is used to measure the performance of search and
update operations. The time-stamp counter keeps an accurate count of every clock cycle
that occurs on the processor, so elapsed time can be measured in clock cycles with the
RDTSC instruction.
5.2 Simulation Result for IPv4
Our experiments are conducted on five BGP routing tables of different sizes
obtained from [1], [14]. These BGP routing tables reflect the realistic size of the
routing tables in the backbone routers currently deployed on the Internet. The detailed
information is shown in Table 5.1.
Table 5.1 BGP routing tables.
Database (AS number)   AS6447   AS6447   AS6447   AS7660   AS2493
Year-month             2000-4   2002-4   2005-4   2005-4   2005-4
# of prefixes           79560   124824   163574   159816   157154
We experiment with the following non precomputation-based schemes (dynamic routing
table structures): PBOB (prefix binary tree on binary tree [12]), MRT (multiway range
tree [21]), BTRIE (binary trie) and our proposed scheme MSPT (most specific prefix
tree). Moreover, we also include some precomputation-based schemes in our performance
measurements for comparison with the MSPT: BRS (binary range search [11]), BPS (binary
prefix search [3]), HCA (Huang’s compact algorithm [8]), BLS (binary length search [18])
and Lulea (Lulea compressed trie [2]). The schemes whose names end with the number "16"
(for example, PBOB-16) are variants of the corresponding pure schemes that use the first
16 bits of the IP address to index a segmentation table. Many IP address lookup schemes
use this method to further improve their lookup speed and update performance. The
concept of the 16-bit segmentation table is shown in Figure 5.1.
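A minimal Python sketch of such a segmentation table follows. The thesis does not spell out how prefixes shorter than 16 bits are handled, so this sketch assumes that they are copied into every segment they cover; each segment is represented by a plain list that stands in for the per-segment search structure (for example, one MSPT per segment). The names segment_of and build_segment_table are hypothetical.

```python
W = 32          # IPv4 address width in bits
SEG_BITS = 16   # number of leading bits used as the segment index


def segment_of(address: int) -> int:
    """Index of the segment that a 32-bit address falls into."""
    return address >> (W - SEG_BITS)


def build_segment_table(prefixes):
    """Group (key, length, port) prefixes by their first 16 bits.

    Assumption of this sketch: a prefix shorter than 16 bits is copied into
    all 2**(16 - length) segments it covers.
    """
    table = {}
    for key, length, port in prefixes:
        first = segment_of(key)
        span = 1 << max(0, SEG_BITS - length)
        for seg in range(first, first + span):
            table.setdefault(seg, []).append((key, length, port))
    return table


table = build_segment_table([
    (0xC0A80100, 24, "A"),   # 192.168.1.0/24 lies in the single segment 0xC0A8
    (0x0A000000, 8, "B"),    # 10.0.0.0/8 covers segments 0x0A00 to 0x0AFF
])
```

A lookup first computes segment_of(address) and then searches only the structure stored for that segment, which is why the per-segment structures stay small.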
Figure 5.1 16-bit segmentation table.
[Figure: a segmentation table with 2^16 entries, indexed 0 to 65535; entry i points to segment-i.]
Total Memory Requirement. Table 5.2 shows the amount of memory used by each
of the tested schemes. These memory requirements are plotted as a histogram in Figure
5.2. Since precomputation simplifies the entire search data structure, the
precomputation-based schemes generally have smaller memory requirements. Among the non
precomputation-based schemes, however, our MSPT uses the least memory. Comparing PBOB
with MSPT, we find that the MSPT structure uses about 30% less memory than the PBOB
structure. This result can be attributed to the fact that the number of nodes in MSPT is
always less than or equal to that in PBOB. Moreover, less than 1 percent of the range
sets in the constructed PBOB are empty, so PBOB needs additional memory to store the
nonempty range sets (each nonempty range set is an array with six entries). In contrast,
almost all enclosure sets in MSPT are empty. Hence, MSPT always requires less memory
than PBOB. As for BTRIE and MRT, both need more memory to store long sequences of
one-child nodes and link pointers. The detailed data structure analysis of the non
precomputation-based schemes is shown in Table 5.3. The relative memory performance
remains the same when a 16-bit segmentation table is adopted to improve lookup speed.
Table 5.2 The statistics of memory requirement (in KB) for IPv4.
Scheme      AS6447   AS6447   AS6447   AS7660   AS2493
            2000-4   2002-4   2005-4   2005-4   2005-4
Non precomputation-based (dynamic routing table structures)
PBOB          2122     3299     4317     4192     4129
MSPT          1433     2237     2930     2853     2809
BTRIE         2789     4231     5383     5147     5056
MRT           3665     5689     7399     7220     7099
PBOB-16       2465     3550     4484     4356     4298
MSPT-16       1828     2533     3143     3068     3027
BTRIE-16      3167     4555     5655     5397     5308
Precomputation-based (static routing table structures)
BRS-16        1474     1972     2391     2336     2308
BPS-16        1006     1223     1399     1359     1345
HCA-16        1298     1876     2072     2019     2005
BLS           1459     2315     2817     3074     3292
Lulea          521      802      946      921      895
PBOB (prefix binary tree on binary tree [12]), MSPT (most specific prefix tree), BTRIE (binary trie), MRT (multiway range tree [21], 32 way), BRS (binary range search [11]), BPS (binary prefix search [3]), HCA (Huang’s compact algorithm [8]), BLS (binary length search [18]), Lulea (Lulea compressed trie [2])
Figure 5.2 Total memory requirement (in KB) for IPv4.
Table 5.3 Data structure analysis for non precomputation-based schemes.
Scheme                          AS6447   AS6447   AS6447   AS7660   AS2493
                                2000-4   2002-4   2005-4   2005-4   2005-4
PBOB
  # of nodes                     75075   116722   152741   148343   146109
  # of empty range sets            319      562      707      785      727
MSPT
  # of nodes                     73900   114745   150245   145849   143684
  # of empty enclosure sets      68639   105561   138114   133297   131520
BTRIE
  # of nodes                    237976   361106   459351   439246   431516
MRT
  # of nodes                      7303    11338    14743    14381    14140
PBOB-16
  # of nodes                     69353   110620   146109   141267   139038
  # of empty range sets            309      548      701      730      691
MSPT-16
  # of nodes                     68484   109029   144072   139252   137046
  # of empty enclosure sets      64698   101958   134549   129356   127510
BTRIE-16
  # of nodes                    215664   334058   427935   405993   398345
Search Time. To measure the lookup speed, we conduct trace-driven simulations
to obtain the lookup time distributions of the tested schemes. The simulated IP traffic
is generated as follows. We first use an array A to store the start addresses of all
prefixes in a database and then add one to each of these start addresses. A random
permutation of A is generated, and this permutation determines the order in which we
search for the longest prefix match of each address in A. The time required to determine
all the LPMs is measured and averaged over the number of addresses in A. The experiment
is repeated 100 times, and the mean of these average times is computed. These mean times
are reported in Table 5.4 and plotted as a histogram in Figure 5.3.
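The measurement procedure can be outlined in Python as follows. This is only a sketch of the methodology: it uses time.perf_counter in place of the RDTSC instruction, takes the lookup routine as a parameter, and fixes the random seed for repeatability, all of which are assumptions of the sketch; make_trace and mean_lookup_time are hypothetical names.

```python
import random
import time


def make_trace(start_addresses, seed=0):
    """Start address of every prefix plus one, in random order."""
    trace = [(a + 1) & 0xFFFFFFFF for a in start_addresses]
    random.Random(seed).shuffle(trace)
    return trace


def mean_lookup_time(lookup, trace, repeats=100):
    """Mean over `repeats` runs of the average time (seconds) per lookup."""
    averages = []
    for _ in range(repeats):
        started = time.perf_counter()
        for d in trace:
            lookup(d)
        averages.append((time.perf_counter() - started) / len(trace))
    return sum(averages) / len(averages)


# Example with a dummy lookup routine standing in for a real scheme
trace = make_trace([0x0A000000, 0xC0A80000, 0xC0A80100])
elapsed = mean_lookup_time(lambda d: None, trace, repeats=3)
```

Any of the tested schemes could be plugged in through the lookup parameter, so the same harness would apply to all of them.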
Table 5.4 The statistics of search time (in microseconds) for IPv4.
Scheme      AS6447   AS6447   AS6447   AS7660   AS2493
            2000-4   2002-4   2005-4   2005-4   2005-4
Non precomputation-based (dynamic routing table structures)
PBOB          0.57     0.69     0.79     0.77     0.72
MSPT          0.47     0.60     0.68     0.68     0.66
BTRIE         0.63     0.70     0.75     0.74     0.73
MRT           0.42     0.47     0.52     0.51     0.52
PBOB-16       0.34     0.38     0.43     0.42     0.41
MSPT-16       0.25     0.32     0.36     0.34     0.34
BTRIE-16      0.48     0.54     0.58     0.56     0.55
Precomputation-based (static routing table structures)
BRS-16        0.22     0.28     0.32     0.31     0.31
BPS-16        0.15     0.18     0.21     0.19     0.19
HCA-16        0.15     0.18     0.18     0.20     0.20
BLS           0.23     0.35     0.40     0.42     0.41
Lulea         0.18     0.21     0.23     0.23     0.22
Figure 5.3 Search time (in microseconds) for IPv4.
First, consider only the non precomputation-based schemes. Without the
16-bit segmentation table, the performance of PBOB becomes worse than that of the
binary trie when the number of entries in the BGP table becomes large, and MRT has the
best performance among these schemes. We attribute the search performance of MRT to its
tree height, which is lower than that of PBOB, MSPT and the binary trie.
Comparing MSPT with PBOB, the lookup time of MSPT is between about 82% and 92% of that
of PBOB. This is because each node in MSPT represents a prefix in the routing table, and
almost all enclosure sets in MSPT are empty (i.e., only a few prefixes of a routing
table are stored in the enclosure sets of MSPT). In PBOB, however, each node merely
represents a point, and all prefixes of a routing table must be stored in the range sets
of PBOB. As a result, less than 1% of the range sets are empty. When the search
procedure traverses from the root node to a leaf node in MSPT, the enclosure set of
almost every node is empty, so it takes no extra time to check whether other prefixes
also match the destination address. In PBOB, by contrast, checking the range set of each
node is almost always necessary because the range set of almost every node is not
empty. Thus, our MSPT has better search performance than PBOB.
Next, consider the precomputation-based schemes and the 16-bit segmentation
table. The search performance of MSPT-16 is very close to that of the
precomputation-based schemes. This is because, after adopting the 16-bit segmentation
table, the number of prefixes in each segment is much smaller than the number of
prefixes in the original prefix database. The small number of prefixes in each segment
reduces the differences among the IP address lookup schemes. Table 5.5 gives the
statistics of the 16-bit segmentation table for each prefix database.
Table 5.5 The statistics for 16-bit segmentation table.
Database (AS number)                  AS6447   AS6447   AS6447   AS7660   AS2493
# of entries                           79560   124824   163574   159816   157154
# of nonempty segments                  4784     6882     8698     9160     9070
Max size of nonempty segments            261      297     2078      243      243
Average size of nonempty segments      15.14    15.48    17.74    16.37    16.25
Update Time. To measure the average update (insert/delete) time, we start by selecting
2000 prefixes from the database. We first use the remaining prefixes of the database to
build the data structure. After the data structure is constructed, the selected 2000
prefixes are inserted into it. Once the 2000 insertions are done, the selected 2000
prefixes are removed again. The total elapsed time of inserting 2000 prefixes and
removing 2000 prefixes is divided by 4000 to get the average time for a single update.
This experiment is repeated 10 times and the mean of the average update times is
computed.
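The update experiment can be outlined in the same way. The sketch below is an assumption-laden illustration rather than the thesis code: it draws a fresh random sample in every repetition (the text does not say whether the sample is redrawn), uses time.perf_counter instead of RDTSC, and receives the build, insert and delete routines of the scheme under test as parameters; mean_update_time is a hypothetical name.

```python
import random
import time


def mean_update_time(prefixes, build, insert, delete,
                     sample=2000, repeats=10, seed=0):
    """Mean over `repeats` runs of the average time (seconds) per update."""
    rng = random.Random(seed)
    averages = []
    for _ in range(repeats):
        chosen = set(rng.sample(range(len(prefixes)), sample))
        picked = [prefixes[i] for i in chosen]
        rest = [p for i, p in enumerate(prefixes) if i not in chosen]
        table = build(rest)                 # built from the remaining prefixes
        started = time.perf_counter()
        for p in picked:
            insert(table, p)
        for p in picked:
            delete(table, p)
        # `sample` insertions plus `sample` deletions
        averages.append((time.perf_counter() - started) / (2 * sample))
    return sum(averages) / len(averages)


# Example with a Python set standing in for a dynamic routing table
elapsed = mean_update_time(list(range(50)), set, set.add, set.discard,
                           sample=10, repeats=2)
```

With sample=2000 the divisor becomes 4000, which matches the averaging described above.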
However, this measurement is not suitable for the precomputation-based schemes,
because inserting a prefix into a precomputation-based scheme may affect the entire data
structure. To measure their update time (rebuild time) for BRS-16, BPS-16 and
HCA-16, we calculate the time of maintaining the 16-bit segmentation table plus the
time of rebuilding a segment. For BLS, the time of creating all markers when inserting a
prefix and the time of finding the LPM of all markers are counted. Finally, for Lulea,
we calculate the time of rebuilding the entire data structure.
Table 5.6 gives the computed mean times and Figure 5.4 plots these mean times as a
histogram (excluding binary length search and the Lulea compressed trie). The update
performance of the non precomputation-based schemes is clearly much better than that of
the precomputation-based schemes. Without the 16-bit segmentation table, PBOB
and MSPT have almost the same update performance, which is the best among all schemes.
With the 16-bit segmentation table, MSPT-16 has the best update performance among all
tested schemes.
Table 5.6 The statistics of update time (in microseconds) for IPv4.
Scheme      AS6447   AS6447   AS6447   AS7660   AS2493
            2000-4   2002-4   2005-4   2005-4   2005-4
Non precomputation-based (dynamic routing table structures)
PBOB          1.15     1.17     1.25     1.21     1.23
MSPT          1.15     1.15     1.24     1.21     1.17
BTRIE         1.19     1.31     1.39     1.35     1.32
MRT           1.67     1.78     1.80     1.84     1.86
PBOB-16       0.97     0.99     1.01     1.02     0.99
MSPT-16       0.76     0.79     0.82     0.82     0.78
BTRIE-16      0.77     0.81     0.87     0.83     0.88
Precomputation-based (static routing table structures)
BRS-16        9.06     9.59     9.90     9.16     9.14
BPS-16        5.75     6.41     6.73     6.12     6.27
HCA-16       14.93    17.50    28.95    14.87    14.46
BLS         132.78   210.01   266.77   259.16   253.76
Lulea       1454.9   1633.9   1891.3   1864.5   1885.4
Figure 5.4 Update time (in microseconds) for IPv4.
5.3 Simulation Result for IPv6
In order to measure the performance of IP address lookup algorithms for IPv6, IPv6
routing tables are needed. Since there are few users on IPv6 at present, current IPv6 table
sizes are small and unlikely to reflect future IPv6 network growth. In our experiment, we
use four IPv6 prefix databases. The databases V6table-1 and V6table-2 obtained from [1]
are the real routing tables in the IPv6 backbone routers. The databases GV6table-1 and
GV6table-2 are generated by two real IPv4 routing tables. The IPv6 table generation
schemes are described in [20].
Although many proposed IPv4 schemes adopt a 16-bit segmentation table to speed up
lookups, the address length for IPv6 is 128 bits, and we think a 16-bit
segmentation table is no longer suitable. In this section, we therefore evaluate only the
non-precomputation-based schemes MSPT, PBOB, and BTRIE. The IPv6 implementation of
PBOB follows that of MSPT described in Section 4.6.
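To make the scaling concern concrete, the following minimal sketch shows how a 16-bit segmentation table could partition prefixes by their leading 16 bits. The function names and the dictionary-based bucket layout are illustrative assumptions, not the implementation evaluated in this thesis. The last helper reports the fraction of address bits that the index resolves: half of an IPv4 address, but only one eighth of an IPv6 address.

```python
from collections import defaultdict

SEG_BITS = 16  # width of the segmentation index, as in the "-16" schemes


def segment_index(addr: int, addr_len: int) -> int:
    """Return the leading SEG_BITS bits of an addr_len-bit address."""
    return addr >> (addr_len - SEG_BITS)


def build_segments(prefixes, addr_len):
    """Group (value, length) prefixes by segment (hypothetical layout).

    `value` holds the prefix bits left-aligned in an addr_len-bit integer.
    Prefixes shorter than SEG_BITS span several segments; real schemes
    typically expand them, while this sketch only sets them aside.
    """
    segments = defaultdict(list)
    short = []
    for value, length in prefixes:
        if length < SEG_BITS:
            short.append((value, length))
        else:
            segments[segment_index(value, addr_len)].append((value, length))
    return segments, short


def covered_fraction(addr_len: int) -> float:
    """Fraction of the address bits resolved by the segmentation index."""
    return SEG_BITS / addr_len


# Example: 192.168.0.0/16, 192.168.1.0/24 and 10.0.0.0/8 in a 32-bit space.
ipv4 = [(0xC0A80000, 16), (0xC0A80100, 24), (0x0A000000, 8)]
segments, short = build_segments(ipv4, 32)
print(len(segments[0xC0A8]), len(short))           # 2 1
print(covered_fraction(32), covered_fraction(128))  # 0.5 0.125
```

After the index step, an IPv4 lookup has at most 16 bits left to resolve, whereas an IPv6 lookup still has up to 112.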
Total Memory Requirement. Table 5.7 shows the memory requirement of each
tested scheme, and the detailed data structure analysis is shown in Table 5.8. The
experimental results for IPv6 are consistent with those for IPv4: MSPT has the
best memory performance among all schemes. Figure 5.5 is the histogram of the memory
requirement.
Table 5.7 The statistics of memory requirement (in KB) for IPv6.
Database       V6table-1   V6table-2   GV6table-1   GV6table-2
# of entries   274         593         9788         20070
MSPT           9.9         18.3        312          605
PBOB           12.5        24.1        386.5        763.1
BTRIE          22.6        29.2        822.2        1786.5
Table 5.8 Data structure analysis for IPv6 prefix databases.
Scheme   Statistic                     V6table-1   V6table-2   GV6table-1   GV6table-2
         # of prefixes                 274         593         9788         20070
MSPT     # of nodes                    361         675         11491        22412
MSPT     # of empty enclosure sets     325         632         10723        21219
PBOB     # of nodes                    382         706         11866        23024
PBOB     # of empty range sets         115         123         3604         5870
BTRIE    # of nodes                    2313        2992        84195        182941
Figure 5.5 Total Memory requirement (in KB) for IPv6.
Search and Update Time. The search and update performance is shown in Table
5.9. Figure 5.6 plots the mean search and update times as histograms. As can be seen, for
small tables such as V6table-1 and V6table-2, MSPT performs similarly to PBOB. For
large tables such as GV6table-1 and GV6table-2, MSPT is slightly better than PBOB.
BTRIE, however, always has the poorest performance among the tested schemes. This is
because the search and update cost of BTRIE grows linearly with the prefix length. As
introduced in Chapter 3, trie-based schemes such as BTRIE usually do not scale well to
longer IP addresses.
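The linear dependence on address length can be seen in a minimal binary trie sketch. This is a generic one-bit-per-level trie written for illustration, not the BTRIE code measured here, and the class and function names are assumptions. The lookup also returns the number of nodes it visits, which can reach the address length plus one (33 for IPv4, 129 for IPv6).

```python
class TrieNode:
    __slots__ = ("child", "next_hop")

    def __init__(self):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.next_hop = None       # set only if a prefix ends at this node


def insert(root, value, length, addr_len, next_hop):
    """Insert a prefix given as a left-aligned addr_len-bit integer."""
    node = root
    for i in range(length):
        bit = (value >> (addr_len - 1 - i)) & 1
        if node.child[bit] is None:
            node.child[bit] = TrieNode()
        node = node.child[bit]
    node.next_hop = next_hop


def lookup(root, addr, addr_len):
    """Longest-prefix match; returns (next_hop, nodes_visited)."""
    node, best, visited = root, root.next_hop, 1
    for i in range(addr_len):
        bit = (addr >> (addr_len - 1 - i)) & 1
        node = node.child[bit]
        if node is None:
            break                  # no longer prefix exists on this path
        visited += 1
        if node.next_hop is not None:
            best = node.next_hop   # remember the longest match so far
    return best, visited


# Example: 10.0.0.0/8 -> "A" and 10.1.0.0/16 -> "B" in a 32-bit space.
root = TrieNode()
insert(root, 0x0A000000, 8, 32, "A")
insert(root, 0x0A010000, 16, 32, "B")
print(lookup(root, 0x0A010203, 32))  # ('B', 17)
```

A search tree keyed on whole prefixes, by contrast, has a depth that depends on the number of stored prefixes rather than on the number of address bits.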
Table 5.9 The statistics of search and update time (in microseconds) for IPv6.
Operation   Scheme   V6table-1   V6table-2   GV6table-1   GV6table-2
            # of prefixes   274   593         9788         20070
Search      MSPT     0.249       0.318       0.415        0.532
Search      PBOB     0.255       0.356       0.516        0.752
Search      BTRIE    0.392       0.48        0.79         0.943
Update      MSPT     0.562       0.631       0.678        0.734
Update      PBOB     0.592       0.655       0.721        0.792
Update      BTRIE    1.539       1.183       1.723        1.79
Figure 5.6 The mean times of search and update for IPv6. [Two histogram panels: Search and Update.]
Chapter 6 Conclusion
We have developed a Most Specific Prefix Tree (MSPT) data structure that is suitable
for the representation of dynamic routing tables. Our MSPT is a balanced binary search
tree constructed from all the most specific prefixes in a routing table. The remaining
prefixes (non-most-specific prefixes) are allocated to the enclosure sets of the most
specific prefix nodes in the MSPT. With the MSPT, the search, insertion, and deletion
operations can be finished in O(log N) time for a real routing table, where N is the number
of prefixes.
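The idea summarized above can be illustrated with a small sketch. It is not the MSPT implementation from Chapter 4: a sorted list searched with `bisect` stands in for the balanced binary search tree, every enclosing prefix is copied into the enclosure set of each most specific prefix it covers (a simplification that a real design would avoid), and the quadratic build and all names are assumptions made for brevity. It relies only on the facts that most specific prefixes are disjoint and can therefore be ordered, and that any other prefix matching an address must enclose the most specific prefix immediately before or after that address.

```python
from bisect import bisect_right
from typing import NamedTuple


class Node(NamedTuple):
    lo: int          # first address covered by the most specific prefix
    hi: int          # last address covered
    next_hop: str
    enclosure: list  # (length, lo, hi, next_hop) of prefixes enclosing this one


def to_range(value, length, addr_len=32):
    """Address range [lo, hi] of a prefix stored left-aligned in addr_len bits."""
    return value, value | ((1 << (addr_len - length)) - 1)


def encloses(a, b):
    """True if range a strictly encloses range b (first two fields are lo, hi)."""
    return a[0] <= b[0] and b[1] <= a[1] and a[:2] != b[:2]


def build(prefixes, addr_len=32):
    """prefixes: dict {(value, length): next_hop}. Returns nodes sorted by lo."""
    items = [(*to_range(v, n, addr_len), n, hop) for (v, n), hop in prefixes.items()]
    # A prefix is most specific when it encloses no other prefix.
    nodes = [Node(lo, hi, hop, [])
             for lo, hi, n, hop in items
             if not any(encloses((lo, hi), other) for other in items)]
    nodes.sort(key=lambda nd: nd.lo)
    # Simplification: copy every enclosing prefix into each node it covers.
    for lo, hi, n, hop in items:
        for nd in nodes:
            if encloses((lo, hi), (nd.lo, nd.hi)):
                nd.enclosure.append((n, lo, hi, hop))
    return nodes


def search(nodes, starts, addr):
    """Longest-prefix match; starts is the precomputed list of node.lo values."""
    i = bisect_right(starts, addr) - 1       # last node with lo <= addr
    if i >= 0 and addr <= nodes[i].hi:
        return nodes[i].next_hop             # a most specific prefix matches
    best = None
    for j in (i, i + 1):                     # predecessor and successor nodes
        if 0 <= j < len(nodes):
            for cand in nodes[j].enclosure:
                if cand[1] <= addr <= cand[2] and (best is None or cand[0] > best[0]):
                    best = cand
    return best[3] if best else None


table = {(0x0A000000, 8): "A", (0x0A010000, 16): "B", (0x0A010200, 24): "C",
         (0x0A800000, 9): "D", (0xC0A80000, 16): "E"}
nodes = build(table)
starts = [nd.lo for nd in nodes]
print(search(nodes, starts, 0x0A010203), search(nodes, starts, 0x0A010304))  # C B
```

In this example 10.1.2.0/24, 10.128.0.0/9, and 192.168.0.0/16 become nodes, while 10.0.0.0/8 and 10.1.0.0/16 appear only in enclosure sets. A sorted list gives logarithmic search but not logarithmic insertion or deletion, which is why a balanced tree is needed for dynamic tables.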
Compared with other schemes that are suitable for dynamic routing tables and with
several precomputation-based schemes, our experiments show that MSPT is to be preferred
over PBOB, MRT, and BTRIE for the representation of dynamic routing tables, and that its
lookup speed and memory requirement are close to those of the
precomputation-based schemes. Moreover, our scheme also scales well to IPv6 and
large routing tables.
A balanced binary search tree may need more memory accesses to complete
search and update operations when the number of prefixes in the routing table becomes large.
In the future, we plan to use a balanced m-way search tree to construct our MSPT. We
expect the lower height of a balanced m-way search tree to reduce the number of
memory accesses and thus give better performance.
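As a rough, back-of-the-envelope illustration of the expected gain, the helper below computes the minimum height of a full m-way search tree holding a given number of keys, using the standard capacity of m^h - 1 keys for height h. The function name and the choice m = 8 are assumptions for illustration only, and the estimate ignores the extra comparisons performed inside each wider node.

```python
def min_height(n_keys: int, m: int = 2) -> int:
    """Smallest height h such that a full m-way search tree holds n_keys.

    A full m-way tree of height h stores m**h - 1 keys (m - 1 keys per node).
    """
    h, capacity = 0, 0
    while capacity < n_keys:
        h += 1
        capacity = m ** h - 1
    return h


# 22412 is the MSPT node count reported for GV6table-2 in Table 5.8.
print(min_height(22412, 2))  # 15 levels for a binary tree
print(min_height(22412, 8))  # 5 levels for an 8-way tree (m = 8 is an assumption)
```

If each node fits in one memory access, the number of accesses along a root-to-leaf path would drop from about 15 to about 5 in this example.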
References
[1] BGP Routing Table Analysis Reports, http://bgp.potaroo.net/.
[2] A. Brodnik, S. Carlsson, M. Degermark, S. Pink, "Small Forwarding Tables for Fast
Routing Lookups," ACM SIGCOMM, pp. 3-14, Sept. 1997.
[3] Y. K. Chang, "Fast Binary and Multiway Prefix Searches for Packet Forwarding,"
Submitted for publication.
[4] S. Deering and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification,"
RFC 2460.
[5] A. Durand and C. Huitema, "The H-Density Ratio for Address Assignment
Efficiency An Update on the H ratio," RFC 3194.
[6] V. Fuller, T. Li, J. Yu and K. Varadhan, "Classless inter-domain routing (CIDR): an
address assignment and aggregation strategy," RFC 1519, Sept. 1993.
[7] E. Horowitz, S. Sahni, and D. Mehta, Fundamentals of Data Structures in C++. New
York: W.H. Freeman, 1995.
[8] N. F. Huang, S. M. Zhao, J. Y. Pan, and C. A. Su, "A Fast IP Routing Lookup Scheme
for Gigabit Switching Routers," in Proc. INFOCOM, pp. 1429-1436, Mar. 1999.
[9] IPv6 Forum, http://www.ipv6forum.com.
[10] K. Kim, S. Sahni, "An O(log n) Dynamic Router-Table Design," IEEE Transactions on
Computers, pp. 351-363, Mar. 2004.
[11] B. Lampson, V. Srinivasan and G. Varghese, "IP Lookups Using Multiway and
Multicolumn Search," IEEE/ACM Transactions on Networking, Vol. 3, No. 3, pp.
324-334, Jun. 1999.
[12] H. Lu, S. Sahni, "Enhanced Interval Tree for Dynamic IP Router-Tables," IEEE
Transactions on Computers, pp. 1615-1628, Dec. 2004.
[13] X. Meng, Z. Xu, B. Zhang, G. Huston, S. Lu, L. Zhang, "IPv4 Address Allocation
and the BGP Routing Table Evolution," ACM SIGCOMM, pp. 71-80, Jan. 2005.
[14] D. Meyer, "University of Oregon Route Views Archive Project,"
http://archive.routeviews.org/.
[15] S. Nilsson and G. Karlsson, "IP-Address Lookup Using LC-trie," IEEE Journal on
Selected Areas in Communications, 17(6):1083-1092, June 1999.
[16] K. Sklower, "A Tree-based Packet Routing Table for Berkeley Unix," Proc. Winter
Usenix Conf, pp. 93-99, 1991.
[17] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and Taxonomy
of IP Address Lookup Algorithms," IEEE Network Magazine, 15(2):8-23,
March/April 2001.
[18] M. Waldvogel, G. Varghese, J. Turner and B. Plattner, "Scalable High-Speed IP
Routing Lookups," ACM SIGCOMM, pp. 25-36, Sept. 1997.
[19] P. C. Wang, C. T. Chan and Y. C. Chen, "An Efficient IP Routing Lookup by Using
Routing Interval," Journal of Communication and Networks, pp. 374-382, Mar. 2001.
[20] M. Wang, S. Deering, T. Hain, L. Dunn, "Non-random Generator for IPv6 Tables," in
Proc. 12th Annual IEEE Symposium of High Performance Interconnects, pp. 35-40,
Aug. 2004.
[21] P. Warkhede, S. Suri, G. Varghese, "Multiway Range Trees: Scalable IP Lookup with
Fast Updates," The International Journal of Computer and Telecommunications
Networking, pp. 289-303, Feb. 2004.