
[IEEE 2006 31st IEEE Conference on Local Computer Networks - Embassy Suites Hotel, Tampa, FL, USA (2006.11.14-2006.11.16)] Proceedings. 2006 31st IEEE Conference on Local Computer



The Design of Efficient Hashing Techniques for IP Address Lookup

Devang Pandya, Chris Martinez, Wei-Ming Lin, Parimal Patel

Department of Electrical and Computer Engineering
The University of Texas at San Antonio
San Antonio, TX 78249-0669, USA

[email protected]

Abstract

Hash results delivered by traditional hashing algorithms are usually far from optimal when the database presented is not uniformly distributed. This paper proposes a unique hashing algorithm to tackle such non-uniformly distributed databases, which are prevalent in computer network applications. The original database is first pre-processed to extract information that facilitates the design of an ad hoc hashing algorithm.

1. Hashing Algorithm

The proposed algorithm is based on the same feature-extracting technique used in the ad hoc algorithm presented in [6]. Fundamentals of this technique are briefly described here for the sake of completeness. In [6], a sorting process is first performed on the database according to the bit value distribution. The database is defined as consisting of $M = 2^m$ entries, each $n$ bits long. To render the best (uniform) distribution in the final hashed data set, all the bits in the final hashing function $H$ should demonstrate a distribution as probabilistically random as possible, i.e., evenly distributed between 0’s and 1’s. An optimal $H$ will have each of its bits demonstrate an even distribution of 0’s and 1’s, thus leading to the highest probability of reaching the best hashing. For bit position $i$, $d_i$ is defined as the absolute difference between the number of 0’s and 1’s in that bit vector across the data set. Once the $d$ value is found for each bit vector, the bit vectors are sorted into non-decreasing order of $d$. Comparisons between hash techniques are based on three performance measurements (indicators): (1) Average Collision Ratio (ACR), (2) Average Search Length (ASL), and (3) Maximal Search Length (MSL).
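The d-value computation and sort described above can be sketched in Python as follows. This is only an illustration, not the authors' code: the function names `d_values` and `sorted_bit_positions`, the representation of keys as integers, and the convention that bit 0 is the least significant bit are all assumptions made for the example.

```python
from typing import List


def d_values(keys: List[int], n: int) -> List[int]:
    """Return d_i = |#zeros - #ones| for each of the n bit positions.

    Assumption: bit position 0 is the least significant bit of each key.
    """
    total = len(keys)
    result = []
    for i in range(n):
        ones = sum((key >> i) & 1 for key in keys)
        zeros = total - ones
        result.append(abs(zeros - ones))
    return result


def sorted_bit_positions(keys: List[int], n: int) -> List[int]:
    """Return the bit positions sorted into non-decreasing order of d."""
    d = d_values(keys, n)
    # sorted() is stable, so ties keep their original relative order.
    return sorted(range(n), key=lambda i: d[i])


# Toy data set of four 4-bit keys (hypothetical values).
toy_keys = [0b0000, 0b0001, 0b0011, 0b0111]
toy_d = d_values(toy_keys, 4)                   # [2, 0, 2, 4]
toy_order = sorted_bit_positions(toy_keys, 4)   # [1, 0, 2, 3]
```

In the toy data set, bit 1 is perfectly balanced (two 0's and two 1's), so it sorts first, while bit 3 is always 0 and sorts last.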

XOR-folding is a commonly used hashing technique that folds the $n$-bit key into an $m$-bit hash result by XORing every $n/m$ key bits into a final hash bit. Two obvious ways presented in [6] to exploit the d-value sorted sequence are to perform XOR hashing in the following orders. In-Order XOR (d-IOX) provides a straightforward XOR hashing sequence, as normal hashing would do. Snake-Order XOR (d-SOX) is supposed to deliver a better (more balanced) combination among all hash bits at the end. Note that “Bit Position” refers to the indices after the sorting process is applied with respect to the corresponding d values. Analysis in [6] indicates in general that bits with smaller d values should be group-XORed with bits with larger d values to provide a more balanced resulting d-value distribution, i.e., the final hash bits are more uniformly distributed. While d-SOX seems to provide in general a more “balanced” XOR combination among the sorted bits than d-IOX, the specific grouping among the bits for XORing is still far from optimal, depending on the relation between $n$ and $m$. For example, when $n = 3m$, as shown in Figure 1 for d-XOR, segment a and segment c should be reversely grouped to achieve the best balancing; instead they are matched in order. This leads to the potentially beneficial conversion process demonstrated in Figure 1.
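To make the two orderings concrete, the sketch below groups a d-sorted list of bit positions into $m$ XOR groups in two ways. The exact layouts used in [6] are not reproduced in this paper, so the interpretation here is an assumption: in-order assigns the bit at sorted position p to hash bit p mod m, and snake-order reverses direction on every other segment of m bits. The names `in_order_groups`, `snake_order_groups`, and `xor_fold` are hypothetical.

```python
from typing import List


def in_order_groups(sorted_bits: List[int], m: int) -> List[List[int]]:
    """Assumed d-IOX layout: sorted position p feeds hash bit p mod m."""
    groups: List[List[int]] = [[] for _ in range(m)]
    for pos, bit in enumerate(sorted_bits):
        groups[pos % m].append(bit)
    return groups


def snake_order_groups(sorted_bits: List[int], m: int) -> List[List[int]]:
    """Assumed d-SOX layout: every other segment of m bits runs backwards."""
    groups: List[List[int]] = [[] for _ in range(m)]
    for pos, bit in enumerate(sorted_bits):
        segment, offset = divmod(pos, m)
        target = offset if segment % 2 == 0 else m - 1 - offset
        groups[target].append(bit)
    return groups


def xor_fold(key: int, groups: List[List[int]]) -> int:
    """XOR the key bits of each group into one hash bit (group j -> bit j)."""
    h = 0
    for j, group in enumerate(groups):
        bit = 0
        for i in group:
            bit ^= (key >> i) & 1
        h |= bit << j
    return h


# Six already-sorted bit positions folded into m = 2 hash bits.
order = [0, 1, 2, 3, 4, 5]
iox = in_order_groups(order, 2)     # [[0, 2, 4], [1, 3, 5]]
sox = snake_order_groups(order, 2)  # [[0, 3, 4], [1, 2, 5]]
```

With three segments, the snake layout matches the first and third segments in the same direction, which is the in-order matching of segments a and c that the text identifies as a weakness when $n = 3m$.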

[Figure 1: the sorted bit sequence divided into segments a, b (sub-segments b1 and b2), and c, arranged for d-SOX, d-NFX, and d-NFD]

Figure 1. Converting from d-XOR to the Natural Folding Technique

The proposed technique is called “Natural-Fold XOR” (d-NFX). Instead of snake-ordering from the beginning bits as in d-SOX, d-NFX folds the sorted bit sequence from both ends, matching pairs of bits accordingly. Thus segments a and c are paired in a “natural folding” order. In this case of $n = 3m$, the middle segment is folded in half, as shown in the figure. In testing, this approach did not significantly improve performance, owing to the loss of uniformity in the number of bits XORed to produce each final hash bit: in this example, half of the hash bits come from XORing 2 bits each and the other half from XORing 4 bits each. To remedy this problem, we duplicate the middle sub-segments, $b_1$ and $b_2$, to patch up the missing portion for uniformity, which leads to the final proposed technique, “Natural-Fold with Duplication XOR” (d-NFD). This technique may lead to over-duplication or under-duplication of the center sub-segments; a simple method is adopted that truncates the overshot bits. Note that, due to the potential duplication, the number of actual bits to be XORed, denoted $n'$, is given by

$$n' = \left\lceil \frac{n}{2m} \right\rceil \times 2m$$

When $n \bmod 2m = 0$, $n' = n$; that is, the proposed d-NFD is identical to d-SOX when $n$ is an integral multiple of $2m$.

1-4244-0419-3/06/$20.00 ©2006 IEEE
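The following sketch illustrates natural folding and the duplication step on the $n = 3m$ example. It reads the formula for $n'$ as a ceiling, which is consistent with the $n = 3m$ case where duplication yields $4m$ XORed bits. The pairing scheme and the choice to fill missing slots by re-using the innermost (middle) pairs are assumptions made for illustration; the paper's handling of over-duplication and truncation is not modeled. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def n_prime(n: int, m: int) -> int:
    """Number of bits XORed after duplication: ceil(n / 2m) * 2m."""
    return math.ceil(n / (2 * m)) * 2 * m


def fold_pairs(sorted_bits: List[int]) -> List[Tuple[int, int]]:
    """Fold the sorted sequence from both ends: first with last, and so on."""
    n = len(sorted_bits)
    return [(sorted_bits[k], sorted_bits[n - 1 - k]) for k in range(n // 2)]


def nfx_groups(sorted_bits: List[int], m: int) -> List[List[int]]:
    """Assumed d-NFX layout: pair p feeds hash bit p mod m, no duplication."""
    groups: List[List[int]] = [[] for _ in range(m)]
    for p, pair in enumerate(fold_pairs(sorted_bits)):
        groups[p % m].extend(pair)
    return groups


def nfd_groups(sorted_bits: List[int], m: int) -> List[List[int]]:
    """Assumed d-NFD layout: pad with copies of the innermost pairs."""
    n = len(sorted_bits)
    if n % 2 != 0 or n < m:
        raise ValueError("this sketch assumes an even n with n >= m")
    pairs = fold_pairs(sorted_bits)
    missing = n_prime(n, m) // 2 - len(pairs)
    # Assumption: the middle pairs are the ones duplicated.
    padded = pairs + pairs[len(pairs) - missing:]
    groups: List[List[int]] = [[] for _ in range(m)]
    for p, pair in enumerate(padded):
        groups[p % m].extend(pair)
    return groups


# n = 12 sorted bit positions, m = 4 hash bits, so n = 3m.
order = list(range(12))
nfx_sizes = [len(g) for g in nfx_groups(order, 4)]  # [4, 4, 2, 2]
nfd_sizes = [len(g) for g in nfd_groups(order, 4)]  # [4, 4, 4, 4]
```

Without duplication, half of the hash bits XOR 4 inputs and half XOR 2, matching the imbalance described in the text; after padding, every hash bit XORs 4 inputs and `n_prime(12, 4)` is 16.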

2. Simulation Results and Implementation

Data randomly generated to reflect IP addresses are used for simulation. Figure 2 gives the performance comparison among the techniques. All simulation results clearly show that the improvement from the proposed d-NFD XORing technique over all other techniques becomes more significant as $m$ increases. The reason is that, when $n$ is much larger than $m$, the order of XORing and/or the specific bits to XOR do not matter as much, since XORing more bits introduces more randomness, which dilutes the effect of the d value on performance. By adopting a more natural folding process than d-SOX, the newly proposed technique posts an additional gain of 7% in ACR, 5% in ASL, and 11% in MSL. Part of the performance gain should be attributed to the duplication process.

[Figure 2: three plots against m = 6 to 14, each comparing XOR, d-IOX, d-SOX, and d-NFD: Average Collision Ratio (ACR), Average Search Length (ASL), and Maximum Search Length (MSL)]

Figure 2. Performance Comparison

In order to implement the desired hashing function in hardware to allow fast search and comparison, we propose a simple design that provides the needed mapping flexibility from the original $n$-bit input to the final $m$-bit hash value. Figure 3(a) shows a block diagram of the design, in which each bit of the input is connected to a 1-to-$m$ DMUX (de-multiplexer) selected by a $\log_2 m$-bit RSV (Reprogrammable Select Vector).

[Figure 3: (a) each of the n input bits (n−1 down to 0) drives a 1-to-m DMUX selected by RSV_i, the Reprogrammable Select Vector for bit i, producing the m-bit output; (b) a variant built from n-to-1 MUXes, each selected by its own RSV]

Figure 3. Hashing Circuit: (a) General Design (b) Design for d-NFD

Each of the $m$ output signals of a DMUX is then sent to the corresponding XOR circuit for each bit of the $m$-bit hash value. The value of each RSV is downloaded from the system that determines the sorted sequence and the hashing approach (e.g., d-IOX, d-SOX, etc.). The above design, however, cannot be used to implement the d-NFD technique because of the potential extra duplication. A modification that incorporates such an expansion is shown in Figure 3(b).
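A software model may help clarify the two circuits. In the sketch below, `dmux_hash` mimics the design of Figure 3(a): the RSV of input bit i names the single XOR gate that receives that bit. `mux_hash` mimics the idea behind Figure 3(b): each XOR input has its own selector choosing which input bit to read, so one input bit can be used more than once. The function names, the list encoding of the RSVs, and the wiring of selector s to XOR gate s mod m are assumptions for illustration, not details taken from the figure.

```python
from typing import List


def dmux_hash(key: int, rsv: List[int]) -> int:
    """Model of Figure 3(a): rsv[i] is the hash bit that input bit i feeds.

    Each input bit reaches exactly one XOR gate, so duplication is impossible.
    """
    h = 0
    for i, target in enumerate(rsv):
        h ^= ((key >> i) & 1) << target
    return h


def mux_hash(key: int, selects: List[int], m: int) -> int:
    """Model of Figure 3(b): selects[s] is the input bit read by MUX s.

    Assumed wiring: MUX s feeds XOR gate s mod m. An input bit may appear
    in several selectors, which is what permits duplication.
    """
    h = 0
    for s, source in enumerate(selects):
        h ^= ((key >> source) & 1) << (s % m)
    return h


# 4-bit key, m = 2 hash bits (hypothetical select vectors).
key = 0b0101
plain = dmux_hash(key, [0, 1, 1, 0])           # bits 0,3 -> gate 0; bits 1,2 -> gate 1
dup = mux_hash(key, [0, 1, 2, 3, 1, 2], 2)     # bits 1 and 2 are each read twice
```

The first model needs only a $\log_2 m$-bit selector per input bit, while the second needs a selector wide enough to address all $n$ input bits for each of the $n'$ XOR inputs, which is the cost of supporting duplication.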

References

[1] S. Chung, J. Sungkee, H. Yoon and J. Cho, “A Fast and Updatable IP Address Lookup Scheme”, International Conference on Computer Networks and Mobile Computing, 2001.

[2] R. Jain, “A Comparison of Hashing Schemes for Address Lookup in Computer Networks”, IEEE Transactions on Communications, Vol. 40, No. 10, Oct. 1992.

[3] C. Martinez, W.-M. Lin and P. Patel, “Optimal XOR Hashing for A Linearly Distributed Address Lookup in Computer Networks”, Symposium on Architectures for Networking and Communications Systems, Oct. 2005, Princeton, New Jersey.

[4] A. Moestedt and P. Sjodin, “IP Address Lookup in Hardware for High-speed Routing”, Proc. IEEE Hot Interconnects 6 Symposium, Stanford, California, pp. 31-39, August 1998.

[5] X. Nie, D.J. Wilson, J. Cornet, G. Damm and Yiqiang Zhao, “IP Address Lookup Using A Dynamic Hash Function”, IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1646-1651, May 1-4, 2005.

[6] D. Pandya, C. Martinez, W.-M. Lin and P. Patel, “Advanced Hashing Techniques for Non-Uniformly Distributed IP Address Lookup”, Third IASTED International Conference on Communications and Computer Networks (CCN2006), October 2006, Lima, Peru.

[7] D. Pao, C. Liu, L. Yeung and K.S. Chan, “Efficient Hardware Architecture for Fast IP Address Lookup”, IEEE INFOCOM, 2002.
