

A Survey of String Matching Approaches in Hardware

Sevin Fide and Stephen Jenks

Department of Electrical Engineering and Computer Science
University of California, Irvine
Irvine, CA 92697

{sfide, sjenks}@uci.edu

Abstract. The string matching problem can be divided into two major categories, known as exact string matching and approximate string matching. Today, exact string matching is used in several applications such as network intrusion detection and IP address lookup in routers. This report focuses on these application areas and presents a survey of several exact string matching approaches in hardware.

I. STRING MATCHING APPROACHES IN HARDWARE

The string matching problem can be divided into twomajor categories, known as exact string matching and ap- proximate string matching . As we are interested in exactstring matching approaches, this report does not cover ap-proximate string matching approaches (e.g. nding spellingerrors in a document). Exact string matching approaches canbe further divided into software-based solutions and hardware-based solutions . Since software-based solutions are slower andless efcient, hardware-based solutions are highly preferred.Therefore, this report covers only hardware-based solutionsin exact string matching. This section continues with a brief

discussion of early exact string matching algorithms.One of the earliest exact string matching algorithms is Aho-

Corasick algorithm [1]. The algorithm locates all occurrencesof any keywords in a text string. It works by constructinga nite state pattern matching machine from the keywords,and then using the pattern matching machine to process thetext string in a single pass. The state machine starts with anempty root node. Each pattern to be matched adds states tothe machine, starting at the root and going to the end of thepattern. The state machine is then traversed and failure pointersare added to indicate any disconnection between two states.Figure 1 illustrates the state machine for the set of keywords{he, she, his, hers }1 . The time complexity of Aho-Corasick algorithm is linear in the size of the input.
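The construction and single-pass search described above can be sketched in Python as follows, using the keyword set of Figure 1. The dictionary-based goto function and the names `build_automaton` and `search` are illustrative choices for this sketch, not details taken from [1].

```python
from collections import deque

def build_automaton(keywords):
    """Build the goto, failure and output functions of an Aho-Corasick machine."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the root
    fail = [0]         # failure pointer of each state
    output = [set()]   # keywords recognised on entering each state
    for word in keywords:                  # each keyword adds a path from the root
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)
    # Breadth-first traversal: a state's failure pointer is the longest proper
    # suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]   # inherit matches ending here too
    return goto, fail, output

def search(text, keywords):
    """Return sorted (start index, keyword) pairs found in one pass over text."""
    goto, fail, output = build_automaton(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]               # follow failure pointers on mismatch
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)
```

For example, `search("ushers", ["he", "she", "his", "hers"])` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`: after reading "she", the failure pointer to the "he" state lets the machine continue to "hers" without re-reading any input.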

¹ Figure 1 is courtesy of [1].

Fig. 1. Aho-Corasick Algorithm

The Boyer-Moore algorithm [4] is also one of the early algorithms and is among the most widely used algorithms for string matching. It is based on two heuristics: the bad-character heuristic and the good suffix heuristic. The novel aspect of the Boyer-Moore algorithm, and the reason for its effectiveness, is that character matching is performed right-to-left. The bad-character heuristic shifts the search string to align the mismatching text character with the rightmost position at which that character appears in the search string. If the mismatch occurs in the middle of the search string, then the characters to the right of the mismatch form a suffix that has already matched. The good suffix heuristic shifts the search string to align this suffix with its rightmost other occurrence in the search string. For example, given a single pattern of length n to match, we look ahead in the input string by n characters. If the character at this point does not appear anywhere in the pattern, no alignment covering that character can match, so we move the search window n characters ahead, past the inspected character, without inspecting the characters in between. If there is a match, we start comparing the previous characters. The Boyer-Moore algorithm shows sublinear performance in the average case, because many text characters are never inspected.
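The right-to-left comparison and the bad-character skip can be illustrated with the Python sketch below. It implements the Boyer-Moore-Horspool simplification, which keeps only a bad-character table keyed on the last character of the current window and omits the good suffix heuristic, so it is not the full algorithm of [4]; the function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return all start indices of pattern in text (Boyer-Moore-Horspool)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad-character table: distance from the last occurrence of each character
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:   # compare right-to-left
            j -= 1
        if j < 0:
            hits.append(pos)
        # A character absent from the pattern lets the window jump m positions.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For instance, `horspool_search("here is a simple example", "example")` returns `[17]`; whenever the last character of the window does not occur in the pattern, the window advances by the full pattern length.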

Rafiq et al. simplified the Boyer-Moore algorithm, reduced its memory requirements, and made it faster [15]. In the original Boyer-Moore algorithm, the mismatched character of the text is searched for in the pattern. In their proposed algorithm, the mismatched character of the pattern is searched for in the text. In addition, they made optimizations in calculating jumps between characters and strings. The proposed algorithm's time complexity is O(n/m) in the best case, O(n) in the average case, and O(nm) in the worst case, where n is the length of the text and m is the length of the pattern.

Today, exact string matching is used in several applications such as network intrusion detection and IP address lookup in routers. To give a better understanding of how string matching is done in hardware, the approaches proposed for these applications are discussed in Section III and Section IV, respectively. First, however, the following section discusses Content-Addressable Memory, which is used in straightforward implementations of string matching.

II. CONTENT-ADDRESSABLE MEMORY

Traditional computers rely upon a memory design that stores and retrieves data by its address rather than its contents [5]. This retrieval-by-address approach has been very successful due to its simplicity. By the late 1970s, the separation between the CPU and memory had led to what is known as the von Neumann bottleneck, where the memory-access path becomes the limiting factor for system performance. Today, the cache hierarchy between the CPU and main memory addresses some of the performance issues of the bottleneck.

In content-addressable memory (CAM), also known as associative memory, information is stored, retrieved, or modified based on the data itself rather than its arbitrary storage location. The basic functions of an associative memory include broadcasting a search key, comparing it with every stored location simultaneously, and identifying the matching words. However, it is difficult to decide where in the memory to write information in such memory architectures. Since data are not addressable by location, methods of identifying currently available memory areas must be employed. For example, free memory areas can be identified by a separate tag bit or by their contents.
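The following Python class is a behavioral sketch of these basic CAM functions, not a hardware design. It broadcasts a key, compares it with every valid stored word, reports all matching addresses, and uses a per-entry tag bit to find free locations. The class name `SimpleCAM` and the first-free-slot write policy are illustrative assumptions.

```python
class SimpleCAM:
    """Software model of a binary CAM: a search compares the key against every
    stored word (simultaneously, in hardware) and reports all matching addresses."""

    def __init__(self, size):
        self.words = [0] * size
        self.valid = [False] * size   # tag bit: True marks an occupied entry

    def write(self, word):
        # Data are not placed by address; the first free entry is used instead.
        for addr, used in enumerate(self.valid):
            if not used:
                self.words[addr], self.valid[addr] = word, True
                return addr
        raise MemoryError("CAM is full")

    def search(self, key):
        # Broadcast the key and compare it with every valid stored word.
        return [addr for addr, (w, v) in enumerate(zip(self.words, self.valid))
                if v and w == key]
```

For example, after writing `0xC0A8`, `0x0A00` and `0xC0A8` into a four-entry `SimpleCAM`, `search(0xC0A8)` returns `[0, 2]`: every matching word is identified, not just the first.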

CAMs are useful in a wide range of applications, such as data processing, database applications, file maintenance, pattern recognition, symbolic representation of information, data retrieval, speech recognition, spell checking, list and string processing, language translation, and networking applications including routing lookups, packet classification, and address translation [5]. Network routing systems and associative distributed memory systems (e.g., neural networks) are architectures particularly suitable for CAM use. In addition, the translation lookaside buffer (TLB) is a small associative memory that maps virtual addresses to real addresses. It is often organized as a number of groups of elements, each consisting of a virtual address and a real address.

The cost of a CAM depends on the storage elements, the cell interconnection, and the amount of logic associated with each element. A CAM is therefore more expensive than a random access memory (RAM), because each cell must have storage capability as well as logic circuits for matching its content with an external argument. For this reason, CAMs are used only in applications where the search time is critical and must be very short. On the other hand, programming a CAM is no more difficult than programming a conventional memory; in fact, it is conceptually more straightforward for certain types of applications, such as database applications.

Furthermore, there are three storage and retrieval architecture types for CAMs: bit-serial, byte-serial, and word-serial. A bit-serial CAM inspects the data one bit at a time, examining the same bit position in every word in the memory simultaneously. Byte-serial CAMs inspect a byte with each memory access, comparing the same byte position in every memory word simultaneously. Word-serial systems read one whole word at a time, so each associative access compares the key with all the words in the memory one after another.
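The bit-serial organization can be modeled in Python as shown below, as a sketch only: one match flag per stored word is refined one bit position per step, and in hardware every word would be examined in the same step. The function name and the most-significant-bit-first order are assumptions.

```python
def bit_serial_search(words, key, width):
    """Bit-serial associative search over `width`-bit words: one bit position
    is examined per step, in all stored words at once."""
    match = [True] * len(words)           # one match flag per stored word
    for bit in reversed(range(width)):    # most significant bit first
        key_bit = (key >> bit) & 1
        # In hardware this comparison happens for every word in the same step.
        match = [m and ((w >> bit) & 1) == key_bit for m, w in zip(match, words)]
    return [addr for addr, m in enumerate(match) if m]
```

The loop runs `width` steps regardless of how many words are stored; a word-serial organization would instead need one step per stored word, with all bits of that word compared in parallel.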

III. NETWORK INTRUSION DETECTION SYSTEMS

Traditionally, networks have been protected using firewalls that monitor and filter network traffic. Firewalls usually examine packet headers to determine whether to block or allow packets. Due to busy network traffic and smart attacking schemes (e.g., Code Red), firewalls are not as effective as they used to be. Signature-based network intrusion detection systems (NIDS) use patterns of well-known attacks to match and identify intrusions. They monitor incoming traffic for suspicious packet contents by inspecting the packet payload for attack signatures. Therefore, they have to perform string matching on incoming packet contents and often rely on exact string matching approaches.

String matching is the most computationally expensive task in intrusion detection systems, and it must be performed at wire speed so that it does not become a bottleneck to the system's performance. It is difficult for software-based packet monitors to keep up with rapidly increasing networking speeds; therefore, software-based solutions are not as effective as hardware-based solutions. Snort [20] is a freely available and widely used intrusion detection tool. It uses a set of rules that are derived from known attacks or other suspicious behaviors. Rules are added to Snort as new vulnerabilities are discovered. Each rule contains a content string, associated rules for its location, and the type of packet it can appear in. Most of the approaches discussed in this section use Snort rules in their implementations.

Mishina et al. presented a string matching algorithm suitable for vector processors in 1993 [12]. The algorithm was designed for Hitachi's pipelined vector processor, the Integrated Database Processor (IDP). The IDP was first commercially introduced in 1986 as optional hardware for one of Hitachi's general-purpose computers. It was attached to the CPU of a host machine to improve the host machine's performance. The proposed string matching algorithm consists of two parts: cutout and check. In the cutout part, a text string is divided into independently operable substrings so that each substring can be matched with pattern strings in a pipelined manner using vector processors. The cutout part was implemented as an added instruction of the IDP. The idea of the check part is to apply the Aho-Corasick string matching algorithm concurrently to all substrings obtained from the cutout procedure. Experimental results show that the proposed algorithm is 10 times faster than a scalar program using the Aho-Corasick algorithm.
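The cutout/check split can be illustrated with the Python sketch below. How [12] forms the substrings is not described here, so this sketch assumes fixed-length chunks that overlap by one character less than the longest pattern, which guarantees that no occurrence straddling a chunk boundary is lost. A naive scan stands in for the Aho-Corasick check to keep the example short; all function names are hypothetical.

```python
def cutout(text, chunk_len, overlap):
    """Split text into (offset, substring) pairs that can be matched
    independently; consecutive substrings share `overlap` characters."""
    return [(start, text[start:start + chunk_len + overlap])
            for start in range(0, len(text), chunk_len)]

def check(chunk, patterns):
    """Find every pattern in one substring (naive scan standing in for Aho-Corasick)."""
    offset, sub = chunk
    hits = set()
    for p in patterns:
        i = sub.find(p)
        while i != -1:
            hits.add((offset + i, p))     # report positions in the original text
            i = sub.find(p, i + 1)
    return hits

def vector_match(text, patterns, chunk_len=8):
    overlap = max(len(p) for p in patterns) - 1
    results = set()                       # a set removes hits found in two overlapping chunks
    for chunk in cutout(text, chunk_len, overlap):   # independent: could run in parallel
        results |= check(chunk, patterns)
    return sorted(results)
```

With `chunk_len=4`, `vector_match("ushers and his hers", ["he", "she", "his", "hers"], chunk_len=4)` finds the same six occurrences as a single scan of the whole text, including "hers" at position 2, which crosses the first chunk boundary.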

Sidhu et al. proposed an approach based on a nondeterministic finite automaton (NFA) for regular expression matching [19]. A regular expression is a pattern that matches one or more strings of characters. An NFA is a directed graph in which each node is a state and each edge is labeled with a single character or the empty string. In the proposed approach, regular expressions are generated for every string in the rule set, and an NFA that examines the input one byte at a time is implemented. For a regular expression of length n, the NFA is constructed in O(n) time and requires O(n²) area; searching through a text of length m then takes O(n + m) time. Experimental results show that the proposed architecture can achieve 0.75 Gbps on a Virtex 100 device with an operating frequency of 93.5 MHz, and the area required per search pattern is approximately 31 logic cells. It is important to note that finite automata (FAs) are generally complex and hard to implement, have to be rebuilt every time a string is added, and achieve low throughput.
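Sidhu et al. build a general regular-expression NFA in FPGA logic [19]. The Python sketch below covers only the special case of a literal string, using the well-known Shift-And (bit-parallel) simulation: each NFA state is one bit, much as each state would be one flip-flop in hardware, and all states are updated together for each input byte. The function name is hypothetical.

```python
def shift_and_search(text, pattern):
    """Simulate the NFA for a literal pattern: bit i of `state` is set when
    the last i + 1 input characters equal pattern[:i + 1]."""
    m = len(pattern)
    if m == 0:
        return []
    # Character masks: bit i of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)                 # the NFA's final state
    state, hits = 0, []
    for pos, c in enumerate(text):
        # Every active state advances on c; the start state is always active (| 1).
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

Because several partial matches can be active at once, overlapping occurrences are found naturally: `shift_and_search("abababa", "aba")` returns `[0, 2, 4]`.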

Tuck et al. proposed modifications to the well-known Aho-Corasick string matching algorithm to reduce the amount of memory required to store known malicious strings and to improve worst-case timing [22]. They achieve improvements in both areas at the cost of slightly degraded average-case performance. The proposed data storage methods for string matching are bitmap compression and path compression. This compact storage is needed to fit the data structure in on-chip SRAM or in the cache of a commodity processor. Their experiments consider both an ASIC and a programmable router design. The ASIC design is tailored only to string matching, while the programmable design assumes an implementation that can be used for many different types of router applications. Experimental results show that the proposed compression optimizations yield a 50-fold reduction in database size compared with the original Aho-Corasick implementation.
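The idea behind bitmap compression can be sketched in Python as follows: instead of a 256-entry next-state table, a node keeps a 256-bit bitmap of the byte values that have a child, plus a densely packed list of children, and a population count of the bitmap bits below the input byte gives the child's index. The exact node layout in [22] may differ; the class and method names are hypothetical.

```python
class BitmapNode:
    """Trie node stored as a 256-bit bitmap plus a densely packed child list,
    instead of a 256-entry next-state table."""

    def __init__(self, children):
        # children: dict mapping byte value (0-255) -> child state number
        self.bitmap = 0
        for byte in children:
            self.bitmap |= 1 << byte
        # Child state numbers packed in increasing byte order.
        self.children = [children[b] for b in sorted(children)]

    def child(self, byte):
        if not (self.bitmap >> byte) & 1:
            return None        # no transition: the automaton follows its failure pointer
        # Index = number of set bits below `byte` (a population count).
        index = bin(self.bitmap & ((1 << byte) - 1)).count("1")
        return self.children[index]
```

A node with three children thus needs 256 bits plus three pointers rather than 256 pointers, which is the kind of saving that lets the database fit in on-chip SRAM or cache.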

Cho et al. proposed a rule-based inspection firewall system based on a parallel architecture [6]. Figure 2 illustrates the proposed parallel architecture². Each rule is processed separately, in parallel. First, packet data is passed to the units through a 32-bit bus. Then, the header information of each packet is compared with the predefined header data. If there is a match, the payload data is sent to the content pattern match unit, where the predefined pattern is searched for. The content pattern match units contain 8-bit registers and 8-bit comparators. To increase throughput, four bytes of data are matched in each stage of the pipeline. In other words, four parallel comparators are used per string. Experimental results show that the proposed architecture can achieve over 2.88 Gbps on an Altera EP20K with an operating frequency of 90 MHz, and the area required per search pattern is 10 logic cells.
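Matching four bytes per pipeline stage can be modeled in Python as below. This is a behavioral sketch, not Cho et al.'s datapath: each cycle consumes one 4-byte word from the 32-bit bus, and four comparator groups check the four alignments at which the pattern could end inside that word. The function name and the shift-register model are assumptions.

```python
def match_word_per_cycle(payload, pattern, width=4):
    """Cycle-level model: each cycle consumes `width` payload bytes, and `width`
    comparator groups test every alignment at which the pattern can end in them.
    Returns (start positions of matches, number of cycles used)."""
    m = len(pattern)
    history = b""                        # recent bytes (a shift register in hardware)
    hits, cycles = [], 0
    for base in range(0, len(payload), width):
        word = payload[base:base + width]
        history = (history + word)[-(m + width - 1):]  # keep only what comparators need
        cycles += 1
        for k in range(len(word)):       # these comparisons are parallel in hardware
            end = len(history) - len(word) + k + 1     # exclusive end of alignment k
            if end >= m and history[end - m:end] == pattern:
                hits.append(base + k - m + 1)
    return hits, cycles
```

For a 14-byte payload, `match_word_per_cycle(b"xxGET /GETyGET", b"GET")` returns `([2, 7, 11], 4)`: all three occurrences are found in 4 cycles instead of 14, because each cycle handles four alignments at once.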

Fig. 2. Parallel Datapath of NIDS System

Sourdis et al. proposed an FPGA-based approach to the string matching problem in network intrusion detection systems [21]. They chose an FPGA-based approach because it offers hardware speed and allows parallelism to be exploited. Figure 3 illustrates the proposed system³. Packets arrive and are distributed to the matching engines. There are N parallel comparators that can process N characters per cycle. The matching results are encoded to determine the action for packets. The parallel comparators have CAM-like functionality. The proposed architecture uses fine-grain pipelining for all the modules illustrated in Figure 3. Experimental results show that the best achievable throughput is 11 Gbps on a Virtex2 1000 device with an operating frequency of 340 MHz, and the match cost per search pattern character is approximately 17 logic cells.

² Figure 2 is courtesy of [6].
³ Figure 3 is courtesy of [21].

Dharmapurikar et al. proposed a hardware architecture based on parallel Bloom filters for network packet inspection [8]. A Bloom filter is a space-efficient probabilistic data structure that is used to test whether or not an element is a member of a set. It stores a set of signatures compactly by computing multiple hash functions on each member of the set. The answer to a query that checks whether a particular string belongs to the stored set can be a false positive, but never a false negative. A false positive means that membership is reported when in fact the string is not in the set; a false negative would mean that a member of the set is reported as absent. The computation time involved in performing the query is independent of the number of strings in the database, provided the memory used by the data structure scales linearly with the number of strings stored in it. Bloom filters use less memory, are easy to reprogram, and achieve a higher throughput than FA implementations. In the proposed approach, signatures are grouped according to their length, and each Bloom filter scans the incoming data and checks the strings of the corresponding length. Figure 4 illustrates the proposed hardware architecture⁴. If a string is identified as a member of any Bloom filter, the system declares the string a possible matching signature. Such strings are then sent to an analyzer, which determines whether the string is a member of the set or a false positive. The analyzer uses a deterministic string matching algorithm. In the proposed architecture, each Bloom filter gives one query result per clock cycle. Experimental results show that the proposed architecture can achieve a throughput of 2.46 Gbps.

⁴ Figure 4 is courtesy of [8].

Fig. 3. FPGA-Based NIDS System

Fig. 4. NIDS System Using Bloom Filters

By using Bloom filters, an NIDS can be implemented that scans for tens of thousands of strings at gigabit-per-second rates, all within a single FPGA [3]. Attig et al. proposed a Bloom filter architecture for intrusion detection systems, as illustrated in Figure 5⁵. Packets enter the system and are processed by Internet Protocol (IP) wrappers. The data in the packet goes to the input buffer and then flows through the content pipeline. As the packet passes through the pipeline, multiple Bloom engines scan windows of different lengths for signatures of different lengths. Data leaves the content pipeline, flows to the output buffer, streams through the wrappers, and then packets are re-injected into the network. If a Bloom engine detects a match, a hash table is queried to determine whether an exact match occurred. If the queried signature is an exact match, the malicious content can be blocked and an alert message is generated. Experimental results show that the proposed architecture can achieve a throughput of over 2 Gbps on a VirtexE 2000 device.

⁵ Figure 5 is courtesy of [3].
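The Python sketch below combines the two ideas just described: one Bloom filter per signature length, and an exact check (here a hash set, playing the role of the analyzer or hash table) for every positive, so that false positives never reach the output. The hash construction (salted SHA-256), the filter size, and all names are arbitrary assumptions of this sketch, not the parameters used in [8] or [3].

```python
import hashlib

class BloomFilter:
    """Bit array of m_bits bits queried with k hash functions."""

    def __init__(self, m_bits, k):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, item):
        # k bit positions from salted SHA-256 digests (an arbitrary choice for
        # this sketch; hardware designs use cheap hash circuits instead).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(item))


def scan(payload, signatures, m_bits=1024, k=3):
    """One Bloom filter per signature length; each positive window is then
    verified exactly against a hash set before being reported."""
    exact = set(signatures)
    filters = {}
    for sig in signatures:
        filters.setdefault(len(sig), BloomFilter(m_bits, k)).add(sig)
    hits = []
    for start in range(len(payload)):
        for length, bf in filters.items():     # all filters run in parallel in hardware
            window = payload[start:start + length]
            if len(window) == length and bf.might_contain(window) and window in exact:
                hits.append((start, window))
    return hits
```

For example, `scan(b"GET /cmd.exe HTTP", [b"cmd.exe", b"/bin/sh", b"GET"])` reports `(0, b"GET")` and `(5, b"cmd.exe")`. In hardware, the expensive exact check runs only for the rare windows that pass a Bloom filter.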

Fig. 5. Bloom Filter Architecture

Rafiq et al. presented a systolic array architecture for string searching/matching by using a formal procedural approach [16]. Their approach consists of drawing a dependency graph for a simple string searching/matching algorithm and then forming a systolic array architecture from the dependency graph. Figure 6 illustrates the systolic array architecture when the pattern (P) is pipelined and the text (T) is broadcast⁶. The size of P is m and the size of T is n. Therefore, there are m processing units in this approach. Each processing unit holds a fixed character of P to compare with different characters from T. The time complexity is O(n) and the area complexity is O(m).

Figure 7 illustrates another systolic array architecture, in which P is broadcast and T is pipelined⁷. There are n − m + 1 processing units in this approach. A subset of T, T_k, is pipelined to the k-th processor, where 0 ≤ k ≤ n − m. The match/mismatch result for T_k is then produced in the k-th processing unit after m time units. The time complexity is O(m) and the area complexity is O(n − m + 1). If n ≫ m, then the area-time complexity is O(nm).
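The second architecture can be simulated cycle by cycle in Python as shown below. The sketch assumes that T_k is the window T[k .. k + m − 1], so that at time step t processing unit k compares the broadcast character P[t] with T[k + t]; that dataflow is an interpretation of the description above, and the function name is hypothetical.

```python
def broadcast_pattern_array(text, pattern):
    """Simulate n - m + 1 processing units (PUs): at time step t, P[t] is
    broadcast to every PU, and PU k compares it with T[k + t] from its window T_k."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    match = [True] * (n - m + 1)       # one match register per PU
    for t in range(m):                  # m time units in total: O(m) time
        p = pattern[t]                  # broadcast to all PUs
        for k in range(n - m + 1):      # all PUs work in the same time unit
            match[k] = match[k] and text[k + t] == p
    return [k for k, ok in enumerate(match) if ok]
```

For example, `broadcast_pattern_array("abababa", "aba")` returns `[0, 2, 4]` after m = 3 time units using 5 processing units, which illustrates the O(m) time and O(n − m + 1) area trade-off.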

Fig. 6. Systolic Array Architecture (P is pipelined, T is broadcast)

Fig. 7. Systolic Array Architecture (P is broadcast, T is pipelined)

⁶ Figure 6 is courtesy of [16].
⁷ Figure 7 is courtesy of [16].

Aldwairi et al. suggested a reconfigurable memory-based accelerator for intrusion detection [2]. The accelerator is part of the configurable network processor architecture shown in Figure 8⁸. This architecture consists of a 2-wide multiple-issue VLIW processor with hardware support for eight hyper-threads. The memory system consists of multi-port RAM and a high-speed DMA. A number of configurable accelerators are used to speed up specific networking tasks such as IP forwarding, quality of service, and string matching for intrusion detection. The proposed system consists of two components: software that runs on the VLIW core and a hardware string matching accelerator for the FSM implementation. The software generates an FSM and creates the state tables, and the FSM matches strings based on the Aho-Corasick string matching algorithm. They use a different method to store the Aho-Corasick database in SRAM that achieves a higher throughput than Tuck et al.'s implementation while having a similar memory requirement. The hardware is shown in Figure 9⁹. It implements a Mealy FSM and consists of a RAM to store the state tables, a register to hold the current state, and control logic to access the RAM and find a match. Experimental results show that the proposed architecture can achieve up to 14 Gbps throughput using 8 FSMs in parallel. The experiments were performed on an Altera EP20K400E device. The throughput degrades as the total number of pattern characters grows, because more characters produce more states and, in turn, larger state tables.

⁸ Figure 8 is courtesy of [2].

Fig. 8. Configurable Network Processor Architecture

Fig. 9. String Matching Accelerator
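The state-table format of [2] is not given here, so the Python sketch below assumes the simplest layout: the Aho-Corasick machine is flattened into a table with one row per state and one column per byte value, with failure transitions folded in. Each input byte then costs exactly one table read, and the "current state" variable plays the role of the state register. Function names are hypothetical.

```python
from collections import deque

def build_state_table(patterns):
    """Flatten an Aho-Corasick machine for byte-string patterns into
    table[state][byte] -> next state, plus out[state] = matches ending there."""
    trie, out = [{}], [0]
    for p in patterns:
        s = 0
        for b in p:
            if b not in trie[s]:
                trie.append({})
                out.append(0)
                trie[s][b] = len(trie) - 1
            s = trie[s][b]
        out[s] += 1
    table = [[0] * 256 for _ in trie]
    fail = [0] * len(trie)
    queue = deque()
    for b, child in trie[0].items():    # root row: missing bytes stay at the root
        table[0][b] = child
        queue.append(child)
    while queue:                         # breadth-first: rows of shallower states are complete
        s = queue.popleft()
        out[s] += out[fail[s]]           # include patterns ending at the failure state
        for b in range(256):
            if b in trie[s]:
                child = trie[s][b]
                fail[child] = table[fail[s]][b]
                table[s][b] = child
                queue.append(child)
            else:
                table[s][b] = table[fail[s]][b]   # failure transition folded into the table
    return table, out

def run_fsm(table, out, data):
    """Mealy-style scan: one table read per byte; the match output depends on the
    transition taken. Returns (position, number of patterns ending there)."""
    state, hits = 0, []
    for pos, b in enumerate(data):
        state = table[state][b]          # the state register is updated from RAM
        if out[state]:
            hits.append((pos, out[state]))
    return hits
```

With the keywords of Figure 1, `run_fsm(*build_state_table([b"he", b"she", b"his", b"hers"]), b"ushers")` returns `[(3, 2), (5, 1)]`. The table has 256 entries per state, which shows why throughput suffers as the rule set, and hence the number of states, grows.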

IV. IP ADDRESS LOOKUP IN ROUTERS

IP routing requires that a router perform a longest-prefix-match address lookup for each incoming datagram in order to determine the datagram's next hop. Therefore, router performance can be limited by the route lookup mechanism. Two addressing schemes have been in use: the classful addressing scheme and the classless interdomain routing (CIDR) scheme. The classful addressing scheme uses a simple two-level hierarchy. The 32-bit IP address is broken down into the network address part and the host address part. IP routers forward packets based only on the network address until the packets reach the destination network. Typically, an entry in the forwarding table stores the address prefix (i.e., the network address) and the routing information (i.e., the next-hop router and output interface). The address lookup operation consists of finding an exact prefix match in the forwarding table.

⁹ Figure 9 is courtesy of [2].
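Why a classful lookup is only an exact match can be shown with the Python sketch below: the leading bits of the address fix its class and therefore the length of the network part, so a single exact probe of the forwarding table suffices. The function names and the `(network, length)` table key are assumptions of the sketch; the class boundaries are the standard ones for classes A, B and C.

```python
import ipaddress

def classful_network(addr):
    """Return (network address as an int, prefix length) implied by the address class."""
    a = int(ipaddress.IPv4Address(addr))
    if a >> 31 == 0b0:          # class A: leading bit 0, 8-bit network part
        length = 8
    elif a >> 30 == 0b10:       # class B: leading bits 10, 16-bit network part
        length = 16
    elif a >> 29 == 0b110:      # class C: leading bits 110, 24-bit network part
        length = 24
    else:
        raise ValueError("class D/E addresses have no network/host split")
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    return a & mask, length

def classful_lookup(table, addr):
    # The class fixes the prefix length, so one exact-match probe suffices.
    return table.get(classful_network(addr))
```

For example, with a table containing 10.0.0.0 (class A), 172.16.0.0 (class B) and 192.168.1.0 (class C), the address 192.168.1.77 is looked up by probing only the key for 192.168.1.0/24.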

Arbitrary-length prefixes are allowed in the CIDR scheme to make more efficient use of the IP address space and to avoid the problem of forwarding table explosion. With CIDR, the address lookup operation is more difficult: it consists of finding the longest address prefix in the forwarding table that matches the destination address. For example, if the table contains both 10.0.0.0/8 and 10.1.0.0/16, a packet destined for 10.1.2.3 matches both entries, and the /16 entry must be chosen. Several software and hardware solutions to the IP address lookup problem have been published in the literature. In this report, a brief overview of some interesting hardware solutions is presented. The hardware solutions are mainly CAM-based or processor-memory based. In processor-memory based solutions, the routing table is stored in memory and the lookup algorithm runs on a processor. As several memory accesses may be needed to retrieve the longest matching prefix, memory bandwidth becomes a bottleneck. Therefore, the objective of many hardware-based IP lookup algorithms is to keep the number of memory accesses as low as possible.

The most straightforward way to implement a lookup scheme is to have a forwarding table in which an entry is designated for each 32-bit IP address. In this case, the forwarding table, called the next-hop array (NHA), has 2^32 entries and therefore becomes too large to be practical (4 GB even with only one byte per entry). Figure 10 illustrates this direct lookup approach¹⁰. Huang et al. reduced the size of the associated NHA by considering the distribution of the prefixes belonging to the same segment [10]. With 45,000 routing prefixes, this results in about 750 KB of NHA entries. By employing compression, the required memory size can be reduced further, but the number of memory accesses increases to three. The time complexity for building the so-called Code Word Array (CWA) and the compressed NHA (CNHA) is O(n log n), where n denotes the number of prefixes in a segment [10]. However, because of the compression mechanism, the CNHA must be rebuilt whenever the forwarding table is updated. Since routing updates may occur every few seconds, performance might degrade severely due to memory bandwidth contention.

¹⁰ Figure 10 is courtesy of [10].

Fig. 10. Direct Lookup Approach

Waldvogel et al. proposed an address lookup scheme based on binary search of hash tables [23]. The scheme requires a worst-case time of log2(address bits) hash lookups and additional memory space to store markers, which record the best matching prefix (BMP) precomputed for their position. Adding or deleting a single prefix can change the BMP values of a large number of markers, and hence updating the forwarding table is expensive in this scheme. Waldvogel et al. perform a binary search over the prefix lengths, with a hash table for each prefix length; in the worst case, the number of memory accesses grows with the logarithm of the number of distinct prefix lengths in the table. The scheme assumes perfect hashing hardware and does not consider hash collisions.
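The Python sketch below illustrates binary search on prefix lengths with markers. Each real prefix leaves a marker at every shorter length where the search must go longer to reach it, and every entry stores the next hop of its best matching prefix (BMP), computed here by brute force. Python dicts stand in for the hash tables, so collisions are handled by the dict rather than by the perfect hashing assumed in [23]; all names are hypothetical.

```python
def build_markers(prefixes):
    """prefixes: dict mapping (value, length) -> next hop, where value holds the
    top `length` bits. Returns the sorted distinct lengths and one hash table
    per length holding real prefixes and markers, each mapped to its BMP."""
    lengths = sorted({length for _, length in prefixes})
    tables = {length: {} for length in lengths}
    for value, length in prefixes:
        tables[length][value] = None
        lo, hi = 0, len(lengths) - 1
        while lengths[(lo + hi) // 2] != length:   # replay the search path to `length`
            mid = (lo + hi) // 2
            if lengths[mid] < length:              # search goes longer: leave a marker
                tables[lengths[mid]].setdefault(value >> (length - lengths[mid]), None)
                lo = mid + 1
            else:
                hi = mid - 1

    def bmp(key, length):
        # Longest real prefix that is a prefix of `key` (brute force, build time only).
        for l in reversed(lengths):
            if l <= length and (key >> (length - l), l) in prefixes:
                return prefixes[(key >> (length - l), l)]
        return None

    for length, table in tables.items():
        for key in table:
            table[key] = bmp(key, length)
    return lengths, tables


def bsearch_lookup(lengths, tables, addr, width=32):
    best, lo, hi = None, 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = addr >> (width - lengths[mid])
        if key in tables[lengths[mid]]:        # hit: remember the BMP, try longer lengths
            best = tables[lengths[mid]][key]
            lo = mid + 1
        else:                                  # miss: only shorter lengths can match
            hi = mid - 1
    return best
```

With 8-bit addresses and prefixes of lengths 1, 3, 5 and 7, each lookup probes at most three tables. A marker such as the one left at length 3 for a length-7 prefix steers the search longer, and its stored BMP keeps the answer correct when the longer probes then miss.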

Gupta et al. presented a route lookup mechanism that, when implemented in a pipelined fashion in hardware, can achieve one route lookup per memory access [9]. Figure 11 illustrates the hardware scheme¹¹. The approach employs a multi-level lookup mechanism and uses two tables, both stored in DRAM. The first table has one entry for every possible 24-bit value (i.e., 2^24 entries), so that it covers all route prefixes that are up to 24 bits long. If a prefix is longer than 24 bits, the corresponding first-table entry contains a pointer to a set of entries in the second table. One observation behind this approach is that in a typical router, most of the entries have prefixes of length 24 bits or less. As a result, using 24 bits in the first table allows the match to be found in one memory access in the majority of cases. In general, this approach requires at most two memory accesses per lookup in a 33 MB forwarding table and achieves up to 20 million lookups per second. By using an additional intermediate-length table and splitting the 32-bit address into three portions of 21, 3, and 8 bits, respectively, the amount of memory can be reduced to 9 MB. However, the scheme uses memory inefficiently, and inserting and deleting routes may require many memory accesses.
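A scaled-down Python sketch of the two-table scheme follows. The defaults mirror the 24/8 split, but a full 2^24-entry list is impractical in a quick example, so the usage below shrinks it to a 16-bit address with a 12-bit first level. Routes are inserted in order of increasing length so that longer prefixes overwrite shorter ones; incremental updates, which the text notes can be expensive, are not handled. The `Block` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    index: int                 # pointer into the second-level table

def build_two_level(routes, addr_bits=32, first_bits=24):
    """routes: iterable of (value, length, next_hop), where value holds the top
    `length` bits. Returns (first, second) lookup tables."""
    tail_bits = addr_bits - first_bits
    first = [None] * (1 << first_bits)     # next hop, or a Block pointer
    second = []                             # blocks of 2**tail_bits next hops
    for value, length, hop in sorted(routes, key=lambda r: r[1]):  # shortest first
        if length <= first_bits:
            start = value << (first_bits - length)    # expand over all covered slots
            for i in range(start, start + (1 << (first_bits - length))):
                first[i] = hop
        else:
            idx = value >> (length - first_bits)
            if not isinstance(first[idx], Block):
                # New block inherits the shorter route previously stored here.
                second.append([first[idx]] * (1 << tail_bits))
                first[idx] = Block(len(second) - 1)
            block = second[first[idx].index]
            start = (value & ((1 << (length - first_bits)) - 1)) << (addr_bits - length)
            for j in range(start, start + (1 << (addr_bits - length))):
                block[j] = hop
    return first, second

def lookup_two_level(first, second, addr, addr_bits=32, first_bits=24):
    entry = first[addr >> (addr_bits - first_bits)]           # memory access 1
    if isinstance(entry, Block):                              # a longer prefix exists
        tail = addr & ((1 << (addr_bits - first_bits)) - 1)
        return second[entry.index][tail]                      # memory access 2
    return entry
```

With `addr_bits=16` and `first_bits=12`, a route of length 4 is expanded over 256 first-level slots, while a 14-bit route allocates one 16-entry second-level block; every lookup needs at most two table reads.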

Pao et al. presented a hardware architecture that models address lookup as a search problem on a binary trie [14]. A trie is an ordered tree data structure that is used to store an associative array where the keys are strings. Unlike a binary search tree, no node in the trie stores the key associated with that node; instead, its position in the trie shows which key it is associated with. In Pao et al.'s approach, the complete binary trie is partitioned into non-overlapping subtrees of 255 nodes. Each subtree is represented using a bit-vector and can be searched in parallel. A k-bit binary trie can be represented by a bit-vector with 2^(k+1) − 1 bits, called the tree-vector; for example, a subtree of 255 nodes corresponds to k = 7, since 2^8 − 1 = 255. The bits in the tree-vector are numbered from left to right. Bit i of the tree-vector is equal to 1 if the prefix that corresponds to node i of the binary trie is present in the forwarding table; otherwise, bit i is equal to 0.

11 Figure 11 is courtesy of [17].

Similarly, the search path can be represented by a bit-vector called the mask-vector. To find the best matching prefix in a subtree, the following operations are performed in a pipelined architecture:

(i) read the tree-vector and the mask-vector from memory;
(ii) perform a bit-wise AND operation of the mask-vector and the tree-vector to obtain the result-vector, and find the position of the rightmost '1' in the result-vector;
(iii) read the next-hop identifier from the routing-vector.

The longest matching prefix, if any, is given by the bit number of the rightmost '1' in the result-vector of the AND operation. To reduce the processing time of step (ii), a digital circuit called the RMB locator circuit is implemented. If all data structures are stored in SRAM, a cycle time of 12.5 ns is achieved. If DRAM is preferred, the cycle time is about 50 ns. The total amount of memory required for a reasonable implementation is about 8 MB. With a pipelined architecture, a throughput of 20 million or 80 million lookups per second can be achieved if the major memory modules are implemented using DRAM or SRAM, respectively.
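Steps (i) and (ii) can be sketched in Python with integers as bit-vectors, as shown below. The sketch assumes that "numbered from left to right" means level order (root first, then each depth from left to right); under that numbering, nodes on a search path have strictly increasing numbers, so the rightmost '1' of the result-vector, which is the highest-numbered node and hence the most significant set bit of the integer here, is the deepest and longest matching prefix. Function names are hypothetical.

```python
def node_index(bits, length):
    """Level-order number of the trie node for a prefix of `length` bits
    (root = 0; nodes at depth d are numbered 2**d - 1 ... 2**(d+1) - 2)."""
    return (1 << length) - 1 + bits

def build_tree_vector(prefixes, k):
    """prefixes: iterable of (bits, length) with length <= k. Bit i is 1 when
    node i holds a prefix; the vector has 2**(k+1) - 1 bit positions in all."""
    tree = 0
    for bits, length in prefixes:
        tree |= 1 << node_index(bits, length)
    return tree

def mask_vector(addr, k):
    """Mark every node on the search path of a k-bit address, root to leaf."""
    mask = 0
    for length in range(k + 1):
        mask |= 1 << node_index(addr >> (k - length), length)
    return mask

def best_match(tree, addr, k):
    result = tree & mask_vector(addr, k)     # step (ii): bit-wise AND
    if result == 0:
        return None                          # no prefix on this path
    i = result.bit_length() - 1              # highest-numbered (deepest) matching node
    length = (i + 1).bit_length() - 1        # depth recovered from the level-order number
    return addr >> (k - length), length      # (prefix bits, prefix length)
```

For a 3-bit subtree holding the prefixes 1, 10 and 101, the address 100 matches nodes "1" and "10", so `best_match` returns `(0b10, 2)`; with k = 7, the tree-vector spans 2^8 − 1 = 255 bit positions, one per node of the subtree.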

An approach based on hashing is presented by Lim et al. in [11], which converts the longest prefix matching problem into an exact matching problem. Figure 12 illustrates the hardware scheme¹². In the proposed architecture, the forwarding table is composed of multiple SRAMs, and each SRAM holds the address lookup table for a single prefix length. Hash functions are applied to each address lookup table in order to find matching entries in parallel, and the entry with the longest prefix among the matches is selected. Sub-tables are provided to resolve collisions; binary search is applied to the sub-tables for the collided entries. Experimental results show that the proposed approach requires 1 to 5 memory accesses per lookup, and the forwarding table size is 189 KB.
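The conversion from longest prefix matching to exact matching can be sketched in Python as follows. A dict per prefix length stands in for each hashed SRAM table; the probes are written as a loop but would run in parallel in hardware; and the sub-tables with binary search used in [11] to resolve collisions are not modeled, since the dict handles collisions internally. Function names are hypothetical.

```python
def build_length_tables(routes):
    """One exact-match table (a dict standing in for a hashed SRAM) per prefix length."""
    tables = {}
    for value, length, hop in routes:
        tables.setdefault(length, {})[value] = hop
    return tables

def parallel_hash_lookup(tables, addr, width=32):
    # Probe every length's table with the address truncated to that length
    # (independent probes, parallel in hardware), then keep the longest hit.
    hits = {length: t[addr >> (width - length)]
            for length, t in tables.items() if addr >> (width - length) in t}
    return hits[max(hits)] if hits else None
```

For instance, with 8-bit addresses and routes 1/1, 101/3 and 10110/5, the address 10110111 hits all three tables, and the length-5 entry is selected.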

Mohammadi et al. state that software lookups are flexible and scalable but slow, whereas hardware lookups are fast but inflexible [13]. They therefore propose hardware-assisted software lookups, obtained by making small modifications to the instruction set of a general-purpose processor. The approach is based on the DMP-Tree (Dynamic M-way Prefix Tree), which is a superset of the B-Tree data structure. In general, a B-Tree consists of several buckets, where each bucket is a node that contains sorted elements and pointers to the next level. To find the longest prefix in the DMP-Tree, the following instructions are implemented in hardware to assist software lookups:

(i) prefix matching;
(ii) finding the place of an IP address in a bucket;
(iii) finding the longest matching prefix of an IP address in a bucket.
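As a rough illustration, the three instructions can be modelled as ordinary Python functions over one bucket. This sketch does not reproduce the DMP-Tree encoding from [13]: the bucket layout as (prefix value, prefix length, next hop) tuples, the ordering by left-aligned start address, and the names `prefix`, `prefix_match`, `find_place`, and `longest_match_in_bucket` are assumptions made for the example.

```python
import ipaddress

W = 32  # IPv4 address width


def prefix(cidr: str, hop: str):
    """Hypothetical bucket element: (prefix value, prefix length, next hop)."""
    net = ipaddress.IPv4Network(cidr)
    return (int(net.network_address) >> (W - net.prefixlen), net.prefixlen, hop)


def prefix_match(value: int, length: int, addr: int) -> bool:
    """(i) Does the `length`-bit prefix `value` cover the address?"""
    return addr >> (W - length) == value


def find_place(bucket, addr: int) -> int:
    """(ii) Binary search: number of elements whose left-aligned start is <= addr."""
    lo, hi = 0, len(bucket)
    while lo < hi:
        mid = (lo + hi) // 2
        value, length, _ = bucket[mid]
        if value << (W - length) <= addr:
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_match_in_bucket(bucket, addr: int):
    """(iii) Next hop of the longest prefix in the bucket that covers addr, else None."""
    best_len, best_hop = -1, None
    for value, length, hop in bucket:
        if prefix_match(value, length, addr) and length > best_len:
            best_len, best_hop = length, hop
    return best_hop


# One bucket, sorted by left-aligned start address (an assumed ordering).
bucket = [prefix("10.0.0.0/8", "A"), prefix("10.1.0.0/16", "B"), prefix("192.168.0.0/16", "C")]
addr = int(ipaddress.IPv4Address("10.1.2.3"))
print(find_place(bucket, addr), longest_match_in_bucket(bucket, addr))
```

In software each of these is a loop over the bucket; the point of the proposal is that a single instruction can do the same work in hardware, so a larger branching factor makes each bucket visit more expensive.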

Experimental results show that the number of lookups per second decreases as the branching factor increases. For instance, with a branching factor of 4, 12 million lookups per second can be achieved for a routing table of 100 KB, whereas with a branching factor of 40, only 4 million lookups per second can be achieved.

12 Figure 12 is courtesy of [11].

Fig. 11. Multi-level Lookup Approach

Fig. 12. Hardware Architecture Using Hashing

Sangireddy et al. observe that most available schemes for IP address lookup depend on memory access technology, which limits their performance [18]. They therefore propose an address lookup scheme based on computational logic derived from binary decision diagrams (BDDs). A key observation in this scheme is that even at the largest network access points, the number of next-hop ports (NHPs) is generally not greater than 256. Hence, the NHP associated with any prefix in the routing table can be encoded using an 8-bit binary code (since $2^8 = 256$), so the computational logic is designed for eight output bits, which gives eight BDDs to be processed. The results show that the BDD hardware engine achieves a throughput of up to 175.7 million lookups per second for a large AADS routing table with 33,796 prefixes, up to 168.6 million lookups per second for an MAEWest routing table with 29,487 prefixes, and up to 229.3 million lookups per second for the Pacbell routing table with 6,822 prefixes.
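To make the "one BDD per output bit" idea concrete, the Python sketch below builds a shared reduced ordered BDD for each of the eight NHP bits of a toy routing table and evaluates a lookup by walking one root-to-terminal path per bit. It is a minimal illustration under assumptions, not the construction or hardware mapping from [18]: it uses 4-bit addresses instead of 32, treats NHP code 0 as "no route", derives the diagrams from a binary trie, and all names (`TrieNode`, `BDD`, `build_output_bit`, `bdd_lookup`) are hypothetical.

```python
W = 4  # toy address width; an IPv4 engine would have 32 address variables


class TrieNode:
    def __init__(self):
        self.child = [None, None]
        self.nhp = None  # 8-bit next-hop port code if a prefix ends here


def insert(root, value: int, length: int, nhp: int) -> None:
    node = root
    for d in range(length):
        bit = (value >> (length - 1 - d)) & 1
        if node.child[bit] is None:
            node.child[bit] = TrieNode()
        node = node.child[bit]
    node.nhp = nhp


class BDD:
    """Shared reduced ordered BDD; indices 0 and 1 are the terminal nodes."""

    def __init__(self):
        self.nodes = [None, None]  # index -> (var, lo, hi)
        self.unique = {}

    def mk(self, var: int, lo: int, hi: int) -> int:
        if lo == hi:                # redundant test: both branches agree
            return lo
        key = (var, lo, hi)
        if key not in self.unique:  # share identical sub-graphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def evaluate(self, root: int, addr: int) -> int:
        node = root
        while node > 1:             # walk one root-to-terminal path
            var, lo, hi = self.nodes[node]
            node = hi if (addr >> (W - 1 - var)) & 1 else lo
        return node


def build_output_bit(bdd, trie, j: int, depth: int = 0, inherited: int = 0) -> int:
    """BDD for bit j of the NHP code chosen by longest prefix match (0 = no route)."""
    if trie is None:                # no longer prefix below: the bit is constant
        return (inherited >> j) & 1
    current = trie.nhp if trie.nhp is not None else inherited
    if depth == W:
        return (current >> j) & 1
    lo = build_output_bit(bdd, trie.child[0], j, depth + 1, current)
    hi = build_output_bit(bdd, trie.child[1], j, depth + 1, current)
    return bdd.mk(depth, lo, hi)


routes = [(0b1, 1, 3), (0b10, 2, 200), (0b0110, 4, 7)]  # (value, length, NHP)
root = TrieNode()
for value, length, nhp in routes:
    insert(root, value, length, nhp)

bdd = BDD()
roots = [build_output_bit(bdd, root, j) for j in range(8)]  # one BDD per NHP bit


def bdd_lookup(addr: int) -> int:
    return sum(bdd.evaluate(roots[j], addr) << j for j in range(8))


print([bdd_lookup(a) for a in range(1 << W)])
```

Since each output bit is just a Boolean function of the address bits, a hardware engine can evaluate the eight diagrams as logic rather than as a chain of memory reads, which is the property the scheme exploits.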

A different approach from CAM-based and processor-memory based solutions was proposed by Desai et al. [7]. They present an architecture for an IP address lookup engine based on programmable finite-state machines (FSMs): the lookup engine is a reconfigurable hardware implementation of an FSM. Figure 13 illustrates the architecture of the lookup engine 13 . The reconfigurable hardware illustrated in Figure 13 performs the address lookup. The processor computes the FSM for a given routing database of address prefixes and then compiles it into a format suitable for programming the reconfigurable hardware. If the routing database changes, the state machine is recomputed. Experimental results show that the average number of cycles required for a lookup is about 5 to 8 [7].

Fig. 13. Lookup Engine Architecture
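The compile-then-run split can be sketched in Python as follows. This is a simplified model, not the engine from [7]: it assumes the FSM is a binary trie flattened into a transition table (`trans`) and an output table (`out`), and it consumes one address bit per simulated cycle, so its cycle counts are not comparable with those reported in [7]. The names `compile_fsm` and `run_fsm` are hypothetical.

```python
import ipaddress

W = 32  # IPv4 address width


def compile_fsm(routes):
    """Compile (CIDR, next_hop) routes into table-driven FSM arrays.

    trans[s][b] is the next state on input bit b (-1 = no transition);
    out[s] is the next hop of the prefix that ends in state s, if any.
    """
    trans, out = [[-1, -1]], [None]        # state 0 is the start state
    for cidr, hop in routes:
        net = ipaddress.IPv4Network(cidr)
        length = net.prefixlen
        value = int(net.network_address) >> (W - length)
        state = 0
        for d in range(length):
            bit = (value >> (length - 1 - d)) & 1
            if trans[state][bit] == -1:     # allocate a new state
                trans[state][bit] = len(trans)
                trans.append([-1, -1])
                out.append(None)
            state = trans[state][bit]
        out[state] = hop
    return trans, out


def run_fsm(trans, out, addr: str):
    """Model of the engine: one transition (one address bit) per clock cycle."""
    a = int(ipaddress.IPv4Address(addr))
    state, best, cycles = 0, out[0], 0
    for d in range(W):
        cycles += 1
        nxt = trans[state][(a >> (W - 1 - d)) & 1]
        if nxt == -1:                      # no longer prefix possible: stop
            break
        state = nxt
        if out[state] is not None:         # remember the longest prefix seen so far
            best = out[state]
    return best, cycles


trans, out = compile_fsm([("10.0.0.0/8", "A"), ("10.1.0.0/16", "B")])
print(run_fsm(trans, out, "10.1.2.3"))     # (next hop, cycles taken)
```

In this model, a change to the routing database simply means calling `compile_fsm` again and reloading the two arrays, mirroring the recomputation step described above.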

13 Figure 13 is courtesy of [7].

V. SUMMARY

In this report, we have presented several hardware approaches for exact string matching. Network intrusion detection (NIDS) and IP address lookup in routers are the two string matching applications discussed. The approaches presented for NIDS applications include CAM-based solutions, finite automata, discrete comparators, Bloom filters, and systolic array structures, while the approaches presented for IP lookup include CAM-based solutions, multi-level lookup, binary search, binary tries, hashing, binary decision diagrams, and finite-state machines.

REFERENCES

[1] A. V. Aho and M. J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM, 18(6):333–340, 1975.

[2] M. Aldwairi, T. Conte, and P. Franzon. Configurable String Matching Hardware for Speeding up Intrusion Detection. ACM SIGARCH Computer Architecture News, 33(1):99–107, 2005.

[3] M. Attig, S. Dharmapurikar, and J. Lockwood. Implementation Results of Bloom Filters for String Matching. In Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’04), Napa, California, USA, April 2004.

[4] R. S. Boyer and J. S. Moore. A Fast String Searching Algorithm. Communications of the ACM, 20(10):762–772, 1977.

[5] L. Chisvin and R. J. Duckworth. Content-Addressable and Associative Memory: Alternatives to the Ubiquitous RAM. IEEE Computer, 22(7):51–64, 1989.

[6] Y. H. Cho, S. Navab, and W. H. Mangione-Smith. Specialized Hardware for Deep Network Packet Filtering. In Proceedings of the 12th International Conference on Field-Programmable Logic and Applications (FPL’02), pages 452–461, London, UK, 2002. Springer-Verlag.

[7] M. Desai, R. Gupta, A. Karandikar, K. Saxena, and V. Samant. Reconfigurable Finite-State Machine Based IP Lookup Engine for High-Speed Router. IEEE Journal on Selected Areas in Communications, 21(4), May 2003.

[8] S. Dharmapurikar, P. Krishnamurthy, T. S. Sproull, and J. W. Lockwood. Deep Packet Inspection using Parallel Bloom Filters. IEEE Micro, 24(1):52–61, 2004.

[9] P. Gupta, S. Lin, and N. McKeown. Routing Lookups in Hardware at Memory Access Speeds. In Proceedings of IEEE Infocom’98, volume 3, pages 1240–1247, San Francisco, USA, March 1998.

[10] N. Huang and S. Zhao. A Novel IP-Routing Lookup Scheme and Hardware Architecture for Multigigabit Switching Routers. IEEE Journal on Selected Areas in Communications, 17(6), June 1999.

[11] H. Lim, J. Seo, and Y. Jung. High Speed IP Address Lookup Architecture Using Hashing. IEEE Communications Letters, 7(10):502–504, October 2003.

[12] Y. Mishina and K. Kojima. String Matching on IDP: A String Matching Algorithm for Vector Processors and its Implementation. In Proceedings of 1993 IEEE International Conference on Computer Design (ICCD’93), 1993.

[13] H. Mohammadi, N. Yazdani, B. Robatmili, and M. Nourani. HASIL: Hardware Assisted Software-based IP Lookup for Large Routing Tables. In Proceedings of the 11th IEEE International Conference on Networks (ICON2003), pages 99–104, September 2003.

[14] D. Pao, C. Liu, A. Wu, L. Yeung, and K. Chan. Efficient Hardware Architecture for Fast IP Address Lookup. In Proceedings of IEEE Infocom 2002, New York, USA, June 2002.

[15] A. N. M. E. Rafiq, M. W. El-Kharashi, and F. Gebali. A Fast String Search Algorithm for Computer Networking. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, volume 2, pages 764–767, August 2003.

[16] A. N. M. E. Rafiq, F. Gebali, and M. W. El-Kharashi. A Systolic Array Structure for String Searching. In Proceedings of the International Conference on Electrical, Electronic and Computer Engineering (ICEEC’04), pages 281–284, September 2004.

[17] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous. Survey and Taxonomy of IP Address Lookup Algorithms. IEEE Network, 15(2):8–23, April 2001.

[18] R. Sangireddy and A. K. Somani. High-Speed IP Routing with Binary Decision Diagrams Based Hardware Address Lookup Engine. IEEE Journal on Selected Areas in Communications, 21(4), May 2003.

[19] R. Sidhu and V. K. Prasanna. Fast Regular Expression Matching Using FPGAs. In Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’01), pages 227–238, Rohnert Park, CA, USA, May 2001.

[20] Snort. www.snort.org.

[21] I. Sourdis and D. Pnevmatikatos. Fast, Large-Scale String Match for a 10 Gbps FPGA-based Network Intrusion Detection System. In Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL’03), Lisbon, Portugal, September 2003.

[22] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection. In Proceedings of IEEE Infocom, Hong Kong, March 2004.

[23] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner. Scalable High Speed IP Routing Lookups. In SIGCOMM ’97: Proceedings of the ACM SIGCOMM ’97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 25–36, New York, NY, USA, 1997. ACM Press.