

IEEE SYSTEMS JOURNAL, VOL. 4, NO. 2, JUNE 2010 147

Surmounting Data Overflow Problems in the Collection of Information for Emerging High-Speed Network Systems

Paul C. Hershey, Senior Member, IEEE, and Charles B. Silio, Jr., Senior Member, IEEE

Abstract—Emerging high-speed network systems are capable of transporting and delivering data at rates faster than hardware and software components can economically process them. The result is data overflow in which performance information that is critical for effective network system monitoring and management may be lost, thereby leaving the system vulnerable to quality of service degradation and the service provider unable to meet customer service level agreements. Service providers seek a solution to this problem that minimizes the amount of high-speed, high-cost electronics required to comprehensively recognize such information. This paper addresses the challenge of surmounting data overflow problems in the collection of information by introducing a new procedure to transform finite state recognizers into new machines that can recognize bit-level information as it passes a monitoring point while operating slower than bit-rate for implementation in reconfigurable hardware, such as RAM and Field Programmable Gate Arrays. This is accomplished by mapping N-bit sets from the input stream into new symbols that can be processed at rate 1/N while also generating N-bit output symbols. The process is illustrated by implementation examples, and a time versus space tradeoff analysis is presented.

Index Terms—Data overflow, finite state recognizer, information collection, network monitoring and management, network systems, reconfigurable hardware.

I. INTRODUCTION

The global Internet has the potential to provide end-users access to thoughts and ideas almost instantly with respect to human perception. However, this vision is not presently realizable because the network systems over which high speed data are transmitted are subject to data overflow problems that occur whenever the amount of available data exceeds the processing capacity of the components used within those systems. In fact, many commercial businesses, government agencies, and academic institutions are overwhelmed with too much data and are struggling to make intelligent and timely decisions based on proper interpretation of the data content. A critical area that suffers as a result of data overflow is network systems monitoring and management, which requires observation of all relevant performance information to enable service providers to meet customer service level commitments.

Manuscript received May 15, 2009; revised August 30, 2009. Date of publication May 20, 2010; date of current version June 03, 2010.

P. C. Hershey is with Raytheon, Dulles, VA 20166 USA (e-mail: cphershey@aol.com).

C. B. Silio, Jr. is with the University of Maryland, College Park, MD 20742 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSYST.2010.2050083

The severity of data overflow problems becomes more pronounced as high speed network systems transform into a net-centric environment driven by methodologies such as service-oriented architecture (SOA). Net-centricity is the realization of a robust, secure, and globally interconnected networked environment (i.e., infrastructure, systems, processes, and people) that enables the distribution of relevant data to all involved parties (i.e., users, applications, and platforms) independent of time or location. Net-centric implementations are based on an SOA methodology for which network resources are available as independent services that can be accessed without knowledge of their underlying platform implementation [1].

This paper addresses the challenge of surmounting data overflow problems in emerging high-speed network systems by extending previous work of Hershey and Silio [2], [3] to include a formal procedure to enable information collection over high speed transport media using economical electronic circuitry. The remainder of this paper is organized as follows: Section II provides a survey of related work. Section III describes the proposed approach. Section IV presents the procedure for surmounting data overflow problems. Section V provides implementation examples, Section VI provides conclusions, and Section VII briefly discusses future work to extend the procedure to support collection and analysis for SONET/SDH OC768 transport networks.

II. RELATED WORK

The issue of surmounting data overflow problems is not new, and there have been various approaches to address it. This section provides a survey of those works focusing on tracing the actual data, collecting statistical information only, packet classification, and pattern matching. A “trace” is a record of all frames and bytes transmitted on a network that usually provides a complete picture of network behavior. The trace approach for collecting data traditionally has been accomplished in two ways. Trace method 1 requires that all network activity be copied directly to disk storage where the stored record of activity is then organized by a post-processor into a form that can be used in the analysis process. For trace method 2, the trace data are preprocessed so that only a subset of all the available network activity is written to disk storage. Hence, method 2 requires less disk storage and less post-processing than trace method 1 for the same time period of network activity. These two methods, shown in Fig. 1, capture network activity so that an “after the fact” analysis of the captured network data can be done to derive the required information.

1932-8184/$26.00 © 2010 IEEE


Fig. 1. Traditional methods for collecting data.

For high speed network systems, each of these trace methods suffers from severe disk storage and processor speed limitations. For example, an OC192 Synchronous Optical Network (SONET) ring with a line rate of 10 gigabits per second (Gbps) can generate a new 40-byte minimum size IP packet every 35 ns. Using trace method 1, this amount of data collection will completely fill ten 64 terabyte system disks, each with a 1 GHz processor, in slightly less than ten minutes [4]. For trace method 2, a 10,000 MIPS preprocessor, such as the Intel Pentium 4E, would have only 35 instructions to process each packet. As network speeds increase, data collection using these techniques becomes impractical. For example, an OC768 SONET ring with a line rate of 40 Gbps generates a minimum size IP packet every 8 ns, thereby increasing the amount of storage required for trace method 1 by a factor of four and decreasing the number of instructions per packet to eight for trace method 2.
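The packet timing quoted above can be approximated with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: it uses the nominal line rate and ignores SONET framing and link-layer overhead, so it gives a lower bound on the packet interval (32 ns for OC192, slightly below the 35 ns quoted in the text). The function name is illustrative and does not come from the paper.

```python
def packet_interval_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time between back-to-back packets in nanoseconds.

    Assumption: the full nominal line rate carries packet bits,
    i.e., framing and link-layer overhead are ignored.
    """
    bits = packet_bytes * 8
    # bits divided by (gigabits per second) yields nanoseconds directly.
    return bits / line_rate_gbps


oc192_ns = packet_interval_ns(40, 10.0)   # minimum-size IP packet on OC192
oc768_ns = packet_interval_ns(40, 40.0)   # same packet on OC768
speedup = oc192_ns / oc768_ns             # factor by which the interval shrinks
print(oc192_ns, oc768_ns, speedup)
```

The factor of four between the two intervals corresponds to the fourfold increase in storage demand described for trace method 1.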

A third method uses statistics. For the statistical method, parametric information that is usable in mathematical models for performance evaluation is collected and processed in order to categorize the data and maintain counters for each category. The statistical method’s advantage over the trace methods is that it stores a small fraction of the data (e.g., only the counter values). While this is a step in the right direction for data reduction, traditional statistical approaches have been inflexible, providing input for only one particular function or service usage. Furthermore, the processing requirements are greater than those of the trace method, making traditional statistical approaches difficult to implement as network speeds increase.

Packet classification provides a way to categorize packets into flows that obey predefined rules and are processed in a similar manner, such as by a router [5]. In complex networks, there exist multiple, different packet characterizations that can be represented by algorithms for recognizing regular expressions. If these networks are also high-speed networks, then hardware-based algorithms can be implemented using devices such as Ternary Content Addressable Memories (TCAM). However, as network speed and complexity increase, TCAM technology suffers from lack of density, increasing power dissipation, and price sensitivity when compared with static RAM [5].

More recently, pattern matching has been used for evolving high-speed networks [6], [7]. Scalable pattern matching has been achieved using Field Programmable Gate Array (FPGA) designs that search packet payloads for patterns, including those corresponding to regular expressions to be recognized. These implementations claimed to support pattern matching at network rates of 1 Gbps to 100 Gbps [6]. One application for high performance pattern matching is intrusion detection where FPGAs have been implemented to detect patterns of interest at rates of 10 Gbps to 20 Gbps. The main issues with these pattern matching implementations are extreme processing rates and excessive storage requirements. What is needed is a method to design and implement finite state machines (FSMs) that enables the processing of multiple characters per state transition while matching the clock rate of the transmission media [7].

Fig. 2. High speed multiplexer.

Even if present technology allowed for efficient storage and processing of the vast amount of data available on the Internet, there would still exist data overflow problems. For example, because people think in different ways and use different terminologies to store information, it would still be difficult to search each data store available in a net-centric environment. The Semantic Web and emerging applications, such as the Google Internet search engine, attempt to resolve this issue by using semantics in the information retrieval process. The semantics are captured in ontologies that provide criteria for distinguishing various types of objects and their ties [8], [9]. Although these applications offer some relief for trace, statistical, packet classification, and pattern matching methods, their ability to efficiently address data overflow problems is decreasing as network system speeds increase.

III. APPROACH

In this section, we summarize the approach for the formal procedure to be presented in Section IV. The key issue addressed herein is how to provide collection and analysis of data on a high speed network channel with a bit rate that exceeds the processing capability of the observing hardware. Our approach uses a high speed device such as a switch or the multiplexer shown in Fig. 2. This device receives a high speed serial input bit stream that feeds an N-bit shift register whose input clock rate equals that of the bit stream. Each flip-flop of the input register is clocked by a separate output signal from a 1 out of N decoder that runs at the bit rate of the incoming data stream. The input to this decoder is a bit count from control circuitry that counts the occurrence of each bit and sends an input symbol of length $\lceil \log_2 N \rceil$ to the decoder. The Nth output signal from the decoder provides a common clock signal to the output flip-flops of the register, creating an N-bit input symbol to be recognized by circuitry that runs at 1/N the rate of the original bit stream.

Fig. 3. Simplified shift register.

By slowing down the bit stream in this way, a device such as Static RAM (SRAM) can be used to recognize patterns of interest, including critical performance metrics. SRAM has the added advantage that it can be “reprogrammed” to recognize different patterns of interest depending upon the service requirements of the end user. This approach reduces the need for a high-speed, large capacity SRAM that could quickly grow to unreasonable size depending on both the original bit rate and the number of performance metrics to be monitored.

To gain better understanding of the approach, consider the simplified shift register shown in Fig. 3. For this example, let $N$ correspond to the number of input symbols. Passing an input string of length $N$ through the shift register of Fig. 3 yields new output symbols of length $N$ that can be processed at a rate of $1/N$, a rate much easier for the recognizer hardware to process.
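The grouping performed by the shift register can be illustrated in software. The following is a minimal sketch, not the hardware design itself: it assumes the serial stream is available as a sequence of 0/1 integers, and the name `group_bits` is hypothetical. One N-bit symbol is emitted for every N input bits, so the consumer runs at 1/N of the bit rate.

```python
from typing import Iterable, Iterator, Tuple


def group_bits(bits: Iterable[int], n: int) -> Iterator[Tuple[int, ...]]:
    """Yield one n-bit symbol per epoch from a serial bit stream."""
    register = []
    for bit in bits:
        register.append(bit)        # shift one bit in at the line rate
        if len(register) == n:      # on every n-th bit, latch the full symbol
            yield tuple(register)
            register = []
    # A trailing partial symbol (fewer than n bits) is dropped in this sketch.


stream = [0, 0, 1, 1, 0, 1]
symbols = list(group_bits(stream, 2))
print(symbols)
```

With n = 2, six input bits produce three symbols, each drawn from the four-symbol alphabet 00, 01, 10, 11.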

For example, if $N = 2$, the binary input stream (i.e., symbols 0 and 1) would become quaternary (i.e., symbols 00, 01, 10, 11). Having four distinct input symbols instead of two causes the number of SRAM memory words required in this case to double when implementing a sequence recognizer. We address this issue in Section V where we discuss space and complexity tradeoffs. The following definitions are used in the procedure of Section IV.

Definition 1: A finite state sequential machine (FSM) is a 5-tuple $M = (S, I, O, \delta, \lambda)$, where $S$ is a finite set of states, $I$ is a finite input alphabet, $O$ is a finite output alphabet, $\delta$ is a mapping from $S \times I$ (the Cartesian product) into $S$ called the next-state function, and $\lambda$ is a mapping from $S \times I$ onto $O$ called the output function. The initial state is designated $s_0$, and the set of final, or terminating, states is designated $F$ [10].

Definition 2: A Moore machine is a sequential machine for which $\lambda$ is the output function mapping $S$ onto $O$.

Definition 3: A Mealy machine is a sequential machine for which $\lambda$ is the output function mapping $S \times I$ onto $O$.

Definition 4: Let $s_i$ and $s_j$ be two states in a deterministic FSM, $M$. Let $s_i$ be the first state in the state sequence that results from processing the input sequence. Let $s_j$ be the final state in the state sequence that results from processing the input sequence (i.e., state $s_j$ is the terminal state). The terminal state path is defined as the state sequence from starting state $s_i$ to final state $s_j$.

The secondary state assignment (SSA) problem [11] is that of finding an encoding for the states of an FSM using a set of state variables so that a distinct code word is assigned to each state. The rules that govern the encoding technique for secondary state assignment are determined by the physical implementation of the FSM. Fig. 4 provides an implementation architecture for a recognizer FSM presented by Barker and Lingafelt [12] and later enhanced by Hershey et al. [13], [14].

Fig. 4. Implementation by RAM lookup table block diagram.

This architecture extends the work of Wilkes [15] in the implementation of FSMs using a memory element with micro-sequencing. The architecture includes an SSA implementation by look up table in RAM that is composed of an N-bit wide address register, a 1 of $2^N$ address decoder, and a RAM with $2^N$ memory locations and at least $N-1$ output lines. $(N-1)$-bit codewords are stored in the RAM. Each codeword is composed of an $(N-1)$-tuple of state variable values that encode the FSM states. During each bit-time, a codeword is output from the RAM and fed to the most significant $N-1$ bits of the N-bit wide address register. This codeword represents the encoded value of the next state (NS). The least significant bit of this address register is the input symbol from the channel. At the next clocked bit-time, the input symbol is clocked into the address register along with the RAM codeword. The content of the address register is now the encoded value of the present state and the present input symbol. The output of the address register is the address in the RAM that contains the $(N-1)$-bit code word for the next state (NS). The RAM output can be wider than the $(N-1)$-bit codeword. The remaining RAM output bits are used as external output lines that can serve as control bits and pattern detection indicators. The pattern detection indicators are the binary encoded output values for the present state s(t) and the present input i(t) combinations.
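The address-register feedback loop described above can be modeled in a few lines. This is a hedged sketch rather than the architecture of Fig. 4: the names `build_ram` and `run`, the integer state codes, and the two-state example machine (which flags every 0 that directly follows another 0) are all assumptions made for illustration. Each RAM word holds a next-state code plus an output value, and the address is formed from the present state code (most significant bits) and the input bit (least significant bit).

```python
def build_ram(delta, outputs, num_states):
    """Build a lookup table indexed by (state code, input bit).

    delta[(s, b)]   -> next state code (int)
    outputs[(s, b)] -> output value stored beside the next-state codeword
    """
    k = max(1, (num_states - 1).bit_length())   # state bits per codeword
    ram = [(0, 0)] * (2 ** (k + 1))             # k state bits + 1 input bit
    for (s, b), next_state in delta.items():
        ram[2 * s + b] = (next_state, outputs[(s, b)])
    return ram, k


def run(ram, bits, start=0):
    """Clock the recognizer once per input bit and collect the outputs."""
    state, detected = start, []
    for b in bits:
        address = 2 * state + b      # present state code + present input bit
        state, out = ram[address]    # RAM word: next-state code and output
        detected.append(out)
    return detected


# Hypothetical example: state 0 = previous bit was not 0, state 1 = it was 0.
delta = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 0}
outputs = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 0}
ram, k = build_ram(delta, outputs, num_states=2)
flags = run(ram, [0, 0, 0, 1, 0, 0])
print(len(ram), flags)
```

In this model the table has 2^(k+1) words for one input bit per clock; widening the input field to two bits would double the word count, in line with the doubling noted earlier for the quaternary alphabet.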

Definition 5: Let machine $M' = (S', I', O', \delta', \lambda')$ be the Mealy machine resulting from transforming the Moore machine $M$. The correspondence between the Moore machine and the Mealy machine is given by the following relationships:
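A Moore-to-Mealy transformation can be sketched as follows. This shows the common textbook construction, which may differ in notation from the relationships intended in Definition 5: the states and next-state function are left unchanged, and the Mealy output on a transition is taken to be the Moore output of the state that the transition enters. The function name and the three-state example machine are hypothetical.

```python
def moore_to_mealy(delta, moore_out):
    """Return a Mealy output function for a Moore machine.

    delta[(s, i)] -> next state; moore_out[s] -> output attached to state s.
    The Mealy output for (s, i) is the Moore output of the next state.
    """
    return {(s, i): moore_out[ns] for (s, i), ns in delta.items()}


# Hypothetical Moore machine: state C is entered after two or more 0s in a row.
delta = {
    ("A", 0): "B", ("A", 1): "A",
    ("B", 0): "C", ("B", 1): "A",
    ("C", 0): "C", ("C", 1): "A",
}
moore_out = {"A": 0, "B": 0, "C": 1}

mealy_out = moore_to_mealy(delta, moore_out)
print(mealy_out[("B", 0)], mealy_out[("A", 0)])
```

After the transformation the output depends on both the present state and the input symbol, which is the property the procedure in Section IV relies on.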

Definition 6: Let the time transformed machine be a Mealy machine that transforms the inputs, outputs, and states of the Mealy machine of Definition 5 into time transformed sequences of inputs, outputs, and states. The input vector of the time transformed machine is a concatenated string of symbols of length $n$.

IV. PROCEDURE AND EXAMPLE

This section presents the procedure for surmounting data overflow problems in the collection of information for high-speed network systems. Throughout this procedure, we provide a simple running example.

Procedure
Input: Completely specified recognizer Moore FSM for a single input symbol per epoch.
Outputs: System design truth table for SSA implementation by RAM look up table for the pattern recognition Mealy FSM derived from the input Moore FSM. This specifies the RAM configuration file for the time transformed recognizer of interest.
Step 1. Create the Moore machine state diagram and state table for the pattern recognition FSM of interest for the case of $N = 1$.
We shall use as an example the Moore FSM defined by the state table and state diagram presented in Table I and Fig. 5, respectively. This FSM recognizes two channel symbol sequences (00000) and (00110). The regular expression to recognize these sequences is given by $(0+1)^{*}(00000 + 00110)$, where the expression includes the concatenation, Boolean OR ($+$), and star ($*$) operations [11]. The complete procedure for deriving regular expressions for patterns to be recognized on high speed network systems is presented by Hershey and Silio in other work [16].
Step 2. Create the system design truth table for the Moore machine.
The system design truth table for the example FSM with $N = 1$ is given in Table II.

TABLE I
STATE TABLE FOR EXAMPLE MOORE MACHINE

Fig. 5. State diagram for example recognizer.

Step 3. Transform the Moore machine defined in Step 1 into a Mealy machine using the procedure presented in Booth [10, p. 97] and Kohavi [11, pp. 352–353].
The state table and state diagram for the example transformed Mealy machine are presented in Table III and Fig. 6, respectively. By transforming the Moore machine to a Mealy machine, the output becomes a function of the input symbol and present state.

TABLE II
SYSTEM DESIGN TRUTH TABLE FOR IMPLEMENTATION BY RAM LOOKUP TABLE FOR EXAMPLE MOORE MACHINE

TABLE III
STATE TABLE FOR EXAMPLE MEALY MACHINE WITH ONE INPUT SYMBOL PER EPOCH

Fig. 6. Decimal output mapping for example Mealy machine with one input symbol per epoch.

Step 4. Determine the desired value of $N$.

For the example state machine with state diagram given in Fig. 6, assume that $N = 2$. We will use this assumption in the examples for the remaining steps of this procedure.
Step 5. Using either the state diagram or state table constructed in Step 3, begin with a present state and determine the next state of the time transformed machine (TTM) for all possible combinations of $N$ input symbols.
Using the assumption that $N = 2$, we begin with the chosen state and assume that a single input symbol occurs with a value of 0, which gives an intermediate next state. Next assume a second input symbol occurs with value 0, which gives a second next state. Because $N = 2$ for this example, we assume that both of these input symbols occur during the same epoch. Therefore, if the TTM is in the chosen present state and the input symbols 00 occur during a single epoch, then the next state of the TTM is the state reached after the second symbol. The next states for the remaining input pairs (01, 10, and 11) are determined in the same way.
Step 6. Repeat Step 5 for each and every state in the Mealy machine obtained in Step 3.
Step 7. Using either the state diagram or state table constructed in Step 3, determine the value of the output function for transitions from every state in the Mealy machine for all possible input symbol sequences of length $N$.
Again, let $N = 2$ for the example state machine with state diagram given in Fig. 6. Suppose that we arbitrarily begin with a state and that a single input symbol occurs with a value of 0. This produces one output symbol and a next state. Next assume a second input symbol occurs with value 0. This produces a second output symbol and another next state. Because $N = 2$ for this example, we assume that both output symbols occur during the same epoch. The output of the TTM for that epoch is therefore the pair of output symbols, and the same determination is made for each of the four input pairs (00, 01, 10, and 11).

TABLE IV
STATE TABLE FOR EXAMPLE TIME TRANSFORMED MEALY MACHINE WITH TWO INPUT SYMBOLS PER EPOCH

Fig. 7. State diagram for example time transformed Mealy machine with two input symbols per epoch.

Step 8. Construct the state table and state diagram for the time transformed Mealy machine using the input, output, and state information from Steps 5 through 7.
The state table and state diagram for the example time transformed Mealy machine are presented in Table IV and Fig. 7, respectively.
Step 9. Sequentially assign to each state one arbitrarily chosen, but unique, binary $k$-tuple, where $k = \lceil \log_2 |S| \rceil$ and $|S|$ denotes the number of states in $S$ of the FSM.
This step carries out the state encoding for the time transformed machine. The results of this state encoding for the example time transformed machine are presented in the system design truth table (i.e., Table V) for Fig. 7. For this state encoding, we again require only 4 bits per code word for the SSA.

TABLE V
SYSTEM DESIGN TRUTH TABLE FOR RAM IMPLEMENTATION OF TTM WITH TWO INPUT SYMBOLS PER EPOCH

Step 10. Create a RAM configuration file for the time transformed machine.
Edit the system design truth table and create a file to include only the columns of the system design truth table that contain the next state (viz., NS) values and the output values.
End of Procedure
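Steps 5 through 8 of the procedure can be sketched in software as a table construction. The code below is an illustrative assumption, not the authors' tooling: `time_transform` and the two-state Mealy machine (which outputs 1 for every 0 that directly follows another 0) are hypothetical. For each state and each N-symbol input word, the single-symbol machine is stepped N times; the state reached becomes the TTM next state and the N collected outputs become the TTM output symbol.

```python
from itertools import product


def time_transform(delta, lam, states, alphabet, n):
    """Build next-state and output tables that consume n symbols per epoch."""
    delta_n, lam_n = {}, {}
    for s in states:
        for word in product(alphabet, repeat=n):   # every n-symbol input
            state, outs = s, []
            for sym in word:                       # n single-symbol steps
                outs.append(lam[(state, sym)])     # collect each output
                state = delta[(state, sym)]
            delta_n[(s, word)] = state             # next state after the epoch
            lam_n[(s, word)] = tuple(outs)         # n-symbol output
    return delta_n, lam_n


# Hypothetical Mealy machine: state B means the previous bit was 0.
delta = {("A", 0): "B", ("A", 1): "A", ("B", 0): "B", ("B", 1): "A"}
lam = {("A", 0): 0, ("A", 1): 0, ("B", 0): 1, ("B", 1): 0}

delta2, lam2 = time_transform(delta, lam, ["A", "B"], [0, 1], n=2)

# Rows of (state, input word, next state, output) resemble the truth table
# from which a configuration file could be derived.
for key in sorted(delta2):
    print(key[0], key[1], delta2[key], lam2[key])
```

Running the original machine on the bits 0,0,0,1,0,0 and the transformed machine on the pairs 00, 01, 00 yields the same output sequence, with the transformed machine taking half as many steps.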

V. IMPLEMENTATION EXAMPLES

Steps 1 and 2 of the procedure in Section IV were first implemented and deployed by IBM in production network systems to recognize and collect events of importance occurring at customer sites using a 73-state FSM recognizer with nine terminal states. The rapid prototyping device used programmable components including SRAM, a Complex Programmable Logic Device (CPLD), and a Field Programmable Gate Array (FPGA) and was inserted into the system bus of a network management workstation to count the number of occurrences of specific events that served as parameters for calculating utilization and other performance measures. This implementation of the procedure provided accurate performance results while processing the transported data and minimizing post-processing. Complete results appear in [17].

Steps 1 and 2 of the procedure in Section IV were later implemented for an experimental Synchronous Digital Hierarchy (SDH) broadband network at BT Laboratories in Martlesham, England. This implementation is referred to as the Enhanced Performance Monitoring Architecture (EPMA). The SDH EPMA feasibility demonstration successfully tested this procedure in three ways: synchronization accuracy monitoring, error performance and alarm monitoring, and utilization computation and pattern detection at the VC-12 tributary level at 155 Mbps, as reported in [18]. The SDH EPMA implementation introduced two new utilization performance measures. The first measure was based on detection of all ones in the VC-12 tributary. The presence of all ones on a VC-12 tributary indicates either an Alarm Indication Signal (AIS) or an equipped tributary over which no useful information is transmitted. For example, Austin and Tomasson [19], in monitoring a particular network in which a customer sent a message to another end user, found that the all ones condition indicated an equipped tributary; however, they also found that the end user was not using the circuit. This situation resulted in an equipped but idle tributary [19]. Gagliardi, Mogavero, and Panarotto [20] suggest that the most efficient method for monitoring an SDH network at the link level is to identify idle cells or tributaries. The second EPMA utilization measure is based on the detection of an unequipped tributary as indicated by the V5 byte. If bits 5, 6, and 7 of the V5 byte are all 0, then the VC-12 tributary is unequipped, i.e., unused [21]. Results showed that utilization computation using these measures was accurate and required collection and processing of only a fraction of the data transported over the network media. This savings alleviated memory and processing limitations for the collection components. Complete results appear in [18].
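The unequipped-tributary test on the V5 byte reduces to a three-bit mask. The sketch below is an assumption-based illustration, not the EPMA implementation: it assumes bits are numbered 1 through 8 with bit 1 as the most significant bit, so that bits 5, 6, and 7 correspond to the mask 0b00001110. The function name is hypothetical.

```python
def vc12_unequipped(v5: int) -> bool:
    """Return True when bits 5, 6, and 7 of the V5 byte are all 0.

    Assumption: bit 1 is the most significant bit of the byte, so
    bits 5-7 are selected by the mask 0b00001110.
    """
    return (v5 & 0b00001110) == 0


print(vc12_unequipped(0b10100001))  # bits 5-7 clear
print(vc12_unequipped(0b00000100))  # bit 6 set
```

If a different bit-numbering convention applies, only the mask constant would need to change.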

Fig. 8. Physical model for SONET/SDH OC192.

All ten steps of the procedure in Section IV are needed to build a TTM-based Collection and Analysis (CA) node for emerging high-speed optical network systems. Fig. 8 presents a physical model for implementing monitoring and analysis of patterns of interest with FSMs having up to 128 states for an OC192 SONET/SDH transport network running at 10 Gbps over optical fiber [22]. This model comprises multiple CA nodes that are strategically located to observe all critical information for the network system. Within each CA node are off-the-shelf electrical and optical storage and processing components that together provide economical, flexible, and scalable monitoring and analysis capabilities [23]–[31]. The list of components includes the following.

• Optical Wavelength Multiplexer (OWM) [23].
• Optical Switch (OS) [24].
• Optical Shift Register (OSR) and Latch (OL) [25], [26].
• Optical to Electrical Converter (OEC) [27].
• Emitter Coupled Logic Latch (ECL) [28].
• Highly Parallel Static Random Access Memory (SRAM) to implement the FSM recognizer [29].
• ECL Shift Register (ESR) to reduce data rate into the Field Programmable Gate Array (FPGA) [30].
• FPGA to count patterns of interest [31].
• Complex Programmable Logic Device (CPLD) to provide interface to local processor [31].

The combination of OWMs and OSs permits the OC192 signal to either bypass the CA processing circuitry or pass through this circuitry to derive the CA results and be multiplexed back onto the OC192 signal at the output. The CA results can also be passed to a local processor for further analysis via the CPLD. We assume the number of output patterns can be encoded in 20 bits, but this merely affects the width of the RAM output word and subsequent circuitry.

These components are economical and flexible because they are programmable, and they enable collection and analysis of only the information required to resolve a particular customer service problem. They are scalable according to the transmission media speed and the amount of information collection required. They permit passive collection and analysis of information, while actively providing responses to events that affect security, routing, performance, and other system characteristics. Table VI presents the speed (Gbps), number of Input/Output lines, and the number of devices for the OC192 CA node implementation.

TABLE VI. IMPLEMENTATION DETAILS FOR OC192 CA NODE

The physical implementation does have limitations with respect to the growth in the amount of RAM memory space needed to implement a time transformed FSM. This growth is a function of $N$ (i.e., the number of input symbols per epoch). Let $r = |I|$, where $r$ denotes the number of distinct input symbols in input alphabet $I$ for the Moore machine $M$ with state set $Q$. (For the example in Fig. 5, if $I = \{0, 1\}$, such as in an optical network, then $r = 2$.) Let $n$ be the number of state variables needed to encode the number of states $|Q|$, $n = \lceil \log_2 |Q| \rceil$. Let $W_M$ denote the number of memory locations (words) used to implement $M$ by table look up in RAM, $W_M = r \cdot 2^n$. Similarly, let $W_{M'} = r \cdot 2^n$, for Mealy machine $M'$. Let $r_N$ be the number of distinct input symbols for the time transformed machine $M_N$ that corresponds to $M'$; $r_N = r^N$. The number of memory locations needed to implement $M_N$ by table look up in RAM is: $W_{M_N} = r^N \cdot 2^n$ [32]. Thus, for the TTM, the total number of memory locations grows exponentially with $N$. The multiplicative increase is $W_{M_N} / W_{M'} = r^{N-1}$. For the example in Fig. 8, $r = 2$, $N = 20$, $|Q| = 128$, and $n = 7$; then $W_{M_N} = 2^{20} \cdot 2^{7} = 2^{27}$. We can implement the TTM-based CA using 64 off-the-shelf SRAMs having capacity 72 Mbits each and operating frequency 500 MHz with a configuration 18 bits wide by 4 Mega words deep [29]. The word depth requires 32 of these SRAM chips, but a word width of 27 bits requires two sets of them, leaving 9 bits free in each word.

Note that the RAM output symbols can be ternary, as in Table III, and the output symbols can be two-tuples of ternary values, as in Table IV. Implementation in a binary RAM will require that they be coded in binary, thus further increasing the width of the RAM word. If there are $K$ distinct outputs in the $N$-tuple output symbol entries in the time transformed machine $M_N$, then the output section word-width is $N \lceil \log_2 K \rceil$, and the next state section word-width is still $n$. In Fig. 8 and Table VI, we assume $K = 2$ and $N = 20$.
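The sizing arithmetic in the two preceding paragraphs can be checked with a short sketch. It assumes binary input symbols (so the RAM is addressed once every N line bits), binary coding of the outputs, and the 18-bit by 4 Mega word SRAM organization cited from [29]. The function name, parameter names, and dictionary keys are hypothetical.

```python
# Sizing sketch for a table look up implementation of a time transformed machine.
# Assumptions: binary input symbols, binary-coded outputs, identical SRAM chips.

def ceil_log2(x: int) -> int:
    """Smallest integer e with 2**e >= x, for x >= 1 (pure integer arithmetic)."""
    return (x - 1).bit_length()


def ttm_ram_sizing(r, N, num_states, K, chip_words, chip_width_bits, line_rate_bps):
    n = ceil_log2(num_states)                       # state variables
    locations = (r ** N) * (2 ** n)                 # memory words for the TTM
    word_width = N * ceil_log2(K) + n               # output section + next state
    chips_deep = -(-locations // chip_words)        # ceiling division
    chips_wide = -(-word_width // chip_width_bits)  # ceiling division
    return {
        "state_bits": n,
        "locations": locations,
        "growth_factor": r ** (N - 1),              # relative to one symbol per step
        "word_width_bits": word_width,
        "chips": chips_deep * chips_wide,
        "spare_bits_per_word": chips_wide * chip_width_bits - word_width,
        "lookup_rate_hz": line_rate_bps / N,        # one look up per N input bits
    }


oc192 = ttm_ram_sizing(r=2, N=20, num_states=128, K=2,
                       chip_words=4 * 2**20, chip_width_bits=18,
                       line_rate_bps=10e9)
```

With these inputs the sketch gives 2^27 memory locations, a 27-bit word, 64 chips, 9 spare bits per word, and a 500 MHz look up rate, which agrees with the figures quoted in the text.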

VI. CONCLUSIONS

This paper presented a new procedure for surmounting data overflow problems in the collection of information for high speed network systems. This procedure extends previous works on information collection focused on tracing the actual data, collecting statistical information only, packet classification, and pattern matching. This work also extends prior work on the time transformed finite state machine (TTM) by formalizing the approach to successfully implement information collection for both experimental and production network systems.

We illustrated the steps of the procedure with a concrete example, described two implementations of the procedure for production and experimental networks, respectively, and provided a physical model to enable implementation of the procedure for information collection on an OC192 SONET/SDH transport network. These implementation examples demonstrate the flexibility of this approach in recognizing and collecting different patterns of interest using reprogrammable, off-the-shelf components that include SRAM, FPGA, and CPLD.

The TTM procedure implementation trades state transition time for memory space (i.e., hardware). As the number of input symbols per time period (i.e., $N$) increases, the TTM can detect patterns using technology that is slower than the network medium. However, the number of memory locations needed grows exponentially. In the future, as hardware speed increases towards the speed of the network medium, the TTM will require a lower value of $N$, and the memory space required will be reduced.
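The time versus space tradeoff can be made concrete with a toy sketch. It is not the ten-step procedure of Section IV. It only precomputes, for a small Mealy machine, a table that consumes N input symbols per look up. The example machine (an overlapping detector for the bit pattern 101) and all names are hypothetical.

```python
from itertools import product

# Toy Mealy machine: outputs 1 when the last three bits seen are 1, 0, 1.
# State 0: no progress, 1: seen "1", 2: seen "10".
DELTA = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1, (2, 0): 0, (2, 1): 1}
OMEGA = {key: 0 for key in DELTA}
OMEGA[(2, 1)] = 1


def time_transform(delta, omega, states, alphabet, N):
    """Return table[(state, N-tuple)] -> (next state, N-tuple of outputs)."""
    table = {}
    for state in states:
        for block in product(alphabet, repeat=N):
            current, outputs = state, []
            for symbol in block:
                outputs.append(omega[(current, symbol)])
                current = delta[(current, symbol)]
            table[(state, block)] = (current, tuple(outputs))
    return table


def run_bitwise(delta, omega, start, bits):
    """Reference run: one transition per input bit."""
    state, out = start, []
    for b in bits:
        out.append(omega[(state, b)])
        state = delta[(state, b)]
    return out


def run_blocks(table, start, bits, N):
    """Transformed run: one table look up per N input bits."""
    state, out = start, []
    for i in range(0, len(bits), N):
        state, block_out = table[(state, tuple(bits[i:i + N]))]
        out.extend(block_out)
    return out


BITS = [1, 0, 1, 0, 1, 1, 0, 1]
TABLE = time_transform(DELTA, OMEGA, states=[0, 1, 2], alphabet=[0, 1], N=4)
```

In this sketch the table for N = 4 has 3 × 2^4 = 48 entries instead of the 6 transitions of the bit-level machine, while `run_blocks` needs only two look ups for the eight input bits and yields the same output sequence as `run_bitwise`.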

VII. AREAS FOR FUTURE WORK

The first area for future work is to build and demonstrate a TTM-based CA node for an OC192 SONET/SDH transport network. As described in Section V, the CA node physical model includes off-the-shelf electrical storage and processing components that can be implemented today to provide economical, flexible, and scalable collection and analysis capabilities.

The next step would be to implement a CA node for emerging high-speed optical network systems, such as an OC768 SONET/SDH transport network (i.e., 40 Gbps) [22]. Present technology precludes cost-effective implementation because of the memory limitations discussed in Section V. However, technology roadmaps show that an SRAM with capacity 288 Mbits and 1 GHz operating frequency is expected to be available within the next 4 years [33]. This technology would enable the economical implementation of a CA node that can support OC768.

A final area for future work is to consider multilevel and concatenated component finite state recognizers that individually recognize only a subset of the information of interest, but together recognize the complete information set [34], [35]. This FSM approach enables effective information collection and analysis while reducing the overall number of states in the TTM and, correspondingly, the size of each SRAM.

REFERENCES

[1] R. W. Schulte and Y. V. Natis, Service Oriented Architecture. Gartner, 1996.

[2] P. Hershey and C. Silio, “Surmounting data overflow problems in the collection of information for high speed communications systems,” in Proc. IEEE Systems Conf., Montreal, QC, Canada, Apr. 2008, pp. 493–500.

[3] P. C. Hershey and C. B. Silio, “Time transformed machine for high speed computer network performance measurement,” in Proc. IEEE GLOBECOM, San Francisco, CA, Nov. 2000, pp. 684–689.

[4] IBM System Storage DS6800, IBM Corp., Systems and Technology Group, 2006, TSD00605-USEN-03.

[5] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Network, vol. 15, no. 2, pp. 24–32, Mar./Apr. 2001.

[6] C. R. Clark and D. E. Schimmel, “Scalable pattern matching for high speed networks,” in Proc. IEEE Symp. on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, Apr. 2004, pp. 249–257.

[7] J. van Lunteren, “High-performance pattern-matching for intrusion detection,” in Proc. IEEE INFOCOM, Barcelona, Spain, Apr. 2006, pp. 1–13.

[8] P. Shvaiko and J. Euzenat, “A survey of schema-based matching approaches,” J. Data Semantics, vol. IV, pp. 146–171, 2005.

[9] F. Giunchiglia and P. Shvaiko, “Semantic matching,” in The Knowledge Engineering Rev. Cambridge, U.K.: Cambridge Univ. Press, 2004, vol. 18, pp. 265–280.

[10] T. L. Booth, Sequential Machines and Automata Theory. New York: Wiley, 1967.

[11] Z. Kohavi, Switching and Finite Automata Theory. New York: McGraw-Hill, 1978.

[12] K. J. Barker and C. S. Lingafelt, “Programmable digital filter,” IBM Techn. Discl. Bull., vol. 31, no. 2, pp. 198–204, Jul. 1988.

[13] P. C. Hershey and J. G. Waclawsky, “Event Driven Interface Having a Dynamically, Reconfigurable Counter for Monitoring a High Speed Data Network According to Changing Traffic Events,” U.S. Patent 615,135, Mar. 25, 1997.

[14] P. C. Hershey, J. G. Waclawsky, K. J. Barker, and C. S. Lingafelt, “Information Collection Architecture and Method for a Data Communications Network,” U.S. Patent 375,070, Dec. 20, 1994.

[15] M. V. Wilkes, “The best way to design an automatic calculating machine,” in Rpt. Manchester Univ. Computer Inaugural Conf., Electrical Engineering Department, 1951, pp. 16–18.

[16] P. Hershey and C. Silio, “Procedure for information collection on high speed, high bandwidth communications systems to enable network management,” in Proc. 6th Annu. Communications Networks and Services Research Conf., Halifax, NS, Canada, May 2008, pp. 308–315.

[17] P. Hershey, C. Silio, and J. Waclawsky, “Real-time traffic measurements for high speed networks,” BT Technol. J., vol. 13, no. 3, pp. 113–122, Jul. 1995.

[18] P. C. Hershey, T. Brown, and C. B. Silio, “Enhanced performance monitoring architecture for SDH networks,” BT Technol. J., vol. 14, no. 3, pp. 145–160, Jul. 1996.

[19] G. Austin and H. Tomasson, “Unlocking the Value of Performance Monitoring Data,” Internet Telephony, pp. 49–54, Nov. 1994.

[20] F. Gagllardl, C. Mogavero, and G. Panarotto, “Physical layer techniques to monitor and manage the B-ISDN access,” in IEEE Conf. Rec. Int. Conf. Communications (ICC), Denver, CO, Jun. 1991, pp. 269–274.

[21] General Aspects of Digital Transmission System, Network Node Interface for the Synchronous Digital Hierarchy (SDH), 1995, ITU-T, G.707 Draft.

[22] P. Hershey and C. Silio, “Systems engineering approach for event monitoring and analysis in high speed enterprise communications systems,” in Proc. IEEE Systems Conf., Vancouver, BC, Canada, Mar. 2009, pp. 344–349.

[23] A1040 – 4 X OC-48 to OC-192 Multiplexer, Avvio Networks, 2005, Lit no. A1040.7.

[24] M. Y. Yuang et al., “HOPSMAN: An experimental testbed system for a 10 GigB/S optical packet switched WDM metro ring network,” IEEE Commun. Mag., vol. 46, no. 7, pp. 158–166, Jul. 2008.

[25] B. Tian, W. van Etten, and W. Beuwer, “Ultrafast all-optical shift register and its prospective application for optical fast packet switching,” IEEE J. Sel. Topics Quantum Electron., vol. 8, no. 3, pp. 722–728, May/Jun. 2002.

[26] T. Houbavlis and K. Zoiros, “10-GHz all-optical re-circulating shift register with semiconductor optical amplifier (SOA)-assisted sagnac switch and SOA feedback,” Opt. Eng., vol. 42, no. 9, pp. 2483–2484, Sep. 2003.

[27] Optical-to-Electrical Converters,B and P6703B Tektronix, Inc., 2003, www.tektronix.com/accessories.

[28] 100EL11 5 V ECL 1:2 Differential Fan-Out Buffer, Fairchild Semiconductor Corp., 2003, DS500769.

[29] High-Speed 72 Mbit QDR™II SRAMs, RENESAS Technologies America Inc., 2009, REUO1C0002-0100.

[30] MC10E141, MC100E141, 5 V ECL 8-Bit Shift Register, Semiconductor Components Industries, LLC, 2006, Rev. 7, Pub. Order No. MC10E141/D.

[31] Virtex-5 Family Overview, Product Specification, XILINX, Inc., 2009, DS100 (v5.0).

[32] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Principles and Practices of the Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1974.

[33] J. Soon-Moon, SRAM Technol. Semiconductor R&D Center, Samsung Electronics Co., Ltd, 2005.

[34] R. Zhang and M. Iwara, “An efficient signature matching scheme for mobile security,” IEICE Trans. on Commun., vol. E91-B, no. 10, pp. 3251–261, 2008.

[35] P. C. Hershey and C. B. Silio, “Finite state machines for information collection and assessment on high-speed data networks,” in Proc. IASTED Int. Conf. Wireless and Optical Communications, Banff, AB, Canada, Jul. 2002, pp. 499–506.

Paul C. Hershey (M’82–SM’99) received the A.B. degree in mathematics from the College of William and Mary, Williamsburg, VA, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park. His IBM-sponsored Ph.D. research developed a real-time monitoring product that he is presently extending to information management for high-speed enterprise systems.

He is currently an Engineering Fellow and Chief Engineer of Global Hawk Ground Segment programs at Raytheon, Dulles, VA, where he derives and evaluates technical strategy and solutions. He has published 28 U.S. patents and 35 technical articles and is an Adjunct Professor at George Washington University, Washington, DC.

Charles B. Silio, Jr. (S’62–M’72–SM’89) received the B.S.E.E., M.S.E.E., and Ph.D. degrees in electrical engineering from the University of Notre Dame, Notre Dame, IN.

He is an Associate Professor of electrical and computer engineering at the University of Maryland, College Park. His research interests include performance evaluation and reliability of computer networks. He served as IEEE Computer Society treasurer, chaired its technical committee on multiple-valued logic, and has been an NRC research associate at the Naval Postgraduate School and the Army Research Laboratory. He is a member of Eta Kappa Nu, Tau Beta Pi, and Sigma Xi, and a Lieutenant Colonel (retired), U.S. Army.