

A Framework for Constructing Features and Models for Intrusion Detection Systems

WENKE LEE
Georgia Institute of Technology
and
SALVATORE J. STOLFO
Columbia University

Intrusion detection (ID) is an important component of infrastructure protection mechanisms. Intrusion detection systems (IDSs) need to be accurate, adaptive, and extensible. Given these requirements and the complexities of today’s network environments, we need a more systematic and automated IDS development process rather than the pure knowledge encoding and engineering approaches. This article describes a novel framework, MADAM ID, for Mining Audit Data for Automated Models for Intrusion Detection. This framework uses data mining algorithms to compute activity patterns from system audit data and extracts predictive features from the patterns. It then applies machine learning algorithms to the audit records that are processed according to the feature definitions to generate intrusion detection rules. Results from the 1998 DARPA Intrusion Detection Evaluation showed that our ID model was one of the best performing of all the participating systems. We also briefly discuss our experience in converting the detection models produced by off-line data mining programs to real-time modules of existing IDSs.

Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General—Security and protection (e.g., firewalls); C.2.3 [Computer-Communication Networks]: Network Operations—Network monitoring; D.4.6 [Operating Systems]: Security and Protection; H.2.8 [Database Management]: Database applications—Data mining; I.2.6 [Artificial Intelligence]: Learning—Concept learning

General Terms: Design, Experimentation, Security

Additional Key Words and Phrases: Data mining, feature construction, intrusion detection

This article is based on the authors’ published papers in the Proceedings of the 1999 IEEE Symposium on Security and Privacy and the Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, as well as Wenke Lee’s Ph.D. dissertation in the Computer Science Department at Columbia University.
This research is supported in part by grants from DARPA (F30602-96-1-0311).
Authors’ addresses: W. Lee, College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332–0280; email: [email protected]; S. J. Stolfo, Computer Science Department, Columbia University, 1214 Amsterdam Avenue, Mailcode 0401, New York, NY 10027-7003; email: [email protected].
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 2001 ACM 1094-9224/00/1100–0227 $5.00

ACM Transactions on Information and System Security, Vol. 3, No. 4, November 2000, Pages 227–261.


1. INTRODUCTION

As network-based computer systems play increasingly vital roles in modern society, they have become the target of intrusions by our enemies and criminals. In addition to intrusion prevention techniques, such as user authentication and authorization, encryption, and defensive programming, intrusion detection is often used as another wall to protect computer systems.

The two main intrusion detection techniques are misuse detection and anomaly detection. Misuse detection systems, for example, IDIOT [Kumar and Spafford 1995] and STAT [Ilgun et al. 1995], use patterns of well-known attacks or weak spots of the system to match and identify known intrusions. For example, a signature rule for the “guessing password attack” can be “there are more than four failed login attempts within two minutes.” Misuse detection techniques in general are not effective against novel attacks that have no matched rules or patterns yet. Anomaly detection (sub)systems, for example, the anomaly detector of IDES [Lunt et al. 1992], flag observed activities that deviate significantly from the established normal usage profiles as anomalies, that is, possible intrusions. For example, the normal profile of a user may contain the averaged frequencies of some system commands used in his or her login sessions. If, for a session that is being monitored, the frequencies are significantly lower or higher, then an anomaly alarm will be raised. Anomaly detection techniques can be effective against unknown or novel attacks since no a priori knowledge about specific intrusions is required. However, anomaly detection systems tend to generate more false alarms than misuse detection systems because an anomaly can just be a new normal behavior. Some IDSs, for example, IDES and NIDES [Anderson et al. 1995], use both anomaly and misuse detection techniques.
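
As an aside (not part of the original article), the flavor of such a signature rule can be illustrated with a minimal sketch; the event format, field names, and driver data below are assumptions made purely for illustration:

from collections import deque

WINDOW_SECONDS = 120   # "within two minutes"
MAX_FAILURES = 4       # "more than four failed login attempts"

def guessing_password_alerts(login_events):
    """login_events: iterable of (timestamp, user, success) tuples ordered by time.
    Yields (timestamp, user) whenever a user exceeds MAX_FAILURES failed logins
    inside the sliding two-minute window."""
    failures = {}  # user -> deque of recent failure timestamps
    for ts, user, success in login_events:
        if success:
            continue
        window = failures.setdefault(user, deque())
        window.append(ts)
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()          # drop failures older than the window
        if len(window) > MAX_FAILURES:
            yield ts, user

events = [(0, "alice", False), (10, "alice", False), (20, "alice", False),
          (30, "alice", False), (40, "alice", False), (50, "bob", True)]
print(list(guessing_password_alerts(events)))   # [(40, 'alice')]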

While accuracy is the essential requirement of an IDS, its extensibility and adaptability are also critical in today’s network computing environment. There are multiple “penetration points” for intrusions to take place in a network system. For example, at the network level carefully crafted “malicious” IP packets can crash a victim host; at the host level, vulnerabilities in system software can be exploited to yield an illegal root shell. Since activities at different penetration points are normally recorded in different audit data sources, an IDS often needs to be extended to incorporate additional modules that specialize in certain components (e.g., hosts, subnets, etc.) of the network systems. The large traffic volume in security-related mailing lists and Web sites suggests that new system security holes and intrusion methods are continuously being discovered. Therefore IDSs need to be adaptive in such a way that frequent and timely updates are possible.

Currently building an effective IDS is an enormous knowledge engineering task. System builders rely on their intuition and experience to select the statistical measures for anomaly detection [Lunt 1993]. Experts first analyze and categorize attack scenarios and system vulnerabilities, and hand-code the corresponding rules and patterns for misuse detection. Because of the manual and ad hoc nature of the development process, current IDSs have limited extensibility and adaptability. Many IDSs only handle one particular audit data source, and their updates are expensive and slow [Allen et al. 2000].

Some of the recent research and commercial IDSs have started to provide built-in mechanisms for customization and extension. For example, both Bro [Paxson 1998] and NFR [Network Flight Recorder Inc. 1997] filter network traffic streams into a series of events, and execute scripts, such as Bro policy scripts and NFR’s N-Codes, that contain site-specific event handlers, that is, intrusion detection and handling rules. The system administration personnel at each installation site must now assume the roles of both security experts and IDS builders because they are responsible for writing the correct event-handling functions. Our first-hand experience with both Bro and NFR shows that while these systems provide great flexibility, writing the scripts involves a lot of effort, in addition to learning the scripting languages. For example, there is no means to debug the scripts. These systems also handle a fixed set of network traffic event types. On a few occasions we were forced to make changes to the source code of the original IDS to handle new event types.

Our research aims to develop a more systematic and automated approach for building IDSs. We have developed a set of tools that can be applied to a variety of audit data sources to generate intrusion detection models. We take a data-centric point of view and consider intrusion detection as a data analysis process. Anomaly detection is about finding the normal usage patterns from the audit data, whereas misuse detection is about encoding and matching the intrusion patterns using the audit data. The central theme of our approach is to apply data mining programs to the extensively gathered audit data to compute models that accurately capture the actual behavior (i.e., patterns) of intrusions and normal activities. This approach significantly reduces the need to manually analyze and encode intrusion patterns, as well as the guesswork in selecting statistical measures for normal usage profiles. The resultant models can be more effective because they are computed and validated using a large amount of audit data. Furthermore, data mining programs can be applied to multiple streams of evidence, each from a detection module that specializes in a specific type(s) of intrusion or a specific component of the network system (e.g., a mission-critical host), to learn the combined detection model that considers all the available evidence. Therefore, using our framework, IDSs can be extended and adapted easily via automated integration of new modules.

The rest of the article is organized as follows: Section 2 outlines the main components of our framework. Section 3 briefly describes several data mining programs, and discusses how they can be applied to discover frequent intrusion and normal activity patterns, which are the basis for building misuse detection models and user anomaly detection models. Section 4 describes how to construct temporal and statistical features using the frequent patterns mined from audit data. Section 5 reports the results of our experiments on building intrusion detection models using the audit data from the DARPA evaluation program. Section 6 briefly describes our approach of converting off-line learned models into real-time intrusion detection modules. Section 7 discusses related research projects. Section 8 outlines our future research plans.

2. A SYSTEMATIC FRAMEWORK

A basic premise for intrusion detection is that when audit mechanisms are enabled to record system events, distinct evidence of legitimate activities and intrusions will be manifested in the audit data. Because of the sheer volume of audit data, both in the amount of audit records and in the number of system features (i.e., the fields describing the audit records), efficient and intelligent data analysis tools are required to discover the behavior of system activities.

Data mining generally refers to the process of extracting useful models from large stores of data [Fayyad et al. 1996]. The recent rapid development in data mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning, and databases. Several types of algorithms are particularly useful for mining audit data:

Classification: maps a data item into one of several predefined categories. These algorithms normally output “classifiers,” for example, in the form of decision trees or rules. An ideal application in intrusion detection would be to gather sufficient “normal” and “abnormal” audit data for a user or a program, then apply a classification algorithm to learn a classifier that can label or predict new unseen audit data as belonging to the normal class or the abnormal class;

Link analysis: determines relations between fields in the database records. Correlations of system features in audit data, for example, the correlation between command and argument in the shell command history data of a user, can serve as the basis for constructing normal usage profiles. A programmer, for example, may have emacs highly associated with C files;

Sequence analysis: models sequential patterns. These algorithms can discover what time-based sequences of audit events frequently occur together. These frequent event patterns provide guidelines for incorporating temporal statistical measures into intrusion detection models. For example, patterns from audit data containing network-based denial-of-service (DoS) attacks suggest that several per-host and per-service measures should be included.

We have developed a framework, MADAM ID (for Mining Audit Data for Automated Models for Intrusion Detection), described in Lee and Stolfo [1998], Lee et al. [1999a; 1999b], and Lee [1999]. The main idea is to apply data mining techniques to build intrusion detection models. The main components of the framework include programs for learning classifiers and meta-classifiers [Chan and Stolfo 1993], association rules [Agrawal et al. 1993] for link analysis, and frequent episodes [Mannila et al. 1995] for sequence analysis. It also contains a support environment that enables system builders to interactively and iteratively drive the process of constructing and evaluating detection models. The end products are concise and intuitive rules that can detect intrusions, and can be easily inspected and edited by security experts when needed.

The process of applying MADAM ID can be summarized in Figure 1. Raw (binary) audit data is first processed into ASCII network packet information (or host event data), which is in turn summarized into connection records (or host session records) containing a number of basic features, such as service, duration, and the like. Data mining programs are then applied to the connection records to compute the frequent patterns (i.e., association rules and frequent episodes), which are in turn analyzed to construct additional features for the connection records. Classification programs, for example, RIPPER [Cohen 1995], are then used to inductively learn the detection models. This process is of course iterative. For example, poor performance of the classification models often indicates that more pattern mining and feature construction is needed.
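
As a purely illustrative sketch (ours, not the actual MADAM ID programs), the pipeline of Figure 1 can be outlined as follows; the functions below are toy stand-ins that only mimic the role of each component:

from collections import Counter

def mine_patterns(records, min_support=0.3):
    """Toy stand-in for pattern mining: (service, flag) combinations whose support exceeds min_support."""
    counts = Counter((r["service"], r["flag"]) for r in records)
    return [pair for pair, c in counts.items() if c / len(records) >= min_support]

def construct_features(records, patterns):
    """Toy stand-in for feature construction: count, per record, how many records to the same dst_host match a mined pattern."""
    for r in records:
        r["pattern_count"] = sum(1 for o in records
                                 if (o["service"], o["flag"]) in patterns
                                 and o["dst_host"] == r["dst_host"])
    return records

def learn_classifier(records):
    """Toy stand-in for RIPPER: pick a single threshold on the constructed feature."""
    threshold = max(r["pattern_count"] for r in records if r["label"] == "normal") + 1
    return lambda r: "attack" if r["pattern_count"] >= threshold else "normal"

# Tiny set of (already preprocessed) connection records.
records = [
    {"service": "http", "flag": "S0", "dst_host": "victim", "label": "attack"},
    {"service": "http", "flag": "S0", "dst_host": "victim", "label": "attack"},
    {"service": "http", "flag": "S0", "dst_host": "victim", "label": "attack"},
    {"service": "smtp", "flag": "SF", "dst_host": "B", "label": "normal"},
    {"service": "ftp",  "flag": "SF", "dst_host": "B", "label": "normal"},
]
model = learn_classifier(construct_features(records, mine_patterns(records)))
print([model(r) for r in records])   # ['attack', 'attack', 'attack', 'normal', 'normal']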

In our approach, the learned rules replace the manually encoded intrusion patterns and profiles, and system features and measures are selected by considering the statistical patterns computed from the audit data. Meta-learning is used to learn the correlation of intrusion evidence from multiple detection models, and to produce a combined detection model.

Fig. 1. The data mining process of building ID models.


Our framework does not eliminate the need to preprocess and analyze raw audit data, for example, tcpdump [Jacobson et al. 1989] or BSM [SunSoft 1995] audit data. In fact, to build intrusion detection models for network systems, our data mining programs use preprocessed audit data where each record corresponds to a high-level event (e.g., a network connection or host session). Each record normally includes an extensive set of features that describe the characteristics of the event, for example, the duration of a connection, the number of bytes transferred, and so on. While analyzing and summarizing raw audit data is an essential task for an IDS, we argue that generic utilities should first be developed by network and operating system experts, and made available to all IDSs as the low-level building blocks. Bro and NFR can be regarded as examples of such robust utilities, as they both perform IP packet filtering and reassembling, and allow event handlers to output summarized connection records. Our framework assumes that such building blocks are available when constructing IDSs.

Note that currently MADAM ID produces misuse detection models for network and host systems as well as anomaly detection models for users. We are extending MADAM ID to build network and host anomaly detection models. Also note that the detection models produced by MADAM ID are intended for off-line analysis. In Section 6, we briefly discuss how to convert these models to on-line detection modules.

3. MINING AUDIT DATA

In this section, we describe our data mining algorithms, and illustrate how to apply these algorithms to generate detection models from audit data. Here audit data refers to preprocessed timestamped audit records, each with a number of features (i.e., fields).

3.1 Classification

Intrusion detection can be thought of as a classification problem: we wish to classify each audit record into one of a discrete set of possible categories, normal or a particular kind of intrusion.

Table I. Telnet Records

Label     Service  Flag  hot  failed_logins  compromised  root_shell  su  Duration  ...
normal    telnet   SF    0    0              0            0           0   10.2      ...
normal    telnet   SF    0    0              0            3           1   2.1       ...
guess     telnet   SF    0    6              0            0           0   26.2      ...
normal    telnet   SF    0    0              0            0           0   126.2     ...
overflow  telnet   SF    3    0              2            1           0   92.5      ...
normal    telnet   SF    0    0              0            0           0   2.1       ...
guess     telnet   SF    0    5              0            0           0   13.9      ...
overflow  telnet   SF    3    0              2            1           0   92.5      ...
normal    telnet   SF    0    0              0            0           0   1248      ...
...       ...      ...   ...  ...            ...          ...         ... ...       ...


Given a set of records, where one of the features is the class label (i.e., the concept to be learned), classification algorithms can compute a model that uses the most discriminating feature values to describe each concept. For example, consider the telnet connection records shown in Table I. label is the concept to be learned. “normal” represents normal connections and “guess” and “overflow” represent various kinds of intrusions. hot is the count of access to system directories, creation and execution of programs, and so on, and compromised is the count of file/path “not found” errors, “Jump to” instructions, and the like. RIPPER [Cohen 1995], a classification rule learning program, generates rules for classifying the telnet connections (see Table II). The symbol to the left of “:-” is the class label, and the comma-separated expressions on the right are conjuncts (i.e., (sub)conditions) of the classification rule. We see that RIPPER indeed selects the discriminating feature values into the classification rules for the intrusions. These rules can be first inspected and edited by security experts, and then be incorporated into misuse detection systems.

Table II. Example RIPPER Rules from Telnet Records Shown in Table I

RIPPER Rule: guess :- failed_logins ≥ 4.
Meaning: If the number of failed logins is at least 4, then this telnet connection is “guess”, a guessing password attack.

RIPPER Rule: overflow :- hot ≥ 3, compromised ≥ 2, root_shell = 1.
Meaning: If the number of hot indicators is at least 3, the number of compromised conditions is at least 2, and a root shell is obtained, then this telnet connection is a buffer overflow attack.

...

RIPPER Rule: normal :- true.
Meaning: If none of the above, then this connection is “normal”.

The accuracy of a classification model depends directly on the set of features provided in the training data. From a theoretical point of view, the goal of constructing a classification model is that after (selectively) applying a sequence of feature value tests, the dataset can be partitioned into “pure” subsets, that is, each in a target class. Therefore, when constructing a classification model, a classification algorithm searches for features with large information gain [Mitchell 1997], defined as the reduction in entropy, which characterizes the “impurity” of a dataset. It is thus very important that the dataset indeed includes features with large information gain. For example, if the features hot, compromised, and root_shell were removed from the records in Table I, RIPPER would not be able to produce accurate rules to identify buffer overflow connections. In Lee and Stolfo [1998], we showed that due to the temporal nature of network events, especially certain intrusions such as probing (e.g., port-scan, ping-sweep, etc.) and denial-of-service (e.g., ping-of-death, teardrop, etc.), adding per-host and per-service temporal statistics resulted in significant improvement in the accuracy of the classification models. Thus, selecting the right set of system features is a critical step when formulating the classification tasks. Our strategy is to first mine the frequent sequential patterns from the network audit data, and then use these patterns as guidelines to select and construct temporal statistical features. Section 3.3 discusses this process in greater detail.
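
To make the information-gain criterion concrete, the following small sketch (our illustration, not code from the article) computes the gain of splitting labeled records on a single feature-value test; the toy data is in the spirit of Table I:

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: characterizes the 'impurity' of a dataset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, labels, feature, threshold):
    """Reduction in entropy obtained by splitting on feature >= threshold."""
    left = [l for r, l in zip(records, labels) if r[feature] >= threshold]
    right = [l for r, l in zip(records, labels) if r[feature] < threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# failed_logins cleanly separates "guess" from "normal", so its gain is maximal.
records = [{"failed_logins": 0}, {"failed_logins": 6}, {"failed_logins": 0},
           {"failed_logins": 5}, {"failed_logins": 0}]
labels = ["normal", "guess", "normal", "guess", "normal"]
print(information_gain(records, labels, "failed_logins", 4))  # equals entropy(labels)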

3.1.1 Meta-classification. Meta-learning [Chan and Stolfo 1993] is a mechanism for inductively learning the correlation of predictions by a number of (base) classifiers. Each record in the training data for meta-learning contains the true class label of the record and the predictions made by the base classifiers. The resultant meta-classifier thus “combines” the base models because it uses their predictions to make the final prediction. The motivations for meta-learning include: to improve classification accuracy, that is, to produce a meta-classifier that is more accurate than any individual base classifier; and to improve efficiency and scalability, that is, to combine the models rather than the potentially huge volume of data from different data sources. This general approach has been extensively studied [Stolfo et al. 1997] and empirically evaluated in a related domain of credit card fraud detection, and has been shown to be effective and scalable.
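
A minimal sketch of how such meta-level training records could be assembled (our own illustration; the base models and field names below are hypothetical):

def meta_training_records(base_classifiers, records, true_labels):
    """Each meta-record holds the base classifiers' predictions plus the true label,
    which is the concept the meta-classifier will learn."""
    meta = []
    for record, label in zip(records, true_labels):
        predictions = {name: clf(record) for name, clf in base_classifiers.items()}
        predictions["label"] = label
        meta.append(predictions)
    return meta

# Hypothetical base models: one content-based, one traffic-based.
base = {
    "content": lambda r: "u2r" if r["hot"] >= 3 else "normal",
    "traffic": lambda r: "dos" if r["count"] >= 300 else "normal",
}
records = [{"hot": 3, "count": 1}, {"hot": 0, "count": 350}, {"hot": 0, "count": 2}]
labels = ["u2r", "dos", "normal"]
print(meta_training_records(base, records, labels))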

In order to avoid becoming a performance bottleneck and an easy attack target, an IDS should consist of multiple cooperative lightweight subsystems that each monitors a separate part (e.g., access point) of the entire network environment. For example, an IDS that inspects the full data contents of each IP packet and keeps track of all opened connections may run out of memory (i.e., buffers) during a TCP-based DoS attack and cease to function. On the other hand, a more lightweight IDS that only inspects the header of each IP packet can detect only those intrusions that are aimed at the network protocols, and not those that try to exploit the hosts, such as guess password, buffer overflow, and the like. A solution is to have one relatively lightweight system on the gateway that checks only the packet headers, and several host-based systems that monitor the activities on the mission-critical hosts. A “global” detection system can then combine the evidence from these subsystems and take appropriate actions. We use meta-learning as a means to combine multiple intrusion detection models.

3.2 Association Rules

There is empirical evidence that program executions and user activities exhibit frequent correlations among system features. For example, certain privileged programs only access certain system files in specific directories [Ko et al. 1994], programmers edit and compile C files frequently, and so on. These consistent behavior patterns should be included in normal usage profiles.

The goal of mining association rules is to derive multifeature (attribute) correlations from a database table. Given a set of records, where each record is a set of items, support(X) is defined as the percentage of records that contain item set X. An association rule is an expression


X → Y, [c, s].

Here X and Y are item sets, and X ∩ Y = ∅; s = support(X ∪ Y) is the support of the rule, and c = support(X ∪ Y)/support(X) is the confidence [Agrawal et al. 1993].
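
A direct sketch of these two definitions (ours; each record is reduced to a set of (feature, value) items purely for illustration):

def support(records, itemset):
    """Fraction of records (each a set of items) that contain the item set."""
    return sum(1 for r in records if itemset <= r) / len(records)

def confidence(records, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs U rhs) / support(lhs)."""
    return support(records, lhs | rhs) / support(records, lhs)

# Items are (feature, value) pairs taken from audit records, e.g. shell commands.
records = [
    {("command", "vi"), ("arg1", "tex"), ("time", "am")},
    {("command", "vi"), ("arg1", "tex"), ("time", "am")},
    {("command", "mail"), ("arg1", "fredd"), ("time", "am")},
    {("command", "latex"), ("arg1", "tex"), ("time", "am")},
]
lhs = {("command", "vi")}
rhs = {("arg1", "tex"), ("time", "am")}
print(support(records, lhs | rhs), confidence(records, lhs, rhs))  # 0.5 1.0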

Consider the shell input commands during one telnet session by a secretary, shown in Table III. Here we keep only the filename extensions, remove the (input) contents of mail bodies and files, and use “am” to represent all the morning timestamps.

The original association rules algorithm searches for all possible frequent associations among the set of given features. However, not all associations are necessarily useful for analyzing program or user behavior. We utilized the “schema” level information (i.e., data definitions) about the audit records to direct the pattern mining process. Observe that certain features are essential in describing the data, while others provide only auxiliary information. Domain knowledge is used to determine the appropriate essential features for an application. In shell command data, since the combination of the exact “time” and “command” uniquely identifies each record, “time” and “command” are the essential features; likewise, in network connection data, timestamp, source and destination hosts, source port, and service (i.e., destination port) are the essential features because their combination uniquely identifies a connection record. We argue that the relevant association rules should describe patterns related to the essential features.

Table III. Shell Command Records

Time  Hostname  Command  arg1      arg2
am    pascal    mkdir    dir1
am    pascal    cd       dir1
am    pascal    vi       tex
am    pascal    tex      vi
am    pascal    mail     fredd
am    pascal    subject  progress
am    pascal    vi       tex
am    pascal    vi       tex
am    pascal    mail     williamf
am    pascal    subject  progress
...   ...       ...      ...
am    pascal    vi       tex
am    pascal    latex    tex
am    pascal    dvips    dvi       -o
...   ...       ...      ...       ...
am    pascal    logout

We call these essential feature(s) axis features when they are used as a form of item constraint, which specifies the conditions on the item sets of an association rule. We restrict the association rules algorithm to only output rules that include axis feature values. In practice, we need not designate all essential features as the axis features. For example, some network analysis tasks require statistics about various network services while others may require the patterns related to the destination hosts. Accordingly, we can use service as the axis feature to compute the association rules that describe the patterns related to the services of the connections, and use destination host as the axis feature to compute patterns related to hosts.

In the case of shell command records, we use command as the axis feature. Table IV shows some example association rules from the shell command data in Table III. Each of these association rules conveys information about the user’s behavior. The rules mined from each telnet/login session of the same user can be merged into an aggregate rule set to form the user’s normal profile. Section 5.1.8 details our experiments using association rules for anomaly detection.

3.3 Frequent Episodes

There is often the need to study the frequent sequential patterns of audit data in order to understand the temporal and statistical nature of many attacks as well as the normal behavior of users and programs. We use frequent episodes to represent the sequential audit record patterns.

Given a set of timestamped event records, where each record is a set of items, an interval [t1, t2] is the sequence of event records that starts from timestamp t1 and ends at t2. The width of the interval is defined as t2 − t1. Let X be a set of items; an interval is a minimal occurrence of X if it contains X and none of its proper subintervals contains X. Define support(X) as the ratio between the number of minimal occurrences that contain X and the total number of event records. A frequent episode rule is the expression [Mannila and Toivonen 1996]

X, Y → Z, [c, s, w].

X, Y, and Z are item sets, and together they form an episode. s = support(X ∪ Y ∪ Z) is the support of the rule, and c = support(X ∪ Y ∪ Z)/support(X ∪ Y) is the confidence. The width of each of the occurrences must be less than w.
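
The minimal-occurrence and support definitions can be illustrated with a brute-force sketch (ours, not the mining algorithm used in MADAM ID, which is far more efficient):

def contains(records, i, j, itemset):
    """True if every item of itemset appears in some record in records[i..j]."""
    present = set().union(*(r[1] for r in records[i:j + 1]))
    return itemset <= present

def minimal_occurrences(records, itemset):
    """records: list of (timestamp, set_of_items) ordered by timestamp.
    An interval is minimal if it contains the item set and none of its
    proper subintervals does."""
    intervals = [(i, j) for i in range(len(records))
                 for j in range(i, len(records)) if contains(records, i, j, itemset)]
    minimal = []
    for (i, j) in intervals:
        has_proper_sub = any(i <= a and b <= j and (a, b) != (i, j)
                             for (a, b) in intervals)
        if not has_proper_sub:
            minimal.append((records[i][0], records[j][0]))
    return minimal

def episode_support(records, itemset):
    """support(X): number of minimal occurrences of X divided by the number of records."""
    return len(minimal_occurrences(records, itemset)) / len(records)

# Connection-record item sets in the spirit of Table V.
records = [(1.1, {"http", "S0"}), (1.2, {"http", "S0"}), (10.1, {"ftp", "SF"}),
           (12.3, {"smtp", "SF"}), (13.4, {"telnet", "SF"})]
X = {"http", "S0"}
print(minimal_occurrences(records, X))   # [(1.1, 1.1), (1.2, 1.2)]
print(episode_support(records, X))       # 0.4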

Table IV. Example Association Rules from Shell Command Data Shown in Table III

Association Rule: command = vi → time = am, hostname = pascal, arg1 = tex, [1.0, 0.28]
Meaning: When using vi to edit a file, the user is always (i.e., 100% of the time) editing a tex file, in the morning, and at host pascal; and 28% of the command data matches this pattern.

Association Rule: command = subject → time = am, hostname = pascal, arg1 = progress, [1.0, 0.11]
Meaning: The subject of the user’s email is always (i.e., 100% of the time) about “progress”; such emails are in the morning and at host pascal; and 11% of the command data matches this pattern.


We introduced several extensions to the original frequent episodes algorithm. Our extended algorithm computes frequent sequential patterns in two phases: it finds the frequent associations using the axis feature(s) as previously described; then it generates the frequent serial patterns from these associations. Thus, our approach combines the associations among features and the sequential patterns among the records into a single rule.

Another interesting schema-level fact about audit records is that some essential features can be the references of other features. These reference features normally carry information about some “subject”, and other features describe the “actions” that refer to the same “subject”. For example, if we want to study the sequential patterns of connections to the same destination host, then dst_host is the “subject” and service is the action. In this case, we can designate dst_host as the reference feature. When forming an episode, our program tests the condition that, within the episode’s minimal occurrences, the event records covered by its constituent item sets have the same reference feature value.

4. FEATURE CONSTRUCTION

We use the mined frequent episodes, which also contain associations among the features, from audit records as guidelines to construct temporal statistical features for building classification models. This process involves first identifying the intrusion-only patterns, then parsing these patterns to define features accordingly. In this section, we use network connection data as an example to illustrate the feature construction process.

Raw tcpdump output is first summarized into network connection records using preprocessing programs, where each record has a set of intrinsic features. For example, the duration, service, src_host and dst_host (source and destination hosts), src_port (source port), src_bytes and dst_bytes (number of data bytes), a flag indicating normal or error status according to the protocols, and so on, are intrinsic features of a single connection. Table V shows examples of connection records.

Table V. Network Connection Records

Timestamp  Duration  Service  src_host   dst_host  src_bytes  dst_bytes  Flag  ...
1.1        0         http     spoofed_1  victim    0          0          S0    ...
1.1        0         http     spoofed_2  victim    0          0          S0    ...
1.1        0         http     spoofed_3  victim    0          0          S0    ...
1.1        0         http     spoofed_4  victim    0          0          S0    ...
1.1        0         http     spoofed_5  victim    0          0          S0    ...
1.1        0         http     spoofed_6  victim    0          0          S0    ...
1.1        0         http     spoofed_7  victim    0          0          S0    ...
...        ...       ...      ...        ...       ...        ...        ...   ...
10.1       2         ftp      A          B         200        300        SF    ...
12.3       1         smtp     B          D         250        300        SF    ...
13.4       60        telnet   A          D         200        12100      SF    ...
13.7       1         smtp     B          C         200        300        SF    ...
15.2       1         http     D          A         200        0          REJ   ...
...        ...       ...      ...        ...       ...        ...        ...   ...


Note that these “intrinsic” features are for general network analysis purposes, and not specific to intrusion detection.

4.1 Identifying the Intrusion Patterns

We apply the frequent episodes program to both the exhaustively gathered normal connection dataset and the dataset that contains an intrusion. We then compare the resulting patterns to find the intrusion-only patterns, that is, those that appear only in the intrusion dataset. The details of the pattern comparison algorithm are described in Lee et al. [1999b]. Briefly, since the number of patterns may be very large and there are rarely exactly matched patterns from two datasets, we used heuristic algorithms to automatically identify the intrusion-only patterns. The idea is to first convert patterns into numbers in such a way that “similar” patterns are mapped to “closer” numbers. Then pattern comparison and intrusion pattern identification are accomplished through comparing the numbers and rank ordering the results. We devised an encoding procedure that converts each pattern into a number, where the order of digit significance corresponds to the order of importance of the features. We used the following heuristic ordering on the importance of the features: flag, the axis feature, the reference feature, the rest of the essential attributes, and then the rest of the features in alphabetical order. flag is considered the most important in describing a pattern because it carries the summary information of the connection behavior with regard to the protocol specifications. Each unique feature value is mapped to a digit value in the encoding process. The “distance” of two patterns is then simply a number where each digit value is the digit-wise absolute difference between the two encodings. A comparison procedure computes the intrusion score for each pattern from the intrusion dataset, which is its lowest distance score against all patterns from the normal dataset, and outputs the user-specified top percentage of patterns that have the highest intrusion scores as the intrusion-only patterns.
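
The encoding and comparison idea can be sketched as follows (our simplification of the procedure described in Lee et al. [1999b]; the feature order, digit assignments, and patterns below are illustrative assumptions):

def encode(pattern, feature_order, value_digits):
    """Map a pattern (dict of feature -> value) to a list of digits, most significant first,
    following the heuristic feature-importance order."""
    return [value_digits[f].get(pattern.get(f), 0) for f in feature_order]

def distance(digits_a, digits_b):
    """Digit-wise absolute difference between two encodings."""
    return [abs(a - b) for a, b in zip(digits_a, digits_b)]

def intrusion_scores(intrusion_patterns, normal_patterns, feature_order, value_digits):
    """Score each intrusion-dataset pattern by its lowest distance to any normal-dataset
    pattern; digit lists are compared lexicographically, which is equivalent to treating
    them as one number. Highest scores come first."""
    normal_enc = [encode(p, feature_order, value_digits) for p in normal_patterns]
    scores = [(min(distance(encode(p, feature_order, value_digits), ne) for ne in normal_enc), p)
              for p in intrusion_patterns]
    return sorted(scores, key=lambda s: s[0], reverse=True)

feature_order = ["flag", "service", "dst_host"]          # flag is most significant
value_digits = {"flag": {"SF": 1, "S0": 2, "REJ": 3},
                "service": {"http": 1, "smtp": 2},
                "dst_host": {"victim": 1, "B": 2}}
normal = [{"flag": "SF", "service": "http", "dst_host": "B"},
          {"flag": "SF", "service": "smtp", "dst_host": "B"}]
intrusion = [{"flag": "S0", "service": "http", "dst_host": "victim"},
             {"flag": "SF", "service": "http", "dst_host": "B"}]
for score, p in intrusion_scores(intrusion, normal, feature_order, value_digits):
    print(score, p)   # the SYN-flood-like pattern gets the highest score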

As an example, consider the SYN flood attack records shown in Table V. The attacker used many spoofed source addresses to send a lot of S0 connections (i.e., only the first SYN packet is sent) to a port (e.g., http) of the victim host in a very short time span (e.g., all in timestamp 1.1). Table VI shows one of the top intrusion-only patterns, produced using service as the axis feature and dst_host as the reference feature.

Table VI. Example Intrusion Pattern

Frequent Episode:
(flag = S0, service = http, dst_host = victim),
(flag = S0, service = http, dst_host = victim) →
(flag = S0, service = http, dst_host = victim)   [0.93, 0.03, 2]

Meaning: 93% of the time, after two http connections with S0 flag are made to host victim, within 2 seconds from the first of these two, the third similar connection is made, and this pattern occurs in 3% of the data.


4.2 Constructing Features from Intrusion Patterns

Each of the intrusion patterns is used as a guideline for adding additional features into the connection records to build better classification models. We use the following automatic procedure for parsing a frequent episode and constructing features:

—Assume F0 (e.g., dst_host) is used as the reference feature, and the width of the episode is w seconds.

—Add the following features that examine only the connections in the past w seconds that share the same value in F0 as the current connection:
  —A feature that computes “the count of these connections”;
  —Let F1 be service, src_dst, or dst_host other than F0 (i.e., F1 is an essential feature). If the same F1 value (e.g., http) is in all the item sets of the episode, add a feature that computes “the percentage of connections that share the same F1 value as the current connection”; otherwise, add a feature that computes “the percentage of different values of F1”;
  —Let V2 be a value (e.g., S0) of a feature F2 (e.g., flag) other than F0 and F1 (i.e., V2 is a value of a nonessential feature). If V2 is in all the item sets of the episode, add a feature that computes “the percentage of connections that have the same V2”; otherwise, if F2 is a numerical feature, add a feature that computes “the average of the F2 values.”

This procedure parses a frequent episode and uses three operators, count, percent, and average, to construct statistical features. These features are also temporal since they measure only the connections that are within a time window w and share the same reference feature value. The intuition behind the feature construction algorithm comes from the straightforward interpretation of a frequent episode. For example, if the same feature value appears in all the item sets of an episode, then there is a large percentage of records that have the same value. We treat the essential and nonessential features differently. The essential features describe the anatomy of an intrusion, for example, “the same service (i.e., port) is targeted.” The actual values (e.g., http) are often not important because the same attack method can be applied to different targets (e.g., ftp). On the other hand, the actual nonessential feature values (e.g., flag = S0) often indicate the invariant of an intrusion because they summarize the connection behavior according to the network protocols.

The SYN flood pattern shown in Table VI results in the following additional features: a count of connections to the same dst_host in the past 2 seconds, and among these connections, the percentage of those that have the same service and the percentage of those that have the “S0” flag.
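
A sketch of how these three features could be computed over a list of connection records (our illustration; the field names follow Tables V and VIII, and counting the current connection in its own window is an assumption):

WINDOW = 2.0  # seconds, the width w of the SYN flood episode in Table VI

def add_syn_flood_features(connections):
    """For each connection, over the connections in the past WINDOW seconds with the
    same dst_host, compute the count, the % with the same service, and the % with
    the S0 flag. connections: list of dicts ordered by timestamp."""
    for i, c in enumerate(connections):
        window = [o for o in connections[:i + 1]
                  if o["dst_host"] == c["dst_host"]
                  and c["timestamp"] - o["timestamp"] <= WINDOW]
        n = len(window)
        c["count"] = n
        c["same_srv_%"] = sum(o["service"] == c["service"] for o in window) / n
        c["serror_%"] = sum(o["flag"] == "S0" for o in window) / n
    return connections

# A few records in the spirit of Table V: a burst of spoofed S0 http connections.
conns = [{"timestamp": 1.1, "service": "http", "dst_host": "victim", "flag": "S0"}
         for _ in range(3)]
conns.append({"timestamp": 10.1, "service": "ftp", "dst_host": "B", "flag": "SF"})
for c in add_syn_flood_features(conns):
    print(c["count"], c["same_srv_%"], c["serror_%"])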

4.3 Discussions

We examined the theoretical underpinnings of the feature construction process in Lee [1999]. We outline the results here and explain why the features constructed from the intrusion patterns can be utilized to build more accurate classification models. First, the intrusion-only patterns are the results of intrusion records. That is, the “intrusion” dataset must contain “intrusion records,” i.e., unique records, unique sequences of records, or records or sequences with unique frequencies, in order for it to have intrusion-only patterns. Second, for the temporal and statistical features constructed from the intrusion patterns, their values in the intrusion records that are responsible for the intrusion-only patterns will be very different from the feature values in the normal connection records. For example, consider the feature constructed from the SYN flood pattern: among the connections to the same destination host in the past 2 seconds, the percentage of those that have the S0 flag. Normal records have values close to 0 for this feature, but SYN flood records have values greater than 80%. The constructed features have high information gain because their value ranges can separate intrusion records from the normal records. In fact, they normally have higher information gain than the existing set of features. For example, the feature “flag” has very low information gain because some normal connections also have an “S0” value. As discussed in Section 3.1, a classification algorithm needs to select features with the highest information gain when computing a classification model. Therefore, when the features constructed from the intrusion patterns are added to the audit data, a more accurate classification model can be computed. This is precisely the purpose of our feature construction process.

An open problem is how to decide the right time window value w. We mine sequential patterns using different w values, for example, from 0.1 to 20 with an increment of 1, and plot the number of patterns generated at each run. Our experience shows that this plot tends to stabilize after the initial sharp jump. We call the smallest w in the stable region w0. In Lee and Stolfo [1998], we reported experiments using different w values to calculate temporal statistical features for classification models. Our results showed that the plot of classifier accuracy also stabilizes after w ≥ w0 and tends to taper off. Intuitively, a requirement for a good window size is that its set of sequential patterns is stable; that is, sufficient patterns are captured and the noise is small. We therefore use w0 for adding temporal statistical features.
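
A sketch of this heuristic (ours; the mining routine is a stand-in and the 10% “stability” threshold is an arbitrary illustrative choice, not a value from the article):

def choose_window(records, mine_patterns, candidates):
    """Pick w0: mine patterns at each candidate window size and return the smallest w
    at which the pattern count stops growing sharply (the start of the stable region).
    mine_patterns(records, w) is assumed to return the set of mined patterns."""
    counts = [(w, len(mine_patterns(records, w))) for w in candidates]
    for (w, c), (_, c_next) in zip(counts, counts[1:]):
        if c > 0 and (c_next - c) / c <= 0.10:   # few new patterns at the next step
            return w
    return candidates[-1]

# Toy mining function whose pattern count plateaus once w covers the bursts.
fake_counts = {1: 3, 2: 10, 3: 18, 4: 19, 5: 19, 6: 20}
mine = lambda records, w: set(range(fake_counts[w]))
print(choose_window([], mine, [1, 2, 3, 4, 5, 6]))   # 3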

5. EXPERIMENTS

In this section, we describe our experiments in building intrusion detection models on the audit data from the 1998 DARPA Intrusion Detection Evaluation Program. In these experiments, we applied the algorithms and tools of MADAM ID to process audit data, mine patterns, construct features, and build RIPPER classifiers.

We first describe the experiments on tcpdump data. The results of these experiments were submitted to DARPA and were evaluated by MIT Lincoln Lab. We then report recent experiments on BSM data, which were performed after the DARPA evaluation. We discuss our experiences and evaluate the strengths and weaknesses of MADAM ID.

5.1 Experiments on tcpdump Data

We participated in the DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Lab [Lippmann et al. 2000]. The objective of this study was to survey and evaluate the state of the art in intrusion detection research. A standard set of extensively gathered audit data, which includes a wide variety of intrusions simulated in a military network environment, was provided by DARPA. Each participating site was required to build intrusion detection models or tweak their existing system parameters using the training data, and send the results (i.e., detected intrusions on the test data) back to DARPA for performance evaluation. We report our experience here.

5.1.1 The DARPA Data. We were provided with about 4 gigabytes of compressed tcpdump data of 7 weeks of network traffic. This data can be processed into about 5 million connection records of about 100 bytes each. The data contains the content (i.e., the data portion) of every packet transmitted between hosts inside and outside a simulated military base. BSM audit data from one UNIX Solaris host for some network sessions were also provided.

The data contains four main categories of attacks:

—DoS, for example, Ping-of-Death, Teardrop, smurf, SYN flood, and so on;

—R2L, unauthorized access from a remote machine, for example, guessing password;

—U2R, unauthorized access to local superuser privileges by a local unprivileged user, for example, various buffer overflow attacks; and

—PROBING, surveillance and probing, for example, Port-Scan, Ping-Sweep, and the like.

In addition, there were anomalous user behaviors such as “a manager becomes (i.e., behaves like) a system administrator.”

5.1.2 Data Preprocessing. We used Bro as the packet filtering and reassembling engine. We extended Bro to handle ICMP packets, and made changes to its packet fragment inspection modules since it crashed when processing data that contains Teardrop or Ping-of-Death attacks.

We used a Bro “connection finished” event handler to output a summarized record for each connection. Each connection record included a set of “intrinsic” features shown in Table VII.

Table VII. Intrinsic Features of Network Connection Records

Feature         Description                                                    Value Type
duration        Length (number of seconds) of the connection                   Continuous
protocol_type   Type of the protocol, e.g., TCP, UDP, etc.                     Discrete
service         Network service on the destination, e.g., http, telnet, etc.   Discrete
src_bytes       Number of data bytes from source to destination                Continuous
dst_bytes       Number of data bytes from destination to source                Continuous
flag            Normal or error status of the connection                       Discrete
land            1 - connection is from/to the same host/port; 0 - otherwise    Discrete
wrong_fragment  Number of “wrong” fragments                                    Continuous
urgent          Number of urgent packets                                       Continuous

5.1.3 Misuse Detection. The training data from DARPA includes “list files” that identify the timestamp, source host and port, destination host and port, and the name of each attack. We used this information to select intrusion data to perform pattern mining and feature construction, and to label each connection record with “normal” or an attack type to create training data for building classification models.

Since the amount of audit data is huge, for example, some days have several million connection records due to some nasty DoS attacks, we did not aggregate all the connection records into a single training data set. Instead, we extracted all the connection records that fell within a surrounding time window of plus and minus 5 minutes of the whole duration of each attack to create a data set for each attack type. We also extracted sequences of normal connection records to create the normal data set that has the same distribution as the original data set.

5.1.4 Manual and Automatic Feature Construction. Following the feature construction approach described in Section 4, for each attack type (e.g., SYN flood, Port-Scan, etc.) we performed pattern mining and comparison using its intrusion data set and the normal data set. We constructed features according to the top 20% intrusion-only patterns of each attack type. Here we summarize the temporal and statistical features automatically constructed by our system:

—the “same host” features that examine only the connections in the past 2 seconds that have the same destination host as the current connection:
  —the count of such connections, the percentage of connections that have the same service as the current one, the percentage of different services, the percentage of SYN errors, and the percentage of REJ (i.e., rejected connection) errors;
—the “same service” features that examine only the connections in the past 2 seconds that have the same service as the current connection:
  —the count of such connections, the percentage of different destination hosts, the percentage of SYN errors, and the percentage of REJ errors.

We call these the “time-based traffic” features for connection records. They are summarized in Table VIII.

Table VIII. Traffic Features of Network Connection Records

Feature          Description                                                Value Type
count            Number of connections to the same host as the current      Continuous
                 connection in the past 2 seconds
(the following features refer to these same-host connections)
serror_%         % of connections that have “SYN” errors                    Continuous
rerror_%         % of connections that have “REJ” errors                    Continuous
same_srv_%       % of connections to the same service                       Continuous
diff_srv_%       % of connections to different services                     Continuous
srv_count        Number of connections to the same service as the current   Continuous
                 connection in the past 2 seconds
(the following features refer to these same-service connections)
srv_serror_%     % of connections that have “SYN” errors                    Continuous
srv_rerror_%     % of connections that have “REJ” errors                    Continuous
srv_diff_host_%  % of connections to different hosts                        Continuous

There are several “slow” PROBING attacks that scan the hosts (or ports) using a much larger time interval than 2 seconds, for example, one in every minute or even one in every (few) hour(s). As a result, these attacks did not produce intrusion-only patterns with the time window of 2 seconds. We sorted the connection records by the destination hosts, and applied the same pattern mining and feature construction process. Instead of using a time window of 2 seconds, we now used a “connection” window of 100 connections, and constructed a mirror set of “host-based traffic” features as the time-based traffic features.

We discovered that unlike most of the DoS and PROBING attacks, the R2L and U2R attacks don’t have any intrusion-only frequent patterns. This is because most of the DoS and PROBING attacks involve sending a lot of connections to some host(s) in a very short period of time, and therefore can have frequent sequential patterns that are different from the normal traffic. The R2L and U2R attacks are embedded in the data portions of the packets and normally involve only a single connection. Therefore, it is unlikely that they can have any unique frequent traffic patterns. In other words, our automatic feature construction process, which is based on frequent patterns of connection records, would fail to produce any features for these attacks.

After studying the outcome of this mining process, we focused our attention on the content of the connections. Ideally, we should apply data mining programs to compute patterns from the connection content data and construct appropriate features for R2L and U2R attacks. However, our current data mining algorithms cannot deal with unstructured data contents of IP packets. We instead relied on domain knowledge to define suitable features for R2L and U2R attacks. In the Bro event handlers, we added functions that inspect data exchanges of interactive TCP connections (e.g., telnet, ftp, smtp, etc.). These functions assign values to a set of “content” features to indicate whether the data contents suggest suspicious behavior. These features are: number of failed logins, successfully logged in or not, whether logged in as root, whether a root shell is obtained, whether a su command is attempted and succeeded, number of accesses to access control files (e.g., “/etc/passwd”, “.rhosts”, etc.), number of compromised states on the destination host (e.g., file/path “not found” errors, “Jump to” instructions, etc.), number of hot indicators (e.g., access to system directories, creation and execution of programs, etc.), and number of outbound connections during an ftp session. These features are summarized in Table IX. Our approach here is to include an extensive set of indicators, and then let classification programs decide, from the vast amount of audit data, which minimal set of discriminating features should actually be used to identify intrusions.

5.1.5 Detection Models. It is evident from the feature construction process that different categories of intrusions require different sets of constructed features for detection purposes. We therefore built classification models using different feature sets:

—The “time-based traffic” model: each connection record contains the “intrinsic” and the “time-based traffic” features. Table X shows some example labeled connection records. The resultant RIPPER classifier detects the DoS and PROBING attacks. Table XI shows some example RIPPER rules.

—The “host-based traffic” model: each connection record contains the “intrinsic” and the “host-based traffic” features. The resultant RIPPER classifiers detect the slow PROBING attacks.

Table IX. Content Features of Network Connection Records

Feature         Description                                                          Value Type
hot             Number of “hot indicators”                                           Continuous
failed_logins   Number of failed login attempts                                      Continuous
logged_in       1 - successfully logged in; 0 - otherwise                            Discrete
compromised     Number of “compromised” conditions                                   Continuous
root_shell      1 - root shell is obtained; 0 - otherwise                            Discrete
su              1 - “su root” command attempted; 0 - otherwise                       Discrete
file_creations  Number of file creation operations                                   Continuous
shells          Number of shell prompts                                              Continuous
access_files    Number of write, delete, and create operations on access             Continuous
                control files
outbound_cmds   Number of outbound commands in an ftp session                        Continuous
hot_login       1 - the login belongs to the “hot” list (e.g., root, adm, etc.);     Discrete
                0 - otherwise
guest_login     1 - the login is a “guest” login (e.g., guest, anonymous, etc.);     Discrete
                0 - otherwise

Table X. Example “Traffic” Connection Records

Label    Service     Flag  Count  srv_count  rerror_%  diff_srv_%  ...
normal   ecr_i       SF    1      1          0         1           ...
smurf    ecr_i       SF    350    350        0         0           ...
satan    user-level  REJ   231    1          85%       89%         ...
normal   http        SF    1      0          0         1           ...
...      ...         ...   ...    ...        ...       ...         ...


—The “content” model: each connection record contains the “intrinsic” and the “content” features. Table XII shows some example labeled connection records. The resultant RIPPER classifier detects the R2L and U2R attacks. Table XIII shows some example RIPPER rules.

These classification models each specialize in a certain type of intrusion. We then constructed a meta-level classifier to combine these detection models. Each meta-level training record consists of four features: the three predictions (one from each base model) plus the true class label (i.e., “normal” or an attack type). RIPPER was then applied to learn the rules that combine the evidence from the “time-based traffic,” “host-based traffic,” and “content” classifiers to make a (final) prediction on a connection. The resulting meta-level rules basically use the predictions from the “content” model to detect R2L and U2R attacks, and the combination of the “time-based traffic” and “host-based traffic” models to detect the DoS and (fast and slow) PROBING attacks. That is, the meta-classifier predicts a connection as an R2L or U2R attack whenever the “content” model does so; and as a DoS or PROBING attack whenever the “time-based traffic” model does so, or whenever the “time-based traffic” model predicts “normal” but the “host-based traffic” model predicts a PROBING attack.
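The learned combination logic can be paraphrased in code roughly as follows; the label sets below are illustrative placeholders, and the actual meta-classifier is a RIPPER ruleset rather than a hand-written function.

R2L_U2R = {"guess", "overflow"}    # example labels; the real sets cover all R2L/U2R attacks
DOS = {"smurf", "pod"}             # example DoS labels
PROBING = {"satan", "portsweep"}   # example PROBING labels

def meta_predict(traffic_pred, host_traffic_pred, content_pred):
    # R2L and U2R verdicts come from the "content" model
    if content_pred in R2L_U2R:
        return content_pred
    # DoS and fast PROBING verdicts come from the "time-based traffic" model
    if traffic_pred in DOS or traffic_pred in PROBING:
        return traffic_pred
    # slow PROBING: the traffic model says "normal" but the host-based model disagrees
    if traffic_pred == "normal" and host_traffic_pred in PROBING:
        return host_traffic_pred
    return "normal"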

Table XIV summarizes the complexity of the base models in terms of the number of features in a connection record, the number of RIPPER rules produced, and the number of distinct features actually used in the rules. The second number in the “# of Features Used in Rules” column (e.g., 9 for the “traffic” model) is the count of automatically constructed temporal and statistical features used in the RIPPER rules. We see that, for both the “traffic” and “host-based traffic” models, our feature construction process contributed the majority of the features. We should point out that not all features in the connection records were selected by RIPPER. This is because RIPPER, like most classification algorithms, has a built-in “feature selection” process that selects the most discriminating and generalizable features according to their statistical significance, that is, information gain, and their performance on a hold-out test dataset that simulates “unseen/future” data. Because of the large amount of audit data, a human expert is not able to manually gather and test various statistics, and thus tends to do a poor job in selecting the features. As a result, hand-crafted “signature” rules tend to be very specific to a small intrusion data set. Alternative classification algorithms that compute underlying probability distributions may indeed require all features to be evaluated in their resultant models. A crucial issue here is the tradeoff between model accuracy and model cost. The RIPPER output indicates that some features are irrelevant and hence need not be computed at run-time, thus reducing the cost of detection. This is the subject matter of our ongoing research.
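For reference, the information-gain criterion mentioned here can be computed as in the following sketch; this is the textbook definition, not RIPPER’s internal implementation, and the record/label layout is our own.

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(records, labels, feature):
    # reduction in label entropy obtained by splitting the records on `feature`
    splits = {}
    for rec, label in zip(records, labels):
        splits.setdefault(rec[feature], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in splits.values())
    return entropy(labels) - remainder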

Table XI. Example RIPPER Rules for DoS and PROBING Attacks

RIPPER Rule: smurf :- count >= 5, srv_count >= 5, service = ecr_i.
Meaning: If the service is ICMP echo request, and for the past 2 seconds, the number of connections that have the same destination host as the current one is at least 5, and the number of connections that have the same service as the current one is at least 5, then this is a smurf attack (a DoS attack).

RIPPER Rule: satan :- rerror_% >= 83%, diff_srv_% >= 87%.
Meaning: If, for the connections in the past 2 seconds that have the same destination host as the current connection, the percentage of rejected connections is at least 83%, and the percentage of different services is at least 87%, then this is a satan attack (a PROBING attack).

Table XII. Example TCP Connection Records

label     service  flag  hot  failed_logins  compromised  root_shell  su   ...
normal    ftp      SF    0    0              0            0           0    ...
normal    telnet   SF    0    0              0            3           1    ...
guess     telnet   SF    0    6              0            0           0    ...
normal    telnet   SF    0    0              0            0           0    ...
overflow  telnet   SF    3    0              2            1           0    ...
normal    rlogin   SF    0    0              0            0           0    ...
guess     telnet   SF    0    5              0            0           0    ...
overflow  telnet   SF    3    0              2            1           0    ...
normal    telnet   SF    0    0              0            0           0    ...
...       ...      ...   ...  ...            ...          ...         ...  ...



5.1.6 Results. We report the performance of our detection models as evaluated by MIT Lincoln Lab. We trained our intrusion detection models (i.e., the base models and the meta-level classifier) using the 7 weeks of labeled data, and used them to make predictions on the 2 weeks of unlabeled test data (i.e., we were not told which connection was an attack). The test data contained a total of 38 attack types, with 14 types appearing in the test data only (i.e., our models were not trained with instances of these attack types; hence, these were considered “new” attack types).

Table XIII. Example RIPPER Rules for R2L and U2R Attacks

RIPPER Rule: guess :- failed_logins >= 4.
Meaning: If the number of failed logins is at least 4, then this telnet connection is “guess,” a password-guessing attack.

RIPPER Rule: overflow :- hot >= 3, compromised >= 2, root_shell = 1.
Meaning: If the number of hot indicators is at least 3, the number of compromised conditions is at least 2, and a root shell is obtained, then this telnet connection is a buffer overflow attack.

...

RIPPER Rule: normal :- true.
Meaning: If none of the above, then this connection is “normal.”

Table XIV. Model Complexities

Model         # of Features in Records   # of Rules   # of Features Used in Rules
content       22                         55           11
traffic       20                         26           4 + 9
host traffic  14                         8            1 + 5


Figure 2 shows the ROC curves of the detection models by attack categories as well as on all intrusions. In each of these ROC plots, the x-axis is the false alarm rate, calculated as the percentage of normal connections classified as an intrusion; the y-axis is the detection rate, calculated as the percentage of intrusions detected. A data point in the upper left corner corresponds to optimal performance, that is, a high detection rate with a low false alarm rate. We compare here our models with those of other participants (denoted as Groups 1 through 3) in the DARPA evaluation program (see the report by Lippmann et al. [2000]).
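The two quantities plotted in Figure 2 can be computed from labeled predictions as in the short sketch below; values are returned as fractions rather than percentages, and the label conventions are ours.

def roc_point(true_labels, predictions):
    # a connection is "detected"/"alarmed" if it is predicted as anything but normal
    normals = [p for t, p in zip(true_labels, predictions) if t == "normal"]
    attacks = [p for t, p in zip(true_labels, predictions) if t != "normal"]
    false_alarm_rate = sum(p != "normal" for p in normals) / len(normals)
    detection_rate = sum(p != "normal" for p in attacks) / len(attacks)
    return false_alarm_rate, detection_rate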

Although our models were intended for misuse detection, we had hoped that the features we constructed would be general enough that the models could detect new variations of the known intrusions. Table XV compares the detection rates on old intrusions and new intrusions. Here, new intrusions refer to those that did not have corresponding instances in the training data. We see that our models were able to detect a large percentage of new PROBING and U2R attacks, but were not as effective for new DoS and R2L attacks.

[Figure 2, omitted here, plots detection rate (y-axis) against false alarm rate (x-axis) for the Columbia models and Groups 1 through 3, in four panels: (a) DoS, (b) PROBING, (c) U2R and R2L, and (d) overall.]

Fig. 2. Performance of tcpdump misuse detection models: ROC curves on detection rates and false alarm rates.


5.1.7 Discussion. PROBING attacks have relatively limited variance because they all involve making connections to a large number of hosts or ports in a given timeframe. Likewise, the outcome of all U2R attacks is that a root shell is obtained without legitimate means (e.g., login as root, su to root, etc.). Thus, for these two categories of attacks, given some representative instances in the training data, our data mining system was able to construct features that captured their general behavior patterns. As a result, our detection models can detect a high percentage of old and new PROBING and U2R attacks. On the other hand, DoS and R2L attacks have a wide variety of behavior because they exploit the weaknesses of a large number of different network or system services. The features constructed based on the available attack instances were very specialized to the known attack types. Our detection models therefore missed a large number of new DoS and R2L attacks.

The results here are not entirely surprising since our models are misuse detection models. We need to use anomaly detection models on network traffic or system programs to guard against new and diversified attacks. Anomaly detection is much more challenging than misuse detection. For example, we need to first decide whether we should build a normal profile for each network service or group of services, and for each host or group of hosts. The feature construction process will likely be more complex since, unlike the relatively small number of intrusion-only patterns, normal network traffic can have a large number of variations. Network anomaly detection is an important problem and an active area of research that we are pursuing.

5.1.8 User Anomaly Detection. Thus far we have discussed only the detection of attacks from outside the network. “Insiders” misusing their privileges can also seriously compromise security. These insider attacks are hard to detect because the insiders don’t need to break in. The goal of user anomaly detection is to determine whether the behavior of a user is normal (i.e., legitimate).

It is often very difficult to classify a single event by a user as normal or abnormal because of the unpredictable nature of most people. A user’s actions during a login session need to be studied as a whole to determine whether he or she is behaving normally. We used Bro event handlers to examine the telnet sessions and extract the shell commands of the users. We further preprocessed the shell commands by replacing timestamps with am, pm, and nt (for night), eliminating the input (i.e., contents) of edit and sendmail commands, and keeping only the filename extensions. Table III shows examples of the processed command data. These shell command records were used for user anomaly detection.

Table XV. Comparing Detection Rates (in %) on Old and New Attacks

Category  Old   New
DoS       79.9  24.3
PROBING   97.0  96.7
U2R       75.0  81.8
R2L       60.0  5.9
Overall   80.2  37.7



Our initial exploratory approach was to mine the frequent patterns from the command data, and merge or add the patterns into an aggregate set to form the normal usage profile of a user. A new pattern can be merged with an old pattern if they have the same left- and right-hand sides, their support values are within 5% of each other, and their confidence values are also within 5% of each other.

To analyze a user login session, we mine the frequent patterns from the sequence of commands during this session. This new pattern set is compared with the profile pattern set and a similarity score is assigned. Assume that the new set has n patterns and that, among them, m patterns have “matches” (i.e., rules that they can be merged with) in the profile pattern set; then the similarity score is simply m / n. Obviously, a higher similarity score means a higher likelihood that the user’s behavior agrees with his or her historical profile.

The DARPA data also included user anomaly data to evaluate anomaly detection systems. Table XVI describes the consistent behavior of the six users for anomaly analysis. Note that since we were the only group that performed anomaly detection on the test data, Lincoln Lab did not evaluate our results. We report our experiments on the training data here.

We applied our frequent episode algorithms to the command data from each login session of the same user, with command as the axis feature and w = 5 (i.e., we looked for patterns within the range of five consecutive commands), to mine the frequent sequential patterns on the associations among user commands, their arguments, time segments, and hosts. We used the first 4 weeks as a data-gathering period, during which we simply merged the patterns into each user’s profiles. Each user has three profiles, one for the activities of each time segment: am, pm, and nt. We used the fifth week as the training period, during which we compared the patterns from each session to the profile of the corresponding time segment and recorded the normal range of the similarity scores. The data in the sixth week had some user anomalies, as described in Table XVII. For each of the anomalous sessions, we compared its patterns against the original user’s profile, and then compared the resulting similarity score against the recorded normal range of the same time segment. In Table XVIII, the column labeled “Normal” is the range of similarity of each user against his or her own profile as recorded during the fifth week. An “n/a” entry means that the user did not log in during that time segment in the fifth week. The column “Anomaly” is the similarity measure of the anomalous session described in Table XVII. We see that all anomalous sessions can be clearly detected since their similarity scores are much smaller than the normal range. For example, the second sysadm row in Table XVIII shows that, when the sysadm becomes a programmer, his or her patterns have 0 matches with the sysadm’s profile, while, for the whole fifth week, the pm similarity scores are in the range of 0.64 to 0.95.
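A minimal sketch of this profiling and scoring scheme is given below. The pattern representation (left- and right-hand sides plus support and confidence values), the interpretation of “within 5%” as an absolute difference of at most 0.05, and the dictionary layouts are all our assumptions.

def can_merge(p, q, tol=0.05):
    # same left- and right-hand sides, support and confidence within tolerance
    return (p["lhs"] == q["lhs"] and p["rhs"] == q["rhs"]
            and abs(p["support"] - q["support"]) <= tol
            and abs(p["confidence"] - q["confidence"]) <= tol)

def similarity(session_patterns, profile_patterns):
    # m / n: fraction of the session's patterns that have a match in the profile
    if not session_patterns:
        return 0.0
    m = sum(any(can_merge(p, q) for q in profile_patterns) for p in session_patterns)
    return m / len(session_patterns)

def build_profiles(sessions):
    # weeks 1-4: aggregate mined patterns into one profile per (user, time segment)
    profiles = {}
    for s in sessions:
        profile = profiles.setdefault((s["user"], s["segment"]), [])
        for p in s["patterns"]:
            if not any(can_merge(p, q) for q in profile):
                profile.append(p)
    return profiles

def is_anomalous(session, profiles, normal_ranges):
    # week 5 establishes normal_ranges[(user, segment)] = (low, high);
    # a later session is flagged if its score falls below that range
    key = (session["user"], session["segment"])
    score = similarity(session["patterns"], profiles.get(key, []))
    low, _high = normal_ranges[key]
    return score < low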

Table XVI. User Descriptions

User         Normal Activities
sysadm       Logs in as root, cats the password file, and runs commands such as top.
programmer1  Writes public domain C code, uses a vi editor, compiles the C code, reads and sends mail, and executes UNIX commands.
programmer2  A similar user profile, but works in afternoons and evenings.
secretary    Edits Latex files, runs Latex, reads and sends mail.
manager1     Reads and sends mail.
manager2     Reads mail.



Once abnormal user behavior is observed, we need to investigate the nature of the anomaly. We report our experiments on finding out how “user job functions” are violated. The problem can be stated as follows: Assume that there are n possible groups of users according to their job functions. When a user in group i does not behave according to the group profile (i.e., the normal job functions), we want to identify which group (e.g., group j) the user has become. That is, we want to know what “illegal” job functions the user has performed.

In our experiments, we first built group profiles for the p group (the programmers), the s group (the secretary), the m group (the managers), and the sa group (the sysadm). From the data of the first 4 weeks, the patterns of all the users of a group were aggregated to form the group profile. The data of the fifth week was used to establish the range of similarity measures for all the users of each group. The user anomalies described in Table XVII include three “illegal job function” cases during the sixth week: programmer1 becomes a secretary, the secretary becomes a manager, and the sysadm becomes a programmer. Table XIX compares the similarity measure of each user in an anomalous session with his or her normal similarity range gathered for the same time segment. From the normal similarity measures of each user with respect to the four groups, we can see that each user indeed has the largest similarity measure with his or her own group. The similarity measures of the user anomalies show that, for each “illegal job function” case, the similarity measure with the targeted group is the largest. For example, when the sysadm becomes a programmer, the similarity measure with the p group, 0.24, is the largest, and the similarity measure with the sa group is outside the normal range.

Table XVII. User Anomaly Description

User         Anomaly Description
programmer2  Logs in from beta
secretary    Logs in at night
sysadm       Logs in from jupiter
programmer1  Becomes a secretary
secretary    Becomes a manager
programmer1  Logs in at night
sysadm       Becomes a programmer
manager1     Becomes a sysadm
manager2     Logs in from pluto



In summary, although formal evaluation statistics are not available to determine the error rates of our approach in user anomaly detection, the initial results are encouraging. We believe that our approach is worthy of future study.

5.2 Experiments on BSM Data

The DARPA data also contains Solaris BSM (Basic Security Module) [SunSoft 1995] audit data for a designated host, pascal. In this section, we describe our experiments in building host-based intrusion detection models using BSM data. The purpose of these experiments was to show that our algorithms for pattern mining and feature construction are not specific to a particular audit data source, for example, tcpdump. We also wanted to investigate whether combining models from tcpdump and BSM data can result in better detection performance.

When BSM is enabled on a host machine, there exists an audit trail for the host. An audit trail is a time-ordered sequence of actions that are audited on the system, and consists of one or more audit files. Each audit record in an audit file describes a single audit event, which can be a kernel event (i.e., a system call) or a user-level event (i.e., the invocation of a system program, such as inetd, in.rshd, etc.).

Table XVIII. Similarity Against User’s Own Profile: in Normal Use and in the Anomaly Described in Table XVII

User         Normal        Anomaly
programmer2  (0.58, 0.79)  0.00
secretary    (n/a, n/a)    0.00
sysadm       (0.84, 0.95)  0.00
programmer1  (0.31, 1.00)  0.04
secretary    (0.41, 0.98)  0.17
programmer1  (n/a, n/a)    0.00
sysadm       (0.64, 0.95)  0.00
manager1     (0.57, 1.00)  0.00
manager2     (1.00, 1.00)  0.00

Table XIX. Similarity Against Group Profiles: in Normal Use and in the Anomaly Described in Table XVII

      Normal                                                    Anomaly
User  P             S             M             SA              P     S     M     SA
p2    (0.33, 0.71)  (0.04, 0.11)  (0.00, 0.03)  (0.00, 0.07)    0.00  0.00  0.00  0.00
s     (n/a, n/a)    (n/a, n/a)    (n/a, n/a)    (n/a, n/a)      0.04  0.00  0.00  0.00
sa    (0.00, 0.00)  (0.00, 0.00)  (0.00, 0.00)  (0.51, 0.81)    0.00  0.00  0.00  0.00
p1    (0.12, 0.57)  (0.06, 0.09)  (0.00, 0.00)  (0.04, 0.14)    0.04  0.11  0.00  0.00
s     (0.02, 0.18)  (0.08, 0.73)  (0.00, 0.00)  (0.00, 0.00)    0.17  0.00  0.50  0.00
p1    (n/a, n/a)    (n/a, n/a)    (n/a, n/a)    (n/a, n/a)      0.27  0.00  0.00  0.00
sa    (0.00, 0.00)  (0.00, 0.00)  (0.00, 0.00)  (0.51, 0.81)    0.24  0.03  0.00  0.00
m1    (0.14, 0.17)  (0.00, 0.00)  (0.29, 1.00)  (0.00, 0.00)    0.02  0.00  0.00  0.61
m2    (0.50, 1.00)  (0.00, 0.00)  (1.00, 1.00)  (0.00, 0.00)    0.00  0.00  0.00  0.00



We define an audit session here as the collection of all audit events of an “incoming” or “outgoing” session on the host. Examples of these host sessions include login (e.g., terminal login, telnet login, rlogin, etc.), rsh, ftp, sendmail, and so on. It is easy to see that each host session often corresponds to a network connection. We can therefore correlate the predictions on host sessions by a host-based intrusion detection model with the predictions on the corresponding connections by a network intrusion detection model, to yield a higher accuracy.

As in the case of building network intrusion detection models, we also need to first perform a sequence of data preprocessing tasks on the raw BSM data. We extended the preprocessor component of USTAT [Ilgun 1992] to process the binary BSM data into ASCII event data. Table XX shows examples of the event records. Here a “?” means the value is not given in the original BSM audit record. Each event record contains a number of basic features, defined in Table XXI.

We developed a program to process the event data into session records. A brief description of the procedure follows (a Python sketch of the key steps is given after the list):

—Watch for the beginning of a session, which is the execution of
  —the inetd_connect event (for telnet, rlogin, rsh, etc.), or
  —the execve event on a system program: in.fingerd (for an incoming finger request) or finger (outgoing), mail.local (incoming) or sendmail (outgoing), ftpd (incoming) or ftp (outgoing), and so on.

—Record the setaudit event, which assigns the auid (audit user id) and sid (audit session id) of a session.

—Examine all audit records that share the same combination of auid and sid to summarize a number of session features, which are described in Section 5.2.1.

—Record the termination of a session.
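The following sketch shows two of the key steps: session-start recognition and grouping by the (auid, sid) pair assigned by setaudit. The event-record field names follow Table XX, while the program list and the omitted start/termination bookkeeping are simplifications of ours.

SESSION_PROGRAMS = {"in.fingerd", "finger", "mail.local", "sendmail", "ftpd", "ftp"}

def is_session_start(ev):
    # a session begins with inetd_connect, or with execve of a session program
    return (ev["event"] == "inetd_connect"
            or (ev["event"] == "execve"
                and ev["obname"].rsplit("/", 1)[-1] in SESSION_PROGRAMS))

def build_sessions(events):
    # group BSM event records (cf. Table XX) into sessions keyed by (auid, sid)
    sessions = {}
    for ev in events:
        if ev["event"] == "setaudit":
            # setaudit assigns the audit user id and audit session id
            sessions.setdefault((ev["auid"], ev["sid"]), []).append(ev)
        elif (ev["auid"], ev["sid"]) in sessions:
            # every later record sharing the auid/sid pair joins the session,
            # to be summarized into the features of Section 5.2.1
            sessions[(ev["auid"], ev["sid"])].append(ev)
    return sessions

Here is_session_start only makes the first bullet concrete; build_sessions relies on the auid/sid grouping and leaves the start and termination bookkeeping out.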

Table XX. Example BSM Event Records

Time      auid  sid  event          pid  obname                ...  ruid  euid
08:05:22  0     0    inetd_connect  0    ?                     ...  0     0
...       ...   ...  ...            ...  ...                   ...  ...   ...
08:05:22  -2    0    execve         415  /usr/sbin/in.telnetd  ...  0     0
...       ...   ...  ...            ...  ...                   ...  ...   ...
08:05:31  2104  417  setaudit       417  ?                     ...  0     0
...       ...   ...  ...            ...  ...                   ...  ...   ...
08:05:31  2104  417  chdir          418  /home/tristank        ...  2104  2104
...       ...   ...  ...            ...  ...                   ...  ...   ...


The DARPA BSM data contains, for each day, about 500 sessions on the host pascal. The vast majority of the intrusions in the BSM data are U2R buffer overflow attacks.

5.2.1 Defining Session Features. We experimented with feature construction for session records. We first computed frequent patterns from the event records. Here each dataset prepared for pattern mining contains all event records of a session that have positive auid and sid values. That is, we are only interested in events that are “accountable” to the users.

Since we are looking for general rather than session-specific event patterns, we first removed auid and sid from the datasets. We also replaced ruid and euid with a flag, same_reid, to indicate whether ruid agrees with euid. We designated event as the axis attribute since it is obviously the most essential attribute in describing event data.

After the initial few rounds of experiments, we discovered that the patterns were all related to very specific obname or event values. There are many kernel events (system calls) that cannot be directly linked to user-level commands. We reasoned that, for intrusion detection purposes, we only needed to analyze user-level commands and their operations on the file system. We therefore only kept the following types of event records: read, write, create, delete, execute, change owner or permission, rename, and link. The event value of each event record was replaced by the appropriate type name; for example, open_r is replaced by read. We only kept the original obname if the event was execute; otherwise, we used “user” to replace all obname values that indicated files in the user’s directories, and “system” to replace the obname values that indicated files in the system’s directories. We also removed all event records that had “?” (i.e., missing) obname values.
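A sketch of this generalization step follows. The mapping from raw BSM event names to type names, the test for the user’s home directory, and the type label chosen for ownership/permission changes are illustrative assumptions rather than the exact preprocessing used in the experiments.

EVENT_TYPES = {                       # hypothetical raw-event-to-type mapping
    "open_r": "read", "open_w": "write", "creat": "create", "unlink": "delete",
    "execve": "execute", "chmod": "chmod_chown", "chown": "chmod_chown",
    "rename": "rename", "link": "link",
}

def generalize(ev, home_prefix="/home/"):
    # map a raw event record to the generalized form used for pattern mining,
    # or return None if the record should be discarded
    if ev["event"] not in EVENT_TYPES or ev["obname"] == "?":
        return None
    etype = EVENT_TYPES[ev["event"]]
    obname = ev["obname"]
    if etype != "execute":
        # keep full paths only for execute; otherwise generalize to user/system
        obname = "user" if obname.startswith(home_prefix) else "system"
    return {"event": etype,
            "obname": obname,
            "same_reid": int(ev["ruid"] == ev["euid"])}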

Table XXI. Features of BSM Event Records

Feature       Description                                                        Value Type
time          Timestamp of the event                                             Discrete
auid          Audit user id, inherited by all child processes started by the     Discrete
              user’s initial process of a session
sid           Audit session id, assigned for each login session and inherited    Discrete
              by all descendant processes
event         Audit event name                                                   Discrete
pid           Process id of the event                                            Discrete
obname        The name of the object, i.e., full path of the file on which       Discrete
              the event operates
arg1 - arg4   Arguments of the system call                                       Discrete
text          Short information of the event, e.g., “successful login”           Discrete
error_status  Error status of the event                                          Discrete
return_value  Return value of a system call event                                Discrete
tmid          Terminal id (port and ip address) of the event                     Discrete
ip header     The source and destination ip addresses and ports for the          Discrete
              network connection handled by the event
socket        The local and remote ip addresses and ports of the socket          Discrete
              used by the event
ruid          The real user id of the event                                      Discrete
rgid          The real group id of the event                                     Discrete
euid          The effective user id of the event                                 Discrete
egid          The effective group id of the event                                Discrete



We aggregated the event patterns of all normal sessions into a normal pattern set. For each U2R session, we mined its event patterns and compared them with the normal patterns. We used the top 20% of intrusion-only patterns for each U2R attack, for example,

(event = execute, obname = /home/tristank/ffbexploit, same_reid = 1),
(event = execute, obname = /usr/sbin/ffbconfig, same_reid = 0) →
(event = execute, obname = /usr/bin/ksh, same_reid = 0)

and

(event = execute, obname = /usr/bin/pwd, same_reid = 0) →
(event = read, obname = home, same_reid = 0).

These patterns are very “unique” because, in the normal pattern set, patterns with same_reid = 0 are those related to read operations only, for example,

(event = read, obname = system, same_reid = 0),
(event = read, obname = system, same_reid = 0) →
(event = read, obname = system, same_reid = 0)

and

(event = read, obname = home, same_reid = 0) →
(event = execute, obname = /usr/bin/cat, same_reid = 1).

We could have used a mechanical (i.e., automatic) pattern parsing and feature construction procedure for the intrusion-only patterns of the above forms. For example, we could have added features to record the executions of the relevant specific events as described by the patterns. However, we very quickly realized that many U2R buffer overflow attacks share the same characteristics in their intrusion-only patterns. For example, another attack has the following intrusion-only pattern:

(event = execute, obname = /home/tristank/formatexploit, same_reid = 1),
(event = execute, obname = /usr/bin/fdformat, same_reid = 0) →
(event = execute, obname = /usr/bin/ksh, same_reid = 0).

These buffer overflow patterns indicate that there is an execution of a user program, followed by a SUID (setuid) system utility, and finally a shell in SUID state (i.e., a root shell with the effective user ID as root and different from the real user ID). We need to construct features that capture the general behavior of the attack method and the final outcome, rather than the specific system utilities being exploited. However, since the event data contains very low-level and specific information about the operating system, we need to use domain knowledge to interpret the patterns in order to construct the more abstract and general features. Although our experiments here showed the limitations of fully automatic feature construction when dealing with low-level event data, we believe that the intrusion-only patterns resulting from pattern mining and comparison can still provide a very helpful starting point for the manual feature definition process.



We defined a set of features, described in Table XXII, for the BSM session records. Some of the features (those indicating the execution of user programs, SUID system programs, and shells in SUID state) are derived from the buffer overflow patterns, while others are similar to the “content” features described in Section 5.1.3.
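To make the step from generalized event records to session features concrete, here is one way the three buffer-overflow-related flags of Table XXII could be derived; the path and shell-name heuristics are our own placeholders, not the definitions used in the experiments.

def overflow_flags(session_events):
    # derive user_p, suid_p, and suid_sh for one session from its
    # generalized event records (event, obname, same_reid)
    flags = {"user_p": 0, "suid_p": 0, "suid_sh": 0}
    for ev in session_events:
        if ev["event"] != "execute":
            continue
        program = ev["obname"]
        if program.startswith("/home/"):
            flags["user_p"] = 1                    # a user program is executed
        elif ev["same_reid"] == 0:
            flags["suid_p"] = 1                    # a SUID system program is executed
        if program.rsplit("/", 1)[-1] in {"sh", "csh", "ksh"} and ev["same_reid"] == 0:
            flags["suid_sh"] = 1                   # a shell runs in SUID state
    return flags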

5.2.2 Misuse Detection Models on BSM Data. We labeled each BSM session record as “normal” or with an intrusion name using the DARPA-supplied list files. BSM session records from all 7 weeks were aggregated into a single dataset for training a misuse detection model. Table XXIII shows some examples of the labeled BSM session records.

RIPPER was applied to the dataset to learn a set of classification rules. Table XXIV shows some examples of these detection rules.

We evaluated the performance of the rule set on the test data using the list files provided by DARPA (available after completion of the official evaluation). Figure 3 shows the ROC curves of the detection models on BSM data. Figure 3(a) compares our model with those of other participants in the DARPA evaluation, in terms of performance in detecting the U2R attacks (DARPA only evaluated performance on U2R). Figure 3(b) shows the performance of our model in detecting DoS, R2L, and PROBING attacks, as well as the overall performance (when all attacks are considered).

Table XXII. Features of BSM Session Records

Feature         Description                                                      Value Type
duration        Length (number of seconds) of the session                        Continuous
service         Operating system or network service, e.g., telnet,               Discrete
                responsible for this session
logged_in       Whether the user successfully logged in (when using telnet,      Discrete
                rsh, etc.)
failed_logins   Number of failed login attempts                                  Continuous
process_count   Number of processes in the session                               Continuous
suid_sh         Whether a shell is executed in suid state                        Discrete
suid_p          Whether a suid system program is executed                        Discrete
user_p          Whether a user program is executed                               Discrete
su_attempted    Whether a su command is issued                                   Discrete
access_files    Number of write, delete, and create operations on access         Continuous
                control files
file_creations  Number of file creations                                         Continuous
hot_login       Whether the login belongs to the “hot” list                      Discrete
guest_login     Whether the login belongs to the “guest” list                    Discrete

Table XXIII. Example BSM Session Records

label            service  suid_sh  suid_p  user_p  file_creations  ...
normal           smtp     0        0       0       0               ...
normal           telnet   0        1       1       3               ...
normal           telnet   0        1       0       0               ...
buffer_overflow  telnet   1        1       1       2               ...
normal           ftp      0        0       0       0               ...
warez_master     ftp      0        0       0       42              ...
...              ...      ...      ...     ...     ...             ...



The BSM model here may seem to have slightly better performance than the tcpdump models reported in Section 5.1.5. However, the tcpdump models were computed using the tcpdump data of the entire network, while the BSM model was constructed using the BSM data, which is available from only one host and contains many fewer attacks and normal sessions. We should note that the BSM model, like the tcpdump models, has good performance in detecting PROBING, U2R, and DoS attacks, but very poor performance in detecting R2L attacks. In fact, when we compared the predictions made by the BSM model and the tcpdump models, we found that they simply agree with each other’s predictions on the pairs of corresponding host session and network connection. That is, the BSM misuse detection model, as we have constructed it, provides no additional predictive power. This is because:

—by the nature of misuse detection, a truly innovative intrusion, such as one that uses a service that was not modeled, can go undetected, no matter what audit data source is used for analysis; and

—the features used in the BSM model and the tcpdump models are very similar; that is, the models are looking for similar sets of evidence, although in different data sources.

Table XXIV. Example RIPPER Rules for BSM Session Records Shown in Table XXIII

RIPPER Rule: buffer_overflow :- suid_sh = 1.
Meaning: If a shell is executed in the SUID state, then this is a buffer overflow attack.

RIPPER Rule: warez_master :- file_creations >= 40, service = ftp, guest_login = 1.
Meaning: If the service is ftp, the user logs in as a guest (anonymous), and the number of file creation operations is at least 40, then this is a warez attack (in which the attacker logs in to an anonymous FTP site, creates a hidden directory, and uploads a lot of files, often pirated copies of software, for others to download).

[Figure 3, omitted here, plots detection rate (y-axis) against false alarm rate (x-axis): panel (a) compares the Columbia BSM model with Groups 1 through 4 on U2R attacks; panel (b) shows the Columbia model on DoS, R2L, PROBING, and all attacks overall.]

Fig. 3. Performance of BSM misuse detection models: ROC curves on detection rates and false alarm rates.



Although this result seems discouraging if one had hoped for an accuracy improvement by combining the two models, it is in fact encouraging if the goal is to combine a lightweight tcpdump model that only checks the IP headers with a number of host-based models that monitor the operating system activities. Our experiments in meta-learning, where a combined model was computed from a model for tcpdump header-only connection data and a model for BSM host session data, indeed showed that the same level of accuracy was maintained as with a heavyweight tcpdump model that also checked the IP data contents.

6. CONVERTING LEARNED MODELS TO REAL-TIME IDS MODULES

Effective intrusion detection should be in real-time to minimize security compromises. We therefore need to study how our models, produced off-line by MADAM ID, can be converted to modules of real-time IDSs, and how they perform in a real-time environment. We are developing a system for translating RIPPER rules into real-time detection modules of NFR (Network Flight Recorder), a system that includes a packet capturing engine and N-code programming support for specifying packet “filtering” logic.

NFR offers a fairly simple framework for network monitoring. It sniffs packets from the network, reassembles them, and then passes them to filter functions for further processing, for example, calculating network traffic statistics. Filter functions are written in N-code, a preemptive, event-driven scripting language that supports a wide range of operations on network packet data. In order to use NFR for network intrusion detection, one needs to implement an extensive set of N-code filters that look for evidence in the network packets.

We seek a mechanical means of converting our off-line detection models into real-time modules. Our approach is to first implement all the required features used in the RIPPER ruleset as N-code “feature filters,” and then implement a translator that can automatically convert each RIPPER rule into an N-code “rule filter.” For example, a detection rule,

pod :- wrong_fragment >= 1, protocol_type = icmp.

can be automatically converted into the N-code:

filter pod () {
    if (wrong_fragment() >= 1 && protocol_type() == icmp)
        alarm("ping_of_death");
}

as long as the features (i.e., wrong_fragment and protocol_type) have been implemented as N-code filter functions in NFR.
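A toy version of such a translator can be written as a simple string rewrite, as sketched below. It handles only conjunctions of >=, <=, and = tests, reuses the rule label as the alarm name (the example above maps pod to the friendlier string “ping_of_death,” which a real translator could look up in a table), and is not the translator actually deployed with NFR.

import re

OPS = {">=": ">=", "<=": "<=", "=": "=="}     # RIPPER test -> N-code comparison

def translate(rule):
    # e.g. "pod :- wrong_fragment >= 1, protocol_type = icmp."
    head, body = rule.rstrip(". ").split(":-")
    label = head.strip()
    tests = []
    for cond in body.split(","):
        feature, op, value = re.match(r"\s*(\w+)\s*(>=|<=|=)\s*(\S+)", cond).groups()
        tests.append("%s() %s %s" % (feature, OPS[op], value))
    return ("filter %s () {\n"
            "    if (%s)\n"
            "        alarm(\"%s\");\n"
            "}\n") % (label, " && ".join(tests), label)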

The advantage of this approach is that when a real-time IDS such as NFR is shipped with MADAM ID to a customer site, MADAM ID can process the local audit data and compute the site-specific intrusion detection rules, which are then automatically converted into N-code “rule filters.”


Although often ignored in off-line analysis, efficiency is a very important consideration in real-time intrusion detection. Specifically, in off-line analysis it is implicitly assumed that all connections have already finished; we therefore have the luxury of computing all the features and checking the detection rules one by one. In real-time detection, by contrast, we need to detect and respond to an intrusion as soon as it happens, often during an ongoing connection. We are therefore studying how to generate an efficient real-time execution plan for a detection ruleset. In particular, we find that many intrusions have cheap “necessary” conditions, which contain features that are easy to compute and are available early in the connection. For example, SYN flood has the necessary condition flag = S0, whereas its sufficient condition includes checking the feature “for the connections to the same destination host as the current connection in the past 2 seconds, the percentage of those that have the S0 flag.” The necessary condition is cheaper since the feature flag relies only on the information of the current connection, whereas the feature for the sufficient condition is computed by looking up a number of connection records. Note that a violation of the necessary condition means that there is no need to check the sufficient condition for the intrusion. For example, if flag is not S0, we know that the connection cannot be a SYN flood. Therefore, by checking the violations of the cheap necessary conditions first and filtering out a large amount of unnecessary checking of ID rules, the overall execution time of the ruleset can be reduced.
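The execution-plan idea can be sketched as follows; the rule representation (a dictionary of cheap necessary tests plus a full predicate over the expensive features) is our own, and a real deployment would express this in N-code rather than Python.

def necessary_violated(rule, connection):
    # cheap necessary conditions test only per-connection features (e.g., flag),
    # so they can be checked before any 2-second window statistics are computed
    return any(connection.get(feat) != value
               for feat, value in rule["necessary"].items())

def evaluate(rules, connection, compute_window_features):
    # check the cheap necessary conditions first, and pay for the expensive
    # features only if at least one rule is still a candidate
    survivors = [r for r in rules if not necessary_violated(r, connection)]
    if not survivors:
        return "normal"
    features = compute_window_features(connection)
    for r in survivors:
        if r["predicate"](features):          # the rule's full (sufficient) condition
            return r["label"]
    return "normal"

For the SYN flood example above, the necessary-condition dictionary would simply be {"flag": "S0"}.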

We have designed and implemented an algorithm for finding the necessary conditions for intrusions, and are implementing the ruleset filtering algorithm in NFR.

7. RELATED WORK

Network intrusion detection has been an ongoing research area [Mukherjee et al. 1994]. More recent systems, for example, Bro [Paxson 1998], NFR [Network Flight Recorder Inc. 1997], and EMERALD [Porras and Neumann 1997], all made extensibility their primary design goal. Both Bro and NFR provide high-level scripting languages for codifying the site-specific intrusion detection rules, which are executed at run-time as event handlers by the packet filtering and reassembly engines. Our research focuses on developing methods for constructing intrusion detection models. Our premise is to use robust IDSs such as Bro and NFR as the building blocks, and provide a framework so that site-specific models can be computed and installed automatically.

EMERALD provides an architecture to facilitate enterprise-wide deployment and configuration of intrusion detectors. The meta-learning mechanism in our framework is designed to automate the process of learning a “resolver,” which is needed to combine the alarms from the distributed detectors to make a determination of the state of the (entire) network. The meta-learning results reported in this article are preliminary. We will study how to incorporate network configuration information into the meta-learning process.



There are several learning-based research efforts in intrusion detection. Warrender et al. [1999] showed that a number of machine-learning approaches (e.g., rule induction, hidden Markov models, etc.) can be used to learn concise and generalizable representations of the “self” identity of a system program, namely, the short sequences of run-time system calls made by the program [Forrest et al. 1996]. These learned models were shown to be able to accurately detect anomalies caused by exploits on the system programs. Ghosh and Schwartzbard [1999] showed that, using program activity data (e.g., system calls, arguments, return values, permissions, etc.) from BSM audit logs, artificial neural networks can be used to learn anomaly and misuse detection models for system programs. Our research aims to develop algorithms and techniques that can be used to build ID models for both networks and hosts. Towards this end, we design the data mining algorithms to be independent of audit data sources. We concentrate our effort on the feature construction process because we believe this is the most critical step in the process of building ID models.

Lane and Brodley [1999] developed algorithms for analyzing user shell commands and detecting anomalies. The basic idea is to first collapse the multicolumn shell commands into a single stream of strings, and then use string matching techniques, together with consideration of “concept drift,” to build and update user profiles. We believe that our extended frequent episodes algorithm is a superior approach because it considers both the association among commands and arguments, and the frequent sequential patterns of such associations.

8. CONCLUSIONS AND FUTURE DIRECTIONS

In this article, we outlined a data mining framework for constructing intrusion detection models. The key idea is to first apply data mining programs to audit data to compute frequent patterns and extract features, and then use classification algorithms to compute detection models. To facilitate adaptability and extensibility, we proposed the use of meta-learning as a means to construct a combined model that incorporates evidence from multiple base models.

We extended the basic association rules and frequent episodes algorithms to accommodate the special requirements of analyzing audit data. Our experiments showed that the frequent patterns mined from audit data can be used as reliable user anomaly detection models, and as guidelines for selecting temporal statistical features to build effective classification models. Results from the 1998 DARPA Intrusion Detection Evaluation Program showed that our detection models performed as well as the best systems built using the manual knowledge engineering approaches.

Anomaly detection models are the only means to detect the truly “innovative” intrusions. Therefore, for future work, we will develop algorithms for learning network anomaly detection models.


IDSs need to maximize user-defined security goals while minimizing costs. This requires that ID models be sensitive to cost factors, which at a minimum should include the development cost, the operational cost (i.e., the needed resources), the cost of the damage caused by an intrusion, and the cost of detecting and responding to a potential intrusion. For example, when the response cost of an intrusion exceeds its damage cost, the ID models may choose to ignore (i.e., not respond to) the intrusion. It is particularly important to direct the limited resources (i.e., the computing and human resources) to detecting and responding to the most damaging intrusions when adversaries are launching a large number of automated attacks in an attempt to overload the IDS. That is, we need ID models that can dynamically determine the most cost-saving actions to take. We will extend our framework so that our algorithms can incorporate user-defined cost factors and policies to compute cost-sensitive ID models.

ACKNOWLEDGMENTS

We wish to thank our colleagues at Columbia University, Kui Mok, Chris Park, Matt Miller, Wei Fan, and Andreas Prodromidis, for their help and encouragement. We also wish to thank the anonymous reviewers for their helpful comments.

REFERENCES

AGRAWAL, R., IMIELINSKI, T., AND SWAMI, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD ’93, Washington, DC, May 26-28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY, 207–216.

ALLEN, J., CHRISTIE, A., FITHEN, W., MCHUGH, J., PICKEL, J., AND STONER, E. 2000. State of the practice of intrusion detection technologies. CMU/SEI-99-TR-028. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA.

ANDERSON, D., FRIVOLD, T., AND VALDES, A. 1995. Next-generation intrusion detection expert system (NIDES): A summary. SRI-CSL-95-07 (May).

CHAN, P. K. AND STOLFO, S. J. 1993. Toward parallel and distributed learning by meta-learning. In Proceedings of the AAAI Workshop on Knowledge Discovery in Databases. 227–240.

COHEN, W. W. 1995. Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning (Lake Tahoe, CA). Morgan Kaufmann, San Mateo, CA.

FAYYAD, U., PIATETSKY-SHAPIRO, G., AND SMYTH, P. 1996. The KDD process of extracting useful knowledge from volumes of data. Commun. ACM 39, 11, 27–34.

FORREST, S., HOFMEYR, S. A., SOMAYAJI, A., AND LONGSTAFF, T. A. 1996. A sense of self for Unix processes. In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, May). IEEE Press, Piscataway, NJ, 120–128.

GHOSH, A. K. AND SCHWARTZBARD, A. 1999. A study in using neural networks for anomaly and misuse detection. In Proceedings of the 8th USENIX Security Symposium (Aug.).

ILGUN, K. 1992. USTAT: A real-time intrusion detection system for Unix. Master’s Thesis. University of California at Santa Barbara, Santa Barbara, CA.

ILGUN, K., KEMMERER, R. A., AND PORRAS, P. A. 1995. State transition analysis: A rule-based intrusion detection approach. IEEE Trans. Softw. Eng. 21, 3 (Mar.), 181–199.

JACOBSON, V., LERES, C., AND MCCANNE, S. 1989. Tcpdump. Available via anonymous ftp to ftp.ee.lbl.gov.

KO, C., FINK, G., AND LEVITT, K. 1994. Automated detection of vulnerabilities in privileged programs by execution monitoring. In Proceedings of the 10th Conference on Computer Security Applications (Dec.). IEEE Computer Society Press, Los Alamitos, CA, 134–144.

KUMAR, S. AND SPAFFORD, E. H. 1995. A software architecture to support misuse intrusion detection. In Proceedings of the 18th National Conference on Information Security. 194–204.

LANE, T. AND BRODLEY, C. E. 1999. Temporal sequence learning and data reduction for anomaly detection. ACM Trans. Inf. Syst. Secur. 2, 3, 295–331.

LEE, W. 1999. A data mining framework for constructing features and models for intrusion detection systems. Ph.D. Dissertation. Columbia University, New York, NY.

LEE, W. AND STOLFO, S. J. 1998. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium (San Antonio, TX, Jan.).

LEE, W., STOLFO, S. J., AND MOK, K. W. 1999. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (Oakland, CA, May).

LEE, W., STOLFO, S. J., AND MOK, K. W. 1999. Mining in a data-flow environment: Experience in network intrusion detection. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99).

LIPPMANN, R. P., FRIED, D., GRAF, I., HAINES, J., KENDALL, K., MCCLUNG, D., WEBBER, D., WEBSTER, S., WYSCHOGRAD, D., CUNNINGHAM, R., AND ZISSMAN, M. 2000. Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation. In Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX ’00, Hilton Head, SC, Jan. 25-27). IEEE Computer Society Press, Los Alamitos, CA, 12–26.

LUNT, T. 1993. Detecting intruders in computer systems. In Proceedings of the 1993 Conference on Auditing and Computer Technology.

LUNT, T., TAMARU, A., GILHAM, F., JAGANNATHAN, R., NEUMANN, P., JAVITZ, H., VALDES, A., AND GARVEY, T. 1992. A real-time intrusion detection expert system (IDES): Final technical report.

MANNILA, H. AND TOIVONEN, H. 1996. Discovering generalized episodes using minimal occurrences. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (Portland, OR, Aug.).

MANNILA, H., TOIVONEN, H., AND VERKAMO, A. I. 1995. Discovering frequent episodes in sequences. In Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, Aug. 20-21).

MITCHELL, T. 1997. Machine Learning. McGraw-Hill, Inc., New York, NY.

MUKHERJEE, B., HEBERLEIN, L. T., AND LEVITT, K. N. 1994. Network intrusion detection. IEEE Network 8, 1 (Jan.).

NETWORK FLIGHT RECORDER INC. 1997. Network flight recorder. http://www.nfr.com.

PAXSON, V. 1998. Bro: A system for detecting network intruders in real-time. In Proceedings of the 7th USENIX Security Symposium (San Antonio, TX, Jan.).

PORRAS, P. AND NEUMANN, P. 1997. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In Proceedings of the 20th National Information Systems Security Conference, Vol. 1 (Baltimore, MD). National Institute of Standards and Technology, Gaithersburg, MD, 353–365.

STOLFO, S. J., PRODROMIDIS, A. L., TSELEPIS, S., LEE, W., FAN, D. W., AND CHAN, P. K. 1997. JAM: Java agents for meta-learning over distributed databases. In Proceedings of the 3rd ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery (Newport Beach, CA, Aug.), R. Ng, Ed. ACM Press, New York, NY, 74–81.

SUNSOFT. 1995. SunSHIELD Basic Security Module Guide.

WARRENDER, C., FORREST, S., AND PEARLMUTTER, B. 1999. Detecting intrusions using system calls: Alternative data models. In Proceedings of the 1999 IEEE Computer Society Symposium on Research in Security and Privacy (Berkeley, CA, May). IEEE Computer Society Press, Los Alamitos, CA, 133–145.

Received: February 2000; revised: October 2000; accepted: November 2000
