
Let us assume that x* is not a weak efficient solution for (N). Then there exists a point x' ∈ X such that f(x') > f(x*). From the definition of P(z_1, Z_1), it follows that x' ∈ Z_1 and

z_1(x') < z_1(x*).   (2)

Since (1) contradicts (2), x* is a weak efficient solution.

Theorem 2: If the optimal solution of P(z_1, Z_1) is unique, then it is an efficient solution for (N).

Proof: Let x* be the unique optimal solution of P(z_1, Z_1). Then the following inequality is satisfied:

z_1(x) > z_1(x*) for each x ∈ Z_1, x ≠ x*.   (3)

Let us assume that x* is not an efficient solution for (N). Then there exists a point x' ∈ X such that f(x') ≥ f(x*) and f_k(x') > f_k(x*) for at least one k ∈ K. As in the proof of Theorem 1 and from the definition of P(z_1, Z_1), it follows that x' ∈ Z_1 and

z_1(x') ≤ z_1(x*).   (4)

Since (4) contradicts (3), x* is an efficient solution for (N).

Theorem 3: The optimal solutions of P(z_2, Z_1) are efficient solutions for (N).

Proof: Let x* be an optimal solution of P(z_2, Z_1). Then the following inequality is satisfied:

z_2(x) ≥ z_2(x*) for each x ∈ Z_1.   (5)

Let us assume that x* is not an efficient solution for (N). Then there exists a point x' ∈ X such that f(x') ≥ f(x*) and f_k(x') > f_k(x*) for at least one k ∈ K. As in the proof of Theorem 1 and from the definition of P(z_2, Z_1), it follows that x' ∈ Z_1 and

a(x') ≤ a(x*).   (6)

From (6) and (7) it follows that

z_2(x') < z_2(x*).   (8)

Since (8) contradicts (5), x* is an efficient solution for (N).

Theorem 4: The solution (f^h, x^h) of the single-objective surrogate problems P(z_1, Z_1) and P(z_2, Z_1) is a feasible solution for the next iteration.

Proof: Follows from the formulation of the problems.


A Multilayer Neural Network System for Computer Access Security

M. S. Obaidat, Senior Member, IEEE, and D. T. Macchairolo

Abstract—This paper presents a new multilayer neural network system to identify computer users. The input vectors were made up of the time intervals between successive keystrokes created by users while typing a known sequence of characters. Each input vector was classified into one of several classes, thereby identifying the user who typed the character sequence. Three types of networks were discussed: a multilayer feedforward network trained using the back propagation algorithm, a sum-of-products network trained with a modification of back propagation, and a new hybrid architecture that combines the two. A maximum classification accuracy of 97.5% was achieved using a neural network based pattern classifier. Such an approach can improve computer access security.

I. INTRODUCTION

Artificial neural networks can be used effectively to provide solutions for a broad spectrum of applications including pattern mapping and classification, image analysis and encoding, signal processing, optimization, graph manipulation, character recognition, automatic target recognition, data fusion, binocular vision, knowledge processing, medical diagnosis, noise cancellation, hazardous environment automation, telecommunications, solid-state electronics, optical

Manuscript received December 9, 1991; revised July 12, 1993.
M. S. Obaidat is with the Department of Electrical Engineering, City University of New York, The City College, Convent Ave. at 140th Street, New York, NY 10031 USA.
D. T. Macchairolo is with AT&T Bell Laboratories, Microelectronics Division, Greensboro, NC 27265 USA.
IEEE Log Number 9400627.



neurocomputing, quality control, robotic and control engineering, stock or commodity trading, mortgage processing, consumer loan credit screening, and inventory control. Neural networks excel at problems involving pattern mapping and classification. One possible application of pattern classification is computer access security [1]-[6].

Computer security has become an important issue in recent times. It has become necessary to control access to computer systems, since more and more sensitive information is being stored on them. Pattern recognition techniques have been applied to handwriting analysis to determine the identity of the individual. However, this approach achieved only limited success. More recently, Bleha and Obaidat [7] applied classical pattern recognition techniques to the individual's typing technique to achieve user identification.

In this paper we present a neurocomputing approach for identifying computer users. The main goal is to develop an intelligent system that can identify computer users using data from keystroke patterns.

II. EXPERIMENT

The participants in the experiment were asked to enter the same password, which was not visible during the process of typing. Therefore, it was necessary to display the message on the screen after it was entered. The password was retyped by the participant if it was entered incorrectly. The time durations between keystrokes were collected using an IBM 486-based computer. We used FORTRAN as the main programming language, with some procedures written in assembly language for the data acquisition system that collected the measurements. The assembly language procedures made use of the keyboard interrupt and provided the main program with the time duration between keystrokes. For instance, if the password "BALQIES" was entered, the assembly language program would compute the time durations between the character pairs (B, A), (A, L), (L, Q), (Q, I), (I, E), and (E, S). We allowed the participants an open period of time to conduct the experiment, which helped average out the effect of uncorrelated sources of noise that might be introduced by both the participants and the instrument. A phrase with thirty vector components was used as a password. However, only the first fifteen vector components were used, since using the remaining components did not change the results. The data were gathered from six different users over a six-week period. The total number of measurement vectors per user was forty.
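To make the data-collection step concrete, the following C sketch (our illustration, not the authors' FORTRAN/assembly code; all names and sample values are hypothetical) derives an input vector of inter-keystroke intervals from a list of keystroke timestamps:

#include <stdio.h>

#define NKEYS 7                    /* e.g. the seven characters of "BALQIES" */

/* Given n keystroke timestamps (ms), produce n-1 inter-key intervals. */
static void keystroke_intervals(const long t[], int n, long dt[])
{
    int i;
    for (i = 0; i + 1 < n; i++)
        dt[i] = t[i + 1] - t[i];   /* dt[0] is the (B, A) duration, etc. */
}

int main(void)
{
    long t[NKEYS] = {0, 329, 724, 1031, 1382, 1689, 2040};  /* sample data */
    long dt[NKEYS - 1];
    int i;

    keystroke_intervals(t, NKEYS, dt);
    for (i = 0; i < NKEYS - 1; i++)
        printf("%ld ", dt[i]);
    return 0;
}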

The raw data was arranged as follows (a sketch of this layout in C appears after the list):
1) Each pattern consisted of 15 values, which were the time durations in milliseconds between successive keystrokes of a known character sequence.
2) Each class (user) consisted of 40 patterns (600 values per class).
3) Six classes were defined (3600 values total).
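A minimal C sketch of this layout (our illustration; only the constants come from the description above):

#define NCLASSES  6     /* users */
#define NPATTERNS 40    /* patterns per class */
#define NVALUES   15    /* inter-keystroke durations per pattern */

/* One pattern: 15 durations in milliseconds. */
typedef struct {
    double interval[NVALUES];
} Pattern;

/* The full raw data set: 6 * 40 * 15 = 3600 values. */
Pattern raw[NCLASSES][NPATTERNS];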

III. TRAINING SET PREPARATION

For training purposes, the raw data was separated into two parts: all of the odd-numbered patterns and all of the even-numbered patterns of each class. In any given simulation, half of the raw data, either the odd-numbered patterns or the even-numbered patterns, was used to form the training set. Each value was also rescaled from milliseconds into seconds.

Several versions of the training data were created to investigate the networks’ ability to generalize, rather than to memorize the training set.

The training pattern sets differ in two respects:
1) whether the patterns are from the odd or even half of the raw data;
2) how many raw data patterns were averaged to form each training pattern.

The pattern sets used are shown in Table I.

TABLE I
TRAINING PATTERN SET NAMES

File Name     Patterns per Class   Raw Patterns per Average Pattern   Odd/Even Half
patt15.pat    40                   1                                  all
odd-20.pat    1                    20                                 Odd
even-20.pat   1                    20                                 Even
odd-5.pat     4                    5                                  Odd
even-5.pat    4                    5                                  Even
odd-2.pat     10                   2                                  Odd
even-2.pat    10                   2                                  Even

The first pattern set is simply all 240 patterns. This set was used after training to test the network. The remaining pattern sets are subsets of the raw data patterns, with varying numbers of raw patterns averaged to form the training patterns.

For instance, the entry odd-20.pat in Table I means that the odd-numbered patterns in the raw data were used, i.e., 1, 3, 5, 7, etc. Then, 20 patterns were averaged into 1 pattern, resulting in 1 pattern per class (6 total).
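As a sketch of how such a training set can be produced (our illustration; the array names and the odd/even indexing convention are assumptions), averaging groups of patterns from one half of a class's raw data looks like:

/* Average `group` consecutive patterns from the odd (start = 0) or
   even (start = 1) half of one class's 40 raw patterns.  Produces
   (20 / group) training patterns in `out`. */
void average_patterns(const double raw[40][15], int start, int group,
                      double out[][15], int *nout)
{
    int p, v, g;
    *nout = 20 / group;                      /* 20 patterns per half */
    for (p = 0; p < *nout; p++)
        for (v = 0; v < 15; v++) {
            double sum = 0.0;
            for (g = 0; g < group; g++)
                sum += raw[start + 2 * (p * group + g)][v];
            out[p][v] = sum / group;
        }
}

With group = 20 this yields the single pattern per class of odd-20.pat; with group = 2 it yields the ten patterns per class of odd-2.pat.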

Each pattern in a set has a pattern name that identifies it, the 15 input values, and one of six class outputs:

p01.01  .329 .395 ... .438   1 0 0 0 0 0  (class 1)
p01.02  .307 .197 ... .241   0 1 0 0 0 0  (class 2)
p01.05  .307 .351 ... .328   0 0 0 0 1 0  (class 5)

IV. APPLICATION OF CONVENTIONAL NEURAL NETWORK TECHNIQUES

A feed-forward neural network for pattern classification consists in general of the following elements [8], [9]:

1) a set of relatively simple processing units;
2) a state of activation for each unit;
3) an output function for each unit;
4) a connectivity pattern between units;
5) a propagation rule for propagating patterns through the network of units;
6) an activation rule for combining the inputs to each unit and changing the activation of the unit;
7) a learning rule to modify the connectivity patterns based on experience;
8) an environment in which the network must operate.

Each element, with its connectivity weights, activation, propagation, and output functions, constitutes a decision function or discriminant [10].

A. Architecture

The specific architectures of two conventional feedforward neural networks, the back propagation and the sum-of-products, are presented below.

Back Propagation Network: The back propagation paradigm has been tested in various applications such as bond rating, mortgage application evaluation, protein structure determination, signal processing, and handwritten digit recognition [11], [1], [3], [6]. The proof of the generalized delta rule involves elaborate reasoning similar to that used in the proof of the delta rule [11]. The results of the proof presented in reference [11] are:

The generalized delta rule has the same form as the delta rule, i.e., the weight on each line should be changed by an amount proportional to the product of an error signal δ, available to the unit receiving input along that line, and the output of the unit sending activation along that line.


Fig. 1. Back Propagation Neural Processing Unit.

The error signal for the output units is very similar to the standard delta rule and is given by:

δ_pj = (t_pj - o_pj) f'_j(net_pj)   (1)

where f_j is the semilinear activation function that maps the total input to the unit to an output value, t_pj is the target output, and o_pj is the actual output. The error for the hidden units is determined recursively in terms of the error signals of the units to which it directly connects and the weights of those connections, that is,

δ_pj = f'_j(net_pj) Σ_k δ_pk w_kj   (2)

The paradigm is applied in two steps. First, the output value o_pj is computed for each unit and compared with the target, and an error signal δ_pj results for each output unit. Second, a backward pass allows the recursive computation of δ as indicated in the above two equations. The jth neural processing unit has several inputs v_i and an output o_j which is a function of the weighted sum of the inputs:

o_j = f( Σ_i w_ji v_i )   (3)

A bias term is usually included in the sum. This term is realized by a weight from an input that always assumes the value of 1. Fig. 1 shows an illustration of the back propagation neural processing element.

The function f is defined as the logistic or sigmoid function [11], [1]:

f(net_j) = 1 / (1 + e^(-net_j))   (4)

The back propagation unit requires an output function that is differentiable. The derivative of the logistic function is [11]:

f'(net_j) = o_j (1 - o_j)   (5)
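In C, the activation of equation (4) and its derivative of equation (5) reduce to two one-line functions (a minimal sketch; note the derivative is conveniently expressed in terms of the unit's output o rather than its net input):

#include <math.h>

double logistic(double net)       { return 1.0 / (1.0 + exp(-net)); }
double logistic_deriv(double o)   { return o * (1.0 - o); }  /* o = logistic(net) */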

Sum-of-Products Network: The back propagation network interconnection scheme is additive. The input to a particular unit is the weighted sum of the outputs of other units. Additional non-linearities can be introduced into the neural network system by using a more complicated interconnection called a sum-of-products, or sigma-pi [8].

The sum-of-products architecture expands the additive inputs to a particular unit into multiplicative ones. Each input to a unit is the

Fig. 2. Sum-of-Products Interconnection.

product of the outputs of two other units, multiplied by a weight value. Each one of these products is called a conjunct [8], [1]. Let w_kij represent the weight of the conjunct of input units i and j into unit k. The output of unit k is:

o_k = f( Σ_{i<j} w_kij o_i o_j )   (6)

where f is the semilinear activation function. For n units that are inputs to a particular output unit, there are 1 + 2 + ... + (n - 1) combinations of two inputs o_i o_j, i ≠ j, 1 ≤ i, j ≤ n. The sum-of-products interconnection is shown in Fig. 2. A closed-form expression for this arithmetic progression is [12]:

number of combinations = n(n - 1)/2   (7)

The processing unit used is similar to those used in the back propagation network. Once the net input value is found from the sum of products, the output value can be determined by the logistic function given by equation (4).
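A sketch of a single sum-of-products unit's forward computation in C (our illustration; names are hypothetical and N is the number of input units), visiting each of the n(n - 1)/2 conjuncts of equations (6) and (7):

#include <math.h>

#define N 15   /* input units feeding this unit (illustrative) */

/* Output of one sum-of-products unit: bias plus the weighted
   product of every pair of input outputs (the conjuncts),
   passed through the logistic activation. */
double sop_output(const double o[N], const double w[N][N], double bias)
{
    double net = bias;
    int i, j;
    for (i = 0; i < N - 1; i++)
        for (j = i + 1; j < N; j++)        /* N(N-1)/2 pairs in total */
            net += w[i][j] * o[i] * o[j];
    return 1.0 / (1.0 + exp(-net));
}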

Network Environment: In order to compare several different types of neural networks, the environment must first be defined:

1) Each network consists of 15 inputs and 6 outputs, and some number (either 4 or 5) of hidden units.
2) Each output defines one of the possible output classes. The outputs are mutually exclusive (each pattern is a member of exactly one class).
3) When a pattern is placed on the inputs of the network, the pattern is classified by which output has the greatest value, i.e., "largest value wins" (a sketch of this rule follows below).

This network is shown in Fig. 3. The number of inputs corresponds to the number of data samples, 15, that were taken on each pattern. The number of outputs, 6, is the number of classes that any particular pattern could be a member of. Both the back propagation and sum-of-products networks were simulated in this configuration.
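The "largest value wins" rule is a simple argmax over the output units; a minimal C sketch (illustrative names):

/* Return the index of the largest output unit; with six outputs
   this is the identified user (class 0..5). */
int classify(const double out[], int nclasses)
{
    int best = 0, c;
    for (c = 1; c < nclasses; c++)
        if (out[c] > out[best])
            best = c;
    return best;
}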


Fig. 3. Pattern Classification Network with Hidden Units.

B. Learning Algorithms

In this section, the learning algorithms for the back propagation and the sum-of-products networks are presented.

Back Propagation Learning: Originally, linear discriminants and Perceptron [13] type networks contained only two sets of units, input and output, with one single layer of interconnecting weights. If the pattern classes are not linearly separable, as in the XOR problem, the single-layer network cannot find a solution [8], [14]. To overcome this limitation, networks containing additional internal layers, known as hidden layers, are used. Fig. 3 shows a pattern classification network with hidden units. It is easy to determine the weight changes of the units in the outermost layer, since presumably the desired (target) values are known for the outputs of the network. However, what are the target values for units that are internal to the network? If an output value is wrong, how much of the error is due to a particular hidden unit? The method of hidden unit training developed in [14] is the core of the back propagation network.

The change in a weight that connects unit i with unit j after the presentation of a training pattern or group of training patterns is defined as:

Δw_ji = η δ_j o_i   (8)

where unit i is an input to unit j, o_i is the output value of unit i, η is the learning rate, and if unit j is an output unit, then δ_j is:

δ_j = (t_j - o_j) f'(net_j)   (8.1)

where t_j is the desired (target) output of the unit, o_j is the actual output of the unit, f' is the derivative of the activation function as shown in equation (5), and net_j is the weighted sum of all of the inputs to unit j (i.e., Σ_i w_ji v_i). If unit j is a hidden unit (not an output), then the value of δ_j is:

δ_j = f'(net_j) Σ_k δ_k w_kj   (8.2)

where k ranges over all of the units that unit j is connected to. In other words, if unit j is an output unit, then δ_pj is simply the output error (t_pj - o_pj) multiplied by the derivative of the activation function f'. If unit j is not an output, δ_pj is the weighted sum of the δ's of the k units that unit j feeds, multiplied by the derivative of f'. A non-output unit receives error information from the units that it feeds, weighted by the degree to which unit j affects the unit k that it feeds (which is simply w_kj).

There are many commercial software products available for the simulation of back propagation networks [13], [1]. The one used here is the bp simulator developed by McClelland & Rumelhart [11]. Although it is not very fancy in its presentation to the user, it is quite flexible and easy to set up. The operation and configuration of the bp program is well-documented in [14].

Sum-of-Products Learning and Simulation: The generalized delta rule that was developed for the back propagation network can be extended to work with the sum-of-products network. The change in the weight w_kij, which connects unit k to the product of the outputs of units i and j, is similar to that of the back propagation network:

Δw_kij = η δ_k o_i o_j.   (9)

The δ value for unit k is defined as follows. If unit u_k is an output unit, then:

δ_k = (t_k - o_k) f'(net_k)   (9.1)

which is the same as equation (8.1). If the unit is not an output unit, then for unit i, δ_i is the sum over the k units that u_i feeds, weighted by the connection weight between u_i and u_k, as in the case of the back propagation network. Since the connection between u_i and u_k is also affected by the value of the conjunct unit u_j, the output of u_j will also affect δ_i:

δ_i = f'(net_i) Σ_k δ_k w_kij o_j   (9.2)

That is to say, the δ value of hidden unit i is the sum of the weighted δ values of the k units that it feeds, multiplied by the output of its conjunct unit j.

Since no commercial simulator was available for the sum-of-products network, one was developed for this application. It is called sp, and is written in the C language for MS-DOS based computers, but could be easily ported to other environments. It allows for the simulation of networks with up to 10 layers. The number of units and weights are limited only by the amount of memory available to the program. Certain portions of the program were inspired by the bp program [11], including the units' configuration indexing scheme.

Forward Propagation Algorithm: Forward propagation is presenting an input pattern to the network and computing the new values of all of the units, including the output units. In the first code segment below, k indexes the units, and i and j index the units that are input to each unit k.

Backward Propagation Algorithm: Like the back propagation network, the sum-of-products network also propagates the training error backward through the network. First, the difference between each output and its target value was stored for the output units:

for (k = firstoutput; k <= lastoutput; k = k + 1)
    error[k] = target[k] - output[k];

Then, the error values and the values of δ were calculated for each unit. Since the activation function was the logistic function, f'(net_k) was defined as o_k(1 - o_k), as shown in equation (5). The calculations were done from the top of the network, backwards towards the inputs (second code segment below). There, the variable k counts the units, starting from the last unit backwards to the first unit after the input units, and the variables i and j count the combinations of the units that feed unit k, as before.


/* Segment 1: forward propagation.  Each unit's net input is the
   bias plus the weighted product of every pair of its inputs. */
for (k = firstunit; k <= lastunit; k = k + 1) {
    net[k] = bias[k];
    for (i = startinput[k]; i <= lastinput[k] - 1; i = i + 1)
        for (j = i + 1; j <= lastinput[k]; j = j + 1)
            net[k] = net[k] + (output[i] * output[j] * weight[k, i, j]);
    output[k] = logistic(net[k]);
}

/* Segment 2: backward propagation.  Compute each unit's delta and
   pass its error back to both members of each conjunct pair. */
for (k = lastunit; k > firstunit; k = k - 1) {
    delta[k] = error[k] * output[k] * (1.0 - output[k]);
    for (i = startinput[k]; i <= lastinput[k] - 1; i = i + 1)
        for (j = i + 1; j <= lastinput[k]; j = j + 1) {
            error[i] = error[i] + (delta[k] * output[j] * weight[k, i, j]);
            error[j] = error[j] + (delta[k] * output[i] * weight[k, i, j]);
        }
}

/* Segment 3: accumulate the error for each conjunct weight and bias. */
for (k = firstunit; k <= lastunit; k = k + 1) {
    for (i = startinput[k]; i <= lastinput[k] - 1; i = i + 1)
        for (j = i + 1; j <= lastinput[k]; j = j + 1)
            weight_error[k, i, j] = weight_error[k, i, j]
                                  + delta[k] * output[i] * output[j];
    bias_error[k] = bias_error[k] + delta[k];
}

/* Segment 4: update the weights and biases, applying the learning
   rate and a momentum term (a fraction of the previous change). */
for (k = firstunit; k <= lastunit; k = k + 1) {
    for (i = startinput[k]; i <= lastinput[k] - 1; i = i + 1)
        for (j = i + 1; j <= lastinput[k]; j = j + 1) {
            delta_weight[k, i, j] = (learnrate * weight_error[k, i, j])
                                  + (momentum * delta_weight[k, i, j]);
            weight[k, i, j] = weight[k, i, j] + delta_weight[k, i, j];
        }
    delta_bias[k] = (learnrate * bias_error[k]) + (momentum * delta_bias[k]);
    bias[k] = bias[k] + delta_bias[k];
}

Each unit's error was summed from the δ's of the units that it fed, weighted by the outputs of its conjunct pair and the conjunct weights. After all of the error and delta values were known, the changes in the weights could be determined. First, the errors were collected for each weight (third code segment above). Finally, the error terms were multiplied by the learning rate, and the momentum factor was added in (fourth code segment above); the momentum is simply a fraction of the previous weight change, and tends to filter out any "wild" swings in the weights during training. Thus the weights and biases are updated.

V. APPLICATION OF THE DEVISED NEURAL NETWORK

In the sum-of-products architecture previously discussed, the number of interconnecting weights grows rapidly as the number of units in a layer increases. For a sum-of-products network with n units feeding into a layer of m units, the number of weights between layers follows from equation (7):

number of weights = m · n(n - 1)/2.   (10)

By contrast, the same number of units in a back propagation network would have only nm weights. The ratio of the number of weights between the back propagation network and the sum-of-products network is therefore

nm / [m · n(n - 1)/2] = 2/(n - 1).   (10.1)


For example, a sum-of-products network with 15 inputs implies 105 weights to each hidden or output unit in the layer above, whereas the back propagation network would require only 15. The larger number of weights increases the real training time, due to the larger number of floating point calculations required. What is needed is an architecture that has some of the multiplicative non-linearities of the sum-of-products architecture, with fewer weights. By making the weights between the input layer and the next layer a standard back propagation configuration, the hybrid sum-of-products network is defined.

A. Architecture

In the hybrid architecture, the lowest layer (the input layer) is connected to the layer above in the standard back propagation manner. Assume that unit j is in the layer above the input layer, which contains i units. The output of unit j would be:

o_j = f( Σ_i w_ji o_i ).

Subsequent layers are connected in the sum-of-products manner. If unit k is in a layer that is fed by units i and j, which are not inputs, then

o_k = f( Σ_{i<j} w_kij o_i o_j ).

Hybrid Sum-of-Products Network: The network used in classifying the keystroke patterns was defined as follows:

1) The network consists of 15 inputs and 6 outputs, and some number (either 4 or 5) of hidden units.
2) Each output defines one of the possible output classes. The outputs are mutually exclusive (each pattern is a member of exactly one class).
3) When a pattern is placed on the inputs of the network, the pattern is classified by which output has the greatest value, i.e., "largest value wins."

The network is shown in Fig. 4. As before, the number of inputs corresponds to the number of data samples that were taken on each pattern (i.e., 15). The number of outputs, six, is the number of classes that any particular pattern could be a member of.
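A sketch of the hybrid forward pass in C (our illustration; the names and the omission of bias terms are simplifications): the input-to-hidden layer uses additive back propagation sums, and the hidden-to-output layer uses sum-of-products conjuncts.

#include <math.h>

#define NIN  15   /* input units  */
#define NHID 5    /* hidden units */
#define NOUT 6    /* output units */

void hybrid_forward(const double in[NIN],
                    const double w1[NHID][NIN],        /* additive weights */
                    const double w2[NOUT][NHID][NHID], /* conjunct weights */
                    double hid[NHID], double out[NOUT])
{
    int i, j, k;
    for (j = 0; j < NHID; j++) {            /* back propagation layer */
        double net = 0.0;
        for (i = 0; i < NIN; i++)
            net += w1[j][i] * in[i];
        hid[j] = 1.0 / (1.0 + exp(-net));
    }
    for (k = 0; k < NOUT; k++) {            /* sum-of-products layer */
        double net = 0.0;
        for (i = 0; i < NHID - 1; i++)
            for (j = i + 1; j < NHID; j++)
                net += w2[k][i][j] * hid[i] * hid[j];
        out[k] = 1.0 / (1.0 + exp(-net));
    }
}

With 5 hidden units this gives 15 × 5 = 75 additive weights plus 6 × 10 = 60 conjunct weights, matching the 135 weights reported in Table IV.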

B. Hybrid Sum-of-Products Learning and Simulation

The learning rules required for the hybrid architecture were very similar to those required for the sum-of-products architecture, and few changes were required. Since input units do not have defined error or δ values, the fact that they are connected differently has no consequence on the calculations of the error and δ values for the rest of the units. In fact, the calculations were identical to those of the sum-of-products network. The only variations were the calculations of the Δw's for the input layer. For the upper (sum-of-products) layers:

Δw_kij = η δ_k o_i o_j,

and in the lower (input) layers:

Δw_ji = η δ_j o_i.

The hybrid sum-of-products network is depicted in Fig. 4. The simulator used to simulate it is called sph, which has many similarities with the sp simulator.

Fig. 4. Hybrid Sum-of-Products Classifying Network.

Wherever the units needed to be accessed for forward or backward propagation, the following changes were made:

for (k = firstunit; k <= lastunit; k = k + 1) {
    if (startinput[k] != 0) {
        . . . sum-of-products type weight addressing (weight[k, i, j]) . . .
    } else {
        . . . back propagation type weight addressing (weight[j, i]) . . .
    }
}

If the starting input to a particular unit was not unit 0 (the first input unit), then the unit in question would not be connected to the input layer, and it would be treated as a sum-of-products unit. If the first input to a unit was unit 0, then it would be treated as a back propagation unit.


VI. RESULTS AND DISCUSSION

In order to examine each of the network architectures, simulations were performed on each type of network, using the appropriate simulation software. The simulations were run on an IBM 80486-based computer running at 33 MHz. Each of the networks was trained in the same manner: the training set was presented to the network and the weights were adjusted until the training set produced a total summed squared (TSS) error of less than 0.05. TSS was calculated as follows:

TSS = Σ_p Σ_k (t_pk - o_pk)²

where k ranges over the output units and p over the patterns in the training set. After training, the entire pattern set (240 patterns) was then presented to the network. The number of misclassified patterns was then used to figure the error rate in percent.
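A sketch of this stopping criterion in C (illustrative names), accumulating the squared output errors over the whole training set:

/* Total summed squared error over npatterns training patterns,
   each with six target/output values; training stops when the
   returned value falls below 0.05. */
double tss(const double target[][6], const double out[][6], int npatterns)
{
    double e = 0.0;
    int p, k;
    for (p = 0; p < npatterns; p++)
        for (k = 0; k < 6; k++) {
            double d = target[p][k] - out[p][k];
            e += d * d;
        }
    return e;
}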

The results of simulation for the back propagation, sum-of-products, and hybrid sum-of-products techniques are shown in

Page 7: A multilayer neural network system for computer access security

812 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 24, NO. 5 , MAY 1994

TABLE II
BACK PROPAGATION RESULTS

Training Set   Patterns per Class   Hidden Units   Number of Weights   Number of Presentations   Error Rate
odd-20.pat     1                    4              84                  1827                      7.1%
even-20.pat    1                    4              84                  1217                      7.9%
odd-5.pat      4                    4              84                  1217                      7.1%
even-5.pat     4                    4              84                  1360                      10.4%
odd-2.pat      10                   4              84                  1435                      10.8%
even-2.pat     10                   4              84                  1227                      4.6%
odd-20.pat     1                    5              105                 1662                      8.3%
even-20.pat    1                    5              105                 1413                      7.5%
odd-5.pat      4                    5              105                 1113                      5.42%
even-5.pat     4                    5              105                 1360                      4.6%
odd-2.pat      10                   5              105                 1153                      5.8%
even-2.pat     10                   5              105                 1070                      2.5%

TABLE III
SUM-OF-PRODUCTS RESULTS

Training Set   Patterns per Class   Hidden Units   Number of Weights   Number of Presentations   Error Rate
odd-20.pat     1                    4              456                 4513                      16.7%
even-20.pat    1                    4              456                 3716                      18.8%
odd-5.pat      4                    4              456                 1638                      12.1%
even-5.pat     4                    4              456                 2067                      17.1%
odd-20.pat     1                    5              585                 3021                      10.0%
even-20.pat    1                    5              585                 3122                      10.0%
odd-5.pat      4                    5              585                 1387                      6.3%
even-5.pat     4                    5              585                 2127                      11.3%

Tables II, III, and IV, respectively. The learning rate and momentum for the back propagation were both 0.5. The learning rate and momentum for the sum-of-products and hybrid sum-of-products were 0.5 and 0.3, respectively. In Tables II, III, and IV, column 1 is the training set used to train the network, while column 2 is the size of the training set. Column 3 is the number of hidden units, while column 4 is the number of weights contained in the particular network. The latter depends on the architecture being considered and the number of units; more weights require more floating point calculations during both training and operation. The Number of Presentations column gives the number of training cycles required, where all the training patterns are presented on each cycle. For example, in the first line of Table II, 1827 presentations of the training pattern set were made (1827 × 6 patterns, since odd-20.pat has 6 patterns). Training was continued until the total summed squared (TSS) error was below 0.05. The Error Rate column is the percentage of misclassified patterns.

It is clear that increasing the size of the training set, in the case of back propagation, provided the network with better performance. Even with the smallest training set, the network performed at better than 91% accuracy with either 4 or 5 hidden units, given only 6 training samples. However, a larger number of training samples may tend to overwhelm a network that does not have enough weights in it, and the network will simply memorize the training samples.

The sum-of-products network did not seem practical for this problem. Because of the large number of weights, the dimensionality of the weight-space was high, which means that there were probably more “traps” for the system to fall into. Also, a computational penalty existed when dealing with the large number of floating-point multiplications required during training.

TABLE IV
HYBRID SUM-OF-PRODUCTS RESULTS

Training Set   Patterns per Class   Hidden Units   Number of Weights   Number of Presentations   Error Rate
odd-20.pat     1                    4              96                  1494                      7.5%
even-20.pat    1                    4              96                  1291                      5.8%
odd-5.pat      4                    4              96                  961                       7.9%
even-5.pat     4                    4              96                  865                       5.4%
odd-2.pat      10                   4              96                  1151                      9.6%
even-2.pat     10                   4              96                  1148                      10.8%
odd-20.pat     1                    5              135                 1128                      6.3%
even-20.pat    1                    5              135                 1058                      6.3%
odd-5.pat      4                    5              135                 806                       3.8%
even-5.pat     4                    5              135                 790                       5.4%
odd-2.pat      10                   5              135                 899                       4.2%
even-2.pat     10                   5              135                 702                       4.2%

Setting the learning rate η and momentum for the sum-of-products (SOP) and hybrid sum-of-products (HSOP) paradigms to η = 0.5 and momentum = 0.5 caused the network to "oscillate." Lowering the momentum to 0.3 caused the oscillation to decay quickly. The fact that the momentum factor was different for the SOP and HSOP paradigms was not a penalty; actually, a smaller momentum factor would cause the learning to be slower, all other parameters being the same. Even with a lower momentum, the HSOP paradigm learned more quickly than the back propagation paradigm.

The hybrid sum-of-products showed faster learning with the larger training set than either the back propagation or the standard sum- of-products, with only a small penalty in the error rate. Because the number of weights was comparable to the back propagation network (for small numbers of hidden units) there was no significant computational penalty.

Based on examination of the raw data, there were a few patterns in each class which fell outside of the average for the class. These variations were due to users not typing the password the same way each time. The classification accuracy could be improved in a real-time system by requiring the user to enter the password two times, and averaging or "shuffling" [7] the entries. In fact, the pattern sets odd-2.pat and even-2.pat simulated this averaging, since each pattern is the average of two raw data patterns. In the back propagation and hybrid sum-of-products networks, the lowest error rates achieved using these pattern sets were 2.5% and 4.2%, respectively. Because of the computational penalties involved in training, these pattern sets were not attempted using the standard sum-of-products network. The other pattern sets suggested that a lower error rate would not be achieved with the sum-of-products network, compared to the other two architectures.

It was found that in all three paradigms, the error rate decreased when the number of hidden units was increased from 4 to 5. Evidently, four hidden units were not sufficient for the internal representations that were formed during training.

The HSOP paradigm with 5 hidden units represents a good compromise between training time and accuracy as opposed to the back propagation. Since it learned faster than the back propagation, the additional nonlinearities introduced by the multiplications of outputs seemed to be suited to the particular data set of this problem.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their useful comments and suggestions.


VII. CONCLUSION

In this paper we presented an intelligent neural network system for computer access security. We devised a new multilayer neural network for this application, called the hybrid sum-of-products. Moreover, we applied two well-known neural network paradigms, namely the back propagation and the sum-of-products. The back propagation algorithm provided results with an accuracy of 97.5%, while the sum-of-products algorithm gave a lower accuracy of 93.7%. The new multilayer neural network introduced here, the hybrid sum-of-products, gave identification results with an accuracy of 96.2%, with less training time than either of the other two. The presented neural network-based computer access security system is novel and accurate.

REFERENCES

[1] R. Hecht-Nielsen, Neurocomputing. Reading, MA: Addison-Wesley, 1990.
[2] J. Hertz, A. Krogh, and R. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991.
[3] T. Kohonen, "The 'neural' phonetic typewriter," IEEE Computer, pp. 11-22, March 1988.
[4] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proceedings of the IEEE, vol. 78, no. 9, pp. 1415-1442, September 1990.
[5] E. Posner, "Neural networks in communication," IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 145-147, March 1990.
[6] M. S. Obaidat and J. V. Walk, "An evaluation study of traditional and neural network techniques for image processing applications," in Proc. IEEE 34th Midwest Symposium on Circuits and Systems, pp. 72-75, May 1991.
[7] S. Bleha and M. S. Obaidat, "Dimensionality reduction and feature extraction applications in identifying computer users," IEEE Trans. Systems, Man, and Cybernetics, pp. 452-456, May 1991.
[8] D. Rumelhart and J. McClelland, Eds., Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[9] Y. Pao, Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley, 1989.
[10] J. Tou and R. Gonzalez, Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
[11] D. Rumelhart, G. Hinton, and R. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing, D. Rumelhart and J. McClelland, Eds. Cambridge, MA: MIT Press, 1986.
[12] M. O'Flynn, Probabilities, Random Variables and Random Processes. New York: Harper and Row, 1982.
[13] F. Rosenblatt, "The perceptron: A perceiving and recognizing automaton," Cornell Aeronautical Laboratory Report 85-460-1, 1957.
[14] J. McClelland and D. Rumelhart, Explorations in Parallel Distributed Processing. Cambridge, MA: MIT Press, 1988.

Design Support to Determine the Range of Design Parameters by Qualitative Reasoning

Masaru Ohki, Hiroshi Shinjo, Eiji Oohira and Masahiro Abe

Abstract—We present a new application of qualitative reasoning to design: suggesting valid ranges for design parameters after the structure has been determined. This design step is implemented by using an envisioning mechanism that uses qualitative reasoning to determine all possible behaviors of a system. Our method finds all possible behaviors by envisioning with design parameters whose values are initially indeterminate and with whatever specifications the designer has. If several behaviors are found, the designer selects the ones he prefers.

The design-support system Desq (Design support system based on qualitative reasoning) is based on an earlier qualitative reasoning system, Qupras (Qualitative Physical Reasoning System), with three improvements: envisioning, propagating new constraints on constant parameters, and solving constraints in parallel.

Like Qupras, the Desq system can deal with quantities qualitatively and quantitatively. Therefore, if the parameters can be expressed quanti- tatively, we may be able to determine the quantitative ranges, which are often more useful than qualitative values.

I. INTRODUCTION

Although many expert systems have recently been introduced in diverse fields of engineering, several problems still exist. One is the difficulty of building knowledge bases from the experience of human experts, and another is that these expert systems have not been able to deal with situations that cannot be predicted [12]. Reasoning methods using deep knowledge, which is the fundamental knowledge of a domain, are expected to solve these problems. Qualitative reasoning [2] determines dynamic behaviors, which are the states and state changes of a dynamic system, by using deep knowledge of the dynamic system. Another feature of qualitative reasoning is that it can deal with quantities qualitatively. So far, there have been many applications of qualitative reasoning to engineering [14]-[16]. The main application has been to diagnosis [22], [27], but recently there have also been applications to design [13], [28].

In this paper, we show a new application to design that supports decisions by suggesting valid ranges for design parameters after the structure of the designed system has been determined. Although this application is not more innovative than the previous applications [13], [28] to design, it is nonetheless an important step in design [3]. Our previous paper [20] gave an interim report of our research, and this paper gives the final one.

The key to design support is to use an envisioning mechanism, which finds possible behaviors of the dynamic system, to determine the ranges of those design parameters whose values are indetermi- nate. When the design parameters whose values a designer wants to determine are indeterminate, all possible behaviors under those indeterminate parameters can be predicted by the envisioning process. If the designer gives some specifications, the number of the possible behaviors may be reduced. In the envisioning, some hypotheses may need to be made to obtain each behavior. The main reason that hypotheses are made is that conditions written into the definitions of objects and physical rules cannot be evaluated when the design parameters are indeterminate. Among the possible behaviors obtained, more than one behavior acceptable to the designer is expected to exist.

Manuscript received August 5, 1992; revised July 12, 1993.
The authors are with the Intelligent Systems Research Department, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan.

IEEE Log Number 9400626.
