Software development cost estimation:Integrating neural network with cluster analysis
Anita Leea,*, Chun Hung Cheng1,b, Jaydeep Balakrishnan2,c
a Decision Science and Information Systems Area, School of Management, Gatton College of Business and Economics, University of Kentucky, Lexington, KY 40506-0034, USA
b Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
c Faculty of Management, University of Calgary, 2500 University Drive N.W., Calgary, Alberta T2N 1N4, Canada
Received 19 March 1997; accepted 15 March 1998
Abstract
For software project planning, control, and management, an accurate estimate of software development cost is important. Past
research has focused on using parametric models to predict development cost based on attributes such as lines of code or
function points. This requires researchers to identify the set of factors that influence cost estimation before the system is
constructed. We propose a non-parametric approach that integrates a neural network method with cluster analysis to estimate
development cost. The integration of the two techniques not only allows for a more accurate cost estimate but also leads to an
increase in the training efficacy of the network. © 1998 Elsevier Science B.V. All rights reserved
Keywords: Software development cost; Neural network; Cluster analysis; Machine learning
1. Introduction
Accurate cost estimation of a software development
effort is critical for good management decision making
in areas such as software project control, budgeting,
personnel allocation, and bidding for contracts.
An accurate cost estimate is important because a low
estimate may either cause a loss or compromise the
quality of the software developed, resulting in partially
functional or insufficiently tested software that incurs
high maintenance costs later.
However, if the cost estimate is too high, many useful
projects may not be funded, resulting in misallocation
of resources and a backlog of needed software. An
early cost estimate is equally important because the
result will have value for project management and
control only if it is provided in the early phases of the
software development life cycle, preferably during the
planning and requirement analysis rather than the
coding and testing phases. Thus, from an organizational
perspective, an early and accurate cost estimate
will reduce the possibility of organizational conflict
during the later stages.
Information & Management 34 (1998) 1–9
*Corresponding author. E-mail: [email protected]
1 E-mail: [email protected]
2 E-mail: [email protected]
0378-7206/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved
PII: S0378-7206(98)00041-X
This paper is intended to provide software managers
with a decision support tool for early cost estimation
of software development efforts. The work is moti-
vated by the need to explore innovative ways to
estimate software development cost in the 1990s
due to the increasing complexity of the problem space
as a result of advances in computer technologies,
expert system applications, and interorganizational
systems [2, 21]. A new technique integrating a neural
network method with cluster analysis is implemented
and tested using historical data and demonstrated to
show how it improves network performance. Unlike
prior approaches to software development cost esti-
mation such as size-based, function-based, or deci-
sion-tree learning based models, the new technique is
capable of distinguishing relevant cost estimation
factors from irrelevant ones. This obviates the need
to specify the set of cost estimation factors beforehand.
In addition, the cost estimates produced by our
technique achieved higher accuracy in our experimental study.
2. Literature review
Software cost is growing at an annual rate of 12%
and is expected to reach $400 billion by the year 2000
[4]. However, a significant amount of the cost, 40% or
more, is devoted to the maintenance of existing software
instead of the development of much-needed new products
[5]. In addition, software cost is related closely to
software quality and productivity. Unrealistically low
cost estimates frequently lead to poor product quality
and low project productivity [11]. Therefore, the
importance of understanding software cost has moti-
vated considerable research to identify factors that
in¯uence software costs. This has yielded a number of
software cost estimation models.
2.1. Size-based models
Size-based models consider project size measured
in lines of code (LOC) or thousands of lines of code
(KLOC) to be the primary factor affecting software
cost estimation. There are two parts to these cost
estimation models. One provides a base estimate of
development effort as a function of software size; it is
of the form:
E = A + B × (KLOC)^C

where E is the estimated effort in man-months, and A, B,
and C are constants.
The second modifies the base estimate by taking
into account environmental factors such as the methods
used (top-down design, structured code), personnel
experience and ability, etc.
Typical models of this kind include the Walston–Felix
model [20], the Doty model [8], the Bailey–Basili
model [1], and Boehm's Constructive Cost Model
(COCOMO) [3]. Among these models, COCOMO is
the most widely known and studied. Table 1 provides
an overview of each of these models in its base
estimate form.
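The base estimate can be sketched directly from the formula above, with constants drawn from Table 1; the function and parameter names below are our own illustration, not the authors' notation.

```python
# Sketch of the size-based base estimate E = A + B * (KLOC)^C.
# Constants follow Table 1; models with no additive constant use a = 0.

def base_effort(kloc, a, b, c):
    """Return the estimated effort E in man-months for a size in KLOC."""
    return a + b * kloc ** c

# Bailey-Basili base estimate for a 20-KLOC project (A=5.5, B=0.73, C=1.16)
effort = base_effort(20.0, a=5.5, b=0.73, c=1.16)
```

The second part of such a model would then scale this base estimate by the environmental adjustment factors.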
The problems with using LOC as an estimate of
project size include:
1. there is no accepted definition of LOC, and few
researchers specify the line-counting rules used,
resulting in variation and uncertainty;
2. LOC is language dependent; fewer LOC may be
required for a higher-level language than for a
lower-level language, yet the time per line is greater,
making it difficult to compare projects that use
different languages directly;
3. it is difficult to estimate LOC. In an experiment
performed by Yourdon, in which experienced managers
estimated the size of 16 projects based on each
project's specification, the discrepancy between
estimated and actual project size ranged from
−210% to 83%, as shown in Table 2 [10];
4. LOC places undue emphasis on coding, which
accounts for only 10 to 15% of the total effort in
software development [6]. Hence, factors other
than size should be considered in estimating
software development cost.
Table 1
An overview of size-based models

Model name           A     B      C
Walston–Felix        –     5.2    0.91
Bailey–Basili        5.5   0.73   1.16
Boehm basic          –     3.2    1.05
Boehm intermediate   –     3.0    1.12
Boehm advanced       –     2.8    1.2
Doty                 –     5.288  1.047

(Constants for the base estimate E = A + B × (KLOC)^C; "–" denotes A = 0.)
2.2. Function-based models
Function-based models use other counts than
LOC in estimating software development cost.
`Function Points', as defined by Albrecht of IBM in
1979, involve a process called function point analysis
(FPA) [14]. Albrecht's FPA involves the following
steps:
1. Identify the major system components: external
inputs, external outputs, logical internal files,
external interface files, and external inquiries.
2. Classify each component as `simple', `average', or
`complex' depending on the number of interacting
data elements and other factors.
3. Calculate the unadjusted function points (UFP)
using the following table of weights:
                          Complexity weight
Function type             Simple   Average   Complex
External input            ×3       ×4        ×6
External output           ×4       ×5        ×7
Logical internal file     ×7       ×10       ×15
External interface file   ×5       ×7        ×10
External inquiry          ×3       ×4        ×6

The count of each function type at each complexity level is
multiplied by its weight, and the products are summed to give
the total unadjusted function points.
4. Adjust the unadjusted function points for application
and environment complexity through a measure
called the complexity adjustment factor
(CAF), i.e., function points = UFP × CAF.
CAF is calculated by using the formula:

CAF = 0.65 + 0.01 × N

where N is the total degree of influence (DI) of 14
characteristics: data communications, distributed
processing, performance objectives, configuration
load, transaction rate, on-line data entry, end-user
efficiency, on-line update, complex processing,
reusability, installation ease, operational ease, multiple
sites, and change facilitation. DI takes a value from 0
(no influence) to 5 (strongest influence).
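Steps 3 and 4 above amount to a weighted sum followed by the CAF adjustment. A minimal sketch, in which the dictionary layout and function name are our own assumptions rather than part of the original method:

```python
# Illustrative arithmetic for Albrecht's FPA steps above. The weight
# table mirrors the UFP table in the text.

WEIGHTS = {
    "external input":          {"simple": 3, "average": 4, "complex": 6},
    "external output":         {"simple": 4, "average": 5, "complex": 7},
    "logical internal file":   {"simple": 7, "average": 10, "complex": 15},
    "external interface file": {"simple": 5, "average": 7, "complex": 10},
    "external inquiry":        {"simple": 3, "average": 4, "complex": 6},
}

def function_points(counts, total_di):
    """counts maps (function_type, complexity) to a component count;
    total_di is N, the sum of the 14 degree-of-influence ratings (0-5 each)."""
    ufp = sum(WEIGHTS[ft][cx] * n for (ft, cx), n in counts.items())
    caf = 0.65 + 0.01 * total_di      # complexity adjustment factor
    return ufp * caf                  # function points = UFP x CAF
```

For example, two simple external inputs and three average external outputs with N = 35 give UFP = 2×3 + 3×5 = 21 and CAF = 1.00, i.e., 21 function points.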
There are several problems with using FPA, includ-
ing:
1. it is designed for business applications and is not
appropriate for scientific or technical applications
in which complex algorithms are involved;
2. the validity of the method for general objective
assessment of system costs is questionable, because
many elements, such as the weighting factors,
component complexity, complexity factors, and
degrees of influence, are subjectively developed
for a particular environment only; and
3. systems of high internal complexity are not ade-
quately considered.
2.3. Learning-based models
Both size-based and function-based models are
parametric; they use a function/formula of ®xed form
for software cost estimation [18]. Assumptions about
the form of the function are needed. Also, the function
developed is static, i.e., the factors and their corre-
sponding degree of in¯uence on cost estimation are
®xed. More importantly, the set of in¯uential factors
on cost estimation is identi®ed before the model can be
constructed. Learning-based models are developed to
overcome these problems. These models make no
assumptions about the form of the function under
study and they are capable of learning incrementally
as new data are provided over time. In addition, the
availability of historical data on this problem domain
makes it particularly suitable for the application of a
type of machine learning technique called `learning by
Table 2
The discrepancy between estimated and actual project size

Project   Actual    Predicted   Actual − predicted   % Difference
1         70900     34700       36200                51%
2         129000    32100       96900                75%
3         23000     22000       1000                 4%
4         34600     9100        25500                74%
5         23000     12000       11000                48%
6         25000     7300        17700                71%
7         52100     28500      23600                 45%
8         7650      8000        −350                 −5%
9         25900     30600       −4700                −18%
10        16300     2720        13580                83%
11        17400     15300       2100                 12%
12        33900     105000      −71100               −210%
13        57200     18500       38700                68%
14        21000     35400       −14400               −69%
15        8640      3650        4990                 58%
16        17500     2950        14550                83%
example'. Learning by example attempts to infer or
generalize regularities from specific instances of a
concept. Thus, successfully applying this type of
example-driven learning technique requires the
provision of domain-specific knowledge in the
form of training and testing data sets. The background
of this learning technique and a review of
its recent applications can be found in Refs. [19] and [9],
respectively.
2.3.1. Decision tree learning models
These models construct a decision tree for software
cost estimation [15, 16]. The nodes of the tree repre-
sent attributes that best divide the data into disjoint
groups. The leaves of the tree represent the average
cost of software development. By descending the tree
along an appropriate path, the cost of software devel-
opment can be determined.
Relevant attributes for cost estimation are identified
from previous efforts. Data on these attributes is
accumulated to allow the construction of a decision
tree through a process called recursive-partitioning
regression in which the best `divisive' attribute is
selected to partition the data into subsets. The process
is recursively repeated on these subsets as the tree is
expanded until no further partitioning is feasible.
Various attribute selection measures have been proposed.
For example, ID3 selects the most informative
attribute based on a measure that minimizes the
following function:

E(A) = − sum_{i=1..V} (S_i / S) × sum_{j=1..N} (k_ji / S_i) log2(k_ji / S_i)

where V is the number of values for attribute A, k_ji the
number of examples in the jth category with the ith
value for attribute A, S the total number of examples, S_i
the number of examples with the ith value for attribute
A, and N the number of categories.
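The ID3 selection measure can be transcribed directly from the formula above; the nested-list data layout is our own assumption.

```python
# Direct transcription of the ID3 attribute-selection measure E(A).
from math import log2

def id3_measure(k):
    """k[i][j] = number of examples with the i-th value of attribute A
    that fall in the j-th category. Returns the expected entropy E(A);
    ID3 selects the attribute that minimizes this value."""
    s = sum(sum(row) for row in k)       # S: total number of examples
    e = 0.0
    for row in k:                        # one row per attribute value i
        s_i = sum(row)                   # S_i
        if s_i == 0:
            continue
        # inner sum; empty categories (k_ji = 0) contribute 0 by convention
        inner = sum((kji / s_i) * log2(kji / s_i) for kji in row if kji)
        e -= (s_i / s) * inner
    return e
```

A perfectly discriminating attribute yields E(A) = 0, while an attribute whose values split each category evenly yields the maximum entropy.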
One problem with this type of learning model is that
the set of relevant attributes must be identified
beforehand. A neural network, with its ability to differentiate
relevant from irrelevant attributes, offers a more flexible
approach to estimating software development
cost. Moreover, empirical evidence suggests that a
learning procedure based on a neural network often
outperforms decision trees in terms of prediction
accuracy [17].
2.3.2. Neural network learning models
These models are built on networks of processing
units called neurons that are arranged in layers and are
connected to one another by restricted links (see [12]).
Links between neurons have associated weights. Each
neuron in the network computes a non-linear function
of its inputs, called an activation function. The
most common activation function is:

1 / (1 + exp(− sum_i W_i I_i))

where sum_i W_i I_i is the weighted sum of the inputs I_i to
the neuron, with W_i the weight on input i. The resultant
value is passed along to the next layer after being
multiplied by the connecting weight. This process is
repeated all the way from the input layer to the output
layer. The goal is to generate an accurate mapping
between input (project attributes) and output (software
development cost) patterns.
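The forward computation just described can be sketched as follows; the layer representation and function names are our own illustration, not the authors' implementation.

```python
# Minimal sketch of a feed-forward pass: each neuron applies the
# logistic activation 1 / (1 + exp(-sum(W_i * I_i))) to the weighted
# sum of its inputs.
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def forward(inputs, layers):
    """layers is a list of weight matrices; layers[l][j] holds the
    weights from every neuron in layer l to neuron j of the next layer."""
    activations = inputs
    for weights in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in weights
        ]
    return activations
```

With zero weights each neuron outputs sigmoid(0) = 0.5, which makes the mapping easy to check by hand.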
Different learning procedures have been proposed
to train the network to generate appropriate output
patterns for corresponding input patterns. One of the
most commonly used is called back-propagation, in
which the weights are modified in such a way as to
reduce the error between actual and correct outputs on
sample patterns. The error is determined by comparing
the network's actual output pattern with an a priori
known output pattern. The difference or error between
the two is `back-propagated' through the net by modi-
fying the weights (see [7, 13] for recent business
applications of back-propagation neural networks).
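A single back-propagation update for one output neuron can be sketched as below; this is the textbook gradient-descent step for the logistic activation, and the learning rate and variable names are our assumptions, not taken from the paper.

```python
# One back-propagation step for a single sigmoid output neuron:
# the output error (target - output) is propagated back to adjust
# each weight in proportion to its input.
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def train_step(weights, inputs, target, lr=0.5):
    """Gradient-descent update: w_i <- w_i + lr * (t - o) * o * (1 - o) * I_i."""
    o = sigmoid(sum(w * i for w, i in zip(weights, inputs)))
    delta = (target - o) * o * (1 - o)    # error signal at the output
    return [w + lr * delta * i for w, i in zip(weights, inputs)]
```

In a multi-layer network the same delta is propagated backwards through the hidden layers, each weight receiving a share of the error proportional to its contribution.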
Srinivasan and Fisher [18] point out that the performance
of neural network approaches is very sensitive
to configuration choices, such as the number of hidden
units, the stopping criteria, and the initial weight
settings. The appropriate settings can only be determined
empirically. Thus, the manner in which the network
should be trained is a concern. We give here a new
approach that integrates neural network methods with
cluster analysis to improve both training efficacy and
network performance.
3. The approach
Our approach involves two phases: the first groups
similar projects together by cluster analysis to facilitate
the training of the neural network in the second
phase. Cluster analysis is designed to identify similar
objects in an n-dimensional space, where n is the
number of descriptive attributes of the objects. When
applied to the problem domain of software development
cost estimation, it is assumed that similar projects
share similar development costs. The similarity
among different projects, once computed, can then be
used as a valuable piece of input information to
enhance the training efficiency of the network.
3.1. Cluster analysis
Projects are grouped together into clusters based on
a similarity measure termed a resemblance coefficient.
Two kinds of coefficients are computed, depending on
the types of attributes. For quantitative attributes, the
average Euclidean or RMS distance between two
projects in an n-dimensional space is used. It is defined
as:

d_jk = sqrt( sum_{i=1..n} (x_ij − x_ik)^2 / n )

where d_jk is the average Euclidean distance between
projects j and k, x_ij the value of project j's attribute i, x_ik
the value of project k's attribute i, and n the number of
quantitative attributes. The average Euclidean distance
is, in fact, a measure of dissimilarity between
two projects: the smaller the value of the coefficient,
the more similar the two projects.
For nominal attributes, the Jaccard coefficient is
used and is defined as:

C_jk = 1 − N(1−1) / (2 × N(Data) − N(1−1))

where C_jk is the Jaccard coefficient of projects j and k,
N(1−1) the number of matches between projects j and
k over all nominal attributes, and N(Data) the total
number of nominal attributes. Like the average Euclidean
distance, a smaller value of the Jaccard coefficient
indicates a higher similarity between two
projects. Since the two coefficients have different
ranges of values, they are converted to standard scores
using the standard score method before being combined.
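The two resemblance coefficients and the standard-score conversion can be sketched as follows; the helper names and data layout are our own assumptions.

```python
# Illustrative computation of the two resemblance coefficients and
# the z-score standardization used before combining them.
from math import sqrt
from statistics import mean, stdev

def avg_euclidean(x_j, x_k):
    """RMS distance over n quantitative attributes (smaller = more similar)."""
    n = len(x_j)
    return sqrt(sum((a - b) ** 2 for a, b in zip(x_j, x_k)) / n)

def jaccard_dissimilarity(nom_j, nom_k):
    """1 - N(1-1) / (2*N(Data) - N(1-1)) over nominal attributes."""
    n_data = len(nom_j)
    n_match = sum(a == b for a, b in zip(nom_j, nom_k))
    return 1 - n_match / (2 * n_data - n_match)

def standardize(values):
    """Convert a list of coefficients to standard scores (z-scores)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]
```

Identical projects score 0 on both coefficients; standardizing each coefficient across all project pairs puts the two measures on a common scale before they are combined.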
A tree is then constructed from the combined
resemblance coefficients by using a hierarchical clustering
technique, the unweighted pair-group method.
This iteratively merges the two most similar
objects into one new `object' until all objects
are clustered. Various ways of forming the clusters can
be read off from the tree. Our strategy is to cut the tree
at the point where the range of the resemblance
coefficient is the largest, because a large range in
the value of the resemblance coefficient indicates that
the resulting clusters are well separated in the attribute
space.
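The unweighted pair-group (average-linkage) agglomeration can be sketched as a naive loop; in practice a library routine would be used, and all names below are ours.

```python
# Bare-bones average-linkage agglomeration: repeatedly merge the two
# clusters with the smallest average pairwise distance.

def upgma(dist):
    """dist: symmetric matrix of resemblance coefficients.
    Returns merge steps as (cluster_a, cluster_b, height) tuples."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        # find the pair of clusters with the smallest average distance
        for ai, a in enumerate(keys):
            for b in keys[ai + 1:]:
                d = sum(dist[p][q] for p in clusters[a] for q in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append(best)
        clusters[a] = clusters[a] + clusters.pop(b)   # merge b into a
    return merges
```

The sequence of merge heights is what the tree records; cutting where consecutive heights jump the most yields well-separated clusters, as described above.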
3.2. Neural network
In phase two, a neural network is first trained, based
only on attributes given in the project description, to
determine the appropriate settings for the following
network configuration: (1) the number of neurons per
layer; (2) the size and selection of training and testing
data; and (3) the choice of the activation function.
These network configuration parameters can only be
determined empirically, as different problem domains
require different settings. Hence, the network is
trained twice, with the intent that the best configuration
choice will be decided in the first round. Then the
information from phase one (the cluster analysis) and
from the preliminary neural network is fed as input to a
second round of neural network training to complete
the task of software development cost estimation. Our
experimental study indicates that the proposed
approach can lead to improved network performance.
4. Experimental study
The approach was tested using the COCOMO
dataset. Based on a regression analysis of 63 projects,
Boehm developed three forms of COCOMO:
basic, intermediate, and advanced. The basic model
produces a base estimate of development effort using
KLOC only; the intermediate model adds 15 qualitative
cost drivers to improve the base estimate. These
cost drivers are classified into four categories: software
product attributes, computer attributes, personnel
attributes, and project attributes, as shown in Table 3.
The advanced model assesses the cost drivers at each
development phase. In addition to the 15 cost drivers,
the COCOMO dataset also has other attributes, giving
a total of 39 descriptive project attributes. A complete
list is included in Appendix A.
4.1. Cluster analysis
In phase one, 24 of the 39 attributes were selected as
critical cost-determining factors for cluster analysis.
These attributes are ones that are used in the inter-
mediate and advanced models of COCOMO. The 63
projects were selected in six different ways (as shown
in Table 4) to serve as data for cluster analysis.
These six ways of clustering were compared using a
common set of testing data. Assuming that two pro-
jects sharing similar project attributes will have simi-
lar software development cost, each project in the
testing set was matched with a cluster, and its software
development cost was estimated as the ranked-sum
mean of the costs of all the projects in that cluster. The
error between the estimated cost and the actual cost
could then be measured. The average percentage
estimation error was then used as the basis for select-
ing the `best' way of clustering all 63 projects of the
COCOMO data. According to the results reported in
Table 5, the projects should be clustered in ways
suggested by using DATA25-3, which yields the low-
est average error in testing. This additional clustering
information, i.e., to which cluster each project
belonged, was passed to phase two.
4.2. Neural network
The appropriate network configuration choices and
their sensitivity to various input data and activation
functions were then analyzed in a series of three
experiments. The first determined the best network
configuration among six different settings. The settings
are denoted by three numbers of the form
m : n : o, where m is the number of neurons in the
input layer, n the number of neurons in the hidden
layer, and o the number of neurons in the output layer.
The same set of training and testing data (involving all
63 projects) was used. Of all the projects, 50 were
randomly selected as training data and the remaining
13 were used for testing. All 39 project attributes could
be used without screening, since the neural network
approach has the ability to discern relevant attributes
from irrelevant ones. The second experiment determined
the sensitivity of the network to input data
by training the network using three different sets of
training and testing data. The last combined the results
of the first two to finalize the best setting of the
network configuration. The results of each experiment
are shown in Tables 6–8. The best configuration
was found to be 20 : 15 : 1, trained by using 41
projects (34 for training and 7 for testing). The 41
projects were selected from the original 63 by eliminating
extreme cases, so that the range of the actual
development effort is reduced from 11 400 to 440
man-months.
Table 3
Cost drivers in intermediate COCOMO
Product attributes Required software reliability
Database size
Product complexity
Computer attributes Execution time constraint
Main storage constraint
Virtual machine volatility
Computer turnaround time
Personnel attributes Analyst capability
Applications experience
Programmer capability
Virtual machine experience
Programming language experience
Project attributes Modern programming practices
Use of software tools
Required development schedule
Table 4
Six datasets for cluster analysis
Dataset Means of selection
DATA50 Select 50 projects randomly
DATA34 Select 41 projects, excluding extreme
cases based on actual man-months
DATA25-1 Select 25 projects from DATA34 randomly
DATA25-2 Select 25 projects from DATA34 randomly
DATA25-3 Select 25 projects from DATA34 randomly
DATA25-4 Select 25 projects from DATA34 randomly
Table 5
Performance comparison of the cluster analysis datasets
Dataset Average % error
DATA50 57%
DATA34 39%
DATA25-1 37%
DATA25-2 29%
DATA25-3 26%
DATA25-4 36%
The cost estimates obtained from both the cluster
analysis and the preliminary network analysis were used
as additional input attributes to train the neural network
a second time, using the best configuration
choices found earlier. In other words, the 41 project
cases (34 training and 7 testing) were used to train a
neural network of configuration 20 : 15 : 1. The performance
of the integrated network was compared to
that of the network without integration by using four
different sets of testing data. Significant improvement
in network performance in terms of estimation accuracy
was found in all four cases, as shown in Table 9.
5. Conclusion
We demonstrated in this paper that integrating a
neural network with cluster analysis is a viable and
promising approach to providing relatively accurate
estimates of software development cost. By integrating
the neural network with cluster analysis, one can
increase the training efficacy of the network, resulting
in a more accurate cost estimate than a pure
neural network approach provides. The estimates are
derived early in the software development life cycle, so
that appropriate software project management and
control can be exercised.
Acknowledgements
Dr. Balakrishnan's research is supported by the
Natural Sciences and Engineering Research Council
(NSERC) of Canada.
Appendix A
Project attributes in the COCOMO dataset
Project attributes
1 Project type
2 Year developed
3 Programming languages
4 Required software reliability
5 Database size
6 Product complexity
7 Adaptation adjustment factor
Table 6
Result of Experiment 1

Network configuration (i : j : k)   Best average % error   No. of iterations
10 : 0 : 1                          495%                   100000
15 : 0 : 1                          346%                   40000
20 : 0 : 1                          494%                   1000
10 : 5 : 1                          148%                   9000
15 : 10 : 1                         238%                   5000
20 : 15 : 1                         382%                   1000

Note: i is the number of neurons in the input layer, j the number of
neurons in the hidden layer, and k the number of neurons in the
output layer.
Table 7
Result of Experiment 2 (best average % error)

Network         63 projects      47 projects      41 projects
configuration   (50 training,    (38 training,    (34 training,
                13 testing)      9 testing)       7 testing)
10 : 0 : 1      495%             77%              281%
15 : 0 : 1      346%             321%             186%
20 : 0 : 1      494%             92%              101%
10 : 5 : 1      148%             78%              62%
15 : 10 : 1     238%             65%              42%
20 : 15 : 1     382%             109%             36%
Table 8
Result of Experiment 3

Network configuration   Best average % error (41 projects)
10 : 0 : 1              281%
15 : 0 : 1              186%
20 : 0 : 1              101%
10 : 5 : 1              62%
15 : 10 : 1             42%
20 : 15 : 1             36%
30 : 20 : 1             178%
40 : 30 : 1             67%
50 : 40 : 1             156%
Table 9
Network performance comparisons on best average % error

Testing cases   Pure NN   NN + cluster   Improvement
Set 1           36%       32%            12%
Set 2           37%       23%            37%
Set 3           62%       30%            51%
Set 4           52%       35%            33%
8 Execution time constraint
9 Main storage constraint
10 Virtual machine volatility
11 Computer turnaround time
12 Type of computer used
13 Analyst capability
14 Project team experience
15 Programmer capability
16 Virtual machine experience
17 Programming language experience
18 Personnel continuity on project
19 Modern programming practices
20 Software tools
21 Required development schedule
22 Requirement volatility effort multipliers
23 Effort multipliers
24 Software development mode
25 Total delivered source instructions in thousands
26 Adjusted delivered source instructions in thousands
27 Nominal man-months
28 Intermediate estimated man-months
29 Percentage estimation error in man-months estimation
30 Project productivity
31 Estimated development time in months
32 Percentage estimation errors for months estimation
33 Detailed estimated man-months
34 Percentage estimation error for detailed estimated man-
months
35 Normalized effort parameter
36 Basic estimated man-months
37 Basic estimation error ratio
38 Thousands of pages of project documentation
39 Pages of documentation per thousand source instructions
References
[1] J.W. Bailey, V.R. Basili, A meta-model for software development resource expenditures, Proceedings of the Fifth International Conference on Software Engineering, 1981, pp. 107–116.
[2] F. Bergeron, J. St-Arnaud, Estimation of information systems development efforts: A pilot study, Information and Management 22(4), 1992, pp. 239–254.
[3] B.W. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
[4] B.W. Boehm, P.N. Papaccio, Understanding and controlling software costs, IEEE Transactions on Software Engineering 14(10), 1988, pp. 1462–1477.
[5] F.P. Brooks, The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, MA, 1982.
[6] R.D. Ermick, In search of a better metric for measuring productivity of application development, Proceedings of the Function Point Users Group Conference, 1987.
[7] D. Fletcher, E. Goss, Forecasting with neural networks: An application using bankruptcy data, Information and Management 24(3), 1993, pp. 159–167.
[8] J.R. Herd, J.N. Postak, W.E. Russel, K.R. Stewart, Software Cost Estimation Study: Study Results, Technical Report RADC-TR-77-220, Doty Associates, Inc., Rockville, MD, 1977.
[9] P. Langley, H.A. Simon, Applications of machine learning and rule induction, Communications of the ACM 38(11), 1995, pp. 55–64.
[10] L.A. Laranjeira, Software size estimation of object-oriented systems, IEEE Transactions on Software Engineering 16(5), 1990, pp. 510–522.
[11] W.E. Lehder Jr., D.P. Smith, W.D. Yu, Software estimation technology, AT&T Technical Journal, 1988, pp. 10–18.
[12] E.Y. Li, Artificial neural networks and their business applications, Information and Management 27, 1994, pp. 303–313.
[13] R.W. Lodewyck, P.S. Deng, Experimentation with a back-propagation neural network: An application to planning and user system development, Information and Management 24(1), 1993, pp. 1–9.
[14] G.C. Low, D.R. Jeffery, Function points in the estimation and evaluation of the software process, IEEE Transactions on Software Engineering 16(1), 1990, pp. 64–71.
[15] A. Porter, R. Selby, Empirically guided software development using metric-based classification trees, IEEE Software 7(5), 1990, pp. 46–54.
[16] R. Selby, A. Porter, Learning from examples: Generation and evaluation of decision trees for software resource analysis, IEEE Transactions on Software Engineering 14, 1988, pp. 1743–1757.
[17] J.W. Shavlik, R.J. Mooney, G.G. Towell, Symbolic and neural learning algorithms: An experimental comparison, Machine Learning 6(2), 1991, pp. 111–143.
[18] K. Srinivasan, D. Fisher, Machine learning approaches to estimating software development effort, IEEE Transactions on Software Engineering 21(2), 1995, pp. 126–136.
[19] K.Y. Tam, Automated construction of knowledge-bases from examples, Information Systems Research 1(2), 1990, pp. 144–167.
[20] C.E. Walston, C.P. Felix, A method of programming measurement and estimation, IBM Systems Journal 16(1), 1977, pp. 54–73.
[21] Y. Yoon, T. Guimaraes, Selecting expert system development techniques, Information and Management 24(4), 1993, pp. 209–223.
Anita Lee is an Associate Professor of
the Decision Science and Information
Systems area at the University of Ken-
tucky. She received her Ph.D. in Business
Administration from the University of
Iowa in 1990. Her research interests
include artificial intelligence, machine
learning, knowledge-based systems,
computer integrated manufacturing, and
group technology. She has published
extensively in numerous refereed jour-
nals including Annals of Operations Research, Expert Systems,
IEEE Expert, International Journal of Production Research, etc.
She is currently an associate editor for Journal of Database
Management.
Chun Hung Cheng obtained his Ph.D.
in Business Administration from the
University of Iowa and started his
teaching career at Kentucky State Uni-
versity. He returned to Hong Kong in
1994 and is now an Associate Professor
at the Chinese University of Hong Kong.
He conducts research in Information
Systems and Operations Management.
His research articles have appeared in
journals including Annals of Operations Research, Expert Systems,
Expert Systems with Applications, IEEE Transactions on Man,
Systems, and Cybernetics, IIE Transactions, International Journal
of Production Research, Operations Research, etc.
Jaydeep Balakrishnan is currently
Associate Professor of Operations Man-
agement in the Faculty of Management
at the University of Calgary. He has a
Ph.D. from Indiana University and an
MBA from the University of Georgia,
both in Operations Management. His
undergraduate degree is in Mechanical
Engineering from Nagpur University in
India. He has also worked for the
automobile industry in India. Dr. Balakrishnan's research interests
include facility layout. He has published in journals including
Management Science, The European Journal of Operational
Research, and OMEGA. He has also presented papers at various
international conferences. During 1995–96 he was a Visiting
Scholar at the Chinese University of Hong Kong.