
Computers in Human Behavior, Vol. 5, pp. 47-60, 1989 0747-5632/89 $3.00 + .00 Printed in the U.S.A. All rights reserved. Copyright © 1989 Pergamon Press plc

Computer-Aided Decisions in Human Services: Expert Systems and Multivariate Models

Fiore Sicoly

Abstract--Two approaches to the development of computerized supports for decision making are compared. Expert systems attempt to codify the formal and heuristic knowledge of human experts in the form of rules. However, estimates of relationships and probabilities provided by experts are prone to error and distortion. In contrast, multivariate procedures derive rules and relationships empirically by using the accumulated evidence from hundreds of cases on a data base. Results from several fields have consistently demonstrated that conclusions generated using statistical models are equal or superior to the decisions made by clinicians. The major strength of expert systems is the use of natural language and explanation facilities which make them more intelligible to the user. Combining this aspect of expert systems with the power of multivariate procedures may allow for the development of an approach that achieves optimal performance but is also more acceptable and accountable to the user.

Decision support systems in human services are now receiving more attention because of recent advances in computer technology. Decision aids make use of the computer's memory, reliability, and processing capabilities to complement the skill of professionals in making complex decisions. This paper considers two approaches to the development of computerized supports for decision making: expert systems and multivariate models. Despite differences in implementation, the logic underlying these two approaches is much the same, especially when they are applied to problems of diagnosis or classification. It is proposed that an integration of these procedures, exploiting their unique strengths, would enhance both the performance and acceptability of computer-aided supports to decision making.

EXPERT SYSTEMS

Expert or knowledge-based systems have their origin in artificial intelligence. Such systems concentrate on capturing the knowledge of a human expert with the goal of developing a computer program that achieves performance levels comparable to the expert. Expert systems are designed to perform the same role as the human expert consultant. They provide advice in situations where specialized knowledge and experience are needed. There are two major components of an expert system, the knowledge-base and inference engine. The knowledge-base comprises facts or beliefs in some domain of interest, written generally as a collection of rules consisting of if...then statements. The if portion of the statement specifies an antecedent condition and the then portion specifies the consequence or conclusion that can be deduced to be true. Many existing expert systems contain hundreds of rules. MYCIN, a system used in medical diagnosis and one of the first to be developed, has approximately 500 rules. Many basic features of expert systems were worked out by MYCIN developers.

Requests for reprints should be addressed to Fiore Sicoly, East York Board of Education, 840 Coxwell Avenue, Toronto, Ontario, Canada M4C 2V3.

The inference engine (or rule interpreter) provides the general procedures the system uses in searching through the knowledge-base. The program refers to these rules as it attempts to arrive at a diagnosis or classification for a particular case. If there is a match between descriptive data for the case and specific if statements in the knowledge base, the relevant then statements are inferred to be true.
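The match-and-infer cycle described above can be sketched in a few lines of modern code. This is only an illustration of the general mechanism, not the machinery of any system named in this article; the rule contents and fact names are invented.

```python
# Hypothetical sketch of a forward-chaining inference engine: each rule
# pairs a set of antecedent conditions (the "if" part) with a conclusion
# (the "then" part). Whenever every antecedent of a rule is present among
# the case's facts, the conclusion is added as a new fact, and the cycle
# repeats until nothing further can be inferred.

RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "order_lumbar_puncture"),
]

def infer(case_facts, rules):
    """Apply rules repeatedly until no new conclusions can be drawn."""
    facts = set(case_facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if antecedents <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(infer({"fever", "stiff_neck"}, RULES)))
```

Note that the second rule fires only because the first one has deposited its conclusion into the fact set, which is the chaining behavior an inference engine provides over a flat rule list.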

Expert systems have been developed for use in several areas. Some of the better known expert systems include DENDRAL, which helps to determine the structure of complex organic molecules, INTERNIST/CADUCEUS, which contains rules linking over 600 diseases to thousands of symptoms, and PROSPECTOR, which was built to aid geologists in evaluating mineral sites. One area of human service, other than medicine, that has embraced expert system technology is that of educational diagnosis. Colbourn and McLeod (1983) reported the development of a system that takes the user from the initial suspicion that a reading problem may exist through to the formulation of an appropriate remedial strategy. Hofmeister and Lubke (1986) referred to a prototype expert system that generates behavior management advice for teachers. In 1984, an Artificial Intelligence (AI) Research and Development Unit was established at Utah State University for the purpose of exploring AI applications to special education. Working out of this unit, Ferrara and Hofmeister developed an expert system, CLASS.LD, to provide a second opinion about the accuracy of placement decisions for learning disabled students. According to Martindale, Ferrara, and Campbell (1987), an updated version of this program, CLASS.LD2, contains approximately 600 if-then rules in its knowledge base and provides conclusions with associated certainty factors. Recently, a special issue of Educational Technology (Tennyson & Ferrara, 1987) was devoted to exploring the current and future directions of AI in education.

MULTIVARIATE MODELS

Statistical procedures have also been used to support the decision making process. The most common of these are Bayesian techniques, which have been popular in medicine, and multivariate techniques such as discriminant function or multiple regression analysis. This article concentrates on multivariate methods.

Multivariate procedures have the capacity to summarize, in a single index, information from a large number of variables and hundreds of cases. The selection of relevant variables and the development of a reliable measurement process are essential to the success of the multivariate analysis. A structured instrument is used to collect a consistent base of information for a sample of cases. The formation of two or more groups is possible by dividing cases where the outcome or decision of interest has occurred from cases where the outcome has not occurred. Results of the analysis identify how information from the descriptor variables should be combined to achieve optimal discrimination between groups.

The adequacy of the statistical model is tested using a different sample of cases (the validation sample). Accuracy is examined by comparing the model's predictions with actual case outcomes. Following this validation process, the discriminant function(s) or regression equation that has been derived can be used to classify new cases of unknown group membership. Once descriptive information about the new case has been collected for the same variables used in the linear equation, these values can be entered into the equation in order to generate the probability that the case belongs to a particular group. In essence, the program compares the profile of new cases with completed or closed cases already on the data base. One application of this approach is the Missouri Standard System of Psychiatry (Hedlund, Sletten, Evenson, Altman, & Cho, 1977), which routinely uses large sample multivariate procedures with split-half derivation and validation subsamples. Input variables have included 73 admission face sheet items and 111 mental status items. The 20 best predictor variables are selected from stepwise multivariate analyses. The Missouri data base has been used to predict psychiatric diagnosis, the assignment of drug treatment, unauthorized patient absence, and length of stay.
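The classification step described above (comparing a new case's profile with the profiles of known groups) can be illustrated with a deliberately simplified sketch. All measurements and group labels here are invented, and the nearest-centroid rule below omits the pooled-covariance weighting that a full discriminant analysis would apply; it shows only the shape of the procedure.

```python
import math

# Invented data: cases from two known outcome groups, each described by
# the same two measured variables. A new case of unknown membership is
# assigned to the group whose mean profile (centroid) it most resembles.
# A full discriminant analysis additionally weights variables by the
# pooled within-group covariance; this sketch omits that step.

group_a = [[2.0, 3.1], [1.8, 2.9], [2.2, 3.3]]  # outcome occurred
group_b = [[5.1, 1.0], [4.9, 1.2], [5.3, 0.8]]  # outcome did not occur

def centroid(cases):
    """Mean value of each variable across a group's cases."""
    return [sum(col) / len(cases) for col in zip(*cases)]

def classify(case, groups):
    """Label of the group whose centroid lies closest to the new case."""
    return min(groups, key=lambda g: math.dist(case, centroid(groups[g])))

groups = {"A": group_a, "B": group_b}
print(classify([2.1, 3.0], groups))
```

The essential point carried over from the text is that the rule is derived entirely from accumulated cases: nothing about group A or group B was specified a priori.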

DIMENSIONS OF EXPERT SYSTEMS AND MULTIVARIATE MODELS

The focus of this article is on computerized systems that assist professionals with tasks related to diagnosis or classification in human services. According to Duda and Shortliffe (1983), the simplest and most successful expert systems are classification programs. Their purpose is to evaluate descriptive information for a given case to decide how it should be categorized. This definition also applies to multivariate procedures such as discriminant analysis. Expert systems and statistical models both have a knowledge base containing rules, relationships, and probabilities. Also included are strategies to search through the knowledge base, synthesize information, and arrive at a recommendation for cases of unknown classification. Expert systems and multivariate models generate results that include a degree of uncertainty; the inferences derived are not absolute but probabilistic in nature.

Despite some overlap in the logic underlying these two approaches, a number of differences become apparent when looking at aspects of the implementation process.

Identifying Rules and Relationships

The knowledge base is the most vital but also the most vulnerable aspect of expert system development. The ideal situation is to have an expert who rarely makes mistakes. This may be difficult to achieve in human services where there is often a lack of well-established and widely accepted empirical knowledge. The correct response to a particular case may often be a matter of opinion. Interviewing professionals and trying to represent their expertise in the form of rules is one of the most complex and arduous tasks encountered in the construction of an expert system (Duda & Shortliffe, 1983). Two kinds of knowledge may be identified and encoded, private and public. Public knowledge includes factual material that is widely shared and can be thought of as textbook knowledge. Experts are also considered to possess private knowledge not in the published literature. This latter type of knowledge consists of intuitive judgment and rules of thumb (heuristics) that enable the expert to make educated guesses and deal with inexact or incomplete information. However, it would be dangerous to assume that when experts articulate their experience in rule form they do so without error or distortion. Experts' estimates of relationships and probabilities can deviate dramatically from objective estimates (Fox, 1984).

Statistical models are able to capture relationships in a concise, efficient, and reliable manner while avoiding many of the biases that may undermine clinical judgment (Arkes, 1981; Elstein, 1979; Faust, 1986). Unlike expert systems, the use of statistical inference does not require relationships between antecedent conditions and outcomes to be specified a priori. The relationships are derived using the accumulated evidence from hundreds of cases. An assumption underlying the use of statistical models is that cases comprising the data base have been adequately sampled. Although it is true that an expert's knowledge is based on observation and experience, it is unlikely that such information would be as complete or accurate as rules and relationships extrapolated from a comprehensive data base. Systematically pooling the direct experience of many experts with many cases would lead to conclusions that are generally better than those derived from the introspections of one or two experts. This advantage would be accentuated as the level of agreement declines among experts in a particular field.

Dealing With Uncertainty

According to Shortliffe, Buchanan, and Feigenbaum (1984), the question of how to manage uncertainty remains a central issue in the development of expert systems. Most expert systems that can tolerate uncertainty employ some kind of probability measure to weigh and balance equivocal rules and relationships in the knowledge base. Uncertainty in a knowledge-based system reflects the confidence or belief of the expert and is not really a true probability. The statistical approach has the advantage of allowing for a direct assessment of the certainty or strength of relationships for variables and cases on the data base. Multivariate results indicate how effective the variables have been in explaining the phenomenon of interest, and the level of ignorance that remains.
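A concrete example of such an expert-supplied confidence measure is the certainty factor used in MYCIN-style systems (the certainty factors mentioned earlier for CLASS.LD2 are of this family). Two pieces of positive evidence are combined by a simple rule rather than by probability theory; the sketch below shows that rule with arbitrary example values.

```python
def combine_cf(cf1, cf2):
    """Combine two positive certainty factors, MYCIN-style: each new
    piece of confirming evidence closes part of the remaining gap to
    full certainty, so the result approaches but never exceeds 1.0."""
    return cf1 + cf2 * (1 - cf1)

print(round(combine_cf(0.6, 0.4), 2))  # 0.6 + 0.4 * (1 - 0.6) = 0.76
```

The result is order-independent and monotone, which makes it convenient for chaining rules, but, as the text notes, it expresses the expert's belief rather than a true probability derived from case frequencies.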

Semblance To Human Reasoning

Although expert systems do not use human reasoning, they sometimes try to incorporate problem-solving strategies used by human experts. However, reconstructing the process of human reasoning is a formidable task. According to Nisbett and Wilson (1977), people have little ability to directly observe and verbally report upon higher order mental operations. Fox, Myers, Greaves, and Pegram (1985) outlined several methods of acquiring knowledge from experts. Their preferred method was the analysis of thinking-aloud protocols, an approach which originated in cognitive psychology. Protocol analysis collects information during actual performance of the task rather than through later questionnaires or interviews. Fox et al. considered informal techniques such as interviews with experts followed by the formulation of decision rules as too time consuming and unreliable. Although verbal protocols offer some improvement over retrospective analysis, which is often used in the development of expert systems, this technique is not without limitations. According to Pitz and Sachs (1984), it is likely that verbal protocols can provide information about deliberately selected judgment strategies but not about more automatic, intuitive processes. Kassirer, Kuipers, and Gorry (1982) pointed out that differences in cognitive style among experts make it difficult to identify common patterns of thought solely from introspections. There is also a need for developing adequate methods of analyzing data from verbal protocols.

Mathematical procedures are designed to achieve optimal task performance. There is no explicit attempt to simulate the reasoning process that people use. The data base for statistical models needs only the information describing each case and the decisions that have been made. The statistical program determines empirically the correlation between antecedent variables and decisions (or other critical case events), and what factors were most influential in the decision making process. There have been criticisms that linear models are too simplistic to capture the complex, interactive process of judgment. According to evidence reviewed in a later section, the performance of these models equals and often surpasses that of clinicians.

Structure of Input Data

The use of a structured format for gathering relevant clinical information is required for the application of statistical procedures such as discriminant analysis. One of the principal ways that standardized methodologies increase reliability and consistency of diagnosis is by making uniform the amount and type of information obtained. According to Robins and Helzer (1986), when diagnosticians use a free form investigative technique they usually choose a likely diagnosis within the first few minutes and spend the remainder of the time trying to confirm it. Their literature review also provides evidence that clinicians often omit collecting data even on topics that are essential in making a diagnosis. A standardized data collection process ensures that premature closure does not occur. A common base of information can be used to gradually develop diagnostic categories that can be reliably assessed, that have a common etiology, and that allow predicting the course of a disorder with or without treatment.

When the statistical model is used to classify a new case of unknown group membership, descriptive information about the case must be collected for the same variables used to derive the model. This, of course, reduces the freedom and flexibility of the user. Expert systems do not require a standardized body of input data before attempting to categorize a test case, although the program may request a missing piece of information under some circumstances. The ability to function when relevant information is missing is an advantage of expert systems. However, it may also be possible to come to different conclusions depending on what information is submitted to the program as describing a particular case. There is no mechanism to ensure complete coverage or a consistent approach to data collection and diagnosis.

Overlaps Among Rules or Variables

The problem in classification or diagnosis in human services is to infer from a set of manifestations to a specific disorder or set of disorders. Often, manifestations are not independent of one another but may be causally linked in intricate ways. Gevarter (1985) pointed out that expert system developers are currently limited in their ability to maintain consistency and resolve conflict between overlapping items in the knowledge base. As already indicated, some expert systems attach probabilities to facts and rules in the knowledge base. Unless rules are independent, these probabilities may lead to exaggerated confidence in the adequacy of conclusions that are derived. Overlaps among rules and relationships are difficult for the expert to estimate and difficult for the system to manage, especially as the number of items increases. For instance, with 10 rules there are 45 possible overlaps, but with 20 rules there are 190 possible overlaps that would have to be taken into consideration.
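The overlap counts quoted above are simply the number of unordered pairs of rules, n(n-1)/2, which grows quadratically with the size of the rule base:

```python
from math import comb

# Every unordered pair of rules is a potential interaction that the
# system (or the expert estimating probabilities) must account for.
def pairwise_overlaps(n):
    return comb(n, 2)  # equivalently n * (n - 1) // 2

for n in (10, 20, 100):
    print(n, "rules:", pairwise_overlaps(n), "possible overlaps")
```

At the scale of CLASS.LD2's roughly 600 rules, the pair count runs into the hundreds of thousands, which makes the consistency problem Gevarter describes concrete.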

Statistical methods have also been criticized for failing to adjust for interrelationships among different factors. This criticism applies to Bayesian techniques which assume the independence of input cues. In contrast, a major advantage of multivariate procedures is their capacity to deal with information from many variables and cases while controlling for interdependence among the variables. The predictions generated by the program are adjusted for overlaps among variables so that redundant information does not inflate the final probabilities.

Updating the Knowledge Base

In an expert system, the rules must be changed or updated by the programmer based on feedback from the expert. Progressive refinement is easily accomplished for a number of reasons (Fox, 1984). First, the rule base is modular; each rule deals with a small independent piece of knowledge from the domain. Second, because the knowledge base is separated from the control mechanisms, the addition of new rules does not create new bugs as is typical with conventional programs. This is because each new rule should be logically independent from the older rules.

With statistical models, the rules and relationships are empirically determined. The relationships depend on the patterns that emerge for cases and variables on the data base. As the number and nature of cases on the data base change, the rules or relationships that are generated could conceivably change. However, the larger the number of cases used to derive the statistical model, the less the model is likely to be affected as new cases are added. If patterns do change as new cases are introduced, this will automatically be reflected in the computer output. This feature makes the statistical model self-correcting since there is no need to modify the underlying program. However, accumulating a sufficiently large sample may not always be practical. As well, transporting a particular model to another location may not be feasible unless there is some assurance that the derivation sample was representative of cases encountered in the second setting.

Explanation Facilities

One of the greatest strengths of an expert system is the use of natural language to communicate with the user. It can explain the logic it is using and can ask questions where needed information is missing. Each question asked by the system is the consequence of its attempt to apply a particular rule. The user provides single word answers including unknown when no information is available. The user may also respond with a number of commands including the why command. Explanation facilities expose the program's line of reasoning in a way users can understand and critique. The transparency of the program makes for a more accountable and credible system. For instance, according to Hofmeister and Lubke (1986), CLASS.LD2 allows the user to obtain a printed record of the rules used by the program and how they were applied in reaching a conclusion that a student was or was not learning disabled. The record shows the questions asked by the computer program, the answers the user provided, and the rules that were applied to make judgments based upon these answers.

Multivariate procedures such as discriminant analysis are available as part of major statistical packages (e.g., SPSS, SAS), including versions for microcomputers. However, application of these procedures, and especially interpretation and utilization of results, requires a good understanding of statistics. In statistical systems the methods for deriving rules and relationships are embedded in mathematical formulas, making the internal workings of the program largely inaccessible to the average user. Although multivariate models might be considered more "intelligent," they are also less intelligible. Combining the explanation facilities of expert systems with the power of multivariate procedures has the potential to create an approach that exploits the major strength and eliminates the major weakness of each method.

VALIDATION AND PERFORMANCE OF DECISION MAKING SYSTEMS

Validation is one of the most important phases of development both for expert systems and multivariate models. Whether a system is based on qualitative or quantitative data, it should satisfy standard criteria of reliability and validity. Validation has focused on whether the conclusion generated by the computerized system constitutes appropriate advice given the scope of questions the system was designed to answer. Test cases have been used to compare the recommendations produced by the computer program with some standard of correctness. In some areas of medicine these recommendations can be objectively verified using laboratory tests or other means. In human service fields such as education, psychology, and social work, disorders cannot be identified from blood tests or biopsy. One has only the client's behavior. In the absence of objective criteria, computer generated results must be evaluated by comparison with the decisions of experts reviewing the same test cases. Evaluation of a computer model by using clinical judgment as the criterion for accuracy can be complicated since assessment of the same case by different experts often produces different conclusions. We may be unable to determine the precise accuracy of a computerized system when we do not always know what the correct answer should be for the test cases. In human services, a low level of agreement between the computer model and expert opinion may reflect unreliability in the criterion (i.e., clinical judgment) rather than failure of the computer model. Therefore, the performance of a computerized system should not be judged only according to its level of agreement with a human expert. The degree of consensus among experienced professionals must also be taken into account. If the computerized system approaches or surpasses the level of agreement among human experts, its performance can then be considered comparable to that of an expert.

Validation of Expert Systems

The application of expert system technology to human services is still in its infancy. It is therefore not surprising that validated systems are relatively scarce. According to Duda and Shortliffe (1983), the more mature expert systems have undergone systematic evaluation to assess their performance relative to some accepted criterion such as agreement with human experts. Controlled evaluations of MYCIN (Yu et al., 1979) and INTERNIST (Miller, Pople, & Myers, 1982) have shown the computer recommendations to be as good as those provided by experts, although Clancey and Shortliffe (1984) indicated that validation has taken place in hypothetical experimental settings and not in active clinical environments.

In psychology, computer-based test interpretation (CBTI) systems have been developed with much of the work in this area focusing on the MMPI. While CBTI systems are not usually equated with expert systems, they too are predominantly aimed at modeling the judgmental process of a single clinician, and only partially based on actuarial or empirical data (Butcher, Keller, & Bacon, 1985). Despite their increasing popularity, the validity of CBTI results is not yet well-established and these systems remain controversial (Matarazzo, 1985). Investigations of the MMPI appear to be representative of the methods used to evaluate validity. Moreland (1987) pointed out that only a few studies have compared CBTI results with human interpretations using rigorous research designs. Most validity studies have involved asking CBTI recipients to rate the accuracy of the interpretations in view of their personal knowledge of test respondents. Consumer satisfaction studies have typically found accuracy rates close to 80%. Such findings should be viewed with caution because of lack of information on base rate accuracy. The issue of base rate accuracy can be addressed by having clinicians evaluate the accuracy of two CBTI reports, one genuine and one a bogus report which functions as a control. An unbiased estimate of accuracy can be obtained by comparing the two accuracy scores. In one such study (Moreland & Onstad, 1987), the accuracy of CBTI results was found to be 71% while the bogus reports received ratings as high as 44%.

Colbourn and McLeod (1983) developed an expert system intended to serve as a consultant in diagnosis and prescription of a remedial program for children with reading problems. According to these authors, the system's diagnoses were accurate, although details of the evaluation process were not presented. Martindale, Ferrara, and Campbell (1987) reported the results of a validation study of CLASS.LD2, an expert system designed to assist special education placement teams in the task of identifying learning disabled students. Data from 264 student files that had previously been reviewed by placement teams were used to evaluate the performance of CLASS.LD2. Of the total number of students, 110 had been identified by placement teams as learning disabled. For each student, CLASS.LD2's recommendation was compared with the actual decision made by the placement team. CLASS.LD2's decision did not match that of the placement team in 78 of the 264 cases. This accuracy rate of 70% corresponds to a correlation (phi coefficient) of .39 and compares with an expected base rate accuracy of 52%. Most of the disagreements (68 out of 78) involved a student classified by the placement team as disabled but determined not to meet eligibility criteria by CLASS.LD2.
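The reported figures can be checked directly. In the sketch below, the cells of the 2x2 agreement table are inferred from the totals given in the study (264 cases, 110 team-identified students, 78 disagreements of which 68 were team "LD" versus system "not LD"); they are a reconstruction, not values taken from the original table, but they reproduce the reported 70% accuracy and phi of .39.

```python
import math

# Reconstructed 2x2 agreement table between placement teams and CLASS.LD2.
a = 110 - 68          # both say LD
b = 68                # team says LD, system says not LD
c = 78 - 68           # team says not LD, system says LD
d = 264 - a - b - c   # both say not LD

def phi(a, b, c, d):
    """Phi coefficient for a 2x2 table (Pearson r for two binary variables)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

accuracy = (a + d) / (a + b + c + d)
print("accuracy:", round(accuracy, 2), "phi:", round(phi(a, b, c, d), 2))
```

Expressing agreement as phi rather than raw accuracy matters here because, as the article notes, the expected base rate accuracy was already 52%.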

In a second component of the validation process, three experts independently reviewed the 78 cases where CLASS.LD2 and the placement teams had disagreed. The level of agreement between CLASS.LD2 and the experts was identical to the result obtained in the first phase of the evaluation (phi coefficient = .39). One important new finding emerged. The average correlation among judgments of the three experts was .58 compared to an average correlation of .40 between the experts and CLASS.LD2. Since the larger correlation accounts for more than twice the variance, this clearly indicates a stronger tendency for experts to agree with each other than with the expert system. The gap between these correlations may be taken as a good indicator of how much CLASS.LD2 must improve in order to achieve a level of performance comparable to a human expert.

Validation of Statistical Models

Validation studies and evidence concerning the performance of statistical models come from several fields including psychology, medicine, and psychiatry.

Hoffman (1960) proposed that linear models could be used to represent expert judgment. He termed the linear model used to predict an expert's judgment a paramorphic representation. The paramorphic linear model has often been found to outperform the expert from which it is derived. This phenomenon is known as bootstrapping. In studies of bootstrapping, a clinician is usually asked to make diagnostic or prognostic judgments from a set of cues for each of a large number of target cases. The relationship between the input cues and the expert's assessment can be expressed as a multiple regression model that provides a paramorphic representation of clinical judgment. Camerer (1981) reviewed 15 studies related to bootstrapping and found that bootstrapped models surpassed the expert's performance in 12 of these studies. The correlation between the clinician's prediction and actual outcomes was .33 averaged over the 15 studies, compared to an average correlation of .39 between bootstrapped judgments and actual outcomes.
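Bootstrapping can be sketched with invented numbers. The expert's judgments are modeled as a linear function of a single input cue (the paramorphic representation), and validity coefficients are computed as Pearson correlations between predictions and actual outcomes. Real studies use many cues and multiple regression; one cue keeps the arithmetic transparent, and the data are constructed so that the outcome tracks the cue more reliably than the expert's noisy ratings do, which is exactly the situation in which bootstrapping pays off.

```python
def ols(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

cue      = [1, 2, 3, 4, 5, 6]    # descriptive score per case (invented)
judgment = [2, 5, 5, 9, 10, 14]  # expert's somewhat noisy ratings (invented)
outcome  = [2, 4, 6, 8, 10, 12]  # actual case outcomes (invented)

slope, intercept = ols(cue, judgment)             # a model OF the expert
model_pred = [slope * c + intercept for c in cue]

print("expert validity:", pearson(judgment, outcome))
print("model validity:", pearson(model_pred, outcome))
```

The regression strips the case-to-case inconsistency out of the expert's ratings while retaining the expert's policy of weighting the cue, so the model's predictions correlate more strongly with outcomes than the raw judgments do.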

In one study, Goldberg (1970) used MMPI profiles from 861 psychiatric patients who had been clearly diagnosed as either psychotic or neurotic. Twenty-nine clinical psychologists were then asked to make a diagnostic decision for each case based on the MMPI scores. It was then possible to compare the validity coefficients of various statistical models by looking at the degree of agreement between each model's predictions and the actual diagnoses. When a model of each clinician was constructed from judgments of the 861 cases, 24 of the 29 models turned out to be more accurate predictors of the actual criterion diagnoses than were the clinicians from whom the models were derived. Moreover, a simple linear regression model using a composite of five (out of 11) MMPI scales achieved a validity coefficient (r = .44) far greater than that of the average clinician (r = .28) and greater than that of the most accurate clinician (r = .39).

The extensive literature on statistical versus clinical prediction in psychology has consistently demonstrated the advantages of statistically combining information. Reviews by Meehl (1954), Sawyer (1966), Sines (1970), and Hedlund, Evenson, Sletten, and Cho (1980) concluded that statistical predictions were equal or superior to those made by clinicians. Sawyer reviewed 45 studies that produced 75 comparisons. In 47 comparisons the results of statistical and clinical methods were approximately equal, and in the remaining 28 comparisons statistical procedures were found to be decisively better than clinical judgment. According to Sawyer, the clinician may contribute most as a valuable source of input data, which should then be combined mechanically to achieve the decision or course of action with the highest probability of being correct.

It is especially revealing to look at some studies of statistical models in medicine where accuracy has been assessed using objective criteria. de Dombal (1976) reported the results of a study of 552 patients presenting with abdominal pain of acute onset. A statistical formula, which had been developed using a prior survey of 700 cases, was used to select which of seven possible ailments was most likely. Computer-generated diagnoses were saved and later compared with diagnoses reached by attending physicians, and also with the ultimate diagnoses verified at surgery or by appropriate tests. The physicians reached the correct diagnosis for 42-82% of the 552 cases, while the statistical program was correct 92% of the time. de Dombal found that while the computer-based system was providing feedback to clinicians, the rate of perforated appendices before operation fell from 36% to 6%, and the incidence of nondiseased appendices removed fell from 25% to 7%. Following the end of the experiment, decision making performance regressed to the previous levels.
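de Dombal's statistical formula combined symptom frequencies from a prior case survey in essentially Bayesian fashion. A toy naive-Bayes diagnoser along those lines might look as follows; the diseases, symptoms, and case counts are invented for illustration, and this is not de Dombal's actual formula:

```python
import math
from collections import Counter, defaultdict

def train(cases, all_symptoms):
    """Tabulate diagnosis priors and symptom frequencies from past cases.

    cases: list of (set_of_symptoms, diagnosis) pairs.
    """
    priors = Counter(diag for _, diag in cases)
    freq = defaultdict(Counter)
    for symptoms, diag in cases:
        for s in symptoms:
            freq[diag][s] += 1
    return priors, freq, len(cases), all_symptoms

def diagnose(model, symptoms):
    """Pick the diagnosis with the highest naive-Bayes log posterior."""
    priors, freq, n, all_symptoms = model
    def log_posterior(diag):
        lp = math.log(priors[diag] / n)
        for s in all_symptoms:
            # Laplace smoothing so an unseen symptom never zeroes out a diagnosis
            p = (freq[diag][s] + 1) / (priors[diag] + 2)
            lp += math.log(p if s in symptoms else 1.0 - p)
        return lp
    return max(priors, key=log_posterior)

# Invented training survey (de Dombal's real survey used 700 cases)
symptom_set = {"rlq_pain", "rebound_tenderness", "epigastric_pain"}
survey = ([({"rlq_pain", "rebound_tenderness"}, "appendicitis")] * 6 +
          [({"epigastric_pain"}, "dyspepsia")] * 6)
model = train(survey, symptom_set)
print(diagnose(model, {"rlq_pain", "rebound_tenderness"}))  # prints appendicitis
```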

Two teams of researchers (Altman, Evenson, & Cho, 1976; Fleiss, Spitzer, Cohen, & Endicott, 1972) compared alternative methods for generating psychiatric diagnoses by computer: the Bayes method, discriminant analysis, and the logical decision tree approach. The two statistical methods, Bayes and discriminant function, used data for a sample of patients to devise an empirical classification scheme. None of the three approaches stood out as clearly superior when performance was tested using a validation sample, although all performed at a level comparable to that found among experienced professionals. The linear discriminant functions produced kappas of .54 and .47 with the diagnostic judgments of clinicians. This compared favorably with the level of agreement between well-trained clinicians (average kappa = .45). These researchers indicated that a low level of inter-clinician agreement may make it impossible to demonstrate a clear superiority of one computer model over another when agreement with expert diagnosis is used as the criterion for accuracy. Once the upper level of clinical reliability (i.e., agreement among clinicians) is reached, a higher level of agreement is unlikely to be demonstrated by the computer model. If the reliability of clinical diagnoses were to improve, the degree of agreement between each computer method and human experts would also increase.
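Kappa, the chance-corrected agreement statistic used in these comparisons, is straightforward to compute: observed agreement minus chance agreement, scaled by the maximum possible improvement over chance. A sketch with invented diagnostic labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two lists of categorical labels."""
    n = len(rater_a)
    observed = sum(1 for x, y in zip(rater_a, rater_b) if x == y) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k]
                   for k in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented diagnostic labels for ten patients
clinician = ["psychotic"] * 5 + ["neurotic"] * 5
computer = (["psychotic"] * 4 + ["neurotic"] +
            ["psychotic"] + ["neurotic"] * 4)
print(round(cohens_kappa(clinician, computer), 2))  # prints 0.6
```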

CONCLUSION

The clinical-statistical controversy in psychology has spanned more than three decades and has been characterized as a war of attrition. A similar cycle of competition between multivariate models and expert systems may be equally unproductive. An integrated approach that draws on the unique strengths of each methodology to offset inherent weaknesses may be mutually beneficial.

According to Fox (1984), the performance of expert systems is encouraging, but as yet few have been introduced into clinical practice or had a full clinical trial carried out. Expert systems are still largely experimental. Construction of a complete, consistent, and accurate knowledge base is the major impediment to the development of an effective expert system. This development relies heavily on the introspection and verbal reports of experts, a process which can be time-consuming and unreliable. The decision rules and probabilities identified by experts may be prone to bias and error. These risks are accentuated when encoding private or heuristic knowledge that has not been properly validated.

Construction of an expert system may be most appropriate when there is a large amount of factual and empirical knowledge that is easily organized as rules. In fact, many of the expert systems developed thus far have been in the field of medicine, where a more formalized knowledge base exists. Physical disorders frequently have a specific etiology, prognosis, and response to treatment, and a clear basis for differential diagnosis. For professionals such as psychiatrists, psychologists, and social workers who deal with abnormal behavior, the systematic search for causes and consequences, specific treatments, and differential diagnoses has proven to be a slow process (Quay, Routh, & Shapiro, 1987). In these human services, there is not an extensive core of formalized, readily available, and widely accepted empirical knowledge that can be translated into rules. Instead, disagreements among experts are common, a factor which would undermine the development of a credible expert system. As Schoech et al. (1985) pointed out, if experts cannot agree, then developing a generalized knowledge base is impossible and the system is limited to mimicking the idiosyncrasies of a single clinician.

In human service environments, performance of a computerized system can be optimized by constructing a data base of many cases served by many experts. Multivariate procedures would then be used to extract rules, relationships, and probabilities directly from the data base. This approach can be costly but creates a knowledge base that is considerably more reliable and valid than subjective estimates provided by experts. Statistical modeling avoids biases that may undermine clinical judgment, controls for interdependence among input variables, and provides an empirical measure of the degree of uncertainty associated with the recommendations derived for each case. A standardized approach to data collection is required in order to implement multivariate models. This can be restrictive for the user but ensures consistency and increases the reliability of diagnosis and classification. The knowledge base for an expert system must be updated by a programmer. For statistical models, rules and relationships are empirically determined and so are updated automatically as changes occur in the number and nature of cases on the data base. In effect, the model is self-correcting. Evidence from several domains has consistently demonstrated that conclusions generated by multivariate procedures are equal or superior to the decisions made by clinicians.
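The self-correcting property can be sketched with a toy classifier whose decision rule is recomputed from the case base on every use, so that adding cases automatically updates the rule with no programmer intervention. The nearest-centroid scheme and the feature names are illustrative assumptions, not a description of any system discussed here:

```python
class CaseBaseModel:
    """Toy nearest-centroid classifier refit from the case base on each query."""

    def __init__(self):
        self.cases = []  # list of (feature_vector, label)

    def add_case(self, features, label):
        """New cases simply accumulate; no one edits any rule by hand."""
        self.cases.append((list(features), label))

    def _centroids(self):
        """Re-derive the 'rules' (class centroids) from the current data base."""
        sums, counts = {}, {}
        for feats, label in self.cases:
            counts[label] = counts.get(label, 0) + 1
            acc = sums.setdefault(label, [0.0] * len(feats))
            for i, v in enumerate(feats):
                acc[i] += v
        return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def classify(self, features):
        centroids = self._centroids()
        def sq_dist(c):
            return sum((f - ci) ** 2 for f, ci in zip(features, c))
        return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

model = CaseBaseModel()
# Hypothetical (iq_gap, reading_delay) profiles labelled by placement teams
model.add_case((8.0, 3.0), "LD")
model.add_case((7.0, 2.5), "LD")
model.add_case((1.0, 0.5), "not LD")
model.add_case((0.5, 1.0), "not LD")
print(model.classify((6.5, 2.0)))  # prints LD
```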

Perhaps the most potent idea introduced by expert systems is that of explanation. The logic and decision rules of the system are made accessible, and the user is allowed to question the computer's reasoning in the same way as with a human colleague. The use of natural language and explanation makes these systems more intelligible and accountable to the user. Integrating this feature of expert systems with statistical models may allow for the development of an approach that achieves optimal performance and is also more acceptable to professionals. However, the compatibility of these two technologies has not been widely recognized. For instance, Thompson and Thompson (1986), discussing the knowledge acquisition bottleneck in expert system development, ask whether it is possible to find some way of organizing data so as to recognize patterns and extract knowledge directly from it. Duda and Gaschnig (1985) also referred to the possibility of developing a learning system that can induce rules from examples or cases. Apparently these authors were unaware that well-established computational techniques to accomplish this already exist. Multivariate procedures fulfill exactly this function when applied to a carefully constructed data base of case information. Rules generated by such empirical analyses may then serve as the foundation of an expert system.

Let us consider, for example, CLASS.LD2, an expert system designed to identify learning disabled students. As reported earlier, results of a thorough evaluation indicated an average correlation of .40 between CLASS.LD2 and three experts who independently reviewed the same test cases. In contrast, the judgments of the three experts had an average correlation of .58 with each other. The discrepancy between these two correlations shows how far CLASS.LD2 must go to attain a level of performance equal to that of a human expert. What would happen if the 264 cases used in the evaluation of CLASS.LD2 were instead used to develop a multivariate model for classification of students who may have learning disabilities? Such a statistical model could then be tested on a new sample of cases. It would be expected that the average correlation between recommendations of the multivariate model and decisions by experts would be as high as the correlation among the three experts referred to above. If this prediction were confirmed, it would then be possible to replace the existing CLASS.LD2 knowledge base, with its approximately 600 if-then rules, with a data base and the resulting multivariate model. It would, of course, be essential to retain the user-friendly features of the expert system. These would include the ability to have questions generated by the program and answered by the user, and the ability to get a printed record of the rules used by the program and how they were applied in reaching a conclusion that the student was or was not learning disabled.
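Such a hybrid, a statistical core wrapped in an expert-system-style explanation facility, can be sketched as a linear scoring model that narrates its own arithmetic. The variable names, weights, and threshold below are illustrative only, not CLASS.LD2's actual rules:

```python
def explain_decision(names, weights, values, threshold):
    """Score a case with a linear model and narrate the reasoning,
    in the style of an expert system's 'why' facility."""
    contributions = [w * v for w, v in zip(weights, values)]
    score = sum(contributions)
    decision = ("learning disabled" if score >= threshold
                else "not learning disabled")
    for name, w, v, c in zip(names, weights, values, contributions):
        print(f"  {name}: weight {w:+.2f} x value {v:.1f} = {c:+.2f}")
    print(f"  total {score:.2f} vs threshold {threshold:.2f} -> {decision}")
    return decision

# Hypothetical predictor variables and weights for one student
decision = explain_decision(
    names=["iq_achievement_gap", "reading_standard_score"],
    weights=[0.5, -0.2],
    values=[8.0, 10.0],
    threshold=1.0,
)
```

The printed trace gives the user the same accountability as an expert system's rule log, while the weights themselves come from the multivariate analysis rather than from hand-coded rules.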

Initial optimism that computers would come to play a major role in clinical management and decision making has not yet been translated into practice. Few decision support systems have been implemented outside of a research environment, even when performance has been shown to be excellent. Many professionals still believe that computer-based information processing systems are foreign to the decision making process and not applicable to dynamic and complex clinical problems. High levels of performance have been viewed with skepticism or as a prospective threat, even though the intent is to develop helpful tools, not to usurp the clinician's authority and responsibility. There is still much to accomplish before gaining the confidence and support of professionals. A synthesis of multivariate procedures and expert system capabilities may be one strategy with the potential to enhance both the performance and acceptability of computer-aided decision making systems.

Regardless of the approach used in constructing computerized systems, the importance of the validation process must be emphasized. Eyde and Kowal (1985), in their review of psychological support software, wondered whether consumer acceptability is replacing scientific evidence in the expanding market for such software. They stressed the need to monitor the development, documentation, and marketing practices used in producing and promoting these support systems.

REFERENCES

Altman, H., Evenson, R.C., & Cho, D.W. (1976). New discriminant functions for computer diagnosis. Multivariate Behavioral Research, 11, 367-376.

Arkes, H.R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49(3), 323-330.

Butcher, J.N., Keller, L.S., & Bacon, S.F. (1985). Current developments and future directions in computerized personality assessment. Journal of Consulting and Clinical Psychology, 53(6), 803-815.

Camerer, C. (1981). General conditions for the success of bootstrapping models. Organizational Behavior and Human Performance, 27, 411-422.

Clancey, W.J., & Shortliffe, E.H. (1984). Introduction: Medical artificial intelligence programs. In W.J. Clancey & E.H. Shortliffe (Eds.), Readings in medical artificial intelligence: The first decade (pp. 1-17). Reading, MA: Addison-Wesley Publishing Company.

Colbourn, M., & McLeod, J. (1983). Computer guided educational diagnosis: A prototype expert system. Journal of Special Education Technology, 6(1), 30-39.

de Dombal, F.T. (1976). Computer-aided diagnosis: A practical proposition? In F.T. de Dombal & F. Gremy (Eds.), Decision making and medical care: Can information science help? (pp. 153-157). Amsterdam: North-Holland Publishing Company.

Duda, R.O., & Gaschnig, J.G. (1985). Knowledge-based expert systems come of age. In S.J. Andriole (Ed.), Applications in artificial intelligence (pp. 45-66). Princeton, NJ: Petrocelli Books Inc.

Duda, R.O., & Shortliffe, E.H. (1983, April). Expert systems research. Science, 220, 261-268.

Elstein, A.S. (1979). Human factors in clinical judgment: Discussion of Scriven's "Clinical Judgment." In H.T. Engelhardt, Jr., S.R. Spicker, & B. Towers (Eds.), Clinical judgment: A critical appraisal (pp. 17-28). Dordrecht, Holland: D. Reidel Publishing Company.

Eyde, L.D., & Kowal, D.M. (1985). Psychological decision support software for the public: Pros, cons, and guidelines. Computers in Human Behavior, 1, 321-336.

Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology: Research and Practice, 17(5), 420-430.

Fleiss, J.L., Spitzer, R.L., Cohen, J., & Endicott, J. (1972). Three computer diagnosis methods compared. Archives of General Psychiatry, 27, 643-649.

Fox, J. (1984). Formal and knowledge-based methods in decision technology. Acta Psychologica, 56, 303-331.

Fox, J., Myers, C.D., Greaves, M.F., & Pegram, S. (1985). Knowledge acquisition for expert systems: Experience in leukemia diagnosis. Methods of Information in Medicine, 24, 65-72.

Gevarter, W.B. (1985). Expert systems: Limited but powerful. In S.J. Andriole (Ed.), Applications in artificial intelligence (pp. 125-137). Princeton, NJ: Petrocelli Books Inc.

Goldberg, L.R. (1970). Man versus model of man: A rationale, plus some evidence, for a method of improving on clinical inferences. Psychological Bulletin, 73(6), 422-437.

Hedlund, J.L., Evenson, R.C., Sletten, I.W., & Cho, D.W. (1980). The computer and clinical prediction. In J.B. Sidowski, J.H. Johnson, & T.A. Williams (Eds.), Technology in mental health care delivery systems (pp. 201-235). Norwood, NJ: Ablex.

Hedlund, J.L., Sletten, I.W., Evenson, R.C., Altman, H., & Cho, D.W. (1977). Automated psychiatric information systems: A critical review of Missouri's Standard System of Psychiatry (SSOP). Journal of Operational Psychiatry, 8(1), 5-26.

Hoffman, P.J. (1960). The paramorphic representation of clinical judgment. Psychological Bulletin, 57, 116-131.

Hofmeister, A.M., & Lubke, M.M. (1986). Expert systems: Implications for the diagnosis and treatment of learning disabilities. Learning Disability Quarterly, 9, 133-137.

Kassirer, J.P., Kuipers, B.J., & Gorry, G.A. (1982). Toward a theory of clinical expertise. American Journal of Medicine, 73, 251-259.

Martindale, E.S., Ferrara, J.M., & Campbell, B.W. (1987). A preliminary report on the performance of CLASS.LD2. Computers in Human Behavior, 3, 263-272.

Matarazzo, J.D. (1985). Clinical psychological test interpretations by computer: Hardware outpaces software. Computers in Human Behavior, 1, 235-253.

Meehl, P.E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.

Miller, R.A., Pople, H.E., & Myers, J.D. (1982). INTERNIST-I, an experimental computer-based diagnostic consultant for general internal medicine. New England Journal of Medicine, 307, 468-476.

Moreland, K.L. (1987). Computer-based test interpretations: Advice to the consumer. Applied Psychology: An International Review, 36(3/4), 385-399.

Moreland, K.L., & Onstad, J.A. (1987). Validity of Millon's computerized interpretation system for the MCMI: A controlled study. Journal of Consulting and Clinical Psychology, 55, 113-114.

Nisbett, R.E., & Wilson, T.D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Pitz, G.F., & Sachs, N.J. (1984). Judgment and decision: Theory and application. Annual Review of Psychology, 35, 139-163.

Quay, H., Routh, D.K., & Shapiro, S.K. (1987). Psychopathology of childhood: From description to validation. Annual Review of Psychology, 38, 491-532.

Robins, L.N., & Helzer, J.E. (1986). Diagnosis and clinical assessment: The current state of psychiatric diagnosis. Annual Review of Psychology, 37, 409-432.


Sawyer, J. (1966). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66(3), 178-200.

Schoech, D., Jennings, H., Schkade, L.L., & Hooper-Russell, C. (1985). Expert systems: Artificial intelligence for professional decisions. Computers in Human Services, 1(1), 81-115.

Shortliffe, E.H., Buchanan, B.G., & Feigenbaum, E.A. (1984). Knowledge engineering in medical decision making: A review of computer-based clinical decision aids. In W.J. Clancey & E.H. Shortliffe (Eds.), Readings in medical artificial intelligence: The first decade (pp. 35-71). Reading, MA: Addison-Wesley Publishing Company.

Sines, J.O. (1970). Actuarial versus clinical prediction in psychopathology. British Journal of Psychiatry, 116, 129-144.

Tennyson, R.D., & Ferrara, J. (1987). Introduction to special issue: Artificial intelligence in education. Educational Technology, 27(5), 7-8.

Thompson, B., & Thompson, W. (1986, Nov.). Finding rules in data. Byte, 149-158.

Yu, V.L., Buchanan, B.G., Shortliffe, E.H., Wraith, S.M., Davis, R., Scott, A.C., & Cohen, S.N. (1979). Evaluating the performance of a computer-based consultant. Computer Programs in Biomedicine, 9, 95-102.