
Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative



Critiquing As A Design Strategy For Engineering Successful Cooperative Problem-Solving Systems

By

Stephanie Anne Elisabeth Guerlain, Ph.D.

The Ohio State University, 1995

Dr. Philip J. Smith, Adviser

This research focused on the design of cooperative computerized decision aids, looking

in particular at the critiquing approach to providing decision support. Critiquing systems are

proposed to be more cooperative than decision support systems that are based on an

automation philosophy, since critiquing systems can support a human's decision-making

process, allowing the human to stay involved in the task, while providing context-sensitive

feedback when errors or faulty reasoning steps are detected. In addition, critiquing systems are

proposed to mitigate "the brittleness problem", i.e., the difficulty with which people are able to

detect and correct for faulty reasoning on the part of the computer. To test these proposals, a

part-task simulation study was run, comparing a critiquing system to no decision support on a

set of difficult immunohematology problems. Thirty-two certified medical technologists solved

an initial Pre-Test Case, after which members of the Treatment Group received a checklist

outlining the higher-level goal structure of the computer's knowledge base, and were trained on

the use of the critiquing system. All subjects then solved four Post-Test cases, one of which was

outside the range of cases that the computer was designed to support. (The Treatment Group

continued to use the critiquing system and checklist and the Control Group received no

decision support.) The results showed that the Treatment Group had a lower misdiagnosis rate


on all of the Post-Test Cases, with 100% correct performance on the three cases for which the

system was designed, as compared to misdiagnosis rates of 33%, 38%, and 64% for the Control

Group on the three respective cases (each difference is statistically significant, p < 0.05).

On the case for which the system's knowledge base was not fully competent, the Treatment

Group had an 18.75% misdiagnosis rate as compared to a 50% misdiagnosis rate for the Control

Group (p < 0.10). A detailed analysis of the behavioral protocols indicated that both the

checklist and the critiquing functions significantly contributed to these improvements in

performance and provided insight into how to design effective decision support tools.
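As a rough check on the reported comparison for the out-of-scope case (18.75% of the Treatment Group is 3 of 16 subjects, and 50% of the Control Group is 8 of 16, per Appendices F and G), a one-sided exact calculation on that 2x2 table reproduces a p-value just under 0.10. This is only an illustrative sketch, not the dissertation's own analysis (which used a log-linear model, per Table 4); the function name below is hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    # One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]:
    # P(X <= a) under the hypergeometric distribution fixed by the
    # table's row and column margins.
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a + 1)) / denom

# Treatment Group: 3 of 16 misdiagnosed (18.75%); Control Group: 8 of 16 (50%).
p = fisher_one_sided(3, 13, 8, 8)
print(f"one-sided p = {p:.4f}")  # approximately 0.0675, consistent with p < 0.10
```

The one-sided test is used here because the hypothesis was directional (the critiquing system should reduce, not merely change, the misdiagnosis rate).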

Critiquing As A Design Strategy For Engineering Successful Cooperative

Problem-Solving Systems

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy in the Graduate School of the Ohio State University

By

Stephanie Anne Elisabeth Guerlain, B.S., M.S.

************

The Ohio State University

1995

Dissertation Committee:

Philip J. Smith, Ph.D.
David D. Woods, Ph.D.
B. Chandrasekaran, Ph.D.

Approved by:

_________________
Adviser
Department of Industrial & Systems Engineering Graduate Program

Copyright by

Stephanie Anne Elisabeth Guerlain

1995


To my husband, Robert J. Haschart


Acknowledgements

I would like to express my sincere thanks to my adviser, Dr. Philip J. Smith, for his

outstanding insight and guidance throughout my graduate career. I would also like to thank

my other committee members, Dr. David D. Woods and Dr. B. Chandrasekaran, for their input

and excellent course offerings. Patricia Strohm and Sally Rudmann provided invaluable blood

bank expertise and Larry Sachs was of great help for the statistical analyses. This project was a

team effort, and I would like to acknowledge fellow graduate students Thomas E. Miller, Susan

Gross, Jodi Obradovich and Craig Tennenbaum who also worked on the design, development,

and evaluation of various aspects of the Antibody IDentification Assistant. Melinda Green has

been an excellent support staff member of the Cognitive Systems Engineering Laboratory,

keeping us all organized. Finally, I would like to acknowledge the support of my family,

friends and, in particular, my husband, Bob, for their undoubting faith and patience. The

education I received at Ohio State was outstanding, and I owe that to the people with whom I

had the pleasure to work.


Vita

October 13, 1967 .......... Born, Norwalk, Connecticut

1990 .......... B.S., Engineering Psychology, magna cum laude, Tufts University, Medford, Massachusetts

1989 - 1990 .......... Member of the Technical Staff, User-System Interface Group, The MITRE Corporation, Bedford, Massachusetts

1992 .......... Human Interface Designer, Development Tools Group, Apple Computer, Cupertino, California

1993 .......... M.S., Industrial & Systems Engineering, The Ohio State University, Columbus, Ohio

1995 .......... Presidential Fellow, The Ohio State University

1990 - 1995 .......... Graduate Research Associate, Graduate Teaching Associate, Industrial & Systems Engineering, The Ohio State University

Publications

Chechile, R., Guerlain, S., & O'Hearn, B. (1989). A comparative study between mental workload and a priori measures of display complexity. AAMRL/HEF Technical Report, Wright Patterson AFB, Dayton, OH.

Guerlain, S., Smith, P. J., Obradovich, J., Smith, J. W., Rudmann, S., and Strohm, P. (1995). The Antibody Identification Assistant (AIDA), an example of a cooperative computer support system. Proceedings of the 1995 IEEE International Conference On Systems, Man and Cybernetics.

Guerlain, S. (1995). Using the critiquing approach to cope with brittle expert systems, Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting - 1995, Santa Monica, CA.


Guerlain, S. & Smith, P. J. (1995). Designing critiquing systems to cope with the brittleness problem. Position Paper for CHI '95 Research Symposium (May 6-7, 1995, Denver, Colorado U.S.A.).

Guerlain, S., Smith, P.J., Gross, S.M., Miller, T.E., Smith, J.W., Svirbely, J.W., Rudmann, S., and Strohm, P. (1994). Critiquing vs. partial automation: How the role of the computer affects human-computer cooperative problem solving, In M. Mouloua & R. Parasuraman (Eds.), Human Performance in Automated Systems: Current Research and Trends. (pp. 73-80). Hillsdale, NJ: Lawrence Erlbaum Associates.

Guerlain, S. (1993). Designing and Evaluating Computer Tools to Assist Blood Bankers in Identifying Antibodies. Master's Thesis, The Ohio State University.

Guerlain, S. (1993). Factors influencing the cooperative problem-solving of people and computers. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 1 (pp. 387-391). Santa Monica, CA.

Guerlain, S. & Smith, P. J. (1993). The role of the computer in team problem-solving: Critiquing or partial automation? In Proceedings of the Human Factors Society 37th Annual Meeting, 2 (p. 1029). Santa Monica, CA.

Guerlain, S., Smith, P.J., Miller, T., Gross, S., Smith, J.W., & Rudmann, S. (1991). A testbed for teaching problem solving skills in an interactive learning environment. In Proceedings of the Human Factors Society 35th Annual Meeting, 2 (p. 1408). Santa Monica, CA.

Miller, T. E., Smith, P. J., Gross, S. M., Guerlain, S. A., Rudmann, S., Strohm, P., Smith, J. W., & Svirbely, J. (1993). The use of computers in teaching clinical laboratory science. Immunohematology, 9(1), (pp. 22-27).

Rudmann, S., Guerlain, S., Smith, P.J., Smith, J.W., Svirbely, J. & Strohm, P. (1992) Reducing the complexity of antibody identification tasks using case-specific, computerized displays, Transfusion, 32s, (p. 95s).

Smith, P. J., Guerlain, S., Smith, J. W., Denning, R., McCoy, C. E., and Layton, C. (1995). Theory and practice of human-machine interfaces, In Proceedings of ISPE '95 Intelligent Systems in Process Engineering, Snowmass, Colorado.

Smith, P. J., Miller, T., Gross, S., Guerlain, S., Smith, J., Svirbely, J., Rudmann, S., & Strohm, P. (1992). The transfusion medicine tutor: A case study in the design of an intelligent tutoring system. In Proceedings of the 1992 Annual Meeting of the IEEE Society of Systems, Man, and Cybernetics, (pp. 515-520).

Field Of Study

Major Field: Industrial & Systems Engineering

Concentration: Cognitive Systems Engineering


Table Of Contents

Acknowledgements .......... ii

Vita .......... iii

List of Figures .......... ix

List of Tables .......... x

Chapter
I.   Introduction .......... 1
          Document Overview .......... 4
II.  From Traditional Decision Support to More Cooperative Problem-Solving Systems .......... 6
          The Traditional "Consultation" Model of Decision Support .......... 6
          The "Automated Assistant" Model of Decision Support .......... 10
          Supporting the Decision Making Process Rather than Replacing It .......... 13
          Human Error .......... 13
               Slips of Action .......... 15
               Mistakes .......... 15
               Skill Development and Skill Maintenance .......... 16
          Requirements for a Cooperative Problem-Solving System .......... 16
          The Critiquing Model of Decision Support .......... 17
               Previous Critiquing Studies .......... 18
               Potential Problems .......... 21
               Potential Benefits .......... 23
          The Next Step: Designing a Proof-Of-Concept Cooperative Critiquing System .......... 26
III. Antibody Identification as a Testbed .......... 27
          The Practitioners, a.k.a., "Medical Technologists" or "Blood Bankers" .......... 27
          The Goal of Antibody Identification: Finding Compatible Donor Blood .......... 28
          The Antibody Identification Procedure .......... 29
          Types of Knowledge Needed .......... 31
          Characteristics that make antibody identification difficult .......... 33
               Medical Technologists (MTs) arrive in the blood bank with minimal practice .......... 33
               Most MTs rotate .......... 33
               Infrequent encounters with difficult cases .......... 34
               Very little feedback on performance .......... 34
          Expert strategies .......... 35
               1+. Forming early hypotheses .......... 35


                    1a+. Hypothesizing the number of antibodies present .......... 36
                    1b+. Hypothesizing the type of antibodies present .......... 37
                    1c+. Hypothesizing a specific antibody by finding a pattern match .......... 37
               2+. Ruling out .......... 38
                    2a+. Ruling out using homozygous, non-reacting cells .......... 40
                    2b+. Ruling out using additional cells .......... 40
                    2c+. Ruling out masked antigens by inhibiting the positive reactions .......... 41
                    2d+. Ruling out only those antibodies that will react on the current panel .......... 41
                    2e+. Ruling out the corresponding antibody if the antigen typing is positive .......... 41
               3+. Collecting independent, converging evidence .......... 42
                    3a+. Making sure the patient is capable of forming the hypothesized antibodies .......... 42
                    3b+. Using a test procedure that is known to change the reactivity of an antibody .......... 43
                    3c+. Asking, "Is this an unlikely combination of antibodies?" .......... 43
                    3d+. Making sure there are no unexplained positive reactions .......... 43
                    3e+. Making sure there are no unexplained negative reactions .......... 44
                    3f+. Making sure that all remaining antibodies are ruled out .......... 44
                    3g+. Using the "3+/3-" rule .......... 44
               4+. Solving cases in an efficient manner .......... 44
                    4a+. Picking additional cells efficiently .......... 45
          Poor problem-solving strategies .......... 45
               2-. Ruling out incorrectly .......... 46
                    2f-. Ruling out using reacting cells .......... 46
                    2g-. Ruling out regardless of zygosity .......... 46
                    2h-. Ruling out the corresponding antibody if the antigen typing is negative .......... 46
               4b-. Not using all information provided by a test result .......... 47
          Kinds of Cases .......... 47
               One antibody reacting strongly .......... 47
               Antibody to a high-incidence antigen .......... 47
               Weak antibody .......... 48
               Multiple antibodies .......... 49
                    On separate cells with differing reaction patterns .......... 49
                    Reacting in the same pattern as each other .......... 49
                    On overlapping cells .......... 50
                    Masking .......... 50
               Variable reactions .......... 51
                    Antibodies showing dosage .......... 51
               Recent transfusion .......... 52
               Drug interactions .......... 52
               Auto-immune disorders .......... 53


IV.  The Design of the Antibody Identification Assistant (AIDA) .......... 54
          The Task .......... 55
          The Practitioners .......... 55
          The Problem-Solving Tool .......... 56
               Design Principle 1: Use a Direct Manipulation Problem Representation as the Basis for Communication .......... 56
               Design Principle 2: Use Critiquing to Enhance Cooperative Problem-Solving .......... 59
               Design Principle 3: Represent the Computer's Knowledge to the Operator to Establish a Common Frame of Reference .......... 60
V.   Experimental Procedure .......... 63
          Subjects .......... 63
          Experimental Design .......... 64
          Cases used to test AIDA .......... 66
               Pre-Test Case: Two antibodies looking like one .......... 68
               Post-Test Case 1: Two antibodies looking like one .......... 68
               Post-Test Case 2: One antibody reacting weakly, right answer can be ruled out (because system is not fully competent) .......... 69
               Post-Test Case 3: Two antibodies, one masking the other .......... 71
               Post-Test Case 4: Three antibodies, on overlapping cells, reacting on all cells of the main panel .......... 72
          Procedure .......... 73
               Phase 1. Subject Demographic Data .......... 74
               Phase 2. Introduction to the Interface .......... 74
               Phase 3. The Pretest Case .......... 75
               Phase 4. Training and Introduction to the Checklist and Critiquing .......... 75
               Phase 5. Post-Test Cases .......... 78
               Phase 6. Debriefing .......... 78
          Data collection .......... 79
          Data Analysis .......... 80
VI.  Results and Discussion .......... 82
          Unaided Subject Performance .......... 82
          Example Subject Interactions .......... 84
          Gross Performance Measures .......... 86
               Statistical Comparison of Misdiagnosis Rates .......... 86
                    Expert Subjects .......... 86
                    Less Skilled Subjects .......... 87
               Slips vs. Mistakes .......... 91
               Questionnaire Results .......... 94
          Detailed Analyses .......... 99
               Proactive Training vs. Reactive Feedback (Critiquing) .......... 100
               The Timing of the Critiques .......... 101
               Subjects Overriding the Critiques .......... 103
               Analysis of the Weak D Case .......... 104
          When to use critiquing vs. some other form of decision support .......... 107
VII. Conclusion .......... 110


Appendices
A.   Sample Answer Sheet .......... 123
B.   Sample Statistical Calculations .......... 124
C.   Definition of Classes of Errors Logged by the Computer .......... 134
D.   Sample Behavioral Protocol Log .......... 136
E.   Sample Error Log .......... 148
F.   Number of Mistakes and Slips Made on Each Case by Each Subject in the Control Group (n = 16) .......... 151
G.   Number of Mistakes and Slips Made on Each Case by Each Subject in the Treatment Group (n = 16) .......... 160


List of Figures

Figure Page

1. The Antibody Screen .....................................................................................................................30

2. Anti-Fyb looks likely .....................................................................................................................39

3. Anti-E and anti-K can account for the reactions as well ...........................................................39

4. Sample Screen ................................................................................................................................58

5. Sample Checklist ............................................................................................................................61

6. Test Results Available ...................................................................................................................64

7. Experimental Design .....................................................................................................................66

8. Sample Error Message ...................................................................................................................76

9. Sample Summary Screen ...............................................................................................................79

10. Paths Taken to Solve the Weak D Case, Control Group ...........................................................106

11. Paths Taken to Solve the Weak D Case, Treatment Group ......................................................107


List of Tables

Table Page

1. Process Errors Made by Treatment and Control Group Subjects on the Pre-Test Case .......... 83

2. Pre-Test/Post-Test Comparison of Misdiagnosis Rates .......... 87

3. Post-Test Case results .......... 89

4. Combining p-values Given by the Log-Linear Analysis of Misdiagnosis Rates on the Post-Test Cases, Taking Into Account Performance on the Pre-Test Case .......... 89

5. Correctness of Answers, Treatment Group .......... 90

6. Correctness of Answers, Control Group .......... 91

7. Number of Subjects Committing Each Type of Error at Least Once Per Case .......... 93

8. Combining p-values Across the Five Error Types .......... 94


Chapter I

Introduction

In any domain that requires complex decision making and problem solving,

practitioners are likely to occasionally make errors in problem-solving. There are a number of

potential contributions to the occurrence of such errors, including:

1) Situational factors. The workload is either too high, causing stress and memory

overload, or too low, causing vigilance problems,

2) Decision making strategies. Practitioners may be missing knowledge, using

inappropriate knowledge, or using simplifying strategies or heuristics which are not

adequate in all situations, and

3) The types of information available. Practitioners may not have the right kind of data

available to them for the current situation, or there may be too much data to effectively

integrate and draw appropriate conclusions.

For this reason, researchers have long been interested in designing computerized decision

support systems. In many situations, such a device can aid tremendously by offloading some of

the workload of the practitioner, by performing time-consuming or tedious tasks and by

remembering important details that may only be pertinent in rare cases.

One of the critical problems with advanced decision-support systems, however,

is their potential brittleness, or failure to perform competently in all situations. This brittleness

can arise because of a deliberate design decision to use an oversimplified model of the decision

task (due to cost, time or technological limitations), the inability of the designer to anticipate and

design for all of the scenarios that could arise during the use of the system, a failure of the

designer to correctly anticipate the behavior of the system in certain situations, a failure to

correctly implement the intended design, or because of hardware failures or bugs in the

underlying software environment.

The typical "safety valve" to deal with this problem is to keep a person "in the loop",

requiring that person to apply his or her expertise in making the final decision on what actions

to take. The current view held by the Food and Drug Administration (the agency which

regulates the use of automated medical devices) is that a system is safer if a human is required

to review the automated device's proposed actions (Brannigan, 1991; Gamerman, 1992). Indeed,

this is often the role delegated to the users of many expert systems. In describing one medical

expert system, for example, the authors state that: "The staff themselves would not be displaced

by this tool because their expertise would still be necessary to verify PUFF's output, to handle

unexpected complex cases, and to correct interpretations that they felt were inaccurate" (Aikins,

Kunz, and Shortliffe, 1983). Thus, the designers of this system acknowledge that the computer's

reasoning is not perfect, but they assume that the human will be able to detect and correct any

errors made by the system when it exhibits its brittleness.

Contrary to this assumption, empirical data indicates that people are often not capable

of judging the validity of an expert system's conclusions. People may ignore the advice of a

system, even when it is relevant (as is the case with many "rejected" decision support systems),

or heed the advice of a system, even when it is faulty. "Complacency" may occur when

monitoring for automation failures if the automation reliability is unchanging or if the operator

is responsible for more than one task (Parasuraman et al., 1993; Parasuraman et al., 1994). Less

"obvious" automation failures may cause practitioners to be unduly influenced by an expert

system's proposed intermediate inferences or final solutions when these inferences do not take

into account all of the relevant aspects of the data. This was demonstrated in a recent study in

the domain of flight planning (Layton, Smith and McCoy, 1994). In this study, if the scenario

was one where the computer's brittleness led to a poor recommendation and the computer

generated a suggestion early in the person's own problem evaluation, then the person's own

judgment was negatively influenced, resulting in a 30% increase in inappropriate plan selection

over users of a manual version of the system. A second study by Guerlain, Smith et al. (1994)

found similar results in a medical application, with misdiagnosis rates increasing almost 30%

when an automated support tool led users of the system "down the garden path" to a plausible,

yet incorrect answer.

This phenomenon of people not adequately judging an expert system's conclusions is a

symptom of a larger problem with many advanced decision support systems, namely that they

do not work cooperatively with practitioners within the field of practice for which they were

designed (Malin, Schreckenghost, Woods, Potter, Johannesen, Holloway, and Forbus, 1991).

Such systems may not be integrated with other tools, information, and representations currently

used by practitioners, and may introduce new workload because of the need to manage the

intelligent system. Furthermore, these systems may be difficult to understand and validate in

the context of a particular task situation, especially when other task parameters demand

cognitive resources. Finally, the role played by the human and computer agents involved can

have a large effect on performance. Human practitioners may be delegated to a supervisory

control role when using an expert system, but not have sufficient understanding of the

computer's reasoning process or access to enough relevant data to be able to correctly detect

faulty performance of the computer.

Thus, we have a potentially difficult tradeoff to deal with. One alternative "solution" is

to not introduce an aiding system at all in order to avoid the potential negative consequences of

its presence in certain situations, and to try to enhance the training of the practitioner

population in order to improve unaided performance. A second alternative is to accept the risks

associated with placing a potentially brittle support system in a high-consequence domain. A

third alternative is to have a system that provides some of the aforementioned benefits, while

minimizing the introduction of new forms of errors due to its brittleness and lack of cooperation

with domain practitioners.

A study by Guerlain (1993a) provided objective data indicating how the latter

alternative might be possible, even for systems that are brittle. This study showed that when a

problem-solving strategy is encoded into an expert system and its knowledge is applied

automatically by the computer, performance can degrade significantly if the task situation is

outside the computer's range of competence (the classic brittleness problem). Such a

degradation, however, did not occur when the computer used its knowledge to critique the user

who was performing the problem-solving task while the computer looked "over her shoulder."

Furthermore, many aspects of critiquing were identified as promoting cooperative problem-

solving performance between the human and the computer. These results provide initial

evidence that placing an expert system in a critiquing role may be a safer and more effective

form of decision support than automating all or part of the problem solving. The goal of the

research conducted for this dissertation, then, is to examine in much greater detail the critiquing

approach of decision support as a means to promote human-computer cooperative problem

solving.

Document Overview

Chapter II of this document gives an overview of different approaches to designing

decision support systems, by identifying many of the problems with the automation philosophy

underlying many such systems, and contrasting that with systems that focus more on a

cooperative interaction with the people who are utilizing it. In particular, critiquing systems are

identified as one type of decision aiding system that satisfies many of the requirements of a

cooperative decision support system.

The problem-solving task that was used as a testbed to study human-computer

cooperative problem-solving is antibody identification. Chapter III describes the antibody

identification procedure and the types of knowledge and strategies that experts use to solve

cases. These are contrasted with poor problem-solving strategies used by many medical

technologists.

Chapter IV describes the design and functionality of the critiquing system that was built

as a testbed for this research. A formal experiment was conducted to test the hypothesis that a

well-designed critiquing system can significantly reduce misdiagnosis rates compared to

unaided performance, even when the critiquing system is not fully competent. Chapter V

describes the experimental procedure used, including a description of each of the test cases used

and why they were chosen. An exploratory behavioral protocol analysis was also conducted as

part of this study to examine the influence of the system's design on the practitioners' problem-

solving behaviors. Chapter VI describes the results of the study, including an analysis of the

errors made and the overall system influence on practitioners' choices and uses of strategies.

Chapter VII gives closing comments, relates the results from this research to other domains,

and gives a list of guiding principles for the design and evaluation of critiquing systems and

other cooperative problem-solving systems.

Chapter II

From Traditional Decision Support to More Cooperative Problem-Solving Systems

A significant literature exists in the artificial intelligence community on the design and

evaluation of decision support "consultation" systems. To a large extent, these systems are

evaluated based on the computer's ability to solve problems as compared to expert practitioners

in that field of practice. This kind of evaluation, which focuses on the computer's reasoning

capabilities, hides the fact that this form of decision support is not practical or effective in actual

settings. For one, the interface is often poor. More fundamentally, the interaction is not set up

for effective communication between the human and the computer. In response to this, a more

recent wave of research has focused on the design of "cooperative" decision support systems.

The purpose of this chapter is to outline some of the problems with the automation model of

decision support and to provide principles for the design of cooperative decision support

systems that take into account human problem-solving and decision making. Critiquing is

proposed to be a form of decision support that satisfies a number of these principles. As such,

the few previous studies of critiquing systems are discussed and issues that remain to be

understood about critiquing are identified.

The Traditional "Consultation" Model of Decision Support

In examining the artificial intelligence literature, one finds that there have been many

attempts to build decision support systems to act in a "consulting" role. A typical interaction

with such a system in the medical domain, for example, is to have the computer program first

examine the available data about a patient, perhaps with some questions posed to the attending

physician about the patient's history, treatment, or the results of additional tests, and to then

develop a diagnosis and/or treatment plan for consideration by the practitioner. The idea

behind such a system is that, in practice, practitioners would "consult" the system as they might

consult other doctors for advice.

Such systems can vary along several dimensions, such as the intended type of support.

For example, in the medical domain, a system can be designed to aid with the diagnosis of

diseases (e.g., MYCIN: Shortliffe, 1976; MENINGE: François, Robert, Astruc et al., 1993), or

with the management of a patient's treatment plan (e.g., ONCOCIN: Shortliffe et al., 1981). The

systems can also vary according to the underlying computational model of the problem. Many

systems are knowledge-based, but some use mathematical models, such as Bayesian reasoning

(Sutton, 1989). Almost all of these systems, however, follow the same model of decision

support: The computer tries to solve the problem for the person and then gives its results (possibly

along with an explanation) to the person for review.

Some systems give one diagnosis, while others generate a list of possible diagnoses,

usually rank ordered according to the computer's model of how well the diagnosis accounts for

the data about the patient. Typically, evaluations of such systems focus on whether or not the

computer system is able to generate the "gold standard" (i.e., best answer) as either the top

answer or as at least a highly rated answer on a range of cases (e.g., Wellwood, Johannessen,

and Spiegelhalter, 1992; Hickam, Shortliffe, Bischoff, Scott, and Jacobs, 1985; Shamsolmaali,

Collinson, Gray, Carson, and Cramp, 1989; Bernelot Moens, 1992; François, Robert, Astruc et al.,

1993; Nelson, Blois, Tuttle et al., 1985; Plugge, Verhey, and Jolles, 1990; Sutton, 1989; Verdaguer,

Patak, Sancho et al., 1992; Berner, Webster, Shugerman et al., 1994). Usually, the results of such

evaluations show that the expert system performs better than novice practitioners and close to

or as well as the expert practitioners in terms of this gold standard evaluation. A few long-term

studies have been done comparing overall performance previous to the introduction of the

computer to overall performance during the period of time with the computer present to see if

the presence of the system has had any major effects on the treatment of patients (e.g.,

Wellwood, Johannessen, and Spiegelhalter, 1992), but even these evaluations compare only

aggregate outcome performance before and after the computer's introduction.

Only recently have researchers in the medical informatics area begun to realize that

focusing only on the computer's performance is a limited and unrealistic evaluation of a

decision support system, if the goal is to successfully incorporate the system into actual practice

(e.g., Forsythe and Buchanan, 1992; Miller and Masarie, 1990; Wyatt and Spiegelhalter, 1992).

Although evaluations of medical expert systems have rarely gone beyond the

computer's ability to identify the "gold standard", other issues have been identified as

potentially problematic with these systems as a decision aid. First, the human interface is

almost always cited as a problem (e.g., Berner, Brooks, Miller et al., 1989; Collinson, Gray,

Carson, and Cramp, 1989; Harris and Owens, 1986; Miller, 1984; Shortliffe, 1990). In particular,

many such systems require that the practitioner enter data into the computer so that it can have

the information necessary to perform its reasoning. It is neither the practitioner's job nor place

to spend time doing data entry. This is why one of the most cited requirements for a successful

medical informatics system is to already have the necessary data on-line (Linnarson, 1993;

Miller, 1984; Shortliffe, 1990). Second, these systems may have an incomplete knowledge base

or use simplifying assumptions that make them brittle, meaning that they can fail on cases that

the system was not designed to handle. This leaves the practitioner in the role of having to

detect and correct any problems generated by faulty computer reasoning (Aikins, Kunz and

Shortliffe, 1983; Andert, 1992; Bankowitz, McNeil, Challinor, Parker, Kapoor and Miller, 1989;

Bernard, 1989; Berner, Brooks, Miller et al., 1989; Gregory, 1986; Guerlain, Smith et al., 1994;

Harris and Owens, 1986; Miller, 1984; Roth, Bennett and Woods, 1988; Sassen, Buiël and

Hoegee, 1994).

A more in-depth evaluation of this "consultation" model of decision support reveals the

possible underlying causes for the poor user acceptance of these kinds of systems. For example,

Roth, Bennett and Woods (1988) conducted a study analyzing how users of an expert system

(designed to aid in the diagnosis of electro-mechanical equipment failures) interacted with and

used the system to diagnose faults. This study focused on the actual interaction with the

computer by particular users diagnosing particular faults, rather than on a global evaluation of

whether or not the expert system's knowledge base was accurate. This kind of evaluation

revealed reasons why the "consultation" mode of decision support is not cooperative at all.

With this consultation model, a human decision maker must give up control of the

problem-solving to the computer. Once the "black box" generates an answer, the human must

decide, often without adequate understanding of the computer's reasoning process, whether or

not to accept the computer's diagnosis or treatment plan. The system's support focuses on the

outcome of a decision, without providing the users of such systems adequate information about

the computer's problem-solving process. Furthermore, since the user is not involved in the

problem-solving, it may be necessary for the person to independently solve the problem in

order to adequately accept or reject the computer's proposed solutions. In other words, the

person must be an "expert" at the problem-solving to be able to detect and recover from any

faulty inferences or conclusions generated by the computer. However, by assigning the

computer the task of doing the routine problem-solving, users are giving up control to the

computer and losing skill in the meantime. The only other alternative is for users to always

solve the problems themselves, in which case use of the computer has little utility.

The traditional decision support model of having a computer independently solve a

problem and provide a final answer lessens the possibility of establishing a common ground for

communication. Users of such systems, who must "listen" to the computer as it tries to

retrospectively explain its reasoning (if such a facility is available), can become frustrated very

quickly, since the computer's explanation capabilities generally do not follow the

communication model employed by humans. Rather than engaging the user in the task,

supporting cooperative work, an expert system may actually cause breakdowns in

communication. Users must stop what they are doing and actively try to understand the actions

that the computer has taken without their knowledge, and interpret messages that may not be

commensurate with their skill level or their formulation of the problem (Malin, Schreckenghost,

Woods, Potter, Johannesen, Holloway, and Forbus, 1991). Furthermore, the expert system's

consultation often comes at a time when the practitioner is busiest with other task demands

(Johannesen, Cook and Woods, 1995; Wiener, 1989).

The "Automated Assistant" Model of Decision Support

Many of the problems associated with the classic "consultant" expert system are present

in other forms of decision support that rely on an automation philosophy. For example, there

are many examples of computers serving as automated assistants, performing some subtask for

the person. Usually such systems are fairly well-integrated with the person's current task

environment. For example, the Traffic Collision Avoidance System (TCAS) is a system that is

installed in most commercial airplanes to monitor for surrounding air traffic and provide

warnings if air traffic is detected nearby. Furthermore, it instructs the pilot on how to avoid

traffic if certain safety envelopes are violated. Pilots are instructed to follow the advice of the

system.

Similar to the problems with expert system decision support systems, automated

assistants can be difficult for users to understand or use effectively. TCAS, for example, has

many modes of operation that can potentially confuse the user, since the system will act

differently depending on what mode it is in. Mode error has been cited as a common problem

with automation (Sarter and Woods, 1994) and can be the cause of some major accidents.

A second similarity is that automated assistants can also have brittle performance, such

as when there is noisy data or when a situation is encountered that the designer had not

anticipated. For example, when TCAS was introduced, the system did not know how to discern

"real" threats from normal, routine traffic and would generate false alarms. In one scenario, TCAS

would sometimes instruct pilots to descend when they were just taking off, because the system

would detect traffic from the planes coming in for a landing and the planes on the ground.

Controlled studies have shown that a person using a brittle automated assistant may

not be able to cope well with such failure situations (Guerlain, 1993; Layton, Smith and McCoy,

1994). First of all, there may be a biasing effect such that inappropriate inferences made by the

computer may seem reasonable to the person. Layton, Smith and McCoy (1994), for example,

evaluated alternative methods for providing decision support to pilots and dispatchers

rerouting an airplane due to bad weather on the original flight plan. It was found that a

significantly greater number of subjects using a partially automated system (that would

automatically generate alternative routes when problems were detected along the current route)

would select a faulty computer-generated plan over alternatives that they had explored, even

though in retrospect they concluded that they should never have accepted such a solution

because it was very risky. Nine out of ten members of a Control Group, who did not have the

computer generating any solutions, either rejected or did not generate this risky plan.

A similar phenomenon was found in an investigation of computer support systems

designed to aid with the identification of antibodies (Guerlain, 1993). A significantly greater

percentage of subjects who had the partially automated version of the system (that would

automatically rule out antibodies based on the available evidence) ruled out the correct answer

and misdiagnosed the case than subjects who did not have this automatic function available.

Furthermore, nine out of ten subjects who ruled out the correct answer with the automated

function did so without any further analysis of the case. Seven out of ten subjects who did not

have the automated rule-out function available collected additional data before finishing the

case.

Different phenomena have been proposed to account for this biasing effect of a partially

automated system. This may be due to the person: 1) not being skilled enough to judge the

validity of the computer's conclusions, 2) not being actively involved in the task, 3) having an

inappropriate mental model of the system, 4) over-relying on the system, or 5) having

cognitive biases triggered (Fraser et al., 1992; Guerlain, 1993b; Layton, Smith and McCoy, 1994).

The study by Layton et al. (1994) has yielded some insight as to why performance can be better

when people are actively engaged in the problem solving themselves. It was found that

very local, data-driven factors can trigger a person's expertise at the appropriate time. The

verbal reports of the subjects, for example, showed that subjects would consider uncertainty in

the weather when generating a flight plan, but not when evaluating a flight plan that was

generated by the computer. By relegating the person to a higher level supervisory role, subjects

using the automated assistant were not encountering the triggering situations that allowed them

to apply their expertise as they would when doing the task themselves.

In conclusion, many problems have been identified with the automation or partial

automation model of decision support. Users of such systems may lose skill, may become

frustrated with the system because they cannot understand its reasoning process, may

misunderstand its intentions or reasoning, and may not be able to adequately cope with the

system's brittleness.

Supporting the Decision Making Process Rather than Replacing It

An automation philosophy is one that intends to reduce the consequences of human

error by replacing the fallible human. However, if the computer's reasoning is fallible, then this

philosophy breaks down. A different approach is to support the process by which humans make

decisions and solve problems, thus making it less likely that outcome errors will occur due to

faults in the person's reasoning or other errors made along the way. It is often the case that just

one faulty step in the reasoning process can lead a person astray. By focusing on supporting the

human's decision making process, it may be possible to correct the person at the site of the

problem, as s/he begins exploring a faulty path or making a judgment error that could lead to

an incorrect outcome. Such a decision aiding strategy relies on the ability to detect the kinds of

errors that people are likely to make.

Human Error

To a large extent, process errors can be predicted by studying the task domain and

understanding the strategies by which people solve problems in that domain. One major reason

for people's inability to be perfect problem solvers is the limits of their information processing

system, which only allows them to keep a few "chunks" of information in short term memory at

a time (Miller, 1956; Newell, 1972; Wickens, 1984). Thus, people must use strategies that reduce

the amount of information that must be considered at one time in order to achieve their goals.

One way to do that is to use heuristic reasoning methods that narrow down the search space.

People can use general, or "weak" methods, that are applicable to many problem-solving tasks,

such as a brute force technique (trying all possible solutions to see which ones fit) or means-

ends analysis (working towards a final goal by applying operators that will move the current

state of the problem-solving closer to the goal state). People can also use "strong" methods that

take advantage of domain-specific knowledge of the task characteristics and problem

constraints. Such heuristic methods, whether "weak" or "strong", help people cope with their

limited information processing capabilities in order to achieve "good" overall performance.

However, such heuristic methods may sometimes fail, leading to poor outcome performance.
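The distinction between exhaustive and goal-directed search can be made concrete with a small sketch. The toy problem below (finding an integer whose square is 49) and all names in it are hypothetical, chosen only to contrast the two "weak" methods described above:

```python
# Brute force: try every candidate solution and keep those that fit.
brute_force_solutions = [x for x in range(10) if x * x == 49]

# Means-ends analysis: repeatedly apply an operator (here, increment)
# chosen because it moves the current state closer to the goal state.
x, goal = 0, 49
while x * x < goal:
    x += 1

print(brute_force_solutions, x)  # both approaches arrive at 7
```

Brute force guarantees completeness but scales poorly; means-ends analysis examines far fewer states but succeeds only while its operators reliably reduce the distance to the goal.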

Tversky and Kahneman (1974), for example, noted that the use of some common

judgmental heuristics (such as the representativeness heuristic) can lead to certain biases in

reasoning (such as insensitivity to sample size and the gambler's fallacy). This phenomenon of

heuristics leading to errors is not limited to the judgment of probabilities. All heuristic

reasoning strategies have the capability to fail in certain "garden-path" instances, in which the

assumptions behind the use of the strategy are violated.

For example, one documented problem-solving approach is for people to use an

elimination-by-aspects strategy (Tversky, 1972). With this strategy, people prune a

search space by selecting one aspect or characteristic of the problem and eliminating all

solutions that do not meet a specified criterion on that dimension. This process is repeated on a

successive number of dimensions until a single solution is reached. For example, if one is

searching for a job, the location might be the first aspect considered. Thus, all jobs outside of the

location of interest would be eliminated from consideration. Salary might be the second

dimension, such that all jobs below a certain salary would be eliminated from the search space.

This process would continue until one solution had been reached. However, by reducing the

search space in this manner, a job that is absolutely spectacular on all other aspects and globally

preferred, but not ranked high on a previously considered aspect, would not be taken into

consideration.
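The garden-path failure of this heuristic can be sketched directly, using the job-search example above. The jobs, aspect order, and thresholds here are all hypothetical:

```python
# Hypothetical job candidates, each described on three aspects.
jobs = [
    {"name": "A", "location": "Columbus", "salary": 45000, "interest": 6},
    {"name": "B", "location": "Columbus", "salary": 60000, "interest": 7},
    {"name": "C", "location": "Dayton",   "salary": 90000, "interest": 10},
]

# Aspects are considered one at a time, in a fixed order; any candidate
# failing the criterion on the current aspect is eliminated permanently.
aspects = [
    ("location", lambda job: job["location"] == "Columbus"),
    ("salary",   lambda job: job["salary"] >= 50000),
    ("interest", lambda job: job["interest"] >= 7),
]

candidates = jobs
for aspect_name, criterion in aspects:
    candidates = [job for job in candidates if criterion(job)]

# Job C is globally best on salary and interest, but it was eliminated
# on the very first aspect (location) and is never reconsidered.
print([job["name"] for job in candidates])  # -> ['B']
```

Reordering the aspects (say, considering salary first) yields a different survivor, which is exactly why the strategy is cheap to apply but can discard a globally preferred option.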

Slips of Action

In addition to the errors induced by the use of fallible heuristic strategies, people make

errors because of slips, in which the person has the correct intention but fails to carry it out

correctly, either because the intention was forgotten (an error of omission) or the intention was

incorrectly carried out (an error of commission) (Norman, 1981). Slips of action can account for

many of the errors made in interacting with the environment and can contribute to serious

outcome errors. For example, in the hospital blood bank, a year-long study of performance

showed that by far the most common type of error was a slip (e.g., a transcription error), and that

sometimes such a slip could have dire consequences (such as transfusing the wrong kind of

blood to a patient).

Mistakes

Mistakes are distinguished from slips in that they occur at the level of intention

formation rather than at the level of action selection. Thus, one may perform the correct action

sequence given the intention, but the intention is inappropriate for the given situation. If a

situation is not assessed appropriately, the prerequisites for an inappropriate rule may be met;

the rule is then correctly applied given that assessment, but the assessment itself is wrong.

Alternatively, the situation may be assessed appropriately, but the wrong rule is instantiated

(Reason, 1990).

Mistakes can also occur because a person does not have the requisite knowledge to

adequately perform or understand the task. It is conceivable that in any complex problem-

solving task, all but true experts at the task will have some missing knowledge which will

hinder their problem-solving for at least some task situations.

Skill Development and Skill Maintenance

One major difference that has been identified between novices and experts is that

experts have a much better mental model of a situation, which allows them to interpret

appropriate cues to guide problem solving. Mental models or schemas are the representational

structures in memory which guide information storage and retrieval. As people learn a task,

they build up a mental model over time that, as it becomes more accurate, allows them to

become better and better at problem-solving. As people develop skill, they continue to test and

refine their knowledge. If such a process does not continue, then people will lose their expertise

over time. Therefore, skill development and skill maintenance are important factors that

contribute to the human's capability to perform well at a task.

Requirements for a Cooperative Problem-Solving System

Thus, when designing a decision support system, it is necessary to conduct an in-depth

cognitive task analysis of the task domain in order to understand the kinds of problem-solving

strategies that people are likely to employ and where those strategies might lead them astray.

This helps to define appropriate opportunities for a decision support system to re-direct

performance or warn the user if his/her solution could be improved by looking at the problem

in a different way. Furthermore, the system should be able to help users detect and recover

from slips, and help people to develop and maintain their skill. A decision support system

should also not hide information that would normally allow a person to detect and recover from

errors. Relegating the user to a supervisory control role, for example, may change a person's

ability to detect anomalous situations (Layton et al, 1994). A decision support system should in

fact encourage and teach the effective use of error detection strategies and supply the user with

the necessary cues and information to be able to do so.

People will also build up a mental model of the tool that they are using. A decision

support system should be designed to encourage a good mental model of the situation and of

how the tool is designed to aid in the analysis of that situation. Many researchers have

identified the importance of providing users with a good understanding of how a decision

support system works so that people can effectively judge the appropriateness of its analysis of

the situation (e.g., Giboin, 1988; Lehner and Zirk, 1987; Muir, 1987; Roth, Bennett, and Woods,

1988), similar to the way effective human-human teams work together (Serfaty and Entin, 1995).

This requires ample training and hands-on use of the system (Guerlain, 1993a; Muir, 1987). It is

also important for the design of the decision aid to be based on an effective understanding of

how the users of the system view the problem-solving, so that the computer's advice is relevant

(van der Lei, Westerman, and Boon, 1989).

Although much has been studied and written about individual aspects of human

problem solving and decision making, a critical challenge that now confronts us is how to

integrate what has been learned in order to develop decision aids that accommodate these

human characteristics. Simply put, a good cooperative problem-solving system should work

well with people. Not so simply put, it must try to overcome or supplement some of the

limitations of human information processing, reduce the consequences of human error, and

allow people to still apply their skills. Furthermore, it should encourage learning through

practice and feedback. It also needs to fit in with the person's current environment without

being obtrusive, difficult to learn, or overly complicated to use.

The Critiquing Model of Decision Support

The critiquing model is the third form of decision support that will be considered here.

Critiquing systems were originally explored as a decision aiding strategy by Perry Miller. A

critiquing system is a computer program that critiques human-generated solutions (Miller,

1986). In order to accomplish this task, the critiquing system must be able to solve parts of the

problem and then compare its own solution to that of the person. Then, if there is an important

difference, the system initiates a dialogue with the user to give its criticism and feedback. The

primary difference, then, between a critiquing system and an automated or partially automated

system is that the person always initiates actions and the critiquing system only uses its knowledge to

react to the user's understanding of the problem.
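The interaction pattern just described can be sketched as a simple reactive loop. This is a minimal illustration, not the architecture of any system discussed here; the function names, the severity scores, and the significance threshold are all assumptions introduced for the sketch.

```python
# Minimal sketch of the critiquing interaction pattern described above.
# The names and the severity/threshold scheme are illustrative assumptions,
# not taken from ATTENDING, ONCOCIN, JANUS, or AIDA.

from dataclasses import dataclass

@dataclass
class Critique:
    aspect: str    # which part of the user's solution is in question
    message: str   # context-sensitive feedback to show the user

def critique_solution(user_solution, expert_solve, compare, threshold=0.0):
    """The person always acts first; the computer only reacts.

    expert_solve: computes the system's own (possibly partial) solution.
    compare: yields (aspect, severity, message) differences between the
    two solutions.  Only differences the system judges significant are
    surfaced, so trivial disagreements do not interrupt the user.
    """
    system_solution = expert_solve(user_solution["problem"])
    critiques = []
    for aspect, severity, message in compare(user_solution, system_solution):
        if severity > threshold:       # stay silent on trivial differences
            critiques.append(Critique(aspect, message))
    return critiques                   # empty list: no dialogue is initiated
```

An empty return models the unobtrusiveness goal: when the user's solution differs only trivially from the system's, no critique is raised at all.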

Previous Critiquing Studies

The first attempt at building a large-scale critiquing system for the medical community

was made by Miller (1986). He developed a prototype system, called ATTENDING, which was

designed to work in the anesthesiology domain. Based on this initial research, he also

experimented with critiquing systems for hypertension, ventilator management, and

pheochromocytoma workup. All of these prototypes operated in a similar manner. The user

was required to enter information about the patient's status and symptoms, as well as the

proposed diagnosis and treatment. The computer then critiqued the proposed solution,

generating a three-paragraph output summarizing its critique.

Miller saw much potential in the critiquing approach and was able to provide

recommendations to other designers for developing good critiquing systems. First, Miller

concluded that choosing a sufficiently constrained domain was important. ATTENDING was a

system attempting to aid anesthesiologists in treating their patients, a task that takes years for

people to learn and practice. Attempting to build a useful expert system in this field turned out

to be too difficult due to the expanse of knowledge required. This lesson led him to switch to

the more constrained hypertension domain. Second, Miller concluded that critiquing systems

are most appropriate for tasks that are frequently performed, but require the practitioner to

remember lots of information about the treatment procedures, risks, benefits, side effects, and

costs, as these are conditions under which people are more likely to make errors if unaided, thus

making the critiquing system potentially valuable.

A second critiquing system was developed by Langlotz and Shortliffe (1983), who

adapted their diagnostic expert system, ONCOCIN (designed to assist with the treatment of

cancer patients) to be a critiquing system rather than an autonomous expert system because

they found that: "The most frequent complaint raised by physicians who used ONCOCIN is that

they became annoyed with changing or 'overriding' ONCOCIN's treatment suggestion". It was

found that since a doctor's treatment plan might only differ slightly from the system's treatment

plan (e.g., by a small difference in the prescribed dosage of a medicine), it might be better to let

the physician suggest his/her treatment plan first, and then let the system decide if the

difference is significant enough to mention to the doctor. In this manner, the system would be

less obtrusive to the doctor. Thus, Langlotz and Shortliffe changed ONCOCIN to act as a

critiquing system rather than a diagnostic expert system with the hopes of increasing user

acceptance.

A third critiquing system, called JANUS, was developed by Fischer, Lemke, and

Mastaglio (1990) to aid with the design of kitchens. It is an integrated system, in that the user is

already using the computer to design, and the system uses building codes, safety standards, and

functional preferences (such as having a sink next to a dishwasher) as triggering events to

critique a user's design.

To test the potential value of critiquing systems, Silverman (1992b) compared

performance on two versions of a critiquing system designed to help people avoid common

biases when interpreting word problems that included multiplicative probability. The first

system only used debiasers, meaning that it provided criticism only after it found that the user's

conclusion was incorrect. It had three levels of increasingly elaborate explanation if subjects

continued to get the wrong answer. Performance was significantly improved with the critiques

than without (69% correct answers for the Treatment Group after the third critique vs. 4%

correct for the Control Group), but was still far from perfect. Subsequently, a second version of

the critiquing system was built that included the use of influencers, i.e., before-task explanations

of probability theory that would aid in answering the upcoming problems. With the addition of

these influencers, performance improved to 100% correct by the end of the third critique.
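The escalation scheme in Silverman's first system can be sketched as follows; the explanation texts, function names, and reset-on-success behavior are illustrative assumptions, not details reported in the study.

```python
# Sketch of an escalating debiaser: each successive wrong answer triggers a
# more elaborate explanation, up to three levels (after Silverman, 1992b).
# The API and the reset-on-success behavior are assumptions for illustration.

def make_debiaser(explanations):
    """explanations: critique texts ordered from terse to most elaborate."""
    state = {"level": 0}

    def critique(answer_is_correct):
        if answer_is_correct:
            state["level"] = 0                        # start over next problem
            return None                               # no critique needed
        level = min(state["level"], len(explanations) - 1)
        state["level"] += 1                           # escalate on next failure
        return explanations[level]

    return critique
```

Each wrong answer yields the next, more elaborate explanation; once the deepest level is reached, it simply repeats.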

In examining these results and the performance of several other critiquing systems on

the market, Silverman (1992b) proposed that to be effective, a critiquing system should have a

library of functions that serve as error-identification triggers, and include the use of influencer,

debiaser, and director strategies. (A director demonstrates a strategy to the user.) He sums up

his definition of a good critiquer by saying: "A good critic program doubts and traps its user

into revealing his or her errors. It then attempts to help the user make the necessary repairs."

The final study that will be discussed was conducted in our lab (Guerlain, Smith et al.,

1993). Knowledge about how to rule out antibodies was encoded into a computer, and

critiquing the user at the task (AIDA1) was compared to having the computer perform that

subtask (AIDA2). There was no statistical difference in outcome errors for cases for which the

computer's knowledge was competent (12% vs. 6% misdiagnosis rate). On a case for which the

system's knowledge was brittle, however, the critiquing system reduced misdiagnosis rates

from 72% down to 43% (p < 0.05).

Thus, the design of critiquing systems has been explored in a number of domains, but

we have only been able to find two studies using objective data to evaluate actual use of such

critiquing systems. Silverman's study compares alternative designs for a critiquing system,

finding improved performance with additional levels of critique when teaching students

probability theory. The study by Guerlain is the only source of objective data contrasting the

design of decision support systems based on the automation or partial automation models vs.

the critiquing model and looks at the processes by which actual practitioners using such a

system are aided with this kind of support. Guerlain's results suggest that cooperative problem-

solving is superior on brittle cases when using the critiquing model of decision support.

Potential Problems

Despite the potential value of critiquing systems, the act of designing a computer to

critique human performance is not sufficient for satisfying the requirements of a good

cooperative problem-solving system. For example, the ATTENDING systems developed by

Miller share many of the problems identified with the consulting expert system model of

decision support. These systems are designed such that the physician is required to enter the

patient symptoms as well as his/her proposed solution and then read the relatively long output

generated by the computer. Thus, the physician is required to act as the computer's secretary,

typing in all the information that is required (similar to many diagnosis and management expert

systems).

In order for a critiquing system to be successful, it should require very little extra effort

on the part of the human to interact with it. The computer must be able to directly infer the

user's conclusions, which can only be done if the person is already using the computer as an

integral part of task performance. The critiquing version of ONCOCIN was a step in the right

direction. Physicians were already using ONCOCIN to fill out patient data forms, so the expert

system used this information as its primary source of protocol data for the patient. JANUS was

also an integrated system, allowing users to design kitchens on the computer and get feedback

in the context of their work. The designers of AIDA1 used the strategy of putting the paper

forms normally used in the blood bank lab onto the computer so that the technologist's

problem-solving strategy could be directly inferred by his/her use of the system.

Second, users of critiquing systems still need to have an adequate understanding of the

computer's reasoning, so that the person can interpret messages appropriately. Silverman, for

example, found that an "influencer" that provided before-task explanations of the model of

problem-solving employed by the computer gave users a better understanding of

the problem domain and significantly improved their performance when using the critiquing

system.

Finally, a true test of a cooperative problem-solving system is that users of such a

system are able to detect faulty reasoning on the part of the computer. Silverman provides

some of what little data exists in terms of empirical assessments of critiquing systems, finding

significant improvement in performance with their use. The domain that he studied, however,

was an artificial task with untrained students as subjects. Perhaps most importantly, the task

was one where the system's knowledge was guaranteed to be correct. Thus, if the user

understood the advice being given by the computer and heeded it, s/he would always get the

case right. The study by Guerlain (1993) is the only known study that has tested the critiquing

model in a more complex, real-world domain and included an examination of performance on

cases for which the system's knowledge was brittle. That study provided some initial evidence

that having a computer act in a critiquing role can mitigate the brittleness problem. However,

overall misdiagnosis rates in that study were still poor (averaging 19% across all cases). Thus,

some of the users' process errors (related to ruling out antibodies) may have been reduced, but

the computer support was not enough to reduce outcome errors to an acceptable level.

Potential Benefits

Many of the problems with critiquing systems that were identified above have to do

with interface issues that can be resolved through better interface design and integration with

existing representations and data structures. Thus, although some critiquing systems (i.e., those

developed by Miller) and many traditional decision support systems have a common problem

(i.e., too much data entry required by the practitioner), these issues can be resolved for both

kinds of systems with a better interface (as was done by the designers of ONCOCIN, JANUS,

and AIDA1).

Furthermore, issues that cannot be so easily resolved with systems that are based on an

automation philosophy (such as having control over the automation, having an appropriate

mental model of the system, and losing practice and expertise when using the system) can

potentially be resolved by designing the system using the critiquing approach. Critiquing

systems are potentially more cooperative and informative to practitioners than automated or

partially automated systems because they structure their analysis and feedback around the

problem-solving strategies and proposed solution generated by the user. Since there are often

many ways to solve a problem, the fact that the system uses the person's initial solution or

partial solution as a basis for communication reduces the amount of redundant information that

must be discussed. This contrasts with traditional diagnosis systems, where the computer

generates the entire solution and is unaware of the conclusions drawn by the practitioner. In

such situations, it is up to the person to process the computer's output, compare what it has

proposed to what s/he thinks s/he would have done, and then think about any differences that

were detected between the machine- and human-generated solution.

With the critiquing approach, the burden of making the initial comparison and deciding

what needs to be discussed further is placed on the computer (or, more accurately, on the

computer system designer). Furthermore, the feedback focuses on the particular aspects of the

solution that are in question. The feedback is therefore more likely to be pertinent to the user,

and in turn more understandable and hopefully more acceptable (Langlotz and Shortliffe, 1983;

Miller, 1986). In addition, partial or intermediate conclusions proposed by the user can be

critiqued immediately (instead of waiting until a complete answer is formulated by the person),

providing feedback in a more timely and potentially more effective context.

Furthermore, users of a critiquing system are doing the task themselves, and thus are

still able to apply their own skills and strategies. This is important for many reasons. First,

practitioners will not lose skill because of the introduction of the decision support system. In

fact, they may become more skilled because of the feedback provided by the decision aid

(Fischer, Lemke, and Mastaglio, 1990). Second, users of a critiquing system have the potential to

build up a better mental model of the decision support system's knowledge because they will be

reminded of the computer's view of the problem-solving each time they do something that the

computer thinks is wrong (Guerlain, 1993; Fischer, Lemke, and Mastaglio, 1990). Third, because

the system is only reactive to aspects of the task that it is knowledgeable about, the person can

still apply extra expertise and/or different strategies that may complement the computer's

knowledge. For example, because practitioners are doing the task themselves, they are more

likely to detect anomalous situations because they may encounter event- or data-driven

triggering factors that call to mind relevant expertise during their problem-solving. Thus, there

is the potential for better overall performance than either the computer working alone (as in the

automation mode) or the person working alone (as in the pre-automation mode).

Other potential benefits of critiquing systems are the following:

• Critiquing systems are flexible: they can work in conjunction with other decision support

techniques, such as good representations of the problem, and can be seamlessly integrated

with information that is already online (Console, Conto, Molino, Ripa di Meana, and

Torasso, 1991; Guerlain, 1993; Fischer, Lemke, and Mastaglio, 1990; van der Lei, Musen, van

der Does, Man in 't Veld, and Bemmel, 1991).

• Critiquing systems can be designed to detect and correct common human weaknesses, such

as slips, mistakes, process errors, biased reviewing, hypothesis fixation, etc. (e.g., Silverman,

1992c).

• Critiquing systems are less likely to trigger human cognitive problem-solving biases.

• Critiquing systems can work in the context of the task.

• Critiquing systems can not only be used as an on-line decision aiding system (Lepage,

Gardner, Laub and Golubjatnikov, 1992), but also to give experts practice on rare and

difficult cases, as a testing device to give feedback to supervisors and regulatory agencies

(van der Lei, Musen, van der Does et al., 1991) and to train new practitioners (Console,

Conto, Molino, Ripa di Meana, and Torasso, 1991; Fischer, Lemke, and Mastaglio, 1990;

Smith, Miller, Fraser et al., 1990; Smith, Miller, Gross et al., 1991; Smith, Miller, Gross et al.,

1992).

Finally, there is evidence that designing a decision support system as a critiquing

system may be a strategy to mitigate the brittleness problem of expert systems. First, critiquing

systems that are acting on an incomplete knowledge base can still be helpful, whereas an

automated expert system cannot generate a solution if a problem is ill-specified (Fischer, Lemke

and Mastaglio, 1990). Second, evidence from the AIDA study showed that designing a system

as a critiquing system rather than a partially automated system reduced error rates on a case for

which the system's knowledge was incompetent by 29 percentage points. This suggests that humans are better

able to judge flaws in the computer's reasoning when interacting with a critiquing system than

when interacting with a traditional expert system that leaves the human out of the decision

process until the computer has completed its inferences.

The Next Step: Designing a Proof-of-Concept Cooperative Critiquing System

Critiquing is proposed to be a good model for studying the design of effective

cooperative problem solving computer systems. Although many aspects of critiquing systems

have been identified as potentially good ways to promote cooperative problem-solving, very

little research has been done to test the efficacy of these claims. The focus of the research

conducted here was to try to develop a proof-of-concept critiquing system that would

successfully aid practitioners on a wide range of difficult problems. The design strategy used

was to conduct an in-depth cognitive task analysis of the domain of interest (antibody

identification) and design a system that addressed the domain-specific problems identified, as

well as the general problems with many decision support systems that were identified in this

chapter. The next chapter (Chapter III) details the results of the cognitive task analysis of

antibody identification while Chapter IV discusses the design concepts used to develop an

integrated cooperative problem-solving system that revolves around the critiquing model of

decision support.

Chapter III

Antibody Identification as a Testbed

One domain that we have found to be highly suitable for studying the use of computer

aiding is that of antibody identification. This is a laboratory workup task, where medical

technologists must run a series of tests to detect antibodies in a patient's blood. Antibody

identification satisfies all of the requirements outlined by Silverman and Miller. It is a

sufficiently constrained domain which is frequently performed but difficult for people to do. It

requires analyzing a large amount of data and deciding which tests to run to yield the most

information. There is large variation in practice as to how to solve antibody identification cases,

and technologists have been documented to make errors in transcribing and interpreting the

data (Smith et al., 1991; Strohm et al., 1991). Furthermore, it has the classical characteristics of

an abduction task, including masking and problems with noisy data.

The Practitioners, a.k.a., "Medical Technologists" or "Blood Bankers"

Blood bank practitioners are trained by going to medical technology school. At a

minimum, students must complete a two-year program past high school to be certified as a

Medical Lab Technologist (MLT). During this time, an MLT learns not only blood banking, but

many other medical technology areas such as hematology and chemistry. A more advanced,

four-year bachelor's degree leads to certification as a Medical Technologist (MT), which is also a

general program involving many areas besides blood banking. After becoming an MT, one can

enroll in a Specialist in Blood Banking (SBB) program. The SBB program requires two years of

clinical experience and a baccalaureate for entry. The program is one to two years in length and

may lead to a master's degree in addition to certification as an SBB. For simplification, the term

"Medical Technologist", "MT", or "blood banker" will be used to describe practitioners who

work in the blood bank, but keep in mind that the discussion applies to practitioners at all levels

of certification.

The Goal of Antibody Identification: Finding Compatible Donor Blood

The blood banker's goal is to make sure that a patient who needs a blood transfusion

does not have a transfusion reaction. One type of transfusion reaction takes place when the

patient's immune system "recognizes" the donor blood as being foreign and attacks it. This can

happen because antigens, which are chemical structures on red blood cells, can elicit an immune

response in the form of antibodies. Antibodies can form against any foreign antigens that are

detected, i.e., those antigens that are present in the donated blood but not present in the

patient's blood.

Since there are over 400 known human blood antigens, the potential for incompatibility

between donor and recipient blood is quite high. When identifying compatible blood for a

patient, blood bankers do not try to find donor blood that exactly matches the antigenic

characteristics of the patient's blood because of the effort and cost involved. Rather, they try to

determine which antibodies the patient has at the time of the transfusion, and then give donor

blood lacking the antigens that those antibodies will recognize and attack.

Antibodies can form whenever foreign antigens are introduced into the human blood

stream (including past blood transfusions). Therefore, it is possible that the donor blood that

was compatible for transfusion one month will no longer be compatible the next month, because

in that time period, the patient may have formed antibodies against some of the antigens

present in the previously donated blood. For this reason, it is necessary to re-test patients for

antibodies each time they are to receive a transfusion.

The Antibody Identification Procedure

While the blood banker's goal is to determine which antibodies are present in the

patient's blood, the only direct information conveyed by running a blood sample test is whether

or not agglutination has occurred. Agglutination is a clumping of blood cells which indicates

that antibodies in the patient's serum have bound to antigens on the foreign red blood cells.

Agglutination in the test tube can be seen with the naked eye in most cases, but sometimes must

be confirmed via a microscopic examination.

Given that agglutination has occurred in a set of tests, blood bankers must then make a

series of inferences to determine what antibody-antigen reactions must have occurred to have

caused the agglutination. In going from raw data to a diagnostic conclusion, blood bankers

must call upon a large body of factual knowledge, apply strategies that have either been taught

or derived from past experience, and make hypotheses and predictions to help them through

the problem-solving process. The more advanced knowledge and strategies used by expert

blood bankers may take years for practitioners to learn. Indeed, these more advanced problem-

solving skills are never mastered by some practitioners.

The basic procedure for typing a patient's blood for antibodies involves combining test

cells (red blood cells), which contain known antigens, with the patient's serum, which may

contain antibodies. The test cells have been carefully typed for the presence or absence of

antigens by the commercial supplier of the cells. When the test cells and the patient's serum are

combined, the blood banker looks for agglutination, which indicates that antibodies from the

patient's serum have bound to some of the antigens contained in the test cells. The amount of

agglutination is rated on a scale from 0 (no clumping) to 4+ (very strong; one big clump).

Initially, this process is performed with two or three different test cells that cover all of

the major antibodies likely to be formed. This is called the antibody screen test (see Figure 1). If

there is a positive reaction with any of these screening cells, then the process is repeated with

many more test cells to allow the blood banker to determine which antibodies the patient must

have to be causing the reactions.
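The core inference from panel reactions to candidate antibodies follows a rule-out heuristic: if a nonreactive test cell carries a given antigen, the corresponding antibody can provisionally be excluded. A minimal sketch, with invented cell profiles and deliberately ignoring test phase, reaction strength, and dosage:

```python
# Basic rule-out heuristic over an antibody panel (simplified sketch).
# Real workups also weigh the test phase, reaction strength, dosage, and
# enhancement reagents; the cell profiles below are invented for illustration.

def rule_out(panel):
    """panel: list of (antigens_on_cell, reacted) pairs, one per test cell.

    Returns the antibodies that cannot be ruled out, i.e. the remaining
    candidates for whatever is causing the observed agglutination.
    """
    candidates, excluded = set(), set()
    for antigens, reacted in panel:
        candidates |= set(antigens)
        if not reacted:
            # No agglutination with this cell suggests the patient lacks
            # antibodies against every antigen the cell carries.
            excluded |= set(antigens)
    return candidates - excluded

# Example: the second cell carries C and K but did not react, so anti-C and
# anti-K are ruled out, leaving anti-D as the remaining candidate.
panel = [({"D", "C", "K"}, True),
         ({"C", "K"}, False)]
```

This is the kind of rule-out knowledge that was encoded in the AIDA systems discussed earlier, stripped down to its simplest form.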

Figure 1. The Antibody Screen

There are several different reagents which, when added to the patient serum/test cell

combinations, will enhance or diminish the reactions of certain antibodies. For example, adding

enzymes to the test tubes will have the effect of enhancing the antibody reactions to the antigens

in the Rh system and eliminating the antibody reactions to the M, N, S, s, Fya, and

Fyb antigens.

The test tubes can also be heated or cooled. At certain temperatures, some antibodies

are more likely to agglutinate than others. Because antibodies react differently depending on

the process of testing, it is not enough to mix each patient serum/test cell combination in just

one way. At a minimum, three tests are generally performed: Immediate Spin (mixed and read

immediately), 37° (heated for 30 minutes at 37°C, then mixed and read), and AHG (also known

as the "Coombs" phase, where anti-human globulin reagent is added to the vials, then mixed,

washed, and read).

Types of Knowledge Needed

Knowing the antibody history of patients will help blood bankers begin blood typing

because once an antibody has formed, the immune system remains sensitive to the

corresponding antigen and will quickly form antibodies against that antigen if it is seen again.

Although specific antibody formation histories are often unavailable, there are general

clues that give blood bankers a sense of the likelihood of antibodies being present. For example,

the more times a patient has had foreign red cells introduced into their blood stream (from past

transfusions or past pregnancies), the more likely it is that the patient has formed antibodies.

Knowing a patient's ethnicity can also help blood bankers to make blood typing

inferences. For example, people of Caucasian background rarely display the V antigen. This

antigen is much more common among people of African descent.

Even without any information about a patient, blood bankers can draw on knowledge

about the formation of antibodies in general to aid them in their diagnosis. Some antigens are

more common and elicit a greater degree of antibody formation than others. For example, it is

likely that anti-D (the antibody formed against the D antigen) will form before anti-C in a

patient who has been given previous transfusions containing the C and D antigens, assuming

that the patient lacks both those antigens.

Blood bankers can also use knowledge about the distribution of antigens in the

population to help determine the likelihood of various antibodies being present. Two

conditions must normally be met before an antibody will be formed: 1) the patient's blood must

lack the antigen and 2) the patient must have been exposed to that antigen. Antibodies against

high-incidence antigens are therefore rare because almost all patients possess these antigens and

would not normally form antibodies against them. Antibodies against low incidence antigens

are also rare even though almost all patients lack these antigens, because exposure to them is

unlikely.

Blood bankers sometimes experience difficulty when interpreting a set of agglutination

reactions. This difficulty arises because there is not a definite one-to-one mapping between a

positive reaction and a particular antibody. On the contrary, a positive reaction means only that

one or more antibodies have reacted with one or more of the many antigens present on a given

test cell. To determine which antibodies are present in a patient's blood requires a process of

elimination similar to the exercises found on a logic exam:

If the C antigen is present on all of the positively reacting cells and not present on all of

the negatively reacting cells and the patient lacks the C antigen, then anti-C could be

one of the antibodies present.

The use of such rules is not necessarily straightforward. Belief in hypotheses must be tempered

by the fact that there could be noisy data, weak expressions of antibodies, or multiple antibodies

together, overlapping and potentially masking the presence of others.
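The elimination rule quoted above can be expressed directly in code. The sketch below is illustrative only: it assumes a hypothetical representation in which each test cell is a dictionary mapping antigen names to presence, paired with a flag for whether the patient's serum reacted; the function name and data layout are not taken from the dissertation.

```python
# A minimal sketch of the elimination rule, under an assumed data layout:
# each entry in `cells` is (antigen-presence dict, reacted flag).

def could_be_present(antigen, cells, patient_antigens):
    """Return True if the antibody against `antigen` could be one of the
    antibodies present, per the rule: the antigen appears on every
    positively reacting cell, on no negatively reacting cell, and the
    patient lacks the antigen."""
    on_all_positive = all(cell[antigen] for cell, reacted in cells if reacted)
    on_no_negative = all(not cell[antigen] for cell, reacted in cells if not reacted)
    patient_lacks = antigen not in patient_antigens
    return on_all_positive and on_no_negative and patient_lacks

cells = [
    ({"C": True,  "K": False}, True),   # reacting cell carries the C antigen
    ({"C": True,  "K": True},  True),   # reacting cell carries the C antigen
    ({"C": False, "K": True},  False),  # non-reacting cell lacks the C antigen
]
print(could_be_present("C", cells, patient_antigens=set()))  # True
print(could_be_present("K", cells, patient_antigens=set()))  # False
```

As the surrounding text cautions, a True result here is only a hypothesis, not a conclusion: noisy data, weak expressions, and masking by multiple antibodies can all defeat this rule when it is applied in isolation.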

Characteristics that make antibody identification difficult

Antibody identification is only one of the many tasks performed by medical

technologists when they work in the blood bank. There are many factors about this task that

make it difficult to perform well, some of which are listed below.

Medical Technologists (MTs) arrive in the blood bank with minimal practice

Many Medical Technologists arrive in the blood bank lab out of school with very little

practice in solving antibody identification cases. Consequently, they have not yet adequately

learned how to apply the facts taught in school, nor have they formed the effective strategies

necessary to solve the wide range of antibody identification cases that they might encounter.

Quite often, it is the responsibility of the blood bank supervisor to oversee and train newly

arrived graduates on the procedures for solving a case. This means that the extent to which the

trainee learns the task is dependent on the local instruction received within the lab. Such local

instruction is neither required nor formalized and may or may not include antibody

identification problems. It is not necessary for blood bank supervisors to train their techs on this

task. Therefore, the skill and problem-solving strategies used by practitioners can vary to a

great extent.

Most MTs rotate

Since Medical Technologists are trained in many areas, such as Chemistry, Hematology,

and Blood Bank, they may rotate, working in one area for some time period before moving on to

the next area. Thus, they may work in Blood Bank just a few months out of the year. Because

blood banking requires so much knowledge and skill, it is difficult for a rotating practitioner to

develop and maintain expertise in that area.

Infrequent encounters with difficult cases

Most of the cases a blood banker encounters require only an initial test called an

antibody screen test. If the results of this initial test are negative, then there is no need to run

the further tests required for a full workup, since no antibodies are likely to be present.

Therefore, depending on what shift the blood banker works and the size of the hospital, it may

be seldom that the blood banker gets a case with positive results from the initial screening test.

With so few encounters with cases requiring a full workup, there is less of a chance for blood

bankers to develop a sense of the probabilities of various antibody combinations and to build up

the pattern recognition and problem-solving skills that can aid in this kind of diagnosis.

Very little feedback on performance

Unless blood bankers ask for assistance on a case, they work alone to determine the

answer. In most labs, no one checks their procedure or their reasoning for coming up with an

answer. Furthermore, once a diagnosis is made, blood bankers never really know whether they

were "right" or not. Based on the antibodies that they identify, blood lacking the corresponding

antigens is dispensed to the patient. If the diagnosis was wrong, there is a chance that the blood

that is dispensed would still be compatible because it may also happen to lack the antigens

against which the patient actually has antibodies. So, an incorrect diagnosis goes undiscovered

and the blood banker continues to think that s/he is performing adequately.

Even if the patient does get incompatible blood, the ensuing transfusion reaction may

not be evident as such to the administering doctor. The patient is already sick, possibly

receiving other medication and treatments, so if s/he gets sicker one day, it may not be

recognized as having been caused by the blood transfusion just received. Rather, it could be

attributed to some other procedure or precondition. Again, the blood banker may not get any

feedback and assume that "no news is good news", making it difficult for the blood banker to

accurately gauge his/her performance.

Expert strategies

Although there is complexity in identifying antibodies, expert blood bankers perform in

this domain quite well. The expert blood banker tries to sort out which antibodies are causing

the reactions by recognizing reaction patterns and making early hypotheses upon which to base

further analyses. In order to minimize the chance for an incomplete or incorrect diagnosis, the

expert blood banker tries to collect independent, converging evidence to "rule-in" the

hypothesized antibodies and to rule out all other possible contenders. Thus, there is a high-

level skill involved in knowing how to combine various problem-solving strategies such that the

overall protocol is likely to succeed. Following is a list of "middle-level" expert strategies that

have been identified which can be combined to form a good protocol. In general, each

of these strategies is good, so they are numbered with a '+' sign after them, but it must be

remembered that if applied in isolation, without applying other strategies to collect converging

evidence (to guard against the fallibility of the heuristics), then poor performance can still occur.

Later, known poor strategies will be identified with a '-' sign after them, to indicate that they are

usually poor strategies to apply.

1+. Forming early hypotheses

Each time blood bankers run tests on a patient's blood, time and money are spent.

Because there are many different tests possible, blood bankers need to know under which

conditions the various tests are diagnostic. Consequently, if they can form a hypothesis early in

the blood typing process, they can make predictions of how various tests will affect future

reactions and pick tests that are most informative for that case. For example, if the blood banker

hypothesizes that a patient has the M antibody, then s/he can combine that with the knowledge

that anti-M is most likely to agglutinate at cold temperatures and run the cells using cold

temperatures. This is diagnostic in this case because, if the reactions are enhanced, then that is

evidence that at least one cold-reactive antibody is part of the patient's blood profile. If, on the

other hand, no reactions occur, then all antigens on the test cells that are cold-reactive can be

ruled out.

1a+. Hypothesizing the number of antibodies present

Experts tend to make a very early hypothesis about the number of antibodies present by

looking at the reaction patterns across test results and hypothesizing that one antibody is

accounting for each different reaction pattern. For example, if four cells are reacting '0 0 3+'

across the three phases of testing (IS, 37°, and AHG) and three cells are reacting '2+ 2+ 1+', then

the expert will hypothesize that two different antibodies are causing the two different reaction

patterns. If all the reacting cells have the same reaction pattern, '0 0 2+' for instance, the expert

will hypothesize that there is just one antibody present. If the reaction patterns are only slightly

different, some '0 0 2+' and some '0 0 3+', for instance, then the expert might hypothesize that

there is either one antibody reacting variably or two different antibodies causing the reactions.

It is possible that two or more antibodies could be reacting with the same reaction

pattern. Therefore, the heuristic that a set of reaction patterns is caused by one antibody is a

simplifying assumption that helps expert blood bankers to start forming hypotheses. Their

hypotheses might be revised later if they encounter evidence in the analysis that suggests the

need to do so.
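This counting heuristic is simple enough to sketch in code. The tuple encoding of the three phase grades below is an assumption made for illustration, not a representation taken from the dissertation.

```python
# Sketch of strategy 1a+: hypothesize one antibody per distinct reaction
# pattern among the reacting cells. Each result is an assumed tuple of
# graded reactions across the three phases (IS, 37 degrees, AHG).

def hypothesized_antibody_count(results):
    reacting = [r for r in results if any(grade != "0" for grade in r)]
    return len(set(reacting))  # one antibody per distinct pattern

panel = [
    ("0", "0", "3+"),    # pattern 1
    ("0", "0", "3+"),
    ("2+", "2+", "1+"),  # pattern 2
    ("0", "0", "0"),     # non-reacting cell, ignored
]
print(hypothesized_antibody_count(panel))  # 2
```

As the text notes, this is a simplifying assumption: two antibodies can share a pattern, and one antibody can react variably, so the count is a starting hypothesis to be revised against later evidence.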

1b+. Hypothesizing the type of antibodies present

A second way to form early hypotheses is to look at the reaction patterns and

hypothesize the type of antibodies that are present. Each antibody tends to react in a certain

pattern, depending on which genetic system the antibody belongs to and other factors. Blood

bankers can use this information to help them identify the subset of antibodies that are likely to

react with the given reaction pattern. For example, reactions that are negative in the Immediate

Spin and 37° phases but strong in the AHG phase are most likely exhibited by antibodies

belonging to the Kell, Duffy, Kidd and Rh systems. Therefore, they will hypothesize this and

only look in those four systems for the specific antibody that matches.

1c+. Hypothesizing a specific antibody by finding a pattern match

Once the blood banker has narrowed down the possible set of antibodies and perhaps

run some further tests, then s/he will look for the specific antibodies causing the reactions. One

technique for doing so is to find an antigen that is present on all of the test cells that are reacting

with the same pattern and not present on all of the cells that are not reacting with that pattern.

If, for instance, the blood banker has hypothesized that the '0 0 3+' reactions on cells 1, 6, and 9

are all caused by the same Rh antibody, then s/he will look at the test cells for the presence of

an Rh antigen on just cells 1, 6, and 9. If one can not be found, then s/he may look in other

systems that may react with that pattern, hypothesize that two antibodies are reacting together

to form the three reactions, or hypothesize that one antibody is reacting variably, accounting for

these reactions plus other, different ones. This is one strategy in particular that, if applied alone

without alternative strategies for converging evidence, can lead to a premature conclusion that

may be wrong.

2+. Ruling out

Even though an antibody may seem very probable, it is important to rule out all other

frequent, clinically significant antibodies to be sure that other antibodies are not being masked

by the reactions of the first one. This converging evidence is protection against slips (Norman,

1981, Norman, 1989) and the fallibility of the heuristics used (Smith et al., 1991). If a set of

antibodies cannot be ruled out, it may become evident that the group of them together or some

subset therein could account for the reactions as well as those originally recognized as being

possible.

For example, when first looking at the antigram panel for the case shown in Figure 2, it

looks as if anti-Fyb is a very likely candidate because the Fyb antigen is present on all cells

where there is a positive reaction (strategy 1c+). However, after ruling out on this panel, four

other antibodies still remain as possibilities (see Figure 3). In looking at the remaining set, two

subsets could account for the positive reactions. Anti-E and anti-K together could account for

the reactions because one or the other is present on all reacting cells. Or, anti-E, anti-K, and

anti-Fyb could all be reacting together. At this point, it is necessary to run further tests that will

discriminate between these three sets of answers. It turns out that, for this case, anti-E and anti-

K together form the answer, not anti-Fyb as originally hypothesized. This case clearly

demonstrates the importance of ruling out other contenders even though one answer may at

first seem very likely.

Figure 2. Anti-Fyb looks likely

Figure 3. Anti-E and anti-K can also account for the reactions, however.

2a+. Ruling out using homozygous, non-reacting cells

The heuristic that most experts will use for ruling out is to look at test cells that have no

reactions ('0 0 0') and to rule out antigens that are present and homozygous on those test cells.

A homozygous antigen is one that is present on the cell without its corresponding genetic allele.

For example, Fya is normally homozygous (double dose) on a cell if its genetically paired

antigen, Fyb, is not present on the cell. An antigen is said to be heterozygous (single dose) if

both antigens in the pair are present on the test cell. An antibody tends to react more strongly

with an antigen that is homozygous than with one that is heterozygous. Therefore, if the test

cell is not reacting, it is safest to rule out using homozygous antigens, since they have a double

dose of antigen and would be most likely to cause a reaction if the antibody really was present.
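A minimal sketch of this rule-out heuristic follows, under an assumed encoding in which each cell records per-antigen zygosity ("homo", "hetero", or None when the antigen is absent) together with a reacted flag; the names are hypothetical.

```python
# Sketch of strategy 2a+: rule out an antibody only when its antigen is
# homozygous on a completely non-reacting cell.

def rule_out(candidates, cells):
    """Remove any candidate antibody whose antigen appears homozygously
    on a non-reacting cell; heterozygous antigens are left alone because
    a weak antibody might fail to react with a single dose."""
    remaining = set(candidates)
    for antigens, reacted in cells:
        if reacted:
            continue  # reacting cells are never safe to rule out on
        remaining -= {a for a in remaining if antigens.get(a) == "homo"}
    return remaining

cells = [
    ({"Fya": "homo", "K": None},     False),  # rules out anti-Fya
    ({"Fya": None,   "K": "hetero"}, False),  # heterozygous K: not ruled out
    ({"K": "homo"},                  True),   # reacting cell: no rule-outs
]
print(rule_out({"Fya", "K"}, cells))  # {'K'}
```

Note that the heterozygous K cell contributes nothing: keeping anti-K as a candidate is exactly the conservatism the zygosity distinction buys.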

2b+. Ruling out using additional cells

After running an antigram panel that has anywhere from ten to twenty test cells, there

may be some antibodies that still can not be ruled out. One strategy for ruling these out is to

selectively pick additional cells from other panels. Efficiently picking additional cells requires

having formed a hypothesis about which antibodies are present. That way, test cells can be

chosen that are negative for those antigens considered to be present and homozygously positive

for the antigens corresponding to the antibodies to be ruled out. For example, if anti-C is hypothesized as being present and

anti-Fyb still needs to be ruled out, then the practitioner will look on other panels for a test cell

that is negative for the C antigen and homozygously positive for the Fyb antigen. If the

practitioner's hypothesis is correct, then there will be no reaction and anti-Fyb can be ruled out

according to the strategy of ruling out on homozygous, non-reacting cells.

2c+. Ruling out masked antigens by inhibiting the positive reactions

If some antibodies can not be ruled out because there are not enough non-reacting cells,

then, depending on the antibody that is causing the reactions, the blood banker can use a

procedure that inhibits those reactions (i.e., if anti-Fya is hypothesized as being present, running the

cells at enzymes will inhibit the Fya reactions). Therefore, if the reactions are negative, then all

those antibodies that would react at that test phase with that test cell can be ruled out.

2d+. Ruling out only those antibodies that will react on the current panel

The heuristic of ruling out on non-reacting cells will only work for antigens that are

going to react given the current testing procedure. If the current procedure inhibits some

antibodies from reacting, (i.e., running the cells at enzymes will inhibit the Fya reactions, as

explained above), then only those antigens that would not be destroyed can be safely ruled out

with that panel. The practitioner, therefore, must know how the various test procedures will

affect all of the antibodies and know when to refrain from using the normal rule-out heuristic.
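Strategies 2c+ and 2d+ together can be sketched as follows. The set of enzyme-destroyed antigens is taken from the reagents passage earlier in the chapter (M, N, S, s, Fya, Fyb); the function and its arguments are otherwise hypothetical.

```python
# Sketch of strategies 2c+/2d+: a non-reacting, enzyme-treated cell can
# rule out only those antibodies whose antigens survive enzyme treatment.

ENZYME_DESTROYED = {"M", "N", "S", "s", "Fya", "Fyb"}  # per the text

def safe_rule_outs(candidates, cell_antigens, enzyme_treated):
    """Antibodies that one non-reacting cell can rule out, honoring the
    current test procedure (zygosity is ignored here for brevity)."""
    ruled_out = {a for a in candidates if a in cell_antigens}
    if enzyme_treated:
        # Destroyed antigens could not have reacted anyway, so a negative
        # result says nothing about their antibodies.
        ruled_out -= ENZYME_DESTROYED
    return ruled_out

print(safe_rule_outs({"Fya", "K"}, {"Fya", "K", "D"}, enzyme_treated=True))
# {'K'}
```

Run without enzymes, the same cell would rule out both anti-Fya and anti-K; the procedure-awareness is the whole point of strategy 2d+.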

2e+. Ruling out the corresponding antibody if the antigen typing is positive

Running antigram panels will show the presence or absence of antibodies in the

patient's blood. A completely different kind of test can be run to determine which antigens the

patient possesses. This test can be used to help rule out antibodies because if the patient

possesses an antigen in his/her own blood, s/he will not form the corresponding antibody,

(barring some auto-immune disorder). Therefore, the blood banker can type the patient's red

blood cells for an antigen. If the results are positive, the corresponding antibody can be ruled

out.

3+. Collecting independent, converging evidence

It is not a good idea to make a diagnosis based on just one type of test, or by using one

problem-solving heuristic. For example, the strategy of hypothesizing a specific antibody by

finding a pattern match (1c+) can actually lead to an erroneous diagnosis if the hypothesized

antibody is masking the presence of other antibodies. A general way to minimize the chance of

a misdiagnosis is to collect converging evidence. In other words, it is wise to use a set of

strategies and test results that will independently point to the same answer as being conclusive.

This section lists a number of meta-level strategies used by expert practitioners to help them

catch their own errors and increase the likelihood of correctly solving a case.

3a+. Making sure the patient is capable of forming the hypothesized antibodies

As a reminder to the reader, antigen typing is a different kind of test than combining

test cells with the patient's serum. Antigen typing is used to test the patient's red blood cells for

antigens. Positive results can be used to rule out an antibody as described above. This test can

also be used as a final check on the answer. If a patient is said to have an antibody, typing the

patient's cells for the corresponding antigen should have a negative result. Thus, if the results

are negative, that is converging evidence that the corresponding antibody could be in the

patient's blood. If, on the other hand, the results are positive, the antibody should be ruled out

and another answer must be found for the case. For example, in the case described above,

where anti-Fyb looks very likely based on the antigram panel alone, it turns out that an antigen

typing test shows the patient to possess the Fyb antigen and to lack both the E and the K

antigens. This information rules out anti-Fyb, which looked very likely at first, and provides

more evidence for the possibility of anti-E and anti-K.
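The antigen-typing check described here can be sketched as a simple partition. The representation below is assumed for illustration: antibodies are named by their corresponding antigens, and the typing result is a dict from antigen to a positive/negative flag.

```python
# Sketch of strategy 3a+ (and rule-out 2e+): a positive antigen typing
# rules the corresponding antibody out; a negative typing is converging
# evidence that the antibody is possible.

def antigen_typing_check(hypothesis, patient_typing):
    """Split hypothesized antibodies into those still plausible (patient
    lacks the antigen) and those ruled out (patient possesses it)."""
    ruled_out = {ab for ab in hypothesis if patient_typing.get(ab, False)}
    return hypothesis - ruled_out, ruled_out

# The Figure 2/3 case: the patient types positive for Fyb, negative for E, K.
plausible, ruled_out = antigen_typing_check(
    {"Fyb", "E", "K"}, {"Fyb": True, "E": False, "K": False}
)
print(sorted(plausible), sorted(ruled_out))  # ['E', 'K'] ['Fyb']
```

This mirrors the worked example: anti-Fyb, which looked likely from the panel alone, is eliminated by the typing result, leaving anti-E and anti-K.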

3b+. Using a test procedure that is known to change the reactivity of an antibody

Another way to get converging evidence for the presence of an antibody is to run the

test cells at a different phase (i.e., using different reagents, changing the temperature of the cells,

using a longer incubation time, etc.) and see if the results change as would be predicted for that

antibody (either enhanced or inhibited). If such a change takes place, that is more evidence for

the presence of that antibody. For example, since anti-Fyb is destroyed by the use of enzymes,

then negative reactions with enzymes provides more evidence that anti-Fyb could be a

contender. (In fact, in the case described above, the reactions were NOT eliminated when

enzymes were added, providing evidence that anti-Fyb was not likely to be the only antibody

causing the reactions).

3c+. Asking, “Is this an unlikely combination of antibodies?”

Expert blood bankers will check their answer for plausibility given the normal

formation patterns of antibodies. Due to the way antibodies are formed, some antibody

combinations are extremely unlikely. For instance, anti-D will almost always form before anti-C

in a patient that lacks both the D and C antigens. Therefore, if such is the case, and the patient is

diagnosed as having anti-C alone, then that should stand out as "a unicorn", i.e., an extremely

unlikely event. It does not matter how well the normal antibody identification procedure points

to anti-C alone, such a rare finding should prompt the blood banker to rethink the case and

examine it more closely.

3d+. Making sure there are no unexplained positive reactions

Expert blood bankers will review their cases to be sure that all the data is accounted for

by their interpretations. They make sure that all positive reactions are accounted for by the

antibodies chosen to be the answer (i.e., that there are no unexplained positive reactions).

3e+. Making sure there are no unexplained negative reactions

Similarly, experts will check to be sure that each non-reacting test cell does not have any

of the antigens corresponding to the hypothesized antibodies present.

3f+. Making sure that all remaining antibodies are ruled out.

Good practitioners will rule out all remaining, clinically significant antibodies, to make

sure that there are no underlying antibodies.

3g+. Using the "3+/3-" rule

A final way to have converging evidence for the presence of each hypothesized

antibody is to make sure that there are at least three test cells which are reacting to just one of

the hypothesized antibodies and three test cells that are negative for all of the antigens that

would cause reactions. In other words, if anti-c and anti-Jka are the hypothesized antibodies,

then the practitioner tries to have at least three cells that are positive for the c antigen and

negative for the Jka antigen, three cells that are negative for the c antigen and positive for the

Jka antigen, and three cells that are negative for both antigens. Fulfilling this condition often

requires finding additional cells on other panels that have the right characteristics. This

heuristic seems to be especially useful in avoiding the presence of extraneous antibodies in an

answer set.
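The "3+/3-" rule can be sketched as a mechanical check. The antigen-presence dicts below are an assumed representation, as in the earlier sketches; the function name is hypothetical.

```python
# Sketch of strategy 3g+: for each hypothesized antibody, require at least
# three cells positive for only its antigen, plus at least three cells
# negative for all hypothesized antigens.

def satisfies_three_rule(hypothesis, cells):
    all_negative = sum(
        1 for c in cells if not any(c.get(a, False) for a in hypothesis)
    )
    for antibody in hypothesis:
        others = hypothesis - {antibody}
        alone = sum(
            1 for c in cells
            if c.get(antibody, False)
            and not any(c.get(o, False) for o in others)
        )
        if alone < 3:
            return False  # this antibody is not isolated on enough cells
    return all_negative >= 3

# The anti-c / anti-Jka example from the text:
cells = (
    [{"c": True,  "Jka": False}] * 3    # c reacting alone
    + [{"c": False, "Jka": True}] * 3   # Jka reacting alone
    + [{"c": False, "Jka": False}] * 3  # negative for both
)
print(satisfies_three_rule({"c", "Jka"}, cells))  # True
```

In practice, as the text notes, meeting this condition usually means hunting through other panels for additional cells with the right antigen profiles.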

4+. Solving cases in an efficient manner

Solving antibody identification cases is time consuming and expensive. For each

additional test that the blood banker runs, more time and money are spent. Therefore, blood

bankers need to solve cases as efficiently as possible. In order to do so, they need to form early

hypotheses and know which tests will most efficiently distinguish between the various

hypotheses. One way to do this is to pick additional cells effectively.

4a+. Picking additional cells efficiently

Selecting additional cells can aid in ruling out and confirming hypotheses. The best

way to do so is to find additional cells that will yield the most information. For example, if anti-

E is hypothesized as being present and anti-C, anti-Lea, and anti-M still need to be ruled out,

the blood banker can either find three cells, each to rule out one of the antibodies (i.e., one cell

that is negative for the E, M, and Lea antigens and positive for the C antigen, another cell that is

negative for the E, C, and M antigens and positive for the Lea antigen, and a third cell that is

negative for the E, C, and Lea antigens and positive for the M antigen). Or, the blood banker

can try to find just one cell that is negative for the E antigen and positive for the other three

antigens. This will allow all three antibodies to be ruled out at once. Obviously, if such a cell

can be found, it is much more efficient to run that one cell rather than three separate cells.
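The search for one maximally informative cell can be sketched as a filter over the available inventory. The helper below is hypothetical and ignores zygosity for brevity; it only illustrates the selection criterion described in the example.

```python
# Sketch of strategy 4a+: look for a single cell negative for the
# hypothesized antigen and positive for every antigen still needing
# rule-out, so one test rules out several antibodies at once.

def find_efficient_cell(cells, hypothesized, to_rule_out):
    for cell in cells:
        if not cell.get(hypothesized, False) and all(
            cell.get(a, False) for a in to_rule_out
        ):
            return cell
    return None  # fall back to several single-purpose cells

# Anti-E hypothesized; anti-C, anti-Lea, and anti-M still to rule out.
inventory = [
    {"E": True,  "C": True, "Lea": False, "M": True},   # E-positive: no use
    {"E": False, "C": True, "Lea": True,  "M": True},   # rules out all three
]
best = find_efficient_cell(inventory, "E", ["C", "Lea", "M"])
print(best)
```

If no such cell exists in the inventory, the function returns None and the blood banker is back to running separate cells, one per remaining antibody, exactly the more expensive path the text describes.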

Poor problem-solving strategies

Blood bankers vary to the extent that they understand and use all the knowledge that

they need to solve a case. Poor performance can result from failing to use a good strategy or

from using incorrect strategies (or from making slips). Since the good strategies are already

outlined above, failure to use those strategies will be identified by the number followed by a '-'

sign. The following section lists known incorrect strategies that have been observed in use.

2-. Ruling out incorrectly

2f-. Ruling out using reacting cells

Some blood bankers will rule out using antigens that are not present on reacting cells.

This strategy will only work if there is just one antibody present. As soon as there is more than

one antibody, such a strategy might cause the correct answer to be ruled out. A cell that is

reacting indicates only that one or more of the antigens on that cell is causing reactions. It does

not indicate which ones can be ruled out.

2g-. Ruling out regardless of zygosity

Some antibodies will react more strongly with a homozygous antigen than a

heterozygous antigen. For these antibodies, if the reactions are not very strong, then the

difference might be enough for the homozygous antigen to react but for the heterozygous

antigen to not react. For this reason, it is not a good idea to rule out these antibodies using a

heterozygous cell. Many blood bankers, however, do not take zygosity into account at all when

they rule out. For those that do take zygosity into account, they may not remember which

antibodies are affected by zygosity and which ones are not.

2h-. Ruling out the corresponding antibody if the antigen typing is negative

Some blood bankers misunderstand the results from the antigen typing test, and will

rule out the presence of an antibody if the results from the corresponding antigen typing are

negative. This is the opposite of what should be done, which is to rule out the corresponding

antibody if the antigen typing results are positive.

4b- Not using all information provided by a test result

Often, practitioners will not make all of the inferences possible given a test result. As an

example of this problem, subjects may not use the information from a positive antigen typing on

the patient's cells to rule out the corresponding antibodies (2e-). As another example, when a

subject runs an additional cell so that s/he can rule out a particular antibody, it may be that s/he

fails to notice that other antibodies can be ruled out on that cell as well.

Kinds of Cases

Since most of the strategies used in solving antibody identification cases are heuristic in

nature, there will be instances where the strategies may be less useful or even detrimental to

finding the correct answer. A list of the kinds of cases encountered and the effectiveness of

various strategies follows.

One antibody reacting strongly

The simplest case a blood banker will encounter is a case where the patient has just one

antibody reacting strongly in its expected phases. Since the antibody is fairly common, the

blood banker has seen similar cases in the past. Since there is only one antibody, it is easy to pick

out because all reacting cells contain the antigen and all non-reacting cells do not contain the

antigen (strategy 1c+).

Antibody to a high-incidence antigen

There are instances where a one-antibody case is more difficult to solve. If the antibody

is one that reacts against a high-incidence antigen, i.e., an antigen that almost all the test cells

contain, then the antibody will react with all of those cells. So, even though the antibody is

easily diagnosed (strategy 1c+), it is difficult to rule out all other contenders because there are

no non-reacting cells. A good strategy in this case is to run the cells using a phase of testing that

is known to inhibit the reactions of the hypothesized antibody, allowing the practitioner to get

data for those antibodies that will not be destroyed by the same procedure (strategy 2c+).

Antigen typing can also help to rule out antibodies (strategy 2e+), although this is a more

expensive type of test.

Weak antibody

Another difficult one-antibody case occurs if an antibody is not reacting strongly. This

can happen for a number of reasons.

1) The patient may have been transfused long ago and may have formed an antibody that is

no longer present. However, the patient is still highly sensitive to that antigen and will

form an antibody quickly if exposed to such an antigen again.

2) The patient has been transfused recently and is currently forming antibodies.

3) Exposure to certain drugs may cause weak reactions when testing for antibodies.

4) Pregnant women who are Rh negative will often be given Rho Gam, a drug that causes

weak reactions with the D antigen.

In these cases, there may be very few positive reactions because the antibody is reacting to only

some of the test cells that contain the antigen. The result is that there is no good match between

the reactions exhibited and any one contender, so looking for a pattern match will not be an

effective strategy (1c+). Furthermore, the strategy of ruling out on homozygous, non-reacting

cells (2a+) may cause the correct answer to be ruled out. Ruling out using reacting cells (2f-),

which is normally considered to be a poor strategy, is actually one of the few effective rule-out

strategies in a weak antibody case, because it will not rule out a weak antibody, so long as it is


the only antibody present. If more than one antibody is present, then this strategy is likely to

rule them both out (which is why it is normally not a good strategy to apply).

The most effective strategy in this case is to run the cells in a test phase that will

enhance the reactions (3b+). Of course, one needs to be able to hypothesize which type of

antibody is reacting (1b+) to know which type of test is likely to enhance the reactions.

Knowing which antibodies are likely to react strongest in which phases will help to narrow

down the answer.

Multiple antibodies

As soon as there are two or more antibodies in a case, the task of identifying them

becomes more difficult.

On separate cells with differing reaction patterns

The simplest multi-antibody case is when the antibodies react in very different patterns

and the cells reacting to the two or more antibodies do not overlap. In other words, any cell that

contains an antigen that is causing a reaction does not contain any of the other antigens that are

causing reactions. Since the antibodies are reacting in different patterns, it is possible to

decompose the problem into several simpler, single-antibody cases (1b+).

Reacting in the same pattern as each other

If the multiple antibodies react at the same phase and temperature, such as two Rh

antibodies, then both antibodies may be reacting exactly the same. Therefore, to the

practitioner, it may appear that there is only one antibody present (1a+). Here is where the

strategy of grouping all like reactions and assuming that only one antibody is causing them


(1c+) will fail. If the practitioner cannot find one antibody that will account for the reactions,

s/he needs to start looking for groups of antibodies that together could be causing the reactions.
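The fallback described above, searching for groups of antibodies that jointly account for the reactions, can be sketched as a small combinatorial search. This is an illustrative sketch only (panel representation assumed as antigen-to-presence lists, not AIDA3's internals):

```python
from itertools import combinations

def covering_sets(panel, reactions, max_size=2):
    """Sketch of the fallback described above: when no single antibody fits,
    look for groups of antibodies whose antigens together account for
    exactly the reacting cells."""
    hits = []
    for size in range(1, max_size + 1):
        for combo in combinations(panel, size):
            covered = [any(panel[ab][i] for ab in combo)
                       for i in range(len(reactions))]
            if covered == list(reactions):
                hits.append(combo)
    return hits

panel = {
    "C": [True, False, True, False],
    "E": [False, True, False, False],
}
# Neither C nor E alone fits, but together they cover the reactions:
print(covering_sets(panel, [True, True, True, False]))  # [('C', 'E')]
```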

On overlapping cells

Things get more complicated as soon as some of the test cells possess two or more

antigens that are causing reactions. The antibodies reacting together usually do not react in an

additive fashion, so that an antibody that reacts '1+ 1+ 0' by itself combined with an antibody

that reacts '0 1+ 2+' by itself may produce a '1+ 1+ 2+' reaction when the antibodies occur

together. Thus, there may be three different patterns of reactions accounted for by two

antibodies: one pattern of reaction occurs when only one antibody is reacting with some of the

test cells, a second kind of reaction occurs when the other antibody is reacting alone with

different test cells, and a third kind of reaction occurs when both antibodies are reacting

together on the test cells that contain both antigens.
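Under the assumption, consistent with the example above, that overlapping antibodies combine roughly as the stronger of the two reactions rather than their sum, the combined pattern can be sketched as an element-wise maximum:

```python
def combined_pattern(a, b):
    # Combined reactions behave roughly like the stronger of the two,
    # not their sum (an assumption matching the example in the text).
    return [max(x, y) for x, y in zip(a, b)]

# '1+ 1+ 0' together with '0 1+ 2+' yields '1+ 1+ 2+'
print(combined_pattern([1, 1, 0], [0, 1, 2]))  # [1, 1, 2]
```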

Masking

Masking is a special case of the multiple antibody scenario in which one or more

antibodies completely cover up the presence of another antibody. This can happen when all of

the test cells that contain the antigen corresponding to the masked antibody also have antigens

that are reacting with other antibodies at least as strongly in every phase in which the masked

antibody reacts. Thus, since there are no cells for which the covered antigen is present without

other reacting antigens also being present, and since there are no noticeable differences in the

reactions, it appears to the practitioner that only one antibody is present (1a+). Here is a case

where making sure that all remaining antibodies can be ruled out (3e+) is necessary to correctly

solve the case. In trying to rule out the masked antibody, the practitioner can run the cells at a


phase which destroys the dominant antibody and not the underlying one (2c+). Alternatively,

one can try to find a test cell from another panel that is positive for the antibodies still not ruled

out and negative for the antibody that has been confirmed (2b+, 4a+). Finally, one can type the

patient for those antigens. If the results are positive, those antibodies can be ruled out (2e+).
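The masking condition above is mechanical enough to sketch: a candidate is potentially masked when no test cell carries its antigen without also carrying an antigen of a confirmed antibody. This is an illustrative sketch, not AIDA3's implementation:

```python
def potentially_masked(panel, confirmed, candidates):
    """Sketch: a candidate antibody may be masked when every test cell
    carrying its antigen also carries an antigen of a confirmed antibody,
    so no cell can show the candidate reacting alone."""
    masked = []
    n_cells = len(next(iter(panel.values())))
    for cand in candidates:
        alone = any(
            panel[cand][i] and not any(panel[c][i] for c in confirmed)
            for i in range(n_cells)
        )
        if not alone:
            masked.append(cand)
    return masked

panel = {
    "D": [True, True, False, True],
    "K": [True, False, False, True],   # never present without D
    "E": [False, False, True, False],  # present alone on cell 3
}
print(potentially_masked(panel, confirmed=["D"], candidates=["K", "E"]))  # ['K']
```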

Variable reactions

Some antibodies are likely to have more variable reactions than others. For example,

the P1 antibody can react weakly with some cells and more strongly with others. A panel that

has 2+, 3+, and 4+ reactions might suggest to the blood banker that there are multiple antibodies

present since the reactions are so different (1a+). It is possible, however, for one antibody to

react differently like that. Here is a case where one antibody may look like many. In addition,

because of this variability in reactivity, an antibody may not always show its normal pattern of

reactivity across different test phases (IS, 37°, AHG). Here is a case where the simplifying

assumption that one type of reaction is caused by one antibody (1c+) is going to fail. The

practitioner needs to be able to give up that strategy if it is not helping him/her to find an

answer for a given case.

Antibodies showing dosage

A special case of the variable reactions, and one that is easier to spot, is for an antibody

to react more strongly with a homozygous antigen than with a heterozygous antigen. Some

antigens are genetically paired with others, such as M and N. A test cell that possesses just one

antigen of the pair is normally homozygous (double dose) for that antigen. It will probably

react more strongly than a cell that is heterozygous (one dose) for the antigen. Here, the blood

banker needs to recognize that the test cells are reacting in the same pattern, but in different


strengths. Grouping them all together and comparing the zygosity of antigens with the test cells

will help the practitioner to find the correct answer (1c+).
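The dosage pattern can be sketched as a simple check, assuming each reacting cell is labeled homozygous or heterozygous for the antigen in question. All names here are illustrative, not drawn from AIDA3:

```python
def shows_dosage(strengths, zygosity):
    """Sketch: an antibody 'shows dosage' when every homozygous
    (double-dose) cell reacts at least as strongly as any heterozygous
    (single-dose) cell, with both kinds of cells reacting."""
    homo = [s for s, z in zip(strengths, zygosity) if z == "homo" and s > 0]
    hetero = [s for s, z in zip(strengths, zygosity) if z == "hetero" and s > 0]
    return bool(homo) and bool(hetero) and min(homo) >= max(hetero)

# e.g. 3+ on homozygous cells, 1+ on a heterozygous cell: same pattern,
# different strengths
print(shows_dosage([3, 1, 3, 0], ["homo", "hetero", "homo", "homo"]))  # True
```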

Recent transfusion

If a patient has been transfused recently, then there may still be the previous donor's

red blood cells in the patient's system. Thus, in antigen typing a sample of the patient's red

blood cells, some of the cells tested will be the patient's and some will be from the transfusion

donor. Because it is difficult to determine which of the antigens detected belong to the patient

and which belong to the previous donor, the blood banker cannot interpret the antigen typing

tests reliably. A further problem with diagnosing a recently transfused patient is that the

patient's immune system may currently be forming antibodies against some of the antigens

found in the transfusion donor's blood. The newly forming antibodies may not yet show up as

positive reactions when being tested in vitro (i.e., in the test tube), but could be strong enough to cause a reaction if the patient received that kind of blood in his/her next transfusion. The

blood banker, therefore, must know when the antigen typing test is valid and when it is not. If

an antigen typing test is used when the patient has recently been transfused, there may be false

positive results (2e+).

Drug interactions

Similar to the problems that can occur from a recent transfusion, certain drugs in a

patient's system can both cause the formation of antibodies and interfere with the interpretation

of various test results. Thus, blood bankers must be aware of the kinds of medication received

by each patient and know how those medications will affect their ability to interpret test results.


Auto-immune disorders

If a patient has an auto-immune disorder, s/he can form antibodies against his/her own

antigens. This special case is not treated in this dissertation.


Chapter IV

The Design of the Antibody Identification Assistant

(AIDA3)

The ultimate goal of this research is to develop a computer system that improves the

antibody identification procedure, both by making the task simpler and by efficiently bringing

more knowledge to the blood banker. Based on studies of the expert strategies and

erroneous/inefficient strategies found to be used in this domain, a number of opportunities for

a computer to aid the blood banking practitioner were identified. In the short term, the

computer can help practitioners on specific cases by checking for slips and the use of inadequate

strategies. In the long term, a well-designed system can also help the practitioner to learn the

problem-solving strategies and extensive knowledge necessary to become a true expert.

The research presented here is an extension of previous research focusing on the design

of decision support tools for certified medical technologists as they perform the task of antibody

identification. Several additions were made to the previous version of the critiquing system,

as well as a change in the way users are trained to use the system. These changes were intended

to reduce the error rates prevalent in this community (Smith et al, 1991a; Smith et al, 1991b;

Strohm et al, 1991; Guerlain, 1993a) and to provide a demonstration of how to design an

effective cooperative problem solving system for this task and tasks with similar characteristics.

In order for this research to generalize as desired, it is necessary to form a mapping between the

characteristics of this domain, this class of users and this aiding strategy, so that results from

this study can transfer to other domains and classes of users with similar characteristics.


The Task

The task, as indicated, is a medical diagnosis task that can be characterized abstractly as

an abductive reasoning task (Chandrasekaran, 19xx; Josephson and Josephson, 1994; Pople,

1973), so characteristics such as masking and noisy data are factors that are known to cause

problems for users (Fraser, Strohm, Smith, et al., 1989). It is a high-consequence task, since an

incorrect diagnosis can lead to transfusion reactions and possibly even death. Usually there is

not significant time pressure to complete the task, although time pressure is a factor in

emergency (STAT) situations. In addition, there are financial pressures to limit costs.

The Practitioners

The practitioners are certified medical technologists, who have been documented to

make a significant number of errors on this task (Smith, Miller, Fraser, et al., 1991; Guerlain,

1993a). These errors include slips, failures to form appropriate hypotheses to guide problem

solving, failure to rule out alternative hypotheses, failure to collect independent, converging

evidence for the answer, failure to use as much information as possible from a test result,

ignoring base rates, biased assimilation, and biased reviewing. Many of these errors are due to

a lack of training and practice with the task, since a given practitioner may perform this task only occasionally (especially if the hospital is small or if the technologist rotates

through other labs besides blood bank).

The Problem-Solving Tool

The computer support system used for this study is called the Antibody IDentification

Assistant 3 (AIDA3) to distinguish it from the previous versions of the system that were studied

earlier. The system was developed on the Macintosh using Symantec's® Think C programming

language. With all error checking turned off, the system can be used as an information display


tool that allows practitioners to request and interpret the various tests used for antibody

identification similar to the way they normally would using paper and pencil. With error

checking turned on, the system monitors the practitioner's procedure for errors and provides

feedback if errors are detected. Both versions have the same set of test cases built into them.

These cases were either designed by an expert blood banker or were taken from real patient

data to ensure validity. The cases that were used for testing were carefully selected to have

certain characteristics (weak antibodies, multiple antibodies, etc.) and predictions were made as

to how a practitioner's performance would change depending on the case characteristics, the

practitioner's strategy, and the type of system enhancements the practitioner was using. Three

design principles were used to guide the design of this problem-solving tool.

Design Principle 1: Use a Direct Manipulation Problem Representation as the

Basis for Communication

First, the interface is designed not only to be helpful and easy to use, but also to provide

data for the computer to diagnose errors in the user's problem solving. The technologist can

request test forms and mark hypotheses on those forms, so the computer is able to watch the

person's problem-solving process, potentially detecting errors in the subject's procedure. Thus,

no extra work is required on the user's part to feed information to the computer. Practitioners

just work as they naturally would and, because of the interface design, the data on the user's

problem-solving activities is rich enough for the computer to detect problems and provide

feedback. A description of how the user interacts with the system follows.

For each of the cases built into the system, the practitioner performs the antibody

identification process by asking the computer (via a pull-down menu option) to show test

results and other pertinent information, such as relevant facts about the patient's medical


history. The computer has data stored in it for the results of every possible test so the user can

choose to look at those tests that are deemed pertinent to the particular case.

Users can make markings on the data sheets as they would on paper by selecting from a

set of color-coded "markers", available as buttons along the top of the screen, and clicking on

cells of interest using the mouse. Rows, columns, and cells can be highlighted using a yellow

pen. Antibodies can be marked as either 'ruled-out', 'unlikely', 'possible', 'likely' or 'confirmed',

using pens ranging from green for ruled-out to red for confirmed. The colors are chosen to

correspond somewhat with the danger of introducing that antigen into the patient's blood

stream. A ruled-out (green) antibody indicates that it is safe to use blood for a transfusion that

contains the corresponding antigen, while a confirmed (red) antibody indicates that a

dangerous transfusion reaction could occur if blood containing that antigen is given to the

patient.

The data sheets used in AIDA3 are very similar to those currently used in paper form in

labs. The organization of the display is that shown in Figure 4. The only difference between the

paper version and the one on the computer is that the background grid has been made less

salient than the data contained inside. This follows Tufte's (1990) principle of reducing "chartjunk," or background display information, so that the important data is enhanced.


Figure 4. Sample Screen.

As the user selects tests, any antigens that have been marked as ruled-out, possible, etc.

on previous panels carry over to the current panel. This aids the user in remembering where

s/he is in a case. Using paper forms, blood bankers must copy over their markings from panel

to panel. Here, the computer performs that subtask for them, reducing the potential for slips

and saving them time.
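The carry-over behavior can be sketched as follows. This is a minimal illustration under assumed data structures; AIDA3's actual representation is not documented here:

```python
# Sketch of the carry-over behavior described above: markings made on one
# panel are applied automatically to the next, sparing the user the manual
# copying done with paper forms. (Names are illustrative, not AIDA3's own.)
def carry_over(markings, new_panel_antibodies):
    # keep each antibody's most recent marking; unmarked ones start blank
    return {ab: markings.get(ab, "unmarked") for ab in new_panel_antibodies}

markings = {"K": "ruled-out", "Fya": "likely"}
print(carry_over(markings, ["K", "Fya", "Jka"]))
# {'K': 'ruled-out', 'Fya': 'likely', 'Jka': 'unmarked'}
```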


Design Principle 2: Use Critiquing to Enhance Cooperative Problem-Solving

The second design principle followed was to use a critiquing approach to decision

support, because of the previous benefits found with designing the system as a cooperative aid.

Based on our studies of human experts, the AIDA3 system was designed around a broad

strategy of collecting converging evidence before completing a case. This global strategy

provides protection against the fallibility of the heuristic methods underlying strategies applied

at different points in the case (i.e., individual steps on the checklist). To help ensure use of this

strategy, AIDA3 monitors for both errors of commission and errors of omission. The types of

knowledge encoded into the second version of the system include detecting:

1) Errors of commission (due to slips or mistakes):
   • Errors in ruling out antibodies (same as in previous study).
2) Errors of omission (due to slips or mistakes):
   • Failure to rule out an antibody for which there was evidence to do so.
   • Failure to rule out all clinically significant antibodies besides the antibodies included in the answer set.
   • Failure to confirm that the patient did not have an auto-immune disorder (i.e., antibodies directed against the antigens present on their own red blood cells).
   • Failure to confirm that the patient was capable of forming the antibodies in the answer set (i.e., that the patient's blood was negative for the corresponding antigens, a requirement for forming antibodies in the first place if the possibility of an auto-immune disorder has been ruled out).
3) Errors due to masking:
   • Failure to detect and consider potentially masked antibodies.
4) Errors due to noisy data:
   • Failure to detect situations where the quality of the data was questionable.
5) Answers unlikely given the data (low probability of data given hypothesis):
   • Failure to account for all reactions.
   • Inconsistency between the answers given and the types of reactions usually exhibited by those antibodies (e.g., that a warm-temperature antibody was accounting for reactions in cold temperatures).
6) Unlikely answers according to prior probabilities (regardless of the available evidence):
   • Antibody combinations that are extremely unlikely due to the way the human immune system works.
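As one hedged illustration of how such monitoring might be coded (this is not AIDA3's source), an errors-of-omission check from category 2 could compare each clinically significant antibody's marking against the states that complete a case:

```python
def completion_critiques(significant, markings):
    """Sketch of one omission check: flag any clinically significant
    antibody left neither confirmed, unlikely, nor ruled out when the user
    tries to complete the case. (Illustrative; not AIDA3's actual code.)"""
    done = {"confirmed", "unlikely", "ruled-out"}
    return [
        f"{ab} has not been ruled out, marked unlikely, or confirmed"
        for ab in significant
        if markings.get(ab) not in done
    ]

print(completion_critiques(
    ["D", "K", "Fya"],
    {"D": "confirmed", "K": "ruled-out"},
))  # ['Fya has not been ruled out, marked unlikely, or confirmed']
```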

Design Principle 3: Represent the Computer's Knowledge to the Operator to

Establish a Common Frame of Reference

Third, a check-list was designed that enumerates the subgoals the computer considers

necessary to adequately solve a case (Figure 5 shows the checklist). This checklist provides an

explicit, high-level representation of the computer's goal hierarchy. The design of the system is

such that users can apply additional strategies without interference from the computer, and can

override a critique from the computer, but the checklist makes it clear what steps the computer

expects the person to have done before completing a case. The computer also allows the user

flexibility in deciding what order to use in completing the subgoals listed on the checklist (i.e.,

the computer does not monitor for the ordering of the steps listed in the checklist except when

that ordering is critical to successful problem-solving).


Name: __________________ Phone Number: _____________ Hospital: _____________

Checklist for Alloantibody Identification

Case: ____________

Step 1. Complete ABO and Rh typing.
Step 2. Check screen cells.
   a. Mark the unlikely antibodies (usually f, V, Cw, Lua, Kpa, Jsa).
   b. Rule out antibodies.
      • Homozygous: C, E, c, e, M, N, S, s, Lea, Leb, Fya, Fyb, Jka, and Jkb
      • Homozygous or Heterozygous: D, P1, Lub, K, k, and Xga (as well as the six unlikely antibodies: f, V, Cw, Lua, Kpa, Jsa)
Step 3. Check patient history if available.
Step 4. Check auto control on the Poly Panel.
Step 5. Check the Polyspecific Panel. (If necessary, use another panel to enhance reactions.)
   a. Rule out antibodies.
      Antibody reactions that could be weakened in certain test conditions:
      Enzyme: M, N, S, s, Fya, Fyb, Xga
      Prewarm: M, N, P1, Lea, Leb, Lua
      Eluate: M, N, P1, Lea, Leb
      Room Temperature: D, C, E, c, e, f, V, Cw, s, Lub, K, k, Kpa, Jsa, Fya, Fyb, Jka, Jkb, Xga
      Cold 4° C: D, C, E, c, e, f, Cw, S, s, P1, Lub, K, k, Kpa, Jsa, Fya, Fyb, Jka, Jkb, Xga
   b. Mark likely antibodies.
Step 6. If necessary, use additional cells to rule out the remaining antibodies, and to help you confirm your answer.
Step 7. If necessary, use antigen typings to rule out the remaining antibodies.
Step 8. Use antigen typings to help confirm your answer.
Step 9. Make sure that all antibodies that have not been confirmed or marked unlikely (usually f, V, Cw, Lua, Kpa, Jsa) have been ruled out.
Step 10. Make sure the confirmed antibodies are not on any non-reacting cells.
Step 11. Make sure that at least one confirmed antibody is on every reacting cell.
Step 12. Look at your answer and ask whether it is plausible (or is it a "unicorn"?).

Figure 5. Sample Checklist
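Steps 10 and 11 of the checklist are mechanical enough to sketch in code. The following illustrative function (assumed names; not AIDA3's code) flags both kinds of inconsistency between a proposed answer and the panel data:

```python
def check_answer_consistency(panel, reactions, confirmed):
    """Sketch of checklist Steps 10 and 11: confirmed antibodies must
    appear on no non-reacting cell, and every reacting cell must carry at
    least one confirmed antigen."""
    problems = []
    for i, reacted in enumerate(reactions):
        on_cell = [ab for ab in confirmed if panel[ab][i]]
        if reacted and not on_cell:
            problems.append(f"cell {i}: reacting but no confirmed antibody on it")
        if not reacted and on_cell:
            problems.append(f"cell {i}: not reacting but {on_cell} present")
    return problems

panel = {"K": [True, False, True]}
print(check_answer_consistency(panel, [True, False, False], ["K"]))
# ["cell 2: not reacting but ['K'] present"]
```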


Although a checklist is not the only way that such information could be conveyed or

represented, it was hypothesized that the checklist would work as an effective aid for a number

of reasons. First, the checklist serves as an external memory aid, reminding users of certain

types of knowledge related to antibody identification, such as: 1) Factual information (i.e., what

antibodies are destroyed in certain phases of testing) and 2) Procedural information (i.e., what

constitutes a complete protocol).

Second, since the checklist is a representation of the kinds of knowledge the computer is

expecting the user to apply in solving a case, the introduction of the checklist provides the user

with an appropriate frame of reference for interpreting any feedback given by the computer. In

other words, the checklist helps to ensure that the user has an appropriate mental model of the

problem-solving strategies understood to be correct by the computer. Use of the checklist is a

way for designers to ensure that both the computer system and the practitioners using it have a

common frame of reference for communication and understanding.

Finally, the checklist provides an alternative form of aid to practitioners in situations for

which the critiquing system is not helpful. For example, if a practitioner gets stuck during a

case, s/he can review the checklist to see if there are any other tests or knowledge that may be

applicable, since the checklist lists the goals that should be completed before finishing a case.


Chapter V

Experimental Procedure

An earlier study with the AIDA system had shown that if the computer is

knowledgeable about one aspect of the antibody identification procedure (how to rule out

antibodies), then critiquing the users' application of that strategy may be more appropriate than

automating the task in cases where the computer's knowledge is not fully competent. The goal

of this second study was to see if misdiagnosis rates could be reduced or eliminated with the

design of a more complete critiquing system, and to explore its effects on cooperative

performance.

Subjects

Two subject pools were used to test AIDA3. The first was a group of four "experts"

(certified Specialists in Blood Bank (SBBs)) who were tested with the system as a pilot group.

These subjects came from three different hospitals. (The objective of this preliminary study was

to make sure AIDA3 did not interfere with the performances of skilled practitioners.)

Subsequently, thirty-two blood bankers from seven different hospitals were tested. All

of these technologists were identified by their supervisors as "actually performing the task of

antibody identification as part of their job but who would benefit from additional experience

and training". Their years of experience ranged from 1 to 35 years (with a mean of 10 years).


Experimental Design

Half of the subjects in each group were randomly assigned to be in the Control Group

and the other half were randomly assigned to be in the Treatment Group. All of the subjects

were tested on the same six cases. The first case was used to give both groups the same initial

training on how to use the system. Subjects were shown how to use the pull-down menus to

select test results (see Figure 6) and how to interpret the test results on each screen. After

walking the subject through the layout of each type of test and explaining how to interact with

the system (i.e., how to see the results for a particular test cell, how to mark an antibody as ruled

out, how to mark an answer for the case, etc.) the subject was asked if s/he understood how to

use the system and if s/he was ready to continue. During this initial training, no knowledge

specific to blood banking was discussed, except in relation to how the computer displayed test

results and how the user interacted with the computer. Furthermore, subjects were not asked to

solve the first case, but just used it to practice selecting and marking individual test results.

Figure 6. Test Results Available.


All subjects (in both the Treatment and Control Group) solved the second case (herein

referred to as the "Pre-Test Case") without any aid from the computer. Thus both groups were

using the control version of the system. The purpose of this Pre-Test Case was to get a

benchmark on the practitioners' current performance strategies (i.e., did they rule-out, did they

do antigen typing, did they seem to notice that there were two reactions present) against which

to compare the Treatment Group's strategies when using the experimental system. The Pre-Test

Case was one of two matched cases, and it was randomly determined at run-time which of the

two cases a particular subject solved as a Pre-Test Case and which as a Post-Test Case. With

this design, a within-subjects comparison could be made for the Treatment Group. After

solving the first Post-Test Case that was matched in characteristics to the Pre-Test Case, both

groups solved three more cases, with the Treatment Group using the critiquing system and the checklist

and the Control Group solving the cases on their own. Performance on these cases could be

examined for differences in a between-subjects manner.

Figure 7 shows the experimental design for this study. It was hypothesized a priori that:

1) There would be a within-subjects reduction in misdiagnoses rates for the Treatment

Group as they went from solving the first of the two matched cases without any

critiquing to solving the other matched case after the checklist and critiquing were

introduced, but that the Control Group would not show such improvement,

2) There would be a between-subjects improvement, such that the Treatment Group

would have a significantly lower misdiagnosis rate than the Control Group for all of the

Post-Test Cases, and

3) The critiquing system would influence practitioners' cognitive problem-solving

processes, promoting effective use of strategies for solving cases, and effective

cooperative problem-solving between the human and the computer (i.e., the system


would detect errors in the human's problem-solving, the practitioners would find the

system helpful and beneficial, and the practitioners would be able to detect and recover

from errors generated by the computer's faulty reasoning).

The order for these two cases was randomly decided at run-time.

Test Cases:

Pre-test Case: 2 antibodies looking like 1 (matched to Case 1)
Case 1: 2 antibodies looking like 1 (matched to Pre-test case)
Case 2: Weak antibody (the "brittle" case)
Case 3: 1 antibody masking another
Case 4: 3 antibodies reacting on all cells (from another lab)

[Control Group vs. Treatment Group: the checklist and training cases were introduced for the Treatment Group between the Pre-test Case and Case 1.]

Figure 7. Experimental Design.

Cases used to test AIDA

A crucial part of testing a cooperative problem solving system like AIDA is to use a set

of tasks that test the range of scenarios that might be encountered in practice. If such a range of


tasks is not used, then results may not be representative of actual performance with the system.

Thus, it was important to include cases where the computer's support tools fail, and to see

whether the technologists would detect and cope with such failures. Picking scenarios where

one's design might fail is imperative for understanding how the introduction of tools may

influence or change performance in potentially dangerous ways. Issues such as loss of skill,

fixation on one set of solutions (Fraser, et al, 1992), and how the dynamics of the system and

characteristics of a case can combine to cause problems, are all central to our understanding of

how humans interact with complex systems. Analyzing how users cope with such system

failures may lead to better solutions, or more generally, to the identification of principles

describing how people interact with decision support systems. Therefore, in testing AIDA, a

case was used for which the computer's rule-out knowledge was not fully competent. (This

case was the same previously tested weak anti-D which did not react with all of the test cells.)

There were four Post-Test Cases. The first Post-Test Case was randomly selected from

one of two matched cases, the other of which was the Pre-Test Case. Both of these cases had the

characteristic that the original testing panel seems to indicate that only one antibody is present,

but in actuality, two different antibodies are together accounting for the reactions. The second

Post-Test Case was the weak antibody case, for which the computer's knowledge was not fully

competent. Case 3 was a masking case (where one antibody masks the presence of another).

Case 4 was sent to us by a blood bank lab that had no knowledge of our work. It turned out to

be a difficult three antibody case. The following section describes each of these cases in more

detail.


Pre-Test Case: Two antibodies looking like one

Both the Pre-Test Case and the first Post-Test Case had the same characteristics (two

antibodies looking like one) and it was randomly determined at run-time which case a

particular subject would get as the Pre-Test Case and which as the first Post-Test Case. The

pattern of reactions is somewhat consistent with the pattern for one antibody (either anti-Fyb

or anti-S) showing dosage. There are '0 0 2+' and '0 0 3+' reactions in the AHG phase on all cells

for which this single antigen is present. Although this pattern does not follow the pattern of

dosage well (some of the '0 0 2+' reactions are on cells where the single antigen is homozygous

and some of the '0 0 3+' reactions are on cells where the single antigen is heterozygous), past

experiments have shown that practitioners often fail to note this inconsistency in the dosage

pattern and hypothesize the single antibody as the answer (Rudmann et al., 1992). Those

practitioners who do not follow through with ruling out and antigen typing could conclude that

the single antibody is the answer when in fact two different antibodies (either anti-E and anti-K

or anti-c and anti-K) are causing the reactions. The single antibody can be ruled out by running

additional cells.

For this case, it was predicted that many practitioners would originally hypothesize the

single antigen as the answer. Those who followed through with ruling out and antigen typing

were predicted to be able to eventually rule out the single antibody and find the correct set of

antibodies.

Post-Test Case 1: Two antibodies looking like one

Case 1 had the same characteristics as the Pre-Test Case, as described above. It was

predicted that those subjects using the critiquing system would eventually get the right answer

because they would follow a complete protocol, as prescribed by the checklist.


Post-Test Case 2: One antibody reacting weakly, right answer can be ruled out

(because system is not fully competent).

Case 2 was a real patient case used to test how practitioners cope with the brittleness of

an expert system. On this case, there are very weak reactions caused by a newly forming anti-

D. The case history tells us that the patient is a pregnant woman in her 36th week of gestation.

She is also Rh negative. Such information should prompt the blood banker to hypothesize that

she is probably getting the drug Rho Gam to counteract the possible negative effects if her baby

is Rh positive. Rho Gam will cause reactions against cells that contain the D antigen. Therefore,

anti-D is most likely to be detected from antibody testing. The reactions for this case, however,

are very weak. Only two cells are reacting '0 0 +/-'. Three of the test cells that contain the D

antigen are not reacting. Therefore, the normal rule-out heuristics fail: anti-D can be ruled out

according to the heuristics when anti-D is actually the answer.

The blood banker needs to recognize that the reactions are very weak and try to

enhance the reactions somehow. Running enzymes will enhance the reactions of all Rh

antibodies, including anti-D. If the blood banker does not recognize the need to enhance the

reactions, s/he may try to determine the answer to the case just by looking at the Polyspecific

reactions. On that panel, all antibodies can be ruled out except anti-E. The E antigen is heterozygous

on some of the non-reacting cells and homozygous on the two reacting cells. It

appears that anti-E could be accounting for the reactions, showing the classic pattern of dosage.

However, anti-E alone in an Rh negative patient is extremely unusual and should be recognized

as a "unicorn" (an unusual answer). This is because the normal formation of antibodies is such

that anti-D will almost always form before another Rh antibody if the patient lacks both

antigens.


The clues in this case that should prompt the blood banker to find the correct answer

are the following: the reactions are very weak, so an attempt should be made to enhance them by running the

cells at a different temperature or using a different technique (such as running the cells with

enzymes). The patient is Rh negative, so any Rh antibodies that form should include anti-D.

The patient is in her 36th week of gestation, so it is likely that she is receiving Rho Gam, a drug

that will induce the formation of anti-D. Finally, running additional cells, even at Polyspecific,

allows anti-E to be ruled out on a homozygous cell.

Two features of the critiquing system could help practitioners on this case.

The first feature is the fact that the system has some meta-knowledge embedded in it to detect

that some of its own rules may not be valid. In particular, when there are weak reactions on a

panel, the system provides a warning message to the user saying that rule-out may not be

appropriate and that enhancing the reactions would be a good idea.
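The weak-reaction check just described can be thought of as a simple predicate over a panel's reaction grades. The following is a minimal, hedged sketch of that idea; the function name and the set of grade labels treated as "weak" are illustrative assumptions, not AIDA's actual implementation:

```python
# Illustrative sketch of the meta-knowledge check described above.
# The grading labels counted as "weak" are assumptions for this example.
WEAK_GRADES = {"+/-", "w+"}

def weak_reaction_warning(reaction_grades):
    """reaction_grades: list of reaction-strength strings for one panel,
    e.g. ["0", "0", "+/-", "3+"]. Returns a warning string or None."""
    if any(grade in WEAK_GRADES for grade in reaction_grades):
        return ("Some reactions on this panel are weak; ruling out may not "
                "be appropriate. Consider enhancing the reactions first.")
    return None
```

Because the check only warns (rather than blocking rule-out), it matches the cooperative philosophy described in the next paragraph: the decision is left with the practitioner.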

Although the system does not require practitioners to rule out antibodies on this case

(as it does on cases for which there are no weak reactions), it also does not prevent users from

ruling out antibodies if they choose to do so. In essence, the system gives the user a warning

about the appropriateness of ruling out and leaves it up to the practitioners to decide

whether or not to rule-out. If users do choose to rule out antibodies despite the system's

warning, they may end up with anti-E as the answer (since anti-E matches the pattern of

reactions fairly well).

The second feature that the system has is a set of detectors that check for unlikely

antibody combinations. These unicorn detectors will detect that anti-E alone in an Rh negative

patient is an extremely unlikely event, given the way the immune system works. Thus, if users

mark anti-E as their answer, the system suggests enhancing the reactions with enzymes to see if


the presence of anti-D (which is likely to have formed before anti-E in this Rh negative patient)

becomes more obvious through the enhancement.
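The unicorn detector for this case can be sketched as a check on the final answer set against the patient's Rh type. This is a hypothetical illustration of the rule described above; the function name, data shapes, and message wording are assumptions, not AIDA's code:

```python
# Hypothetical sketch of one "unicorn" detector: anti-E alone in an
# Rh negative patient is flagged as an extremely unlikely answer.
def unicorn_warnings(confirmed_antibodies, patient_rh):
    """confirmed_antibodies: set of antibody names marked as the answer;
    patient_rh: "positive" or "negative". Returns a list of warnings."""
    warnings = []
    if patient_rh == "negative" and confirmed_antibodies == {"anti-E"}:
        warnings.append(
            "Anti-E alone in an Rh negative patient is extremely unlikely; "
            "consider enhancing the reactions (e.g., with enzymes) to check "
            "whether an anti-D becomes apparent.")
    return warnings
```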

For this case, it was predicted that some practitioners would heed the weak reaction

warning of the system and enhance the reactions and most likely detect the anti-D. For others

who did not heed the warning, it was predicted that many of them would conclude that anti-E

was present and then get the system's warning about anti-E alone in an Rh negative patient,

prompting them to re-evaluate their answer and probably discover anti-D. If the user

concluded something other than anti-E or anti-D, it was unclear whether the user would be

able to recover and detect the correct answer.

Post-Test Case 3: Two antibodies, one masking the other

Case 3 was used to test for completeness of rule-out. For this case, reactions are all '0 0

3+' on the main panel at the AHG phase. This suggests the presence of only one antibody.

Anti-Fya immediately looks like a good candidate, since all reacting cells contain the Fya

antigen and all non-reacting cells are negative for the Fya antigen. It was predicted that most

practitioners would see this and mark anti-Fya as a likely candidate. For those who did not

rule out all other possible antibodies, the case would likely be considered complete at this

point, when in fact a second antibody, anti-E, is present as well. This antibody is

completely masked by the anti-Fya reactions on the main panel. The typical way for the

practitioner to discover that anti-E is present is in the process of trying to rule it out. To rule it

out, the practitioner would run additional cells that are positive for E but negative for Fya. In

running such an additional cell, the reactions are positive, rather than the expected negative,

indicating that there is something else besides anti-Fya causing reactions. With a little looking,

anti-E can quickly be determined to be that antibody.


For this case, it was predicted that many practitioners would originally hypothesize the

single antibody as the answer. Those who followed through with ruling out and antigen typing

were predicted to be able to eventually detect the second antibody and find the correct set of

antibodies. It was predicted that those subjects using the critiquing system would eventually

get the right answer because they would follow a complete protocol, as prescribed by the

checklist.

Post-Test Case 4: Three antibodies on overlapping cells, reacting on all cells of

the main panel

Case 4 came from a blood bank lab that had no knowledge of our work. We asked this

lab to supply us with "a case that many blood bankers would have difficulty solving." On this

case, three antibodies, anti-E, anti-c, and anti-Jkb were reacting together, such that all of the cells

on the main panel had positive reactions. A knowledgeable blood banker would notice that

there were three different patterns of reactions ('0 0 2+', '0 0 3+' and '0 2+ 3+') and hypothesize

that at least two antibodies were present.

Anti-E is fairly easy to recognize as causing the stronger, '0 2+ 3+' reactions, but it is

very difficult to detect what could be causing the other reactions. This is because the anti-c and

anti-Jkb are overlapping with each other and with the anti-E. Furthermore, anti-c and anti-Jkb

are both showing dosage, so the reactions accounted for by their presence are variable.

In addition, no antibodies can be ruled out using this panel because there are no non-reactive

cells. Finally, just two antibodies, anti-E and anti-c can account for all of the reactions on the

main panel, so practitioners might abandon anti-Jkb as part of their hypothesis set when looking

at that set of reactions. On the other hand, anti-E and anti-Jkb account for all of the reactions on


the additional cells panel, so practitioners might abandon anti-c as part of their hypothesis set

when looking at that set of reactions.

The only way to rule out antibodies is to:

1) Run the cells at Room Temperature, in which case none of the cells will react and all those antibodies that normally react at Room Temperature can be ruled out (anti-M, -N, -S, -s, etc.).

2) Run additional cells, in which case there will be two non-reacting cells (because they happen to be negative for the E, c, and Jkb antigens), which can be used to rule out some of the antibodies.

3) Use antigen typings to type the patient for the presence of certain antigens, which will allow more antibodies to be ruled out.

For this case, it was predicted that many practitioners would not know how to proceed

because all of the cells are reacting. Some might predict a high frequency antibody, such as

anti-k or anti-Lub, as the answer, since these antigens are present on all of the donor cells. These

high frequency antibodies can be ruled out, however, either by running additional cells or by

doing antigen typing. Those who used additional cells and antigen typing would be able to rule

out all but the three antibodies present (and two low-frequency antibodies). Those who

followed through with ruling out and antigen typing were predicted to be able to eventually

solve the case. It was predicted that those subjects using the critiquing system would eventually

get the right answer because they would follow a complete protocol, as prescribed by the

checklist. Furthermore, if they hypothesized just a subset of the antibodies present, the system

would flag them and tell them that reactions were present for which no confirmed antibody was

present.

Procedure

There were six phases to the experiment. During Phase 1, the experimenter collected

demographic data from the subject being tested. Phase 2 introduced subjects to the basic AIDA


interface. Phase 3 was used to test subjects on a Pre-Test Case without any aiding for either

group. Phase 4 was used to introduce subjects in the Treatment Group to the checklist and the

critiquing version of the system. Subjects were asked to solve partial cases with the checklist

and critiquing system. The Control Group also solved these same partial cases, but without any

aiding by the computer or use of the checklist. Phase 5 was used to test all subjects on four Post-

Test Cases. Phase 6 was a debriefing stage where treatment subjects were asked to fill out a

questionnaire rating the utility and usability of the critiquing system.

Phase 1. Subject Demographic Data

In Phase 1, subjects were briefly told what the experiment would involve and asked to

sign a consent form. Their name, hospital, certification level, and years of blood bank

experience were logged. Furthermore, subjects were asked with what frequency (number of

times per month) they normally encountered antibody identification cases.

Phase 2. Introduction to the Interface

In Phase 2 of the experiment, all subjects underwent the same training and testing.

Training on the system was done with the first two cases. This involved initially showing the

subjects how to use the mouse to select test results from the pull-down menus provided (with

all of the critiquing messages turned off). Each subject was then asked to select one by one each

of the test results.

For each kind of test result screen, the experimenter told the subjects what kinds of

operations could be performed and asked them to try each function as it was being described.

For instance, on the ABO/Rh panel, subjects were asked to select their ABO/Rh interpretation

using the drop-down menus provided. On the antibody screening panel, subjects were asked to


highlight rows and columns and to mark individual antibodies as ruled out, unlikely, possible, likely,

and confirmed. They were then told to undo some of those actions. As subjects worked

through each panel, it was pointed out to them how their markings were automatically carried

over from screen to screen. On screens where test results were not automatically given to them,

subjects learned how they could simulate running a test by highlighting the appropriate row or

column.

It was explained to the subjects that a case was not considered complete until they hit

the Done button, so that it was perfectly acceptable to undo or re-mark antibodies as much as

they liked until they considered themselves done with the case.

During training, subjects could ask as many questions as they liked, and all questions

having to do with the interface were answered. However, questions specific to blood banking,

such as "What's the rule-out rule again?" were not answered. All interface features were taught

on the first case, and the experimenter refrained from talking on subsequent cases unless the

user was still having trouble or did not remember how to perform an action (such as how to get

an antigen typing result).

Phase 3. The Pretest Case

Phase 3 involved testing subjects on a Pre-Test Case. This case was randomly selected

at run-time from one of two matched cases, the other of which became the first Post-Test Case

for that subject. Both groups used the control version of the system, without any critiquing or

feedback. They were asked to solve the case as they normally would and to mark their answer

on an answer sheet (see Appendix A for a sample answer sheet).


Phase 4. Training and Introduction to the Checklist and Critiquing

In Phase 4, the Treatment Group was given a checklist like the one shown in Figure 5. It

was explained that the computer would now be monitoring them, and that it would check that

each of the steps in the checklist was followed. It was explained that certain steps in the

checklist would be practiced with some training cases. During all of these training sessions, the

critiquing system was monitoring for errors according to the lesson being followed.

The first training case was an ABO/Rh screen, corresponding to Step 1 in the checklist.

Subjects were told that the computer expected this test to be run first for every case. They were

then asked to purposely misinterpret the test, so that they would encounter an error message

from the computer (see Figure 8).

Figure 8. Sample ABO/Rh Error Message.

Subjects were shown how they could either undo their action by clicking on the "Undo Marking" button, or override the computer by

clicking on the "Leave as Is" button. It was explained to them that overriding the computer was

always an option, but we asked that they fill out a form explaining their reason for disagreeing

with the computer.

The second type of training case was an Antibody Screen. Subjects were told to follow

Step 2 in the checklist, namely to mark the six low frequency antibodies (anti-f, anti-V, anti-Cw,

anti-Lua, anti-Kpa, and anti-Jsa) as Unlikely, using the Unlikely button. Subjects were then told

to rule out antibodies according to the strategy described in the checklist, namely, to only rule


out on non-reacting cells, taking into account zygosity. Subjects practiced this step with three

Antibody Screen practice cases, one of which had no non-reacting cells, and thus could not be

used to rule out any antibodies.
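The rule-out strategy practiced in this step (rule out only on non-reacting cells, taking zygosity into account) can be sketched as a short function. This is a minimal illustration under assumed data shapes, not AIDA's actual code:

```python
# Illustrative sketch of the checklist's rule-out strategy: an antibody
# may be ruled out only when a NON-reacting cell carries the
# corresponding antigen homozygously (which accounts for dosage).
def rule_outs(panel):
    """panel: list of cells; each cell is a dict like
    {"reacting": False, "antigens": {"E": "homozygous", "K": "absent"}}.
    Returns the set of antibody names the heuristic rules out."""
    ruled_out = set()
    for cell in panel:
        if cell["reacting"]:
            continue  # only non-reacting cells may rule anything out
        for antigen, zygosity in cell["antigens"].items():
            if zygosity == "homozygous":  # zygosity check (dosage)
                ruled_out.add("anti-" + antigen)
    return ruled_out
```

Note that on a panel with no non-reacting cells (like one of the practice screens above), this sketch returns the empty set, mirroring the point that nothing can be ruled out there.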

The third type of training case was ruling out on full panels, according to Step 5 in the

checklist. Subjects had to follow the same steps as outlined in Step 2 (marking Unlikely

antibodies and Ruled Out antibodies) with the exception that if the panel results were run at

certain temperatures or phases of testing (e.g., Enzymes, Prewarm, Room Temperature, etc.)

then a corresponding set of antibodies could not be ruled out, since those antibody reactions are

normally weakened or destroyed in those phases of testing. Subjects also had to mark whether

or not the patient had an auto antibody present (Step 4 in the checklist), based on the

information located at the bottom of the panel in the Auto Control section of the panel. Subjects

practiced ruling out in this manner on four panels, two of which had no exceptions to the rule-

out procedure, and two of which had exceptions (both being due to an Enzyme panel).

The fourth type of training case corresponded to Step 7 in the checklist. Subjects were

given three antigen typing screens and asked to rule out antibodies based on the presence of

antigens in the patient's blood. (If a patient possesses an antigen in their own blood, then there

should not be an antibody formed against it).
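The antigen-typing logic in this step reduces to a one-line rule: any antigen the patient's own cells carry cannot correspond to one of the patient's alloantibodies. A hedged sketch, with an assumed data layout:

```python
# Illustrative sketch of Step 7: if the patient types positive for an
# antigen, the corresponding antibody can be ruled out.
def rule_out_by_typing(patient_typings):
    """patient_typings: dict mapping antigen name -> True if the
    patient's own cells carry that antigen."""
    return {"anti-" + antigen
            for antigen, present in patient_typings.items() if present}
```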

Finally, subjects were asked to solve two entire, single-antibody cases using the

critiquing system and following the whole checklist. For each case, subjects were asked to fill

out an answer sheet. The first case was a straightforward anti-C and the second was a

straightforward anti-K.

After completing each lesson, the computer gave the subjects a summary of their

performance, showing the correct answer for the lesson versus their own answer. Furthermore, a list of


error messages showed what procedural errors had been made during the problem-solving (see

Figure 9 for a sample summary screen).

During Phase 4, the Control Group was asked to solve the two entire cases solved by

the Treatment Group during training. However, the Control Group did not receive any aiding

or feedback from the computer or experimenter. For each case, subjects were asked to fill out an

answer sheet.

Phase 5. Post-Test Cases

During Phase 5, both groups solved the four Post-Test Cases described in the previous

section. The experimenter refrained from answering any questions from either subject group.

The Control Group filled out answer sheets for each case and got no feedback from the

computer. The Treatment Group was asked to follow the checklist and fill out an answer sheet

for each case. The critiquing system was monitoring for errors and giving error messages if any

were detected.

Phase 6. Debriefing

In the final phase of the experiment, subjects were told how they did on the cases and, if

in the Treatment Group, were asked to fill out a questionnaire. If there were any questions

about the cases or the experiment in general, they were answered. When there were no more

questions, subjects were thanked and paid $50.00 for their time.


Figure 9. Sample Summary Screen.

Data collection

As subjects were working on the test cases, the AIDA system was logging all of the

person's actions (such as which buttons were selected, what answers were typed in etc.) in such

a way that all of the actions could be reproduced by running the program again with the log

data as input. Besides user actions, user performance measures were also being logged, i.e., the

time taken to complete each case, the number of incorrect rule-outs, and incorrect or incomplete

answers. The computer logged all errors that it detected whether or not it displayed them to the


user. Thus, errors for both the Control Group and Treatment Group were logged. The system

automatically coded the data, counting types of errors, time to complete cases, and misdiagnosis

rates. Appendix C shows a sample of the coding categories detected and logged by the

computer. An actual sample of this log data is shown in Appendices D and E. These logs, along

with the questionnaire results, were the primary sources of data for the experiment.
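The logging scheme described above (record every action so the session can be reproduced by re-running the program on the log) can be sketched as follows. This is an illustrative reconstruction, not the actual AIDA logger; the class and action names are assumptions:

```python
# Illustrative sketch of replayable action logging: each user action is
# appended to a log, and the session is reproduced by feeding the log
# back through the same handlers that processed the live actions.
import time

class ActionLog:
    def __init__(self):
        self.events = []

    def record(self, action, **params):
        """Append one timestamped user action (e.g., a button press)."""
        self.events.append({"time": time.time(),
                            "action": action,
                            "params": params})

    def replay(self, handlers):
        """handlers: dict mapping action name -> callable taking the
        original parameters; re-runs the session from the log."""
        for event in self.events:
            handlers[event["action"]](**event["params"])
```

Replaying, say, a log of hypothetical "mark" events through the same handlers reproduces the subject's session, which is what makes the behavioral protocol analysis below possible.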

Data Analysis

Three types of analyses were made: 1) An analysis of outcome performance, as

measured by misdiagnosis rates, 2) A behavioral protocol analysis to examine subjects'

strategies, behaviors, and process errors, and 3) A questionnaire to get subjective reactions to

using the critiquing system. A number of statistical tests were run to measure differences in

misdiagnosis rates. McNemar's Chi Square test was used to test the hypothesis that subjects in

the Treatment Group improved in performance from the Pre-Test Case to the matched Post-Test

Case. Fisher's exact test was used to test the hypothesis that the Treatment Group had better

performance on each of the Post-Test Cases than the Control Group. Finally, a test for

difference between the two groups was conducted using a log-linear analysis that takes into

account performance on the Pre-Test Case (see Appendix B for a sample of the statistical

calculations used). A behavioral protocol analysis of subjects' performances was also conducted

to study differences in strategies used by subjects, how those strategies were influenced by the

system design and how they led to good or bad outcomes (see Appendices C through G for

sample behavioral protocols and error logs that were used for this analysis). Finally,

questionnaire results were examined to determine the perceived usability and utility of the

critiquing system.
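The within-subjects comparison above rests on McNemar's test over paired correct/incorrect outcomes (Pre-Test vs. matched Post-Test). The following sketch shows the continuity-corrected chi-square statistic; the counts in the test are invented for illustration and are not the study's data:

```python
# Hedged sketch of McNemar's chi-square statistic (with continuity
# correction) for paired right/wrong outcomes. Only the discordant
# pairs enter the statistic.
def mcnemar_chi_square(b, c):
    """b: subjects wrong on the Pre-Test but right on the Post-Test;
    c: subjects right on the Pre-Test but wrong on the Post-Test.
    Statistic: (|b - c| - 1)^2 / (b + c)."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

Intuition for the design choice: subjects who got both cases right or both wrong carry no information about improvement, so only the discordant counts b and c are compared.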


Chapter VI

Results and Discussion

As discussed earlier, two populations were studied. The first was a set of four highly

proficient technologists, the second a set of thirty-two practitioners whom their supervisors

identified as "actually performing the task of antibody identification as part of their job but who

would benefit from additional experience and training." In this section, data is first presented

describing unaided performance for the population needing additional experience and training.

To give a flavor of the interaction with the critiquing system, a few sample interactions are then

given. Comparisons of misdiagnosis rates for the Treatment vs. Control Groups are then given,

as well as results from the Questionnaire that was administered to the Treatment Group.

Finally, a detailed protocol analysis highlights the important behaviors that were exhibited with

use of the system.

Unaided Subject Performance

In order to understand unaided performance on this task, we can look at the problem-

solving strategies exhibited by the subjects in the Control Group on all of the cases and the

Treatment Group on the Pre-Test Case.

For example, one Control Group subject made many process errors that led to an

incorrect solution on all of the cases. On Case 1, for instance, this subject ruled out by using a

strategy that will fail in multiple antibody cases (ruling out using reacting cells, strategy 2f-),

which caused her to rule out both of the right answers (anti-c and anti-K). She also marked anti-

S as the answer, even though it accounted for most, but not all, of the reactions exhibited. Thus,


she violated strategy 3d+ (making sure there are no unexplained positive reactions).

Furthermore, she made other procedural errors such as ruling out heterozygously (2g-), ruling

out antibodies using results from test procedures that usually inhibit those reactions (2d-), and

failing to do antigen typing for the antibody marked as the answer (3a-).

Multiple process errors were made by unaided subjects. Table 1 shows the number of

subjects (Treatment and Control groups combined) who made a particular kind of error at least

once on the Pre-Test Case. Four subjects' data were not counted in this analysis because the

data were invalid. (Due to a bug in the program, the reactions for a particular test panel that

these subjects used were incorrect. The other subjects never accessed this test panel).

Thus, we have evidence consistent with previous studies that practicing medical

technologists make a significant number of process errors and outcome errors when solving

antibody identification cases.

Table 1. Process errors made by Treatment and Control Group subjects on the Pre-Test Case.

Error                                              Number of Subjects (out of 28) Who Committed
                                                   that Error at Least Once on the Pre-Test Case

1. Ruling out Hypotheses Incorrectly                                  20

2. Failing to Rule Out When Appropriate                               14

3. Failure to Collect Converging Evidence                             26

4. Data Implausible Given Answer                                      11

5. Answer Implausible Given Prior Probabilities                       11


Example Subject Interactions

As a comparison to unaided performance, this section gives the reader an idea of how

the critiquing system interacted with a sample subject, detecting errors in performance and

steering the subject towards a successful solution path. This subject got the Pre-Test Case

wrong, but then got the rest of the cases right with the aid of the critiquing system. On two of

those Post-Test Cases, she initially had an incorrect solution set, but changed her answer in

response to the critiques she received.

On the Pre-Test Case, this subject correctly reviewed initial data about the patient, such

as the ABO/Rh and the Case History. After seeing that the initial Antibody Screen results were

positive, she selected a full panel for interpreting test results. There, she ruled out using

homozygous, non-reacting cells (a good strategy) and selected additional cells for further

analysis. At this point, she confirmed anti-Fyb and continued on to the next case. The correct

answer for the case was anti-E plus anti-K. Anti-Fyb accounted for most of the reactions, but it

did not fit the pattern of dosage (strength of reaction depending on the strength of the antigen)

and did not account for two of the reacting cells (one on the initial Antibody Screen test and

one on the Additional Cells panel). This subject's erroneous conclusion stemmed from

following an incomplete protocol. She did not try to rule out all remaining antibodies, and did

not run an Antigen Typing test as independent evidence leading towards her answer.

Furthermore, her answer did not account for two of the reactions seen, nor the strength of

reactions on the reacting cells. On the matched Post-Test Case, however, this subject correctly

followed a complete protocol, being sure to rule out all remaining antibodies besides the ones

marked as Confirmed, and successfully solved the case.

On Post-Test Case 2, the weak antibody case, the system alerted the subject to the fact

that since some of the reactions were weak, rule-out might not be an appropriate strategy. The


subject heeded this warning and enhanced the reactions before proceeding with rule-out. When

looking at the reactions on the Additional Cells, this subject tried to run another test, but the

system warned her that she could have ruled out more antibodies on that panel. Because of this

message, she continued to rule out on the Additional Cells panel that she was looking at, and

was able to finish ruling out all remaining antibodies besides anti-D. Thus, she confirmed anti-

D and continued with the next case. Here, the system aided her by suggesting that she enhance

reactions before ruling out and, once at such an enhanced phase of testing, checking to be sure

that she ruled out all of the antibodies possible. In this way, the system helped her to avoid

running extra tests which were not necessary.

On Post-Test Case 3, this subject solved the case to the point where anti-Fya and anti-E

were the only remaining antibodies. At this point, she confirmed anti-Fya (which accounts for

all of the reactions) and marked anti-E likely. The system reminded her that she had not ruled

out all antibodies besides anti-Fya and warned her that anti-E was confounded with anti-Fya.

This message prompted her to run the cells at Enzymes (a technique that destroys Duffy

antigens, including Fya, and enhances the reactions of Rh antibodies, including anti-E). Thus, she was able to

expose the presence of anti-E and correctly add anti-E to her answer set.

On Post-Test Case 4, this subject proceeded to the point where all antibodies but anti-

Jkb, anti-c, and anti-E were ruled out. Again, she marked one of them as Confirmed (anti-Jkb),

but did not confirm or rule-out either of the other two remaining antibodies. The system

warned her that 1) she had not ruled out all remaining antibodies, 2) the confirmed antibody

did not account for many of the reactions exhibited, 3) it is rare to see anti-Jkb as the only

antibody, and 4) antibodies tend to form in a certain order, and that anti-c and anti-E would be

more likely to form before anti-Jkb. Thus, the system used knowledge about prior probabilities

as well as data specific to that case to warn the subject that her answer was implausible. In addition,


the system made the general remark that her protocol was incomplete (i.e., that she had not

ruled out all remaining antibodies). In response to these messages, the subject further examined

the case and included anti-E and anti-c in her answer set, thus getting the case right.

Gross Performance Measures

The following sections give the results for the overall misdiagnosis rates, compare the

mistakes and slips made by the two groups, and discuss the results from the

questionnaire.

Statistical Comparison of Misdiagnosis Rates

This section gives the misdiagnosis rates of the two subject populations (expert and less-

skilled practitioners).

Expert Subjects

The group of four experts was tested prior to the group of thirty-two less-skilled

practitioners, primarily as a check to evaluate AIDA for usability and to make sure AIDA did

not create difficulties or induce new errors for skilled technologists. Briefly, two of these four

technologists were tested as the Control Group and two as the Treatment Group. All four

subjects got all of the cases (Pre-Test and Post-Test) correct on the first try. Thus, there is no

evidence from this data to suggest that the system interfered with expert problem-solving

performance. No further analyses were made regarding the expert subjects' performance with

the system, although the experts' responses to the questionnaire will be included in the section

discussing questionnaire results, since a couple of their suggestions merit consideration.


Less Skilled Subjects

Of the thirty-two less skilled blood bankers tested in the actual evaluation study, sixteen

were randomly assigned to the Control Group and sixteen to the Treatment Group. In

analyzing the data, it was discovered that on one of the two matched cases, the reactions to one

set of test results that was requested by four of the subjects were incorrect. Thus, the data from

those four subjects was discarded from the analyses that follow.

As would be expected, the results showed that there was no significant difference in

performances on the Pre-Test Case for the Control and Treatment Groups (using Fisher's exact

Test, see Table 2). Misdiagnoses were eliminated in the Treatment Group, falling from 4/15

wrong on the Pre-Test Case to 0/15 wrong on the matched Post-Test Case 1, although this

difference failed to reach significance (using McNemar's Chi Square for dependent samples, χ2

= 2.25, p = 0.133, 1 df). The Control Group also failed to show a significant improvement in

performance from the Pre-Test Case to Case 1, as would be expected.
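This within-group comparison can be checked numerically. The following is a minimal sketch, not the analysis code actually used in the study: since no Treatment subject was wrong on Case 1, all four discordant pairs must have gone from wrong to right, and a continuity-corrected McNemar statistic is assumed because it reproduces the reported χ2 = 2.25.

```python
# Sketch of the McNemar test for the Treatment Group's Pre-Test -> Case 1
# comparison (4/15 wrong before, 0/15 wrong after). Because no one was wrong
# on Case 1, all four discordant pairs must have gone wrong -> right.
# The continuity correction is an assumption; it matches the reported 2.25.
from scipy.stats import chi2

b, c = 4, 0  # discordant pairs: wrong -> right, right -> wrong
chi_square = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected McNemar
p = chi2.sf(chi_square, df=1)
print(f"chi^2 = {chi_square}, p = {p:.3f}")  # chi^2 = 2.25, p = 0.134
```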

Table 2. Pre-test/Post-test comparison of misdiagnosis rates.

Test Case           Pre-Test Case                      Case 1
                    2 antibodies looking like 1        2 antibodies looking like 1
                    (randomly chosen from one of       (the other of the two
                    two matched cases, the other       matched cases)
                    of which was Case 1)
Control Group       6/14 wrong                         5/14 wrong    NS
Treatment Group     4/15 wrong                         0/15 wrong    NS


The between-subject comparisons showed marked differences in performance across

the two groups (see Table 3). On Cases 1, 3, and 4, all subjects in the Treatment Group solved

the cases correctly, while 5/15 of the subjects in the Control Group misdiagnosed Case 1, 6/16

misdiagnosed Case 3, and 10/16 misdiagnosed Case 4. Using Fisher's exact test, each of these

differences is significant (p < 0.05). For the case that the system was not designed to

completely handle (Case 2), 8/16 subjects in the Control Group misdiagnosed the case

compared to 3/16 in the Critiquing Group. This improvement in performance is marginally

significant (p = 0.072). Thus, with the design of a critiquing system and checklist, we were able

to eliminate misdiagnoses on cases for which the system was designed (Cases 1 and 3) and on a

case for which the system was not explicitly designed but for which the system's knowledge

was appropriate (Case 4). Finally, misdiagnosis rates were reduced on a case for which the

system's knowledge was not fully competent (Case 2), but not significantly so.
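The between-group comparisons above can be reproduced with a standard Fisher's exact test on the [wrong, right] counts. The sketch below assumes one-sided tests, since those best match the reported significance levels; the dissertation does not state its exact convention, so the printed p-values may differ slightly from those reported.

```python
# Sketch of the Table 3 between-group comparisons using Fisher's exact test
# on [wrong, right] counts per group. The one-sided alternative is an
# assumption; the exact p-values in the text may have been computed under
# slightly different conventions.
from scipy.stats import fisher_exact

tables = {
    "Case 1": [[5, 10], [0, 16]],   # Control 5/15 wrong vs. Treatment 0/16 wrong
    "Case 2": [[8, 8], [3, 13]],    # 8/16 vs. 3/16
    "Case 3": [[6, 10], [0, 16]],   # 6/16 vs. 0/16
    "Case 4": [[10, 6], [0, 16]],   # 10/16 vs. 0/16
}
p_values = {}
for case, table in tables.items():
    _, p = fisher_exact(table, alternative="greater")
    p_values[case] = p
    print(case, f"p = {p:.4f}")
```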


Table 3. Post-Test Case results.

Test Case          Case 1                 Case 2                Case 3               Case 4
                   2 antibodies looking   weak antibody         1 antibody           3 antibodies reacting
                   like 1 (matched with   (which the system     masking another      on all cells (a case the
                   the Pre-Test Case)     was not designed                           system was not explicitly
                                          to adequately                              designed for, sent by
                                          handle)                                    another blood bank lab)
Control Group      5/15 (33.3%) wrong     8/16 (50.0%) wrong    6/16 (37.5%) wrong   10/16 (62.5%) wrong
Critiquing Group   0/16 (0.0%) wrong      3/16 (18.75%) wrong   0/16 (0.0%) wrong    0/16 (0.0%) wrong
Significance       p < 0.05               p = 0.072             p < 0.01             p < 0.001

Besides individual comparisons using Fisher's exact test, a log-linear analysis was run to

take into account the difference in performance on the Pre-Test Case. Both Treatment and

Control Groups were subdivided into whether or not the Pre-Test Case was correct. This

analysis gave very similar results on individual cases and gave a combined significance level

(Weiner, 1971) of p ≤ 0.000005 favoring performance for the Treatment group (see Table 4).

Table 4. Combining p-values given by the Log-Linear analysis of misdiagnosis rates on the

Post-Test Cases, taking into account performance on the Pre-Test Case.

Case    p-value   -ln(p)
1       0.0234    3.76
2       0.0957    2.35
3       0.0078    4.85
4       0.0002    8.52
                  Total = 19.47

χ2 = 2(19.47) = 38.94, df = 8, p = 0.000005
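The p-value combination in Table 4 is Fisher's method: χ2 = 2Σ(-ln p) compared against a chi-square distribution with 2k degrees of freedom. A minimal sketch reproducing the totals above:

```python
# Sketch of the combined-significance computation in Table 4 (Fisher's method):
# chi^2 = 2 * sum(-ln p_i), compared against a chi-square with 2k df.
import math
from scipy.stats import chi2

p_values = [0.0234, 0.0957, 0.0078, 0.0002]  # Cases 1-4 (log-linear analysis)
statistic = 2 * sum(-math.log(p) for p in p_values)
combined_p = chi2.sf(statistic, df=2 * len(p_values))
print(f"chi^2 = {statistic:.2f}, df = {2 * len(p_values)}, p = {combined_p:.6f}")
# prints chi^2 = 38.94, df = 8, p = 0.000005
```

The same computation, applied to the five error-type p-values in Table 8 with df = 10, reproduces that table's combined significance as well.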


Tables 5 and 6 give a subject by subject breakdown of misdiagnoses per case. These

tables show if a subject got a case right (represented by a 1) or wrong (represented by a 0) or, in

the case of the Treatment Group on the Post-Test Cases, got some feedback from the computer

regarding the plausibility of the answer. In this case, the table may show a series of answers

(such as 0-0-1), the last one indicating the correctness of the final answer given by the subject.

Table 5. Correctness of Answers, Treatment Group.

(0 = wrong, 1 = right. A series of numbers indicates the subject marked an answer more than

once, in response to critiques given by the computer).

Subject   Pre-Test Case   Case 1    Case 2           Case 3    Case 4
T1        1               1-1-1-1   1                0-1       0-1
T2        0               1         0-0              1         1
T3        1               1         0-1              1         0-1
T4        1               1         1                1         0-0-0-0-1
T5        0               1         1-1              1         1
T6        1               1         0-0-0-0-0-0-1    0-0-0-1   1
T7        1               1         0-1              1         1
T8        1               1         0-1-1            1         1
T9        invalid data    1         0-0              1         1
T10       1               1         0-0-0            1         1
T11       1               1         1                1         1
T12       1               1         0-1              1         0-1
T13       0               1         1                0-1       0-1
T14       1               1         1                1         1
T15       0               1         1                1         1
T16       1               1         1                1         1


Table 6. Correctness of Answers, Control Group.

(0 = wrong, 1 = right).

Subject   Pre-Test Case   Case 1         Case 2   Case 3   Case 4
C1        0               0              0        0        0
C2        0               1              0        1        0
C3        1               1              1        1        1
C4        0               1              1        0        0
C5        1               invalid data   1        1        0
C6        1               1              1        1        1
C7        0               0              1        1        0
C8        1               1              1        1        1
C9        invalid data    0              0        0        0
C10       1               1              1        1        0
C11       1               1              0        0        1
C12       0               0              0        0        0
C13       0               0              0        0        0
C14       1               1              0        1        1
C15       1               1              1        1        1
C16       invalid data    1              0        1        0

Slips vs. Mistakes

The behavioral data logs were examined for evidence that the errors detected by the

critiquing system were either mistakes (from subjects having either missing or incorrect

knowledge) or slips (from subjects either making unintentional actions or oversights according

to their current goals and knowledge). The identification of slips in the behavioral protocol data

was operationally defined as follows: the user provided evidence of forming the goal relating

to an action but either failed to carry it out (or to carry it out completely) or carried it out in a way

that did not correspond to his/her current or known strategy. Any error messages relating to

the plausibility of the answer were considered to be mistakes. Thus, the subjects' data was


analyzed to develop a model of what inferences and procedures they knew how to perform

correctly, and that model was used to determine if a particular error was a slip or a mistake.

Using this classification scheme, the number of subjects in each group making a

particular kind of mistake or slip is shown in Table 7 (more detailed tables are shown in

Appendices F and G). On the Pre-Test Case, a comparable number of subjects in each group

made mistakes. On the Post-Test Cases, consistently fewer subjects in the Treatment Group

made process errors. It is difficult to make comparisons across the Post-Test

Cases, since each case has different characteristics and thus different kinds of process errors are

likely to be made. It is appropriate, however, to compare performance from the Pre-Test Case

to the matched Post-Test Case, to see if fewer subjects in the Treatment Group make process

errors after receiving the training and the checklist. Individual comparisons made for each of

the five error types show that only Error Type 3 (Failure to Collect Converging Evidence) is

significantly reduced from the Pre-Test Case to the Post-Test Case (p < 0.01, McNemar's Chi

Square for dependent samples, df = 1). When the p-values are combined across the five Error

Types, however (see Table 8), there is an overall significance level of p < 0.01, showing that the

training and the checklist indeed reduced the number of subjects making process errors from

the Pre-Test Case to the Post-Test Case. In terms of slips made, there does appear to be an

increase in the number of slips made by the Treatment Group on the Post-Test Cases. This

difference is significant for Error Type 2 (Failure to Rule Out When Appropriate), (p < 0.05,

McNemar's Chi Square for Dependent Samples, df = 1). This significant increase may be due to

the fact that the protocol followed by members of the Treatment Group was to initially mark

low-frequency antibodies as Unlikely. Many subjects in the Treatment Group subsequently failed to

notice that a previously marked Unlikely antibody could be changed to Ruled Out, and this


would be detected by the system as an error of Type 2 and classified as a slip, since subjects had

previously demonstrated the intention and ability to rule out all antibodies on a panel.

Table 7. Number of Subjects Committing Each Type of Error at Least Once Per Case

(Mistakes are the first number shown and slips are shown in parentheses)

Error Type                      Group       Pre-Test    Case 1      Case 2     Case 3    Case 4
1. Ruling Out Hypotheses        Control     10 (3)      13 (5)      11 (3)     12 (3)    11 (3)
   Incorrectly                  Treatment    8 (5)       3 (7)       6 (8)      2 (4)     8 (6)
2. Failure to Rule Out          Control      7 (2)      10 (4)       5 (1)      9 (2)    11 (1)
   When Appropriate             Treatment    7 (2)       5 (10)*     4 (4)      3 (8)     5 (6)
3. Failure to Collect           Control     10 (0)      12 (0)      11 (0)     10 (0)    11 (0)
   Converging Evidence          Treatment   10 (0)       1** (0)     3 (0)      3 (0)     4 (0)
4. Data Implausible             Control      7 (0)       5 (0)       7 (0)      1 (0)     7 (0)
   Given Answer                 Treatment    4 (0)       0 (0)       5 (0)      2 (0)     5 (0)
5. Answer Implausible Given     Control      5 (0)       7 (0)       5 (0)      4 (0)    13 (0)
   Prior Probabilities          Treatment    6 (0)       1 (0)       8 (0)      3 (0)     4 (0)

* Significant increase in number of slips made for process error Type 2 for the Treatment Group from the Pre-Test Case to the matched Post-Test Case (McNemar's χ2 = 4.9, p < 0.05, 1 df)
** Significant reduction in process error Type 3 for the Treatment Group from the Pre-Test Case to the matched Post-Test Case (McNemar's χ2 = 7.11, p < 0.01, 1 df)


Table 8. Combining p-values across the five error types.

Error Type   p-value     -ln(p)
1            0.2285279   1.48
2            0.5051982   0.68
3            0.0076654   4.87
4            0.1336145   2.01
5            0.0736382   2.61
                         Total = 11.65

χ2 = 2(11.65) = 23.30, df = 10, p < 0.01

Questionnaire Results

The questionnaire was administered to all members of the Treatment Group to get a sense

of how they viewed the software in terms of usability and utility. Two subjects did not fill out

the questionnaire. Responses to the questionnaire by the two expert subjects from the pilot

study are also included in the following section, because their comments were consistent

with those made by the less-skilled subjects. Four open-ended questions were asked:

1) How would you rate this software in terms of ease-of-use?

In response to this question, nine of the less-skilled subjects wrote that the software was

very easy to use and three said it was easy to use after initial explanation and practice. One

subject mentioned that she did not like the software: "I did not like it but I'm not a computer

person and I tend to want to do things concretely", and one mentioned that she "had some

trouble with getting on the wrong line on panels". Of the two expert subjects from the pilot

study, one said, "I found it easy to use, but it seemed to take me longer to do the identification,"

and the other said it was "quite easy after you get the hang of it."

2) Would you find this software useful for your job?

When asked, "Would you find this software useful for your job?", all of the less-skilled

subjects either said, "Yes" (8 Subjects), "Absolutely" (1 Subject), "Very Useful" (1 Subject), or

mentioned particular aspects of the software that would be useful, such as: "... to show how... to

solve antibody problems" (1 Subject), "...it was useful in telling you when you made the wrong

assumptions" (1 Subject), "to maintain proficiency..." (1 Subject), and "...in teaching new

employees and students..." (1 Subject). The two expert subjects both concurred on this last point,

one saying that it would be "great for new employees and students," and the other saying it

would be "useful in teaching students." This subject also mentioned that, "If put into use in the

Blood Bank, all panel results would be saved on computer disc."

3) What did you like most about this software?

In response to this question, five of the less-skilled subjects mentioned ease of use

and/or interface features, such as: "[it was] user friendly", "It was fun to use!", "It made it very

easy to rule out antibodies.", and "The highlighting was very helpful when I started making use

of it." One of the expert subjects said, "I liked highlighting positive reactions and doing the rule

outs and the screen showing what was still left as possible antibodies."

Three of the less-skilled subjects mentioned aspects related to teaching/enforcing a

logical protocol, such as: "It takes you through the antibody identification step by step.", "The

format was very logical.", and "[it was useful] to update my thinking and have a set format to

help rule out and ID."


Finally, nine of the less-skilled subjects and one of the expert subjects specifically

mentioned the critiquing/error checking as useful. The less skilled subjects liked, "The ability to

view your answers and the computer interaction allowing you to see what the problems with

your answers are.", "The 'beeps' to warn you of a possible mistake were helpful.", "That it alerts

you to things you may have missed.", "The thoroughness of the antibody testing -- you can't

complete the panels if you have not exhausted all possibilities.", "The explanations of problems

when you could not go to the next step.", "It finds things that I may have skipped or missed and

then makes good suggestions.", "The capability of letting the tech know about the wrong or

inconsistent results by audible alarm.", and "The error check is great, particularly in multiple

antibodies ID with additional alleles." The expert subject said she liked, "The corrections if you

made an incorrect assumption."

4) What would you suggest to improve this software (including additional functions that you

would like it to perform)?

In response to this question, eight subjects had no suggestions. Two of the less-skilled

subjects and both of the expert subjects mentioned aspects related to the interaction. The less-

skilled subjects said in turn that they would like, "A chance to skip steps if you knew the

answer but the need to confirm that you knew it even though you didn't complete all steps.",

and "To be able to look at a screen (say a panel) and be able to go back out without it telling you

to put in answers." One of the experts said, "I felt I was flipping back and forth from screens.

Especially the weak reaction that was stronger with increased serum/cell and enzyme." The

other expert subject said, "I would like it to give you a chance to correct yourself if you hit the

wrong button when you rule out."


Two of the less-skilled subjects mentioned interface issues as being difficult, specifically

that they had difficulty manipulating the hierarchical menu and trackball.

Finally, two of the less-skilled subjects mentioned information needs, such as: "The

ability to view the correct answer at the end of each case." and "Maybe [having] another set of

additional cells for ruling out, otherwise great software!"

Thus, overall, the results from the questionnaire were very positive. All of the subjects

had good comments and almost all of them seemed to welcome the feedback and training

provided by the critiquing aspects of the software. All of the subjects also thought that the

software would be useful for their job and an overwhelming majority mentioned specific

features that they liked, such as the ability to see things better on the screen, the logical format

of the protocol enforced, and the error checking to aid them if they made mistakes or were

stuck.

In terms of suggested improvements, it may be necessary to re-design the menu system

so that no hierarchical menu selections are required, and to always provide a mouse to subjects

as an alternative to the trackball for input, since these were mentioned as being difficult to use

by a few of the subjects. One subject mentioned navigation issues, finding it difficult to

compare results across screens. This problem merits further consideration, with perhaps a re-

design of the way information is displayed so that values that need to be compared can be

presented in view at the same time in a way that supports the users' goals.

Two of the other suggestions merit consideration and are difficult design challenges.

The subject who wanted, "a chance to skip steps if you knew the

confirm that you knew it even though you didn't complete all steps" was asking for a computer

that was smart enough to know when a person was able to skip some of the steps in a


procedure. This is a very difficult design dilemma, and goes against our philosophy of the

design of this critiquing system. As will be argued later, the success of this critiquing system in

reducing misdiagnosis relied heavily on its enforcement of the protocol (although

subjects could override any critiques), reminding subjects to consider all of the data that they

had encountered in a case, and to collect sufficient data to confirm an answer.

The subject who wanted, "to be able to look at a screen (say a panel) and be able to go

back out without it telling you to put in answers" was referring to a similar problem, but

perhaps on a smaller scale. This subject wanted the system to refrain from critiquing when new

test results were requested. The design of this system is such that it gives three types of

critiques: 1) it checks for errors of commission as soon as the subject performs an action when

viewing a panel of test results (such as ruling out an antibody); 2) it checks for errors of omission

related to that particular panel (such as failing to rule out an antibody) as soon as the subject

wants to leave a screen to view a different test result; and 3) it checks for the plausibility of the

answer when the subject says they are done with the case, as well as "overall" errors of omission, i.e., a

failure to complete the full protocol and to collect converging evidence. Thus, one could re-

design the system such that it did not immediately check for errors of omission when the subject

left a panel. However, there is a tradeoff such that a message that is displayed at a later time

may be out of context when the error is finally pointed out. Furthermore, if what was missed

was important to the problem-solving, then going back to fix the problem may resolve the

original reason for viewing a new test result.
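The three critique types just described can be sketched as simple event-driven checks. This is an illustration only, not the actual AIDA implementation; the function names and data structures here are hypothetical.

```python
# Hedged sketch of the three critique-timing hooks described above: commission
# checks fire on each action, omission checks fire on leaving a panel, and
# plausibility/completeness checks fire when the user marks the case done.
# Illustration only -- not the actual AIDA code; all names are hypothetical.

def check_commission(ruled_out, excludable_on_panel):
    """1) Immediately after an action: flag rule-outs this panel cannot justify."""
    return [ab for ab in ruled_out if ab not in excludable_on_panel]

def check_omission(ruled_out, excludable_on_panel):
    """2) On leaving a panel: flag rule-outs the user could still have made."""
    return [ab for ab in excludable_on_panel if ab not in ruled_out]

def check_done(answer, remaining, reactions_unexplained, rare_answers):
    """3) On 'done': protocol completeness and answer plausibility."""
    critiques = []
    if remaining - set(answer):
        critiques.append("not all remaining antibodies ruled out or confirmed")
    if reactions_unexplained:
        critiques.append("answer does not account for all reactions")
    if set(answer) & rare_answers and len(answer) == 1:
        critiques.append("answer implausible given prior probabilities")
    return critiques

# Example: the Post-Test Case 4 situation described earlier (anti-Jkb confirmed,
# anti-c and anti-E left unresolved) triggers all three "done" critiques.
print(check_done(["anti-Jkb"], {"anti-Jkb", "anti-c", "anti-E"}, True, {"anti-Jkb"}))
```

Deferring the `check_omission` hook, as discussed above, is a one-line change in such a structure, which is what makes the timing tradeoff a design decision rather than an architectural one.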

This problem is worth thinking about some more, since an error message at this

juncture may interrupt the users' current thought processes and lead them on an error-recovery

track that makes them forget their original intention of selecting the new test. It might be

possible to push some of the less important checks to the end of the case. For example, the


system currently checks that a person has marked whether or not an auto-antibody is present

when the Polyspecific Albumin panel is viewed (since that is where this information is located),

but this information is not usually directly relevant to the rest of the problem-solving, so

perhaps this kind of a check could be done later, thereby reducing the number of times that this

message might interrupt practitioners when it is not currently relevant to them.

The subject who said, "I would like it to give you a chance to correct yourself if you hit

the wrong button when you rule out" gave a good suggestion but also one that would be

difficult to implement. Perhaps a small delay could be introduced, to allow the subject a short

amount of time to undo an action without a critique from the computer. For subjects who are

very fast at ruling out, though, the system may end up giving an error message after they have

ruled out another antibody, and then the message would be out of context. This is a design

change that would need to be tested to see if it would work well or not. Delaying the check for

errors of commission until a user selects another test panel would only heighten the problem of

the system interrupting the users' thought process out of context.

The next suggestion that was made for improving the software was: "The ability to view

the correct answer at the end of each case." The system does not know the answer to any of the

cases, so it would not be able to tell the subjects the answer (unless the answer was programmed

in ahead of time -- a possibility in a test situation but not in real lab settings). The final

suggestion made was "Maybe another set of additional cells for ruling out", meaning that the

subject wanted more test cells available. The number of additional cells that we provided,

however, was comparable to the number of additional cells that most labs have to work with

and was certainly sufficient for solving these cases. Thus, this suggestion is not realistic in a lab

setting and would encourage inefficient diagnosis practices.


Detailed Analyses

Besides summary statistics and questionnaire results, more detailed analyses of

behavior for the group of less-skilled subjects were conducted from the behavioral protocol logs

that were automatically generated by the computer, to determine if important behaviors with

the system, whether good or bad, could be identified.

Proactive Training vs. Reactive Feedback (Critiquing)

One interesting question is to what extent the improvement in performance of

the Treatment Group is due to the initial training and use of the checklist (proactive training) and

to what extent it is due to the presence of the critiquing system monitoring their performance

(reactive feedback). Clearly, a large improvement is seen from the Pre-Test Case to the

matched Post-Test Case in terms of outcome errors (which were eliminated) and process errors.

This indicates that the proactive training with the checklist was immediately helpful to subjects

and helped to significantly improve their procedural performance.

It is also interesting to note, however, that subjects in the Critiquing Group did not

always get a case right immediately (see Table 5). Even though they eventually got the right

answer on almost all of the Post-Test Cases, this was not without assistance from the computer.

For example, in 18 instances, subjects indicated that they were done with a case and the

computer detected one or more errors (see Table 5). Of these, 14 concerned errors of omission

in their procedure (an incomplete protocol), 17 an inconsistency between the marked answer

and the observed reactions, and 16 an implausible answer given prior

probabilities, for a total of 47 errors detected. On 16 out of the 18 cases, the subjects' answers

were wrong and of those 16, 13 subjects subsequently changed their answer to the correct one

because they were prompted by the critiques to re-examine the case (remember that the


computer does NOT know any of the answers and is merely checking for particular kinds of

process and intermediate inference errors). Thus, besides having evidence of the benefits of a

proactive training approach, we also have evidence that the presence of the critiquing system,

giving context-specific critiques in response to the current case situation and the person's state

of problem-solving, was beneficial in improving overall performance.

The Timing of the Critiques

In order to examine in more detail the timing of the critiques (i.e., should there be

immediate feedback about errors of omission on the current test panel when the person selects a

new test panel?), an analysis was made of the number of times that a person selected a new test

panel, got an error of omission for the current test panel, and then subsequently failed to re-

select the same new test panel. In order to determine whether the critiquing was interrupting

the person's thought process, each situation was examined to determine whether 1) the subsequent change in selection was due to a slip in selecting the test, 2) the new test was no longer needed after fixing the critique, 3) the change in selection appeared not to matter significantly, or 4) the critique clearly interrupted the subject's thought process. Further, it was

noted which message was being displayed, either A) the person did not mark the auto control,

R) the person did not do all of the rule-outs possible, or U) the person did not mark all of the

unlikely antibodies as Unlikely.

The most common scenario is that the person did not complete all of the rule-outs

possible on a test panel and, after adhering to the message and finishing the rule-outs, no longer

needed to view the subsequent test (this happened to ten subjects). This suggests that the

critique for the rule-out error of omission is appropriate to display at that time since, in fixing

the errors detected by this message, the need for selecting the subsequent test was eliminated.


The Auto-Control message may or may not have interrupted the users' thought processes since,

in two instances subjects changed their subsequent test selection but it was impossible to tell

from the context if that change in selection was important to the overall problem-solving. The

Unlikely message clearly interrupted one subject in two instances. In both those instances, she

selected the Case History menu item, but received a message saying that she had forgotten to

mark all of the unlikely antibodies. After remedying the situation, she then failed to select the

Case History again, and thus missed viewing possibly important information.

These scenarios illustrate that messages that are not directly relevant to a specific panel

should perhaps be displayed at a later time (i.e., when the person has indicated that s/he is

done), to avoid interruption of users' thought processes as much as possible. Alternative

solutions to this problem may also be found, such as reminding the person where they had

intended to go next so that they can decide whether or not to pursue the same course of action

after fixing a problem. In trying to decide when to display a particular critique, one question

that designers must ask themselves is the following: "Will the immediate remediation of the

problem being pointed out possibly preclude the reason for selecting further tests?" If the

answer to that question is yes, then the message should be displayed immediately. If no, then

the message should perhaps be displayed at a later time.
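The designer's question above amounts to a two-way dispatch rule. A minimal sketch in Python (the function name and boolean flag are illustrative assumptions, not part of the system described here):

```python
# The critique-timing rule stated above: interrupt immediately only
# when fixing the problem now could remove the need for the user's
# intended next action; otherwise hold the message until the user
# indicates that s/he is done with the case.

def critique_timing(may_preclude_next_action: bool) -> str:
    """Return 'immediate' or 'deferred' for a pending critique."""
    return "immediate" if may_preclude_next_action else "deferred"

# Fixing a rule-out omission can eliminate the need for the next test
# panel, so that critique fires immediately; the unlikely-antibody
# message does not, so it is better deferred.
assert critique_timing(True) == "immediate"
assert critique_timing(False) == "deferred"
```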

The problem of when to display a critiquing message deserves some higher-level

thought, particularly if critiquing is going to be used as a decision support strategy in a higher-

tempo, increased workload situation. Studies of multi-person teams working together in a high-

stress situation show that a person will judge the "interruptability" of another person before

making a suggestion to a co-worker (Johannesen, Cook and Woods, 1995). Designers of critiquing systems should further consider how to give critiques in context while minimizing the extent to which the critiques interrupt users' thought processes.


Subjects Overriding the Critiques

In 10 out of 249 instances, subjects chose to override a critiquing message. One

message was a rule-out error of omission when leaving the initial Antibody Screen panel (the

subject had ruled out most of the antibodies possible but had missed one). In the other nine

instances, the subject overrode the computer in response to one of the "end-checkers" (the

messages generated at the end of a case when the subject has confirmed a set of antibodies and

the computer detects errors of omission in their protocol or a problem with the plausibility of

the answer given prior probability information or given data in the case). These end-checkers

fire one at a time in a particular order.
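The ordered, one-at-a-time firing of the end-checkers can be sketched as a simple pipeline. The checker names, messages, and case fields below are illustrative assumptions rather than the system's actual rules:

```python
# End-checkers run in a fixed order when the user confirms an answer;
# only the first failing check fires, so the user addresses (or
# overrides) one critique before seeing the next.

def incomplete_rule_outs(case):
    # Remaining clinically significant antibodies not yet ruled out
    if case.get("rule_outs_remaining"):
        return "It is a good idea to complete rule-outs before finishing a case."
    return None

def answer_inconsistent(case):
    # The marked answer does not account for the observed reactions
    if not case.get("consistent", True):
        return "The answer given does not account for all of the data seen on the case."
    return None

def answer_implausible(case):
    # The answer is unlikely given prior probability information
    if not case.get("plausible", True):
        return "The answer is implausible given prior probabilities."
    return None

END_CHECKERS = [incomplete_rule_outs, answer_inconsistent, answer_implausible]

def first_firing_critique(case):
    """Return the first end-checker message that applies, or None."""
    for check in END_CHECKERS:
        message = check(case)
        if message is not None:
            return message
    return None

# A subject with incomplete rule-outs and an implausible answer first
# sees the rule-out message; if overridden, the plausibility message
# would fire on the next confirmation attempt.
case = {"rule_outs_remaining": True, "plausible": False}
assert first_firing_critique(case).startswith("It is a good idea")
```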

If the subject has not ruled out all remaining clinically significant antibodies, then a

general message is displayed, saying that it is a good idea to complete rule-outs before finishing

a case. One subject overrode this message twice and another subject overrode this message

three times. In all five of these instances, the next "end-checker" message that was generated,

pertaining to the plausibility of the answer, was adhered to by the subject.

In two instances, subjects on the weak D case had marked anti-D plus anti-E as their

answer and received a message from the computer that they should not have (due to a bug in

the program), namely that anti-E as the only Rh antibody in an Rh negative patient is rare. This

message should not have fired since the subject had marked more than one Rh antibody, and

both subjects receiving this message ignored it. This gives some evidence that buggy

knowledge in the system will be detected by subjects and appropriately ignored.

In one instance, a subject had marked anti-E alone as the answer on the weak D case

and correctly received the end-checker message that says that anti-E alone in an Rh negative

patient is rare. This subject overrode the computer and got the case wrong. In another


instance, a subject marked anti-K as the answer on the weak D case and got another "end-

checker" message, namely that Kell antibodies (including anti-K) do not normally react in the

pattern seen on the case. This subject overrode the computer and got the case wrong. Further

analysis of performance on this case is given in the next section.

Analysis of the Weak D Case

It is worth examining the Weak D case in more detail, because this case is one where the

knowledge in the system is not fully competent for solving the case. Thus, there is the potential

for "brittleness" in the computer's reasoning. The question to consider is if the critiquing system

is still helpful to subjects solving such a case and if so, what are the mechanisms that are

contributing to this improvement? One measure of performance is the misdiagnosis rate for the

two groups. The Control Group had a 50% misdiagnosis rate, as compared to the Treatment Group's 18.75% misdiagnosis rate (p = 0.072). This difference is marginally significant,

suggesting a trend towards improved performance with the critiquing system. Thus, one may

ask what aspects of the critiquing system are contributing to this possible improved

performance.

Three design features in particular had the potential to aid subjects. The first was the

application of some "meta-knowledge" such that the critiquing system is aware that its rule-out

strategy is fallible in the case of weak reactions. The system's "solution" in this case is to warn

the user that ruling out when there are weak reactions is dangerous, and the system suggests

trying to enhance the reactions first. The second possibly helpful design feature is the use of

prior probability information when examining the plausibility of an answer. In particular, one

common misdiagnosis on this case (based on the case characteristics and previous testing of this

case on practitioners) is anti-E, since anti-E accounts for the weak reactions on the initially


displayed test reactions. As part of its check for the plausibility of answers, the system "knows"

that anti-E is a rare finding when the patient is Rh negative and anti-D has not been confirmed

(as is the case here). Thus, the system displays the following message: "Anti-E as the only Rh

antibody is uncommon in an Rh negative

person. Normally anti-D would form first. It would be better to double check and ensure that

anti-D is not present by doing an enzyme panel, by increasing the serum:cell ratio or by using

some other enhancement technique. (In addition, if this patient is a pregnant woman, check

to see if she has been administered RhIG.)" Such a message may prompt the subject who marks this answer to reconsider the case. Other "prior probability" messages could

potentially be instantiated if the person marks an answer other than anti-E. Furthermore, an

answer besides anti-E will not account for all of the exhibited reactions and would thus cause

the system to warn the user that the answer given does not account for all of the data seen on

the case.

Figures 10 and 11 show the paths taken when solving the Weak D Case for the Control

and Treatment Groups respectively. (One subject's protocol data was lost due to a computer

error and thus it is not clear how he arrived at the correct answer and his path is shown as a

question mark in Figure 11). In comparing these two figures, we see that a comparable number of subjects in the two groups successfully solved the case "on their own," either by waiting to rule out

until the reactions were enhanced (5 subjects in each group) or by enhancing the reactions after

having ruled out D and subsequently confirming D (3 subjects in the Control Group and 2

subjects in the Treatment Group), such that 8 subjects in each group initially solved the case and

8 misdiagnosed it. However, the Treatment Group had the benefit of the critiquing messages

that check the plausibility of a solution. Five of the subjects receiving such a message

subsequently changed their answer to correctly include anti-D as part of their answer. In


conclusion, it is unclear if the "warning" at the beginning of the case aided subjects at all, but the

end-of-case error checking (that checks the plausibility of an answer) was clearly beneficial. The

benefit of these end-checkers is evident on other cases as well, causing subjects to correct their

answers in three instances on Case 3 and in five instances on Case 4 (see Table 5).

[Figure 10 is a flow diagram; only its labels are recoverable here. It traces the paths taken on the Weak D Case by the Control Group, branching from "Don't rule out" vs. "Rule out (including D)" through "Enhance reactions," "Confirm D," "Confirm E," "Confirm low frequency antibody," "Confirm nothing," and "Un-rule out D," with subject counts along each path.]

Figure 10. Paths taken to solve the Weak D Case, Control Group


[Figure 11 is the corresponding flow diagram for the Treatment Group; only its labels are recoverable here. It begins at the Antibody Screen with the warning "Be careful ruling out because there are weak reactions," branches through the same rule-out, enhancement, and confirmation steps as Figure 10, with subject counts along each path, and shows the critiques "E alone in an Rh negative patient is rare" and "The pattern of reactions is not consistent with anti-K" redirecting subjects toward confirming anti-D.]

Figure 11. Paths taken to solve the Weak D Case, Treatment Group

When to Use Critiquing vs. Some Other Form of Decision Support

Critiquing appears to be a very well-matched decision support strategy to the domain

of antibody identification. To what extent can these results be generalized to other domains? In

general, the approach that has been taken in the design of this critiquing system is to examine


the constraints on the task and check to be sure that the person's solution to the problem does

not violate any of those constraints (by checking that the solution takes into account all of the

data that is currently available, as well as general domain knowledge that also should be taken

into account when generating a solution). These two techniques are applicable in almost any

domain and should definitely be considered in the design of any decision support system.

There are many other benefits of using the critiquing approach to decision support.

First, practitioners will not lose skill on a task (a typical problem with the introduction of

automation). In fact, users of a critiquing system have the potential to learn from the computer's

critiques. Thus, the same system that is used for on-the-job decision support can also be used

for training or for maintaining job proficiency.

Second, users of a critiquing system are likely to build up a better mental model of this type of decision support system than of one that generates a solution

for the user, because the system provides feedback in the context of the user's formulation of the

problem, and only when a discrepancy is detected.

Third, there is a growing body of evidence that practitioners are more likely to question

a garden-path result and/or explore more alternatives if doing the task themselves. For

example, results from a study by Layton, Smith, and McCoy (1994) showed that much of what

triggers practitioners to apply their expertise at the appropriate time when solving a difficult

problem is data-driven. Critiquing systems can support users as they solve a problem by

providing context-specific feedback. More often than not, tasks are such that a problem-solver

is confronted with too much information, or information that is not "pre-packaged", so that

much seemingly disparate information has to be accounted for when generating a solution.

Once a solution has been generated, a typical "mistake" made by a problem-solver is that

inadequate aspects of the plan or solution are overlooked (either because of time constraints, a


failure to understand how the solution has violated some of the constraints, or biases/memory

distortions, such as biased assimilation). A critiquing approach to decision support is such that

the person generates a solution or plan, and the computer system analyzes the plan based on

pre-defined constraints.

Thus, in any domain where a computer system can take into account case-specific

parameters when analyzing the plan and give context-sensitive feedback, a critiquing system is

a viable form of decision support. Finally, besides considering critiquing as the final and only

form of decision support, one could also use critiquing as a stepping stone to automation or in

conjunction with automation. For example, a computer system which generates alternative

solutions to a problem could also use its knowledge to critique any additional plans generated

by a human.


Chapter VII

Conclusion

This study focused on how to design a cooperative decision support system. By

cooperative, it is meant that the human problem-solver and the computerized support system, working interactively in partnership, perform better than either would working alone.

Thus, if the design is effective, the computer should be able to detect and correct errors made by

the human and the human should be able to detect and correct errors made by the computer,

and the design of the support system should help to enhance the user's performance by helping

to trigger or stimulate the application of relevant expertise. This contrasts with the automation

model of decision support, which tries to reduce human error by replacing the fallible human

with an automated decision aid. This latter philosophy breaks down if the automated decision

aid is also fallible. The form of cooperative support studied here was to develop a

representation (in the form of a checklist) to provide guidance in the form of a high level goal

structure, and to have the computer act in a critiquing role, monitoring the human's problem-

solving process for faulty reasoning steps.

The study presented here builds upon previous work examining the effectiveness of

critiquing as a form of decision support. In particular, one previous study provided initial

empirical evidence that critiquing is an effective approach to support cooperative problem-

solving even on a case where the computerized decision aid is fallible, whereas an automated

system was shown in that study to significantly worsen performance on this case by 29%. In the

present study, the Treatment Group (supported by the written checklist and a more complete


critiquing system than the one tested previously) correctly identified this same case 32% more

often than the Control Group, even though the critiquing system was still not fully competent

on that case. These results provide further evidence that a critiquing system does not make

performance any worse than a person working alone when the computer's reasoning is faulty.

Furthermore, on cases where the computer's reasoning was fully competent, misdiagnosis rates

were eliminated for subjects using the critiquing/checklist system, whereas subjects with no

decision support were misdiagnosing cases 33% to 63% of the time.

Critiquing, although not explored to date by very many researchers as a form of

decision support, seems to be a viable solution for greatly improving performance on certain

kinds of tasks, including the important, real world medical diagnosis task of antibody

identification. Clearly, this is a task that medical technologists find difficult, since many of them consistently get moderately difficult, yet realistic, patient cases wrong when unassisted.

A well-designed critiquing/checklist system has proven to be a method for virtually eliminating

the errors that it was designed to catch, and for aiding on cases for which its knowledge is

incomplete.

A systems approach was taken in the design of this decision support system. This

systems approach led us to design a computer system that revolved around the application of a

complete protocol, using a number of complementary problem-solving strategies to

independently converge on an answer. The critiquing model of interaction was employed so

that the human practitioners could stay involved in the task, apply their own expertise, learn

from the computer, and judge the computer's feedback in a context-sensitive manner. There

was evidence that the critiquing system aided subjects by catching slips and mistakes and

helping users to recover from these errors, employing five different types of error-checking mechanisms (checking for errors of commission, checking for errors of omission, checking for


an incomplete protocol, checking that the data was consistent with the answer, and checking

that the answer was plausible given prior probabilities). The use of a checklist was beneficial in

quickly training subjects on the high-level goal structure implicit in the computer's knowledge

base, and served as a reminder to subjects of the steps necessary to successfully solve a case.

Finally, the success of the system's interaction with the user relied on its unobtrusive interface

that allowed subjects to naturally solve antibody identification cases as they normally would

using paper and pencil, while providing the computer with a rich set of data regarding the

characteristics of the case and the user's problem-solving steps without requiring the

practitioner to enter information that was outside of normal task requirements.

Subjects in this study reported that the system was easy and fun to use, and that it

would be helpful on the job for identifying antibodies. Subjects also noted that the same system

could be used for maintaining proficiency and/or training new hires on the procedures for

solving antibody identification cases. Thus, a correctly designed critiquing system can not only

immediately improve overall performance by catching slips and mistakes in a more cooperative

and less obtrusive manner than many automated systems, but it also has the potential to

transfer much of its knowledge and strategies to the person by the nature of the interactions.

One issue that remains to be explored is the extent to which the results from this study

can be generalized to other domains. Prototype critiquing systems have been developed for

medical diagnosis tasks, design tasks, and training tasks, but were not extensively tested in

realistic domain settings. An important characteristic to consider when trying to decide if critiquing is a viable alternative is the extent to which the computer would be able to

unobtrusively gather data about the task situation and the person's reasoning process, so as to

have the necessary information to generate timely and appropriate critiques.


Several guiding design concepts were successful in the design of this critiquing system,

and may be generalized to other domains and other types of decision support:

1) Design the interface so the human naturally uses the computer system as an integrated

information-based tool, while at the same time providing the computer with adequate

information about the task situation and the human's reasoning process.

2) Develop a knowledge base that checks for context-specific errors in reasoning including:

a) errors of commission,

b) errors of omission,

c) a solution that is not based on converging evidence,

d) a solution that is not consistent with all of the available data, and

e) a solution that is not plausible given prior probabilities.

3) Develop a representation (such as a written checklist) that outlines the higher-level goal

structure expected by the computer system. This can have several benefits:

a) the checklist serves as an external memory aid, reminding users of certain types of

knowledge related to the task, such as factual and procedural information,

b) the checklist provides the user with an appropriate frame of reference for

interpreting any feedback given by the computer. In other words, use of the

checklist is a way for designers to ensure that both the computer system and the

practitioners using it have a common frame of reference for communication and

understanding, and

c) the checklist provides an alternative form of aid to practitioners in situations for

which the critiquing system is not helpful. For example, if a practitioner gets stuck

during a case, s/he can review the checklist to see if there are any other tests or


knowledge that may be applicable, since the checklist lists the goals that should be

completed before finishing a case.
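The error-checking knowledge base of design concept 2 can be sketched as an ordered list of check functions, each returning a critique message or nothing. All names, fields, and thresholds here are illustrative assumptions; the real system's checks were derived from a cognitive task analysis of antibody identification:

```python
from typing import Callable, Optional

Check = Callable[[dict], Optional[str]]

def commission(sol: dict) -> Optional[str]:
    # 2a: an action was performed incorrectly (e.g., a wrong rule-out)
    return "Error of commission detected." if sol.get("wrong_actions") else None

def omission(sol: dict) -> Optional[str]:
    # 2b: a required action was skipped
    return "Error of omission detected." if sol.get("skipped_actions") else None

def convergence(sol: dict) -> Optional[str]:
    # 2c: the answer is not supported by independently converging strategies
    return "Answer is not based on converging evidence." if sol.get("strategies_agreeing", 0) < 2 else None

def consistency(sol: dict) -> Optional[str]:
    # 2d: the answer does not account for all of the available data
    return "Answer is inconsistent with the available data." if not sol.get("accounts_for_data", True) else None

def plausibility(sol: dict) -> Optional[str]:
    # 2e: the answer is implausible given prior probabilities
    return "Answer is implausible given prior probabilities." if not sol.get("plausible", True) else None

KNOWLEDGE_BASE: list[Check] = [commission, omission, convergence, consistency, plausibility]

def critique(solution: dict) -> list[str]:
    """Run every check and collect the critique messages that apply."""
    return [msg for msg in (check(solution) for check in KNOWLEDGE_BASE) if msg]

# An answer confirmed by only one strategy and implausible a priori
# draws two critiques:
assert len(critique({"strategies_agreeing": 1, "plausible": False})) == 2
```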

This study goes well beyond previous analyses of critiquing systems for a number of

reasons:

1) A complete, usable system with a direct manipulation interface was developed, which

allowed practitioners to work normally and provided detailed information to the critiquing

system about the task context and the user's problem-solving behavior.

2) The critiquing system contains error-checking knowledge that is based on a detailed

cognitive task analysis of the strategies and kinds of errors made by practitioners.

3) The system was tested with certified practitioners on realistic, difficult patient cases. Both

highly experienced and less experienced subject populations were studied and it was shown

that the critiquing system did not interfere with the former group and significantly helped

the latter group.

4) A behavioral protocol analysis showed how and why the critiquing system was helpful,

pointing to error checking mechanisms that were particularly beneficial (such as the "end-

checkers" that check the answer for plausibility and check for errors in the person's overall

protocol that led to the answer). This protocol analysis also led to the conclusion that both

the initial training with the checklist and the presence of the critiquing system contributed

to improvements in performance.

5) The protocol analysis uncovered areas for further study, such as the issue of when to

display a critiquing message so that it is in context but not interrupting the user's thought

processes.

6) The experimental design allowed us to make both within-subjects and between-subjects

comparisons of performance, showing that the critiquing system eliminated misdiagnosis


rates on three difficult cases and also reduced misdiagnosis rates on a fourth case for which

the system's knowledge base was not fully competent.

7) We had evidence from the questionnaire that subjects enjoyed using the system and would

find it useful on the job.

Finally, it should be noted that the data collection technique used for this study was

quite successful for examining behaviors and strategies employed by subjects. The computer

automatically generated a time-stamped behavioral protocol log, based on the actions

performed when using the computer. The computer also generated an error report, logging all

of the types of errors made by each subject on each case. These data logs were relatively easy to

analyze for evidence of different kinds of problem-solving behaviors and could be re-analyzed

as new important characteristics were identified. Furthermore, these computer-generated data

logs and error reports will allow for long-term and remote data collection for future studies

when the system is introduced into widespread use.
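The data collection technique described above can be sketched as a minimal time-stamped action log with a derived error report. The class, method, and field names are illustrative assumptions:

```python
# A time-stamped behavioral protocol log plus a derived error report,
# mirroring the two data products described above.
import time

class ProtocolLog:
    def __init__(self):
        self.entries = []

    def record(self, subject, case, action, detail=""):
        """Append one time-stamped user action to the protocol log."""
        self.entries.append({
            "t": time.time(),   # time stamp, for reconstructing the protocol
            "subject": subject,
            "case": case,
            "action": action,   # e.g., "select_panel", "rule_out", "confirm"
            "detail": detail,
        })

    def error_report(self):
        """The subset of logged actions that were flagged as errors."""
        return [e for e in self.entries if e["action"] == "error"]

log = ProtocolLog()
log.record("S01", "weak_D", "select_panel", "Antibody Screen")
log.record("S01", "weak_D", "error", "rule-out omission")
assert len(log.error_report()) == 1
```

Because each entry is a plain record, such logs can be re-analyzed later as new characteristics of interest are identified, as was done in this study.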


List of References

Aikins, J., Kunz, J., & Shortliffe, E. (1983). PUFF: An expert system for interpretation of pulmonary function data. Computers and Biomedical Research, 16, 199-208.

Bernard, J. A. (1989). Applications of artificial intelligence to reactor and plant control. Nuclear Engineering and Design, 113, 219-227.

Berner, E., Webster, G., Shugerman, A., Jackson, J., Algina, J., Baker, A., Ball, E., Cobbs, C., Dennis, V., Frenkel, E., Hudson, L., Mancall, E., Rackley, C., & Taunton, O. (1994). Performance of four computer-based diagnostic systems. The New England Journal of Medicine, 330(25), 1792-1796.

Bernelot Moens, H. J. (1992). Validation of the AI/RHEUM knowledge base with data from consecutive rheumatological outpatients. Methods of Information in Medicine, 31, 175-181.

Billings, C. E. (1991). Human-Centered Aircraft Automation Philosophy: Concepts and Guidelines. NASA Ames Research Center, NASA Technical Memorandum 103885.

Brannigan, V. M. (1991). Software quality regulation under the Safe Medical Devices Act of 1990: Hospitals are now the canaries in the software mine. Proceedings of the Fifteenth Annual Symposium on Computer Applications in Medical Care, American Medical Informatics Association, November 17-20, Washington D.C. p. 238-242.

Chandrasekaran, B. (19xx). Generic tasks as building blocks for knowledge-based systems: The diagnosis and routine design examples. The Knowledge Engineering Review.

Console, L., Conto, R., Molino, G., Ripa di Meana, V., & Torasso, P. (1991). CAP: A critiquing expert system for medical education. In M. Stefanelli, A. Hasman, M. Fieschi, & J. Talmon (Ed.), AIME 91 Proceedings of the Third Conference on Artificial Intelligence in Medicine, 44 (pp. 317-327). Maastricht: Springer-Verlag.

Eberhardt, K. R., and Fligner, M. A. (1977). A comparison of two tests for equality of two proportions. American Statistician, 31(4), 151-155.

Fischer, G., Lemke, A., Mastaglio, T., and Morch, A. (1991). Critics: an emerging approach to knowledge-based human-computer interaction, International Journal of Man-Machine Studies, 35, 695-721.

Fischer, G., Lemke, A., & Mastaglio, T. (1990). Using critics to empower users. In CHI '90 Human Factors in Computing Systems Conference Proceedings (pp. 337-347). New York: Association for Computing Machinery.


François, P., Robert, C., Astruc, J., Begue, P., Borderon, J., Floret, D., Lagardere, B., Mallet, E., Pautard, J., & Demongeot, J. (1993). Comparative study of human expertise and an expert system: Application to the diagnosis of child's meningitis. Computers and Biomedical Research, 26, 383-392.

Fraser, J. M., Strohm, P., Smith, J. W. J., Galdes, D., Svirbely, J. R., Rudmann, S., Miller, T. E., Blazina, J., Kennedy, M., & Smith, P. J. (1989). Errors in abductive reasoning. Proceedings of the 1989 IEEE International Conference on Systems, Man, and Cybernetics, 1136-1141.

Fraser, J. M., Smith, P. J., & Smith, J. W. (1992). A catalog of errors. International Journal of Man-Machine Studies, 37, 265-307.

Galdes, D. (1990). An Empirical Study of Human Tutors: The Implications for Intelligent Tutoring Systems. Ph.D. Dissertation, The Ohio State University.

Gamerman, G. E. (1992). FDA regulation of biomedical software. Proceedings of the Sixteenth Annual Symposium on Computer Applications in Medical Care, American Medical Informatics Association, p. 745-749.

Giboin, A. (1988). The process of intention communication in advisory interaction. IFAC Man-Machine Systems, 365-370.

Gregory, D. (1986). Delimiting expert systems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-16(6), 834-843.

Guerlain, S., Smith, P. J., Miller, T., Gross, S., Smith, J., & Rudmann, S. (1991). A testbed for teaching problem-solving skills in an interactive learning environment. Proceedings of the Human Factors Society, 1408.

Guerlain, S. (1993a) Designing and Evaluating Computer Tools to Assist Blood Bankers in Identifying Antibodies. Master's Thesis, The Ohio State University.

Guerlain, S. (1993b). Factors influencing the cooperative problem-solving of people and computers. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 1 (pp. 387-391). Seattle, WA.

Guerlain, S., Smith, P. J., Gross, S. M., Miller, T. E., Smith, J. W., Svirbely, J. R., Rudmann, S., & Strohm, P. (1994). Critiquing vs. partial automation: How the role of the computer affects human-computer cooperative problem solving. In M. Mouloua & R. Parasuraman (Eds.), Human Performance in Automated Systems: Current Research and Trends (pp. 73-80). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Harris, S. D., & Owens, J. M. (1986). Some critical factors that limit the effectiveness of machine intelligence technology in military systems applications. Journal of Computer-Based Instruction, 13(2), 30-34.


Hickam, D., Shortliffe, E., Bischoff, M., Scott, A., & Jacobs, C. (1985). The treatment advice of a computer-based cancer chemotherapy protocol advisor. Annals of Internal Medicine, 103(6 pt 1), 928-936.

Jones, P. and Mitchell, C. (1995). Human-computer cooperative problem solving: Theory, design, and evaluation of an intelligent associate system, IEEE Transactions on Systems, Man, and Cybernetics, 25(7), 1039-1053.

Johannesen, L., Cook, R., & Woods, D. (1995). Grounding Explanations in Evolving, Diagnostic Situations (Center for Cognitive Science Technical Report No. 14). Columbus, OH: The Ohio State University.

Josephson, J. and Josephson, S. (1994). Abductive Inference: Computation, Philosophy, Technology. Cambridge: Cambridge University Press.

Langlotz, C. P., & Shortliffe, E. H. (1983). Adapting a consultation system to critique user plans. International Journal of Man-Machine Studies, 19, 479-496.

Layton, C., Smith, P. J., & McCoy, E. (1994). Design of a cooperative problem-solving system for enroute flight planning: An empirical evaluation. Human Factors, 36(1), 94-119.

Lehner, P. E., & Zirk, D. A. (1987). Cognitive factors in user/expert-system interaction. Human Factors, 29(1), 97-109.

Lepage, E. F., Gardner, R. M., Laub, R. M., & Golubjatnikov, O. K. (1992). Improving blood transfusion practice: Role of a computerized hospital information system. Transfusion, 32, 253-259.

Linnarsson, R. (1993). Decision support for drug prescription integrated with computer-based patient records in primary care. Medical Informatics, 18(2), 131-142.

Malin, J., Schreckenghost, D., Woods, D., Potter, S., Johannesen, L., Holloway, M., and Forbus, K. (1991). Making Intelligent Systems Team Players: Case Studies and Design Issues. Volume 1: Human-Computer Interaction Design, NASA Technical Memorandum 104738, Houston, TX: NASA Johnson Space Center.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Miller, P. (1986). Expert Critiquing Systems: Practice-Based Medical Consultation by Computer. New York: Springer-Verlag.

Miller, R., & Masarie, F. (1990). The demise of the "Greek Oracle" model for medical diagnostic systems. Methods of Information in Medicine, 29, 1-2.

Miller, R. A. (1984). INTERNIST-1/CADUCEUS: Problems facing expert consultant programs. Methods of Information in Medicine, 23, 9-14.


Miller, R. A., Pople, H. E., Jr., & Myers, J. D. (1982). INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine. New England Journal of Medicine, 307, 468-476.

Muir, B. (1987). Trust between humans and machines, and the design of decision aids. International Journal of Man-Machine Studies, 27, 527-539.

Nelson, S. J., Blois, M. S., Tuttle, M. S., Erlbaum, M., Harrison, P., Kim, H., Winkelmann, B., & Yamashita, D. (1985). Evaluating RECONSIDER, a computer program for diagnostic prompting. Journal of Medical Systems, 9(5/6), 379-388.

Newell, A., & Simon, H. A. (1972). Human Problem Solving. Englewood Cliffs, N. J.: Prentice-Hall Inc.

Norman, D. (1981). Categorization of action slips. Psychological Review, 88, 1-15.

Norman, D. (1988). The Psychology of Everyday Things. New York: Basic Books.

Norman, D. A. (1990). The 'problem' with automation: Inappropriate feedback and interaction, not 'overautomation'. In D. E. Broadbent, A. Baddeley, & J. J. Reason (Eds.), Human Factors in Hazardous Situations (pp. 569-576). Oxford, England: Clarendon Press.

Parasuraman, R., Molloy, R., & Singh, I. (1993). Performance consequences of automation-induced "complacency". International Journal of Aviation Psychology, 3(1), 1-23.

Parasuraman, R., Mouloua, M., & Molloy, R. (1994). Monitoring automation failures in human-machine systems. In M. Mouloua & R. Parasuraman (Eds.), Human Performance in Automated Systems: Current Research and Trends (pp. 45-49). Hillsdale, NJ: Lawrence Erlbaum Associates.

Pea, R. (1985). Beyond amplification: Using the computer to reorganize mental functioning. Educational Psychologist, 20(4), 167-182.

Plugge, L., Verhey, F., & Jolles, J. (1990). A desktop expert system for the differential diagnosis of dementia. International Journal of Technology Assessment in Health Care, 6, 147-156.

Pryor, T. A. (1994). Development of decision support systems. In M. Shabot and R. Gardner (Eds.) Decision Support Systems in Critical Care (pp. 61-73). New York: Springer-Verlag.

Pryor, T. A. (1983). The HELP system. Journal of Medical Systems, 7, 87-101.

Rasmussen, J. (1983). Skills, rules, and knowledge: Signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, SMC-13(3), 257-266.

Roth, E., Bennett, K., and Woods, D. (1988). Human interaction with an "intelligent" machine. In Cognitive Engineering in Complex Dynamic Worlds (pp. 23-69). London: Academic Press.


Rothschild, M. A., Swett, H. A., Fisher, P. R., Weltin, G. G., & Miller, P. L. (1990). Exploring subjective vs. objective issues in the validation of computer-based critiquing advice. Computer Methods and Programs in Biomedicine, 31, 11-18.

Rudmann, S., Miller, T., Smith, P.J., and Smith, J. W. (in press). Problem-based education for immunohematologists. Clinical Laboratory Science.

Sarter, N. B., & Woods, D. D. (1994). Decomposing automation: Autonomy, authority, observability and perceived animacy. In M. Mouloua & R. Parasuraman (Eds.), Human Performance in Automated Systems: Current Research and Trends (pp. 22-27). Hillsdale, NJ: Lawrence Erlbaum Associates.

Sassen, A., Buiël, E., & Hoegee, J. (1994). A laboratory evaluation of a human operator support system. International Journal of Human-Computer Studies, 40, 895-931.

Schewe, S., Scherrmann, W., & Gierl, L. (1988). Evaluation and measuring of benefit of an expert system for differential diagnosis in rheumatology. Expert Systems and Decision Support in Medicine, 351-354.

Serfaty, D. and Entin, E. (1995). Shared mental models and adaptive team coordination. Proceedings of the First International Symposium on Command and Control Research and Technology, National Defense University: Washington, D.C.

Shamsolmaali, A., Collinson, P., Gray, T., Carson, E., & Cramp, D. (1989). Implementation and evaluation of a knowledge-based system for the interpretation of laboratory data. In AIME '89, (pp. 167-176).

Shortliffe, E. H. (1976). Computer-Based Medical Consultations: MYCIN. New York: Elsevier.

Shortliffe, E. H., Scott, A. C., Bischoff, M., Campbell, A. B., van Melle, W. and Jacobs, C. (1981). ONCOCIN: An expert system for oncology protocol management. In Proceedings of the seventh International Joint Conference on Artificial Intelligence, Vancouver, British Columbia, pp. 815-822.

Shortliffe, E. (1990). Clinical decision-support systems. In E. Shortliffe & L. Perreault (Eds.), Medical Informatics: Computer Applications in Health Care (pp. 466-500). New York: Addison-Wesley Publishing Company.

Silverman, B. (1992a). Survey of expert critiquing systems: Practical and theoretical frontiers. Communications of the ACM, 35(4), 107-127.

Silverman, B. G. (1992b). Building a better critic: Recent empirical results. IEEE Expert, April, 18-25.

Silverman, B. G. (1992c). Modeling and critiquing the confirmation bias in human reasoning. IEEE Transactions on Systems, Man, and Cybernetics, 22(5), 972-982.

Simon, H. (1969). The Sciences of the Artificial. Cambridge, MA: MIT Press.


Smith, P. J., Miller, T. E., Fraser, J., Smith, J., Svirbely, J., Rudmann, S., & Strohm, P. (1990). An intelligent tutoring system for antibody identification. Proceedings of the 14th Annual Symposium on Computer Applications in Medical Care. 1032-1033.

Smith, P.J., Miller, T.E., Gross, S., Guerlain, S., Smith, J., Svirbely, J., Galdes, D., Rudmann, S., & Strohm, P. (1991). The transfusion medicine tutor: Methods and results from the development of an interactive learning environment for teaching problem-solving skills. Proceedings of the 35th Annual Meeting of the Human Factors Society, 1166-1168.

Smith, P. J., Miller, T. E., Fraser, J., Smith, J. W., Svirbely, J. R., Rudmann, S., Strohm, P. L., & Kennedy, M. (1991). An empirical evaluation of the performance of antibody identification tasks. Transfusion, 31, 313-317.

Smith, P. J., Galdes, D., Fraser, J., Miller, T., Smith, J. W., Svirbely, J. R., Blazina, J., Kennedy, M., Rudmann, S., & Thomas, D. L. (1991b). Coping with the complexities of multiple-solution problems: A case study. International Journal of Man-Machine Studies, 35, 429-453.

Smith, P. J., Miller, T., Gross, S., Guerlain, S., Smith, J., Svirbely, J., Rudmann, S., & Strohm, P. (1992). The transfusion medicine tutor: A case study in the design of an intelligent tutoring system. In Proceedings of the 1992 Annual Meeting of the IEEE Society of Systems, Man, and Cybernetics, (pp. 515-520).

Strohm, P., Smith, P. J., Fraser, J., Smith, J. W., Rudmann, S., Miller, T., & Kennedy, M. (1991). Errors in antibody identification. Immunohematology, 7, 20-20.

Sutton, G. C. (1989). How accurate is computer-aided diagnosis? The Lancet (October 14, 1989), 905-908.

Tufte, E. R. (1990). Envisioning Information, Graphics Press, Cheshire, CT.

Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

van der Lei, J., Musen, M. A., van der Does, E., Man in 't Veld, A. J., & van Bemmel, J. H. (1991). Comparison of computer-aided and human review of general practitioners' management of hypertension. The Lancet, 338(Dec. 14, 1991), 1504-1508.

van der Lei, J., Westerman, R. F., & Boon, W. M. (1989). Evaluating expert critiques. Issues for the development of computer-based critiquing in primary care. In J. Talmon & J. Fox (Eds.), Lecture Notes in Medical Informatics, Proceedings of the Workshop "System Engineering in Medicine" Maastricht, March 16-18, 1989 (pp. 117-128). Springer-Verlag.

Verdaguer, A., Patak, A., Sancho, J., Sierra, C., & Sanz, F. (1992). Validation of the medical expert system PNEUMON-IA. Computers and Biomedical Research, 25, 511-526.


Winer, B. J. (1971). Statistical Principles in Experimental Design, Second Edition. NY: McGraw-Hill.

Wellwood, J., Johannessen, S., & Spiegelhalter, D. J. (1992). How does computer-aided diagnosis improve the management of acute abdominal pain? Annals of the Royal College of Surgeons of England, 74, 40-46.

Wickens, C. D. (1984). Engineering Psychology and Human Performance. Columbus, Ohio: Charles Merrill.

Wiener, E. L. (1989). Human Factors of Advanced Technology ("Glass Cockpit") Transport Aircraft (Technical Report No. 117528). NASA Ames Research Center.

Woods, D. D., (1984). Visual momentum: a concept to improve the cognitive coupling of person and computer. International Journal of Man-Machine Studies. 21, 229-244.

Woods, D. D. (1991). The cognitive engineering of problem representations, in Human-Computer Interaction and Complex Systems, edited by George S. Weir and James L. Alty, Academic Press Limited, London, 169-188.

Woods, D. D. (1992). Cognitive activities and aiding strategies in dynamic fault management. Cognitive Systems Engineering Laboratory Technical Report CSEL 92-TR-05, The Ohio State University.

Woods, D. D., Johannesen, L. J., Cook, R. I., & Sarter, N. B. (1994). Behind Human Error: Cognitive Systems, Computers, and Hindsight. Dayton, OH: CSERIAC.

Woods, D. D. (1994). Automation: Apparent simplicity, real complexity. In M. Mouloua & R. Parasuraman (Eds.), Human Performance in Automated Systems: Current Research and Trends (pp. 1-7). Hillsdale, NJ: Lawrence Erlbaum Associates.

Zachary, W. (1986). A cognitively based functional taxonomy of decision support techniques. In Human-Computer Interaction (pp. 25-63). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Inc.

Zhang, J., & Norman, D. (1994). Representations in distributed cognitive tasks. Cognitive Science, 18, 87-122.


Appendix A. Sample Answer Sheet

Name: __________________________

Answer Sheet

Your answer:

ABO ________

Rh ________

Alloantibodies ___________

Certainty about alloantibodies (check one):

____Unsure _____Fairly Sure _____Certain

Comments: ____________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________


Appendix B. Sample Statistical Calculations

When deciding what statistical test to run on the misdiagnosis rates recorded for this experiment, an ideal test would be one that is non-parametric (i.e., suitable for nominal data), takes into account the repeated measures aspect of the data (since the same subject solved five cases), and also takes into account performance on the Pre-Test Case (to account for any differences between the two groups of subjects before the treatment was introduced). However, no such statistical test exists. Fisher's exact test is a good statistic for testing the difference between two groups on nominal data, particularly when the expected cell frequencies are small (when a Chi Square test is not valid). Thus, this test was run to test the difference in performance between the two groups on each of the individual cases. However, it does not take into account the repeated measures aspect of the experimental design, nor does it account for performance on the Pre-Test Case. A log-linear analysis can take into account performance on a Pre-Test Case for nominal data, so this statistic was also run, and the p-values for the individual cases were combined to obtain an overall significance level across cases. A final test that could be run is a Repeated Measures Analysis of Covariance, which takes into account both performance on the Pre-Test Case and the repeated measures aspect of the design, but assumes interval (not nominal) data. The results from this test are reported as well for comparison.

In order to test for a difference in performance for nominal data in a within-subjects manner (i.e., when going from the Pre-Test Case to the Post-Test Case for the Treatment Group), a McNemar's Chi Square test must be used.

McNemar's Chi Square

Assumptions. Nominal data, dependent samples.

Method.

                        Post
                  Incorrect   Correct
Pre   Incorrect       a          b
      Correct         c          d

χ² = (|b - c| - 1)² / (b + c)
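As a check on the hand calculations in the examples that follow, the McNemar statistic can be computed with a short script (a hypothetical helper, not part of the original study materials):

```python
def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square, computed from the two
    discordant cells (b and c) of the paired Pre/Post 2x2 table."""
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)
```

With b = 2 and c = 0 (the Control Group table in Example 1) this returns 0.50, and with b = 9 and c = 0 (Error Type 3) it returns 64/9 ≈ 7.11, matching the values worked out by hand.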


Example 1. Difference in misdiagnosis rates from Pre-Test Case to Post-Test Case for the Control Group.

                        Post
                  Incorrect   Correct
Pre   Incorrect       4          2
      Correct         0          7

χ² = (|2 - 0| - 1)² / (2 + 0) = 1/2 = 0.50 -> accept Ho (3.84 required for significance at α = 0.05).

Example 2. Difference in misdiagnosis rates from Pre-Test Case to Post-Test Case for the Treatment Group.

                        Post
                  Incorrect   Correct
Pre   Incorrect       0          4
      Correct         0         11

χ² = (|4 - 0| - 1)² / (4 + 0) = 9/4 = 2.25 -> accept Ho (3.84 required for significance at α = 0.05).

Example 3. Difference in the number of Treatment subjects who made process errors from the Pre-Test Case to the Post-Test Case.

Error Type 1: Ruling Out Hypotheses Incorrectly

                        Post
                  Incorrect   Correct
Pre   Incorrect       0          8
      Correct         3          3

χ² = (|8 - 3| - 1)² / (8 + 3) = 16/11 = 1.45 -> accept Ho (3.84 required for significance at α = 0.05).

Error Type 2: Failing to Rule Out Hypotheses When Appropriate

                        Post
                  Incorrect   Correct
Pre   Incorrect       1          6
      Correct         3          4

χ² = (|6 - 3| - 1)² / (6 + 3) = 4/9 = 0.44 -> accept Ho (3.84 required for significance at α = 0.05).

Error Type 3: Failing to Collect Converging Evidence

                        Post
                  Incorrect   Correct
Pre   Incorrect       1          9
      Correct         0          4

χ² = (|9 - 0| - 1)² / (9 + 0) = 64/9 = 7.11 -> reject Ho (6.64 required for significance at α = 0.01).

Error Type 4: Implausible Answer Given Data

                        Post
                  Incorrect   Correct
Pre   Incorrect       0          4
      Correct         0         10

χ² = (|4 - 0| - 1)² / (4 + 0) = 9/4 = 2.25 -> accept Ho (3.84 required for significance at α = 0.05).

Error Type 5: Implausible Answer Given Prior Probabilities

                        Post
                  Incorrect   Correct
Pre   Incorrect       1          5
      Correct         0          8

χ² = (|5 - 0| - 1)² / (5 + 0) = 16/5 = 3.20 -> accept Ho (3.84 required for significance at α = 0.05).

Since many of these differences are "close" to significance, the p-values across the five kinds of process errors can be combined to obtain an overall significance level using the following method:

Error Type    p-value      -ln(p)
    1         0.2285279     1.48
    2         0.5051982     0.68
    3         0.0076654     4.87
    4         0.1336145     2.01
    5         0.0736382     2.61
                  Total  =  11.65


χ² = 2(11.65) = 23.30, df = 10, p < 0.01
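The combination rule used here is Fisher's method (χ² = -2 Σ ln p, with df = twice the number of p-values). A minimal sketch using only the standard library, relying on the closed-form chi-square tail probability available when df is even:

```python
import math

def fishers_combined(pvals):
    """Fisher's method for combining independent p-values:
    chi2 = -2 * sum(ln p) with df = 2 * len(pvals).
    For even df the chi-square survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, which is used here."""
    chi2 = -2.0 * sum(math.log(p) for p in pvals)
    k = len(pvals)                      # df = 2k
    half = chi2 / 2.0
    p_overall = math.exp(-half) * sum(half ** i / math.factorial(i)
                                      for i in range(k))
    return chi2, p_overall

chi2, p = fishers_combined([0.2285279, 0.5051982, 0.0076654,
                            0.1336145, 0.0736382])
# chi2 ≈ 23.30 with df = 10; p ≈ 0.0097, i.e., p < 0.01 as reported above
```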

Thus, over all types of errors, there is a significant improvement from the Pre-Test Case to the matched Post-Test Case for the Treatment Group.

Chi Square

Assumptions. Nominal data. Most expected cell frequencies must be ≥ 5. The cell entries must be independent of each other.

Method. χ² = Σ (fo - fe)² / fe, where the observed frequencies (fo) are:

                    Correct   Incorrect   Totals
Control Group          a          b        a+b
Treatment Group        c          d        c+d
Totals                a+c        b+d       N = a+b+c+d

Expected cell frequencies (fe) are calculated by multiplying the row total by the column total and dividing by N:

                    Correct         Incorrect
Control Group       (a+b)(a+c)/N    (a+b)(b+d)/N
Treatment Group     (c+d)(a+c)/N    (c+d)(b+d)/N
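The Chi Square statistic and the expected-frequency validity check can be sketched as follows (an illustrative helper, not from the original study):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table, returning the statistic and
    the expected cell frequencies so the 'most fe >= 5' rule can be checked."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    observed = [[a, b], [c, d]]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    return chi2, expected

# Pre-Test Case table from Example 1 below: all expected cells are >= 5,
# so the test is valid and chi2 is compared against 3.84 (1 d.f., alpha = 0.05)
chi2, fe = chi_square_2x2(6, 8, 11, 4)
```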

Example 1: Difference between two groups, Pre-Test Case.

                    Correct   Incorrect   Totals
Control Group          6          8         14
Treatment Group       11          4         15
Totals                17         12         29

Expected cell frequencies:

                    Correct   Incorrect
Control Group        8.21       5.79
Treatment Group      8.79       6.21

All expected cell frequencies are ≥ 5, so the Chi Square test is valid. χ² = Σ (fo - fe)²/fe = 2.78 -> accept Ho; not significant (the critical value for χ², 1 d.f., α = 0.05 is 3.84).

Example 2: Difference between two groups, Post-Test Case 1.

                    Correct   Incorrect   Totals
Control Group         10          5         15
Treatment Group       16          0         16
Totals                26          5         31

Expected cell frequencies:

                    Correct   Incorrect
Control Group       12.58       2.42
Treatment Group     13.42       2.58

Most expected cell frequencies are not ≥ 5, so the Chi Square test is not valid. Thus, Fisher's exact test is used instead.

Fisher's Exact Test

Assumptions. Nominal data. The cell entries must be independent of each other.

Method.

p = [(a+b)! * (c+d)! * (a+c)! * (b+d)!] / [a! * b! * c! * d! * N!]

where the observed frequencies (fo) are:

                    Correct   Incorrect   Totals
Control Group          a          b        a+b
Treatment Group        c          d        c+d
Totals                a+c        b+d       N = a+b+c+d
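The tail summation described below (adding the probabilities of successively more extreme tables until a cell reaches zero) can be sketched with the standard library's `math.comb` (a hypothetical helper; the direction of "more extreme" is the one used in these examples, decrementing cell d):

```python
from math import comb

def fisher_exact_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probability of the observed table and of each
    more extreme table (cell d decremented, margins held fixed)."""
    n = a + b + c + d
    p = 0.0
    while a >= 0 and d >= 0:
        # P(table) = C(a+b, a) * C(c+d, c) / C(n, a+c), which is
        # algebraically equivalent to the factorial formula above
        p += comb(a + b, a) * comb(c + d, c) / comb(n, a + c)
        a -= 1; b += 1; c += 1; d -= 1
    return p
```

For the Post-Test Case 1 table (10, 5, 16, 0), where cell d is already zero, this returns a single term of approximately .018, matching Example 1 below.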


If one of the observed cell frequencies is 0, the table itself gives the final p-value. Otherwise, the probabilities of more extreme tables must be added, as shown below in Example 2.

Example 1: Difference between two groups, Post-Test Case 1.

                    Correct   Incorrect   Totals
Control Group         10          5         15
Treatment Group       16          0         16
Totals                26          5         31

p = (15! * 16! * 26! * 5!) / (10! * 5! * 16! * 0! * 31!) = .018 < 0.05 -> reject Ho.


Example 2: Difference between two groups, Post-Test Case 2.

                    Correct   Incorrect   Totals
Control Group          8          8         16
Treatment Group       13          3         16
Totals                21         11         32

p = (16! * 16! * 21! * 11!) / (8! * 8! * 13! * 3! * 32!) = .055859

Since none of the observed cell frequencies is zero, we must calculate the p-value for a more extreme ratio and add it on, until a cell frequency of 0 is reached. Reduce cell d (3) by 1 to get a more extreme split, and recalculate the other cells to keep the row and column totals the same:

                    Correct   Incorrect   Totals
Control Group          7          9         16
Treatment Group       14          2         16
Totals                21         11         32

p = (16! * 16! * 21! * 11!) / (7! * 9! * 14! * 2! * 32!) = .0106

Continue to subtract 1 from cell d, again recalculating the other cells to keep the row and column totals the same:

                    Correct   Incorrect   Totals
Control Group          6         10         16
Treatment Group       15          1         16
Totals                21         11         32

p = (16! * 16! * 21! * 11!) / (6! * 10! * 15! * 1! * 32!) = .00099

Continue once more:

                    Correct   Incorrect   Totals
Control Group          5         11         16
Treatment Group       16          0         16
Totals                21         11         32

p = (16! * 16! * 21! * 11!) / (5! * 11! * 16! * 0! * 32!) = .000034

Add all the p-values together: p = .055859 + .0106 + .00099 + .000034 = .0675 > 0.05 -> accept Ho.

Example 3: Difference between two groups, Post-Test Case 3.

                    Correct   Incorrect   Totals
Control Group         10          6         16
Treatment Group       16          0         16
Totals                26          6         32

p = (16! * 16! * 26! * 6!) / (10! * 6! * 16! * 0! * 32!) = .0088 < 0.05 -> reject Ho.

Example 4: Difference between two groups, Post-Test Case 4.

                    Correct   Incorrect   Totals
Control Group          6         10         16
Treatment Group       16          0         16
Totals                22         10         32

p = (16! * 16! * 22! * 10!) / (6! * 10! * 16! * 0! * 32!) = .00012 < 0.05 -> reject Ho.

Log-linear Analysis

Assumptions. Nominal data; takes into account Pre-Test Case performance.

Method. Compare performance for each Post-Test Case to the Pre-Test Case for the two groups. Use a statistical package to calculate the p-value.

Case 1 - Data

                              Case 1
                         Correct   Incorrect   Total
Control Group
   Pre-Test Correct         7          0          7
   Pre-Test Incorrect       2          4          6
   Total                    9          4         13
Treatment Group
   Pre-Test Correct        11          0         11
   Pre-Test Incorrect       4          0          4
   Total                   15          0         15

Case 1 - Results

Source        df   Component χ²      p
Pre-Test       1       8.81       0.003
Treatment*     1       5.14       0.0234
Interaction    1       0.29       0.5925
Total          3      14.24
*after controlling for Pre-Test

Case 2 - Data

                              Case 2
                         Correct   Incorrect   Total
Control Group
   Pre-Test Correct         5          2          7
   Pre-Test Incorrect       2          4          6
   Total                    7          6         13
Treatment Group
   Pre-Test Correct        10          1         11
   Pre-Test Incorrect       3          1          4
   Total                   13          2         15

Case 2 - Results

Source        df   Component χ²      p
Pre-Test       1       3.38       0.0658
Treatment*     1       2.78       0.0957
Interaction    1       0.04       0.8403
Total          3       6.20
*after controlling for Pre-Test

Case 3 - Data

                              Case 3
                         Correct   Incorrect   Total
Control Group
   Pre-Test Correct         6          1          7
   Pre-Test Incorrect       2          4          6
   Total                    8          5         13
Treatment Group
   Pre-Test Correct        11          0         11
   Pre-Test Incorrect       4          0          4
   Total                   15          0         15

Case 3 - Results

Source        df   Component χ²      p
Pre-Test       1       4.99       0.0255
Treatment*     1       7.09       0.0078
Interaction    1       0.05       0.8316
Total          3      12.12
*after controlling for Pre-Test

Case 4 - Data

                              Case 4
                         Correct   Incorrect   Total
Control Group
   Pre-Test Correct         6          1          7
   Pre-Test Incorrect       0          6          6
   Total                    6          7         13
Treatment Group
   Pre-Test Correct        11          0         11
   Pre-Test Incorrect       4          0          4
   Total                   15          0         15

Case 4 - Results

Source        df   Component χ²      p
Pre-Test       1      10.11       0.0015
Treatment*     1      13.95       0.0002
Interaction    1       0.39       0.5315
Total          3      24.45
*after controlling for Pre-Test

An overall significance level can be obtained across cases by combining the p-values using the following method:

Case    p-value    -ln(p)
  1     0.0234      3.76
  2     0.0957      2.35
  3     0.0078      4.85
  4     0.0002      8.52
           Total = 19.47

χ² = 2(19.47) = 38.94, df = 8, p = 0.000005


Repeated Measures Analysis of Covariance

Assumptions. Interval data; repeated measures; can control for the Pre-Test Case as a covariate.

Method. Use a statistical package to enter subjects' performance on all of the cases.

Results. Four Post-Test Cases using the Pre-Test Case as a covariate:

Source        df     SS        MS        F       p
Pre-Test       1   2.80264   2.80264   14.53   0.0008
Group          1   2.80477   2.80477   14.54   0.0008
Error         25   4.82300   0.19292
Case           3   0.37033   0.12344    1.50   0.225
Interaction    3   0.22747   0.07582    0.92   0.420
Error         78   6.45138   0.08225

Thus, although no one test is ideal for analyzing the data gathered for this study, all of these tests show similar results: there is a statistically significant improvement in performance on Cases 1, 3, and 4 for the Treatment Group, and a marginally significant improvement in performance on Case 2. There is also an extremely large effect over all the cases, favoring the Treatment Group. Finally, performance on the Pre-Test Case is a good indicator of performance on later cases.


Appendix C. Definition of Classes of Errors Logged by the Computer

Error # Definition 1. Rule-out errors of

commission

1a. Antigen not present Rule-out of an antibody on a test cell not containing the corresponding antigen.

1b. Reactive cell Rule-out of an antibody on a test cell that was reacting. 1c. Heterozygous Rule-out of an antibody on a test cell that was heterozygous for the

corresponding antigen (where that antibody should only be ruled out on homozygous cells).

1d. No cells to rule out Rule-out (at the top of the panel) of an antibody on a panel where no cells existed for rule-out of that antibody

1e. Special panel error (RT, Cold, Prewarm, Enzyme, Eluate

Rule-out of an antibody whose reactions are weakened in that test condition

1f. Patient lacks antigen Rule-out of an antibody based on antigen typing showing that patient lacks the corresponding antigen.

1g Typing may not be valid Rule-out of an antibody on the antigen typing without first checking that the antigen typing test is valid.

1h. Test not run yet Rule-out of an antibody based on an additional cell that hasn't yet been run.

2. Rule-out errors of omission

2a. Main panels (screen & panels at Polyspecific or IgG)

Failure to complete all possible rule-outs using main panels.

2b. Special panels (RT, Cold, Prewarm, Enzyme)

Failure to complete all possible rule-outs using special panels.

2c. Antigen Typing Failure to complete all possible rule-outs based on antigen typings. 3. Incomplete protocol 3a. Unlikely Abs not

marked Failure to mark unlikely antibodies as "Unlikely".

3b. Rule-outs not completed Failure to rule-out alternative antibodies (antibodies not marked as confirmed or unlikely).

3c. Underlying Abs present Failure to rule out an antibody that is covered by the answer on all test cells.

3d. Antigen typing not done Failure to antigen type for antigen corresponding to antibodies marked as confirmed.

3e. Auto Abs not marked Failure to indicate that there are no autoantibodies present. 4. Data Implausible Given Answer

4a. No confirmed Ab on reacting cells

No answer(s) on a cell that is reacting.

4b. Confirmed Abs on non- reacting cells

A homozygous donor cell is not reacting with the hypothesized antibody

Page 154: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

140

4c. Answer has positive antigen typing

Antigen typing positive for antigen corresponding to a confirmed antibody.

5. Answer implausible given prior probabilities

5a. Low frequency Ab (f, V, Cw, Lua, Kpa, Jsa,) in answer set

A low frequency antibody was included in the answer. (This could, of course, have been an unusual, but still correct, answer. The same holds true for 4d - 4j.)

5b. Abs from multiple groups in answer set

Marking an answer from three genetic systems. This would be very unusual for most patients (unless they have had separate exposures).

5c. Hypothesis probability Marking an unlikely combination of antibodies.

5d. Antibody specific Marking an answer that violates the normal pattern of reactions for that antibody.

5e. General check Failure to either rule out or confirm an antibody that the patient could have formed (since the patient was negative for that antigen) that was (statistically) more likely to form than the answer marked confirmed.


Appendix D. Sample Behavioral Protocol Log

Subject #: E1   Intelligence: On   Degree: MT(ASCP)   Years Worked: 15   Avg Pos Screens/mo: <1

00:00:00 Loading Case: XPJR Training Case 2

00:00:00 Select Test: ABO-Rh  ABO/Rh

00:00:03 Mark ABO: O

00:00:05 Mark Rh: Pos

00:00:09 Select Test: Antibody Screen  ANTIBODY SCREEN

You could have marked at least one more antibody as Unlikely for this patient.

00:00:18 Select Test: Case History

CASE HISTORY

00:00:25 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:00:34 Hilite: cells 1,2,3,6,7

00:00:43 Mark Likely: K,E

00:00:53 Select Test: Additional Albumin-Poly AHG

You did not mark auto control before leaving a main panel.

You could have ruled out at least one more antibody using cell #'s 4, 5, 8, 9 and 10 on this panel.

You could have marked at least one more antibody as Unlikely for this patient.

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:01:15 Run Test Cells: 1,2,3,7,10 You could have ruled out at least one more antibody on this panel.

Page 156: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

142

You could have marked at least one more antibody as Unlikely for this patient.

00:01:45 Select Test: Antigen Typing

ANTIGEN TYPING

00:01:54 Run Test Cells: E,K You did not rule out the corresponding antibody of a positive antigen typing.

00:02:00 Select Test: Direct Antiglobulin Test

DAT

00:02:28 Set Auto Ctrl: Negative

00:02:29 Set Allo Anti: Positive

00:02:38 Select Test: Done with Case  You did not mark an answer using the "Confirmed" button.

Pressed Try Again/Continue Button

00:02:44 Select Test: Direct Antiglobulin Test

DAT

Student Answer: none

00:02:53 Select Test: Albumin-Poly AHG  POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:03:02 Mark Confirmed: K,E  You could have ruled out at least one more antibody using cell #'s 4, 5, 8, 9 and 10 on this panel.

You could have marked at least one more antibody as Unlikely for this patient.

00:03:08 Select Test: Done with Case

You did not rule out all of the antibodies in this case before indicating you were done with the case.

You could have marked at least one more antibody as Unlikely for this patient.

C, Fyb and Jkb are confounded homozygously by your confirmed antigens

The antigen typing for c hasn't been checked. Since the patient would be more likely to form anti-c before K, be very careful.

Student Answer: E and K
Correct Answer: E and K


00:00:00 Loading Case: XSVR Evaluation Case 1

00:00:00 Select Test: ABO-Rh  ABO/Rh

00:00:03 Mark ABO: O

00:00:05 Mark Rh: Pos

00:00:08 Select Test: Antibody Screen  ANTIBODY SCREEN

00:00:14 Mark Unlikely: f,V,Cw,Lua,Kpa,Jsa

00:00:23 Select Test: Case History

CASE HISTORY

00:00:27 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:00:35 Hilite: cells 1,2,3,4,5,7,8,9

00:00:47 Rule Out on cell 6: Cw

00:00:50 Rule Out on cell 10: Lua

00:01:07 Rule Out on cell 6: D,C,e,N,s,P1,Leb,Lub,k,Jka,Xga

00:01:27 Select Test: Additional Albumin-Poly AHG

You did not mark auto control before leaving a main panel.

Pressed Try Again/Continue Button

00:01:36 Set Auto Ctrl: Negative

00:01:50 Select Test: Additional Albumin-Poly AHG

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:02:02 Hilite Antigens: K,E,c,M

00:02:11 Run Test Cells: 1,4,6,5,8

00:02:34 Mark Likely: c,K  You could have ruled out at least one more antibody using cell #'s 4, 6 and 8 on this panel.

Pressed Try Again/Continue Button

00:03:00 Select Test: Antigen Typing

00:03:08 Rule Out on cell 4: Lea

00:03:14 Rule Out on cell 8: S

00:03:26 Rule Out on cell 6: Fya  You could have ruled out at least one more antibody on this panel.

Pressed Try Again/Continue Button


00:03:41 Select Test: Antigen Typing

00:03:47 Rule Out on cell 8: E  You ruled out anti-E using a heterozygous cell.

Undid Marking/Ruleout

00:03:56 Rule Out on cell 8: M

00:04:03 Select Test: Antigen Typing

ANTIGEN TYPING

00:04:07 Run Test Cells: c,K

00:04:13 Mark Confirmed: K,c

00:04:20 Select Test: Done with Case

You did not rule out all of the antibodies in this case before indicating you were done with the case.

Pressed Try Again/Continue Button

Fyb and Jkb are confounded homozygously by your confirmed antigens

The antigen typing for E hasn't been checked. Since the patient would be more likely to form anti-E before K, be very careful.

Student Answer: c and K

00:04:36 Select Test: Albumin-Poly AHG  POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:04:49 Select Test: Additional Albumin-Poly AHG

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:05:02 Run Test Cells: 7

00:05:15 Rule Out on cell 7: Jkb

00:05:20 Run Test Cells: 10

00:05:26 Rule Out on cell 10: Fyb

00:05:35 Select Test: Done with Case

You did not rule out all of the antibodies in this case before indicating you were done with the case.

Pressed Leave Anyway Button

The antigen typing for E hasn't been checked. Since the patient would be more likely to form anti-E before K, be very careful.

Viewed Antigen Typing

00:06:04 Select Test: Antigen Typing

ANTIGEN TYPING


Student Answer: c and K

00:06:09 Run Test Cells: E

00:06:21 Select Test: Done with Case  You did not rule out all of the antibodies in this case before indicating you were done with the case.

Pressed Try Again/Continue Button

00:06:31 Rule Out: E

00:06:38 Select Test: Done with Case

Student Answer: c and K
Correct Answer: c and K

00:00:00 Loading Case: XPJS Evaluation Case 2

00:00:00 Select Test: ABO-Rh  ABO/Rh

00:00:04 Mark ABO: O

00:00:07 Mark Rh: Neg

00:00:11 Select Test: Antibody Screen  ANTIBODY SCREEN

Be careful ruling out since there are weak reactions on this panel.

Pressed OK Button

00:00:21 Mark Unlikely: f,V,Cw,Lua,Kpa,Jsa

00:00:30 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:00:39 Hilite: cells 7,10 You did not mark auto control before leaving a main panel.

Pressed Try Again/Continue Button

00:00:52 Select Test: Antibody Screen

00:00:54 Set Auto Ctrl: Negative

00:01:01 Select Test: Antibody Screen  ANTIBODY SCREEN

00:01:08 Select Test: Cold 4°C  COLD (4°C)

00:01:21 Select Test: Enzyme  You could have ruled out at least one more antibody using cell #'s 1, 2, 3 and 4 on this panel.

Pressed Try Again/Continue Button

00:02:14 Rule Out on cell 2: N

00:02:16 Rule Out on cell 3: s  You used 4 degrees C test condition to rule out anti-s.

Undid Marking/Ruleout


00:02:23 Rule Out on cell 3: P1

00:02:28 Rule Out on cell 4: M

00:02:34 Select Test: Enzyme You could have ruled out at least one more antibody using cell #'s 1, 3 and 4 on this panel.

Pressed Try Again/Continue Button

00:02:43 Rule Out on cell 1: Lub  You used 4 degrees C test condition to rule out anti-Lub.

Undid Marking/Ruleout

00:03:02 Rule Out on cell 1: Leb

00:03:19 Select Test: Enzyme You could have ruled out at least one more antibody using cell #'s 3 and 4 on this panel.

Pressed Try Again/Continue Button

00:03:47 Rule Out on cell 4: Lea

00:03:53 Select Test: Enzyme You could have ruled out at least one more antibody on this panel.

Pressed Try Again/Continue Button

00:04:01 Rule Out on cell 3: Lua

00:04:06 Select Test: Enzyme ENZYME: FICIN TREATED CELLS 37°C, IGG AHG

00:04:18 Hilite: cells 10,9,8,7,6

00:04:24 Hilite Antigens: S,s,Fya,Fyb,Xga

00:04:37 Rule Out on cell 1: c,e,f,Lub

00:04:46 Rule Out on cell 2: k,Jka

00:04:50 Rule Out on cell 3: Jkb

00:04:53 Rule Out on cell 1: K

00:05:02 Select Test: Additional Albumin-Poly AHG

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:05:12 Hilite Antigens: D,E

00:05:18 Hilite: cells 2,10

00:05:25 Hilite Antigens: C


00:05:29 Hilite Antigens: C

00:05:31 Hilite: cells 1

00:05:37 Hilite ** Cells on cell 4: C

00:05:41 Hilite: cells 4,8

00:06:00 Run Test Cells: 8,10

00:06:15 Rule Out on cell 8: C

00:06:16 Rule Out on cell 10: E

00:06:20 Rule Out on cell 8: S

00:06:28 Run Test Cells: 5

00:06:40 Rule Out on cell 5: s

00:06:49 Rule Out on cell 8: Fyb

00:06:50 Rule Out on cell 10: Fya

00:06:57 Mark Likely: D You could have ruled out at least one more antibody using cell #'s 5 and 8 on this panel.

Pressed Try Again/Continue Button

00:07:19 Select Test: Antigen Typing

00:07:24 Hilite: cells 5,4,2,1,10

00:07:44 Rule Out on cell 5: Jsa

00:07:46 Rule Out on cell 8: Kpa  You could have ruled out at least one more antibody on this panel.

Pressed Try Again/Continue Button

00:08:03 Select Test: Antigen Typing

00:08:07 Rule Out on cell 5: Xga

00:08:36 Select Test: Antigen Typing

ANTIGEN TYPING

00:08:46 Select Test: Done with Case

You did not mark an answer using the "Confirmed" button.

Pressed Try Again/Continue Button

00:08:51 Select Test: Antigen Typing

ANTIGEN TYPING

Student Answer: none

00:08:55 Mark Confirmed: D


00:08:58 Select Test: Done with Case

Student Answer: D
Correct Answer: D

00:00:00 Loading Case: XPJW Evaluation Case 3

00:00:00 Select Test: ABO-Rh  ABO/Rh

00:00:04 Mark ABO: B

00:00:06 Mark Rh: Pos

00:00:09 Select Test: Antibody Screen  ANTIBODY SCREEN

00:00:15 Mark Unlikely: f,V,Cw,Lua,Kpa,Jsa

00:00:23 Hilite: cells 2

00:00:27 Rule Out on cell 1: C,D,e,M,P1,Lea,Lub,K,k,Fyb

00:00:47 Select Test: Case History

CASE HISTORY

00:00:53 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:01:01 Hilite: cells 1,3,5,6,8

00:01:09 Rule Out on cell 4: c,f,V,S,Leb

00:01:21 Rule Out on cell 7: Jkb,N

00:01:27 Rule Out on cell 2: Xga

00:01:31 Set Auto Ctrl: Negative

00:01:37 Select Test: Additional Albumin-Poly AHG

You could have ruled out at least one more antibody on this panel.

Pressed Try Again/Continue Button

00:01:44 Rule Out on cell 2: Cw

00:01:49 Select Test: Additional Albumin-Poly AHG

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:02:01 Hilite Antigens: E,s,Fya,Jka

00:02:07 Run Test Cells: 1,4

00:02:15 Rule Out on cell 1: Jka

00:02:19 Rule Out on cell 4: s

00:02:27 Run Test Cells: 2,9,10


00:02:44 Select Test: Antigen Typing

ANTIGEN TYPING

00:02:49 Run Test Cells: E,Fya

00:02:54 Select Test: Enzyme  ENZYME: FICIN TREATED CELLS 37°C, IGG AHG

00:03:06 Hilite: cells 1,3,6

00:03:13 Mark Likely: E

00:03:28 Set Allo Anti: Positive

00:03:39 Select Test: Case History  CASE HISTORY

00:03:55 Select Test: Antibody Screen

ANTIBODY SCREEN

00:04:01 Mark Confirmed: E

00:04:03 Mark Likely: Fya

00:04:10 Select Test: Done with Case  You did not rule out all of the antibodies in this case before indicating you were done with the case.

Pressed Leave Anyway Button

The antigen you confirmed was not present on reacting cell #'s 5 and 8 of the Main Albumin-Poly AHG panel.

Viewed Main Panel

The antigen you confirmed was not present on reacting cell #2 of the Additional Cells Albumin-Poly AHG panel.

The antigen typing for c hasn't been checked. Since the patient would be more likely to form anti-c before E, be very careful.

Student Answer: E

00:04:30 Select Test: Albumin-Poly AHG  POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:04:46 Mark Confirmed: Fya

00:04:52 Select Test: Done with Case

Student Answer: E and Fya

Correct Answer: E and Fya


00:00:00 Loading Case: XKL Evaluation Case 4

00:00:00 Select Test: ABO-Rh  ABO/Rh

00:00:03 Mark ABO: O

00:00:06 Mark Rh: Pos

00:00:09 Select Test: Antibody Screen  ANTIBODY SCREEN

00:00:15 Mark Unlikely: f,V,Cw,Lua,Kpa,Jsa

00:00:24 Select Test: Case History

CASE HISTORY

00:00:28 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:00:37 Set Auto Ctrl: Negative

00:00:46 Select Test: Enzyme  ENZYME: FICIN TREATED CELLS 37°C, IGG AHG

00:00:56 Hilite: cells 3,6

00:00:58 Hilite Antigens: M,N,S,s,Fya,Fyb,Xga

00:01:04 Rule Out on cell 1: D  You ruled out incorrectly anti-D using cell #1 (The D antigen is present on that cell).

Undid Marking/Ruleout

00:01:08 Rule Out on cell 1: C  You ruled out incorrectly anti-C using cell #1 (The C antigen is present on that cell).

Undid Marking/Ruleout

00:01:12 Rule Out on cell 1: P1  You ruled out incorrectly anti-P1 using cell #1 (The P1 antigen is present on that cell).

Undid Marking/Ruleout

00:01:24 Select Test: Prewarm PREWARM TECHNIQUE IGG AHG

00:01:37 Select Test: Additional Eluate

ADDITIONAL CELLS: ELUATE: ORGANIC SOLVENT IGG AHG

00:01:48 Run Test Cells: 1,2,3,4,5,6,7,8,9,10,11,12

00:02:04 Select Test: Direct Antiglobulin Test

DAT

00:02:07 Set Allo Anti: Negative

00:02:11 Select Test: Done with Case


You did not confirm an antigen but cell #'s 1 and 2 of the Antibody Screen panel were reacting.

Viewed Screen Cells

00:02:27 Select Test: Antibody Screen

ANTIBODY SCREEN

You did not confirm an antigen but cell #'s 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 of the Main Albumin-Poly AHG panel were reacting.

Student Answer: none

00:02:47 Select Test: Cold 4°C  COLD (4°C)

00:03:01 Select Test: Direct Antiglobulin Test

DAT

00:03:05 Select Test: Case History

CASE HISTORY

00:03:11 Select Test: Additional Albumin-Poly AHG

ADDITIONAL CELLS: POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:03:20 Run Test Cells: 1,2,3,4,5,6,7,8,9,11,10,12

00:03:38 Rule Out on cell 1: D,C,e,Cw,S,P1,Lub,K,Fyb,Jka

00:04:00 Rule Out on cell 3: N,Lea,k,Xga

00:04:10 Rule Out on cell 5: s,Leb,Jsa

00:04:23 Select Test: Antibody Screen

ANTIBODY SCREEN

00:04:33 Select Test: Antigen Typing

ANTIGEN TYPING

00:04:38 Run Test Cells: E,c,M,Fya,Jkb

00:04:46 Rule Out: M,Fya

00:04:50 Mark Likely: E,c,Jkb

00:04:57 Select Test: Case History  CASE HISTORY

00:05:06 Select Test: Albumin-Poly AHG

POLYSPECIFIC AHG IS, 37° ALBUMIN, AHG

00:05:16 Mark Confirmed: E

00:05:19 Hilite: cells 3,6

00:05:24 Select Test: Cold 4°C  COLD (4°C)


00:05:37 Select Test: Enzyme ENZYME: FICIN TREATED CELLS 37°C, IGG AHG

00:05:49 Hilite Antigens: Jkb

00:05:50 Mark Confirmed: Jkb,c

00:06:00 Select Test: Done with Case

Student Answer: E, c and Jkb

Correct Answer: E, c and Jkb


Appendix E. Sample Error Log

Subject #: E1 (Treatment)   Degree: MT(ASCP)   Years Worked: 15   Avg Pos Screens/mo: <1

Case: XPJR  XSVR  XPJS  XPJW  XKL

ABO ERRORS
Did Not Mark ABO Interp: - - - - -
ABO Incorrect: - - - - -
Rh Incorrect: - - - - -
Unlikely Not Marked: 4 - - - -
Auto Control Not Marked: 1 1 1 - -
Auto Control Incorrect: - - - - -

PANEL R/O ERRORS OF COMMISSION
Antigen Not Present
  Screen: - - - - -
  Poly (IgG): - - - - -
  Add'l Poly (IgG): - - - - -
  Other Panel: - - - - -
Reactive Cell Error
  Screen: - - - - -
  Poly (IgG): - - - - -
  Add'l Poly (IgG): - - - - -
  Other Panel: - - - - D,C,P1
Heterozygous Cell Error
  Screen: - - - - -
  Poly (IgG): - - - - -
  Add'l Poly (IgG): - E - - -
  Other Panel: - - - - -
Test Not Run Error
  Additional Cells: - - - - -
No Non-Reactive Cells Error
  Screen: - - - - -
  Poly (IgG): - - - - -


  Add'l Poly (IgG): - - - - -
  Other Panel: - - - - -

No Cells To Rule Out With Error
  Screen: - - - - -
  Poly (IgG): - - - - -
  Add'l Poly (IgG): - - - - -
  Other Panel: - - - - -

Panel Type Error
  Enzyme: - - - - -
  Cold: - - s,Lub - -
  Room Temp: - - - - -
  Prewarm: - - - - -
  Eluate: - - - - -

TYPING R/O ERRORS OF COMMISSION

Antigen Not Present On Patient: - - - - -
Typing May Not Be Valid: - - - - -
Test Not Run: - - - - -

R/O ERRORS OF OMISSION
Screen: - - - - -
Poly (IgG): 2 - - 1 -
Add'l Poly (IgG): 1 2 2 - -
Enzyme: - - - - -
Cold: - - 4 - -
Room Temp: - - - - -
Incr. Serum/Cell: - - - - -
Prewarm: - - - - -
Other Panel: - - - - -
Antigen Typing: 1 1 - - -
Leave Anyway: - 1 - 1 -
R/O Right Answer: - - - - -
Heeded Weak Reaction Warning: - - - - -


Correct Answer: [OPos/E,K]  [OPos/c,K]  [ONeg/D]  [BPos/E,Fya]  [OPos/E,c,Jkb]

Student Answer(s):
  XPJR: [OPos/-], [OPos/E,K]
  XSVR: [OPos/c,K], [OPos/c,K], [OPos/c,K]
  XPJS: [ONeg/-], [ONeg/D]
  XPJW: [BPos/E], [BPos/E,Fya]
  XKL: [OPos/none], [OPos/E,c,Jkb]

Case Time: 00:03:18 00:06:47 00:09:08 00:05:00 00:06:08
No Answer Marked: 1 - 1 - -
ABO Not Marked: - - - - -
Did Not Complete R/Os: 1 3 - 1 -
Unlikelies Not Marked: 1 - - - -
Auto Control Not Marked: - - - - -
Typing Not Done: - - - - -
Answer Has Positive Typing: - - - - -
No Confirmed On Reacting: - - - 2 2
Confirmed On Non-Reacting: - - - - -
Confounding: 1 1 - - -
Confirmed Unlikely: - - - - -
Multiple Group: - - - - -
Hypothesis Probability: - - - - -
Antibody Specific: - - - - -
General Check: 1 2 - 1 -


Appendix F. Number of Mistakes and Slips Made on Each Case By Each Subject in the Control Group (n = 16)

(Mistakes are shown first; slips are shown in parentheses.)

Error: 1. Rule-out errors of commission
Case: Pretest, 1, 2, 3, 4, Subject Totals

1a. Antigen not present (Subjects 1-16):

33 (0) 22 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0)

31 (0) 16 (2)

0 (0) 0 (2)

18 (0) 13 (0)

5 (0) 0 (6)

0 (0) 0 (0) 0 (0) 9 (0) 0 (0) 0 (0)

20 (1) 21 (1)

0 (0) 0 (0) 0 (0)

10 (0) 16 (0)

0 (0) 0 (1) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1)

15 (0) 0 (0) 0 (0) 0 (0) 0 (0)

8 (0) 19 (0) 11 (0)

2 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

30 (0) 10 (0)

0 (0) 0 (1) 0 (0)

16 (6) 23 (0)

6 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

41 (0) 25 (0)

0 (0) 0 (0) 0 (0)

85 (6) 93 (0) 22 (0)

2 (7) 0 (0) 0 (0) 0 (0) 0 (0) 9 (0) 0 (0) 0 (1)

137 (1) 72 (3)

0 (0) 0 (3) 0 (0)

Total: 102 (4) 77 (8) 41 (2) 80 (1) 111 (6) 411 (21)

1b. Reactive cell (Subjects 1-16):

11 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0)

18 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 4 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 0 (0) 0 (0)

11 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 4 (0) 0 (0) 0 (0)

20 (1) 0 (0) 0 (0) 0 (0) 0 (0)

Total: 29 (0) 4 (0) 0 (0) 3 (1) 0 (1) 36 (2)


1c. Heterozygous (Subjects 1-16):

8 (0) 0 (0) 0 (0) 4 (0) 0 (1) 4 (0) 0 (0) 8 (0)

0 (0) 1 (0) 0 (0) 4 (0) 6 (0) 2 (2)

2 (0) 0 (0) 0 (0) 5 (0)

2 (0) 0 (0) 6 (0) 0 (0) 0 (0) 0 (0) 1 (0) 3 (0) 6 (0) 1 (1) 1 (0)

9 (0) 0 (0) 0 (0) 4 (2) 0 (0) 7 (0) 0 (0) 6 (0) 2 (0) 0 (0) 0 (1) 5 (1) 6 (0) 6 (0) 2 (0) 0 (0)

4 (0) 0 (0) 0 (0) 2 (0) 0 (0) 2 (0) 0 (0) 8 (0) 0 (0) 0 (0) 3 (0) 8 (0) 4 (0)

4 (0) 2 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 0 (0) 2 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 1 (0) 0 (0)

23 (0) 0 (0) 0 (0)

15 (2) 0 (1)

17 (0) 0 (0)

30 (0) 2 (0) 0 (0) 4 (1)

14 (1) 17 (0) 24 (0)

8 (3) 1 (0)

Total: 37 (3) 27 (1) 47 (4) 37 (0) 7 (0) 155 (8)

1d. No cells with which to rule out (Subjects 1-16):

1 (0) 0 (0) 4 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

21 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 4 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 4 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 8 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 4 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0)

37 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

Total: 27 (0) 4 (0) 4 (0) 8 (0) 1 (0) 44 (0)


1e. Special panel error (RT, Cold, Prewarm, Enzyme, Eluate) (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

3 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 0 (0) 0 (0) 0 (0) 0 (0)

12 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 7 (0) 0 (0)

4 (0) 0 (0) 0 (0) 3 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0)

6 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 5 (0) 0 (0) 0 (0) 0 (0) 0 (0) 6 (0)

25 (0) 0 (0) 0 (0) 5 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 5 (0) 2 (0) 0 (0) 0 (0) 7 (0) 6 (0)

Total: 0 (0) 6 (0) 20 (0) 7 (0) 17 (0) 50 (0)

1f. Patient lacks antigen (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (13) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (14) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

Total: 0 (0) 0 (13) 0 (0) 0 (0) 0 (1) 0 (14)


1g. Typing may not be valid (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0)

0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

Total: 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0)

1h. Test not run yet (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (3)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (4) 0 (1)

Total: 0 (3) 0 (1) 0 (0) 0 (1) 0 (0) 0 (5)

2. Rule-out errors of omission


2a. Main panels (screen & panels at Polyspecific or IgG) (Subjects 1-16):

0 (0) 2 (0) 8 (0) 8 (0) 0 (1) 0 (0) 2 (0) 0 (0)

4 (0) 0 (0) 1 (0) 0 (0) 0 (1) 0 (0)

0 (0) 5 (0) 2 (0) 1 (1)

0 (1) 3 (0) 0 (0) 3 (0) 4 (0) 0 (7) 0 (0) 1 (0) 0 (0) 0 (0) 0 (2)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 3 (0) 5 (0) 2 (0) 3 (0) 0 (1) 5 (0) 0 (0) 5 (0) 1 (0) 1 (1) 0 (0) 0 (0) 0 (0) 1 (0) 0 (1)

0 (0) 2 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 12 (0) 16 (0) 11 (1)

3 (1) 0 (2)

11 (0) 1 (0) 8 (0)

10 (0) 1 (8) 1 (0) 1 (0) 0 (1) 1 (0) 0 (3)

Total: 25 (2) 19 (11) 0 (0) 26 (3) 6 (0) 76 (16)

2b. Special panels (RT, Cold, Prewarm, Enzyme) (Subjects 1-16):

0 (0) 0 (0) 1 (0) 4 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (4) 2 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0)

0 (0) 0 (0) 2 (2) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 2 (0) 0 (0) 0 (0) 1 (1) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

0 (0) 1 (0) 1 (0) 5 (0) 1 (0) 0 (0) 1 (0) 2 (0) 0 (0) 0 (0) 3 (7) 2 (0) 0 (0) 0 (0) 1 (0) 0 (0)

Total: 5 (0) 3 (4) 2 (2) 1 (0) 6 (1) 17 (7)

2c. Antigen Typing (Subjects 1-16):

0 (0) 0 (0) 3 (0) 6 (0) 0 (0) 0 (0) 1 (0) 0 (0)

3 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0)

0 (0) 2 (0) 1 (0) 0 (0)

0 (0) 1 (0) 0 (0) 2 (0) 3 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 1 (0)

0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 6 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 2 (0) 0 (0) 1 (0) 0 (0) 2 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 2 (0) 1 (0) 1 (0) 5 (0) 0 (0) 2 (0) 2 (0) 1 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0) 0 (0)

0 (0) 4 (0) 8 (0) 7 (0) 6 (0) 0 (0)

12 (0) 2 (0) 4 (0) 6 (0) 0 (0) 1 (0) 1 (0) 1 (0) 2 (0) 1 (0)

Total: 14 (0) 11 (0) 9 (0) 5 (0) 16 (0) 58 (0)


3. Incomplete protocol

3a. Unlikely ab's not marked (Subjects 1-16):

0 (0) 0 (0) 1 (0) 0 (0) 1 (0) 1 (0) 1 (0) 1 (0)

0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 1 (0)

0 (0) 1 (0) 1 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 1 (0) 0 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 3 (0) 3 (0) 4 (0) 1 (0) 5 (0) 4 (0) 3 (0) 1 (0) 5 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

Total: 6 (0) 6 (0) 8 (0) 4 (0) 6 (0) 30 (0)

3b. Rule-outs not completed (Subjects 1-16):

0 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0)

1 (0) 0 (0) 1 (0) 0 (0) 1 (0) 1 (0)

0 (0) 1 (0) 1 (0) 1 (0)

0 (0) 1 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

1 (0) 0 (0) 1 (0) 1 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0)

0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 1 (0)

1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 2 (0) 5 (0) 5 (0) 4 (0) 1 (0) 5 (0) 0 (0) 4 (0) 3 (0) 1 (0) 1 (0) 1 (0) 1 (0) 1 (0) 1 (0)

Total: 8 (0) 7 (0) 8 (0) 5 (0) 8 (0) 36 (0)

3c. Underlying ab's present (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 1 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 1 (0) 0 (0)

0 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 1 (0) 0 (0) 4 (0) 0 (0) 2 (0) 0 (0) 3 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)


Total: 2 (0) 5 (0) 1 (0) 2 (0) 2 (0) 12 (0)

3d. Antigen typing not done (Subjects 1-16):

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 1 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0)

5 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 4 (0) 3 (0) 0 (0) 1 (0) 1 (0)

Total: 3 (0) 4 (0) 1 (0) 5 (0) 2 (0) 15 (0)

3e. Auto ab's not marked (Subjects 1-16):

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0)

0 (0) 0 (0) 1 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (1) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 2 (0) 0 (0) 0 (0) 0 (2) 0 (0) 0 (0) 2 (0) 1 (0) 0 (0) 0 (0)

Total: 2 (0) 1 (0) 1 (1) 3 (1) 1 (0) 8 (2) 4. Data Implausible Given Answer

Page 177: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

163

4a. No confirmed Ab on 1

reacting cells 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16

1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 2 (0) 0 (0)

0 (0) 0 (0) 1 (0) 2 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 2 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0)

2 (0) 3 (0) 0 (0) 0 (0) 0 (0) 0 (0) 3 (0) 0 (0) 1 (0) 0 (0) 0 (0) 2 (0) 3 (0) 2 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 2 (0) 0 (0) 2 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 2 (0) 0 (0) 0 (0) 2 (0)

4 (0) 6 (0) 0 (0) 4 (0) 0 (0) 0 (0) 8 (0) 0 (0) 2 (0) 0 (0) 1 (0) 4 (0) 8 (0) 2 (0) 0 (0) 2 (0)

Total: 8 (0) 6 (0) 16 (0) 1 (0) 10 (0) 41 (0) 4b. Confirmed Abs on non- 1 reacting cells 2

3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

Total: 2 (0) 1 (0) 0 (0) 0 (0) 0 (0) 3 (0) 4b. Answer has positive

1 antigen typing 2

3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

Page 178: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

164

Total: 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 2 (0) 5. Answer implausible given prior probabilities

5a. Low frequency Ab (f, V, 1

Cw, Lua, Kpa, Jsa) in 2 answer set 3

4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 2 (0) 0 (0)

Total: 1 (0) 0 (0) 2 (0) 0 (0) 1 (0) 4 (0) 5b. Abs from multiple

1 groups in answer set 2

3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0)

Total: 1 (0) 0 (0) 0 (0) 0 (0) 1 (0) 2 (0)

Page 179: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

165

5c. Hypothesis Probability 1 2

3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 0 (0) 0 (0) 1 (0)

Total: 0 (0) 0 (0) 1 (0) 0 (0) 2 (0) 3 (0) 5d. Antibody Specific

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 1 (0)

2 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 2 (0) 1 (0) 0 (0) 2 (0) 2 (0) 0 (0) 0 (0) 2 (0)

Total: 1 (0) 1 (0) 4 (0) 0 (0) 5 (0) 11 (0) 5e. General Check

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16

0 (0) 0 (0) 1 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 1 (0) 1 (0)

0 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0)

0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 0 (0) 1 (0) 0 (0) 1 (0) 0 (0)

1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 1 (0) 1 (0) 1 (0) 1 (0) 0 (0) 1 (0) 0 (0) 0 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)

0 (0) 2 (0) 4 (0) 3 (0) 3 (0) 0 (0) 3 (0) 0 (0) 3 (0) 1 (0) 1 (0) 1 (0) 0 (0) 0 (0) 0 (0) 1 (0)

Page 180: Critiquing As A Design Strategy For Engineering Successful ...bart.sys.virginia.edu/hci/papers/SGDissertation.pdf · Critiquing As A Design Strategy For Engineering Successful Cooperative

166

Total: 4 (0) 6 (0) 1 (0) 4 (0) 7 (0) 22 (0)


Appendix G. Number of Mistakes and Slips Made on Each Case By Each Subject in the Treatment Group (n = 16)

(Mistakes are shown first, slips are shown in parentheses. Per-subject entries could not be recovered from the source; only the per-case totals over all 16 subjects are shown. Columns are the Pre-Test Case, Post-Test Cases 1-4, and the row total.)

Error                                            Pretest  Case 1  Case 2  Case 3  Case 4  Total

1. Rule-out errors of commission
1a. Antigen not present                          6 (2)    0 (6)   0 (3)   1 (2)   0 (1)   7 (14)
1b. Reactive cell                                0 (3)    3 (2)   0 (2)   1 (1)   0 (5)   4 (13)
1c. Heterozygous                                 45 (0)   1 (6)   0 (8)   0 (6)   0 (3)   46 (23)
1d. No cells with which to rule out              0 (2)    1 (0)   1 (0)   0 (0)   0 (0)   2 (2)
1e. Special panel error (RT, Cold, Prewarm,
    Enzyme, Eluate)                              0 (0)    1 (0)   9 (0)   1 (0)   15 (3)  26 (3)
1f. Patient lacks antigen                        0 (0)    0 (0)   0 (0)   0 (0)   0 (0)   0 (0)
1g. Typing may not be valid                      0 (0)    0 (0)   0 (0)   0 (0)   0 (0)   0 (0)
1h. Test not run yet                             0 (0)    0 (0)   0 (0)   0 (1)   0 (0)   0 (1)

2. Rule-out errors of omission
2a. Main panels (screen & panels at
    Polyspecific or IgG)                         11 (5)   5 (13)  1 (3)   4 (12)  0 (4)   21 (37)
2b. Special panels (RT, Cold, Prewarm, Enzyme)   2 (0)    1 (0)   6 (2)   0 (0)   5 (5)   14 (7)
2c. Antigen Typing                               7 (0)    2 (2)   0 (0)   4 (0)   0 (0)   13 (2)

3. Incomplete protocol
3a. Unlikely ab's not marked                     9 (0)    0 (0)   0 (0)   0 (0)   0 (0)   9 (0)
3b. Rule-out's not completed                     8 (0)    3 (0)   9 (0)   5 (0)   6 (0)   31 (0)
3c. Underlying ab's present                      4 (0)    1 (0)   1 (0)   3 (0)   0 (0)   9 (0)
3d. Antigen typing not done                      3 (0)    0 (0)   0 (0)   0 (0)   0 (0)   3 (0)
3e. Auto ab's not marked                         3 (0)    0 (0)   0 (0)   0 (0)   0 (0)   3 (0)

4. Data Implausible Given Answer
4a. No confirmed Ab on reacting cells            7 (0)    0 (0)   17 (0)  2 (0)   15 (0)  41 (0)
4b. Confirmed Abs on non-reacting cells          2 (0)    0 (0)   0 (0)   0 (0)   0 (0)   2 (0)
4b. Answer has positive antigen typing           0 (0)    0 (0)   0 (0)   2 (0)   0 (0)   2 (0)

5. Answer implausible given prior probabilities
5a. Low frequency Ab (f, V, Cw, Lua, Kpa, Jsa)
    in answer set                                0 (0)    0 (0)   0 (0)   0 (0)   0 (0)   0 (0)
5b. Abs from multiple groups in answer set       1 (0)    0 (0)   0 (0)   0 (0)   0 (0)   1 (0)
5c. Hypothesis Probability                       0 (0)    0 (0)   2 (0)   0 (0)   1 (0)   3 (0)
5d. Antibody Specific                            1 (0)    0 (0)   11 (0)  0 (0)   1 (0)   13 (0)
5e. General Check                                5 (0)    2 (0)   2 (0)   5 (0)   7 (0)   19 (0)
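As a sanity check on the transcribed totals, the sketch below re-sums a few per-case "Total:" rows and compares them with the printed grand totals. The row values are taken verbatim from the tables above; the dictionary layout and variable names are purely illustrative. Note that the per-case mistakes for 5e. General Check sum to 21 rather than the printed 19, suggesting a transcription error somewhere in that row of the source.

```python
# Each "Total:" row lists mistakes (slips) for Pretest and Cases 1-4,
# followed by a printed grand total. Re-sum the five case cells and
# compare against that printed grand total.
rows = {
    "1a. Antigen not present": [(6, 2), (0, 6), (0, 3), (1, 2), (0, 1), (7, 14)],
    "1c. Heterozygous":        [(45, 0), (1, 6), (0, 8), (0, 6), (0, 3), (46, 23)],
    "2a. Main panels":         [(11, 5), (5, 13), (1, 3), (4, 12), (0, 4), (21, 37)],
    "5e. General Check":       [(5, 0), (2, 0), (2, 0), (5, 0), (7, 0), (19, 0)],
}

for name, cells in rows.items():
    *cases, (tot_m, tot_s) = cells         # five case cells, then the printed total
    sum_m = sum(m for m, _ in cases)       # re-summed mistakes
    sum_s = sum(s for _, s in cases)       # re-summed slips
    status = "OK" if (sum_m, sum_s) == (tot_m, tot_s) else "MISMATCH"
    print(f"{name}: computed {sum_m} ({sum_s}) vs printed {tot_m} ({tot_s}) -> {status}")
```

Run as-is, the check confirms the 1a, 1c, and 2a rows and flags only 5e.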