

Interacting with Computers vol 7 no 4 (1995) 361-382

Field evaluation of a prototype laser safety decision support system

Anthony Clarke, Basil Soufi, Luise Vassie*, and John Tyrer†

A field evaluation of a decision support system prototype is described. The system is designed to aid the decision making of laser safety hazard assessors and laser manufacturers. The aims of the evaluation were to establish the usefulness and usability of the system, and to indicate where design changes might be needed. Three complementary methods, namely observation evaluation, expert evaluation, and survey evaluation, were used. Fifteen laser safety hazard assessors took part in the evaluation as subjects. Objective and subjective data were analysed and areas of user difficulty with the system were identified. The system was well received, although some pointers to modification for the eventual delivery system were identified. It is concluded that the aims of the evaluation were successfully met.

Keywords: human-computer interaction, decision support system, usability

The authors constructed a prototype computer-based laser safety support system to meet the informational needs of people concerned with laser safety, and this paper reports on the evaluation of the prototype.

Lasers are widely used and are an important part of the economic scene. However, a number of problems can arise from their use. For example, the beam from some types of laser can burn or puncture the skin or cause damage to the retina of the eye. Also, materials being processed by laser can give off toxic fumes (i.e., the degradation products). Furthermore, accurate and reliable continuous control of the combination of beam plus material plus process has yet to be achieved.

These problems give rise to issues of health and safety, not just for the people working directly with the lasers, but also for those working in the area of the installation and for the population living around industrial sites where laser activities take place.

Department of Computer Studies; *Centre for Hazard and Risk Management; †Department of Mechanical Engineering, University of Technology, Loughborough, Leicestershire LE11 3TU, UK

0953-5438/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved


A survey of laser safety in 49 companies in the UK producing lasers showed that just under half had difficulties in assessing laser safety hazards or implementing the current UK laser safety standards (Vassie et al., 1993).

There are a number of reasons for this, all of an information-related nature. Much of the information needed by safety assessors is not readily available; it is scattered across journal papers, technical books and various interim British and foreign standards. The information that is available is often very technical and not in a form readily usable by the assessors. Also, the information base is incomplete because new data, being steadily generated by research into new types of laser and the effects of lasers upon new materials, is especially difficult for assessors to obtain and to apply.

Thus there is an outstanding need to support laser safety assessors with up-to-date practical information in a readily accessible and usable form. The authors believed that a computer-based system with its ready facility in handling and presenting knowledge would meet this need, and so they constructed a prototype system.

Brief description of the potential system users

The people who use laser safety information and therefore might benefit from a computer-based laser safety system can be classified as laser manufacturers, product suppliers, laser users, or inspectors. Their main safety-related tasks are given here, in brief:

• Laser manufacturers: The manufacturers ensure that the laser/laser product complies with the relevant laser safety, electrical safety and other appropriate standards such that the product can be sold, and that there is in-house safety provision for personnel working with lasers.

• Product suppliers: Suppliers of lasers and laser-based systems ensure that the product complies with relevant national and international standards and is appropriately classified. Information about the hazards associated with the operational use of the laser product is needed for the supplier's in-house safety codes of practice, and for the transfer of the required information to the laser users (i.e., 'customer education').

• Laser users: Companies and others who use laser systems need to determine the hazards associated with the use of laser equipment; adopt appropriate measures to protect both personnel and equipment; and demonstrate to inspectors that the measures in place meet the requirements of the UK Health & Safety at Work Act (H&SW).

• Inspectors: Factory inspectors ensure that a user or manufacturer complies with the H&SW Act in all matters of safety, including laser safety. The inspector establishes whether the safety precautions taken are appropriate to the risks (i.e., whether reasonably practicable measures have been taken against the worst foreseeable fault condition). The factory inspector may consult specialist inspectors. The laser specialist inspector establishes the hazards associated with the use of lasers and laser systems in their environment of use. The degree of hazard is established and an evaluation of the appropriateness of the safety controls in place is carried out. In this way the measures for improving the situation are determined. The specialist inspector also has responsibility for increasing the laser safety awareness of factory inspectors.

Choice of system type

Before the system was constructed, preliminary discussions were held with 20 people, selected to be a cross-section of potential users, about their requirements, including the type of system that could be provided. They felt that the most important choice was between an expert system (ES) and a decision support system (DSS).

We noted from the discussions that these potential users held the view that an ES is a system that makes authoritative recommendations in the same way as a human expert, but in restricted, highly defined domains. Since an ES is intended to stand in for a human expert, they could not see the utility of a system that would simply mimic them, as they were experts themselves. They also could not think of any individual whose expertise might be represented in the system who would be accepted widely enough in British industry, let alone further afield. Two people expressed the belief that UK Government inspection bodies would be very mindful of the legal implications of using the recommendations of an ES, especially if an inspector authorised inappropriate or wrong recommendations from a system that was claimed to be an 'expert'.

The proposed system should bring all the loose data together and should be easy for non-computer people to use in interactive mode. The system should help them to do what they routinely do, only faster and with less error: it should support, not replace, people, leaving the assessors to make the decisions and to take the responsibility. If a DSS does not make decisions but helps in decision-making, then that is what is required. The system had to be portable so that it could support hazard analysis and assessment work on site. It also had to be easy to use. If the system proved itself they might move to an ES later.

Thus the potential users expressed a preference for a decision support system, as they saw it, rather than an expert system.

Brief description of system

Decision support system

The system gives quantitative support to the laser safety decision maker. It may be categorised as 'an interactive system that provides the user with easy access to decision models and data in order to support decision-making tasks' (Han and Kim, 1990). It is called the Laser Safety Decision Support System (LSDSS).

User interface

It is possible for highly supportive interfaces to hamper user development by providing an inaccurate model of system activity and by producing an over-reliance on prompts and facilities (Stevens, 1983). This potential problem is obviated in the LSDSS by designing the interface screens and dialogue to reflect the task as perceived by expert users (and echoed in the support to the interaction) yet giving information, prompts, cross-references, and navigation aids to the extent that less-expert users can also access the required information. A similar approach has also been used by Narula and Diaper (1991). In their work on a hypertext-based computer-assisted medical instruction system they stress the design principle of maintaining the 'cognitive compatibility between the target end users of the system . . . and the knowledge base encapsulated in [the system]'.

The design of the user-computer interface supported the ways in which assessors use safety knowledge in doing their tasks, namely:

• in overview (which gives the user rapid access to general safety information),
• for detailed analysis (which provides support in carrying out a comprehensive assessment of hazards given a specific laser problem),
• for calculations, for example of the maximum permissible exposure (MPE), the maximum level of laser emission to which a person can be exposed without suffering adverse effects, and of the nominal ocular hazard distance (NOHD), the distance from the laser beyond which a person can be exposed without risk of suffering damage to the eyes (a sketch of this kind of calculation follows this list).
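As an illustration of the kind of calculation the system supports (the LSDSS code itself is not given in the paper), the NOHD for a simple continuous-wave beam follows from the laser power, exit beam diameter, beam divergence, and an MPE value taken from the standard's tables. A minimal sketch in Python; the function name and the sample values are illustrative assumptions, not LSDSS internals:

```python
import math

def nohd(power_w, beam_diameter_m, divergence_rad, mpe_w_per_m2):
    """Nominal ocular hazard distance for a CW laser.

    Standard far-field formula: the distance at which the expanding
    beam's irradiance falls to the MPE,
        NOHD = (sqrt(4 P / (pi * MPE)) - d0) / phi
    A non-positive result means the beam is already below the MPE at
    the exit aperture, i.e. the NOHD is effectively zero.
    """
    d_mpe = math.sqrt(4.0 * power_w / (math.pi * mpe_w_per_m2))
    return max(0.0, (d_mpe - beam_diameter_m) / divergence_rad)

# Illustrative values only: a 500 W CW CO2 laser with a 5 mm exit beam,
# 1 mrad divergence, and an MPE of 1000 W/m^2 (0.1 W/cm^2).
print(round(nohd(500.0, 0.005, 1e-3, 1000.0), 1), "m")
```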

In use, the system opens at a pictorial screen then moves to the 'About the system' screen. This tells the user about the detailed analysis, overview, and calculation options that are available, and depicts the link map icon, saying that it 'helps the user in navigating through the system by showing the point of interaction in a diagram of the structure of the information in the system'. Selecting the 'continue' button takes the user to the main option selector, shown in Figure 1.

Figure 1. Users choose the mode of operation from the main option selector

The main option selector provides ten option buttons. (Items in italics are active buttons or windows which require data to be input by the user):


• Detailed analysis: material processing, which links to laser types, to energy control methods, to material and to process.
• Detailed analysis: R&D (not linked in prototype).
• Detailed analysis: display/entertainment (not linked in prototype).
• Overview: applications, which links to processes, which links to two A4 pages of text and a separate scrolling window about lasers typically used in processes.
• Overview: lasers, which links to processes, each of which links to laser types, which link to general information about the selected laser type.
• Overview: classes, which links to laser classes and a description of the classification system.
• Calculations: MPE, which links to the data input required for wavelength, operating mode, exposure time, and viewing of source, which links to calculate.
• Calculations: NOHD, which links to the data input required for wavelength, operating mode, laser power, beam diameter, beam divergence, and exposure, which links to calculate.
• Return to 'About the system' screen.
• Quit the system.

Interaction support structure

The hypertext cards are hierarchically structured to harmonise with the way in which the assessment task is done in practice. Laser safety hazard assessment is highly procedural and the system reflects that fact. In this way the design of the structure for supporting the user-computer interaction reflects the users' conceptual model of how they carry out their assessments, i.e. their understanding of the process. (The explicit user conceptual model of laser hazard assessment is described in Tyrer et al., 1994.) This approach to the design was used because a DSS which is in keeping with the user's cognitive style will be more frequently used, more effective in decision-making, and better accepted (Zmud, 1978; Mann et al., 1989).

A link map was provided to help in navigation (see Figure 2). It showed the relationship of levels and routes from and to major topic areas. It was selectable from the same icon button on every screen except the main option selector.
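The paper does not reproduce the card structure itself, but the hierarchy and link map it describes can be sketched as a simple card graph. All card names below are illustrative, loosely echoing the main option selector, and are not taken from the LSDSS:

```python
# A minimal sketch of a hierarchical card structure of the kind the
# paper describes: each card lists the cards it links to, and the
# 'link map' view prints the levels and routes below a given card.
CARDS = {
    "main":                ["material_processing", "overview_applications",
                            "mpe_calculation", "nohd_calculation"],
    "material_processing": ["laser_types", "energy_control", "material", "process"],
    "overview_applications": ["processes"],
    "laser_types": [], "energy_control": [], "material": [], "process": [],
    "processes": [], "mpe_calculation": [], "nohd_calculation": [],
}

def link_map(card, depth=0):
    """Print the tree of levels and routes reachable from a card."""
    print("  " * depth + card)
    for child in CARDS[card]:
        link_map(child, depth + 1)

link_map("main")
```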

Knowledge base

The knowledge represented in the system was structured such that interaction with it reflected the users' conceptual model of the knowledge they used in their traditional decision making. The knowledge-base consists of three sections:

• definitions of lasers and associated equipment,
• information about laser systems, their hazards, safety requirements, and controls,
• information about processes such as cutting and welding.

Elements within these sections are linked according to the various task requirements previously identified by the cross-section of potential users who took part in the requirements analysis. The integrity of the diverse knowledge-base sections was thus maintained, yet the interface presented the users with a 'seamless' interaction with the system. The interaction with the knowledge-base would be familiar to the users even though the interface, as part of the computer-based system itself, was new. As Narula and Diaper wrote about their own system, 'the structure of the hypertext knowledge base followed a traditional structure that was known to be familiar to the system's end users'.

Figure 2. Successful outcome of Task 10, a question about eye damage (screen title: 'Optical Hazards')

Some links were not implemented because they would have permitted the users to deviate too widely from the procedures of the formal safety assessment, which would have caused difficulties for the less-experienced assessors; others were not implemented because they were 'impossible', for example where a particular combination of beam, material, and process would be illogical.

Prototyping

Prototyping was used because it allowed rapid development of a working system for evaluation, permitted rapid changes to be made to the system, especially in response to the user evaluation feedback, and would speed up the process leading to implementation of the final delivery system. The prototype LSDSS was developed in HyperCard (on an Apple Macintosh IIci) to take advantage of the easy access to the system functionality that this type of interface offers to user and developer alike.

Once it had been evaluated the prototype system would be ‘thrown away’. This approach was used because the delivery system would have to be mounted on the most widely used types of lap-top or ‘average’ desk-top computer available to the safety assessors in the companies that were surveyed (typically a PC-type with a 286 or 386SX processor with MS-DOS) and would have to be programmed in a language more appropriate to those platforms.


LSDSS evaluation

Aims

The aims of the evaluation were twofold: firstly to establish the usefulness and usability of the system, and secondly to indicate if and where design changes were needed. In this instance, 'usefulness' indicates the extent to which the functionality of the system supports safety assessors in meeting their goals in laser safety and hazard assessment. 'Usability' relates to quantifiable and measurable performance within specific laser safety and hazard assessment tasks supported by the decision support system. Such performance is generally measured by: learnability (the time and effort required to reach a specified level of user performance), throughput (i.e., tasks accomplished by experienced users, speed of task execution, and errors made), and attitude (i.e., the positive attitude engendered in users by the system) (Shackel, 1991; Lindgaard, 1994).

The subjects completed a set of tasks using the LSDSS. Their performance was recorded and analysed to obtain objective usability and usefulness data. The subjects then completed a questionnaire. Their responses to the questionnaire were analysed to obtain measures of subjective usefulness and attitude. The objective and subjective data were then examined to discover:

• If the difficulties that subjects said they had in a task were reflected in their performances recorded for that task.
• If features in the recorded performance indicating that the subjects were experiencing difficulties were reflected in the subjects' opinions and comments about the task or the system.

Positive correlation between objective and subjective measures would enhance the usability, usefulness, and attitude data and hence the evaluation of the LSDSS.

Evaluation tasks

An interview (the 'contextual interview') was held with a group of four laser safety experts to establish a set of tasks for use in the objective measurement of usefulness and usability.

In usability studies the contextual interview is used to discuss the user’s goals, ways of working, and problems and is normally held with the intended users (Preece, 1993). In the present study it was assumed that the four experts would be able to describe the goals of the intended users accurately, and that the tasks they prescribed would, for example, correctly reflect the goals and typical problems experienced by users in the field.

There were nineteen tasks in all, which took the form of questions that had to be answered using the system. The questions accurately reflected problems that were solved daily by laser users and assessors. The tasks assumed a level of knowledge typically possessed by a person with some experience in laser safety matters. The tasks had 'face value', i.e., not only were they realistic but they would be seen by the subjects as being realistic. Their structure reflected the three main ways of using safety information: to get an overview or general information, to carry out detailed analysis, and to perform calculations. Nine of the tasks were 'overview' tasks and covered lasers, laser application, laser classification, and general safety precautions.

Examples of questions from the overview section:

1. "What is the purpose of the assist gas in the cutting process?"
2. "What can be achieved by changing the design of the CO2 laser cavity?"
3. "You would like to have a Class 1 laser system. What safety features would you expect the manufacturer of this system to implement?"

The other ten were ‘detailed analysis’ and ‘calculation’ tasks and prompted the subjects to get answers to questions about details of laser safety in the field of material processing. Examples of questions from this section:

"You are using a CO2 laser in continuous mode (CW) for cutting the polymer polymethyl methacrylate. The assist gas is air and the laser power is 500 Watts. The setup has a movable beam and an XY moving workpiece."

1. "What potential damage can exposure to this laser cause to the eye?"
2. "Determine the maximum permissible exposure value (MPE) for an exposure time of 10 seconds and the exposure criterion of intra-beam viewing."
3. "Determine the recommended enclosure material and the enclosure design/s for this setup."

Questionnaire

A two-part questionnaire was designed to elicit subjective usefulness, usability and attitude data from the subjects. The structure of the questionnaire was based on the three-level model of the human-computer interface (Clarke, 1986). In this model the interface is represented by three layers. The top layer models the interface between a user's goal and its representation in the computer. The top layer is serviced by the middle layer, which models the interface between the user's information handling characteristics on one side, and the organisation and presentation of information from the computer on the other side. The middle layer is serviced by the bottom layer, which is the physical interface. At the bottom level the sensori-motor characteristics of the user, the input and output devices of the computer, and the interaction between them are modelled. This three-level model provided the structure for the questionnaire, which was designed to elicit subjects' responses to the main features of the system, namely:

• its functionality (i.e., the support given to the subject in reaching his or her goals),
• the dialogue (i.e., the form and function of the information presented to, and required from, the subject),
• the physical human-computer interface (i.e., the characteristics of the keyboard, roll-ball and screen).

From a user-centred perspective the last two levels are equally as important in practice as the first because '. . . there is a tendency for evaluation to be centred around the expert, concentrating on the correctness of the system's output rather than the way such output is represented to the user' (Bright et al., 1989).

The questionnaire had two parts. The first part had 37 questions about technical aspects of the system. The questions in this part were read to the subjects and their responses tape recorded.

Here are examples of the questions that were asked from the first part:

About the tasks:

1. “Did the system allow you to obtain the necessary information? If not, what else was needed?”

2. “Is the information required by these tasks typical of the information you would normally require?”

About the presentation of information:

1. "Do you think there is a balanced split between textual and pictorial information?"

In the broader job context:

1. “Do the facilities/functions/information offered by the system seem satisfactory for your needs?”

2. "Do you think that the overview function and the detailed analysis facility will meet the needs of a) factory inspectors, and b) laser users?"
3. "What major modifications or additions would you make to the system to make it more accessible and useful to the user?"

The second part used a rating scale format and sought to elicit the subject’s beliefs and feelings about the system. It was given to the subjects on a separate sheet for them to complete themselves (see Appendix).

Subjects

Fifteen people representative of the laser assessor community took part in the evaluation as subjects. Each subject had responsibility for laser safety in their company or organisation. Thirteen of the subjects had computer experience or competence and two had little computer experience. Twelve were not familiar with the Apple interface or HyperCard.

Evaluation tools

The tools used in gathering data for the evaluation were:

• A copy of the working system, mounted on an Apple Macintosh PowerBook for practical reasons of portability.
• The tasks: the set of tasks which the subjects had to complete using the system.
• The questionnaire, for eliciting and structuring subjects' responses.
• ScreenRecorder, also mounted on the PowerBook, for software logging, i.e. recording the interactions between subject and system in the tasks.
• A voice-activated tape recorder for recording the spoken responses.

Evaluation methods

Three complementary methods of evaluation (observation, expert, and survey) were used to gather data for the evaluation (Preece, 1993). For the observational method, the subjects' behaviour was monitored while they were using the system. For the expert evaluation method, a group of experts interacted with the system and their performance was monitored. For the survey method, a questionnaire was used to elicit the subjects' responses to the system.

Evaluation procedure

There were four steps in the evaluation procedure, which had to be followed by each subject. (The times shown are not limits, but are indicators of the amount of time apportioned to each step.)

Step 1: Tutorial session (15 minutes). The subject received a tutorial on the system from an investigator: the session included an introduction to the use of screen buttons and icons, the use of the roll-ball, and techniques of ‘dragging’, ‘clicking’, and ‘scrolling’.

Step 2: Task session (45 minutes). The subject, under supervision, worked through the set task sheet. The subjects were encouraged to follow the evident logic of the interface and the structure of the system knowledge-base in working towards answers which they believed to be acceptable. The subjects were also encouraged to try to resolve any problems without assistance but help was promptly given when requested. Software logging was used to record the dialogue between subjects and system. The subjects’ performances were recorded, with their knowledge and agreement.

Step 3: Subject Browse Session (10 minutes). The subject freely explored the system to follow up any points of interest or to see what information it contained.

Step 4: Subject questionnaire/interview (30 minutes). The subject answered the questionnaire. The subjects’ spoken responses to the questions were recorded, with their knowledge and agreement, on a voice-activated tape recorder for later transcription and analysis. Their written responses to questions were gathered together for later analysis.

Subjects in their place of work

In the idealised circumstances of the controlled experiment the usability of a computer system is evaluated in the usability laboratory, but there is a growing awareness among researchers of the disadvantages of this approach. One disadvantage is the problem of reactivity, where the subjects taking part in an evaluation react unfavourably to the laboratory and its unrealistic, often inhibiting, environment (Wright and Fowler, 1986). These situational factors can easily distort the performance of the subjects.


Field evaluations, on the other hand, are less controllable but often inevitable. The LSDSS evaluation subjects, for example, had to be visited in their places of work because of their heavy commitments. The LSDSS and the recording equipment were portable so the investigators were able to make the visits and obtain data of good quality and quantity. The advantages of going to the places of work were that access was possible to subjects who were experienced in practical laser safety, and the subjects were working in their own environments and were thus receiving strong situational prompts to their memories.

One potential disadvantage was that some of the subjects felt under pressure from the unseen presence of their supervisors and superiors and the ever-present threat of interruption. In the event, no-one was interrupted, and only one subject had to be asked to switch his 'bleeper' off.

Analysis and results

Analysis of objective data

Performance baseline data

In the expert evaluation method, experts take the role of the user and describe the problems they encounter in using the system under evaluation. One of the drawbacks of this method is that experts can be biased, i.e., they can have strong views and be inflexible. To obtain the benefit of the method while overcoming this potential problem, the experts who had helped in the design of the evaluation task set were also asked to carry out the tasks, under supervision, on the LSDSS.

The same method and equipment used in the field part of the evaluation were used to measure their performance. It was intended that the resulting performance data should be used as a standard for comparison in quantitative analysis of the subjects' performance.

The value of doing this can be seen, for example, in one evaluation where the performance of subjects using a Hypertext system to answer questions was compared with their performance in the same tasks, but using a book; the information, subjects, and tasks were the same and the performance with the book was used as a standard (Leventhal et al., 1993). The results of the evaluation using comparison with a standard indicated more clearly where the advantages and disadvantages of one medium over another lay.

There was not enough time available in the LSDSS work for tests to be conducted using information from the traditional sources of laser safety information so the experts were used to establish a standard. The experts’ performance provided a valuable measure of the minimum task times possible with the system, where the sequence of operating was known and data that would be required by the system was to hand.

Throughput

The mean time, or throughput, for the subjects to complete the tasks was 37.30 minutes, with standard deviation SD = 12.47 minutes. The mean of the experts' time was 12.0 minutes, with SD = 1.9 minutes. The experts were familiar with the system and the detail of the tasks, and the subjects would have been familiar with the type of task and information involved. Nevertheless, the differences in overall times were large. To find an explanation for the differences, the times taken to complete each individual task (times per task) were examined.

Figure 3. Mean time taken by experts and subjects to complete each task

Times per task

Figure 3 shows the time per task for experts and subjects. It can be seen that the times follow similar patterns, and Spearman's rank correlation test applied to the results from the 19 tasks gave a correlation between times per task for experts and subjects of rs = 0.691, with an associated probability of p = 0.01 (Day, 1988; Siegel, 1958). This suggests that the differences and similarities were not due to chance, but that experts and subjects alike had similar encounters with most of the tasks, and different encounters with the others. These differences were explored by considering the relative task completion times.
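The reported statistic is a standard Spearman rank correlation across the 19 per-task mean times. A sketch of the calculation, using placeholder times since the values plotted in Figure 3 are not tabulated in the paper:

```python
from scipy.stats import spearmanr

# Placeholder per-task mean completion times (minutes) for the 19
# tasks; the actual values behind Figure 3 are not given in the paper,
# so these merely illustrate the calculation.
expert_times = [0.9, 0.5, 0.6, 0.4, 0.5, 0.7, 0.8, 0.4, 0.7, 0.6,
                0.6, 0.5, 0.7, 0.6, 0.9, 0.6, 0.8, 0.5, 0.7]
subject_times = [3.5, 2.2, 1.8, 1.0, 1.2, 2.0, 2.6, 1.1, 2.8, 1.9,
                 1.7, 1.3, 2.1, 1.8, 3.8, 1.9, 2.7, 1.4, 2.3]

rs, p = spearmanr(expert_times, subject_times)
print(f"rs = {rs:.3f}, p = {p:.3f}")  # the paper reports rs = 0.691, p = 0.01
```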

Relative task completion times

Some of the tasks took longer to complete than others simply because they called for more information to be entered, for example, or because they required a greater number of steps to complete. Experts and subjects alike took more time for these. Of more interest were those tasks where the subjects took disproportionately longer than the experts. These tasks could be said to be relatively more difficult than the others.

Relative task completion times (RC) show which tasks took relatively longer for the subjects to complete, irrespective of the absolute length of the task. RC was calculated for each task using

RC = (Tsubject - Texpert) / Texpert   (1)

where Tsubject is the mean time for the subjects to complete the task, and Texpert is the experts' base-line completion time for that task. For example, a task that took the subjects a mean of 4.0 minutes against an expert base-line of 1.5 minutes has RC = (4.0 - 1.5)/1.5, or about 1.7.

Figure 4. Mean relative time taken by all subjects to complete each task

Figure 4 shows the results obtained by calculating RC. It can be seen that tasks 1, 9, and 15 gave rise to comparatively large relative time differences between subjects and experts: these large values of RC are pointing to task or system design areas that need closer investigation.

Learnability of the system

Learnability, or ease of learning, can be measured by the time and effort required to reach a specified level of user performance (Shackel, 1991). In the present evaluation, however, the subjects were not available for sufficient time for repeated trials or sessions with the system. Variability, as indicated by the standard deviations of subjects' task completion times for each task (Ts), was examined as a possible measure of learnability. 'Improvement in the level of skilled performance is invariably accompanied by an increase in the consistency of the methods by which the performance is achieved. In view of this, the amount of variability serves as a method of assessment.' (Edwards, 1973). However, no clear pattern of progressive improvement was discerned.


Table 1. Task ranking in quartiles

Quartile         Tasks
Highest, Q 1     1, 2, 7, 15, 17
Q 2 and Q 3      3, 6, 9, 13, 10, 14, 11, 16, 19
Lowest, Q 4      4, 5, 8, 12, 18

Difficult tasks

Certain tasks seemed to be more difficult for the subjects to execute than others, but the individual objective measures did not consistently point to the same tasks, although there appeared to be a fair consensus. To examine this, data from the relative task completion times (RC), the variability of task completion times (Ts), and the absolute differences in task completion times per task for the 19 tasks were jointly analysed using Friedman's Analysis of Variance by Ranks (Day, 1988; Siegel, 1958). With df = 2, the test gave chi-square = 31.684 and p < 0.001. This result indicated that the apparent variation in task difficulty was not due to chance, but was consistent across the subjects for certain tasks.

Following examination of the ranking of the data used in the ANOVA, the tasks were grouped in four quartiles, according to ranked value, as shown in Table 1.
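Friedman's test here treats the three objective measures as matched samples over the 19 tasks (hence df = k - 1 = 2). A sketch of the calculation with placeholder per-task values, since the paper does not tabulate the raw measures:

```python
from scipy.stats import friedmanchisquare

# Three per-task measures over the 19 tasks (placeholder values; the
# paper does not tabulate them): relative completion time, variability
# (SD) of completion times, and absolute expert-subject time difference.
rc       = [2.9, 2.4, 1.5, 0.8, 1.0, 1.4, 2.2, 0.9, 2.6, 1.6,
            1.5, 1.1, 1.7, 1.5, 3.1, 1.6, 2.3, 1.0, 1.8]
ts_sd    = [1.4, 1.1, 0.7, 0.4, 0.5, 0.8, 1.0, 0.4, 1.2, 0.8,
            0.7, 0.5, 0.9, 0.8, 1.6, 0.8, 1.1, 0.5, 0.9]
abs_diff = [2.6, 1.7, 1.2, 0.6, 0.7, 1.3, 1.8, 0.7, 2.1, 1.3,
            1.1, 0.8, 1.4, 1.2, 2.9, 1.3, 1.9, 0.9, 1.6]

chi2, p = friedmanchisquare(rc, ts_sd, abs_diff)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")  # the paper reports 31.684, p < 0.001
```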

The results focussed attention on the tasks in the highest quartiles, which were then traced in the subjective questionnaire and analysis data, and examined for associated statements of difficulty by the subjects.

Requests for help

Requests for help can show up a weakness in usability. There were four occurrences where help from an investigator was sought by subjects while they were carrying out the evaluation tasks.

One was clearly a system design problem. The subject experienced difficulty in keying in numerical data for an MPE calculation. The essence of the problem was in the notation used by the system MPE calculator. In the prototype system, decimal notation was used because some of the safety assessors had said they used it. However, this subject was more accustomed to using scientific notation and assumed that it would be tolerated by the computer. This design shortcoming was noted for subsequent system improvement, whereby either notation could be selected and used.
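Accepting either notation is straightforward in most languages, whose numeric parsers already handle both forms. A minimal sketch, with an illustrative function name (not from the LSDSS):

```python
def parse_exposure_value(text):
    """Accept either decimal ('0.0025') or scientific ('2.5e-3')
    notation for numeric input, as the evaluation suggested the
    delivery system should. Returns None for unusable input so the
    caller can re-prompt instead of failing."""
    try:
        return float(text.strip())
    except ValueError:
        return None

assert parse_exposure_value("0.0025") == parse_exposure_value("2.5e-3")
```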

The second occurrence was partly to do with the subject and partly to do with the system. The subject went on a 'detour' by accident and asked for help to get back on the correct route as quickly as possible. In the introduction to the system, the subjects had been shown the link map that was available to help in navigation. The subject had forgotten about it. Once shown how to select it from the icon button (see Figure 2), he quickly returned to the task in hand. However, it was felt that the design of the icon may need closer study.

The two other requests indicated the state of knowledge of the subject in particular areas more than the design of the system, e.g. one subject said, "I needed help on deciding on beam diameter, MPE values etc." In practical use the data needed for answering these questions would be available from the particular laser set-up being assessed.

Analysis of subject responses

The questionnaire was administered to the 15 subjects: although some of the subjects gave neutral, i.e., unclassifiable, responses, the total yield was 245 responses suitable for analysis. Of these, 228 were used in the content analysis of perceived usability, and 17 for the attitudinal response. Of the 228 usability responses, 204 were favourable and 24 were adverse.

Content analysis

The structure and content of the questionnaire were designed so that content analysis could be made of the responses. Content analysis is a technique for making inferences by objectively and systematically identifying specified characteristics of messages, texts, and communications (Berelson, 1952; Carney, 1972). The texts have to be defined before the analysis is done: in the present work this was simple because the texts were the subjects' recorded responses to the questionnaire and their impromptu comments made during the evaluation sessions. The characteristics also have to be defined before the analysis is done: we were interested in the occurrence of two main characteristics of response ('favourable' and 'adverse') towards the perceived utility of the system. To make the analysis more sensitive, the characteristics were further sub-divided into three categories:

• system functionality - the support given to the subject in reaching his or her goals,
• dialogue - the form and function of the information presented to, and required from, the subject,
• the physical human-computer interface - the keyboard, roll-ball and screen.

A table of categories was then constructed with 2 x 3 = 6 cells because there were a number of related characteristics to be identified.

The analysis procedure commenced by identifying the responses and placing them in the appropriate cell of the table. This gave a series of structured responses indicating which aspects of the system were perceived to have utility and which did not.

Table 2. Attitudinal comments about the perceived usability of the LSDSS

Level of interface    Favourable    Adverse
Goal/task                  108          15
Information                 72           7
Physical                    24           2
Total                      204          24    (N = 228)

Table 2 shows that the favourable comments outnumber the adverse comments about the perceived usability of the LSDSS at all levels of the interface. To test the hypothesis that the differences were more apparent than real, the frequency of occurrence of 'favourable' and 'adverse' responses in the table was analysed using the Binomial Proportion Test, which yielded a Contingency Coefficient of C = 0.3569 for df = 2, p = 0.001. This suggests that the difference in the numbers of 'favourable' and 'adverse' responses per level, and the similarity of their proportions at all levels, is significant (Day, 1988; Siegel, 1958). Thus, the subjects found the LSDSS equally acceptable whether they were considering the system's physical interface, accessing or receiving information, or using the system to make safety decisions.
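The 'Binomial Proportion Test' is not described in enough detail to reproduce the reported figure exactly. A contingency coefficient can, however, conventionally be derived from a chi-square statistic over the same 2 x 3 table of counts, C = sqrt(chi2 / (chi2 + N)); the sketch below takes that route as an assumption:

```python
import math
from scipy.stats import chi2_contingency

# Counts from Table 2: favourable/adverse comments at each interface level.
table = [[108, 72, 24],   # favourable: goal/task, information, physical
         [15,   7,  2]]   # adverse

chi2, p, df, _ = chi2_contingency(table)
n = sum(sum(row) for row in table)             # N = 228
c = math.sqrt(chi2 / (chi2 + n))               # Pearson contingency coefficient
print(f"df = {df}, C = {c:.4f}, p = {p:.3f}")
# For comparison, the paper's own (differently computed) figure is
# C = 0.3569 with df = 2, p = 0.001.
```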

Attitude to system

The subjects' favourable comments indicated their generally positive attitude towards the system. For example:

"lots of good points"
"calculation facilities very good"
"Sliney and Wolbarsht and the Standard, all in one" (a reference to Sliney and Wolbarsht (1980), and BS7192:89 (1989))
"far more effective than a book"
"impressive speed of getting results"
"too good for general use"
[I would use it] "twice a week", "every time"
"certainly OK for industrial laser users"
"Manufacturers' requirements [are] met in logical order"
"very detailed fume data, makes it a learning point"
". . . the diagrams are essential."

Other comments which indicated the subjects' attitude to the system included the frequently occurring, "We don't want an expert system", although one subject did express the view that an "expert system [is] needed, with an interrogative interface". The subjects also expressed a generally favourable attitude towards the aesthetics of the design of the screens and the PowerBook. However, one subject did make reference to the ". . . annoying bleep when doing things wrong".

Correlating objective and subjective evaluation data

Users' perceptions of difficulty in using a system do not always accord with objective measures of their performance. Data from the subjective measures of the system (specifically, the adverse comments arising out of interacting with the system) were therefore examined for correlation with the objective measures. Each task was analysed in terms of its interface features, and 'problem' tasks were compared with the 'non-problem' tasks.

The results of the correlation analysis are given here according to the three aspects of the interface examined:


• Physical level of the interface: there are clear differences between the 'difficult' and 'non-difficult' groups of tasks in the average frequency of occurrence per task of scrolling acts, use of pull-down menus, and data insertions. The differences between the average number of 'reading text or diagram' events, and average actions per task, are not so large. These results compare well with the responses about the physical interface, which included "I had to study the mechanics of the system", "I had great difficulty in manipulating information using the scrolling method", and ". . . using 'click and hold' button is irritating".
• Information level of the interface: 'difficulty' in the tasks was associated with relatively high frequencies of occurrence of problems with consistency in presentation and accessing information, for example: "It is annoying that I need to click on 'ok' or 'proceed' after having made a selection", and with coding (i.e., use of particular terminology and units), e.g., "[I had] trouble with the units, e.g. mS, kJ, and problems with entering numerical data", and "Replace [the term] 'energy manipulation' with 'beam delivery'". Users experienced problems with navigation in some tasks, e.g., "backtracking is irritating", and "there should be more short cuts".
• Goal interface level: there were few perceived problems with the knowledge in the system (i.e., facts, object information, rules, and procedures), but they occurred relatively more frequently in those tasks ranked as 'difficult' than in the others. Some examples: ". . . need much more about interlocks", "NOHD most useful" [meaning, "include it"], and ". . . not enough about display lasers".

Discussion of the evaluation results

It was intended that the subjects should highlight the difficult or unwieldy aspects of the physical interface and separate them out from the information interface and the knowledge base. The results show that this aim was achieved.

The subjects perceived as ‘difficult’ those tasks which involved: scrolling; the manipulation of information having inconsistent format and presentation; the use of unfamiliar terminology and units; typing in data, and making selections using pull-down menus.

There were some perceived shortcomings in facts, objects, rules, and procedures at the goal level, and these were also associated with the 'problem' tasks. Scrolling, pointing, and 'clicking' are well established among users of GUIs (graphical user interfaces), albeit perhaps more commonly in conjunction with the mouse rather than the roll-ball (tracker-ball). However, although the subjects in the evaluation were mostly unfamiliar with this type of interface and with HyperCard, they quickly adapted to the small ball and large keys of the PowerBook.

The need to use pull-down menus to make selections inhibited free movement within the system. This was a design-related feature, and was therefore addressed (see below).

Problems identified at the information interface level included a stated need for an indication of 'where we are now', and for short cuts. A map was available to help in navigation: selectable from the same icon button on every card, it showed the relationship of levels, and routes from and to major topic areas (on the screen shown in Figure 2). The map did not provide short cuts, however, which are sometimes used to save time, or to explore or browse: as Monk (1989) points out, "navigation through links that are part of the expository material can be discriminated from navigation using facilities external to the text". The provision of the 'general overview' routes may be seen as a restricted browsing facility, and the nature of the specific system applications precludes 'short cuts'.

Movement around the system was achieved primarily through the card buttons, not the keyboard. Problems with typing in data were not related to this feature: the evaluation analysis shows that the problem is connected to choice of units and notation, not the act of typing.

Perceived task difficulty

Certain tasks which might be thought to be difficult because of their length or the menu depth were not perceived as such by the subjects. The absence of perceived difficulty reflected the close match achieved between the structure of the information interface in the system and the conceptual task model, which itself is derived from tasks typical of those carried out in safety assessment work. The evaluation revealed problems with the manipulation of information with inconsistent format and presentation, and with the use of unfamiliar terminology and units. There was also a need to extend the knowledge base in certain areas.

Subject performance variation

The variation in performance among the individual subjects was due to a number of factors, chief of which were differences in their:

• familiarity and/or skill with the system,
• ability to manipulate the physical interface,
• computer experience (in general, and with HyperCard in particular),
• ability to navigate the database (this is related to their knowledge of the subject, and their experience with laser safety assessment procedures).

Pointers to design of the delivery system

The evaluation indicated areas for system development. Some changes were also suggested by the subjects outside of the framework of the evaluation, and these were noted. The most significant points arising from the evaluation were the following changes and additions:

• To improve consistency, the laser type, energy manipulation, application, and material choices should be selectable from selection screens instead of nested pull-down menus. The fonts of the selection screens should be harmonised.
• The MPE calculation function should be modified: the range of acceptable pulsed laser parameters should be widened, and the currently displayed values must be cleared when a user enters different or new data.
• NOHD calculation functions should be added as an option under 'optical hazards'.
• MPE/NOHD calculation functions should be added as separate options available from the 'main option selector'. This will provide flexibility in specifying the values of all the necessary parameters. Both decimal and scientific notation options should be available.
• Further information could be added about the handling of excimer lasers (incorporating recent research results from within the project), enclosures (including enclosure material and interlocking), and electrical safety.

Conclusions

Aims

It is concluded that, overall, the twofold aims of the evaluation were successfully met, in that the usefulness and usability of the system were established, and the necessary design requirements for the delivery system were highlighted.

The system

The criteria for the system to be portable and easy to use were met. The prototype system functioned well and not only proved acceptable to laser safety assessors but also created a demand for the delivery system.

Method

We conclude that the idea of using three evaluation methods worked. The data from the field was reliable, but a great deal of planning beforehand was needed to make sure that the data from each method was complementary and compatible with, but not compromising to, the others.

Having the experts suggest the subjects turned out to be successful, in that the subjects were a good cross-section of people in the laser safety community. The investigators might not have been so successful if they had made the selection themselves.

The techniques for the analysis of objective data were successful: the performance baseline data was an advantage in establishing objective usability and usefulness, but the small size of the expert group may have introduced difficulties of non-representativeness for the analysis.

The procedures worked very well indeed: those investigators who were not experimentalists by training or background were able to establish good relations with the subjects and to produce high quality data. Field investigators must be thoroughly trained to do this work.

The structure of the procedure and the limited time for access to the subjects prevented data for measures of learnability from being obtained, and although the attitudinal data suggested that the system was easy to use, there was no real objective material to support the subjects' comments.

The techniques for the analysis of subjective responses were successful: splitting the questionnaire into two parts, one of a technical nature requiring spoken responses and the other, the attitudinal part, requiring written responses, largely eliminated reticence on the part of the subjects and elicited quite sensitive data. The technique of content analysis worked very well, but the need to set up the analysis categories beforehand must not be underestimated. The techniques of correlating objective and subjective evaluation data worked well. For example, an indication of future system development needs was obtained by integrating the data from the different methods, but in practice the correlation is probably better limited to inspection, i.e., non-statistical analysis. The subjects were able to highlight problem areas of the interface and interaction, and also to separate these out from what proved to be the acceptable structure of the information interface and the knowledge base.

Meeting the need

Finally, the authors conclude that the computer-based LSDSS will meet the need to support laser safety assessors with up-to-date practical information in a readily accessible and usable form.

Acknowledgements

This work was funded by the ACME Directorate of the UK Science and Engineering Research Council. The authors would like to thank all the laser companies and laser safety inspectors who participated in this evaluation, and our collaborators: the UK Health and Safety Executive, and the UK National Radiological Protection Board.

References

Berelson, B. (1952) Content Analysis in Communication Research Free Press

Bright, C., Inman, A. and Stammers, R. (1989) 'Human factors in expert system design: can lessons in the promotion of methods be learned from commercial DP?' Interacting with Comput. 1, 2, 141-158

British Standards Institute (1989) Radiation Safety of Laser Products and Systems, BS7192:89 (now BS:EN:60825)

Carney, T.F. (1972) Content Analysis: A Technique for Systematic Inference from Communications Batsford

Clarke, A.A. (1986) 'A three-level human-computer interface model' Int. J. Man-Mach. Studies 24, 503-517

Day, S.M. (1988) Multistat Statistical Software, BIOSOFT, Cambridge, UK

Edwards, E. (1973) ‘Techniques for the evaluation of human performance’ in Singleton, W.T., Fox, J.G. and Whitfield, D., (eds) Measurement of Man at Work Taylor and Francis, 129-133

Han, S-Y. and Kim, K.T. (1990) ‘Intelligent urban information systems: review and prospects’ in Kim, T.J., Wiggins, L.L. and Wright, J.R. (eds) Expert Systems: Applications to Urban Planning Springer-Verlag

Leventhal, L.M., Teasley, B.M., Instone, K., Rohlman, D.S. and Farhat, J. (1993) 'Sleuthing in HyperHolmes: an evaluation of using hypertext versus a book to answer questions' Behav. Info. Technol. 12, 149-164

Lindgaard, G. (1994) Usability Testing and System Evaluation Chapman & Hall


Mann, R., Watson, H., Cheney, P. and Gallagher, C. (1989) 'Accommodating cognitive style through DSS hardware and software' in Sprague, R. and Watson, H. (eds) Decision Support Systems: Putting Theory into Practice Prentice Hall

Narula, I.S. and Diaper, D. (1991) ‘Third World continuing medical education with hypertext: the Liverpool Anaemia Guide System’ Ergonomics 34, 8, 1147-1159

Preece, J. (ed) (1993) A Guide to Usability: Human Factors in Computing Addison-Wesley

Shackel, B. (1991) 'Usability - context, framework, definition, design and evaluation' in Shackel, B. and Richardson, S. (eds) Human Factors for Informatics Usability Cambridge University Press

Siegel, S. (1958) Nonparametric Statistics for the Behavioral Sciences McGraw-Hill

Sliney, D.H. and Wolbarsht, M.L. (1980) Safety with Lasers and other Optical Sources Plenum Press

Stevens, G.C. (1983) ‘User friendly computer systems?: a critical examination of the concept’ Behav. Info. Technol. 2, 1, 3-16

Tyrer, J.R., Vassie, L.H., Clarke, A.A. and Soufi, B. (1994) 'Laser safety hazard assessment - a generic model' Optics and Lasers in Eng. 20, 153-161

Vassie, L.H., Tyrer, J.R., Soufi, B. and Clarke, A.A. (1993) ‘Lasers and laser applications in the 1990s: a survey of laser safety schemes’ Optics and Lasers in Eng. 18, 339-347

Wright, G. and Fowler, C. (1986) Investigative Design and Statistics Penguin

Zmud, R. (1978) ‘Individual differences and MIS success: a review of the empirical literature’ Management Sci. 25, 10, 966-79

Appendix: Attitude questionnaire

The Questionnaire: Part Two

Each statement was rated on a scale from 1 (Strongly disagree) to 5 (Strongly agree).

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I imagine that most people would learn to use this system very quickly if used often.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

Table 3. The subjects' responses to the attitude questionnaire

Question                                                                   Mean response
Strongly agree/Agree
Q 1. I think that I would like to use this system frequently.                   4.38
Q 3. I thought the system was easy to use.                                      4.38
Q 7. I imagine that most people would learn to use this system
     very quickly if used often.                                                4.25
Q 9. I felt very confident using the system.                                    4.13
Agree
Q 5. I found the various functions in this system were well integrated.         4.0
Strongly disagree/Disagree
Q 2. I found the system unnecessarily complex.                                  1.25
Q 4. I think that I would need the support of a technical person to be
     able to use this system.                                                   1.38
Q 6. I thought there was too much inconsistency in this system.                 1.63
Q 8. I found the system very cumbersome to use.                                 1.25
Q 10. I needed to learn a lot of things before I could get going with
     this system.                                                               1.88

Responses were placed on a scale from 1 (Strongly disagree) to 5 (Strongly agree)

Received May 1994; accepted September 1995
