
Playback: A method for evaluating the usability of software and its documentation

by A. S. Neal and R. M. Simons



Human factors evaluations of software products and accompanying user publications must be conducted so that developers can be certain that the target user population can learn to use the product with a minimum of difficulty and be able to perform the intended tasks efficiently. A methodology is described for obtaining objective measures of product usability by collecting performance data on the user interface without affecting the user or the system being evaluated. The log of stored activity is later played back through the host system for analysis.

Developers of programs and their documentation can no longer expect that their products will be used exclusively by data processing professionals. Therefore, these products must be easy for all intended users to learn and use. We are regularly reminded of this fact by product advertising in all popular communications. Attention must be focused on the personal interfaces to products and systems to ensure their usability. Human factors specialists are now assisting in the product development process. These specialists have the following roles in product development: (1) they provide advice and counsel on the user-interface design, (2) they interpret design guidelines, (3) they compare alternate designs, and (4) they conduct product evaluations.

Comparisons of alternate designs and product evaluations require that the products be tested. Human factors testing of hardware components of computer interfaces has been performed frequently, but the testing of software has been limited, primarily because of the complexity of the task. Tests of the usability of software products and their explanatory publications need to be conducted to be certain that the target user population will be able to learn to use each product with minimum difficulty and apply it with the maximum efficiency that its developers intend.

More than a single test is usually required to accomplish these usability goals. For example, an initial test yields information that can be given to product development and publications groups for corrective action. After the indicated changes are made to the product, a second test is made to determine whether those changes have alleviated the problems discovered in the first test and to detect additional problems that the changes may have caused. Frequently, this process is repeated several times.

The earlier in the development cycle that testing begins, the more efficiently product developers can make changes. Testing can begin long before the coding phase of the development cycle by employing prototypes of the user interface.

"Copyright 1984 by International BusinessMachines Corporation.Copying in printed form for private use is permitted withoutpayment of royalty provided that (I) each reproduction is donewithout alteration and (2) the Journalreferenceand IBM copyrightnotice are included on the first page. The title and abstract, but noother portions, of this paper may be copied or distributed royaltyfree without further permission by computer-based and otherinformation-service systems. Permission to republish any otherportion of this paper must be obtained from the Editor.


The purpose of usability testing is to determine whether a product is easy to learn and use, as well as to detect areas for improvement. Usability goals need to be specified in terms of ease-of-use criteria that can be measured. An objective test must consist of typical users actually working with a product. Observations and measurements of users' performance can be recorded during and after training. Users' comments, suggestions, and preferences should be solicited only after they have had some practical experience with a product.

Usability evaluation. In usability testing, the test objectives and scope must first be defined, and a decision must be made as to which parts or aspects of a product to include in the test. For example, the test might be designed to evaluate the user interface of an entire product, or the focus might be on such individual features as the following:

• Command syntax.
• Task procedures.
• Screen layout and context.
• Menu navigation.
• Messages.
• On-line help facility.
• Publications.
• Training.
• System response time.

Operating software is needed for those parts of a user interface to be tested. If the testing is to be done before actual code is available, a prototype of the user interface can be created. Prototypes can range in complexity from simple straight-line simulations to functioning interfaces. Bury [1] used a straight-line simulation in a usability evaluation of a record-selection language for a word-processing machine. In that evaluation, if the query statement was correct, a canned correct result was displayed. To simplify the simulation, an incorrect query result was not presented; instead, only a message stating that the query was incorrect was presented. In the same report, Bury also discusses using an elaborate prototype of a user interface to a programming language, where most of the product's edit and display functions actually worked. Prototypes are also useful in comparing design alternatives, because the construction of a prototype is much easier and less expensive than producing multiple versions of an actual product.

Test software must run on a system that provides at least as short a system response time as that expected with the final product in a field environment. Excessive delays may adversely affect the user's learning and the operation of the product. The terminal hardware also should be typical of the devices expected to be used with the software in the field.


The user documentation for a product can and usually should be tested at the same time as the software. Current drafts of product tutorials, programmers' guides, and reference books should be obtained. Decisions must be made concerning which books or parts of the books should be used and evaluated.

The target user population for which a product is intended must be determined; product specifications are often vague about intended users. Representative test subjects are selected from the population of intended users. If test subjects from this population are difficult to obtain, surrogate users are identified and agreed upon by all parties involved. Even when test subjects are obtained from an appropriate user population, it is important to provide the subjects with motivation that approximates that provided by a potential employer.

Another essential ingredient for a usability test is the selection of tasks for the users to perform with the product. Test goals determine the types of tasks selected. If the objective is to measure the effectiveness of a training program or tutorial, one of the tasks is the training itself, including required exercises. Subsequent to training, the users should be asked to perform a selection of tasks designed to measure how well they have learned to use the various features of the product. Other tasks can be employed to assess usability and productivity. These tasks should be representative of the types of activities the ultimate users of the product will perform.


Usability testing must initially be conducted in a controlled environment. To obtain valid and reliable usability measurements, the tester must know how the user was trained, what portions of the system were available, and what testing tasks the user performed. Later, field studies can be run to identify special problems associated with the integration of the product into an actual working environment. Erdmann and Neal [2] discuss further the relative purposes and merits of controlled laboratory and field studies.

Finally, systematic measurements must be made of user performance in order to obtain the quantitative and qualitative information needed to determine the level of usability of the product. Measurements of user performance are useful for comparing a current product with earlier versions of that product to see whether the revisions have yielded improved performance. Comparative measurements of usability can be made on alternate product designs. Human factors specialists can analyze, summarize, and interpret these measurements in order to recommend changes in both the software and the documentation to optimize product usability.

Measurements to be collected are determined by the goals of the test and the ease-of-use criteria that may have been established for the product. Measurements of ease of learning may include the following:

• Time required to complete a training program.
• Total time needed to achieve a given performance criterion.
• Observed difficulties in learning the product.
• User comments, suggestions, and preferences collected in a postlearning interview.

Objective indicators of user difficulties could be the following:


• Frequency with which each error message is encountered.
• Inability to find needed information in the documentation.
• Frequency of use of each section of an on-line help facility and the amount of time spent using help.
• Number of times the user asks for special assistance (perhaps by calling a simulated product support center).
• Efficiency with which the user employs different features.

Measurements of ease of use after initial learning may include the following:

• Time required to perform selected tasks.
• Success or failure in completing tasks.
• Frequency of use of various commands or language features.
• Time spent looking for information in documentation.
• Measures of user problems similar to those used to measure learning difficulties.
• User comments, suggestions, and preferences.

In this paper, we discuss a methodology we call Playback that we have developed for testing program products and their documentation. We first discuss this methodology in general and relate it to the general principles just presented. We then discuss Playback analysis, data collection and recording, program management, and our experience with Playback.

Playback methodology

The experimental methodology called Playback has been developed at the IBM Human Factors Center to make the kinds of measurements just described, in order to evaluate the usability of software and/or software documentation. Playback evolved over several years and has been employed at the Human Factors Center to evaluate a variety of software packages operating on systems ranging from stand-alone word processors to large multipurpose computers. Several of these studies are presented in the last section of this paper.

The central idea of Playback is that, while a user is working with the system, the keyboard activity is timed and recorded by a second computer. This stored log of activity is later played back through the host system for observation and analysis. Thus, the methodology is noninvasive in that the data-collection programs are external to the product being evaluated, and the method is nonintrusive because the data collection does not infringe upon the user's activities.

Basic system operation

Figure 1 is a block diagram of the basic testing system. The user sits in front of a terminal that is attached to a host system. The user's keyboard is not connected directly to the display terminal or to the host. Instead, the keyboard is connected through interface logic to the Human Factors Center laboratory computer.

The laboratory computer time-stamps and records each keystroke. The time recorded is the cumulative time in milliseconds since the start of the testing session. The laboratory computer then transmits the keystrokes to the keyboard interface of the host system terminal. All the data-collection programs run on the laboratory computer. In contrast, if data-collection programs were to reside in the host system, details of all keying activity, including accurate timing information, might be lost because some computer terminals send nothing to the host until certain interrupt keys are used. With data collection being conducted in the laboratory computer, every keystroke can be timed and recorded.
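To make the data model concrete, the following present-day Python sketch (not the original System/7 or Series/1 code; all names are illustrative assumptions) shows the essence of the capture step: each keystroke receives a cumulative millisecond time-stamp, is appended to a log, and is then forwarded unchanged to the host terminal's keyboard interface.

```python
# Illustrative sketch of the laboratory computer's capture step.
import time
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class KeystrokeRecord:
    session_ms: int   # cumulative time since the space bar started the session
    key: str          # a character or a function-key name such as "ENTER"

@dataclass
class KeystrokeLogger:
    send_to_host: Callable[[str], None]               # pass-through to the host terminal
    start: float = field(default_factory=time.monotonic)
    log: List[KeystrokeRecord] = field(default_factory=list)

    def capture(self, key: str) -> None:
        elapsed_ms = int((time.monotonic() - self.start) * 1000)
        self.log.append(KeystrokeRecord(elapsed_ms, key))
        self.send_to_host(key)                         # the host and the user see no difference
```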

No modifications are necessary to the host computer software that is being evaluated. The host software may be actual product code or a prototype of the user interface to a software product under development.

Session initialization. The auxiliary display shown in Figure 1 is used to initialize the test session, to present task instructions, and to display the Playback analysis. The usability evaluation is divided into separate testing sessions. A session can consist of reading and doing exercises in one chapter of a tutorial, or it can include the execution of one task of a performance test. Before beginning a test session, the experimenter usually sets up the host system to the state at which the user begins execution of a particular task that involves the software under evaluation. In addition, the experimenter can enter information that identifies the subject, task, and test conditions.

Figure 1 Configuration of the Playback system

Whether the session involves reading and doing exercises or executing a performance test, it is controlled by software switches in the Playback program on the laboratory computer. A unique combination of switches controls the connection to the host system. One switch determines whether the keystrokes will be transmitted to the host. Another switch determines whether keystrokes will be interpreted and recorded by the laboratory computer. The status of both switches is shown on the auxiliary display.

When setting up the host system for a session, the experimenter uses these switches to turn the host system on and the laboratory system off. The laboratory system automatically uses the opposite switch positions (host off, laboratory on) when accepting identifying information about the session. Both host and laboratory switches are turned on just before the user begins to work.
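The switch settings described above amount to three configurations. The sketch below is an illustrative model, not the actual program, and simply names them explicitly.

```python
# Illustrative model of the two software switches and their three configurations.
from dataclasses import dataclass

@dataclass
class Switches:
    host: bool         # transmit keystrokes to the host system?
    laboratory: bool   # interpret and record keystrokes on the laboratory computer?

SETUP          = Switches(host=True,  laboratory=False)  # experimenter prepares the host
IDENTIFICATION = Switches(host=False, laboratory=True)   # session numbers entered locally
TESTING        = Switches(host=True,  laboratory=True)   # user at work: record and forward
```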

Session identification information is provided to the laboratory computer by selecting I (Initialize a new test session) from the primary Playback menu shown in Figure 2. The program then prompts the experimenter for a subject (user) number, session number, and condition number. In subsequent sessions, the previously entered subject and condition numbers will be displayed for verification, and the session number will be incremented by one. These numbers are displayed one at a time and are verified by the experimenter by keying the Line Return key. Corrections can be made by backspacing and rekeying.

In the final step in this initialization process, the Playback program requests the experimenter to verify all the identifying numbers. If the experimenter indicates that the numbers are incorrect, the program allows the experimenter to step through the numbers again to make corrections.
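The initialization dialog can be summarized in a short sketch. The prompting and verification flow below illustrates the behavior just described (auto-incremented session number, step-through verification); it is a hedged illustration, not the original code, and all names are assumptions.

```python
# Illustrative sketch of session initialization: prompt, auto-increment, verify.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SessionId:
    subject: int
    session: int
    condition: int

def initialize_session(previous: Optional[SessionId]) -> SessionId:
    def ask(prompt: str, default: Optional[int]) -> int:
        raw = input(f"{prompt} [{default}]: " if default is not None else f"{prompt}: ")
        if raw.strip():
            return int(raw)
        if default is None:
            raise ValueError(f"{prompt} is required")
        return default                        # accept the displayed value unchanged

    while True:
        subject = ask("Subject number", previous.subject if previous else None)
        session = ask("Session number", previous.session + 1 if previous else 1)
        condition = ask("Condition number", previous.condition if previous else None)
        if input("Are these numbers correct (y/n)? ").strip().lower().startswith("y"):
            return SessionId(subject, session, condition)
        # otherwise step through the numbers again to make corrections
```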


Figure 2 Primary Playback menu

Playback Primary Selection Menu

Make a selection by typing the one-letter abbreviation.

A Analyze a subject's test session

D Delete a test session

I Initialize a new test session

S Status information about test sessions on disk

T Terminate Playback program

These session identifying numbers are retained with the data collected during the session. The condition number identifies experimental conditions such as order of presentation of tasks or alternate product designs being tested.

Task description. After the experimenter verifies the identifying numbers, the message WHEN READY, PRESS SPACE BAR is displayed on the auxiliary screen by the program. After ensuring that the appropriate documentation is available to the user and that the host system is ready, the experimenter leaves the room. When the user presses the space bar, the Playback program displays on the auxiliary screen a description of the required task to be performed in this session. The description can be as simple as "Read Chapter 3 of the tutorial and do the prescribed exercises using the ABC system," or as extensive as a complete description of a computer program to be written on the host system.

Keystroke monitoring. The user now attempts to accomplish the task, using the host system software and appropriate documentation. All keystrokes made by the user are time-stamped and recorded for later analysis. In addition, the Playback program sends to the host system a predetermined interrupt if no user interrupts have occurred during a selected time interval. This is to prevent automatic logoff on some systems.
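The keep-alive behavior is simple enough to state in a few lines. The sketch below is an assumption-laden illustration (the helper names and the five-minute default are invented); it shows only the idea of substituting a harmless, predetermined interrupt when the user has been idle too long.

```python
# Illustrative keep-alive loop running alongside keystroke capture.
import time

def keep_host_alive(session, send_interrupt, idle_limit_s: float = 300.0, poll_s: float = 1.0) -> None:
    """`session.active()` and `session.last_interrupt_time()` are assumed helpers."""
    while session.active():
        if time.monotonic() - session.last_interrupt_time() > idle_limit_s:
            send_interrupt()          # predetermined, harmless interrupt sent to the host
        time.sleep(poll_s)
```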

Documentation monitoring. We are sometimes interested in evaluating the user's interactions not only with the software but also with the documentation. We want to know how easily the user can find the information needed to solve a problem. In these evaluations, an observer in a separate room monitors the use of the documentation, or book activity, via television. Figure 3 shows a typical observation-station arrangement. One television monitor provides an overview of the work station, giving the observer a view of the user, display, keyboard, and documentation. This monitor is used to determine whether the user is looking at the terminal or the documentation.

A second monitor shows a close-up of the documentation or books available to the user. Large page numbers are written in the books so that they are readable on the television monitor. Although the printed matter is not generally readable on the monitors, the observer has a copy of the books in the observation room for reference. The user is required to keep the book or books within certain confines of a table so that they will be within the range of the camera. Alternatively, a book box that contains a platform for the book and a mount for lights and camera is sometimes employed. The book box permits the user to move the book around for comfortable reading without disturbing the relative positions of camera and book.

In addition to the overview monitor and the book monitor, the observer sometimes employs a slave display that shows the identical information that is on the user's host system display. The observation station is also equipped with a keyboard-display terminal connected to the laboratory computer. An example of this display is shown in Figure 4. The subject, session, and condition numbers established during session initialization are shown at the top. At the bottom of the screen is a readout that shows the cumulative session time.

In the center of the display appear three columns, labeled TIME, CODE, and COMMENT. The observer may enter observation codes and comments about the user's activities. These entries are time-stamped and recorded by the laboratory computer. The time recorded is the cumulative session time, the same time indicator recorded with the keyboard activity.

Figure 3 The observer station

Figure 4 Example observer display (subject, session, and condition identifiers at the top; TIME, CODE, and COMMENT columns in the center listing time-stamped observer entries; a cumulative session time readout at the bottom)

Five-character alphanumeric codes can be used. The experimenter selects easy-to-remember codes to denote such specific book activities as, for example, using the Index, browsing in Chapter 3, reading page 42, or looking at page 19 while keying.

The cursor on the observer's display is initially located in the code field. When the first keystroke of a code is entered, the current session time is displayed in the time field. Corrections in an entry can be made with the backspace key. The end of the code is signaled by pressing ENTER or TAB. If TAB is pressed, the cursor moves to the comment field to allow the experimenter to enter a comment of any length. If the entry is too long to be displayed on that line, the cursor is automatically moved to the next line in the comment field. Alternatively, the observer can move the cursor to that location using the line-return key. An ENTER key signals completion of that comment.
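An observation entry therefore carries three pieces of information. The sketch below models it; the five-character validation mirrors the rule stated above, and the names are illustrative assumptions rather than the original program's.

```python
# Illustrative model of one observer entry: time of the first keystroke of the
# code, the code itself, and an optional comment of any length.
from dataclasses import dataclass

@dataclass
class Observation:
    session_ms: int    # session time when the first keystroke of the code was typed
    code: str          # up to five alphanumeric characters, e.g. "P42" or "RTC"
    comment: str = ""  # free-form text, completed with the ENTER key

def make_observation(session_ms: int, code: str, comment: str = "") -> Observation:
    if len(code) > 5 or not code.isalnum():       # assumed validation of the stated rule
        raise ValueError("codes are five-character alphanumeric strings")
    return Observation(session_ms, code.upper(), comment)
```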


The observer station is usually manned by the experimenter during pilot testing to debug procedures and tasks and to establish an appropriate set of codes. Later, the observer station may be manned by a laboratory assistant who is not required to understand the host system operation or details about the tasks being performed.


Requests for assistance. A function key is provided so that the user can request assistance from the experimenter. This assistance is separate from, and in addition to, any on-line help function that might be available in the host software. The user might request assistance for various reasons, including lack of understanding of the task instructions, inability to solve the problem, or malfunction of the host system. The Playback program acknowledges requests with an audio signal in the testing room, a signal light outside the testing room, and messages written on the user's auxiliary screen and the observer's display. This assistance condition is turned off, and the user resumes work, upon a second depression of the Assistance key.

If an observation station is employed in the evaluation, information about the nature of the assistance request and the type of assistance given is recorded via codes and comments entered on the observer's terminal.

Session completion. The user presses a function key labeled DONE to indicate that the task is completed satisfactorily. The Playback program acknowledges the DONE key in a manner similar to its response to a request for assistance. The session timer is stopped immediately, but the observer is allowed to complete any code or comment entries already started. Finally, the observer's display and the user's auxiliary display are erased, and the summary data for this session are stored.

Stored with the task description for a session is a code indicating whether this is a terminal or continuous session. If it is a continuous session, the subject, session, and condition numbers are displayed for the next session, along with the message to press the space bar when ready. The observer's display is also updated for the next session. The user may take a break at this point because the session timer is not started until the space bar is pressed. Continuous sessions can be employed only if no setup of the host system is required between sessions.


Figure 5 Example statistics display

SUBJECT 12   SESSION 1   CONDITION 4

ENTER  CR  CLEAR  ERASE INPUT  ERASE EOF  DEL
   15   2      3            0          1   25

RIGHT  LEFT  DOWN  UP  RIGHT TAB  LEFT TAB
    0    12     0   0          0         1

PF1  PF2  PF3  PF4  PF5  PF6
  2    0    1    0    0    5

PF7  PF8  PF9  PF10  PF11  PF12
  0    0    0     0     0     3

PA1  PA2  FIELD MARK  DUP  INS MODE  TEST REQ
  0    0           0    0         1         0

RESET  HELP
    3     1

--------------- TIME (Sec) ---------------   ------ KEYSTROKES ------
1st KS   Penultimate KS   Session   Help   Interrupts   Total
    12             1782      1800    125           29     231

If DONE is pressed for a terminal session, a message on the user's auxiliary screen asks him to wait for the experimenter. At this point, the experimenter may request the Playback program to present summary statistics on the user's auxiliary screen. Figure 5 shows a sample of such a display. Shown are the subject, session, and condition identifiers; frequency counts of functions, commands, and keystrokes; and various timing measures. The experimenter may desire to discuss some of these measures with the user for motivational or tutorial purposes.

During this break between sessions, the experimenter has the opportunity to make any setups on the host system that may be necessary for the next session. The experimenter also has the capability to abort a session and readjust the session identifiers in the event of operational difficulties.

Playback analysis

Playback analysis is the key feature of the Playback program. Each session is separately stored in the laboratory computer. The experimenter usually conducts a Playback analysis of the user's performance after the user has completed all required sessions. There are occasions, however, when the playback is done immediately after a session with the user present, in order to obtain supplementary information about the user's thoughts or reasons for particular actions while performing the task. Playback analysis can be performed many times if necessary.

Playback setup. To conduct a playback analysis, the experimenter selects A (Analyze a subject's test session) from the primary Playback program menu shown in Figure 2. The Playback program running on the laboratory computer asks the experimenter for the subject (user) and session numbers to be analyzed. The experimenter then uses the special switches, discussed earlier, to turn off the laboratory computer and turn on the host system, in order to set up the host program to the state that existed prior to the time the user began the session to be analyzed. The special switches are then set so that only the laboratory computer is on.


Playback pacing. The experimenter then requests Playback to pace through the user's actions during the session. This is accomplished by pressing one of four function keys to cause the laboratory computer to send one of the following sequences of user keystrokes to the host system:

• Next keystroke.
• All characters keyed up to and including the next function selected.
• All characters and functions up to and including the next interrupt.
• All characters up to and including the next function, with the same time intervals as when originally keyed by the user.

Each time the experimenter presses one of these function keys, the laboratory computer sends the appropriate characters to the host system so that the host terminal display appears just as it appeared to the user at that same point in the test process. The experimenter may switch between pacing methods at any time during the Playback analysis.
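Viewed as operations on the stored log, the four pacing choices simply select different slices of the recorded keystrokes. The sketch below illustrates that selection; the sets of function and interrupt keys are assumptions standing in for whatever the host terminal actually uses, and real-time pacing additionally reproduces the recorded inter-keystroke delays, which is left to the caller here.

```python
# Illustrative pacing: pick the slice of the keystroke log that one press of a
# pacing key would replay to the host.
from typing import List, Sequence, Tuple

FUNCTION_KEYS = {"ENTER", "CLEAR", "TAB", "PF1", "PF2", "PF3"}   # assumed set
INTERRUPT_KEYS = {"ENTER", "CLEAR", "PA1", "PA2"}                # assumed set

def next_slice(log: Sequence, position: int, mode: str) -> Tuple[int, List]:
    """Return (new_position, records_to_replay)."""
    if mode == "keystroke":                       # next keystroke only
        end = min(position + 1, len(log))
    elif mode in ("function", "real-time"):       # through the next function key
        end = position
        while end < len(log) and log[end].key not in FUNCTION_KEYS:
            end += 1
        end = min(end + 1, len(log))
    elif mode == "interrupt":                     # through the next interrupt
        end = position
        while end < len(log) and log[end].key not in INTERRUPT_KEYS:
            end += 1
        end = min(end + 1, len(log))
    else:
        raise ValueError(f"unknown pacing mode: {mode}")
    return end, list(log[position:end])
```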

Playback display. At the same time the laboratory computer transmits the characters to the host system, it also writes the characters on an auxiliary display. An example of the display the experimenter might see while pacing through the user's actions is shown in Figure 6. The previous function or interrupt the user selected is displayed, along with the cumulative session time at the point the function was keyed. All the keystrokes up to the next function or interrupt are shown on the following lines, along with the session time when the first keystroke was made. The next line shows the subsequent function or interrupt. The display also shows the time interval between entering the two functions. Also shown on the display are the observer's codes and comments that were entered from the observation station during this period, along with the session time when they were entered. The next observer code and comment subsequent to the current time are also shown.

Occasionally, when pacing through a session, the experimenter desires to back up and replay a short segment of the action. A back-up key is provided for this purpose. When the back-up function is used, the Playback display shows the previous interrupt. One can back up all the way to the beginning of the session if that is desired. Because it is not feasible to back up the host system, the host display remains static until forward pacing is resumed and catches up to the point where the backing-up began.

Figure 6 Sample Playback screen

Subject 12   Session 1   Condition 4

0:20:30   CLEAR
0:20:55   A = (2217) * R**2
0:21:20   ENTER                    Function interval: 50 sec.

--------------------------- OBSERVATIONS ---------------------------
0:20:38   P19K
0:22:02   RTC    Searching for topic in Table of Contents
0:21:20   E5     Syntax error because referring to wrong page in Programmers Guide

PF1: Next keystroke   PF2: Next function   PF3: Next interrupt
PF4: Real time        PF6: Set time        <--: Backup   CLEAR: Abort



Problem determination and recording. As the experimenter paces through the user's activity, he can enter additional codes and comments into the data stream in the same way they were entered by the observer during the session. These codes and comments may be additional observations not noted originally, or they may be indicators of certain errors or problems noted during analysis. These codes are also time-stamped with the cumulative session time, so that in later review there is a record of the point in the session at which errors or problems occurred.

A detailed analysis of the user's actions during Playback requires the experimenter to be very familiar with the software and documentation being evaluated, as well as the nature of the user's task in that session.

Data collection and recording

Tape records. All user keystrokes and the cumulative session time in milliseconds are recorded on tape during the session. Also recorded on tape during the session are the experimenter's codes and comments entered from the observation station (along with the associated session time). In addition, the following statistics are collected during the session and written on tape at the conclusion of the session:

• Time from session beginning (space bar) until first user keystroke.
• Time from session beginning until last keystroke.
• Cumulative time in an "Assistance" condition.
• Total session time from session beginning to DONE key.
• Frequency of use of each function key.
• Number of requests for assistance.
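These summary statistics can all be derived from the keystroke log and the record of assistance periods. The sketch below shows one plausible shape for the end-of-session record; field names and units are assumptions, not the actual tape format.

```python
# Illustrative end-of-session summary derived from the keystroke log.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SessionSummary:
    time_to_first_keystroke_s: float
    time_to_last_keystroke_s: float
    time_in_assistance_s: float
    total_session_time_s: float
    function_key_counts: Dict[str, int]
    assistance_requests: int

def summarize(log: List, assistance_ms: List[Tuple[int, int]], done_ms: int) -> SessionSummary:
    counts: Dict[str, int] = {}
    for rec in log:
        if len(rec.key) > 1:                      # multi-character names treated as function keys
            counts[rec.key] = counts.get(rec.key, 0) + 1
    return SessionSummary(
        time_to_first_keystroke_s=(log[0].session_ms / 1000) if log else 0.0,
        time_to_last_keystroke_s=(log[-1].session_ms / 1000) if log else 0.0,
        time_in_assistance_s=sum(end - start for start, end in assistance_ms) / 1000,
        total_session_time_s=done_ms / 1000,
        function_key_counts=counts,
        assistance_requests=len(assistance_ms),
    )
```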

Also recorded on tape are the experimenter codes and comments entered during Playback analysis. All data on tape for a session are preceded by an identification record containing the subject (user), session, and condition numbers. The tape records are for archival purposes, allowing later analysis of other information that was not required at the time of testing.

Disk records. In addition to the above tape records, a disk data set is written by the Playback program to include the user's keystrokes and the codes and comments (entered both during the session and during the playback analysis). This data set is used to accomplish the Playback analysis. It may also be displayed or printed by the experimenter. Several options are available. The display or printout may contain only the user's keyboard activity, only the codes and comments, or both. Figure 7 is a sample printout of this data set, showing both types of entries. The first column is the cumulative session time in hours, minutes, and seconds. The second column shows the codes entered by the observer during the session and codes entered by the experimenter during Playback analysis. The third column contains either the user's keystrokes or the observer/experimenter comments. If user keystrokes are shown in the third column, the time field shows the session time for the first keystroke, and the code field remains blank. Function keys are listed on a separate line. Continuation lines are allowed for the comments, and these are indicated by a quotation mark (") in the code column.

Data analysis. The Playback program also allows the experimenter to perform limited analysis of this data set to obtain measurements such as the following:

• Frequency of selected experimenter-entered codes.
• Time between selected pairs of codes.
• Time between a selected code and the next user keystroke.

This information may be accumulated over a single session or over many sessions. If needed, more sophisticated analysis of the user- or experimenter-entered data may be performed using ad hoc programs for manipulating either the disk or tape records.
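The three built-in measurements are straightforward scans over the merged, time-ordered records. The sketch below illustrates them using the record shapes assumed in the earlier sketches; it is not the original analysis code.

```python
# Illustrative versions of the three built-in measurements.
from typing import Iterable, List, Optional

def code_frequency(observations: Iterable, code: str) -> int:
    """Frequency of a selected observer- or experimenter-entered code."""
    return sum(1 for obs in observations if obs.code == code)

def times_between_codes(observations: List, first: str, second: str) -> List[int]:
    """Milliseconds between each occurrence of `first` and the next `second`."""
    deltas, pending = [], None
    for obs in observations:
        if obs.code == first:
            pending = obs.session_ms
        elif obs.code == second and pending is not None:
            deltas.append(obs.session_ms - pending)
            pending = None
    return deltas

def times_to_next_keystroke(observations: List, keystrokes: List, code: str) -> List[int]:
    """Milliseconds from each selected code to the next user keystroke."""
    deltas = []
    for obs in (o for o in observations if o.code == code):
        nxt: Optional[int] = next((k.session_ms for k in keystrokes
                                   if k.session_ms >= obs.session_ms), None)
        if nxt is not None:
            deltas.append(nxt - obs.session_ms)
    return deltas
```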


Figure 7 Printed record of a portion of a session (the printout lists, for one subject, the cumulative session TIME, the observer- or experimenter-entered CODE, and a third column containing either the SUBJECT KEYSTROKES or the OBSERVER COMMENTS)

Program management

Concurrent experiments. Multiple copies of the Playback program may be loaded in the same Human Factors Center computer. Each copy controls one user and, optionally, one observer station. All experimental data are recorded on a community tape that is later separated by user [3].


Figure 8 Primary menu for the Playback Loader program

Playback Loader Primary Selection Menu

Make a selection by typing the one-letter abbreviation.

A Add or modify instructions.

D Delete instructions for one task.

L List task numbers stored.

S Specify task sequence for a condition.

T Terminate Playback Loader program.

Job parameters. Before the Playback program can be executed, multiple copies of the program must be created, several data sets must be created, and data definitions must be established for each data set used by Playback. In order not to burden the experimenter with programming details, an interactive EXEC that performs all the preceding functions is provided. The EXEC interrogates the experimenter for the following parameters of his experiment:

• Name of experiment.
• Number of subjects' data to be stored concurrently.
• Maximum number of keystrokes per subject.
• Maximum number of sessions per subject.
• Maximum number of observations per subject.
• Average length of task instructions.
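Gathered together, these answers amount to a small configuration record that the EXEC can use to size the data sets. The sketch below is an invented illustration of that idea, including the storage-estimate arithmetic and the example values; it is not the actual EXEC.

```python
# Illustrative configuration record for one experiment.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    name: str
    concurrent_subjects: int
    max_keystrokes_per_subject: int
    max_sessions_per_subject: int
    max_observations_per_subject: int
    avg_task_instruction_length: int        # characters

    def records_per_subject(self) -> int:
        # one record per keystroke plus one per observation (invented estimate)
        return self.max_keystrokes_per_subject + self.max_observations_per_subject

config = ExperimentConfig(
    name="sample-usability-test",           # hypothetical experiment name
    concurrent_subjects=4,
    max_keystrokes_per_subject=20_000,
    max_sessions_per_subject=12,
    max_observations_per_subject=1_000,
    avg_task_instruction_length=400,
)
```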

When the Playback program is initially loaded, such job parameters as the following are specified:

• Input and output computer ports for the various keyboards and displays.
• Whether an optional observation station is used.
• Whether separate task descriptions are to be used or a common description as a default.
• Default maximum session time.
• Whether end-of-session statistics should be displayed.

These and other job parameters are saved. Thus, when Playback is later executed, only those parameters that the experimenter wants to change need be entered. Ordinarily, all parameters remain unchanged from run to run.

File management. The selection of S (Status information about test sessions on disk) on the Playback program's primary menu shown in Figure 2 allows the experimenter to note the disk status by displaying the number of sessions recorded on disk for each subject number. Also shown is the percentage of allocated disk space that is filled. The selection of D (Delete a test session) allows the experimenter to delete sessions no longer needed on disk.

Playback Loader. A separate interactive program, which also runs on the same Human Factors Center computer, performs several functions necessary for preparing an experiment using the Playback program. The primary display menu for this Playback Loader program is shown in Figure 8.

The selection of item A (Add or modify instructions) on the Playback Loader menu allows the experimenter to enter or modify the task descriptions presented to the test user at the beginning of each session. These task descriptions are entered with an arbitrary identifying number. Figure 9 shows a sample display from this function of the Playback Loader program. In addition to the entry of the task description, this function requests a maximum time to be allowed for the task (0 specifies no limit). Also, a specification can be entered to indicate whether, at the conclusion of the task, the experimenter should be called (i.e., a terminal session) or whether the next task should be presented (i.e., a continuous session).

Selection of item D on the menu (Delete instructions for a task) allows the experimenter to delete one of the previously entered task instructions. Selection of item S on the Playback Loader menu (Specify task sequence for a condition) allows the experimenter to specify the order of presentation of the tasks for each test condition. Function keys are used to insert and delete tasks from this list. Selection of item L (List condition numbers stored) lists the condition numbers for which task indexes have been entered.

It is not necessary to use Playback Loader before using the Playback program if defaults are preferred. The default task instruction is the following: COMPLETE THE TASK FOR SESSION __. In the default, all test users receive the tasks in the same order, all sessions are terminal sessions (the experimenter is called), and the maximum session time is the same for all sessions.
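The Loader's bookkeeping reduces to two tables: task descriptions keyed by task number, and a task sequence per condition, with the defaults just described applying when no entry exists. The sketch below illustrates that arrangement; it assumes (as the blank in the default instruction suggests) that the session number is filled in, and none of the names are the original program's.

```python
# Illustrative model of the Playback Loader's tables and defaults.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TaskDescription:
    text: str
    max_time_min: int = 0        # 0 specifies no limit
    continuous: bool = False     # False = terminal session (the experimenter is called)

@dataclass
class LoaderTables:
    tasks: Dict[int, TaskDescription] = field(default_factory=dict)
    sequences: Dict[int, List[int]] = field(default_factory=dict)   # condition -> task numbers

    def instruction_for(self, condition: int, session: int) -> TaskDescription:
        order = self.sequences.get(condition)
        if order and session <= len(order):
            return self.tasks[order[session - 1]]
        # default: same order for all users, terminal session, generic instruction
        # (the session number is assumed to fill the blank in the default text)
        return TaskDescription(text=f"COMPLETE THE TASK FOR SESSION {session}")
```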

History and applications

The first application of the Playback process at the Human Factors Center was in a study of word-processing machine functions and displays conducted in 1980. Users' keyboard activity, which had been collected earlier while they performed selected editing tasks, was replayed through a simulated word processor in order to tutor the users. In this study, the programs simulating the host system and the programs for data collection and for playing back tasks all ran on one of the IBM System/7 computers in the Human Factors Center.

Figure 9 Sample task description

Task: 43   Maximum time: 10   Continue/stop: C

#43: Create a new workspace to be stored in your private library. The name of the new workspace should be NEW. The new workspace should contain all of the functions that start with the letter D in the workspace named CREDIT. Also, NEW should contain the variables that start with the letter C in the workspace named WS2.

PF3: File   PF4: Max time   PF5: Continue/stop   PF12: Cancel

Clauer [4] recognized the usefulness of playing back tasks as a method of data collection. He adapted this technique in an evaluation of the self-training material for a free-standing word-processing machine. The host system was an IBM Displaywriter. An early version of the Playback program ran on an IBM System/7 computer. Keystrokes were collected and recorded while users performed training exercises and did test problems with the Displaywriter. The users' keystrokes were later replayed through the Displaywriter in order to observe problems in the use of either the system or the training book. These problems were recorded by the experimenter during the paced replay by entering error codes into the Playback program to designate the section of the training book where problems occurred. Experimenter comments were also recorded.

The number and severity of user problems were the primary data collected, along with training time, test completion time, and frequency of calls for assistance. Four iterations of this test were conducted, with modifications being made in the training material as a result of the findings of each iteration of the test. The process resulted in very worthwhile improvements in the training, as reflected in several of the performance measures. There were fewer requirements for assistance, fewer significant difficulties, shorter criterion test times (reflecting greater retention), and also somewhat higher subjective ratings of the training manual and skill proficiency.

The Displaywriter test clearly proved the utility of the Playback approach for the evaluation of the effectiveness of training material for a small stand-alone system. The next application of the technique at the Human Factors Center was in a study of how best to introduce novice users to the concepts of an interactive data base query language. The host system was a prototype of a query language running on an IBM System/370 VM system.

Additional Playback pacing methods were incorporated into the Playback program before its next use in a human factors study of IBM BASIC by Bury [1]. The study included an evaluation of a tutorial, an editor, and other interactive aspects of the language. The host system was, at first, a prototype running on a VM system. The user learned how to use the language and editor from a tutorial that required exercises to be entered on an IBM 3277 Display Station. After completion of training, users had to write a number of programs using IBM BASIC. User actions were recorded and later played back through the host system to observe user problems with both the software and the training materials.

Six iterations of this study were conducted, with improvements in both the system and tutorial being made after each test. The first iterations were conducted using a prototype of IBM BASIC, while the later tests were conducted with the actual product code. No modifications were required in the Playback program running on the laboratory computer to accommodate this change. In almost all cases, improvements in user performance measures were achieved with new versions of the system and tutorial.

These experiences with Playback convinced us of the usefulness of the methodology to evaluate both training material and the software itself. The program was then completely rewritten to make it an efficient general-purpose laboratory tool. This new generalized version of Playback was first used in an evaluation of the training material for QMF, a data base query facility. Later, Ogden and Boyle [5] used the same program to compare three different methods of report modification after completing a query of a data base. Prototypes of the user interface for the three methods ran on a VM system. The Playback program was employed to conduct a replay analysis of the users' activities and to collect performance data. The performance data that were obtained clearly revealed the relative usability of the alternate designs. The product developers were then able to select for the product the design that yielded the best user performance.


This version of the Playback program was also used to study the usability of the advanced functions available in IBM BASIC. In this study, experienced programmers attempted to write BASIC programs to solve selected problems. They were required to use one or more of the advanced features on each task. The users needed the Language Reference Manual to learn how to use the advanced features, since none of the programmers were familiar with them. Although the emphasis in this study was on the advanced features of the software, inadequacies in the reference material were also revealed.

Late in 1982, several evaluations were planned for new versions of programming languages in which the emphasis was on the software documentation. Modifications in the Playback program again appeared to be called for. Since these modifications were substantial, a whole new Playback program was designed.

Coding of this new version was completed in the first part of 1983. The largest addition to the program was the capability of recording and displaying codes and comments entered from an observer station. The method of entering error codes and comments during Playback analysis was also modified, as was the method of storing the user's keyboard activity. Another major addition was the ability to display task instructions for each session. The Playback Loader program was developed to facilitate the entry of these instructions, as well as the entry of specifications for such other new options as continuous sessions, varying task orders, and flexible session time limits. A number of other changes were made in the program to increase its utility and ease of operation.

The current version has now been used to evaluate the Application Programmers Guide and Language Reference Manual for a new version of a programming language. It has also been employed to test the effectiveness of a primer for novice users of an interactive system under development.

The early versions of Playback were all programmed to run on the Human Factors Center's IBM System/7 computers. The latest version has been converted to run on the Human Factors Center's IBM Series/1 computer network.

Although we feel that Playback is now a very flexible and useful human factors data-collection tool, we anticipate further improvements to be made as we gain more experience using it in the evaluation of software and software documentation.

Concluding remarks

The Playback methodology has proved itself to be a very effective tool for objective evaluations and comparisons of software, including user interface design and software documentation. The Playback program incorporating this methodology has a number of attributes that make it flexible and useful.

Playback is a general-purpose data-collection program. No changes in the program are required from one application to another. Even radically different host systems (ranging from free-standing word processors to large multiple-user systems) can be monitored with little or no modification of the Playback program. Some setup, of course, is required for each individual experiment. Such preparations as entering task descriptions and various program options are handled by the experimenter using interactive features of the Playback and Playback Loader programs, without requiring the services of a programmer. The discussion in the previous section of the modifications made to the Playback program may have implied the contrary, but those changes were enhancements to the general utility of the program and were not added specifically for any particular experiment.

The Playback methodology requires no modification of the host system software. All experimental control and data-collection functions are in the Playback program running on the laboratory computer.

The data-collection process does not intrude on the user's thoughts or activities. The user operates the actual or prototype system in a separate room without necessarily being aware of the extent of the data being collected on his activities. The experimenter, of course, tells the test user that his performance is being monitored. The user, however, perceives no time delay due to the Playback program's capture of keystrokes before transmission to the host system, nor is he aware of any other interference.

The Playback program provides the unique capability for replaying the user's keystrokes back through the host system for later analysis. This allows the experimenter not only to know what keystrokes were made while the user interacted with the host system, but also to see on the display what the user saw at each point in the execution of the task. The speed of Playback is controlled by the experimenter. Where the user had no apparent problem, the analysis can proceed quickly. Where there were problems, the experimenter can pace the review more slowly or even back up and replay sections requiring careful analysis. Our experience has been that an eight-hour day of user activity can usually be analyzed in an hour or two, depending upon the number of user problems, task complexity, and detail of analysis desired. This is certainly much less analysis time than would be required to review an equivalent amount of user activity had it been recorded on video tape only.

Although the use of an observer station can be somewhat labor-intensive, the station can be manned by a laboratory assistant, thereby freeing the experimenter for other work. This is possible because problem determination is done during the playback analysis, rather than during the session.

Finally, all user actions and observer entries are recorded on-line. Data are thus immediately available for computer-aided summary and analysis.

Acknowledgments

The authors would like to express their appreciation for the valuable contributions made toward the development of the Playback methodology by a number of members of the Human Factors Center. Calvin Clauer was the first to apply the technique as a data-collection tool. James Boyle, Kevin Bury, William Ogden, and Barbara Isa used the technique in early stages of development and suggested improvements. Michael Darnell and Susan Wolfe debugged both the latest System/7 and Series/1 versions of Playback. They also used the program in several studies and helped improve the program's utility as a general-purpose laboratory tool. The operation of Playback would not be possible without the contribution of Rob Cotton, who designed and built the interface logic that allows Playback to be used with a wide variety of stand-alone and terminal devices. In addition, the authors acknowledge ideas obtained from the IBM Product Usability group in Atlanta, Georgia, on techniques for monitoring the use of documentation.

Cited references

1. K. F. Bury, Prototyping on CMS: Using Prototypes to Conduct Human Factors Tests of Software During Development, IBM Human Factors Center Technical Report HFC-43, IBM General Products Division, 5600 Cottle Road, San Jose, CA 95143 (February 1983).
2. R. L. Erdmann and A. S. Neal, "Laboratory versus field experimentation in human factors," Human Factors 13, No. 6, 521-531 (1971).
3. R. M. Simons, A Community Tape, IBM Human Factors Center Technical Report HFC-27, IBM General Products Division, 5600 Cottle Road, San Jose, CA 95143 (December 1977).
4. C. K. Clauer, "Methodology for testing and improving operator publications," Proceedings of Office Automation Conference, American Federation of Information Processing Societies, San Francisco, CA (1982), pp. 867-873.
5. W. C. Ogden and J. M. Boyle, "Evaluating human-computer dialog styles: Command versus form/fill-in for report modification," Proceedings of the Human Factors Society 26th Annual Meeting, Santa Monica, CA (1982), pp. 542-545.

Alan S. Neal IBM General Products Division, 5600 Cottle Road, San Jose, California 95193. Mr. Neal is the manager of the IBM Human Factors Center in San Jose, California. In the twenty years he has been with IBM, Mr. Neal has concentrated on designing and conducting experiments with the aim of optimizing the interface between computer systems and their users. Mr. Neal began his career with IBM in 1964 when he joined the Advanced Systems Division, where he worked as an engineering psychologist. In 1970 he transferred to the Research Division. He was a charter member of the Human Factors Center when it was formed in 1973, and was made a manager of interdivisional projects in the Human Factors Center in 1974. In 1981 he became manager of the software human factors group within the Human Factors Center. Mr. Neal was promoted to Senior Human Factors Engineer and manager of the Human Factors Center in August of 1982. He is a graduate of Purdue University (B.S.) and Iowa State University (M.S.) in experimental psychology. He has been an active member of the Human Factors Society since 1963, and is currently serving that organization as Publications Committee Chair and Managing Editor.

Roger M. Simons IBM General Products Division, 5600 Cottle Road, San Jose, California 95193. Mr. Simons is a Senior Programmer in the Human Factors Center in San Jose, California. In this assignment, he has developed programs that simulate the operator interface and measure operator performance for IBM products, including keyboards, displays, electronic typewriters, word processors, software, and software documentation. Mr. Simons joined IBM in 1955 at San Francisco, where he was an Applied Science Representative. In this position, he assisted IBM customers and IBM sales personnel in developing scientific computing applications. Since 1959, he has held various programming and management assignments in San Jose, including Manager of the Engineering and Scientific Computation Laboratory. Mr. Simons received a B.S. degree in mathematics from Stanford University and a Ph.D. degree in applied mathematics from MIT. He is a member of the Association for Computing Machinery and Sigma Xi, and is a former chairman of the Bay Area Chapter of the ACM.

Reprint Order No. G321-5211.
