10
Collaboration in Performance of Physical Tasks: Effects on Outcomes and Communication Robert E. Kraut, Mark D. Miller, Jane Siegel Human Computer Interaction Institute Carnegie Mellon University Pittsburgh, PA, U.S.A. 15213 Robert. Kraut @ andrew.cmu,edu Mark.D.Miller@ andrew.cmu.edu Jane. Siegel @cs.cmu.edu ABSTRACT We report an empirical study of people using mobile collaborative systems to support maintenance tasks on a bicycle. Results show that field workers make repairs more quickly and accurately when they have a remote expert helping them. Some pairs were connected by a shared video system, where the video camera focused on the active workspace and they communicated with full duplex audio. For other pairs, either the video was eliminated or the audio was reduced to half duplex (but not both). Pairs’ success at collaboration did not vary with the communication technology. However, the manner in which they coordinated advice-giving did vary with the communication technology. In particular, help was more proactive and coordination was less explicit when the pairs had video connections. The results show the value of collaboration, but raise questions about the interaction of communication media and conversational coordination on task performance. Keywords Wearable computers, empirical studies, collaborative work, conversation, media effects, vehicle maintenance. INTRODUCTION This paper investigates the design and value of collaborative, mobile computer systems to support field repair and maintenance of mechanical devices. When a pilot radios in a possible malfunction in an aircraft, the field crew has only a short time to diagnose and repair the problem. Smailagic and Siewiorek [18] have described mobile maintenance systems, which incorporate computers, on-line repair manuals and diagnostic aids. These systems could give field workers access to schematics, repair histories, instructions, parts lists, and other information about the equipment being repaired. With the addition of Permissionto make digital/hardcopies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributedfor profit or commer~al advantage, the copy- right notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee. Computer Supported Cooperative Work ’96, Cambridge MA USA @ 1996ACM o_8979 l_7fj5_o/9fj/l 1 ..$3.50 telecommunications, they can also provide access to remote experts. Designers expect that these systems will be useful for aircraft and other mechanics as well as emergency personnel and other workers who face unusual field conditions and could be supported by documents, data, and helpers that are not at hand. Incorporating an interpersonal communication component in what are otherwise lightweight, portable computers makes their design substantially harder. Empirical field research has documented the value of collaboration when workers are troubleshooting and repairing complex equipment like elevators [2], copiers [13], or telephone equipment [14]. Based on this work, we presume that the addition of communication facilities would be valuable, although we know of no empirical tests of this proposition. Some designers have recommended the use of video as the basis of a shared workspace between a field worker and a remote helper (e.g., [11]). The video could be used to broadcast current conditions in the field to an expert staffing a help desk. For example, an emergency medical field system developed by British Telecom and ABB enables physicians in a hospital remotely to supervise paramedics as they conduct sophisticated procedures on accident victims [1]. Yet, video substantially increases the costs of mobile systems and their technological challenges. Numerous studies of video telephony for white-collar, office tasks have failed to show that video improves task performance in this domain ([3] [12]; see also [20] for an old but still useful review). The applicability of this earlier research, however, may be low, since most video telephony systems use a “talking heads” model, in which the cameras broadcast pictures of the people in conversation. It is possible that video focused on what a worker is doing rather than on the worker may improve communication, by grounding the conversation in a shared view of the rapidly changing world. How people ground their conversation and how they exploit features of communication medium to effectively coordinate their speech with each other are 57

Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

Collaboration in Performance of Physical Tasks:Effects on Outcomes and Communication

Robert E. Kraut, Mark D. Miller, Jane SiegelHuman Computer Interaction Institute

Carnegie Mellon UniversityPittsburgh, PA, U.S.A. 15213

Robert. Kraut @ andrew.cmu,eduMark.D.Miller@ andrew.cmu.edu

Jane. Siegel @cs.cmu.edu

ABSTRACTWe report an empirical study of people using mobilecollaborative systems to support maintenance tasks ona bicycle. Results show that field workers makerepairs more quickly and accurately when they have aremote expert helping them. Some pairs wereconnected by a shared video system, where the videocamera focused on the active workspace and theycommunicated with full duplex audio. For other pairs,either the video was eliminated or the audio was reducedto half duplex (but not both). Pairs’ success atcollaboration did not vary with the communicationtechnology. However, the manner in which theycoordinated advice-giving did vary with thecommunication technology. In particular, help wasmore proactive and coordination was less explicit whenthe pairs had video connections. The results show thevalue of collaboration, but raise questions about theinteraction of communication media and conversationalcoordination on task performance.

KeywordsWearable computers, empirical studies, collaborativework, conversation, media effects, vehicle maintenance.

INTRODUCTIONThis paper investigates the design and value ofcollaborative, mobile computer systems to supportfield repair and maintenance of mechanical devices.When a pilot radios in a possible malfunction in anaircraft, the field crew has only a short time to diagnoseand repair the problem. Smailagic and Siewiorek [18]have described mobile maintenance systems, whichincorporate computers, on-line repair manuals anddiagnostic aids. These systems could give field workersaccess to schematics, repair histories, instructions,parts lists, and other information about the equipmentbeing repaired. With the addition of

Permissionto make digital/hardcopies of all or part of this material forpersonal or classroom use is granted without fee provided that the copiesare not made or distributedfor profit or commer~al advantage, the copy-right notice, the title of the publication and its date appear, and notice isgiven that copyright is by permission of the ACM, Inc. To copy otherwise,to republish, to post on servers or to redistribute to lists, requires specificpermission and/or fee.Computer Supported Cooperative Work ’96, Cambridge MA USA@ 1996ACM o_8979l_7fj5_o/9fj/l1 ..$3.50

telecommunications, they can also provide access toremote experts. Designers expect that these systemswill be useful for aircraft and other mechanics as wellas emergency personnel and other workers who faceunusual field conditions and could be supported bydocuments, data, and helpers that are not at hand.

Incorporating an interpersonal communicationcomponent in what are otherwise lightweight, portablecomputers makes their design substantially harder.Empirical field research has documented the value ofcollaboration when workers are troubleshooting andrepairing complex equipment like elevators [2], copiers[13], or telephone equipment [14]. Based on this work,we presume that the addition of communicationfacilities would be valuable, although we know of noempirical tests of this proposition.

Some designers have recommended the use of video asthe basis of a shared workspace between a field workerand a remote helper (e.g., [11]). The video could beused to broadcast current conditions in the field to anexpert staffing a help desk. For example, an emergencymedical field system developed by British Telecom andABB enables physicians in a hospital remotely tosupervise paramedics as they conduct sophisticatedprocedures on accident victims [1]. Yet, videosubstantially increases the costs of mobile systems andtheir technological challenges.

Numerous studies of video telephony for white-collar,office tasks have failed to show that video improvestask performance in this domain ([3] [12]; see also [20]for an old but still useful review). The applicability ofthis earlier research, however, may be low, since mostvideo telephony systems use a “talking heads” model,in which the cameras broadcast pictures of the peoplein conversation. It is possible that video focused onwhat a worker is doing rather than on the worker mayimprove communication, by grounding theconversation in a shared view of the rapidly changingworld.

How people ground their conversation and how theyexploit features of communication medium toeffectively coordinate their speech with each other are

57

Page 2: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

important theoretical issues in their own right. Inaddition, answers to these questions can also provide atheoretically motivated basis for many designdecisions. Both Clark [5] and Krauss and Fussell [10]argue that in coordinating speech, conversationalistsneed to achieve mutual knowledge or common ground.That is, when Person A forms an utterance for PersonB, A uses the history of the conversation, their sharedcontext, their prior knowledge of each other, and otherinformation to infer what B would understand. Thus,Adam can say to Eve, “We shouldn’t have eaten thatfruit” because they have a common history that leadsAdam to infer that Eve and he will have the same applein mind, and he knows that Eve will know this aswell. Clark and Wilkes-Gibbs [5] argue that when aspeaker offers an assertion, both parties to theconversation actively collaborate in making sure thatthe assertion is understood and that both parties knowthat it has been understood before the speaker can offeranother assertion.

Different communication media have features thatchange the ease with which conversationalists achievecommon ground and the methods that they use toachieve it. Krauss and his colleagues ([8] [9] [10])have shown that speakers become more efficient indescribing objects with repetition, because they canassume that their partner understands the shorthand theyare using. This growth in efficiency, however, isdisrupted by delays of the sort introduced by satellitesor video compression, because with delay a listenercannot easily indicate what he or she has understood.Clark and Brennan [4], in a speculative article, tried todecompose the differences among communicationmodalities in terms of their features and the costs thesefeatures impose on achieving common ground. Theynoted, for example, several features of face-to-facecommunication that change the cost of achievingcommon ground: visual co-presence, which allows aspeaker to refer efficiently to objects that are in thespeaker’s and listener’s shared view; sequentirdity ofmessage arrival, which allows a speaker to know whata listener has already received before creating a newmessage; and ephemerality of messages, which meansthat speakers cannot assume that listeners have accessto all the details of a previous message.

We are interested in how these general principles ofconversational coordination are deployed in the contextof help-giving around a physical repair task. For helpto be effective, workers getting the help must receive itwhen they need it and when their preconditions forunderstanding it and taking advantage of it have beenmet. To effectively give help, experts must continuallymake inferences about the worker’s internal cognitiveand motivational state and time their contributions to

trouble, is finished with a previous subtask, will beable to understand the technical vocabulary that is theefficient shorthand for describing components, and hasfound the correct parts of the bicycle to apply therecommended procedures.

The cues that allow experts to make these inferencescan consist of explicit communication activity on theworker’s part or may be side effects of the worker’s taskperformance, with no explicit communicative intent.That is, the worker may explicitly ask for help(“What’s the next step”), ask for help using an indirectspeech act (“Ok, I finished the first step”), or verballyindicate confusion about the task in such a way that theexpert provides unsolicited help (“The wheel won’t goin”). Or the expert may infer that the worker is readyfor the next step in the procedure by listening to thesound of the tools or by directly observing the worker’sperformance.

The cues that experts have available are likely to varywith the technology for communication, for tworeasons. Consider the effects of video on coordination.When the worker and expert share a visual workspace,the expert can receive feedback from the task itself toprecisely time when he gives instructions and whichinstructions to give. Moreover, it is also likely thatworkers will change their explicit communicationbecause they are aware that the expert can see theirperformance. In contrast, when they do not have ashared visual workspace, they need to describe whatthey are doing, because the expert has no other way ofknowing. As a result, we expect workers will be moreexplicit in describing their state when the experts cannot directly observe it, and that experts will be moreexplicit in asking for information about the workersstate when they can’t directly observe it.

In the research described here, we examine in detail theway that visual co-presence effects how peoplecoordinate their conversation. By comparing pairs ofpeople who share visual information about a task thatone of them is performing with people who do nothave this shared visual space, we can examine howpeople exploit visual co-presence to achieve commonground. Because we are using a new referentialcommunication task, we also included the contrastbetween full-duplex audio and half-duplex audio as atechnological variable, expecting to find as did previousresearchers that half-duplex audio degrades theachievement of common ground [8] [9] [10].

In summary, the research we report on has two foci:(1) an applied focus on the development of mobilesystems and (2) a theoretical focus on communicationand coordination of work.

moments of readiness. In repairing a bicycle forexample, the task we use in this research, an expertmust continually assess whether a worker is having

58

Page 3: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

METHODStudy OverviewWe designed a study where unskilled workers performedthree repair tasks on a ten speed bicycle : 1) replacing aseat on the saddle post, 2) adjusting the front brakes,and 3) installing the front derailleur. Workers alwayshad access to an on-line repair manual. They eitherworked alone or with a remote expert. When theyworked with the expert, they were connected or notwith a video link between the worker and expert so thatthe expert could see what the worker was doing. Wealso varied the quality of audio connection between theworker and expert (either full or half-duplex audio).The experimental design was an incomplete factorial,in which we compared: (1] solo performance tocollaborative performance and (2) collaborative pairswith video and full-duplex audio connections to pairswho were connected by full duplex audio without videoor to pairs connected with video, but only half-duplexaudio. Study participants were randomly assigned to asingle treatment.

Figure 1: Worker wearing collaborative system.(Photo credi~ David Kaplan)

Each worker donned the apparatus shown in Figure 1, ahard-hat on which were mounted various display andaudio/video telecommunications devices.

The devices included a sports caster style Radio Shack49 MHz microphone, headphones, and a tiny VirtualVision VGA monitor mounted in front of the righteye, with optics that placed the image directly in frontof the eye. The visual display had VGA (640x480pixel resolution) and enabled the worker to view theon-line manual bicycle manual text, parts, and pictures.They also wore a small CCD camera mounted on thebrim of the hat above the left eye.

In the video conditions, both the worker and expertcould see the output from the camera on their screenand output from a camera focused on the face and uppertorso of the remote expert, using Inters Proshare video

con ferencing technology 1. The worker’s camera sawapproximately what the worker was pointing his headat, which in many cases was what the worker was alsoattending to. There were problems with the angle ofthe camera, insufficient resolution for close-up views,variable frame rates (when the worker moved a lot,there was increased delay in video updating), andregistration of the camera with what the worker saw.Despite these problems, the simple user interfaceallowed the expert to keep track of what the worker wasdoing most of the time. The view from the videoconditions is shown in Figure 2.

Figure 2. Display for the video conditions

The subjects were tethered by a 20 foot cable thatprovided power, audio input, and VGA input to thesystem. Also, they used a remote control/radiocontrolled trackball to navigate the shared informationspace with the expert.

The experts were in a separate room in front of a 17inch VGA monitor and communicated with the workerthrough an audio link. In each experimental condition,workers and experts had the same view on theirdisplays. The always saw the on-line manual, and boththe worker and expert could control the cursor and flippages.

SubjectsSixty-four Carnegie Mellon University undergraduatestudents were recruited to participate in thisexperiment, of whom sixty completed it. One subjectdid not finish due to equipment failure and one subjectfailed a required eye test. Two withdrew from theexperiment due to headaches. The students wereparticipating in a subject pool for course credit. Sixty-nine percent of the participants were male. Sixty-threepercent owned a bicycle. All subjects also received a

1 Because Proshare introduces an audio delay, we by-passed Proslm.re for the audio.

59

Page 4: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

candy bar and competed for a $20 bonus for the subjectwith the fastest completion time and highest qualitytask performance.

Two bicycle repair experts also participated in thestudy. They had both worked in a bicycle repair shopfor over a year and were native speakers of English.The experts were tested for their ability to complete theexperimental task by themselves and their ability tocommunicate the appropriate procedures. They weretrained for the specific repairs they would advise onduring this experiment. They were paid for theirparticipation.

ProcedureSubjects reported to a room to complete consent forms,a battery of tests and questionnaires. Then subjectswere taken to the experimental lab and suited up for thetask with a jump suit to protect them from dirt. Theyalso put on the hard hat shown in Figure 1 and alumbar pack containing some of the components andcontrols. The hard hat was fitted on the subjects’ headand adjusted by the experimenter so that it wascomfortable and so that the camera tracked the workersgaze. Then the subject was given a brief eye test,reading several lines of text with diminishing font sizefrom the head-mounted monitor. Next, subjects wereinstructed on how to navigate through the hypertextmanual: they used two hypertext buttons, one forforward and one to go back. As soon as the subjectcould navigate, they were instructed and given apractice task to assemble a small part of a toy car. Forthe collaborative cases, the expert was introduced priorto the practice task. Subjects were given finalinstructions and introduced to the tool set they used (aflathead screw driver, crescent adjustable wrench, pliers,hammer, and socket wrenches).

Then the experiment began, with one experimenter inthe same room as the worker, behind a computer usedfor real-time coding of communication behavior. Inthe solo condition the subject simply did each task. Inthe collaborative condition, the expert and subject couldcommunicate at will, but the expert had the followingrules to follow: (1) answer any question asked, (2) tryto give the best answer, (3) if the subject was quiet forone minute, ask if they were doing alright, and (4) offerhelp or advice if the subject is doing or is about to dosomething incorrectly.

Following completion of each task, subjects were

asked a set of questions by the experimenter about theirexperience during that task. At the completion of thethree tasks, the subjects removed the equipment andreturned to the initial room where they took a test ofbicycle repair knowledge and completed a questionnaireabout their experience. They were debriefed, given acandy bar and dismissed.

MeasuresDemographic data for each subject included: age,gender, year in school, major field of study, and gradeaverage. Data about subjects’ experience and skill tocarry out the repair tasks included: self-reportedratings of computer ability (familiarity, comfort, andaverage daily use), bicycle experience (ownership,familiarity, comfort, and average daily use), history inspecific bicycle repairs (e.g., fixing a flat tire, adjustingthe handlebars), as well as their familiarity and comfortwith tools. Subjects completed the Eckstrom et al.spatialhisual abilities test [7]. They also completed aten-item multiple choice instrument to measure bicyclerepair knowledge, developed based on information fromvan der Plas [19] and Sloane [17].

An experimenter coded the worker and expert’s speechin real time during the experiment, using MacShapa[15] Video and audio recordings of the sessions werethe basis for verbatim transcripts and more detailed,post-experimental coding of the communication.

Post-session, on-line questions asked participants todescribe their experiences using the mobile system.Workers rated their perceptions about ease of findinginformation, seeing the workspace, the ease of hearingthe remote expert, being heard by the expert, and beingunderstood by the expert.

Measures of task performance include the number ofthe tasks that the workers completed, participants’ worktime for each task they completed, and repair quality ofeach task they completed. To assess repair quality,both the experimenter and the session expert checkedthe worker’s repair against a checklist, assessing suchdetails as whether the saddle was level to the groundand tight and whether the brake anchor was setcorrectly.

RESULTSWe present the results in two parts, first examining theeffects of collaboration and communication media onmeasures of task performance and then examining theeffects of media on communication patterns among thecollaborative pairs. The examination ofcommunication includes data from the real-time codhtg,more detailed, post-session coding of a single subtask(attaching a brake straddle cable), and several vignettesthat illustrate the way that workers and expertscoordinated their speech.

PerformanceWorkers’ self-rated mechanical expertise, their gender,and their spatial abilities were associated with theirspeed and accuracy in performing the repair tasks, andtogether explain 5’?ZOof the variance in the number oftasks completed, 2570 of the variance in the log of timeto complete the tasks and 207. of the variance in the

60

Page 5: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

quality of their repairs. They were included as controlvariables in the analyses that follow.

Workers do substantially better at performing theserepair tasks with collaborative help. Average time tocomplete the tasks with a remote expert was half aslong as in the solo condition (7.5 versus 16.5 minutesrespectively; t(54)=4.54; p < .001). In contrast, beinga standard deviation more mechanically skillful thanaverage only reduced completion times by 2.4 minutes(t(54)=2.87; p c .01). More of the workers who had aremote expert completed all three repair tasks thanthose who had no expert (9390 versus 6290; p < .01).Finally, the quality of the repairs they completed wassuperior when they had a expert than when they workedsolo (79Y0 of the quality points for the collaborativecondition versus 5 lfZO for the solo condition;t(54)=6.50; p < .001),

While having access to an expert dramatically improvedperformance, having better tools for communicationwith the expert did not improve the number of taskscompleted, the average time per completed task, orperformance quality (for all 3 dependent variables, F(2,42) < 1; p> .5). In particular, neither video (thecomparison of the full duplex/video condition with thefull duplex/no video condition) nor full duplex audio(the comparison of the full duplex/video condition withthe half duplex/video condition) helped workers performmore tasks, perform tasks more quickly, or performthem better.

CommunicationWhile technology in this experiment did not changeperformance, it did influence how workers and expertscoordinated their activities. We concentrate here on theeffects of video (i.e., the contrast of the fullduplex/video condition with the full duplexho videocondition) on experts’ ability to give effective help.When video is present, the worker and expert have asimilar view of what the worker was doing, on amoment by moment basis. Our goal in this section isto understand how this common view changescoordination of conversation.

Real-time speech coding. To examine how mediachanges coordination strategies, a single experimentercoded experts’ and workers’ communication behavior inreal-time, during the study, using the following codes:

0

9

e

Workers’ questions about the task or technology.Workers’ descriptions about the state of the task ortechnology.Experts’ questions about the workers’ state.Experts’ help about how to perform the task.Worker or Expert acknowledgments ofpartner’s communication.

their

NoSpeech act Video Video p

Worker questions 34.8 28.5 .44

Worker state descriptions 40.3 55.2 .02

Worker acknowledgments 32.9 34.6 .72

Expert questions 13.7 16.8 .30

Expert help 111 93.7 .24

Expert acknowledgment 7.67 13.4 .003

Probability of worker .95 .92 .26question being followedby expert help

Probability of worker .56 .42 .001description beingfollowed by expert help

—.. . -.I able 1: Speech acts by the presence of videoNote. Real-t~me coding of t&-ee bike repair tasks.

These codes represent over 909’0 of speech activityduring a session. They describe in a very rough waywhat worker and expert were talking about, but they donot differentiate some interesting speech behaviors(e.g., whether a description or help statement wasproactive or a reaction to a prior speech act). Inter-raterreliability analyses of the real-time codes based onrecoding of 3 videotapes by five judges show Cohen’skappas in the mid 40s for the set of 6 codes. As onewould expect from coding done in real time, judgmentsare substantially more reliable than chance, but containa substantial amount of error. The practicalconsequence of both the small number ofdifferentiations made and their relatively lowreliabilities is that we will be able to observe onlygross differences in the communication behavior bytechnology condition. This dataset does not have thepower to detect subtle differences among conditions.

Table 1 shows the average number of speech-acts in thefull-duplex video and no-video conditions and theprobability that difference between conditions couldhave occurred by chance. A comparison between thetwo conditions suggests how a shared visual spaceinfluences conversation.

As seen in Table 1, in the no-video condhion, workerswere more explicit in describing their task (i.e., morestate descriptions). More of the content of theircommunication was about their state, e.g., what toolor part they were holding, what they had just

61

Page 6: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

completed, and what they were seeing. Second, expertswithout video were more likely to acknowledge workercomments (e.g., to utter “yes”, “uh huh”), as if theyneeded to be explicit about having heard the workersquestions and descriptions.

The bottom two rows in Table 1 show the conditionalprobability of one speech act following another. Thedifferences in the probability of workers’ descriptionbeing followed by expert help in the video and non-video condtions, shows that experts treated a worker’sdescription of state differently depending on whethervideo was available. When video was available,experts seem to treat workers’ descriptions of state asimplicit request for help; they followed with help in56% of the cases. On the other hand, since thedescriptions were more necessary for simply trackingwhat workers were doing in the no-video condition,they were followed by help in only 4270 of the cases.In contrast, an explicit request for help was followed byhelp in over 92% of cases, and this contingency didn’tdiffer by experimental condition.

Speech coding of the straddle-cable subtask. Toexamine these phenomena in more detail and toimprove the reliability of our coding, we usedvideotapes and transcripts to code a single subtask—attaching a brake anchor plate to a straddle-cableconnecting the two brake pads. As can be seen inFigure 2, workers needed to latch the lip on the back ofthe anchor plate over the straddle cable. To do this,they needed to create slack in the cable, either byreleasing the brake quick release lever (not shown inFigure 2) or by pressing the break pads towards the tirerim.

We divided workers’ state descriptions and experts’help, to distinguish reactive cases, in which the speakerwas responding to an explicit request for informationfrom a partner, from proactive cases, in which thespeaker seemed to initiate the description or help. Inaddition, we split the experts’ help into cases ofinstruction, in which he was telling the worker whatprocedural steps to take, from clarifications, in whichhe attempted to clarify the language he had previouslyused.

Turn Speech

1 E. Then you can hook the plate on. Isn’t

~

there hooks on the back of that plate?

E The back of the that anchor plate iswhat it’s called. That you’re holding

Table 2. Conversational fragment.

In the example in Table 2, turn number 1 is aninstruction, while both sentences in turn number 3 are

clarifications, in which the expert informs the workerof the name of the plate that he has previously referredto.

Again we see that people coordinate differentlydepending on whether they have video present. In thestraddle-cable subtask, all of the action is in theexpert’s behavior. When experts could see the worker,they gave more proactive help (i.e., without the workerexplicitly asking for it), presumably because they couldsee when the worker was finishing a task or when theworker was having trouble. In addition, they weremore likely to clarify and elaborate their priorinstructions, presumably because they could betterunderstand when the workers were understanding themand when they were not. Finally, they were less likelyto offer acknowledgments. This last finding isinconsistent with the data from the real-time coding anddeserves futher investigation.

NoSpeech act Video Video p

Worker questions 2.0 1 .26

Worker proactive state 2.0 1.1 .70descriptions

Worker reactive state .60 .7 .92descriptions

Worker acknowledgments 4.5 3.4 ,35

Expert questions 1.2 .8 .31

Expert proactive help 7.4 3.8 .03

Expert reactive help 1.2 .8 .48

Expert clarifications 1.9 .04 .04

Expert acknowledgments 1.8 2.1 .04

Table 3 Speech acts by the presence of videoNote. Coding of the straddle-cable subtask, based ontranscripts and video tape.

Vignettes. We can illustrate the ways that people adjusttheir conversation to the resources they have availableby looking at several cases in more detail. Tables 4through 5 present annotated transcripts from twosessions in which the dyad had video and one session inwhich they did not. Our annotations are based onmultiple viewing of the video and readings of thetranscripts. While they are inferences about theworkers’ and experts’ internal states, they represent theinformed judgment of two observers who had repeatedlyviewed these videotapes. They reflect our assessmentsof the inferences that the workers and experts must

62

Page 7: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

have been making to utter the words they said, in lightof their other behaviors.

Transcript Description

E. Ifit, ifit, uh, won’tseem to go on.

E. I didn’t know thisuntil yesterday. Butthere’s sort of a quickrelease-mechanism forthe anchor plate.

E. It’s above it.

E. You just kind of,uh, flick it over to theright and it will release abit of slack.

E. Yeah.

W. Oh, OK, I got it.

Expert visuallyestablishes that workersis having troubleattaching straddle cable.

Expert establishes theprecondition for givinginstruction—term forand location of theanchor plate.

Expert visuallyestablishes that workerknows what the quickrelease mechanism is andwhere it is.

Expert providesinstruction.

Worker acknowledges hehas understood theinstruction. Expertvisually establishes thatthe worker ismanipulating the correctcomponent

Worker announces hehas completed thissubtask.

I

Table 4. Precise timing of instruction on the basis ofvisual inspection.

Our examination of these video tapes and transcriptsstrongly suggest that experts were using the knowledgethey gained from the video channel to determine whatinstructions to give the worker and to precisely timethem. The experts could deliver appropriate instructionat an appropriate time when they did not have thevideo, but this required more effort. When they did nothave video, both the worker and expert needed to encodein language the worker’s state and the state of the taskfor timing of instructions to be effective. We make no

claims that the vignettes we have analyzed arerepresentative of our entire corpus. We present them togive the reader a richer sense of the mechanisms thatthe dyads use to coordination their speech, and the waysthat the video channel influences coordination.

Table 4 is an example of a conversation from thevideo condition. In it, the expert infers that the workerknows what a quick release mechanism is, because theworker goes directly to it when the expert firstmentions it. Thus, the expert relies on the video toinfer that the worker u derstands what the quick release

$lever is and when the orker locates it on the bike. Theexpert modifies his instruction on the basis of thevideo.

In Table 5, the expert uses visual information to makeinferences about a worker’s approach to the task. Basedon the worker’s behavior, the expert first believes thatthe worker has identified the correct problem(increasing slack in the brake cable assembly) and has aworkable solution (squeezing the brake pages together).As a result, the expert delays giving his standardinstruction about the quick release mechanism. Whenhe notices that the workers is attempting to unscrewthe brake cable where it is attached to the hand brake,he gives his instruction about the quick releasemechanism, but does so in a mitigated way because theworker hasn’t explicitly asked for help and because theexpert is proposing a solution that conflicts with theworker’s theory that slack can be created by squeezingthe brake shoes together. Through the rest of theepisode the expert uses a combination of verbal andvisual cues to determine how much additionalinstruction to deliver and when to deliver it

The two preceding conversations, taken from the videocondition, contrast with the conversation in Table 6,where no video was available. In Table 6, the expert ismore explicit in describing various components, doublechecking that the worker has understood hisdescription. He withholds instruction until he is surethat the worker has identified the correct parts, butmust rely upon language to make that assessment. Theworker is more explicit in using language to indicatewhen she has identified the parts and has completedelements of the subtask. An interesting episode in thisconversation occurs when the expert uses the worker’svocabulary to create the link between his technicaldescription and the worker’s knowledge (“that anchorplate - that little horseshoe thingy”).

63

Page 8: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

Transcript Description

E. I guess you have tomake sure that, in the,in the diagram in themanual.

E. that the anchor plateis in place on the cable.And that’s about it.

W. OK (3 seeond pause)OK, I’m just...

E If you’re havingtrouble getting theanchor plate on you willfind that there’s a, aquick-release mechanismin the brake.

E. It’s just abovewhere the anchor plate isconnected to the cable.

W. Oh, OK, I see.

E Yeah.

W. Oh, OK, I see.Thank you (3 secondpause).

W. OK.

Expert uses a diagram toidentify the componentsfor the worker. He usesthe video to see that theworker is having troubleattaching straddle cable.

During this interval, theworker is squeezing thebrake shoes together,allowing the expert toinfer that a) he hasidentified the relevantparts and b) has anappropriate theory abouthow to create slack onthe brake cable.

Expert can see in thevideo that the worker’shands are moving awayfrom the brakecomponents to thehandlebars and the brakelevers.

Expert gives a mitigatedform of instruction.

Clarification about thelocation of the quickrelease lever.

Worker acknowledgesthat he has found thequick release lever.

Expert sees that workeris manipulating thecorrect component.

Worker indicates that heunderstands theinstruction and has foundthe correct components.

Worker announces hehas completed thesubtask.

Table 5. Expert infers a worker’s theory of operabased on visual inspection.

DISCUSSIONIn summary, this research has shown that collaborationwith a remote expert substantially improves a lessskilled person’s ability to perform maintenance andrepair tasks. It also shows that the technology that thecollaborators have available to them effects the mannerwith which they communicate and how they articulatetheir speech with their task performance. Workers wereless explicit in describing the state of the physicalworld and what they had accomplished when theyshared a view of the work environment with theircollaborators. When they shared this view, expertswere more likely to offer proactive instruction, basingthe instruction they delivered and when they delivered iton a combination of the worker’s explicit descriptionsand their visual inspection of the worker’s behavior.When the shared view was available, experts were morelikely to treat the workers’ explicit description of stateas an implicit request for help, which they thenprovided.

Experts used the shared visual space as a basis forassessing the worker’s readiness to receive help as wellas for inferring what help the worker needed. Theyused images from the camera (where it was pointed andwhat the workers was touching), along with languagebehavior (e.g., back channel responses such as “ok’ andmore extended questions or comments that indicatedconfusion) to infer several aspects of the worker’s state.These inferences included a) whether the workerunderstood the language they were using to describe arepair, b) whether the worker had an appropriate theoryabout how bicycles operated and how the repair was tobe done, c) whether the worker had identified thelocations in the physical environment where the taskswere to be done, and d) whether the worker had createdthe conditions on the bicycle to be ready for the nextstep in the repair process. The experts also referred toand pointed to drawings in the repair manual totranslate between their technical instructions and thephysical world in which the repairs needed to be done.While we have shown that collaborators used the videoto coordinate their language, we have also shown thatthey can achieve their collaboration by language alone.We found no evidence that differences incommunication technology influenced success incollaboration.

These findings should be viewed as preliminary. Someof our findings are based on real-time coding of a largesample of conversation (about 30 minutes from each of45 pairs of subjects). This coding, however, wasrelatively crude, with a small number of categories andonly moderate reliability. Other findings are based onmore reliable coding, but of only a single subtask. Weare in the process of extending these analyses.

64

Page 9: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

Transcript Description

E. The next thing we Expert gives overallgot to do uh, ok, hook instruction.the brake cable up.

E. The brake cable’s Expert tries to figure outunhooked right now. if worker knows whereUhm ... if you look up the anchor plate is.uh ... ju uh ... to the top Worker’s lack ofpart of the brakes here. responses causes the

expert to give moredetailed informationabout the location.

W U-huh Worker indicates she’slocated the right spot.

E You see uh this, Expert uses the manualthis loose cable that to indicate the correctkind of goes across components.from this one arm tothis other arm.

w. m Worker hasn’t identifiedthe location yet.

E. Called the straddle Expert labels the part,cable for more efficient

communication.

W The little tiny one. Worker misidentifiesIt’s got a little kindy, part.tiny horse shoe thingyhanging from it

E No, that’s called, The expert corrects thethat’s the actual cable worker’scoming from above. misunderstanding of the

part.

W. Right The workeracknowledges that shemisunderstood.

E Ok, there’s also a Expert returns to usingsecond part, which is the manual to indicatethis cable that just kind the correct part. Heof loops around from moves the cursor overhere to here. the part in the diagram.

W. Ok. I see it. It’s The worker reports shehooked between the two has identified the partright now. and gives a brief

description to show this.

E. Right. Right. It’s Expert acknowledgesattached on both ends. that she has identified

the correct part, andprovides furtherdescription to solidlyground that they arereferring to the samepart.

w: Right. Worker acknowledgesthe expert’s description.

E: Ok. Uhm, what The Expert continues theyou want is, you want instruction. He refers toto hook that uh. . . that the anchor plate by theanchor plate - that little previous descriptionhorseshoe thingy. It. ., given by the Worker.

W Uh huh.

E:, ,, hooks on the Finishes the instruction.straddle cable

—.. < ---- -------- ----‘ltible b. Worker and expert use language to compensatefor lack of video.

This research raises both theoretical and engineeringquestions. The finding that changes in technologydidn’t lead to changes in collaborative success deservesadditional investigation. It is possible that this lack ofrelationship represents a failure to measurecollaborative success sensitively enough to capturepotentially subtle effects of technology on outcome.Our failure to replicate earlier studies, which showedthat half-duplex audio degrades communication, isconsistent with this explanation. Another possibilityis that different methods for coordinating conversationare functionally equivalent to each other, at least withinlimits, because people can modify their behavior tocompensate for constraints imposed by technology.This explanation is consistent with the finding thatpeople were more explicit in describing state when theydidn’t have a video image to show their state to theirpartner. We need additional analysis to learn howpeople adapt to these different conversational resources.

While collaborators did take advantage of a sharedvisual space provided by a video camera, the displayitself was not sufficient to substitute for real co-presence. The user interface to the camera—mountinga continuous feed camera on a hat so that headmovement produced camera movement—was valuablebecause it allowed workers to concentrate on their taskswith little regard to controlling the video. Somefeatures of the interface were unsatisfactory, however.Parallax and other misalignment problems meant thatthe camera sometimes did not show what the workerwm attending to (cf. [1 l]). A cemera with a wider fieldof view, an intelligent camera controller that could

65

Page 10: Collaboration in Performance of Physical Tasks: Effects on ...research.cs.vt.edu/ns/cs5724papers/5.usageenvir... · the basis of a shared workspace between a field worker and a remote

keep a predefine object in view, or the expert’s controlof the camera could all help this problem, but mayraise difficulties of their own. Assuming that sharedvideo workspaces prove valuable, designing them sothat they are visually adequate, but don’t distract fromthe task at hand will still be challenge.

ACKNOWLEDGEMENTSThis study was conducted with support from thefollowing institutions and individuals: Daimler BenzAG, Advanced Research Projects Agency, The IntelCorporation, National Science Foundation EquipmentGrant #90225 11, Richard Martin, Dan Siewiorek, LenBass, Kathleen Carley, Bonnie John, Malcolm Bauer,John Stivorek, Sandra Esch and Tom Pope. JordanAnderson and David Kaplan were exceptionally helpfulin all phases of this research.

REFERENCES[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

BT Development (1993). Camnet videotape.Suffolk, Great Britain: Author.

Cash, J., O’Neil, J. & Ostrofsky, K. (1991). OtisElevator Co.: Managing the Service Force.Cambridge, MA: Harvard Business Case 9-191-213.

Chapanis, A. (1975) Interactive humancommunication. Scientific American, 232, 36-42.

Clark, H. H. & Brennan, S. E. (1991).Grounding in communication. In L. B. Resnick,R. M. Levine, & S. D. Teasley (eds.).Perspectives on socially shared cognition. (Pp.127-149). Washington, DC: AmericanPsychological Association.

Clark, H. H. & Wilkes-Gibbs, D. (1986).Referring as a collaborative process. Cognition,22, 1-39

Clark, H. H. (1992) Arenas of language use.Chicago: University of Chicago Press.

Eckstrom, R., French, J.W., Harman, H. H., &Dermen, D. (1976). Kit of factor-referencedcognitive tests. Princeton, New Jersey!Educational Testing Service.

Krauss, R. M. & Bricker, P. D. (1966). Effectsof transmission delay and access delay on theefficiency of verbal communication. Journal ofthe Acoustical Society, 41, 286-292.

Krauss, R. M. & Fussell, S. R. (1990) Mutualknowledge and communicative effectiveness. In J.Galegher, R. Kraut, & C. Egido (Eds.),Intellectual teamwork: Social and technological

[10]

[11]

12]

13]

bases of cooperative work. Hillsdale, \ NJ:Lawrence Erlbaum Associates. Pp. 111- 145.,

Krauss, R.M. & Weinheimer, S. (1!66).Concurrent feedback, confirmation and theencoding of referents in verbal communication.Journal of Personality and Social Psychology, 4,343-346.

Kuzuoka, H., Kosuge, T., and Tanaka, K.. (1994)GestureCam: A video communication system forsympathetic remote collaboration. Proceedings ofCSCW 94, Chapel Hill, North Carolina, 35-43.

McCarthy, J. C., Monk, A. F., (ih press)Measuring the quality of compute~-mediatedcommunication. Behavior and InformationTechnology.

Orr, J. (1989) Sharing knowledge, celebratingidentity: War stories and community memoryamong service technicians. In D. S. Middletonand D. Edwards (Eds.), Collective Remembering:Memory in Society. London: Sage Publications(in press).

[14] Sachs, P. (1995). Transforming work:Collaboration, learning, and design.Communications of the ACM, 38(9), 36-44.

[15] Sanderson, P, Scott, J., Johnston, T., Mainzer, J,Watanabe, L., & James, J. (1994). MacSHAPAand the enterprise of exploratory sequential dataanalysis. International Journal of Human-Computer Studies, 41, 633-681.

[16] Siegel, J., Kraut, R.E., John, B.E., Carley, K.M.(1995) An Empirical Study of CollaborativeWearable Systems, CHI Companion 95, Denver,Colorado, 312-313.

[17] Sloane, E.A. (1991) Sloane’s new bicyclemaintenance manual. Simon & Schuster, NewYork.

[18] Smailagic, A. and Siewiorek, D. (1993). A casestudy in embedded systems design: VuMan2wearable computers. IEEE Design and Test ofComputers,.

[19] van der Plas, R. (1993) The bicycle repair book,San Francisco: Bicycle Books, Inc.

[20] Williams, E. (1977). Experimental comparisonsof face-to-face and mediated communication: Areview. Psychological Bulletin, 84, 963-976.

66