
International Journal of Industrial Ergonomics 39 (2009) 192–201

journal homepage: www.elsevier.com/locate/ergon

Evaluation of human–agent user interfaces in multi-agent systems

Chang S. Nam a,*, Steve Johnson a, Yueqing Li a, Younho Seong b

a Department of Industrial Engineering, University of Arkansas, 4207 Bell Engineering Center, Fayetteville, AR 72701, USA
b Department of Industrial and Systems Engineering, North Carolina A&T State University, Greensboro, NC, USA

Article info

Article history:
Received 6 November 2007
Received in revised form 24 July 2008
Accepted 21 August 2008
Available online 25 October 2008

Keywords:
Human–agent user interface
Multi-agent system
Usability evaluation
Human factors

* Corresponding author. Tel.: +1 479 575 2563; fax: +1 479 575 8431.
E-mail addresses: [email protected] (C.S. Nam), [email protected] (S. Johnson), [email protected] (Y. Li), [email protected] (Y. Seong).

0169-8141/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.ergon.2008.08.008

Abstract

A multi-agent system (MAS) is a computational system in which two or more intelligent agents work together to perform a set of tasks. MAS's have had a significant impact in a wide range of applications. However, little attention has been paid to the design of human–agent user interfaces. Our research was conducted primarily to further explore interaction between humans and intelligent agents, as well as to attempt to solidify foundational concepts for intelligent multi-agent system interface design. Based on the foundation of the commercial PC game Unreal Tournament 2003, a framework was developed that utilized fundamental interface design principles as well as newly designed guidelines. Three user interfaces were designed for an experiment conducted with 25 participants. Their performance was evaluated via effectiveness, efficiency, satisfaction, and workload metrics. Through the testing of scenario tasks, an in-depth study was conducted on the collaboration between the human user and intelligent agents within a system. Finally, the paper summarizes a set of design principles and guidelines developed through the study, which could easily be applied to the design of future user interfaces for multi-agent systems.

Relevance to industry: A deeper understanding of how appropriate information is provided, in the correct form and at the right time, to users interacting with multi-agent systems should have a broad impact on the future development of intelligent multi-agent system interfaces.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

A multi-agent system (MAS) is a computational system in which human user(s) and two or more intelligent agents work together to perform a set of tasks (Brooks, 1991; Krupansky, 2003; Lesser, 1999). Intelligent agents are "software entities that carry out a set of operations on behalf of a user or another program with some degree of independence or autonomy" (Wickramasinghe et al., 2004, p. 603). An MAS can be utilized to reduce human cognitive workload in complex and time-consuming tasks (Maes, 1994; Scholtz and Bahrami, 2003; Tambe et al., 1995). Generally, multi-agent systems are used for tasks that could be considered unreasonable or too costly for humans to perform and maintain (Laird and van Lent, 2001). MAS's have had a significant impact in a wide range of applications such as aircraft maintenance (Shehory et al., 1999), mission planning (Lenox et al., 2000; Pechoucek et al., 2002), traffic control (Hexmoor et al., 2006), information networks (Gerber and Klusch, 2002; Vlahavas et al., 2002), and education (Cobo et al., 2007). Such rapid growth of agent-technology applications has also generated a need for systematically designing interactions between the user(s) and agent teams. Because the agents perform tasks on behalf of the human user (e.g., information processing), an MAS should support the appropriate form of interaction between agents and the user (Bradshaw, 1997). For example, user interfaces should be designed in a way that allows the user interacting with the MAS to clearly know what an individual agent is currently doing, when intervention is necessary, and what information is needed to make effective interventions (Scholtz, 2002).

However, little attention has been paid to the design of human–agent user interfaces, including the way that the human user interacts with agents, as well as the manner by which the appropriate information is provided to the user in the correct form at the right time. Based on the foundation of the commercial PC game Unreal Tournament 2003, a framework was developed that utilized both fundamental design principles and newly designed guidelines – such as the Layout Ranking Heuristic – in order to design and develop user-centered interfaces. These interfaces were used to enhance human collaboration with intelligent agents in a set of uniquely designed scenario tasks. Through the testing of these scenario tasks, an in-depth study was conducted on the collaboration between the human user and intelligent agents within the system. The primary objective of this study was to systematically evaluate the overall quality of the user interfaces in order to obtain usability and performance data, as well as to develop design guidelines for the future development of intelligent multi-agent system interfaces.

This paper begins with an overview of the study's background – including a description of the research framework, an explanation of the framework's Capture the Flag game, an overview of the framework's Intelligent Bots (IBots), and a discussion of the design and integration of the user interfaces used to facilitate human interaction with intelligent multi-agents within the framework. A description of the research methodology and measurements follows, then an analysis of the results. Finally, the paper closes with a discussion and conclusions.

2. Background

2.1. Information- and user-centered interface design

There are several issues that should be considered when designing user interfaces for multi-agent systems – most of which are information related. Information issues often result from not providing the user with the appropriate type and/or quality of information in the correct form at the right time to support domain tasks, as well as from not recognizing that new user roles (e.g., supervisor, teammate, or bystander) necessitate new types of information (Malin and Schreckenghost, 1993; Scholtz, 2002). In order to utilize this information-centered user interface design approach and facilitate interface usability, it was essential in this study to determine the types of information that users would deem both necessary and appropriate for the domain tasks. Additionally, an intelligent system's functionality must be "easily interpretable by a user in regards to the context of the system and its environment" (Suchman, 1987, p. 99).

When interacting with an intelligent MAS within a domain task, users and agents often change their roles dynamically in order to respond effectively to environmental stimuli and situations. In order to facilitate user interaction with an intelligent MAS, a user-centered interface design approach needs to be utilized to account for new user and agent roles – which would ultimately account for new types of information in the framework's already established information-centered design approach. Because the framework's user interface design approach was both information centric and user centric, it was crucial for the user interfaces to be able to update and display information in real time. Likewise, any displayed information needed to be easily distinguishable by the user in order to avoid confusion. Because human decision making tends to demand a large amount of information for an appropriate analysis, the user interfaces needed to convey appropriate, high-quality, user-centered information so that a human decision could be made quickly and efficiently.

2.2. Research framework

The framework breaks down into three components:

(1) The Bot Connection Window – this interface serves as the communication and control link between a human player and the UT2003 IBots.

(2) Unreal Tournament Dedicated Server – this program is used to initialize the UT2003 server for LAN play.

(3) UT2003 Game Environment Window – this interface renders and animates all graphical data in 3D using the UT2003 physics engine. Throughout this paper, UT2003 refers to the Unreal Tournament 2003 game.

Fig. 1 shows the architecture of the framework developed in this study. UT2003 was selected as the platform of choice because UT2003 is open source, meaning that the entire base source code is available to developers.
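The paper describes the framework only at the component level. As a rough illustration of how a Bot Connection Window might relay user commands to a dedicated game server, the sketch below opens a TCP socket, sends one order, and reads one status line; the host, port, and text message format are hypothetical and are not the actual UT2003 or framework protocol.

// Minimal sketch of a Bot Connection Window's link to a dedicated game server.
// The host, port, and text-based message format here are hypothetical and are
// not the actual UT2003 or framework protocol.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class BotConnectionSketch {
    public static void main(String[] args) throws Exception {
        try (Socket server = new Socket("localhost", 3000);
             PrintWriter out = new PrintWriter(server.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(server.getInputStream()))) {
            // Send a role command to one IBot (message format is illustrative only).
            out.println("ORDER Bot1 CAPTURE_ENEMY_FLAG");
            // Read one line of status back, e.g. vital statistics for an IBot.
            String status = in.readLine();
            System.out.println("Server replied: " + status);
        }
    }
}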

2.2.1. Capture the Flag

There are several ways in which one can play UT2003, including Capture the Flag, Deathmatch, Bombing Run, and Last Man Standing. The focus of the framework, however, is Capture the Flag (CTF). In Capture the Flag, two teams (Red and Blue) attempt to outscore one another by capturing each other's flag (see Fig. 2). Each team has a Home Base in which its Home Flag resides. A team scores one point if and only if it captures the enemy's flag and returns it to its home flag's location. The key to a team's success lies in maintaining the delicate balance between offensive and defensive strategies. A team must be able to fortify its own strengths while exploiting the opposing team's weaknesses. In other words, agent team members and the human operator(s) must work together to ensure that their home flag remains safe while the opponent's flag is constantly under siege. In order for a team to work together, each member must have a clear line of communication to all other members. Through communication, a team is capable of maintaining an offensive and defensive balance via the division of offensive and defensive "roles" amongst its members. A human user is able to cycle through each generated IBot's third-person view, which allows the user to see what an IBot is doing at a particular time.

2.2.2. Intelligent Bots

In addition to unique intelligence, each IBot has an autonomy level. An autonomous IBot can be defined as "a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future" (Franklin and Graesser, 1996, p. 4). The extent to which an IBot can independently reason and act depends on its autonomy level. In critical situations, less autonomous IBots need human input in order to assess what actions they should perform. In this study, two levels of IBot autonomy were implemented, labeled medium level (50%) and full level (100%); an IBot's autonomy is controlled only by the human user. A minimal code sketch of this autonomy model follows the list below.

• Medium level (50%) – an IBot has moderate independent reasoning, but must occasionally ask a human player what to do in critical situations.
• Full level (100%) – an IBot is fully independent, capable of reasoning on its own about what role to take as well as what actions to take in critical situations.
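The sketch below illustrates the two-level autonomy model in a few lines of Java. The class and method names (IBot, resolveCriticalSituation, askHuman) are hypothetical; only the medium/full levels and the rule that a medium-autonomy IBot defers to the human in critical situations come from the description above.

// Minimal sketch of the two-level autonomy model described above. The class and
// method names are hypothetical; only the medium/full levels and the deferral
// rule for critical situations come from the paper.
public class AutonomySketch {
    enum Autonomy { MEDIUM, FULL }   // 50% and 100% autonomy levels

    static class IBot {
        String name;
        Autonomy autonomy = Autonomy.FULL;   // autonomy is set only by the human user

        IBot(String name) { this.name = name; }

        // In a critical situation, a medium-autonomy IBot asks the human what to do;
        // a full-autonomy IBot decides on its own.
        String resolveCriticalSituation(String question,
                                        java.util.function.Function<String, String> askHuman) {
            if (autonomy == Autonomy.MEDIUM) {
                return askHuman.apply(name + " asks: " + question);
            }
            return "decide-autonomously";
        }
    }

    public static void main(String[] args) {
        IBot bot1 = new IBot("Bot1");
        bot1.autonomy = Autonomy.MEDIUM;
        String action = bot1.resolveCriticalSituation("Should I retreat to base?", q -> "Yes");
        System.out.println(action);   // prints the (stub) human answer "Yes"
    }
}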

2.3. User interface design goals for IMAS

In order to realize the framework's information- and user-centered design approaches, a set of user interface design goals was established – a fundamental principle in user interface design methodology. The interface design goals included:

(1) Display in one window all of the necessary and appropriate information for a team of four IBot agents and one human player in order to facilitate interaction.
(2) Lay out the information in a user-friendly, easy-to-read fashion.
(3) Provide essential, non-overwhelming information to the user to assist in efficient decision making.
(4) Maximize usability by facilitating rapid user interaction with a team of IBot agents.
(5) Provide the user with the necessary and appropriate interface information and interactions to facilitate the dynamic role changing that will occur by both the user and the IBot agent(s).
(6) Provide the user with interaction feedback.


Fig. 1. The framework architecture.


After a series of preliminary usability tests were conducted using an early build of the developed domain tasks as well as prototype interfaces, the following information types and user interactions were deemed essential to the domain tasks that would be supported by the framework:

Information types:
• IBot Name (can be user specified or default [Bot1, Bot2, ...]);
• IBot Vital Statistics (i.e., Health, Ammo, Armor – information built into UT2003);
• IBot Current Role (i.e., Capture Enemy Flag, Defend Our Flag, Defend Our Base, Search N' Destroy, Cover Me, Hold This Position – information built into UT2003);
• IBot Autonomy (i.e., Medium and Full Level – information unique to the framework and not included in UT2003);
• IBot Questions (IBot questions that occur during critical situations – information unique to the framework and not included in UT2003).

Fig. 2. Example of the Capture the Flag game.

User interactions:
• IBot Role Selection: a method that enables the user to select and change an IBot's role;
• IBot Autonomy Control: a method that enables the user to change an IBot's autonomy level;
• Human Answers: a method that allows the user to answer IBot questions in critical situations;
• Communication Sending Mechanism: utilized by the user to communicate a command to the appropriate IBot.

A minimal sketch of how these interactions might map onto interface callbacks follows.
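As a minimal sketch, the four interactions above could be captured as a single Java interface that the user-interface layer implements; the interface and method names are hypothetical and are not the framework's actual API.

// Minimal sketch of the four user interactions listed above as interface callbacks.
// The interface and type names are hypothetical, not the framework's actual API.
public interface IBotInteraction {
    // IBot Role Selection: select and change an IBot's role.
    void selectRole(String botName, String role);

    // IBot Autonomy Control: switch an IBot between medium and full autonomy.
    void setAutonomy(String botName, boolean fullAutonomy);

    // Human Answers: answer an IBot's question in a critical situation.
    void answerQuestion(String botName, boolean yes);

    // Communication Sending Mechanism: send the currently selected command to an IBot.
    void sendCommand(String botName);
}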

3. Methods

3.1. Participants

Twenty-five participants were recruited from the student population at the University of Arkansas. Participants were given extra credit as compensation for their participation. There were 5 female and 20 male participants whose mean (M) age was 21.56 years (standard deviation, SD = 4.5). All participants played less than 2 h of 3D video games per week.

3.2. Apparatus

3.2.1. Task scenario

To evaluate the main functions of the interface systems, this study developed seven task scenarios representing the user's most common tasks in the Capture the Flag (CTF) game. Table 1 shows these seven tasks and their main testing objectives.

3.2.2. User interface evaluation survey

This survey consisted of three questionnaires – one for each interface type. Each questionnaire was identical except for the first four questions. The first question assessed a participant's ability to interpret information from the interface, providing the feedback necessary to discern whether the quality of information was relevant. The remaining six questions allowed participants to comment on the effectiveness of each interface's control (e.g., "Could you control all of your Bots successfully?"), quality of information (e.g., "Did Interface Type 1 provide sufficient status information for all of your Bots?"), layout (e.g., "Are you satisfied with the layout of Interface Type 1?"), communication/interaction methods (e.g., "Are you satisfied with the Bot communication methods that Interface Type 1 provided?"), and utilization (e.g., "Are there any parts of Interface Type 1 that you rarely or never used?"), as well as any improvements they would make to each interface.


Table 1
Experimental tasks and main testing objectives

Task | Characteristic | Testing objective
Scenario 1 | Blitzkrieg | To verify that the Send All button is appropriately utilized
Scenario 2 | Prepare for battle | To assess whether the role sending buttons and box are appropriately used to prepare for battle
Scenario 3 | Bot cover | To assess an indirect method of human–Bot interaction (via the Say: Cover Me command) and to trigger a critical situation that requires human influence
Scenario 4 | Haywire Bots | To assess whether the autonomy buttons are appropriately utilized
Scenario 5 | Hearing noises | To assess whether users can find and destroy enemy Bots (entire interface utilized)
Scenario 6 | Imposter | To assess whether users can find an enemy pretending to be an ally and make ally Bots destroy that enemy (entire interface utilized)
Scenario 7 | Lost Bot | To assess whether the question/answer area is appropriately utilized

Table 2
Usability metrics and measurements

Metric | Measurement | Definition
Effectiveness | Percent of tasks completed | Ratio of completed tasks to total tasks
Effectiveness | Number of mouse clicks | Number of mouse clicks to complete a task
Efficiency | Time to complete a task | Time from when a user was told to "begin" a scenario to (a) when the user stated that they were "done" or (b) when a scenario's time limit was reached
Efficiency | Number of errors | Number of errors made by users
Satisfaction | Control | Methods of interface interactions provided
Satisfaction | Quality of information | Clarity and effectiveness of interface information
Satisfaction | Layout | The position of information and interaction methods within a contained "window"
Satisfaction | Interaction methods | Types of interface communication methods provided
Satisfaction | Utilization | Amount of interface interactions used for the tasks provided
Workload | NASA TLX | Mental demands, physical demands, temporal demands, own performance, effort, and frustration


3.2.3. Interface feature questionnaire

This questionnaire was designed for each participant to express supporting or non-supporting views regarding potential future interface features. These features included the incorporation of a live-update overhead map (complete with all IBot, pickup, and flag locations), colored percentage bars (reflecting each IBot's current Health, Ammunition, and Armor), and colored text (allowing a user to more easily see pertinent information). This questionnaire also allowed the user to comment on their overall experience.

3.2.4. NASA Task Load Index (TLX)

The NASA TLX was also used, to measure the workload that participants experienced. The NASA TLX contains six subscales measuring mental demands, physical demands, temporal demands, own performance, effort, and frustration. All items were rated on a 10-point scale.
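For concreteness, the sketch below scores one completed questionnaire, assuming an unweighted average of the six subscale ratings on the 10-point scale used here; the paper does not state whether the subscales were weighted, so this is only an illustration consistent with the scale of the reported means.

// Minimal sketch of scoring a completed NASA TLX questionnaire, assuming an
// unweighted average of the six subscale ratings on a 10-point scale (the paper
// does not state whether subscales were weighted).
public class TlxSketch {
    static double overallWorkload(double mental, double physical, double temporal,
                                  double performance, double effort, double frustration) {
        return (mental + physical + temporal + performance + effort + frustration) / 6.0;
    }

    public static void main(String[] args) {
        // Example ratings on a 1-10 scale; yields an overall workload of 3.5.
        System.out.println(overallWorkload(4, 2, 5, 3, 4, 3));
    }
}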

3.3. Evaluation criteria

The experimental design was a one-way, within-subjects design with interface type as the factor. The overall quality of the user interfaces was determined in terms of effectiveness, efficiency, user satisfaction, and workload. Table 2 shows the list of metrics and the corresponding performance and subjective measurements used in the study.

3.4. User interfaces for multi-agent system (MAS)

3.4.1. Interface Type 1

Fig. 3 illustrates this study's first interface design (a.k.a. Type 1), which consists of four areas of IBot fields (each contained within a light-colored box appropriately labeled Bot1, Bot2, Bot3, or Bot4), all of which are contained within one window in order to adhere to design principle #1. Within each area is an IBot's essential information and interactions, in order to adhere to design principles #3 and #5. In order to adhere to design goal #4, Interface Type 1's layout is based around the concept of Singularity; that is, each IBot is considered a separate entity. Singularity therefore results in a unique area within the interface layout being allotted to each individual IBot.

In order to facilitate the framework's user-centered interface design approach and design goals, a feature known as a Layout Ranking Heuristic was developed to determine the vertical order – top to bottom – in which interface information and interactions appear. A Layout Ranking Heuristic is a series of user-oriented logical questions that streamline user interface development within the framework's user-centered design paradigm, allowing a developer to think in a manner similar to a user in regards to what the user would prefer to see – in order of importance – governed by question-based logical flow(s). For example, Interface Type 1's layout of each IBot's unique area was determined through the following Layout Ranking Heuristic:

(1) What is a particular IBot doing? (IBot Current Role Label)
(2) What are a particular IBot's vital statistics? (IBot Vital Statistics)
(3) Based on (1) and (2), what do I want a particular IBot to do? (IBot Role Selection and Communication Buttons)
(4) What is a particular IBot's current autonomy? (IBot Autonomy and IBot Autonomy Control)
(5) Based on (4), does a particular IBot have a question? (IBot Questions and Human Answers)

Type 1's heuristic utilizes two logical flows: questions 1, 2, and 3 form the first logical flow; questions 4 and 5 form the second logical flow. Within each flow, interface attributes are placed sequentially from the top to the bottom of each IBot area. Therefore, the first logical flow displays its first attribute (IBot Current Role) at the top of each allotted IBot area, the second attribute (IBot Vital Statistics) under the first, and the third set of interface attributes (IBot Role Selection and Communication Buttons) under the second. The second logical flow, because it occurs after the first logical flow, displays its first attribute (IBot Autonomy Control) just below the last set of interface attributes (IBot Role Selection and Communication Buttons) of the first logical flow, and its second set of attributes (IBot Questions and Human Answers) below its first, at the bottom of the allotted area. It should be noted that, in order to further emphasize design goal #2 and increase interface information clarity, each interface attribute was "padded" with an arbitrary number of pixels (between 10 and 20) on all four sides. Our preliminary analysis of Type 1's layout revealed that:


Fig. 3. Type 1 interface and its covering problem. (a) Type 1 interface; (b) Type 1 covering problem.


(a) A user can only assess a single IBot within one visual focal point due to Type 1's Singularity (a non-compact layout of information where a distinct interface area is allotted to each IBot).
(b) A user must traverse approximately half the width and/or height of the interface to accomplish similar interactions among multiple IBots, due to Type 1's Singularity.
(c) To influence the whole IBot team, the user must issue a selected order to each IBot separately – a minimum of 12 clicks (one to open an IBot's Role Selection combo-box, one to select a role, and one to send the order via the Send Order Communication Button, for each of the four IBots) – due to Type 1's Singularity.
(d) There exists a covering problem with each IBot's Role Selection combo-box (refer to Fig. 3b). When a user opens an IBot's Role Selection combo-box, the opened selections severely restrict the user's view of that IBot's Autonomy Control, Questions, and possibly another IBot's Name.
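The following Swing sketch illustrates the Type 1 Singularity layout implied by the heuristic above: one titled panel per IBot, with components stacked top to bottom in the order of the two logical flows. The labels and values are simplified placeholders, not the study's actual interface code.

// Minimal Swing sketch of the Type 1 Singularity layout: one panel per IBot, with
// components stacked top to bottom in the order given by the Layout Ranking
// Heuristic (Current Role, Vital Statistics, Role Selection/Send, Autonomy, Q&A).
import javax.swing.*;
import java.awt.GridLayout;

public class Type1LayoutSketch {
    static JPanel ibotPanel(String botName) {
        JPanel panel = new JPanel();
        panel.setBorder(BorderFactory.createTitledBorder(botName));
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        panel.add(new JLabel("Current Role: Defend Our Base"));          // question (1)
        panel.add(new JLabel("Health 100  Ammo 50  Armor 0"));           // question (2)
        panel.add(new JComboBox<>(new String[] {"Capture Enemy Flag", "Defend Our Flag"}));
        panel.add(new JButton("Send Order"));                            // question (3)
        panel.add(new JToggleButton("Autonomy: Full"));                  // question (4)
        panel.add(new JLabel("Question: -"));
        panel.add(new JButton("Yes"));
        panel.add(new JButton("No"));                                    // question (5)
        return panel;
    }

    public static void main(String[] args) {
        JFrame frame = new JFrame("Type 1 (Singularity) sketch");
        frame.setLayout(new GridLayout(2, 2));       // four separate IBot areas in one window
        for (String name : new String[] {"Bot1", "Bot2", "Bot3", "Bot4"}) {
            frame.add(ibotPanel(name));
        }
        frame.pack();
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}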

Fig. 4. Type 2 interface and its covering problem. (a) Type 2 interface; (b) Type 2 covering problem.

3.4.2. Interface Type 2

Fig. 4 illustrates this study's second interface design (a.k.a. Type 2). Type 2's layout was drastically changed from Type 1's Singularity concept to a more Compact information and interaction layout in order to provide easier user readability and facilitate faster user interaction (goals #2 and #4). Type 2's Compact design allows for a single area to be allotted for all IBots – all contained within one window to adhere to design principle #1. Within that single area, every IBot's essential information and interactions are displayed in a grid-like manner to ensure compactness (similar in principle to an Excel spreadsheet) and to adhere to design principles #3 and #5. Type 2's layout was determined using a unique Layout Ranking Heuristic.

Type 2's Layout Ranking Heuristic consists of two logical flows. The first logical flow displays the first attribute (IBot Current Role) for each IBot column-wise at the top of the IBot area, then the second attribute (IBot Vital Statistics) for each IBot column-wise just under the first attribute, then the third set of attributes (IBot Autonomy and IBot Autonomy Control) for each IBot column-wise just under the second attribute, and then the fourth set of attributes (IBot Role Selection and Communication Buttons) just under the third set of attributes. The second logical flow displays its only attribute (IBot Questions and Human Answers) just under the last attribute of the first logical flow. The preliminary analysis of Type 2's layout revealed that:

(a) Every IBot's Current Role, Vital Statistics, and Autonomy Control are displayed in close proximity to one another due to the Compact design, allowing the user to evaluate multiple IBots' information in a single visual assessment while facilitating design goals #2 and #4.
(b) In order to further facilitate design goals #2 and #4, the IBot Role Selection and Communication Buttons were designed as a Compact single set of attributes by incorporating a new "Send All" Communication Button as well as implementing a single IBot Role Selection combo-box that can be utilized for each IBot.
(c) To influence the whole IBot team, the user can issue a selected order to all IBots, facilitating design goal #4.
(d) Individual IBot Communication Buttons (labeled Bot1, Bot2, Bot3, and Bot4) are in the columns corresponding to a particular IBot's Vital Statistics and Autonomy Control.
(e) There exists a covering problem with the IBot Role Selection combo-box (refer to Fig. 4b). When a user opens the combo-box, the opened selections severely restrict the user's view of the IBot Questions and Human Answers area.
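A minimal Swing sketch of the Compact idea behind Type 2 (and Type 3) is given below: a single grid with one column per IBot and one row per attribute, plus a shared Role Selection combo-box and a Send All button. The labels are simplified placeholders, not the study's actual interface code.

// Minimal Swing sketch of the Compact (Type 2/3) arrangement: one window, one
// grid with a column per IBot and a row per attribute, plus a shared Role
// Selection combo-box and a "Send All" broadcast button.
import javax.swing.*;
import java.awt.BorderLayout;
import java.awt.GridLayout;

public class CompactLayoutSketch {
    public static void main(String[] args) {
        String[] bots = {"Bot1", "Bot2", "Bot3", "Bot4"};
        JPanel grid = new JPanel(new GridLayout(4, bots.length));   // rows: name, role, vitals, autonomy
        for (String b : bots) grid.add(new JLabel(b));
        for (String b : bots) grid.add(new JLabel("CEF"));                  // Current Role (shorthand)
        for (String b : bots) grid.add(new JLabel("100 / 50 / 0"));         // Health / Ammo / Armor
        for (String b : bots) grid.add(new JToggleButton("Full"));          // Autonomy Control

        JPanel send = new JPanel();
        send.add(new JComboBox<>(new String[] {"Capture Enemy Flag", "Defend Our Flag"}));
        send.add(new JButton("Send All"));   // one order broadcast to the whole team

        JFrame frame = new JFrame("Compact (Type 2/3) sketch");
        frame.add(grid, BorderLayout.CENTER);
        frame.add(send, BorderLayout.SOUTH);
        frame.pack();
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}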

3.4.3. Interface Type 3

Fig. 5 illustrates this study's third interface design (a.k.a. Type 3). Type 3's layout was slightly changed from Type 2's Compact design in order to accentuate the IBot Questions and Human Answers attributes as well as to resolve the combo-box covering problem identified in the Type 1 and Type 2 analyses – ultimately further facilitating design goals #2 and #4. Like Type 2, Type 3's Compact design allows for a single area to be allotted for all IBots – all contained within one window to adhere to design principle #1. Type 3's layout was determined using a unique Layout Ranking Heuristic utilizing one logical flow.

Fig. 5. Type 3 interface.

Unlike Type 1 and Type 2, Type 3 contains a single logical flow consisting of five questions. The logical flow displays the first attribute (IBot Current Role) for each IBot column-wise at the top of the IBot area, then the second set of attributes (IBot Autonomy and IBot Autonomy Control) for each IBot column-wise just under the first attribute, then the third attribute (IBot Vital Statistics) for each IBot column-wise just under the second set of attributes, then the fourth set of attributes (IBot Questions and Human Answers) for each IBot row-wise just under the third attribute, and finally the fifth set of attributes (IBot Role Selection and Communication Buttons) placed just under the fourth set of attributes. The preliminary analysis of Type 3's layout revealed that:

(a) Every IBot's Current Role, Vital Statistics, and Autonomy Control are displayed in close proximity to one another due to the Compact design, allowing the user to evaluate multiple IBots' information in a single visual assessment while facilitating design goals #2 and #4.
(b) The "<Send All>" button from Type 2 has been modified to a single-column-width "All" button placed after each IBot order button. It should be noted that this change was made in Type 3 in order to assess user preference as compared to Type 2's four-column-width "Send All" button.
(c) There exists no covering problem with the IBot Role Selection combo-box. When a user opens the combo-box, the opened selections "fall out" of the bottom of the Type 3 interface, obscuring no critical information and facilitating design goals #2 and #4.

3.5. Procedure

Prior to the experiment, each participant was required to read through prepared training materials. These training materials included a CTF tutorial (encompassing an overview of Capture the Flag, teamwork strategies, how to get help from teammates, how to capture an enemy's flag, and special situations) as well as an IBot tutorial (encompassing the six unique IBot roles and IBot autonomy). Each participant then performed seven uniquely designed scenarios using each of the three interfaces – resulting in a total of 21 scenarios. The sequence of user interfaces and scenarios was counterbalanced to remove the influence of learning as much as possible. Each scenario was conducted as follows. A participant read the information the scenario provided. During this time, the participant could ask any questions about the scenario information – as long as it did not involve any strategically sensitive information relevant to the current scenario. Participants were told that each scenario had a time limit, though they were not made aware of what the time limit actually was (ranging from 1 min to 2.5 min, determined from a previously conducted pilot test). Participants were also made aware that upon reaching the time limit, they would be asked to "Stop" and have their performance evaluated. Finally, participants were made aware that if they felt that they had satisfied a particular scenario's ending condition prior to reaching the time limit, they could say "Done." Upon each scenario's conclusion, the participant was asked to complete a NASA TLX questionnaire regarding the completed scenario's cognitive workload.

The final three scenarios utilized all three interfaces; that is, a different interface type was used in each of the final three scenarios. The interface type used in each of the final three scenarios determined which interface evaluation survey was administered after the NASA TLX for that scenario. The interface evaluation survey was used to assess the participant's satisfaction with a particular interface type – in regards to the interface's control, quality of information, layout, interaction methods, and usability – as well as the participant's recommendations for improving that interface type. After completing all scenarios, NASA TLX questionnaires, and interface evaluation surveys, each participant was asked to complete a final interface feature questionnaire in order to assess each participant's opinion on future features that were planned for integration into the current interface types.

4. Results

Both quantitative (e.g., effectiveness, efficiency, user satisfaction, and workload) and qualitative (e.g., user comments) data were obtained from the user experiment. A series of one-way analyses of variance (ANOVA) was conducted to determine the overall quality of the user interfaces designed to support collaboration and coordination between the human user and intelligent agents. Table 3 shows the ANOVA results for the experiment.

4.1. Effectiveness

4.1.1. Ratio of success

The percentage of tasks completed was computed as the ratio of completed tasks to total tasks. The average success ratio in this study was 77% (SD = 11%), with average ratios of 77% for Type 1 (SD = 15%), 78% for Type 2 (SD = 14%), and 76% for Type 3 (SD = 17%). No significant effect was found for interface type (p > .05).

4.1.2. Number of clicks

The number of mouse clicks that a participant used to complete a task was measured. Overall, participants clicked the mouse 23.78 times on average (SD = 5.81) to finish one task. The ANOVA revealed a significant effect of interface type, F(2,48) = 8.95, p < 0.0005. Contrasts showed that the number of clicks used to finish a task with interface Type 1 (M = 27.31, SD = 6.93) was significantly higher than with interface Type 2 (M = 22.12, SD = 7.41) and interface Type 3 (M = 21.9, SD = 7.313). However, there was no significant difference in the number of clicks between interface Type 2 and interface Type 3.

4.2. Efficiency

4.2.1. Task completion time

Results showed significant differences in task time between the user interfaces, F(2,48) = 3.54, p = 0.0369. Participants who used interface Type 1 (M = 63.08, SD = 11.47) spent a longer time completing tasks than those who used interface Type 3 (M = 55.59, SD = 14.05). However, there were no significant differences in task time between interface Type 1 and Type 2, or between interface Type 2 and Type 3.

4.2.2. Number of errors

Participants completed most tasks with very few errors (M = 0.27, SD = 0.15 for Type 1; M = 0.28, SD = 0.15 for Type 2; M = 0.27, SD = 0.19 for Type 3). However, there was no significant effect of interface type on the number of errors (p > 0.05).

Table 3
ANOVA results for the experiment (n = 20)

Category | Parameter | F value (F(2,48)) | p value
Effectiveness | Ratio of success | 0.18 | >0.05
Effectiveness | Number of clicks | 8.95 | <0.0005
Efficiency | Task completion time | 3.54 | 0.0369
Efficiency | Number of errors | 0.01 | >0.05
Workload | Workload | 6.37 | 0.0035

4.3. User satisfaction

The reliability of the rating scale was first assessed by calculating Cronbach's coefficient alpha (Cronbach, 1951). The standardized alpha of the rating scales showed acceptable reliability (α = 0.92), with the coefficient greater than the suggested value of 0.70 (Nunnaly, 1978).
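For reference, the standard formula from Cronbach (1951) for a scale of k items is

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\]

where \(\sigma^{2}_{Y_i}\) is the variance of item i and \(\sigma^{2}_{X}\) is the variance of the total score; values above 0.70 are conventionally taken as acceptable.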

A User Interface Evaluation Survey was conducted to measure the satisfaction rate for each interface type. Each question pertained to the satisfaction criteria described in Section 3.3 and Table 2: Control, Quality of Information, Layout, Interaction Methods, and Utilization. Participants expressed satisfaction or dissatisfaction by answering each question with either YES or NO, respectively. Below each question, a comment area could be used by a participant to further express the satisfaction or dissatisfaction experienced with any particular aspect of an interface (discussed in Section 5.1.3). The results indicated that all aspects of both Interface Types 2 and 3 had at least an 84% satisfaction rate (21 of 25 participants). Results also showed that Interface Type 1's satisfaction rates were lower, with some aspects falling well below 84% satisfaction. In order to determine statistically significant differences in the user interface satisfaction results, a Chi-Square test for equal proportions was conducted. Of all the Chi-Square tests for equal proportions conducted on the five interface aspects, Layout produced the greatest statistical difference, with χ² = 7.0, p = 0.0302.
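The paper does not spell out the test statistic; assuming the standard chi-square test for equal proportions applied to the YES counts for one aspect across the three interface types, the statistic is

\[
\chi^{2} = \sum_{i=1}^{k}\frac{(O_i - E_i)^{2}}{E_i}, \qquad E_i = \frac{1}{k}\sum_{j=1}^{k} O_j,
\]

with k - 1 degrees of freedom, where \(O_i\) is the observed YES count for interface type i. Here k = 3 interface types, giving 2 degrees of freedom, which is consistent with the reported value.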

The Interface Feature Questionnaire also assessed support for the introduction of future features, including an overhead map (84%), window utilization (48%), colored vital statistic bars (80%), and section-wise text coloring (68%).

4.4. Workload

The overall workload ratings on the NASA TLX subjective workload index were very low (M = 3.55, SD = 1.10). The ANOVA showed that the main effect of interface type was significant, F(2,48) = 6.37, p = 0.0035. Contrasts showed that the average workload was significantly lower with interface Type 3 (M = 3.42, SD = 1.18) than with interface Type 1 (M = 3.72, SD = 1.08). However, no significant differences were found between interface Type 1 and interface Type 2 (M = 3.50, SD = 1.25), or between interface Type 2 and interface Type 3.

5. Discussion and conclusions

5.1. Usability of user interfaces in human–agent interaction

5.1.1. Effectiveness

Two measurements constituted the effectiveness of the user interfaces with an MAS: number of clicks and task scenario ratio of success. We found that interface Type 2 and interface Type 3 resulted in fewer clicks and a better ratio of success than interface Type 1; there was no significant difference between interface Type 2 and interface Type 3. These results can be directly attributed to the design of each interface. First, Interface Type 2 and Type 3 were Compact, as opposed to the Singular Type 1 (both Singularity and Compactness are discussed in Section 5.2). Therefore, users utilized only one Role-oriented combo-box in Type 2 and Type 3, resulting in fewer clicks, as opposed to four Role-oriented combo-boxes in Type 1. Second, Type 2 and Type 3 contained a "Send All" or "All" button, respectively, allowing a user to send a selected order to all ally IBots – an interaction that would take a minimum of 12 clicks in Type 1, four times as many clicks as Type 2 and Type 3 require.


Table 5
Main and sub-principles and guidelines for interface development

Main principle | Sub-principle | Description
Information Isolation | – | The process in which a development team discerns what information is critical to user-oriented tasks and interaction to ensure success and comprehension.
Layout Ranking Heuristic | Logical flow | The process of "ranking" isolated interaction information via a series of questions in logical succession.
Layout Ranking Heuristic | Logic break | Where a logical flow stops and a new logical flow begins.
Layout Ranking Heuristic | Singularity | When similar interface items are allotted their own area or "box" within the interface. Determined by the way in which a logic question is worded – specifically with words such as "one", "single", or "particular".
Layout Ranking Heuristic | Compactness | When similar interface items are grouped together in close proximity to one another within the interface. Determined by the way in which a logic question is worded – specifically with words such as "all" or "every".
Efficient Interaction | Tool evaluation | The assessment of interface development tool capability.
Efficient Interaction | SMICA | An interaction classification method (Size, Minimum Interaction, Cross Analyze) that assists in deciding which interaction methods supported by an interface development tool would be most appropriate for an interface designed within the design goals.
Information Optimization | – | The process of discerning the best way in which the interface information should be displayed.
Feedback Mechanisms | – | The method by which the user interface relays information to the user in regards to when new information is available or when a user has interacted with the interface.


5.1.2. Efficiency

There were two efficiency measurements regarding user interaction with an MAS: task completion time and number of errors. Results showed that interface Type 3 was significantly better than interface Type 1, while there was no significant difference between interface Type 2 and interface Type 1, or between interface Type 3 and interface Type 2. These results could be attributed to several interface design aspects. First, Type 3's Layout Ranking Heuristic contained a single logical flow, as opposed to the two logical flows of Type 1 and Type 2 (described in Section 3.4). This single logical flow may have allowed a user to better analyze data from the top of the interface to the bottom, resulting in more efficient decision making than with Type 1 and Type 2. Second, Type 3's layout addressed the covering problem, allowing users to see the entire interface without obstruction, again resulting in more efficient decision making than with Type 1 and Type 2. Type 1's and Type 2's similarity on these two points likely accounts for the non-significant difference between the Type 1 and Type 2 ANOVA efficiency results. Third, Type 3 was of Compact design, allowing for easy readability and categorical information division, which increases user information assessment and results in more efficient decision making as opposed to Type 1. Type 2's Compact design likely accounts for the non-significant difference between the Type 3 and Type 2 ANOVA efficiency results. Finally, Type 3's design accentuated the Question & Answer Field. While not essential to task scenario completion, the Q&A Field could potentially aid a user in decision making and information assessment in several scenario tasks. Typically, users would overlook or simply not see the Q&A Field in Type 1 and Type 2 – more so in Type 1. Yet in Type 3, rarely did a question in the Q&A Field go unnoticed.

5.1.3. Satisfaction and workload

The percentage results were determined by evaluating each user's YES or NO answer for each aspect on the interface evaluation survey: Interface Control, Quality of Information, Layout, and Interaction Methods. Table 4 shows Interface Type 2's percentile divisions.

Each logic ranking heuristic question was given an equal division. For example, Interface Type 2 is based on five logic questions, so each question received 20% of the interface's percentile division. Within each logic question, each interface object (i.e., a button, selectable or viewable text, or radio button) that "made up" a particular question was allotted a fraction of that question's percentile division. For example, question (2), Vital Stats, is composed of three objects, so each object is allotted an equal share of that question's percentile division (i.e., a 20% percentile division divided by three objects yields a 6.67% allotment per object).

Table 4
Interface Type 2 percentile divisions for measuring interface utilization

Logic Ranking Heuristic question | Percentile division (%) | Objects | Object percentage (%)
Current Role | 20 | Role Label | 20
Vital Stats | 20 | Health Label | 6.67
Vital Stats | 20 | Ammo Label | 6.67
Vital Stats | 20 | Armor Label | 6.67
Send Order | 20 | Send All Button | 2.5
Send Order | 20 | Bot 1, 2, 3, 4 Buttons | 2.5
Send Order | 20 | Capture Enemy Flag Role | 2.5
Send Order | 20 | Defend My Flag Role | 2.5
Send Order | 20 | Defend My Base Role | 2.5
Send Order | 20 | Search N' Destroy Role | 2.5
Send Order | 20 | Say: Cover Me Role | 2.5
Send Order | 20 | Say: Hold This Position Role | 2.5
Autonomy | 20 | Toggle Button w/ Label | 20
Q&A | 20 | Question Label | 6.67
Q&A | 20 | Yes Button | 6.67
Q&A | 20 | No Button | 6.67

Of all the Chi-Square tests for equal proportions conducted on the five interface aspects, Layout produced the greatest statistical difference, with χ²(2) = 7.0, p = 0.0302. This is highly significant in that the layout is, development-wise, the most distinguishable aspect between Type 1 and Types 2 and 3. This substantial statistical difference, coupled with the high user satisfaction rates for Interface Type 2 and Type 3, helps to reinforce some of the more abstract guidelines that the development team utilized in order to help solidify MAS user interface design concepts.

The ANOVA demonstrated that interface Type 3 was significantly better than interface Type 1 in terms of workload, while there was no significant difference between interface Type 1 and interface Type 2, or between interface Type 2 and interface Type 3. Again, similarities in the interface designs can explain the non-significant difference between interface Type 2 and interface Type 3.

5.2. Design principles and guidelines

Establishing design goals assists in guiding the entire interface design process by keeping the design choices accountable. Therefore, any interface development that violated at least one of the design principles was discarded or modified in order to adhere to the interface design goals. Furthermore, these principles were designed with the intention of being easily applicable to any MAS interface design process. Table 5 describes the main and sub-principles and guidelines used during interface development.

5.2.1. Information Isolation

This fundamental design principle is necessary for any interface under development. Essentially, it is the process in which a development team discerns what information is critical to user-oriented tasks and interaction in order to ensure user success and comprehension. Section 2.3 explained in detail the process by which interface information is further isolated to ensure user success and comprehension for the listed tasks (i.e., the developed scenarios). Information Isolation supports design goals #1 and #3.


5.2.2. A Layout Ranking Heuristic

By approaching interface design from the user's point of view, an interface developer is better able to isolate essential interaction information. To further expound this fundamental design principle, a method that allows a developer to "rank" the isolated interaction information via a series of questions was designed in such a manner that one question leads to the next in a logical flow – indicative of the interface layout from top to bottom. As such, a potential Layout Ranking may contain a logical flow that breaks and re-establishes a new logical flow. In regards to an MAS, a logic break is governed by interactions that would directly change or alter an IBot's role, since interaction methods require essential information analysis. For example, Interface Type 1's Layout Ranking Heuristic contains a logical flow in questions (1), (2), and (3), a logic break due to IBot Role Selection (an interaction that could directly change an IBot's role), and a re-establishment of a new logical flow for questions (4) and (5). In this study's framework, all interface interaction methods alter IBot roles. Therefore, each interaction method requires the analysis of essential information in order to be effective (e.g., IBot Health, Current Role, etc.). Also, because IBot Role Selection can directly influence IBots as the user deems appropriate, IBot Role Selection is dependent upon all other essential information and interaction methods, resulting in an information/interaction hybridization with Human Answers and IBot Autonomy Control. Thus, IBot Role Selection could be considered a "stronger" break in a layout's logical flow.

The establishment of logical flows determines the shape an interface will take. Likewise, the wording of a question determines whether similar interface items will be singular or compact. As an example, consider the difference between Type 1's logic question (1), "What is a particular IBot doing?", and Type 2's logic question (1), "What are all of my IBots doing?". Because Type 1's question refers to a particular IBot, the layout of Interface Type 1 is singular with regard to similar interface items – each IBot has its own "box" in the interface – a concept labeled interface Singularity. Conversely, Type 2's question reflects Compactness, as similar interface items are grouped together in close proximity to one another within the interface, which allows a user to assess multiple interface components within a single glance (hence the use of "all" in Type 2's question). Furthermore, a Compact interface allows for quicker mouse traversal of the interface than a Singular interface, mainly due to the grouping of similar items and the overall size of the interface. Compactness also supports categorical information division, similar in design to an Excel spreadsheet, which increases user information assessment – resulting in more efficient decision making as compared to Singularity. The Layout Ranking Heuristic further supports design goals #2 and #5. This study's results support – in regards to the Layout Ranking Heuristic – the conclusion that fewer logic breaks and interface compactness lead to a more user-centered and successful user interface.

5.2.3. Efficient Interaction

All design goals were taken into account in order to determine which interaction methods were appropriate for the MAS interface design. In order to assist in this decision, a simple classification method known as SMICA (Size, Minimum Interactions, and Cross Analysis) was developed. SMICA can be applied to any development platform; in this study, SMICA was used to analyze the available Java interface components.

First and foremost, size was taken into account. Of the Java interface components available, the Button, Check Box, Combo Box, Menu, Radio Button, Spinner, and Text Field were the smallest interaction methods available. Next, the methods that required the fewest user interactions were determined from the interaction methods that were deemed smallest. A user interaction was defined as a click or keystroke initiated by the user. Of the interactions available, the Button, Check Box, Combo Box, Menu, Radio Button, and Spinner required the fewest user interactions (1–2 clicks), as opposed to the Text Field (an indeterminable number of keystrokes, dependent on the word typed). Next, the interaction methods that required the fewest user interactions were cross-analyzed, with special attention paid to interaction method similarity (likeness to another interface method), versatility (the manner in which the interface object can be used or placed), size, and simplicity (ease of use, based on user interaction clicks and keystrokes). Thus, the SMICA classification model allowed us to deem the Button, Radio Button, and Combo Box interface components the optimal interaction methods for interface development. Furthermore, the SMICA method supports design goals #2 and #4.
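The sketch below writes a SMICA-style screening down in code. The Candidate record and the numeric size/interaction values are hypothetical; the paper describes the size and minimum-interaction steps qualitatively and leaves the cross analysis (similarity, versatility, size, simplicity) as a designer judgement.

// Minimal sketch of a SMICA-style screening of candidate interface components.
// The Candidate record and the numeric values are hypothetical illustrations.
import java.util.List;
import java.util.stream.Collectors;

public class SmicaSketch {
    record Candidate(String name, int relativeSize, int interactionsNeeded) {}

    public static void main(String[] args) {
        List<Candidate> components = List.of(
                new Candidate("Button", 1, 1),
                new Candidate("Radio Button", 1, 1),
                new Candidate("Combo Box", 1, 2),
                new Candidate("Text Field", 2, 10));   // keystrokes vary with the word typed

        // Step 1 (Size) and Step 2 (Minimum Interactions): keep small components
        // that need at most two clicks. Step 3 (Cross Analysis) - similarity,
        // versatility, and simplicity - is a judgement call left to the designer.
        List<Candidate> shortlist = components.stream()
                .filter(c -> c.relativeSize() <= 1 && c.interactionsNeeded() <= 2)
                .collect(Collectors.toList());

        shortlist.forEach(c -> System.out.println(c.name()));   // Button, Radio Button, Combo Box
    }
}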

5.2.4. Information Optimization

This is the process of discerning the best way in which the interface information should be displayed. Large amounts of displayed text can be a hindrance to the user. In this study, icons were utilized to define and distinguish interface information categories easily and to minimize the amount of textual information displayed to the user. As a result, any textual information acted either as a user interaction initiation (as when an IBot had a question) or as a feedback mechanism (as discussed below). A shorthand notation for the display of an IBot's Current Role was incorporated in Interface Type 2 and Type 3 in order to test user clarity of information (for example, Capture Enemy Flag would be displayed as CEF). Information Optimization supports design goals #2 and #3.
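As a small illustration of the shorthand notation, a lookup table can map full role names to their abbreviations; only the CEF example is given in the paper, so the other abbreviations below are assumptions.

// Minimal sketch of the role shorthand used in Type 2 and Type 3 (e.g. "Capture
// Enemy Flag" shown as "CEF"). Only CEF is given in the paper; the other
// abbreviations below are illustrative assumptions.
import java.util.Map;

public class RoleShorthandSketch {
    static final Map<String, String> SHORTHAND = Map.of(
            "Capture Enemy Flag", "CEF",          // from the paper
            "Defend Our Flag", "DOF",             // assumed abbreviation
            "Defend Our Base", "DOB",             // assumed abbreviation
            "Search N' Destroy", "SND");          // assumed abbreviation

    static String display(String role) {
        return SHORTHAND.getOrDefault(role, role);   // fall back to the full role name
    }

    public static void main(String[] args) {
        System.out.println(display("Capture Enemy Flag"));   // prints CEF
    }
}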

5.2.5. Feedback Mechanisms

This fundamental design principle assists in many facets of interface ease of use and user information comprehension. A Feedback Mechanism is the method by which the user interface relays information to the user when new information is available or when a user has interacted with the system via the interface. As such, the discerned Isolated Information could be displayed to the user in real time, producing real-time feedback. If a user sent an order to an IBot, that IBot's Current Role would reflect the sent order. If an IBot was being damaged by an enemy IBot, its Health would reflect the damage. These are just a few examples of the real-time feedback that each interface provides. IBot Q&A enabled buttons were also incorporated. For example, if an IBot did not have a question, then there was no need for the user to answer; therefore, the Yes and No buttons in the Q&A area would remain disabled, non-clickable, and camouflaged into the interface until the IBot had a question, at which point they would become emboldened and clickable, providing feedback to the user that an IBot had a question. Feedback Mechanisms support design goals #5 and #6.
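The sketch below shows the Q&A feedback mechanism in Swing terms: the Yes and No buttons stay disabled until an IBot question arrives, then become clickable, and are disabled again after the answer. The class and method names are illustrative, not the study's code.

// Minimal Swing sketch of the Q&A feedback mechanism: Yes/No buttons stay
// disabled until an IBot question arrives, then become clickable.
import javax.swing.*;

public class QaFeedbackSketch {
    private final JLabel question = new JLabel("Question: -");
    private final JButton yes = new JButton("Yes");
    private final JButton no = new JButton("No");

    QaFeedbackSketch() {
        yes.setEnabled(false);   // disabled (camouflaged) while there is no question
        no.setEnabled(false);
    }

    // Called when an IBot raises a question in a critical situation.
    void onIBotQuestion(String botName, String text) {
        question.setText(botName + " asks: " + text);
        yes.setEnabled(true);    // clickable again: feedback that a question is pending
        no.setEnabled(true);
    }

    // Called after the user answers; the buttons return to their camouflaged state.
    void onAnswered() {
        question.setText("Question: -");
        yes.setEnabled(false);
        no.setEnabled(false);
    }

    public static void main(String[] args) {
        QaFeedbackSketch qa = new QaFeedbackSketch();
        qa.onIBotQuestion("Bot3", "Should I return to base?");
        System.out.println(qa.yes.isEnabled());   // true once a question is pending
    }
}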

5.3. Limitations and future work

This study's sample size of 25 is somewhat limited, which caused a degree of difficulty in specifying, measuring, and abstracting data for some measurements. A larger sample size might alleviate that measurement difficulty, ultimately leading to more accurate and sound results and conclusions. However, this study's framework testing with this sample size did reveal some system limitations. These system limitations apply only to the version tested; in other words, they will be alleviated through algorithm modification in future versions of the framework.


Acknowledgements

This research was in part supported by the Air Force Office of Scientific Research (AFOSR) via the Center for Engineering Logistics and Distribution (CELDi) under Grant No. FA9550-05-1-0182. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the AFOSR or CELDi.

References

Bradshaw, J.M., 1997. An introduction to software agents. In: Bradshaw, J.M. (Ed.), Software Agents. MIT Press, Menlo Park, CA, pp. 3–46.

Brooks, R.A., 1991. Intelligence without representation. Artificial Intelligence Journal 47, 139–159.

Cobo, P., Fortuny, J., Puertas, E., Richard, P., 2007. AgentGeom: a multiagent system for pedagogical support in geometric proof problems. International Journal of Computers for Mathematical Learning 12, 57–79.

Cronbach, L., 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334.

Franklin, S., Graesser, A., 1996. Is it an agent, or just a program? A taxonomy for autonomous agents. In: Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, pp. 21–35.

Gerber, A., Klusch, M., 2002. Agent-based integrated services for timber production and sales. IEEE Intelligent Systems 17 (2), 2–8.

Hexmoor, H., Venkata, S., Hayes, D., 2006. Modelling social norms in multiagent systems. Journal of Experimental and Theoretical Artificial Intelligence 18 (1), 49–71.

Krupansky, J.W., 2003. What is a Software Agent? Agtivity, Foundations of Software Agents. Available from: http://agtivity.com/agdef.htm (accessed May 2007).

Laird, J.E., van Lent, M., 2001. Human-level AI's killer application: interactive computer games. AI Magazine, 15–25.

Lenox, T., Hahn, S., Lewis, M., Payne, T., Sycara, K., 2000. Task characteristics and intelligent aiding. In: Proceedings of the 2000 IEEE International Conference on Systems, Man, and Cybernetics, pp. 1123–1127.

Lesser, V.R., 1999. Cooperative multiagent systems: a personal view of the state of the art. IEEE Transactions on Knowledge and Data Engineering 11 (1), 133–142.

Maes, P., 1994. Agents that reduce work and information overload. Communications of the ACM 37 (7), 30–40.

Malin, J.T., Schreckenghost, D.L., 1993. Information for the User in Design of Intelligent Systems. CASI.

Nunnaly, J., 1978. Psychometric Theory. McGraw-Hill, New York, NY.

Pechoucek, M., Barta, J., Marik, V., 2002. CPlanT: coalition planning multi-agent system for humanitarian relief operations. In: Proceedings of the Ninth ECCAI-ACAI/EASSS 2001, AEMAS 2001, HoloMAS 2001 on Multi-Agent-Systems and Applications II, pp. 363–376.

Scholtz, J., 2002. Evaluation methods for human–system performance of intelligent systems. In: Proceedings of the 2002 Performance Metrics for Intelligent Systems (PerMIS) Workshop.

Scholtz, J., Bahrami, S., 2003. Human–robot interaction: development of an evaluation methodology for the bystander role of interaction. In: Proceedings of the IEEE Systems, Man, and Cybernetics Conference, Washington, DC.

Shehory, O., Sycara, K., Sukthankar, G., Mukherjee, V., 1999. Agent aided aircraft maintenance. In: Proceedings of the Third International Conference on Autonomous Agents.

Suchman, A.L., 1987. Plans and Situated Actions: The Problem of Human–Machine Communication. Cambridge University Press.

Tambe, M., Johnson, W.L., Jones, R.M., Koss, F., Laird, J.E., Rosenbloom, P.S., Schwamb, K., 1995. Intelligent agents for interactive simulation environments. AI Magazine 16 (1), 15–40.

Vlahavas, I., Bassiliades, N., Sakellariou, I., Molina, M., Ossowski, S., 2002. ExperNet: an intelligent multiagent system for WAN management. IEEE Intelligent Systems 17 (1), 62–72.

Wickramasinghe, L., Amarasiri, R., Alahakoon, L., 2004. A hybrid intelligent multiagent system for e-business. Computational Intelligence 20.