Between laboratory and simulator: a cognitive approach to evaluating cockpit interfaces by manipulating informatory context

ORIGINAL ARTICLE

Armin Eichinger • Johannes Kellerer

Received: 22 March 2013 / Accepted: 24 September 2013

© Springer-Verlag London 2013

Abstract An evaluation approach for correspondence-

driven domains is suggested and implemented. Touch

screen and trackball controls were evaluated as interaction

devices for large-area displays in the cockpits of highly

agile jet aircraft. To account for the context conditions of

selected use cases, informatory quality and the difficulty of

situational demands were analysed and manipulated

experimentally in dual-task scenarios, which were com-

pleted by experienced pilots. Results indicate a clear per-

formance advantage of touch compared to trackball

interaction, accompanied by less workload. Informatory

dimensions induce different performance and workload

ratings. Cognitive demands interfere the least with aiming

performance, followed by visual and motor. Task and

device influences are interdependent. Motor components of

an additional task interfere especially with trackball control

actions. Workload operates as a buffer. When difficulty

increases, performance decrements are lower than workload

increments. It is argued that this cognitive manipulation of

informatory context is advisable for correspondence-driven

domains, where context is expected to influence human–

machine interaction. Transfer to automotive display eval-

uation appears to be straightforward.

Keywords Jet aircraft · Interface evaluation · Informatory context · Panoramic display · Trackball · Touch screen · Correspondence-driven domain

1 Introduction

Evaluation of control devices is strongly related to the

type of work domain under study. In some domains,

quality aspects of human–machine interaction do not

depend on environmental influences; in other domains,

these influences are important determinants of a suc-

cessful interaction. Vicente (1990) classifies the former as

coherence-driven; the latter as correspondence-driven

work domains. Aircraft cockpit interfaces exchange action

and information between the pilot and a dynamic sur-

rounding. The field of aviation, therefore, qualifies as a

classical correspondence-driven domain.

1.1 Aviation as correspondence-driven domain

The context of flying puts strong information processing

constraints on pilot interface interaction. This is especially

true for tactical aircraft, where informatory load is created

by a broad range of highly demanding tasks that have to be

executed simultaneously by the pilot (Williges et al. 1989).

Wickens (1999) describes modern pilots as information

managers and time-sharing systems, spanning the whole

range of information processing dimensions: visual, audi-

tory, cognitive, verbal, manual/motor (Wickens and Car-

swell 2006). Informatory demands of pilot tasks are highly

complex and change dynamically (Clamann and Kaber

2004).

These external aspects of display interaction, induced by

the task of flying an agile jet aircraft, must not be disre-

garded. In particular, for early-stage evaluations of cockpit

interaction concepts, these informatory characteristics of

contextual influences have to be taken into account and

mapped empirically, allowing their potentially differential

impact on types of control to be examined. Such an

A. Eichinger (✉)

Institute of Ergonomics, Technische Universität München,

Boltzmannstr. 15, 85747 Garching, Germany

e-mail: [email protected]

J. Kellerer

Department for Human Factors Engineering, Cassidian,

Rechliner Straße, 85077 Manching, Germany

Cogn Tech Work

DOI 10.1007/s10111-013-0270-y

approach attempts to implement what Alex Kirlik recently

recommended for the empirical work of cognitive engi-

neers: ‘I will continue to recommend (…) that their work

should strive for a reasonable level of fidelity in repre-

senting central aspects of the contexts to which they desire

their research to generalize’ (p. 219; Kirlik 2012).

In spite of dynamic surroundings, the pilot’s interaction

with the main instrument panel of modern jet aircraft takes

place under rather comfortable conditions: vibration spec-

trum is dominated by motion at frequencies below 1 Hz

(Griffin 1996); higher frequencies are controlled and min-

imized by aircraft stabilizers (Kellerer 2011). After

detailed analyses of Eurofighter flight data, Kellerer (2011)

concludes that MHDD interaction performance does not

deteriorate under normal flight conditions. More extreme

accelerations usually do not require main panel control

input.

1.2 Technological point of departure: controls

for a panoramic jet display

The current Eurofighter main instrument panel is dominated

by three individual multifunctional head-down displays

(MHDDs). The development of a holographic back-pro-

jection technique (Becker et al. 2008) allows for large-area

touch screen displays to be used as a single head-down

display. This large-area display will replace the MHDDs

with their corresponding soft keys as well as other display and

control units on the main instrument panel. Figure 1 illus-

trates the differences between the current and the intended

configuration, the latter of which more than doubles the

available multifunctional display area.

Possible control types are evaluated empirically in order

to explore the potential of this new display concept. The

current Eurofighter cursor control device has not been

optimized for interacting with a large-area screen. A

trackball has comparable interaction characteristics (Boff

and Lincoln 1983), as it is a high-performing representative

of the indirect control devices family. A similar trackball

control is used for display interaction in the Airbus A380.

To explore and understand the aptitude of direct

manipulation (Hutchins et al. 1986), interaction by touch

was selected as the second type of control to be studied.

1.3 Filling the gap: deficits of current evaluation

studies

In developing new display concepts, evaluation of types of

control has to take place early in the design process. It is

time-consuming and costly to test early interaction proto-

types by means of realistic flight simulation, however.

During this stage of development, it is therefore not fea-

sible to take advantage of highly elaborate simulation and

test environments. Usually, these constraints lead to a

parsimonious kind of assessment involving constricted

studies in a laboratory setting, without taking any contex-

tual influences into account, e.g. demands induced by other

tasks. Inherent to these one-shot analyses or comparisons is

the conviction that there is no sensitivity of the interaction

concept to these external aspects.

Bearing these dimensions in mind, the following profile

of requirements can be defined for an evaluation approach

applicable to the aviation setting described above.

• As the intended domain of application is correspon-

dence-driven, study participants have sufficient exper-

tise in the work domain.

• Main interaction tasks are representative of the

intended domain of application.

• The main informatory dimensions are identified, ana-

lysed and quantitatively assessed.

• Informatory load profiles are represented by additional

tasks.

• Relevant context attributes, like difficulty or attention

allocation policy, are considered.

There is a wide spectrum of studies evaluating various

control devices (cf. Stanton et al. 2013; Noyes and Starr

2007; Rogers et al. 2005; Jones and Parrish 1990; Alapetite

et al. 2013). Their methodological approaches differ with

regard to domain focus, practicability/expenditure, degree

of participants’ expertise, main tasks and context attributes

considered.

Fig. 1 Main instrument panel

of the Eurofighter Typhoon in

the current configuration (left)

and the intended area of the new

touch screen display (right;

Kellerer 2011; Eichinger 2011)

Stanton et al. (2013) evaluated input devices for aircraft

cockpits—a correspondence-driven domain. Participants

were not domain experts. Although simulator software was

used for stimulus presentation, the representative tasks

were not performed during flying conditions. The authors

emphasize that, for studies comparing different input

device types for aviation control, ‘it is important to account

for aspects of the context-of-use’ (p. 590). However, their

approach lacks an analysis and representation of these

context aspects. Similarly, in their study of cockpit input

devices, Noyes and Starr (2007) emphasize the multi-task

nature of aviation, which leads to different context and

workload demands, an issue that, they suggest, should be

addressed in future studies. Rogers et al. (2005) evaluated

input devices for desktop software—a coherence-driven

domain. Most interesting is their understanding of con-

textual influences; ‘context’ being taken into consideration

by using representative main tasks. Their research illus-

trates the differences in perspectives between correspon-

dence-driven and coherence-driven evaluation. In an early

study, Jones and Parrish (1990) compared different input

devices for a large-area cockpit display in a simulator

environment. Although a flight condition was set up,

comparisons with the nonflying condition revealed no

performance difference. There were no reasons given for

the selection of the scenario used. No preceding task ana-

lysis, no identification or manipulation of context condi-

tions was reported. Alapetite et al. (2013) compared touch

and trackball interaction for correspondence-driven

domains in an experimental evaluation of a user interface

concept. They used a secondary task to keep participants

focused on a head-up display while blindly interacting with

a main task. No further details were given regarding the

context that might influence interaction.

To the best of our knowledge, no study meets the

requirement profile described above.

1.4 Putting it all together: a correspondence-driven

empirical evaluation concept

We suggest an evaluation concept that fills the gap between

constrained one-shot laboratory and costly simulator experi-

ments by considering the requirements stated above. This new

approach attempts to enrich laboratory-based experiments by

analysing and explicitly considering the main contextual

influences. In accordance with this, our approach also com-

plies with a requirement defined by Hammond (1986): ‘when

an experiment is carried out, the question of generalization

from the laboratory to the actual conditions of interest outside

the laboratory must be answered by some form of defensible

logic’ (p. 431). Thereby, expenditures for simulator equipment

can be avoided without sacrificing the ability to

generalize the results.

The suggested evaluation concept takes into account the

characterizing aspects introduced: the influences of various

informatory qualities tapped by the different tasks, by

multitasking and by the varying intensity of informatory

demand are addressed and mapped for evaluation in a

realistic fashion that corresponds to real settings. To

account for these influences, informatory context is first

analysed and then mapped empirically for an experimental

evaluation of the two control devices: trackball and touch

screen. This mapping is effected by additional tasks, whose

complexity is manipulated. Main interaction tasks are

representative of the domain of application. Study partici-

pants are experienced in their work domain.

Although this approach is tailored to fit the requirements

defined in an aviation setting, its structure is easily transferable

to other correspondence-driven domains (Bennett

and Flach 2011), such as automotive engineering.

1.5 Research questions

The general purpose of the study is to suggest and imple-

ment an evaluation approach for correspondence-driven

domains, especially in highly agile aviation, which makes

the identification and experimental analysis of contextual

influences possible. General research questions, resulting

from the aforementioned approach, are as follows:

How do aspects of the context influence the measured

interaction criteria?—Main effects of informatory dimen-

sions or context complexity.

Is there a differential influence of the context attri-

butes?—Interaction effects of context attributes with con-

trol devices.

Is there a difference between the control devices?—

Main effects of control devices as they are studied in

classical usability studies.

The current study focuses specifically on the evaluative

comparison of performance and workload aspects of

trackball and touch. The manipulation of central context

attributes allows for the analysis of the following ques-

tions: Is there an informatory context influence on inter-

action performance and workload? Is there an

interdependency of informatory context and devices? Do

performance and workload measures react synchronously

to experimental conditions? What conclusions can be

drawn from the perspective of a cockpit designer with

regard to function allocation between devices?

2 Analysing informatory context

To cover a broad range of realistic situational demands,

three typical but divergent mission types were analysed: (1)

Combat Air Patrol, CAP, a tactical flying pattern to detect

incoming intruders; (2) Air-to-Surface, A/S, an attack

mission; (3) Route Management, RM, a use case dominated

by waypoint editing.

For these use cases, a profile of informatory load was to

be created along the five processing dimensions: visual

perception, auditory perception, central processing or

cognition, verbal response and manual motor response.

Assessing visual load did not explicitly distinguish stimuli

inside and outside the cockpit.

2.1 Procedure and participants

A three-step approach was taken to create these profiles.

Firstly, informatory load was identified for the main flying

tasks: aviation, navigation, communication, system man-

agement and tactics (cf. Wickens 2003). Secondly, the

relevance of the main flying tasks for the selected use cases

was assessed on a scale from 0 to 100 %. Finally, infor-

matory load scores were aggregated by using relevance

ratings as linear weights according to the following model:

$L_{\mathrm{dim}} = \sum_{\mathrm{task}} l_{\mathrm{dim,task}} \cdot w_{\mathrm{task}}$, with $L_{\mathrm{dim}}$ as the load for

dimension dim, $l_{\mathrm{dim,task}}$ as the load for dimension dim

during the main flying task task and $w_{\mathrm{task}}$ as the relevance

of this main flying task for the use case in consideration.

The assessments were made by eight test pilots from EADS

Military Air Systems in Manching, Germany. The pilots’

average age was 44; they had an average flight experience

with highly agile jet aircraft of 3,350 h.

2.2 Results

Figure 2 documents almost identical load profiles for the

three use cases. All profiles show peak loads for visual

perception, cognitive processing and motor response.

In spite of the small sample size, Bonferroni-corrected

comparisons using t tests document significant differences

between any of the three peaks and any of the two lesser

loads, except for comparisons comprising the motor com-

ponent. Average corresponding effect sizes of these peak-

low comparisons in terms of contrast correlation coeffi-

cients (Rosenthal et al. 2000) within the three use cases

were $r_{\text{effect size}} = .752$ (air-to-surface), $r_{\text{effect size}} = .631$

(CAP) and $r_{\text{effect size}} = .732$ (Route Management); they all

qualify as very large effects according to Cohen’s (1988)

classification.

According to these results, the study focus was placed

on visual, cognitive and motoric information processing

loads for further analyses.
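
As a rough illustration of the Bonferroni correction mentioned above, the following sketch multiplies each p value by the number of comparisons. It assumes six comparisons per use case (three peak loads against two lesser loads); the source does not state how many tests entered the correction, and the p values below are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) pairs using the Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical uncorrected p values for six peak-versus-low comparisons.
raw_p = [0.001, 0.004, 0.02, 0.03, 0.2, 0.5]
results = bonferroni(raw_p)
significant = [flag for _, flag in results]
print(significant)
```

Only comparisons whose corrected p value stays below the 5 % level count as significant, which makes the procedure conservative for small samples.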

2.3 Mapping informatory context

The general aspects of informatory context described

above put certain constraints on empirical mapping in the

form of experimental conditions. The information qualities

identified as contributing peak demands—visual, cognitive,

and motor—are reproduced by specific additional tasks,

which have to be accomplished while carrying out a rapid

aiming task for the main comparison of the two control

devices. As a primarily visually demanding task, a standardized

search task was used as described in ISO/DTS 14198

(2011). Cognitive load was induced by a memory

search task according to the classic Sternberg paradigm (cf.

Sternberg 2004). A motor response task was designed to

resemble widely used pegboard tests (cf. Strenge

et al. 2002). All additional tasks were selected or constructed

so that difficulty and resource demand could be

manipulated experimentally.

The dynamic complexity of tasks is represented by

presenting each additional task with two levels of difficulty.

Difficulty levels were calibrated in preliminary tests to

qualify as being of average or high difficulty; for ease of

communication, these levels were labelled ‘easy’ and

‘difficult’.

To meet realistic requirements with regard to control

demands, two different aiming tasks were constructed

with differing task emphases. Under the conditions of the

first task, single static targets have to be selected as

quickly as possible after they appear on screen. Under

the conditions of the second task, several target symbols

are presented. Both static and moving targets appear

within a set of static and moving distractor

symbols.

Fig. 2 Informatory load profiles for the three use cases. Error bars indicate ± one standard error

This setting fulfils the requirement of time-sharing by

presenting both tasks in a classic dual-task scenario. Sub-

jects were instructed to place equal emphasis on both tasks

to prevent any bias in performance.

3 Methods

3.1 Design

The main criteria for evaluating interaction concepts under

contextual influences have to take into account perfor-

mance aspects as well as the amount of processing

resources that have to be invested to achieve this level of

performance. For early phases of display development,

rapid aiming tasks suggest themselves as a commonly used

experimental paradigm, which has been in widespread use

since the pioneering experiments of Paul Fitts (1954).

There are two basic questions to be addressed in evaluating

the two forms of control interaction. Firstly, how do they

affect performance in rapid aiming tasks, and secondly,

how high are the corresponding costs of the subjective

workload.

As independent variables, the two types of control

devices were compared under three qualities of informatory

load, operationalized through three different

additional tasks, which were presented with two levels of

difficulty. The resulting three-way 2 × 3 × 2 design was

empirically implemented in a three-factor repeated mea-

sures design.
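
The factorial structure can be made explicit with a short sketch that enumerates the twelve cells of the design. The factor labels follow the text; the helper `randomized_order` and its per-participant seed are assumptions added for illustration, reflecting that conditions were presented in randomized order.

```python
import random
from itertools import product

DEVICES = ("trackball", "touch screen")
ADDITIONAL_TASKS = ("visual", "cognitive", "motor")
DIFFICULTIES = ("easy", "difficult")

# Every participant completes each of the 2 x 3 x 2 = 12 cells.
conditions = list(product(DEVICES, ADDITIONAL_TASKS, DIFFICULTIES))


def randomized_order(seed):
    """Return the 12 conditions in a random order (hypothetical per-participant seed)."""
    rng = random.Random(seed)
    return rng.sample(conditions, k=len(conditions))


print(len(conditions))
print(randomized_order(seed=1)[:3])
```

Because every participant contributes data to every cell, each factor is a within-subject factor in the repeated measures analysis.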

Two different aiming tasks were constructed as descri-

bed above. As the first dependent variable, the speed of

task fulfilment was selected as a measure of aiming per-

formance. To analyse the cognitive workload’s potential

buffer effects, NASA TLX ratings were recorded addi-

tionally after each experimental round.

3.2 Participants

Experiments for evaluations were supported by eleven

male test pilots from EADS Military Air Systems and the

German Federal Armed Forces Technical Centre WTD 61

in Manching, Germany. The pilots’ average age was 41

(SD = 9.1); they had an average flight experience with

highly agile aircraft of 3,100 h.

3.3 Instruments

Experiments were conducted in a cockpit mock-up, which

resembled a real cockpit in all relevant geometric respects.

The large-area display was positioned head-down and

integrated into the main instrument panel. The trackball

was positioned to the left of the pilot seat, in the position of

a classic cursor control device that is integrated into the

throttle. Both aiming tasks were executed on the main

display. Figure 3 shows the positioning of instruments used

for aiming and additional tasks in the mock-up.

3.3.1 Aiming task

Aiming performance was assessed in two different tasks.

For both tasks, subjects were instructed to select targets as

fast and accurately as possible.

In the single task condition, single square targets

appeared at random positions of the display. The target

disappeared after it had been selected. Targets were

presented at intervals of 6 s. A new target would appear

regardless of previous target selection. Time needed to

select was recorded as a measure of task performance.

In the multiple task condition, multiple square symbols

appeared randomly on the display. Three target squares were

presented in red, and ten distractor squares were presented in

blue. One of the targets and two of the distractor squares

moved on a linear trajectory. After a target or distractor was

selected, the symbol disappeared. Figure 4 illustrates four

steps of symbol presentation and selection for single and

multiple target tasks, respectively. Time for target selection

was recorded as a measure of task performance.

3.3.2 Visual task

To access visual resources, a visual search, proposed by

ISO as a reference task (ISO/DTS 14198 2011), was used.

Fig. 3 Instruments for aiming and additional tasks were positioned

within a cockpit mock-up. The head-up display was used for the

visual task. The motor task was positioned to the right of the

participants; the trackball to their left. The large-area display with

integrated touch was installed at the position of the main instrument

panel

Fig. 4 Single target task (1–4 above), multiple target task (1–4 below); distractors are indicated by solid squares as opposed to targets;

movement of targets and distractors is designated by a direction indicator

The task was presented on a 17″ liquid crystal monitor

with a 1,024 × 768 pixel resolution, positioned

where a head-up display would be placed. Participants

were required to identify one white circle as the target

stimulus among a set of 50 similarly shaped and coloured

circle distractors. All circles were randomly configured.

Subjects indicated the half of the screen containing the

target circle by pressing the left or right arrow on a key-

board positioned to their right. Subjects were able to switch

their selection until 2 s after the key press. After this two-second

window, a new random circle configuration was

presented. An example of an easy and a difficult display

configuration before and after selection is shown in Fig. 5.
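
The response logic of the visual task can be sketched as follows. This is a simplified illustration, not the ISO reference implementation: circle sizes and overlap are ignored, the 2 s correction window is assumed to start at the first key press, and all function names are hypothetical.

```python
import random

SCREEN_WIDTH, SCREEN_HEIGHT = 1024, 768
N_DISTRACTORS = 50


def make_configuration(rng):
    """Place one target and 50 distractors at random pixel positions."""
    def random_position():
        return (rng.randrange(SCREEN_WIDTH), rng.randrange(SCREEN_HEIGHT))

    return {
        "target": random_position(),
        "distractors": [random_position() for _ in range(N_DISTRACTORS)],
    }


def correct_key(configuration):
    """Return the arrow key for the screen half that contains the target."""
    target_x, _ = configuration["target"]
    return "left" if target_x < SCREEN_WIDTH / 2 else "right"


def final_response(key_presses, window_s=2.0):
    """Return the last key pressed within window_s of the first press.

    key_presses is a list of (time in seconds, key) tuples.
    """
    if not key_presses:
        return None
    presses = sorted(key_presses)
    first_time = presses[0][0]
    valid = [key for time_s, key in presses if time_s - first_time <= window_s]
    return valid[-1]


config = make_configuration(random.Random(1))
answer = final_response([(1.0, "left"), (2.5, "right"), (3.5, "left")])
print(correct_key(config), answer)
```

In this sketch a correction made 1.5 s after the first press still counts, whereas a press made 2.5 s after it is ignored.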

3.3.3 Cognitive task

This task was presented using the Cognitive Tasks software

(DaimlerChrysler AG). Participants were played recorded

numbers between one and nine, at a speed of one number

per second. Three seconds after the sequence was com-

pleted, a target number was presented. Subjects were

instructed to verbally indicate in a fast and accurate

way, whether this number was part of the preceding

sequence of numbers by answering ‘yes’ or ‘no’. The next

sequence began 5 s after presenting the target number.

Figure 6 depicts the temporal structure used for both easy

and difficult conditions. The easy condition consisted of

five numbers, while the difficult condition consisted of

eight numbers.
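
A single trial of the memory search task can be sketched as follows. The sequence lengths and timing constants come from the description above; drawing the sequence without repetition, drawing the probe uniformly from one to nine, and treating the sequence as lasting one second per number are assumptions made only for this illustration.

```python
import random

SEQUENCE_LENGTH = {"easy": 5, "difficult": 8}
NUMBER_INTERVAL_S = 1      # one number per second
PROBE_DELAY_S = 3          # target number follows 3 s after the sequence
NEXT_SEQUENCE_DELAY_S = 5  # next sequence starts 5 s after the target number


def make_trial(difficulty, rng):
    """Draw a number sequence and a probe; the answer is 'yes' if the probe was presented."""
    # Assumption: numbers do not repeat within one sequence.
    sequence = rng.sample(range(1, 10), SEQUENCE_LENGTH[difficulty])
    # Assumption: the probe is drawn uniformly from 1 to 9.
    probe = rng.randint(1, 9)
    answer = "yes" if probe in sequence else "no"
    return {"sequence": sequence, "probe": probe, "answer": answer}


def cycle_duration_s(difficulty):
    """Approximate duration of one sequence-probe cycle under the timing assumptions."""
    presentation = SEQUENCE_LENGTH[difficulty] * NUMBER_INTERVAL_S
    return presentation + PROBE_DELAY_S + NEXT_SEQUENCE_DELAY_S


trial = make_trial("difficult", random.Random(0))
print(trial, cycle_duration_s("easy"), cycle_duration_s("difficult"))
```

Under these assumptions, the difficult condition raises memory load by three items per sequence while the response itself stays a single spoken word.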

3.3.4 Motor task

In order to manipulate motoric load, a pegboard task was used

as described in Kuhn (2005). A wooden stencil was placed on

top of a flat, flexible keyboard to record when the stylus was

plugged into the circular slots (cf. Fig. 7). In the easy condition,

slot diameter was 25 mm and stylus diameter was 23 mm;

stylus radius was 7 mm on its edge. For the difficult condition,

slot diameter was 8 mm and stylus diameter was 7 mm; stylus

radius was 0.5 mm on its edge. Subjects were instructed to put

the stylus into the pegboard slots in a clockwise fashion

without visual feedback as fast and accurately as possible.

3.4 Procedure

All subjects completed every combination of experimental

conditions. All combinations of the independent variables

difficulty (low, high), device (trackball, touch screen) and

additional task (visual, cognitive, motor) were presented in a

completely randomized manner. The experiment was sepa-

rated into two sessions for each of the two aiming tasks. These

sessions took place on two different days, usually with a week

between both. The sequence of the two kinds of aiming task

was balanced across the subjects. Subjects were given ample

time to become acquainted with the setting and practice each

task before recordings started. After each experimental run, the

NASA TLX workload questionnaire was completed and

interpreted in an equally weighted fashion, as suggested by

Nygren (1991).
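
An equally weighted NASA TLX score is commonly computed as the plain mean of the six subscale ratings. The sketch below assumes this reading and a rating range of 0 to 100; the function name `raw_tlx` and the example ratings are hypothetical.

```python
SUBSCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted mean of the six NASA TLX subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


# Invented ratings for one experimental run.
example_ratings = {
    "mental demand": 60,
    "physical demand": 30,
    "temporal demand": 45,
    "performance": 25,
    "effort": 70,
    "frustration": 10,
}
score = raw_tlx(example_ratings)
print(score)
```

Equal weighting avoids the pairwise comparison step of the original procedure, which shortens the questionnaire after each run.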

Fig. 5 Easy (left) and difficult (middle) visual search task configurations before selection. In the rightmost image, the grey field shows the

selected screen half of a difficult configuration

Fig. 6 Easy (above) and

difficult (below) cognitive task

3.5 Data analysis

Data were analysed by using three-way repeated measures

ANOVAs. As two kinds of aiming task (single target

and multiple target) and two dependent variables (aiming

performance and workload) were used, four ANOVAs

were computed.

The level of significance for any statistical analysis was

set to 5 %. Partial eta squared ($\eta_p^2$) was used as effect size measure for

ANOVA effect tests. For omnibus effect tests that included

Fig. 7 Easy (left) and difficult

(right) motor task

Fig. 8 Mean performance (a–d) and corresponding mean workload (e–h) ratings for any experimental condition and for both aiming tasks,

single and multiple targets; error bars indicate ± one standard error

more than one degree of freedom, focused contrast analyses

with one degree of freedom were conducted additionally.

As suggested by Rosenthal et al. (2000), $r_{\text{contrast}}$ was

used as effect size measure for focused contrast analyses.
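
Both effect size measures can be recovered from a reported F value and its degrees of freedom. The sketch below uses the standard conversion formulas; it is an illustration of the arithmetic, not the authors' analysis script, and the function names are hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def r_contrast(f_contrast, df_error):
    """Contrast correlation for a one-degree-of-freedom contrast: sqrt(F / (F + df2))."""
    return math.sqrt(f_contrast / (f_contrast + df_error))


# Device effect on single-target performance as reported in Table 1: F(1, 10) = 133.47.
print(round(partial_eta_squared(133.47, 1, 10), 2))
print(round(r_contrast(133.47, 10), 2))
```

For an effect with a single degree of freedom, the squared contrast correlation equals partial eta squared, so the two measures carry the same information in that case.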

4 Results

Figure 8 provides a visual account of the measures taken

and relates performance to corresponding workload ratings.

Statistical analyses reveal significant main effects of

interaction device, difficulty and informatory dimension of

the additional task for both performance and workload ratings

under both aiming conditions. Touch interaction

leads to better aiming performance and lower workload rat-

ings than trackball interaction. Difficult additional tasks lead

to poorer performance and a higher workload. Different

additional tasks lead to differences in performance and

workload ratings. According to Cohen’s effect size classifi-

cation, all effects qualify as very large (Cohen 1988). Anal-

yses of interaction effects document a dependency on the task

context of the devices used for aiming for single and multiple

target performance, as well as for multiple target workload

ratings. These effects also qualify as very large. Table 1

summarizes all relevant ANOVA statistics with error prob-

ability and effect size estimates.

In order to take a closer look at effects with multiple

degrees of freedom, focused contrasts were analysed. For

both single and multiple target performance ratings, they

document significant differences between additional visual

and cognitive demands, as well as between cognitive and

motor demands. For workload ratings under the single and

multiple target conditions, there were significant differences

between visual and motor, as well as between cognitive and

motor demands. Significant interaction effects of devices

and tasks were broken down to focused interaction contrasts.

They all show the same result: There is a significant task

dependency of the difference between touch and trackball

when an additional motor demand is present. The difference

between motor and visual or cognitive task increases when

the trackball is used instead of touch. There is no significant

interaction for visual compared to cognitive demand with

respect to device-induced differences in performance or

workload. Table 2 summarizes all relevant contrast statistics

with error probability and effect size estimates. All signifi-

cant contrasts show very large effects.

5 Discussion

At the heart of this cognitive approach to evaluation lies

the manipulation of informatory context. Touch is faster

than trackball interaction for any additional task and

achieves this with a lower workload for most conditions.

There are, however, differences in informatory dimension

with regard to aiming performance and workload. From a

performance perspective, additional cognitive load leads to

better results than visual or motor demand. From a work-

load perspective, additional motor load leads to higher

ratings than visual or cognitive demand. The two measures

taken complement each other: performance measurement

helps to identify cognitive as the least detrimental, and

workload ratings help to identify motor as the most detri-

mental informatory condition.

There is a clear advantage of touch interaction regarding

task performance. This difference is not a result of

increased effort, the main determinant of cognitive

workload (Kahneman 1973), because workload ratings were

lower for touch interaction. Better aiming performance is

achieved with a lower workload. This evaluation aspect holds

true for the single and multiple target aiming, as well as for

both levels of additional task difficulty.

There is a differentiating influence of informatory con-

text on device effects. As Fig. 8 illustrates, when combined

with a motor task, aiming performance with trackball is

clearly slower than when combined with a visual or cognitive task. This

suggests an interference of motor demands for both tasks,

which is not present with touch interaction.

Table 1 Summary of resulting statistics for all four ANOVA analyses; significant effects are set in boldface

Effect              df     ST performance               MT performance               ST workload                  MT workload
                           F        p       $\eta_p^2$  F        p       $\eta_p^2$  F        p       $\eta_p^2$  F        p       $\eta_p^2$
Device              1,10   133.47   <.001   .93         135.28   <.001   .93         22.16    .001    .69         8.36     .016    .46
Task                2,20   19.83    <.001   .55         24.28    <.001   .71         27.77    <.001   .73         12.94    <.001   .56
Difficulty          1,10   12.27    .001    .67         12.45    .005    .56         74.88    <.001   .88         40.48    <.001   .80
Dev. × task         2,20   20.59    <.001   .67         7.92     .003    .44         0.63     .544    .06         5.34     .014    .35
Dev. × diff.        1,10   0.08     .783    .01         0.04     .783    .04         0.45     .520    .04         2.30     .160    .19
Task × diff.        2,20   1.82     .187    .15         2.01     .160    .17         0.46     .638    .04         1.00     .384    .09
Dev. × task × diff. 2,20   0.60     .559    .06         0.67     .522    .06         1.37     .276    .12         .784     .470    .07

ST single targets, MT multiple targets

Figure 8 shows that workload and performance measures respond differently when the difficulty of the additional task is increased: the effect on workload is much more pronounced than the effect on performance. For single targets, the effect size of difficulty is 31 % larger for workload than for performance; for multiple targets, it is 43 % larger. Workload thus seems to act as a buffer for performance, which conforms to an energetic notion of workload (cf. Kahneman 1973; Sanders 1983): by exerting more effort, a sufficient level of performance is maintained. As the resources behind this effort are bounded, performance can only be maintained up to a certain level. In potential overload situations, no further resources might be available. A workload increase beyond certain limits, which might go unnoticed by performance indicators, could therefore lead to a complete performance breakdown. This observation alone justifies the inclusion of workload ratings in this type of evaluation. Relying exclusively on performance indicators would reveal only part of the mechanisms underlying the interaction achievement.
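The two percentages can be reproduced from the difficulty effect sizes in Table 1. The sketch below assumes that they are relative differences between the workload and performance effect sizes (.88 vs. .67 for single targets, .80 vs. .56 for multiple targets); the text does not spell out the computation, and the function name is illustrative.

```python
def relative_increase(workload_effect: float, performance_effect: float) -> float:
    """Relative difference of the workload effect size over the performance effect size."""
    return (workload_effect - performance_effect) / performance_effect


# Difficulty effect sizes taken from Table 1
single_targets = relative_increase(0.88, 0.67)
multiple_targets = relative_increase(0.80, 0.56)

print(f"Single targets:   {single_targets:.0%}")
print(f"Multiple targets: {multiple_targets:.0%}")
```

Under this assumption, the results round to 31 % and 43 %, matching the values reported in the text.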

The aim of this study was not to choose between two interaction devices, but to analyse their strengths and weaknesses under realistic informatory circumstances. Conforming to the HOTAS principle (hands on throttle and stick), certain functions of display interaction have to be accomplished using a remote cursor control device (Smith 1999). The suggested approach enriches the classical procedure of evaluating control devices by analysing and empirically mapping informatory context. This study highlights interference from additional tasks with motor demands as one main weakness of remote cursor control devices, represented here by the trackball. The resulting reduction in aiming performance is accompanied by an increase in workload, especially in highly demanding situations, which might lead to detrimental operating conditions or even performance breakdowns. It is up to regulation authorities and cockpit designers to take these influences and interferences into account when analysing tasks to be accomplished by display interaction and when allocating functions to different interaction devices.

Although our evaluation approach led to a broad spectrum of results and has thus broadened the scope of the classical approach, there is still room for improvement. We took single-peak dimensions of informatory context and mapped them using isolated additional tasks. To account for the complex nature of the tasks to be accomplished in cockpit interaction, one complex task could be devised that jointly maps the three peak dimensions of visual, cognitive and motor demands. This task should be manipulated in difficulty, as were the single tasks used in this study.

To study the effects of strategic allocation of attentional resources, as might be expected with highly skilled jet pilots (Wickens and McCarley 2008), an explicit task focus could be considered. Subjects might be instructed to allocate different portions of their resources to the various task components, i.e. the main and additional tasks.

Piloting jet aircraft was introduced as a correspondence-driven working domain, where contextual characteristics influence display interaction performance. Variable scenarios induce volatile requirements in task difficulty and, in turn, in pilot workload. The very same characteristics hold true for other realms, for example the automotive domain: a highly agile vehicle is navigated through multifaceted scenarios requiring various informatory profiles of attentional resources. As in aviation, workload is an important construct in evaluating driver–display interaction performance; this is well documented by a vast body of research in automotive human factors engineering.

Table 2 Results of a contrast analysis of device × task interactions; significant contrasts are set in boldface

| Contrast | df | ST perf. F | ST perf. p | ST perf. r_contrast | MT perf. F | MT perf. p | MT perf. r_contrast | ST workload F | ST workload p | ST workload r_contrast | MT workload F | MT workload p | MT workload r_contrast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vis vs. cog | 1,10 | 22.81 | **<.001** | .83 | 22.81 | **<.001** | .83 | 1.23 | .294 | .33 | 3.28 | .100 | .50 |
| Vis vs. mot | 1,10 | 2.20 | .168 | .43 | 2.21 | .167 | .43 | 32.67 | **<.001** | .88 | 10.71 | **.008** | .72 |
| Cog vs. mot | 1,10 | 17.98 | **.002** | .80 | 56.18 | **<.001** | .92 | 52.52 | **<.001** | .92 | 18.21 | **.002** | .80 |
| Vis vs. cog, TB vs. touch | 1,10 | 4.40 | .073 | .55 | 0.41 | .537 | .20 | 0.09 | .771 | .09 | 0.16 | .699 | .12 |
| Vis vs. mot, TB vs. touch | 1,10 | 13.79 | **.004** | .76 | 7.53 | **.021** | .66 | 1.66 | .227 | .38 | 5.52 | **.041** | .59 |
| Cog vs. mot, TB vs. touch | 1,10 | 44.23 | **<.001** | .90 | 12.91 | **.005** | .75 | 0.69 | .426 | .25 | 10.45 | **.009** | .71 |

ST single targets, MT multiple targets; Vis, Cog, Mot visual, cognitive, motor task; TB trackball, Touch touch screen

Our approach, as described and implemented for aviation, can therefore be adopted by the automotive domain with few, if any, modifications: (1) representative scenarios are to be identified, e.g. urban and highway traffic; (2) main driving tasks are to be defined, e.g. in accordance with popular three-level models of driving, as formulated by Michon (1985), separating strategic, tactical and control tasks; (3) informatory load profiles are to be determined using the weighting scheme described above; (4) load profiles are represented by additional tasks focusing on single-demand dimensions, e.g. using the methods described in this study; (5) relevant task characteristics are manipulated, e.g. difficulty or resource allocation; (6) appropriate performance measures are selected for assessing interaction quality, e.g. speed or error rate; (7) effects of display and/or interaction devices, e.g. comparing jog dial and touch, in combination with context characteristics are analysed by appropriate factorial designs with respect to performance and workload measures.
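Step (7) calls for factorial designs that cross device and context factors. A minimal sketch of how the resulting set of conditions could be enumerated is given here. The factor levels (jog dial vs. touch, the three demand dimensions, two difficulty levels) and all names are illustrative assumptions carried over from the examples in the text, not a prescribed automotive design.

```python
from itertools import product

# Hypothetical factor levels, chosen only for illustration
DEVICES = ("jog_dial", "touch")
TASKS = ("visual", "cognitive", "motor")
DIFFICULTIES = ("low", "high")


def build_conditions(devices, tasks, difficulties):
    """Fully cross all factor levels into a list of experimental conditions."""
    return [
        {"device": device, "task": task, "difficulty": difficulty}
        for device, task, difficulty in product(devices, tasks, difficulties)
    ]


conditions = build_conditions(DEVICES, TASKS, DIFFICULTIES)

# Each condition would be run for every participant in a within-subject design
for condition in conditions:
    print(condition)
```

With two devices, three additional tasks and two difficulty levels, the full crossing yields 12 conditions, which mirrors the device × task × difficulty structure of the analyses reported in Table 1.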

This cognitive approach of context-sensitive evaluation can take place in a very early phase of the development of display or interaction devices for any correspondence-driven domain that shares the basic characteristics described above, following the same general steps. It can and should replace evaluation efforts that restrict themselves to a one-shot study design and ignore contextual influences for lack of a feasible or affordable simulation environment.

References

Alapetite A, Fogh R, Ozkil AG, Andersen HB (2013) A deported view concept for touch interaction. In: Proceedings of ACHI’2013, international conference on advances in computer–human interactions, pp 22–27

Becker S, Neujahr H, Sandl P, Babst U (2008) Holographisches Display - HOLDIS. In: Grandt M, Bauch A (eds) Beiträge der Ergonomie zur Mensch-System-Integration. DGLR, Bonn

Bennett KB, Flach JM (2011) Display and interface design: subtle science, exact art. CRC Press, Boca Raton

Boff K, Lincoln J (1983) Engineering data compendium: human perception and performance. Wiley, New York

Clamann M, Kaber D (2004) Applicability of usability evaluation techniques to aviation systems. Int J Aviation Psychol 14(4):395–420

Cohen J (1988) Statistical power analysis for the behavioral sciences. Erlbaum, Hillsdale

Eichinger A (2011) Bewertung von Benutzerschnittstellen für Cockpits hochagiler Flugzeuge. Südwestdeutscher Verlag für Hochschulschriften, Saarbrücken

Griffin MJ (1996) Handbook of human vibration. Elsevier, Amsterdam

Hammond K (1986) Generalization in operational contexts: what does it mean? Can it be done? IEEE Trans Syst Man Cybern 16(3):428–433

Hutchins EL, Hollan JD, Norman DA (1986) Direct manipulation interfaces. In: Norman DA, Draper SW (eds) User centered system design: new perspectives on human–computer interaction. Lawrence Erlbaum Associates, Hillsdale, pp 87–124

ISO/DTS 14198 (2011) Road vehicles - ergonomic aspects of transport information and control systems - calibration tasks for methods which assess demand due to the use of in-vehicle systems. Revised Draft Version

Jones DR, Parrish RV (1990) Simulator comparison of thumball, thumb switch and touch screen input concepts for interaction with large screen cockpit display format. NASA Technical Memorandum no 1025

Kahneman D (1973) Attention and effort. Prentice-Hall, Englewood Cliffs

Kellerer J (2011) Panoramic displays: Untersuchung zur Auswahl von Eingabeelementen für Großflächendisplays in Flugzeugcockpits. Südwestdeutscher Verlag für Hochschulschriften, Saarbrücken

Kirlik A (2012) Relevance versus generalization in cognitive engineering. Cogn Technol Work 14:213–220

Kuhn F (2005) Methode zur Bewertung der Fahrerablenkung durch Fahrerinformations-Systeme. World Usability Day

Michon JA (1985) A critical view of driver behavior models: what do we know, what should we do? In: Evans L, Schwing RC (eds) Human behavior and traffic safety. Plenum Press, New York, pp 485–524

Noyes JM, Starr AF (2007) A comparison of speech input and touch screen for executing checklists in an avionics application. Int J Aviation Psychol 17(3):299–315. doi:10.1080/10508410701462761

Nygren TE (1991) Psychometric properties of subjective workload measurement techniques: implications for their use in the measurements of perceived mental workload. Hum Factors 33:17–33

Rogers WA, Fisk AD, McLaughlin AC, Pak R (2005) Touch a screen or turn a knob: choosing the best device for the job. Hum Factors 47:271–288

Rosenthal R, Rosnow R, Rubin DB (2000) Contrasts and effect sizes in behavioral research: a correlational approach. Cambridge University Press, Cambridge

Sanders AF (1983) Towards a model of stress and performance. Acta Psychol 53(1):61–97

Smith C (1999) Design of the Eurofighter human machine interface. Air Space Eur 1(3):54–59

Stanton NA, Harvey C, Plant KL, Bolton L (2013) The performance of computer input devices in a vibration environment. Ergonomics 56(4):590–611. doi:10.1080/00140139.2012.751458

Sternberg S (2004) Memory-scanning: mental processes revealed by reaction-time experiments. In: Balota DA, Marsh EJ (eds) Cognitive psychology: key readings. Psychology Press, New York, pp 48–74

Strenge H, Niederberger U, Seelhorst U (2002) Correlation between tests of attention and performance on grooved and purdue pegboards in normal subjects. Percept Mot Skills 95(2):507–514

Vicente KJ (1990) Coherence- and correspondence-driven work domains: implications for system design. Behav Inf Technol 9:493–502

Wickens C (1999) Aerospace psychology. In: Damos D (ed) Human performance and ergonomics. Academic Press, San Diego, pp 195–242

Wickens C (2003) Pilot actions and tasks: selection, execution, and control. In: Tsang P, Vidulich M (eds) Principles and practice of aviation psychology. Lawrence Erlbaum Associates, Mahwah, pp 239–265

Wickens C, Carswell C (2006) Information processing. In: Salvendy G (ed) Handbook of human factors and ergonomics. Wiley, Hoboken, pp 111–149

Wickens C, McCarley J (2008) Applied attention theory. CRC Press, Boca Raton

Williges R, Williges B, Fainter R (1989) Software interfaces for aviation systems. In: Wiener E, Nagel D (eds) Human factors in aviation. Academic Press, San Diego, pp 463–493
