
Evaluation and metrics: Measuring the effectiveness of

virtual environments

Doug Bowman

Edited by C. Song

(C) 2005 Doug Bowman, Virginia Tech

11.2.2 Types of evaluation

Cognitive walkthrough

Heuristic evaluation

Formative evaluation
  Observational user studies
  Questionnaires, interviews

Summative evaluation
  Task-based usability evaluation
  Formal experimentation

Sequential evaluation

Testbed evaluation


11.5 Classifying evaluation techniques

Evaluation techniques can be classified along three dimensions: user involvement (requires users vs. does not require users), context of evaluation (generic vs. application-specific), and type of results (quantitative vs. qualitative).

Requires users
  Generic, quantitative: formal summative evaluation; post-hoc questionnaire
  Generic, qualitative: informal summative evaluation; post-hoc questionnaire
  Application-specific, quantitative: formative evaluation; formal summative evaluation; post-hoc questionnaire
  Application-specific, qualitative: formative evaluation (informal and formal); post-hoc questionnaire; interview / demo

Does not require users
  Generic, quantitative: (generic performance models for VEs, e.g., Fitts' law)
  Generic, qualitative: heuristic evaluation
  Application-specific, quantitative: (application-specific performance models for VEs, e.g., GOMS)
  Application-specific, qualitative: heuristic evaluation; cognitive walkthrough


11.4 How VE evaluation is different

Physical issues
  User can't see the real world in an HMD
  Think-aloud protocols and speech input are incompatible

Evaluator issues
  Evaluator can break presence
  Multiple evaluators are usually needed


11.4 How VE evaluation is different (cont.)

User issues
  Very few expert users
  Evaluations must include rest breaks to avoid possible sickness

Evaluation type issues
  Lack of heuristics/guidelines
  Choosing independent variables is difficult


11.4 How VE evaluation is different (cont.)

Miscellaneous issues
  Evaluations must focus on lower-level entities (interaction techniques, or ITs) because of the lack of standards
  Results are difficult to generalize because of differences between VE systems


11.6.1 Testbed evaluation framework

Main independent variables: ITs

Other considerations (independent variables)
  Task (e.g., target known vs. target unknown)
  Environment (e.g., number of obstacles)
  System (e.g., use of collision detection)
  User (e.g., VE experience)

Performance metrics (dependent variables)
  Speed, accuracy, user comfort, spatial awareness, ...

Generic evaluation context


Testbed evaluation

[Figure: the testbed evaluation process. (1) Initial evaluation -> (2) Taxonomy -> (3) Outside factors (task, users, environment, system) and (4) Performance metrics -> (5) Testbed evaluation -> (6) Quantitative performance results -> (7) Heuristics & guidelines -> (8) User-centered application.]


Taxonomy

Establish a taxonomy of interaction techniques for the interaction task being evaluated.

Example task: changing an object's color
Three subtasks:
  Selecting the object
  Choosing a color
  Applying the color

Two possible technique components (TCs) for choosing a color:
  Changing the values of R, G, and B sliders
  Touching a point within a 3D color space
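The decomposition above can be written down explicitly so that a complete interaction technique is formed by picking one technique component per subtask. The following is a minimal sketch (Python, used here only for illustration); the TCs for selecting the object and applying the color are hypothetical placeholders, while the two color-choosing TCs come from the example above.

from itertools import product

# taxonomy: each subtask maps to its candidate technique components (TCs)
taxonomy = {
    "select object": ["ray-casting", "virtual hand"],            # hypothetical TCs
    "choose color":  ["RGB sliders", "3D color-space touch"],     # TCs from the example
    "apply color":   ["confirm button", "automatic on release"],  # hypothetical TCs
}

# every complete technique is one TC per subtask
techniques = list(product(*taxonomy.values()))
for tech in techniques:
    print(dict(zip(taxonomy.keys(), tech)))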


Outside Factors

A user’s performance on an interaction task may depend on a variety of factors.

Four categories:
  Task: distance to be traveled, size of the object to be manipulated
  Environment: number of obstacles, level of activity or motion
  User: spatial awareness, physical attributes (arm length, etc.)
  System: lighting model, mean frame rate, etc.
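In a testbed experiment these outside factors become secondary independent variables whose levels are crossed with the interaction techniques. A minimal sketch, with factor levels assumed from the examples in this chapter:

from itertools import product

# outside factors and example levels (levels drawn from the examples above)
factors = {
    "task":        ["target known", "target unknown"],
    "environment": ["few obstacles", "many obstacles"],
    "system":      ["collision detection on", "collision detection off"],
    "user":        ["novice", "VE-experienced"],
}

# cross all levels to enumerate the experimental conditions
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(conditions), "crossed conditions")  # 2 x 2 x 2 x 2 = 16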


Performance Metrics

Information about human performance

Speed, accuracy: quantitative

More subjective performance values
  Ease of use, ease of learning, and user comfort
  These concern the user's senses and body: user-centric performance measures


Testbed Evaluation

Final stage in the evaluation of interaction techniques for 3D interaction tasks

Generic, generalizable, and reusable evaluation through the creation of testbeds

Testbeds: environments and tasks that
  Involve all important aspects of a task
  Evaluate each component of a technique
  Consider outside influences on performance
  Have multiple performance measures


Application and Generalization of Results

Testbed evaluation produces models that characterize the usability of an interaction technique for the specified task. Usability is given in terms of multiple performance metrics with respect to various levels of outside factors, yielding a performance database (DB). More information is added to the DB each time a new technique is run through the testbed.

To choose interaction techniques for applications appropriately, one must understand the interaction requirements of the application. The performance results from testbed evaluation can then be used to recommend interaction techniques that meet those requirements.
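One way to picture this performance database is as a collection of records keyed by technique, task, and outside-factor levels, which can later be queried against an application's requirements. The following is a minimal sketch under assumed data shapes; the techniques, factor levels, and numbers are hypothetical.

from statistics import mean

performance_db = []  # grows with every technique run through the testbed

def add_result(technique, task, factors, metrics):
    # one record per testbed run: outside-factor levels plus measured metrics
    performance_db.append({"technique": technique, "task": task,
                           "factors": factors, "metrics": metrics})

def recommend(task, required_factors, metric):
    # rank techniques for a task by one metric, restricted to matching factor levels
    scores = {}
    for rec in performance_db:
        if rec["task"] == task and all(rec["factors"].get(k) == v
                                       for k, v in required_factors.items()):
            scores.setdefault(rec["technique"], []).append(rec["metrics"][metric])
    return sorted(((mean(vals), tech) for tech, vals in scores.items()), reverse=True)

# hypothetical usage: which selection technique is most accurate with many obstacles?
add_result("ray-casting", "selection", {"environment": "many obstacles"}, {"accuracy": 0.82})
add_result("virtual hand", "selection", {"environment": "many obstacles"}, {"accuracy": 0.74})
print(recommend("selection", {"environment": "many obstacles"}, "accuracy"))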


11.6.2 Sequential evaluation

Traditional usability engineering methods

Iterative design/eval.

Relies on scenarios, guidelines

Application-centric

[Figure: the sequential evaluation process. (1) User task analysis -> (A) Task descriptions, sequences & dependencies -> (2) Heuristic evaluation, using (B) Guidelines and heuristics -> (C) Streamlined user interface designs -> (3) Formative user-centered evaluation, using (D) Representative user task scenarios -> (E) Iteratively refined user interface designs -> (4) Summative comparative evaluation -> User-centered application.]


11.3 When is a VE effective?

Users’ goals are realized

User tasks done better, easier, or faster

Users are not frustrated

Users are not uncomfortable


11.3 How can we measure effectiveness?

System performance

Interface performance / User preference

User (task) performance

All are interrelated


Effectiveness case studies

Watson experiment: how system performance affects task performance

Slater experiments: how presence is affected

Design education: task effectiveness


11.3.1 System performance metrics

Avg. frame rate (fps)

Avg. latency / lag (msec)

Variability in frame rate / lag

Network delay

Distortion
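Several of these metrics can be logged directly by the application. A minimal sketch (assuming a per-frame timestamp log; the numbers are hypothetical) of deriving average frame rate and frame-time variability:

from statistics import mean, stdev

# hypothetical per-frame timestamps in seconds
frame_times = [0.000, 0.016, 0.033, 0.052, 0.068, 0.101]

intervals = [t1 - t0 for t0, t1 in zip(frame_times, frame_times[1:])]
avg_fps = 1.0 / mean(intervals)   # average frame rate (fps)
jitter = stdev(intervals)         # variability in frame time (seconds)
print(f"avg frame rate: {avg_fps:.1f} fps, frame-time std dev: {jitter * 1000:.1f} ms")

# End-to-end latency/lag would be measured separately, e.g. as the delay between
# a tracked head movement and the corresponding update on the display.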


System performance

Only important for its effects on user performance / preference
  Frame rate affects presence
  Net delay affects collaboration

Necessary, but not sufficient


Case studies - Watson

How does system performance affect task performance?

Vary avg. frame rate, variability in frame rate

Measure performance on closed-loop and open-loop tasks

e.g. B. Watson et al, Effects of variation in system responsiveness on user performance in virtual environments. Human Factors, 40(3), 403-414.


11.3.3 User preference metrics

Ease of use / learning

Presence

User comfort

Usually subjective (measured in questionnaires, interviews)


User preference in the interface

UI goals
  Ease of use
  Ease of learning
  Affordances
  Unobtrusiveness
  etc.

Achieving these goals leads to usability

Crucial for effective applications


Case studies - Slater

Presence measured via questionnaires

assumes that presence is required for some applications

e.g. M. Slater et al, Taking Steps: The influence of a walking metaphor on presence in virtual reality. ACM TOCHI, 2(3), 201-219.

Study effect of:
  Collision detection
  Physical walking
  Virtual body
  Shadows
  Movement


User comfort

Simulator sickness

Aftereffects of VE exposure

Arm/hand strain

Eye strain


Measuring user comfort

Rating scales

Questionnaires
  Kennedy's Simulator Sickness Questionnaire (SSQ)

Objective measures
  Stanney's work on measuring aftereffects
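As an illustration of the rating-scale approach, the following minimal sketch compares hypothetical pre- and post-exposure symptom ratings. It is not the published SSQ scoring, which defines its own symptom list, subscales, and weightings.

# hypothetical 0-3 symptom ratings before and after VE exposure
pre  = {"nausea": 0, "eye strain": 1, "dizziness": 0, "headache": 0}
post = {"nausea": 1, "eye strain": 2, "dizziness": 1, "headache": 0}

# per-symptom change and a simple overall increase score
change = {symptom: post[symptom] - pre[symptom] for symptom in pre}
print(change, "total increase:", sum(change.values()))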


11.3.2 Task performance metrics

Speed / efficiency

Accuracy

Domain-specific metrics
  Education: learning
  Training: spatial awareness
  Design: expressiveness


Speed-accuracy tradeoff

Subjects will (consciously or not) decide how to trade speed against accuracy

Must explicitly look at particular points on the curve

Manage the tradeoff

[Figure: speed vs. accuracy tradeoff curve]
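One way to look explicitly at particular points on the curve is to analyze speed and accuracy separately per instruction condition rather than collapsing them into a single score. A minimal sketch with a hypothetical trial log:

from statistics import mean

# hypothetical trial log: instruction condition, completion time, error flag
trials = [
    {"instruction": "favor speed",    "time_s": 2.1, "error": 1},
    {"instruction": "favor speed",    "time_s": 1.8, "error": 0},
    {"instruction": "favor accuracy", "time_s": 3.4, "error": 0},
    {"instruction": "favor accuracy", "time_s": 3.1, "error": 0},
]

for cond in ("favor speed", "favor accuracy"):
    sel = [t for t in trials if t["instruction"] == cond]
    print(cond,
          "mean time:", round(mean(t["time_s"] for t in sel), 2),
          "error rate:", round(mean(t["error"] for t in sel), 2))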


Case studies: learning

Measure effectiveness by comparing learning against a control group

Metric: standard test

Issue: time on task not the same for all groups

e.g. D. Bowman et al. The educational value of an information-rich virtual environment. Presence: Teleoperators and Virtual Environments, 8(3), June 1999, 317-331.


Aspects of performance

[Figure: system performance, interface performance, and task performance are interrelated and together determine effectiveness.]


11.7 Guidelines for 3D UI evaluation

Begin with informal evaluation

Acknowledge and plan for the differences between traditional UI and 3D UI evaluation

Choose an evaluation approach that meets your requirements

Use a wide range of metrics – not just speed of task completion


Guidelines for formal experiments

Design experiments with general applicability
  Generic tasks
  Generic performance metrics
  Easy mappings to applications

Use pilot studies to determine which variables should be tested in the main experiment

Look for interactions between variables – rarely will a single technique be the best in all situations
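A simple way to look for such interactions is to compare cell means across the crossed levels of two independent variables and see whether the ranking of techniques flips. A minimal sketch with hypothetical completion times:

from statistics import mean

# hypothetical completion times (seconds), keyed by (technique, environment)
times = {
    ("ray-casting",  "sparse"): [1.9, 2.1],
    ("ray-casting",  "dense"):  [4.0, 4.4],
    ("virtual hand", "sparse"): [2.5, 2.6],
    ("virtual hand", "dense"):  [3.0, 3.2],
}

# cell means for the technique x environment design
for cell, values in times.items():
    print(cell, "mean:", round(mean(values), 2))

# If ray-casting wins in sparse scenes but loses in dense ones, technique and
# environment interact: no single technique is best in all situations.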


Acknowledgments

Deborah Hix

Joseph Gabbard
