Evaluation and metrics: Measuring the effectiveness of virtual environments
Doug Bowman, edited by C. Song

Page 1

Evaluation and metrics: Measuring the effectiveness of virtual environments

Doug Bowman

Edited by C. Song

Page 2

11.2.2 Types of evaluation

Cognitive walkthrough

Heuristic evaluation

Formative evaluation: observational user studies; questionnaires, interviews

Summative evaluation: task-based usability evaluation; formal experimentation

Sequential evaluation

Testbed evaluation

Page 3

11.5 Classifying evaluation techniques

Evaluation techniques classified along three dimensions: user involvement (requires users vs. does not require users), context of evaluation (generic vs. application-specific), and type of results (quantitative vs. qualitative):

Generic context, requires users:
Quantitative: formal summative evaluation; post-hoc questionnaire
Qualitative: informal summative evaluation; post-hoc questionnaire

Generic context, does not require users:
Quantitative: (generic performance models for VEs, e.g., Fitts' law)
Qualitative: heuristic evaluation

Application-specific context, requires users:
Quantitative: formative evaluation; formal summative evaluation; post-hoc questionnaire
Qualitative: formative evaluation (informal and formal); post-hoc questionnaire; interview/demo

Application-specific context, does not require users:
Quantitative: (application-specific performance models for VEs, e.g., GOMS)
Qualitative: heuristic evaluation; cognitive walkthrough

Page 4

11.4 How VE evaluation is different

Physical issues: the user can't see the real world in an HMD; think-aloud and speech input are incompatible

Evaluator issues: the evaluator can break presence; multiple evaluators are usually needed

Page 5

11.4 How VE evaluation is different (cont.)

User issues: very few expert users; evaluations must include rest breaks to avoid possible sickness

Evaluation type issues: lack of heuristics/guidelines; choosing independent variables is difficult

Page 6

11.4 How VE evaluation is different (cont.)

Miscellaneous issues: evaluations must focus on lower-level entities (interaction techniques, ITs) because of the lack of standards; results are difficult to generalize because of differences between VE systems

Page 7

11.6.1 Testbed evaluation framework

Main independent variables: ITs

Other considerations (independent variables): task (e.g., target known vs. target unknown); environment (e.g., number of obstacles); system (e.g., use of collision detection); user (e.g., VE experience)

Performance metrics (dependent variables): speed, accuracy, user comfort, spatial awareness…

Generic evaluation context
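
As a concrete (and purely illustrative) sketch of this design space, the snippet below crosses a set of ITs with two levels of each outside factor to enumerate the cells of a full factorial testbed experiment. All technique names and factor levels are hypothetical, not Bowman's actual testbed:

```python
# Sketch of a testbed's factorial design: cross the main independent
# variable (interaction technique) with the outside factors listed above.
# All names and levels are illustrative placeholders.
from itertools import product

techniques = ["Go-Go", "ray-casting", "HOMER"]   # ITs under study
factors = {
    "task":        ["target known", "target unknown"],
    "environment": ["no obstacles", "many obstacles"],
    "system":      ["collision detection on", "collision detection off"],
    "user":        ["VE novice", "VE expert"],
}

# Each condition is one cell of the full factorial design.
conditions = [
    dict(zip(["technique", *factors], combo))
    for combo in product(techniques, *factors.values())
]

print(len(conditions))   # 3 techniques x 2*2*2*2 factor levels = 48 cells
```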

Page 8

Testbed evaluation

[Figure: the testbed evaluation approach]
(1) Initial evaluation → (2) Taxonomy → (3) Outside factors (task, users, environment, system) → (4) Performance metrics → (5) Testbed evaluation → (6) Quantitative performance results → (7) Heuristics & guidelines → (8) User-centered application

Page 9

Taxonomy

Establish a taxonomy of interaction techniques for the interaction task being evaluated.

Example task: changing an object's color. Three subtasks:

Selecting the object, choosing a color, applying the color

Two possible technique components (TCs) for choosing a color: changing the values of the R, G, and B sliders; touching a point within a 3D color space
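
A minimal sketch of how such a taxonomy can be written down and enumerated. The two color-choice TCs are from the example above; the TC names for the other subtasks are invented for illustration:

```python
# Taxonomy of the color-change task: each subtask maps to its candidate
# technique components (TCs). A complete interaction technique is one
# TC chosen per subtask.
from itertools import product

taxonomy = {
    "select object": ["ray-casting", "occlusion selection"],  # illustrative
    "choose color":  ["RGB sliders", "3D color-space touch"], # from the slide
    "apply color":   ["gesture", "menu command"],             # illustrative
}

complete_techniques = [dict(zip(taxonomy, combo))
                       for combo in product(*taxonomy.values())]
print(len(complete_techniques))   # 2 * 2 * 2 = 8 distinct techniques
```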

Page 10

Outside Factors

A user’s performance on an interaction task may depend on a variety of factors.

Four categories:

Task: distance to be traveled, size of the object to be manipulated

Environment: number of obstacles, level of activity or motion

User: spatial awareness, physical attributes (arm length, etc.)

System: lighting model, mean frame rate, etc.

Page 11

Performance Metrics

Information about human performance

Speed, accuracy: quantitative metrics

More subjective performance values: ease of use, ease of learning, and user comfort

These concern the user's senses and body: user-centric performance measures
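
As a sketch, with a hypothetical trial log and illustrative field names, the two quantitative metrics are simple statistics over recorded trials:

```python
# Trial records with illustrative field names and synthetic values.
trials = [
    {"time_s": 4.2, "correct": True},
    {"time_s": 5.8, "correct": False},
    {"time_s": 3.9, "correct": True},
]

# Speed: mean task completion time; accuracy: fraction of correct trials.
speed = sum(t["time_s"] for t in trials) / len(trials)
accuracy = sum(t["correct"] for t in trials) / len(trials)
print(f"mean time {speed:.1f} s, accuracy {accuracy:.0%}")
```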

Page 12

Testbed Evaluation

Final stage in the evaluation of interaction techniques for 3D interaction tasks

Generic, generalizable, and reusable evaluation through the creation of testbeds

Testbeds are environments and tasks that involve all important aspects of a task, evaluate each component of a technique, consider outside influences on performance, and have multiple performance measures

Page 13

Application and Generalization of Results

Testbed evaluation produces models that characterize the usability of an interaction technique for the specified task. Usability is given in terms of multiple performance metrics with respect to various levels of outside factors, forming a performance database (DB). More information is added to the DB each time a new technique is run through the testbed.

To choose interaction techniques for an application appropriately, one must understand the interaction requirements of the application. The performance results from testbed evaluation can then be used to recommend interaction techniques that meet those requirements.
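
A minimal sketch of that selection step, with a hypothetical schema and made-up numbers (a real database would span many more metrics and factor levels):

```python
# Sketch: querying a testbed performance database to recommend techniques.
# Schema, names, and numbers are hypothetical.
performance_db = [
    {"technique": "Go-Go",       "task": "target known",   "speed_s": 2.1, "comfort": 4.0},
    {"technique": "ray-casting", "task": "target known",   "speed_s": 1.6, "comfort": 4.5},
    {"technique": "Go-Go",       "task": "target unknown", "speed_s": 3.0, "comfort": 4.0},
]

def recommend(task, min_comfort):
    """Techniques meeting the application's requirements, fastest first."""
    rows = [r for r in performance_db
            if r["task"] == task and r["comfort"] >= min_comfort]
    return [r["technique"] for r in sorted(rows, key=lambda r: r["speed_s"])]

# An application with known targets that needs comfortable interaction:
print(recommend("target known", min_comfort=4.0))   # ['ray-casting', 'Go-Go']
```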

Page 14

11.6.2 Sequential evaluation

Traditional usability engineering methods

Iterative design/eval.

Relies on scenarios, guidelines

Application-centric

[Figure: the sequential evaluation approach]
(1) User task analysis produces (A) task descriptions, sequences, and dependencies
(2) Heuristic evaluation applies (B) guidelines and heuristics to produce (C) streamlined user interface designs
(3) Formative user-centered evaluation uses (D) representative user task scenarios to produce (E) iteratively refined user interface designs
(4) Summative comparative evaluation yields the user-centered application

Page 15

11.3 When is a VE effective?

Users’ goals are realized

User tasks done better, easier, or faster

Users are not frustrated

Users are not uncomfortable

Page 16

11.3 How can we measure effectiveness?

System performance

Interface performance / User preference

User (task) performance

All are interrelated

Page 17

Effectiveness case studies

Watson experiment: how system performance affects task performance

Slater experiments: how presence is affected

Design education: task effectiveness

Page 18

11.3.1 System performance metrics

Avg. frame rate (fps)

Avg. latency / lag (msec)

Variability in frame rate / lag

Network delay

Distortion
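
Most of these can be computed directly from per-frame timestamps. A minimal sketch with synthetic timestamps, using only Python's standard library:

```python
# Sketch: average frame rate and frame-time variability from timestamps.
# The timestamps (seconds) are synthetic.
import statistics

frame_times = [0.000, 0.016, 0.033, 0.051, 0.100, 0.116]        # frame start times
deltas = [b - a for a, b in zip(frame_times, frame_times[1:])]  # per-frame intervals

avg_fps     = 1.0 / statistics.mean(deltas)      # average frame rate (fps)
variability = statistics.stdev(deltas) * 1000.0  # frame-time jitter (ms)

print(f"{avg_fps:.1f} fps, frame-time std dev {variability:.1f} ms")
```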

Page 19

System performance

Only important for its effects on user performance and preference: frame rate affects presence; network delay affects collaboration

Necessary, but not sufficient

Page 20

Case studies - Watson

How does system performance affect task performance?

Vary avg. frame rate, variability in frame rate

Measure performance on closed-loop and open-loop tasks

e.g. B. Watson et al, Effects of variation in system responsiveness on user performance in virtual environments. Human Factors, 40(3), 403-414.

Page 21

11.3.3 User preference metrics

Ease of use / learning

Presence

User comfort

Usually subjective (measured in questionnaires, interviews)

Page 22

User preference in the interface

UI goals: ease of use, ease of learning, affordances, unobtrusiveness, etc.

Achieving these goals leads to usability

Crucial for effective applications

Page 23

Case studies - Slater

Presence measured via questionnaires

assumes that presence is required for some applications

e.g. M. Slater et al, Taking Steps: The influence of a walking metaphor on presence in virtual reality. ACM TOCHI, 2(3), 201-219.

Study the effect of: collision detection, physical walking, virtual body, shadows, movement

Page 24

User comfort

Simulator sickness

Aftereffects of VE exposure

Arm/hand strain

Eye strain

Page 25

Measuring user comfort

Rating scales

Questionnaires: Kennedy's Simulator Sickness Questionnaire (SSQ)

Objective measures: Stanney's measurement of aftereffects
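
As a sketch of rating-scale scoring in the style of the SSQ, with illustrative items and placeholder weights (deliberately not Kennedy's published constants), symptom ratings are grouped into subscales and weighted into a total score:

```python
# Sketch of SSQ-style scoring: symptom ratings (0-3) grouped into
# subscales and combined with weights. Items and weights here are
# illustrative placeholders, not the published SSQ constants.
ratings = {"nausea": 1, "dizziness": 2, "eye strain": 1, "headache": 0}

subscales = {
    "nausea":     ["nausea", "dizziness"],
    "oculomotor": ["eye strain", "headache"],
}
weights = {"nausea": 1.0, "oculomotor": 1.0}   # placeholder weights

scores = {name: weights[name] * sum(ratings[i] for i in items)
          for name, items in subscales.items()}
total = sum(scores.values())

print(scores, total)   # {'nausea': 3.0, 'oculomotor': 1.0} 4.0
```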

Page 26

11.3.2 Task performance metrics

Speed / efficiency

Accuracy

Domain-specific metrics:
Education: learning
Training: spatial awareness
Design: expressiveness

Page 27

Speed-accuracy tradeoff

Subjects will make their own speed vs. accuracy decision

Must explicitly look at particular points on the curve

Manage the tradeoff

[Figure: speed-accuracy tradeoff curve (accuracy vs. speed)]

Page 28

Case studies: learning

Measure learning effectiveness against a control group

Metric: standard test

Issue: time on task not the same for all groups

e.g. D. Bowman et al. The educational value of an information-rich virtual environment. Presence: Teleoperators and Virtual Environments, 8(3), June 1999, 317-331.

Page 29

Aspects of performance

[Figure: system performance, interface performance, and task performance together determine effectiveness]

Page 30

11.7 Guidelines for 3D UI evaluation

Begin with informal evaluation

Acknowledge and plan for the differences between traditional UI and 3D UI evaluation

Choose an evaluation approach that meets your requirements

Use a wide range of metrics – not just speed of task completion

Page 31

Guidelines for formal experiments

Design experiments with general applicability: generic tasks, generic performance metrics, easy mappings to applications

Use pilot studies to determine which variables should be tested in the main experiment

Look for interactions between variables – rarely will a single technique be the best in all situations
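
For the last point, a sketch of how an interaction between variables could be tested: a two-way ANOVA over a synthetic technique x task design, using the standard statsmodels API (the data and factor levels are invented):

```python
# Sketch: testing for a technique x task interaction with a two-way ANOVA.
# Data are synthetic; the library calls are standard pandas/statsmodels.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "technique": ["ray", "ray", "gogo", "gogo"] * 6,
    "task":      ["near", "far"] * 12,
    "time_s":    [1.2, 3.9, 1.8, 2.0, 1.4, 4.1, 1.7, 2.2] * 3,  # synthetic times
})

# Fit main effects plus interaction; the C(technique):C(task) row of the
# ANOVA table is the interaction term.
model = smf.ols("time_s ~ C(technique) * C(task)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

Here ray-casting is fast for near targets but slow for far ones while Go-Go is consistent, so neither technique is best everywhere; that is exactly the pattern the interaction term detects.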

Page 32

Acknowledgments

Deborah Hix

Joseph Gabbard