Evaluation and metrics: Measuring the effectiveness of
virtual environments
Doug Bowman
Edited by C. Song
11.2.2 Types of evaluation
Cognitive walkthrough
Heuristic evaluation
Formative evaluation: observational user studies; questionnaires, interviews
Summative evaluation: task-based usability evaluation; formal experimentation
Sequential evaluation
Testbed evaluation
11.5 Classifying evaluation techniques
Evaluation approaches classified along three dimensions: user involvement (requires users vs. does not require users), context of evaluation (generic vs. application-specific), and type of results (quantitative vs. qualitative):

Generic, requires users, quantitative: Formal Summative Evaluation; Post-hoc Questionnaire
Generic, requires users, qualitative: Informal Summative Evaluation; Post-hoc Questionnaire
Generic, does not require users, quantitative: (generic performance models for VEs, e.g., Fitts' law)
Generic, does not require users, qualitative: Heuristic Evaluation
Application-specific, requires users, quantitative: Formative Evaluation; Formal Summative Evaluation; Post-hoc Questionnaire
Application-specific, requires users, qualitative: Formative Evaluation (informal and formal); Post-hoc Questionnaire; Interview / Demo
Application-specific, does not require users, quantitative: (application-specific performance models for VEs, e.g., GOMS)
Application-specific, does not require users, qualitative: Heuristic Evaluation; Cognitive Walkthrough
11.4 How VE evaluation is different
Physical issues: the user can't see the real world in an HMD; think-aloud protocols and speech input are incompatible
Evaluator issues: the evaluator can break presence; multiple evaluators are usually needed
11.4 How VE evaluation is different (cont.)
User issues: very few expert users; evaluations must include rest breaks to avoid possible sickness
Evaluation type issues: lack of heuristics/guidelines; choosing independent variables is difficult
11.4 How VE evaluation is different (cont.)
Miscellaneous issues: evaluations must focus on lower-level entities (interaction techniques, ITs) because of the lack of standards; results are difficult to generalize because of differences between VE systems
11.6.1 Testbed evaluation framework
Main independent variables: ITs
Other considerations (independent variables): task (e.g., target known vs. target unknown), environment (e.g., number of obstacles), system (e.g., use of collision detection), user (e.g., VE experience)
Performance metrics (dependent variables): speed, accuracy, user comfort, spatial awareness, …
Generic evaluation context
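As a minimal sketch of how such a design could be enumerated, the snippet below crosses the main independent variable (the interaction techniques) with the outside factors listed above; the specific technique names and factor levels are illustrative assumptions, not the actual testbed design.

```python
# Cross interaction techniques with outside factors into a full factorial design.
# All levels below are hypothetical placeholders for illustration.
from itertools import product

techniques  = ["gaze-directed steering", "pointing", "HOMER"]   # hypothetical IT set
task        = ["target known", "target unknown"]
environment = ["few obstacles", "many obstacles"]
system      = ["collision detection on", "collision detection off"]
user_group  = ["VE novice", "VE expert"]

conditions = list(product(techniques, task, environment, system, user_group))
print(f"{len(conditions)} cells in the full factorial design")
# Dependent variables (speed, accuracy, comfort, spatial awareness, ...) would be
# recorded for each trial within each cell.
```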
Testbed evaluation
Framework steps (from the testbed evaluation diagram):
1. Initial evaluation
2. Taxonomy
3. Outside factors (task, users, environment, system)
4. Performance metrics
5. Testbed evaluation
6. Quantitative performance results
7. Heuristics & guidelines
8. User-centered application
Taxonomy
Establish a taxonomy of interaction techniques for the interaction task being evaluated.
Example task: changing an object's color
Three subtasks: selecting the object, choosing a color, applying the color
Two possible technique components (TCs) for choosing a color: changing the values of R, G, and B sliders; touching a point within a 3D color space
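One way to picture such a taxonomy is as a nested structure in which each subtask lists its candidate technique components, and a complete technique picks one component per subtask. The sketch below assumes this representation; only the color-choosing TCs come from the slide, while the TCs for the selection and apply subtasks are hypothetical additions.

```python
# Hypothetical encoding of the color-change taxonomy: subtasks map to candidate
# technique components (TCs); a complete technique is one TC per subtask.
from itertools import product

taxonomy = {
    "task": "change object color",
    "subtasks": {
        "select object": ["ray-casting", "occlusion selection"],      # illustrative TCs
        "choose color":  ["RGB sliders", "point in 3D color space"],  # TCs from the slide
        "apply color":   ["confirm button", "release to apply"],      # illustrative TCs
    },
}

complete_techniques = list(product(*taxonomy["subtasks"].values()))
print(len(complete_techniques), "possible complete techniques")  # 2 x 2 x 2 = 8
```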
Outside Factors
A user’s performance on an interaction task may depend on a variety of factors.
Four categories:
Task: distance to be traveled, size of the object to be manipulated
Environment: the number of obstacles, the level of activity or motion
User: spatial awareness, physical attributes (arm length, etc.)
System: lighting model, the mean frame rate, etc.
Performance Metrics
Information about human performance
Speed and accuracy: quantitative measures
More subjective performance values: ease of use, ease of learning, and user comfort
Measures tied to the user's senses and body: user-centric performance measures
Testbed Evaluation
The final stage in the evaluation of interaction techniques for 3D interaction tasks
Generic, generalizable, and reusable evaluation through the creation of testbeds
Testbeds: environments and tasks that involve all important aspects of a task, evaluate each component of a technique, consider outside influences on performance, and have multiple performance measures
Application and Generalization of Results
Testbed evaluation produces models that characterize the usability of an interaction technique for the specified task.
Usability is given in terms of multiple performance metrics with respect to various levels of outside factors -> a performance database (DB)
More information is added to the DB each time a new technique is run through the testbed.
To choose interaction techniques for an application appropriately, one must understand the interaction requirements of the application.
The performance results from testbed evaluation can then be used to recommend interaction techniques that meet those requirements.
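As a rough sketch of how such a performance database could be queried against an application's requirements, the snippet below stores per-technique results and filters them by a minimum accuracy before ranking by speed; the record fields, numbers, and thresholds are invented for illustration and are not results from the actual testbed.

```python
# Toy "performance database": testbed results per technique and outside-factor
# levels, queried against an application's interaction requirements.
from dataclasses import dataclass

@dataclass
class Result:
    technique: str
    task: str          # outside-factor levels under which the result was measured
    environment: str
    speed: float       # mean completion time in seconds (lower is better)
    accuracy: float    # fraction of successful trials
    comfort: float     # mean comfort rating

db = [  # fabricated example records
    Result("pointing", "target known", "many obstacles", 4.2, 0.90, 4.1),
    Result("gaze-directed steering", "target known", "many obstacles", 6.8, 0.95, 3.2),
]

def recommend(records, *, task, environment, min_accuracy):
    """Return techniques that meet the requirements, fastest first."""
    matches = [r for r in records
               if r.task == task and r.environment == environment
               and r.accuracy >= min_accuracy]
    return sorted(matches, key=lambda r: r.speed)

for r in recommend(db, task="target known", environment="many obstacles", min_accuracy=0.85):
    print(r.technique, r.speed)
```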
11.6.2 Sequential evaluation
Traditional usability engineering methods
Iterative design/eval.
Relies on scenarios, guidelines
Application-centric
Diagram: sequential evaluation steps and the artifacts that flow between them, ending in a user-centered application
Steps: (1) user task analysis, (2) heuristic evaluation, (3) formative user-centered evaluation, (4) summative comparative evaluation
Artifacts: (A) task descriptions, sequences, and dependencies; (B) guidelines and heuristics; (C) streamlined user interface designs; (D) representative user task scenarios; (E) iteratively refined user interface designs
11.3 When is a VE effective?
Users’ goals are realized
User tasks done better, easier, or faster
Users are not frustrated
Users are not uncomfortable
11.3 How can we measure effectiveness?
System performance
Interface performance / User preference
User (task) performance
All are interrelated
Effectiveness case studies
Watson experiment: how system performance affects task performance
Slater experiments: how presence is affected
Design education: task effectiveness
11.3.1 System performance metrics
Avg. frame rate (fps)
Avg. latency / lag (msec)
Variability in frame rate / lag
Network delay
Distortion
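For concreteness, the sketch below derives the first three metrics from logged timestamps; the frame and input/display times are hypothetical, and using the standard deviation of frame intervals as the variability measure is an assumption rather than a prescribed definition.

```python
# Derive average frame rate, frame-rate variability, and latency from logged
# timestamps (seconds). All timestamps below are hypothetical.
import statistics

frame_times = [0.000, 0.016, 0.033, 0.051, 0.066, 0.084, 0.100]

intervals = [t1 - t0 for t0, t1 in zip(frame_times, frame_times[1:])]
avg_frame_rate = 1.0 / statistics.mean(intervals)          # fps
frame_jitter_ms = statistics.stdev(intervals) * 1000       # variability in frame time

# Latency: time from an input event to the corresponding displayed frame.
input_time, display_time = 0.120, 0.190
latency_ms = (display_time - input_time) * 1000

print(f"{avg_frame_rate:.1f} fps, jitter {frame_jitter_ms:.1f} ms, lag {latency_ms:.0f} ms")
```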
System performance
Only important for its effects on user performance / preference (e.g., frame rate affects presence; network delay affects collaboration)
Necessary, but not sufficient
Case studies - Watson
How does system performance affect task performance?
Vary avg. frame rate, variability in frame rate
Measure performance on closed-loop and open-loop tasks
e.g. B. Watson et al, Effects of variation in system responsiveness on user performance in virtual environments. Human Factors, 40(3), 403-414.
11.3.3 User preference metrics
Ease of use / learning
Presence
User comfort
Usually subjective (measured in questionnaires, interviews)
User preference in the interface
UI goals: ease of use, ease of learning, affordances, unobtrusiveness, etc.
Achieving these goals leads to usability
Crucial for effective applications
Case studies - Slater
Presence measured via questionnaires
assumes that presence is required for some applications
e.g. M. Slater et al, Taking Steps: The influence of a walking metaphor on presence in virtual reality. ACM TOCHI, 2(3), 201-219.
Study the effect of: collision detection, physical walking, virtual body, shadows, movement
User comfort
Simulator sickness
Aftereffects of VE exposure
Arm/hand strain
Eye strain
Measuring user comfort
Rating scales
Questionnaires: Kennedy's Simulator Sickness Questionnaire (SSQ)
Objective measures: Stanney's measurement of aftereffects
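As a generic illustration of turning rating-scale responses into per-category comfort scores, the snippet below averages hypothetical symptom ratings; it deliberately does not reproduce the SSQ's actual item list or weighted subscale scoring.

```python
# Average hypothetical 0-3 symptom ratings ("none" to "severe") per category.
# Generic aggregation sketch only, not the SSQ's real scoring procedure.
from statistics import mean

responses = {
    "nausea":         [0, 1, 0, 2],
    "oculomotor":     [1, 1, 0, 1],
    "disorientation": [0, 0, 1, 0],
}

scores = {category: mean(ratings) for category, ratings in responses.items()}
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```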
11.3.2 Task performance metrics
Speed / efficiency
Accuracy
Domain-specific metrics: education (learning), training (spatial awareness), design (expressiveness)
Speed-accuracy tradeoff
Subjects will make their own speed-vs.-accuracy decision
Must explicitly look at particular points on the curve
Manage the tradeoff
[Figure: speed-accuracy tradeoff curve, plotting accuracy against speed]
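One way to look at particular points on the curve explicitly is to summarize speed and accuracy per condition, as in the sketch below; the trial data and condition names are fabricated for illustration.

```python
# Summarize mean completion time and accuracy per condition from trial logs.
# Trials are fabricated examples: (condition, completion_time_s, correct).
from statistics import mean

trials = [
    ("technique A", 3.1, True), ("technique A", 2.8, False), ("technique A", 3.4, True),
    ("technique B", 5.0, True), ("technique B", 4.7, True),  ("technique B", 5.3, True),
]

for condition in sorted({c for c, _, _ in trials}):
    times   = [t for c, t, _ in trials if c == condition]
    correct = [ok for c, _, ok in trials if c == condition]
    print(f"{condition}: mean time {mean(times):.1f} s, accuracy {sum(correct)/len(correct):.0%}")
```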
Case studies: learning
Measure effectiveness by comparing learning against a control group
Metric: standard test
Issue: time on task not the same for all groups
e.g. D. Bowman et al. The educational value of an information-rich virtual environment. Presence: Teleoperators and Virtual Environments, 8(3), June 1999, 317-331.
Aspects of performance
[Diagram: system performance, interface performance, and task performance combine to determine effectiveness]
11.7 Guidelines for 3D UI evaluation
Begin with informal evaluation
Acknowledge and plan for the differences between traditional UI and 3D UI evaluation
Choose an evaluation approach that meets your requirements
Use a wide range of metrics – not just speed of task completion
Guidelines for formal experiments
Design experiments with general applicability: generic tasks, generic performance metrics, easy mappings to applications
Use pilot studies to determine which variables should be tested in the main experiment
Look for interactions between variables – rarely will a single technique be the best in all situations
Acknowledgments
Deborah Hix
Joseph Gabbard