Conceptual and Practical Challenges in InfoViz Evaluations
Kasper Hornbæk


Keynote by Kasper Hornbæk, BELIV 2010 Workshop: http://www.beliv.org/beliv2010/


Page 1: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Conceptual and practical challenges in InfoViz evaluations

Kasper Hornbæk

Page 2: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Concerns in evaluation

• Increase in InfoViz studies that include evaluation

• But concerns over those studies remain, see
  – The BELIV workshops
  – Reviews of InfoViz papers (e.g., Munzner 2008)

• Concerns are shared by research in
  – Usability evaluation (e.g., Gray & Salzman, 1998)
  – CHI (e.g., Greenberg & Buxton 2008, PC meeting)

Page 3: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Conceptual challenges

• Observations about (some) InfoViz studies:
  – Win-lose evaluations
  – Point designs
  – Lack of clarity in key constructs
  – Lack of theory-driven experiments

Page 4: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Example: The concept of overview

• An overview
  – “is constructed from, and represents, a collection of objects of interest” (Greene et al., 2000, p. 381)
  – “implies a qualitative awareness of one aspect of some data, preferably acquired rapidly and, even better, pre-attentively: that is, without cognitive effort” (Spence, 2007, p. 19)

Hornbæk & Hertzum (under review)

Page 5: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

The concept of overview, cont’d

• 1391 occurrences of overview in 60 studies analyzed

• Used in three senses:
  – 81% technical (overview+detail visualization)
  – 14% user-centered (user forming an overview)
  – 5% other (“give an overview of the literature”)

• Does an overview give an overview?

Hornbæk & Hertzum (under review)

Page 6: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Example: Fisheye interfaces

• DOI function composed of a priori interest and distance components (see the sketch after this list)

• Mixed empirical results (e.g., Cockburn et al. 2008)
  – some optimistic (e.g., papers at this CHI)
  – some pessimistic (e.g., Lam & Munzner 2008)
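To make the two components concrete, here is a minimal sketch of a Furnas-style degree-of-interest function over a hierarchy. The tree encoding and helper names are illustrative assumptions, not the implementation used in the fisheye studies cited on the following slides.

```python
# A minimal sketch of a Furnas-style degree-of-interest (DOI) function with
# an a priori interest component and a distance-to-focus component. The tree
# encoding and helper names are illustrative, not the authors' implementation.

def tree_distance(x, y, parent):
    """Number of edges between two nodes, via their lowest common ancestor."""
    def ancestors(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path
    ax, ay = ancestors(x), ancestors(y)
    common = next(n for n in ax if n in ay)
    return ax.index(common) + ay.index(common)

def doi(node, focus, root, parent):
    """DOI(x | focus = y) = API(x) - D(x, y): a priori interest
    (here, closeness to the root) minus distance to the current focus."""
    a_priori = -tree_distance(node, root, parent)
    return a_priori - tree_distance(node, focus, parent)

# Example: with 'b2' in focus, nodes on or near the path from the focus to the
# root get the highest DOI and would be kept visible in a fisheye view.
parent = {"a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b", "b2": "b"}
ranked = sorted(parent, key=lambda n: doi(n, "b2", "root", parent), reverse=True)
print(ranked)
```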

Page 7: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Jakobsen & Hornbæk, “Evaluating a fisheye view of source code”, CHI (2006)

Page 8: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

[Screenshot of the fisheye source-code view: a focus area surrounded by a context view]

Jakobsen & Hornbæk, “Fisheyes in the field …”, CHI (2009)

Page 9: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Fisheye interfaces, cont’d

• Is a priori interest useful?

• Our data suggest maybe

– Hard to understand a priori components of the DOI function

– Appears less useful in real-life programming than lines linked directly to the users’ focus

• But no one (?!) has generated direct evidence about this question

Jakobsen & Hornbæk (under review)

Page 10: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Strong inference

• Platt (1964) proposed the notion of strong inference, with the steps of
  – Devising alternative hypotheses
  – Devising a crucial experiment which will rule out some of the hypotheses
  – Carrying out the experiment

Page 11: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Radical solutions

• Newman (1994) reviewed CHI publications and found that 25% presented radical solutions

Page 12: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Practical challenges

Page 13: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Simple outcome measures

Hornbæk, K. (2006), “Current Practice in Measuring Usability …”, IJHCS, 64 (2), 79-102

[Bar chart: proportion of studies (0–30%) using each outcome measure: binary task completion, spatial accuracy, error rates, comprehension/learning, recall, completeness, expert grading of product]

Page 14: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Simple process measures

• 57% of studies measure task completion time

• Task completion time works as a summary measure of the process

• How should we interpret task completion time?

Hornbæk, K. (2006), “Current Practice in Measuring Usability …”, IJHCS, 64 (2), 79-102

Page 15: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Example: Comparing text viz.

Hornbæk & Frøkjær, TOCHI, 2003

Page 16: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations
Page 17: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations
Page 18: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Longer and more frequent exploration

[Bar chart: time for further exploration (min.), comparing Overview+Detail (N=59) and Linear (N=40)]

Page 19: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Standing on the shoulders of giants

• A key challenge is to build on the work of others in selecting:
  – Data
  – Tasks
  – Measures
  – Interfaces

• Crucial for replications, accumulating knowledge, validity/reliability

Page 20: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Example questionnaire items: “The system is …” / “I feel …”, rated on a 1–5 scale

Page 21: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

accessible, adequate, annoying, anxiety, appealing, boring, clear, cluttered, comfortable, competent, comprehensible, conclusive, confident, conflict, confusing, connected, convenient, desirable, difficult, dislikable, dissatisfied, distracting, easy, effective, efficient, embarrassed, emotional, engaging, enjoyable, entertaining, enthusiasm, excellent, exciting, familiar, favorable, flexible, flustered, friendly, frustrating, fun, good, hate, helpfulness, immediate, important, improving, inefficient, intelligent, interested, intuitive, involved, irritation, learnable, likable, lively, loved, motivating, natural, nice, personal, plain, pleasant, preference, presence, productive, quality, quick, relevant, reliable, respect, responsive, satisfied, sensate, sense of being together, sense of control, sense of success, simple, smooth, sociable, social presence, stimulating, successful, sufficient, surprising, time consuming, timely, tiring, trust, uncomfortable, understand, useful, user friendly, vexed, vivid, warm, well-organized

Page 22: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Reliability of questionnaires

• Homegrown questionnaires have low reliability; 6 of them have α below .7

Questionnaire                                     N    Cronbach’s α, mean   Cronbach’s α, range
Standard questionnaires (e.g., TLX, QUIS, CSUQ)   16   .814                 .73–.95
Homegrown                                         20   .736                 .21–.92

Table 1: Reliability of satisfaction questionnaires as indicated by Cronbach’s α [6]. Homegrown refers to questionnaires that authors themselves developed to capture ease-of-use.

Hornbæk & Law, CHI (2007)
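As a reminder of what the α values in the table summarize, here is a minimal sketch of Cronbach’s α for a respondents-by-items matrix of ratings; the data and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of ratings:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative ratings: 5 respondents answering 4 ease-of-use items on a 1-5 scale.
ratings = [[4, 5, 4, 4],
           [2, 2, 3, 2],
           [5, 4, 5, 5],
           [3, 3, 2, 3],
           [4, 4, 4, 5]]
print(round(cronbach_alpha(ratings), 2))        # ~.94: the items vary together
```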

Page 23: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Example: Selecting tasks

• Tasks are crucial (thanks Catherine)

• But chosen ad hoc, to match the evaluation, or habitually

• Which task would you use to evaluate an overview?
  – Monitoring rare, navigation frequent (Hornbæk & Hertzum, under review)

Page 24: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

What to do?

• More strong, theoretically motivated comparisons (TILCS?)

• More complex measures of outcome and process, coupled with richer data

• Build on existing work, replications
• Task taxonomies, do task-level analysis, do field work
• Studies of adoption and integration

Page 25: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations
Page 26: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Construct validity

From Adcock, R. & Collier, D. (2001), Measurement Validity: A shared standard …, American Political Science Review, 95(3), 529-546

Page 27: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Results: are measures correlated?

• Effectiveness vs. efficiency: r = .247 ± .059

• May be interpreted as:
  – 6% variance explained
  – small (~.1) to medium (~.3) effect (Cohen 1969)
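(The 6% figure is simply the squared correlation: r² = .247² ≈ .061.)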

[Two scatter plots of errors against task completion time from individual studies, with correlations of r = .229 and r = .23]
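A minimal sketch of the per-study computation behind such plots: correlating task completion time with error counts and reporting the variance explained. The numbers are made up for illustration, not taken from the plotted studies.

```python
import numpy as np

# Illustrative per-trial data for one study: task completion time (minutes)
# and number of errors. The values are invented, not from the plotted studies.
times  = np.array([1.2, 2.8, 3.5, 4.1, 5.0, 2.2, 3.9, 4.6])
errors = np.array([0,   1,   1,   2,   2,   0,   1,   3  ])

# Pearson correlation between the two measures, and the variance explained (r^2).
r = np.corrcoef(times, errors)[0, 1]
print(f"r = {r:.3f}, variance explained = {r**2:.1%}")
```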

Page 28: BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations

Results: are measures correlated?

• Task complexity does not influence the correlations

• More complex measures (e.g., quality of outcome) attenuate correlations

• Difference between errors-along-the-way (.441) and task-completion errors (.155)