Just Another Pretty Visualization?
The What, Where, When, Why and How of Evaluating Visualizations
Jim Foley
College of Computing
Georgia Institute of Technology
Atlanta, Georgia USA
Talk originally presented at
Opening ConvocationUNC-Charlotte Visualization CenterMay 2006
Visualization
• Interactive visual representation of data to provide a user with information and insights
• Two subfields
  - Scientific visualization
  - Information visualization
Scientific Visualization
• Data typically from physics/chemistry/biology
• Data has an inherent geometry
• Examples
  - Air flow, molecules, weather
Information Visualization
• Data typically abstract
  - Census, financial, business, …
• Must create a geometrical representation
Examples taken from Information Visualization
Methodologies that are appropriate for both SciVis and InfoVis
Evaluation of Visualizations
Roadmap
• The challenges of evaluation
• Subjective evaluation
• Cognition + perception - based evaluation
• Experimental evaluation
• Long-term evaluation
• Conclusions
The Challenges of Evaluation
Evaluation Criteria
• Look pretty? Be effective?
• Extract numbers?
  - Accuracy, or speed, or avoid errors, or …
• "The purpose of computing is insight, not numbers" - Richard Hamming
  - How do we measure insights?
  - Are some insights better than others?
• "Detecting the Expected – Discovering the Unexpected™" - Jim Thomas
  - If we don't know what is expected to be there, how do we know if we have found it?
Challenge 1
The Problem of Good, Better, Best
• There is no "proof of optimality"
• Even if we find all the info we seek, another viz could be faster
• Maybe another insight was there?
• Settle for "Good" or "Better" rather than "Best"
Challenge 2
Effectiveness Depends on …
• The data
• The user
  - Domain knowledge
  - Tool knowledge
  - Right brain vs. left brain
Challenge 3
The Tool vs. the Visualization
• Usability ≠ usefulness
• If the viz is hard to create, users may not use it
  - Even if the viz is perfect for the task
• Focus is NOT on the usability of the viz tool
  - Assume it is easy to learn and use
  - => Apply ALL the usability tools/methods!
Challenge 4
Compare Best Apple to Best Orange
• To compare two visualizations, each must be as good as possible
• A great implementation of a low-effectiveness visualization may perform better than a poor implementation of a high-effectiveness visualization
Challenge 5
The Default Trap
• Users tend to stick with default settings and visualizations, even though others would be better
  - Example: Kobsa, "An Empirical Evaluation of Three Commercial Information Visualization Systems", InfoVis 2001
  - Seen in other domains as well
Challenge 6
Purpose of Evaluation: Insight, not Numbers
• Good to know that A is better than B
• But MUCH better to know WHY A is better than B
=> Ground experiments in theory
=> Get inside users’ heads
Challenge 7
Subjective Evaluation
• Really subjective: "I like it, so it must be GREAT"
• Less subjective: "All my friends like it, so it must be REALLY GREAT"
• Tufte's "minimize chartjunk", etc.
• Shneiderman's "Overview first, zoom and filter, details on demand"
• But
  - Need more goal-oriented criteria
  - Need HCI/InfoVis experts, not our friends
Potential Criteria
• Amar and Stasko, A Knowledge Task-Based Framework for Design and Evaluation of Information Visualizations, Best Paper, InfoVis 2004.
• Rationale-based tasks
  - Expose uncertainty
  - Concretize relationships
  - Formulate cause and effect
• Worldview-based tasks
  - Determination of domain parameters
  - Multivariate explanation
  - Confirm hypotheses
Pros/Cons (+/-)
+ Fast
+ Has a rationale basis
- Still based on subjective (but informed) judgments
Cognition & Perception - Based
Evaluation based on model of cognition & perception
• Lohse, A Cognitive Model for the Perception and Understanding of Graphics, CHI 1991
• In the spirit of GOMS and ACT
• Given one of three query types
  - Read: Did Ford's market share exceed 18% in 1983?
  - Compare: In 1983, did Honda have more USA market share than Nissan?
  - Trend: From 1979 to 1982, did GM's market share increase?
Lohse’s Model Accounts for
• Eye movements (saccades)
• Word recognition
• Retrieval into working memory
• Comparing two units in working memory
• Etc.
“Semantic Traces” on three content-equivalent visualizations, to answer the question “In 1983, did Nissan have more USA market share than Toyota?”
The model explained 58% of the variation in a verification experiment
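Lohse's model, like GOMS, predicts answer time by summing per-operator durations over an assumed scanning sequence. A minimal sketch of that idea; the operator durations and the query sequence below are illustrative placeholders, not Lohse's published parameter values:

```python
# GOMS-style time prediction in the spirit of Lohse's model.
# Operator durations (ms) are made-up placeholders, NOT Lohse's values.
OPERATOR_MS = {
    "saccade": 230,     # eye movement to the next fixation
    "recognize": 314,   # recognize a word or graphical symbol
    "retrieve": 1200,   # retrieve a value into working memory
    "compare": 400,     # compare two units held in working memory
}

def predict_time(scan_sequence):
    """Predicted answer time (ms): sum of operator durations."""
    return sum(OPERATOR_MS[op] for op in scan_sequence)

# A hypothetical "compare" query: fixate and read two bars, then compare.
query = ["saccade", "recognize", "retrieve",
         "saccade", "recognize", "retrieve",
         "compare"]
print(predict_time(query))  # → 3888
```

Because the prediction depends entirely on `scan_sequence`, the model is only as good as the assumed scanning order, which is exactly the limitation noted in the pros/cons below.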
Pros/Cons (+/-)
- Focus is facts (read, compare, trend), not insights
+ But facts can lead to insights
+ Useful for comparing designs
+ Faster than experiments!
- Assumes the scanning sequence can be determined
Application
http://www.perceptualedge.com/example2.php
Experimental Evaluation
Experimental Comparison of Tree Visualization Systems
• Kobsa, "User Experiments with Tree Visualization Systems", InfoVis 2004
• Compares 5 Tree Visualization Systems + Windows file browser
• Quantitative: measure results
• Qualitative: to (partially) understand results
[The five systems: Treemap 3.2, Tree Viewer, Hyperbolic Browser, BeamTrees, SequoiaView]
Experimental Design
• One data set: a subset of the eBay item taxonomy, 5799 nodes
• Six systems: 5 viz + the Windows file browser
• Between subjects: 48 subjects, each uses one system => 8 subjects per system
• Each subject performs 15 tasks
  - 9 tasks concern directory structure, e.g., Maximum depth of hierarchy? Tree balanced or unbalanced?
  - 6 tasks concern file or directory attributes, e.g., Find the file with name = xxxx; Find the name of the largest file
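The between-subjects design above can be sketched as a balanced random assignment of the 48 subjects across the six systems. This is my own illustration of the design, not Kobsa's actual randomization procedure:

```python
import random

# Sketch of the between-subjects split: 48 subjects, 6 systems, 8 per system.
SYSTEMS = ["Treemap", "SequoiaView", "BeamTrees",
           "Hyperbolic Browser", "Tree Viewer", "Windows File Browser"]

def assign_between_subjects(n_subjects, systems, seed=0):
    """Randomly assign subject IDs to systems, with equal group sizes."""
    assert n_subjects % len(systems) == 0, "groups must be equal-sized"
    ids = list(range(n_subjects))
    random.Random(seed).shuffle(ids)            # randomize assignment order
    per_group = n_subjects // len(systems)
    return {name: ids[i * per_group:(i + 1) * per_group]
            for i, name in enumerate(systems)}

groups = assign_between_subjects(48, SYSTEMS)   # 6 groups of 8 subjects
```

Between-subjects assignment avoids learning effects across systems, at the cost of needing 8 separate subjects per condition.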
Results - Correct Answers
[Chart: fraction of correct answers per system - Treemap (TM), SequoiaView (SV), BeamTrees (BT), Hyperbolic Browser (HB), Tree Viewer (TV), Windows File Browser (WFB)]
BT << TM, SV, HB, WFB (significant at 1%)
TV << TM, HB, WFB (significant at 1%)
TV < SV (significant at 5%)
Results - Effect of Task Type
[Chart: correctness on structure vs. attribute tasks - the structure/attribute difference is statistically significant at 0.0001 for Tree Viewer and at 0.001 for BeamTrees]
Results - Task Times
• All tasks: from 101 seconds (Windows File Browser) and 106 (Treemap) up to 188 (BeamTrees)
• Structure-related tasks: from ~165 (SequoiaView and Windows Browser) up to 265 (Tree Viewer)
• Attribute-related tasks: from ~80 (Treemap and Windows) up to 155 (BeamTrees)
Typical Qualitative Results
• Treemaps
  - Color coding and filtering heavily used
  - Needs "details on demand"
• Hyperbolic Browser
  - Cartesian distance ≠ distance from root
  - Tree depth is not what HB is intended for!!
• Lots of usability problems
• From analyzing videos of users
Pros/Cons
+ Nicely designed experiment
- Very different tool functionalities
Detecting the Expected – Discovering the Unexpected™:
+ Good job measuring "Detecting the Expected"
- Does not try to measure "Discovering the Unexpected" (i.e., insights)
Experimental Evaluation for INSIGHT
• Saraiya, North and Davis, An Evaluation of Microarray Visualization Tools for Biological Insight, InfoVis 2004
• Purpose of visualizations is insight => count insights
• Domain: visualization of gene expression arrays
• Compare: five software tools
  - TimeSearcher
  - Hierarchical Clustering Explorer
  - Spotfire®
  - GeneSpring®
  - Cluster/Treeview, aka Clusterview (not shown)
The Tools
Experimental Design
• Five visualization tools
• Three data sets
  - Time series: viral infection, 1060 genes x 5 time points
  - Comparison: 861 genes x 3 related viral infections
  - Comparison: 170 genes x (48 with lupus + 42 healthy)
• Three classes of users
  - Domain experts (10)
  - Domain novices (11)
  - Software developers (9)
• One user, two tools => 30 trials
  - Mixed design: within/between subjects
Tool Capabilities
[Table: each tool's capabilities, e.g. overview + detail, dynamic queries]
Measure INSIGHTS
• Users were instructed to list some questions they might want to ask, and then to use the tool until they could learn nothing more
• For each insight, record / assess
  - The actual insight
  - Time to reach the insight
  - Importance of the insight
  - Correctness of the insight
  - Sought-for vs. serendipitous insight
  - Etc.
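The per-insight coding above amounts to a small record type plus a couple of aggregate measures. A sketch with my own field names and invented example insights; the paper's actual coding sheet may differ:

```python
from dataclasses import dataclass

# Sketch of an insight log in the style of the coding scheme listed above.
# Field names and example entries are my own, not the paper's.
@dataclass
class Insight:
    text: str            # the insight itself, in the user's words
    minutes: float       # time from session start until the insight
    importance: int      # expert rating, e.g. 1 (trivial) .. 5 (major)
    correct: bool        # judged correct against the data
    serendipitous: bool  # discovered by chance rather than sought

def insight_value(log):
    """Total insight value: sum of importance over correct insights."""
    return sum(i.importance for i in log if i.correct)

def time_to_first(log):
    """Minutes until the earliest insight, or None for an empty log."""
    return min(i.minutes for i in log) if log else None

log = [Insight("gene cluster X up-regulated at 24h", 4.6, 4, True, False),
       Insight("one sample may be mislabeled", 12.0, 3, True, True)]
```

Aggregates like these are what make "insight value" and "time to first insight" comparable across tools in the results that follow.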
Results (Statistically Sig.)
• Insight value
  - Spotfire® (66) > GeneSpring® (40)
• Users' perception of how much they learned
  - Spotfire® > Hierarchical Clustering Explorer, Clusterview
• Time to first insight
  - Clusterview (4.6) < all others (7 to 16) :-)
  - GeneSpring® (16) > all others (4.6 to 14) :-(
Other Results & Insights
• User background did not have a statistically significant effect
• Poor UI design in some cases
  - Discouraged/prevented users from using the most effective visualization
  - Slowed down use of an otherwise powerful tool: too many clicks
• Clustering is a poor default view: it biases users toward certain points of view
• Tool-dataset interaction (in some cases)
Pros/Cons (+/-)
+ Very important methodology! Gets at users' real objectives!!!!
+ Detecting the Expected – Discovering the Unexpected™
+ Measures "Detecting the Expected"
+ Measures "Discovering the Unexpected" (i.e., insights)
- Wide variation in tool capabilities
  - TimeSearcher did really well with the time-series data, but not with the other types of data
- Not enough users to get many statistically significant results
- Only 10 domain experts
- Short-term use: just a few hours
- Users not 100% motivated
Implications
• Study real users doing real work with "seeded" data
  - A known set of predetermined insights, also called "ground truth"
• Can approximate this with run-off competitions using "seeded" data
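With seeded data, "detecting the expected" reduces to recall against the ground-truth set, and anything found outside that set is a candidate unexpected discovery. A toy sketch; the insight strings are invented:

```python
# Toy scoring of one session against "seeded" ground-truth insights.
seeded = {"cluster A is up-regulated",          # predetermined insights
          "sample 7 is an outlier",
          "genes X and Y are correlated"}
found = {"cluster A is up-regulated",           # what this user reported
         "genes X and Y are correlated",
         "time point 3 looks mislabeled"}

recall = len(seeded & found) / len(seeded)      # detecting the expected
unexpected = found - seeded                     # discovering the unexpected
```

In practice the matching between a user's free-form insight and a seeded one requires human judgment; exact set membership is only the idealized case.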
Long-term Evaluation
• Hard to know what really works for exploratory data analysis
• Real people doing real work
  - Real people need results for professional success
  - Highly motivated users!!
Long-term Studies
• Perer & Shneiderman, "Integrating Statistics and Visualization: Case Studies of Gaining Clarity during Exploratory Data Analysis", CHI '08
• Domain: social network analysis (political influence, etc.)
• Problem: network visualizations by themselves are not so useful
  - Need statistics about the networks
• SocialAction InfoVis system
  - Combines visualization and statistics
• Question: Does SocialAction improve researchers' capabilities?

For another such study, see Saraiya, North, et al., "An Insight-Based Longitudinal Study of Visual Analytics", IEEE TVCG 2006
SocialAction
Methodology – for each user
• Interview (1 hour)
  - Screen participants
  - Understand their goals
• Training (2 hours)
  - With generic data
• Early use (2-4 weeks)
  - With the user's data
  - Weekly visits
  - Requested (and feasible) software enhancements
  - Provide help
• Mature use (2-4 weeks)
  - No software improvements
  - Weekly visits
  - Provide help
• Exit interview (1 hour)
  - How did the system impact the research?
  - How well were the goals met?
One User: A Political Analyst
Study US Senate power structures, based on voting blocs
Political Analyst
• The analyst had tried 3 other network analysis tools without gaining the desired insights
  - NetDraw, ManyEyes, and KrackPlot
• Found interesting patterns using the capability to rank all nodes, visualize the outcome, and then filter out the unimportant ones
  - The betweenness centrality statistic helped find "centers of gravity"
  - Found geographic alliances
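The rank-then-filter workflow rests on betweenness centrality: the share of shortest paths that pass through each node. A self-contained sketch using Brandes' accumulation on a toy undirected graph; this is an illustration of the statistic, not SocialAction's implementation or the analyst's Senate data:

```python
from collections import deque, defaultdict

# Betweenness centrality (Brandes' algorithm) for a small unweighted,
# undirected graph, given as an adjacency-list dict.
def betweenness(adj):
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}                   # BFS distances from s
        sigma = defaultdict(float)      # number of shortest s->v paths
        sigma[s] = 1.0
        preds = defaultdict(list)       # predecessors on shortest paths
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)      # dependency accumulation
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}  # undirected: halve

# Toy network: "b" bridges every a..c shortest path, so it ranks first.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
bc = betweenness(adj)
ranked = sorted(bc, key=bc.get, reverse=True)   # rank all nodes
important = [v for v in ranked if bc[v] > 0]    # filter out the rest
```

Ranking every node by this statistic and filtering away the low scorers is exactly the "centers of gravity" step the analyst found useful.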
Findings
• With all four users, integrating network statistics with visualizations improved their ability to find insights
• User feedback helped improve SocialAction
Pros/Cons (+/-)
+ Detecting the Expected – Discovering the Unexpected™
+ Measures "Detecting the Expected"
+ Measures "Discovering the Unexpected" (i.e., insights), by users' successes
- Does not compare two tools
- Focus is on how well a tool helps real users do real work: very authentic, not a lab study
- Not enough users to get statistically significant results
+ Users 110% motivated
Conclusions
• Apply UI & graphic design skills
• The methods are not mutually exclusive
• Use evaluation as part of the development cycle
• Understand why users do what they do
• Be sure you know what you're measuring
• The purpose of evaluations is insight, not numbers!
Thank You
Questions and (maybe) Answers