Visualization for Information Analysis and Exploration
John Stasko
Sept. 17, 2008
Information Interfaces Research Group
School of Interactive Computing
Georgia Institute of Technology

Exercise
• Get out pencil & paper

Data Explosion
• Society is more complex
  – There simply is more “stuff”
• Computers, the internet, and the web give people access to an incredible amount of data
  – News, sports, financial, purchases, etc.

Data Overload
• Confound: How to make use of the data
  – How do we make sense of the data?
  – How do we harness this data in decision-making processes?
  – How do we avoid being overwhelmed?

The Challenge
• Transform the data into information (understanding, insight), thus making it useful to people

Premise of my Work
• Visualization of data helps people understand it better

Human Vision
• Highest bandwidth sense
  – ~100 MB/s
  – Parallel
  – Strong pattern recognition
  – Much done preattentively, i.e., without thought

Visualization
• Definition
  – “The use of computer-supported, interactive visual representations of data to amplify cognition.”
• From [Card, Mackinlay, Shneiderman ’98]

Visualization
• Often thought of as the process of creating a graphic or an image
• Really is a cognitive process
  – Form a mental image of something
  – Internalize an understanding
• “The purpose of visualization is insight, not pictures”
  – Insight: discovery, decision making, explanation, analysis, exploration, learning

Main Idea
• Visuals help us think
  – Provide a frame of reference, a temporary storage area
• Cognition → Perception
• Pattern matching
• External cognition aid
  – Role of external world in thinking and reasoning
Larkin & Simon ’87
Card, Mackinlay, Shneiderman ’98

When to Apply?
• Many other techniques for data analysis
  – Data mining, DB queries, machine learning…
• Visualization most useful in exploratory data analysis
  – Don’t know what you’re looking for
  – Don’t have a priori questions
  – Want to know what questions to ask

Part of our Culture
• “I see what you’re saying”
• “Seeing is believing”
• “A picture is worth a thousand words”

Some quick (static) examples…

NYC Weather
2220 numbers
E. Tufte, The Visual Display of Quantitative Information

London Subway
www.thetube.com

True Geography
www.kottke.org/plus/misc/images/tubegeo.gif

Easy Walking Lines Added
rodcorp.typepad.com/photos/art_2003/tube_walklines_final_lmfaint.html

Atlanta Flight Traffic
Atlanta Journal, April 30, 2000

InfoVis ’07

Reinforce my point with two examples

Questions:
• Which cereal has the most/least potassium?
• Is there a relationship between potassium and fiber?
• If so, are there any outliers?
• Which manufacturer makes the healthiest cereals?

[Figure: plot of the cereal data; axes: Potassium, Fiber]

Even Tougher?
• What if you could only see one cereal’s data at a time? (e.g., some websites)
• What if I read the data to you?

Four Data Sets
• Mean of the x values = 9.0
• Mean of the y values = 7.5
• Equation of the least-squares regression line is: y = 3 + 0.5x
• Sum of squared deviations of x (about the mean) = 110.0
• Regression sum of squares (variance accounted for by x) = 27.5
• Residual sum of squares (about the regression line) = 13.75
• Correlation coefficient = 0.82
• Coefficient of determination = 0.67
http://astro.swarthmore.edu/astro121/anscombe.html

The Data Sets

The Values
Data set 1      Data set 2      Data set 3      Data set 4
   x       y       x       y       x       y       x       y
10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
 8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
 9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
 6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
 4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
 7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
 5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89

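The summary statistics quoted for the four data sets can be checked directly from the values listed above. The following is a minimal sketch in plain Python, not code from the talk; the names `DATA_SETS`, `summarize`, and `SUMMARIES` are assumptions introduced for illustration. It computes the means, the least-squares line, and the correlation coefficient for each data set, which come out nearly identical even though the plotted data sets look very different.

```python
from math import sqrt

# The four data sets from "The Values" slide. Sets 1-3 share the same x values.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]
DATA_SETS = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, least-squares slope/intercept, and Pearson r for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sxx": sxx,
        "slope": slope,
        "intercept": intercept,
        "r": r,
    }


SUMMARIES = {k: summarize(xs, ys) for k, (xs, ys) in DATA_SETS.items()}

for k, s in SUMMARIES.items():
    print(k, {name: round(value, 2) for name, value in s.items()})
```

The point of the exercise is that these numbers alone cannot distinguish the four data sets; only plotting them reveals the differences.
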
Revisit Starting Exercise
• What did you put on paper?

Two Related Disciplines
• Information Visualization
• Visual Analytics

Information Visualization
• Using interactive computer visualizations to represent and communicate abstract data
  – Statistics, databases, software, …
• Area emerged approximately 1990

Information Visualization
• Recent research trends
  – InfoVis for the Masses
  – Challenges of evaluation
  – Interaction is crucial

Visual Analytics
• Informal: Using visual representations to help make decisions
• Formal: The science of analytical reasoning facilitated by interactive visual interfaces
• InfoVis++
• Area emerged approximately 2005

Overview of the R&D Agenda
• Challenges
• Science of Analytical Reasoning
• Science of Visual Representations and Interactions
• Data Representations and Transformations
• Production, Presentation, and Dissemination
• Moving Research Into Practice
• Positioning for an Enduring Success

Visual Analytics: Beyond InfoVis
• Statistics, data representation and statistical graphics
• Geospatial and Temporal Sciences
• Applied Mathematics
• Knowledge representation, management and discovery
• Ontology, semantics, NLP, extraction, synthesis, …
• Cognitive and Perceptual Sciences
• Communications: Capture, illustrate and present a message
• Decision sciences

Academic Context
Visual Analytics (~2005)
Information Visualization (~1990)

IEEE InfoVis
IEEE VAST

Sensemaking
“A motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively.”
  – Klein, Moon and Hoffman

Jigsaw
• Visualization for Investigative Analysis across Document Collections

The Jigsaw Team
Carsten Görg
Zhicheng Liu
Vasili Pantazopoulos
+ 4 new students
Gennadiy Stepanov
Sarah Williams
Neel Parekh
Kanupriyah Singhal

Pirolli & Card, ICIA ’05

Pain Points
• Cost structure of scanning and selecting items for further attention
• Analysts’ span of attention for evidence and hypotheses

Problem Addressed
• Help investigative analysts discover plans, plots and threats embedded across the individual documents in large document collections
Documents/case reports
Blogs
DBs

Example Document

Our Focus
• Entities within the documents
  – Person, place, organization, phone number, date, license plate, etc.
• Thesis: A plot/threat within the documents will involve a set of entities in coordination

Entity Identification
• Must identify and extract entities from plain text documents
  – Crucial for our work
• Not our main research focus: collaborate with or use tools from others

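To make "identify and extract entities from plain text" concrete, here is a deliberately tiny sketch using regular expressions. It is not the extraction tooling used in Jigsaw, which the talk says comes from collaborators and other tools; the pattern set, the `extract_entities` helper, and the sample sentence are all assumptions for illustration. It only handles two rigid formats (US-style phone numbers and ISO dates), whereas people, places, and organizations generally need a proper named-entity recognizer.

```python
import re

# Hypothetical, deliberately narrow patterns; real entity extraction is far broader.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),  # e.g. 404-555-0100
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # e.g. 2008-09-17
}


def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in the text."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


# Made-up sample text, not taken from any real document collection.
sample = "Call 404-555-0100 before 2008-09-17."
entities = extract_entities(sample)
print(entities)
```
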
Entities Identified

Connections
• Entities relate/connect to each other to make a larger “story”
• Connection definition:
  – Two entities are connected if they appear in a document together
  – The more documents they appear in together, the stronger the connection

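The connection definition above amounts to counting document co-occurrences. The sketch below is one way to express that rule, assuming each document has already been reduced to the set of entities found in it; the document names, entity names, and the `connection_strengths` function are hypothetical and are not taken from Jigsaw itself.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: document id -> set of entities identified in that document.
documents = {
    "report_1": {"Alice", "Bob", "Atlanta"},
    "report_2": {"Alice", "Bob"},
    "report_3": {"Bob", "Atlanta"},
}


def connection_strengths(docs):
    """Count, for every pair of entities, how many documents contain both."""
    strengths = Counter()
    for entities in docs.values():
        # Sort so each unordered pair always maps to the same tuple key.
        for pair in combinations(sorted(entities), 2):
            strengths[pair] += 1
    return strengths


strengths = connection_strengths(documents)
for pair, count in strengths.most_common():
    print(pair, count)
```

A pair that never shares a document has strength 0 (the `Counter` default), which matches "not connected" in the definition.
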
Jigsaw
“Putting the pieces together”
• Multiple visualizations (views) of documents, entities, & their connections
• Views are highly interactive and coordinated
• User actions generate events that are transmitted to and (possibly) reflected in other views

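The coordination described above (a user action in one view generates an event that other views may or may not reflect) resembles a publish/subscribe arrangement. The following is a minimal sketch under that assumption, not Jigsaw's actual architecture; the `EventBus` and `View` classes, the `"select"` event name, and the view names are all invented for illustration.

```python
class EventBus:
    """Forwards each event to every registered view except the one that raised it."""

    def __init__(self):
        self._views = []

    def register(self, view):
        self._views.append(view)

    def publish(self, source, event, entity):
        for view in self._views:
            if view is not source:
                view.handle(event, entity)


class View:
    """A view that highlights entities and reacts only to the events it listens for."""

    def __init__(self, name, bus, reacts_to):
        self.name = name
        self.bus = bus
        self.reacts_to = set(reacts_to)
        self.highlighted = set()
        bus.register(self)

    def user_selects(self, entity):
        # A user action updates this view and generates an event for the others.
        self.highlighted.add(entity)
        self.bus.publish(self, "select", entity)

    def handle(self, event, entity):
        # "Possibly reflected": a view ignores events it is not listening for.
        if event in self.reacts_to:
            self.highlighted.add(entity)


bus = EventBus()
list_view = View("list", bus, reacts_to={"select"})
graph_view = View("graph", bus, reacts_to={"select"})
calendar_view = View("calendar", bus, reacts_to=set())

list_view.user_selects("Alice")
print(graph_view.highlighted, calendar_view.highlighted)
```

Routing events through a shared bus keeps the views independent of one another, so a new view can be added without changing the existing ones.
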
System Views

The Need for Pixels

Demo

Console

Document View

List View

Graph View

Scatterplot View

Calendar View

Report Cluster View

Timeline View

Shoebox

Trial Use
• Transitioning system to real clients

Future Work
• Entity Identification
• Evaluation
• Collaborative version
• Themes/concepts
• Enhanced evidence marshalling
• Present/browse investigation history
• Scalability issues
• Wikipedia & Intellipedia
• Geospatial View
• Connectivity search
• Reliability/uncertainty
• Other types of data
• Web search & situational awareness
• Display wall?
• Deployment

Take Away Point
• Design your visualization systems and tools to facilitate analysis and exploration
  – Not just to illustrate and reconfirm existing knowledge
• Including flexible, useful interaction is one of the best ways to do this

To Learn More
• http://www.gvu.gatech.edu/ii

Acknowledgment
• Some slides in this presentation borrowed from overviews of visual analytics by Jim Thomas, NVAC Director

Acknowledgments
• Work conducted as part of the Southeastern Regional Visualization and Analytics Center, supported by DHS and NVAC
• Supported by NSF IIS-0414667

End
• Thanks for your attention!
• Questions?