20
1 CS448B :: 23 Sep 2009 Data and Image Models Jeffrey Heer Stanford University Last Time: Value of Visualization Three functions of visualizations Record: store information Ph h bl Photographs, blueprints, … Analyze: support reasoning about information Process and calculate Reason about data Feedback and interaction Communicate: convey information to others Share and persuade Collaborate and revise Emphasize important aspects of data Other recording instruments Marey’s sphygmograph [from Braun 83]

Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

1

CS448B :: 23 Sep 2009

Data and Image Models

Jeffrey Heer Stanford University

Last Time:Value of Visualization

Three functions of visualizations

Record: store informationPh h blPhotographs, blueprints, …

Analyze: support reasoning about informationProcess and calculateReason about dataFeedback and interaction

Communicate: convey information to othersShare and persuadeCollaborate and reviseEmphasize important aspects of data

Other recording instruments

Marey’s sphygmograph [from Braun 83]

Page 2: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

2

Make a decision: Challenger

Visualizations drawn by Tufte show how low temperatures damage O-rings [Tufte 97]

“to affect thro’ the Eyes h f l

1856 “Coxcomb” of Crimean War Deaths, Florence Nightingale

what we fail to convey to the public through their word-proof ears”

Info-Vis vs. Sci-Vis? Visualization Reference Model

Raw DataData

TablesVisual

StructuresViews

Data Visual Form

Data Transformations

Visual Encodings

View Transformations

Task

Page 3: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

3

Data and Image Models

The Big Picturetask

dataphysical type

int, float, etc.abstract type

nominal, ordinal, etc.

processingalgorithms

mappingvisual encoding

imagevisual channelretinal variables

domainmetadatasemantics conceptual model

visual metaphor

Topics

Properties of data or informationProperties of the imageMapping data to images

Data

Page 4: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

4

Data models vs. Conceptual models

Data models are low level descriptions of the dataM h S h hMath: Sets with operations on themExample: integers with + and × operators

Conceptual models are mental constructionsInclude semantics and support reasoning

Examples (data vs. conceptual)(1D floats) vs. Temperature(3D vector of floats) vs. Space

Taxonomy (?)

1D (sets and sequences)Temporal2D (maps)3D (shapes)nD (relational)Trees (hierarchies)Trees (hierarchies)Networks (graphs)

Are there others?The eyes have it: A task by data type taxonomy for information

visualization [Shneiderman 96]

Types of variables

Physical typesCh t i d b t f tCharacterized by storage formatCharacterized by machine operationsExample: bool, short, int32, float, double, string, …

Abstract typesProvide descriptions of the dataProvide descriptions of the dataMay be characterized by methods/attributesMay be organized into a hierarchyExample: plants, animals, metazoans, …

Nominal, Ordinal and QuantitativeN - Nominal (labels)

Fruits: Apples, oranges, …

O - OrderedQuality of meat: Grade A, AA, AAA

Q - Interval (Location of zero arbitrary)Dates: Jan, 19, 2006; Location: (LAT 33.98, LONG -118.45)Like a geometric point. Cannot compare directlyOnly differences (i.e. intervals) may be comparedy ( ) y p

Q - Ratio (zero fixed)Physical measurement: Length, Mass, Temp, …Counts and amountsLike a geometric vector, origin is meaningful

S. S. Stevens, On the theory of scales of measurements, 1946

Page 5: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

5

Nominal, Ordinal and Quantitative

N - Nominal (labels)Operations: = ≠Operations: =, ≠

O - OrderedOperations: =, ≠, <, >

Q - Interval (Location of zero arbitrary)Operations: =, ≠, <, >, -

Can measure distances or spans

Q - Ratio (zero fixed)Operations: =, ≠, <, >, -, ÷Can measure ratios or proportions

S. S. Stevens, On the theory of scales of measurements, 1946

From data model to N,O,Q data type

Data model32.5, 54.0, -17.3, …floats

Conceptual modelTemperature (°C)

Data typeBurned vs. Not burned (N)Hot, warm, cold (O)Continuous range of values (Q)

Sepal and petal lengths and widths for three species of iris [Fisher 1936].

Q

O

N

Page 6: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

6

Relational data modelRecords are fixed-length tuplesE h l ( tt ib t ) f t l h d i (t )Each column (attribute) of tuple has a domain (type)Relation is schema and a table of tuplesDatabase is a collection of relations

Relational Algebra [Codd]

Data transformations (sql)Selection (select)Projection (where)Sorting (order by)Aggregation (group by, sum, min, …)Set operations (union, …)Set operations (union, …)Join (inner join)

Statistical data modelVariables or measurementsCategories or factors or dimensionsCategories or factors or dimensionsObservations or cases

Statistical data modelVariables or measurementsCategories or factors or dimensionsCategories or factors or dimensionsObservations or cases

Month Control Placebo 300 mg 450 mgMarch 165 163 166 168April 162 159 161 163May 164 158 161 153June 162 161 158 160July 166 158 160 148August 163 158 157 150

Blood Pressure Study (4 treatments, 6 months)

Page 7: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

7

Dimensions and Measures

Independent vs. dependent variablesE l f( )Example: y = f(x,a)Dimensions: Domain(x) × Domain(a)Measures: Range(y)

Dimensions and Measures

Dimensions: Discrete variables describing dataDates, categories of values (independent vars)

Measures: Data values that can be aggregatedNumbers to be analyzed (dependent vars)Aggregate as sum count average std deviationAggregate as sum, count, average, std. deviation

Example: U.S. Census Data

People: # of people in groupYear: 1850 – 2000 (every decade)

Age: 0 – 90+Sex: Male, FemaleMarital Status: Single, Married, Divorced, …

Example: U.S. Census

PeopleYearAgeSexMarital Status

2348 data points

Page 8: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

8

Census: Dimension or Measure?

People Count MeasureYearAgeSex (M/F)Marital Status

DimensionDepends!DimensionDimension

Roll-Up and Drill-Down

Want to examine marital status in each decade?Roll-up the data along the desired dimensions

SELECT year, marst, sum(people)FROM census

Dimensions Measure

FROM censusGROUP BY year, marst;

Dimensions

Roll-Up and Drill-Down

Need more detailed information?Drill-down into additional dimensions

SELECT year, age, marst, sum(people)FROM censusFROM censusGROUP BY year, age, marst;

Age

19701980

19902000

40-59

60+

All Marital Status

A

Marital Status

Sing

le

Mar

ried

Div

orce

d

Wid

owed

0-19

20-39

All Ages

Sum along Marital Status

Sum along Age

All Ages

All YearsSum along Year

Page 9: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

9

Age

19701980

19902000

40-59

60+

All Marital Status

Roll-Up

Drill-Down

A

Marital Status

Sing

le

Mar

ried

Div

orce

d

Wid

owed

0-19

20-39

All Ages

Sum along Marital Status

Sum along Age

All Ages

All YearsSum along Year

Row vs. Column-Oriented Databases

Relational Data Organizations

Transactions vs. AnalysisRow-oriented Column-oriented

Relational Data Organizations

Row-oriented Column-oriented

Page 10: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

10

Relational Data Organizations

Speed-up Analysis Column-orientedReduce data transferImproved localityBetter data compression

Administrivia

Announcements

AuditorsR C l d ( l ll)Requirements: Come to class and participate (online as well)

Class participation requirementsComplete readings before classIn-class discussionPost at least 1 discussion substantive comment/question on wiki within a week of each lecture

Class wiki: http://cs448b.stanford.edu

Assignment 1: Visualization Design

Design a static visualization for a given data set.

Deliverables (post to the course wiki)

Image of your visualizationShort description and design rationale (≤ 4 para.)

Due by 7:00am on Monday 9/28.

Page 11: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

11

Image

Visual language is a sign system

Images perceived as a set of signsg p gSender encodes information in signsReceiver decodes information from signs

Sémiologie Graphique, 1967

Jacques Bertin

Bertin’s Semiology of Graphics

1. A, B, C are distinguishable , , g2. B is between A and C. 3. BC is twice as long as AB.

∴ Encode quantitative variablesA

B

C

"Resemblance, order and proportion are the three signifieds in graphics.” - Bertin

Page 12: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

12

Position (x 2)Si

Visual encoding variables

SizeValueTextureColorOrientationShapeShape

PositionL th

Visual encoding variables

LengthAreaVolumeValueTextureColorColorOrientationShapeTransparencyBlur / Focus …

Information in color and value

Value is perceived as orderedE d d l bl (O)∴Encode ordinal variables (O)

∴ Encode continuous variables (Q) [not as well]

Hue is normally perceived as unordered∴ Encode nominal variables (N) using color

Page 13: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

13

Bertin’s “Levels of Organization”

Nominal O d d

N O QPosition

OrderedQuantitativeN O Q

N O Q

N O

Size

Value

Texture

Note: Q < O < N

N

N

N

Color

Orientation

Shape

Note: Bertin actually breaks visual variables down into differentiating (≠) and associating (≡)

Design Space of Visual Encodings

Univariate data A B C1

factors

variable

A B C DA B C D

Univariate data A B C1

factors

variable

7

5

3

1Mean

low highMiddle 50%

Tukey box plot

A B C D E

0 20

A B C D

Page 14: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

14

Bivariate data A B C12

C

A

B

C

D

E

F

Scatter plot is common

Trivariate dataA B C

123

3D scatter plot is possible

A

B

C

D

E

F

A

B

C

D

E

F

D

G

Three variables

Two variables [x,y] can map to pointsS lScatterplots, maps, …

Third variable [z] must useColor, size, shape, …

Large design space (visual metaphors)

[Bertin, Graphics and Graphic Info. Processing, 1981]

Page 15: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

15

Multidimensional data

A B C1

How many variables can be depicted in an image?

2345678

Multidimensional data

A B C1

How many variables can be depicted in an image?

“With up to three rows, a data table can be constructed directly as a single image … However,

2345y g g ,

an image has only three dimensions. And this barrier is impassible.”

Bertin

678

Deconstructions

Playfair 1786

Page 16: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

16

Playfair 1786

x-axis: year (Q)y-axis: currency (Q)color: imports/exports (N, O)

Wattenberg 1998

http://www.smartmoney.com/marketmap/

Wattenberg 1998

rectangle size: market cap (Q)rectangle position: market sector (N), market cap (Q)color hue: loss vs. gain (N, O)color value: magnitude of loss or gain (Q)

Minard 1869: Napoleon’s march

Page 17: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

17

Single axis composition

+

=[based on slide from Mackinlay]

i t t (Q)

Mark composition

y-axis: temperature (Q)

x-axis: longitude (Q) / time (O)+

=

[based on slide from Mackinlay]

temp over space/time (Q x Q)

y-axis: longitude (Q)

Mark composition

x-axis: latitude (Q)

width: army size (Q)

++=

[based on slide from Mackinlay]

army position (Q x Q) and army size (Q)

longitude (Q)

latitude (Q)

army size (Q)

temperature (Q)

latitude (Q) / time (O)

[based on slide from Mackinlay]

Page 18: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

18

Minard 1869: Napoleon’s march

Depicts at least 5 quantitative variables. Any others?

Automated designAutomated designJock Mackinlay’s APT 86

Combinatorics of Encodings

Challenge: Pick the best encoding from the exponential number of possibilities (n+1)8

Principle of Consistency: The properties of the image (visual variables) should match the properties of the data.match the properties of the data.

Principle of Importance Ordering: Encode the most important information in the most effective way.

Design Criteria (Mackinlay)

ExpressivenessA f f bl l l f hA set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

Page 19: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

19

Cannot express the facts

A one-to-many (1 → N) relation cannot be expressed in i l h i t l d t l t b lti l t la single horizontal dot plot because multiple tuples are

mapped to the same position

Expresses facts not in the dataA length is interpreted as a quantitative value;

L th f b thi t b t N d t∴ Length of bar says something untrue about N data

[Mackinlay, APT, 1986]

Design Criteria (Mackinlay)

ExpressivenessA f f bl l l f hA set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

EffectivenessA visualization is more effective than another visualization if theA visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.

(Effectiveness subject of the Graphical Perception lecture)

Mackinlay’s Ranking

Conjectured effectiveness of the encoding

Page 20: Sep 2009 Data and Image Models - Computer graphicsgraphics.stanford.edu/courses/cs148-10-summer/docs/... · 3 Data and Image Models The Big Picture task data physical type int, float,

20

Mackinlay’s Design Algorithm

User formally specifies data model and typeAdd l d d l f d bl hAdditional input: ordered list of data variables to show

APT searches over design spaceTests expressiveness of each visual encodingGenerates image for encodings that pass testT t t l ff ti f lti iTests perceptual effectiveness of resulting image

Outputs the “most effective” visualization

Limitations

Does not cover many visualization techniquesB d h d k dBertin and others discuss networks, maps, diagramsDoes not consider 3D, animation, illustration, photography, …

Does not model interaction

Summary

Formal specification D d lData modelImage modelEncodings mapping data to image

Choose expressive and effective encodingsF l t t f iFormal test of expressivenessExperimental tests of perceptual effectiveness

Assignment 1: Visualization Design

Design a static visualization for a given data set.

Deliverables (post to the course wiki)

Image of your visualizationShort description and design rationale (≤ 4 para.)

Due by 7:00am on Monday 9/28.