Data Visualization · Oxford Statistics · ripley/DataVisualization/DataVis.pdf · 2007-02-01

Data Visualization

Brian D. Ripley

Professor of Applied Statistics, University of Oxford

[email protected]

http://www.stats.ox.ac.uk/~ripley

‘Data Visualization’

Data visualization is the art of looking at data in ‘high’ (3 or more) dimensions.

The lecture will major on interactive/dynamic ways to do so, such as brushing, (directed) grand tours, projection pursuit and multidimensional scaling, with demonstrations using R and GGobi.

Several recent/forthcoming books, such as:

Cook, D. and Swayne, D. F. (2007?) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi.

Unwin, A., Theus, M. and Hoffmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.

Young, F. W., Valero-Mora, P. M. and Friendly, M. (2006) Visual Statistics: Seeing Data with Dynamic Interactive Graphics. Wiley.

Why Now?

• Because we can.

Actually it is not that recent: Cleveland (1993) is about Visualizing Data, and I heard about most of the ideas in the 1980s. But now everyone can afford to do it.

• ‘Graphics for the video-game generation’. (Ross Ihaka, R wishlist ca 1998.)

• Data are increasingly being collected automatically, and in areas like data mining they are often very high-dimensional.

Prime example: genomics.

• Usable software is becoming available.

As the MSc class will find out in an assessed practical tomorrow.

A Brief History

From Young et al (2006):

1600–1699 Measurement and Theory

1700–1799 New Graphics Forms and Data

1800–1899 Modern Graphics and the Golden Age

1900–1950 The Dark Ages of Statistical Graphics: The Golden Age of Mathematical Statistics

1950–1975 Rebirth of Statistical Graphics [Tukey’s Exploratory Data Analysis.]

1975–2000 Statistical Graphics comes of Age.

[Apparently developments stopped then!]

Visualizing What?

• Three or more continuous variables.

• Contingency tables (mosaic plots, correspondence analysis).

• Mixed types of variables.

• Patterns of missingness.

• Imputations.

Two of the books mentioned have chapters on missing data.

Three or More Continuous Variables

Human beings are quite proficient in seeing in 2.5 dimensions. We don’t really do this by stereoscopic vision, but more by

• Perspective.

• Shading / lighting.

• Texture.

• Motion.

The ways we have to visualize three or more continuous variables are almost all by two- (or occasionally three-) dimensional ‘windows’ on a high-dimensional point cloud. But there are some others, e.g. via glyphs.

What are we looking for?

• Multivariate outliers.

• Subgroups (clusters).

• Gradations.

[Figure: Chernoff faces plot of the state.x77 dataset, from S-PLUS. One face per US state, Alabama to Wyoming.]

[Figure: Chernoff faces plot of the state.x77 dataset, from the R TeachingDemos package (faces). One face per US state.]

[Figure: Chernoff faces plot of the state.x77 dataset, from the R TeachingDemos package (faces2). One face per US state.]

[Figure: R version of stars plot of the state.x77 dataset. One star per US state; the key shows the axes Frost, Life Exp, HS Grad, Income, Murder and Illiteracy.]

Three Ways to Use Windows on Point Clouds

1. Projections of random/guided rotations, with motion and perhaps depth cues. Example: exploratory projection pursuit.

2. Parallel axes, so-called parallel coordinate plots. Think of this as multiple 1D views.

3. Multiple low-dim views.

[Categorization by Young et al (2006).]

[Figure: Parallel coordinate plot of the state.x77 dataset. Axes: Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area.]

[Figure: A better parallel coordinate plot of the state.x77 dataset. Axes: Frost, Life Exp, HS Grad, Income, Murder, Illiteracy. I re-ordered the variables and flipped some signs.]
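A parallel coordinate plot needs every variable on a common visual scale, and flipping a sign just reverses one axis. The following is a minimal sketch of range scaling with an optional flip, in plain Python; the function name `scale_to_range`, the `flip` argument and the toy values are illustrative assumptions, not the code used to draw the plots above.

```python
def scale_to_range(columns, flip=()):
    """Rescale each named column to [0, 1]; columns named in `flip` are reversed."""
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            unit = [0.5 for _ in values]  # constant column: place it mid-axis
        else:
            unit = [(v - lo) / span for v in values]
        if name in flip:
            unit = [1.0 - u for u in unit]  # reverse the axis direction
        scaled[name] = unit
    return scaled


# Hypothetical toy values, not the state.x77 data.
toy = {"good": [1.0, 2.0, 4.0], "bad": [10.0, 30.0, 20.0]}
scaled = scale_to_range(toy, flip={"bad"})
```

Each case then becomes one polyline joining its scaled values across the axes; flipping axes so that 'good' points the same way everywhere tends to reduce line crossings.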

Interacting with Plots

Parallel coordinate plots rapidly become unusable with large numbers of cases or variables. We can make some progress by highlighting cases or groups of cases.

• Colour/glyph for cases/groups, transiently or permanently.

• Shadow cases

• Unshadow cases

• Identify

The first three are done by brushing, often linked to some other display.





Linking, as in parcoord plots, can be done in other multiple views too, especially dynamically (brushing) and by colour or glyph type.

Multiple Low-Dimensional Views

The classic examples are 1D (parallel coordinate plots) and scatterplot matrices. We can link the latter by glyph, by colour, or dynamically.

Leptograpsus variegatus Crabs

200 crabs from Western Australia. Two colour forms, blue and orange; collected 50 of each form of each sex. Are the colour forms species?

Measurements of carapace (shell) length CL and width CW, the size of the frontal lobe FL, rear width RW and body depth BD.

[Figure: scatterplot matrix of FL, RW, CL, CW and BD for the crabs data. Legend: blue male, blue female, orange male, orange female.]

Transformations

Do not forget that your data may need transformation even for visualization.

• Univariate transformations, of the types Tukey promoted in EDA (and go back a long way).

Also scaling to a common visual scale (as parcoords did): by range, mean/variance, median/IQR, ...

• Multivariate transformations.

– Removal of correlation: one use of principal components. (May need to do this robustly.)

– Sphering, a multivariate rescaling to common scale. Most commonly done by changing to principal components and scaling each to unit variance.
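Sphering can be sketched in a few lines. The principal-component route described above needs an eigen-decomposition; to stay within the Python standard library, this sketch swaps in a Cholesky factorization of the sample covariance instead, which also gives identity covariance but produces a different rotation of the data. The function name `sphere` and the toy data are illustrative assumptions, and the sketch is not robust to outliers.

```python
import math


def sphere(rows):
    """Centre the rows and rescale so the sample covariance is the identity."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - mean[j] for j in range(p)] for r in rows]
    # Sample covariance matrix S (p x p), divisor n - 1.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Cholesky factor L (lower triangular) with S = L L^T.
    chol = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1):
            s = cov[a][b] - sum(chol[a][k] * chol[b][k] for k in range(b))
            chol[a][b] = math.sqrt(s) if a == b else s / chol[b][b]
    # Solve L z = x for each centred row by forward substitution.
    sphered = []
    for c in centred:
        z = []
        for a in range(p):
            z.append((c[a] - sum(chol[a][k] * z[k] for k in range(a))) / chol[a][a])
        sphered.append(z)
    return sphered


# Hypothetical correlated toy data (not the crabs measurements).
toy = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.5], [4.0, 8.5], [5.0, 9.0]]
z = sphere(toy)
```

After sphering, the columns of `z` have zero mean, unit variance and zero correlation, so any orthogonal projection of `z` has the same property.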

[Figure: scatterplot matrix of Comp. 1, Comp. 2 and Comp. 3 for the crabs data. Legend: blue male, blue female, orange male, orange female. First three principal components on log scale.]

Grand Tours

Now our first idea, of a moving 2D window into the point cloud (technically, an orthogonal projection).

In Daniel Asimov’s grand tour one chooses a random projection (a random rotation of the point cloud), and rotates towards it along the geodesic (the kD analogue of the great circle route).

Motion helps us in several ways:

• Front vs back.

• Outliers move at different speeds and directions from the bulk of the data, and will (perhaps briefly) appear at the periphery of a view.

• Groups move together.

It can be helpful to add trails, as in what Young calls an orbitplot.
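One step of a tour, choosing a random 2D window and projecting onto it, can be sketched as follows. The sketch draws two Gaussian vectors and orthonormalises them by Gram-Schmidt, which gives a uniformly random plane; the geodesic interpolation between successive planes is deliberately omitted. The names `random_frame` and `project` are illustrative assumptions, not GGobi's API.

```python
import math
import random


def random_frame(k, rng):
    """Return two orthonormal k-vectors spanning a random 2D plane."""
    u = [rng.gauss(0.0, 1.0) for _ in range(k)]
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    # Gram-Schmidt: remove from v its component along u, then normalise.
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    return u, v


def project(rows, frame):
    """Orthogonal projection of each k-dimensional row onto the 2D frame."""
    u, v = frame
    return [(sum(a * b for a, b in zip(r, u)), sum(a * b for a, b in zip(r, v)))
            for r in rows]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
frame = random_frame(5, rng)
# Two hypothetical points in 5 dimensions.
view = project([[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0, 0.0]], frame)
```

A tour would repeat this, moving the current frame smoothly towards each new target frame and redrawing the projected points at every step.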

Guided Tours

There are far too many possible views in k ≥ 5 dimensions to have a chance of coming close to an ‘interesting’ view and spotting it.

We need some guidance, and that is the ‘pursuit’ of (exploratory) projection pursuit.

Choose an index of ‘interestingness’, and optimize the views to high values of the index. (Lots of research ideas in the 1980s.) These are normally applied to sphered data, so random views look like samples from the standard bivariate normal distribution, and ‘interesting’ means ‘non-normal’ (but remember the Anna Karenina principle).

• ‘Holes’: look for views with relatively few points in the centre. Tends to find clusters.

• ‘Central mass’: look for relatively many points in the centre. Tends to find multivariate outliers.
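To make the two indices concrete, here is a sketch of one commonly quoted form, in which each projected point y contributes a Gaussian weight exp(-|y|^2 / 2) and the average weight is normalised by 1 - exp(-p/2) for a p-dimensional view. That formula is stated from memory of the GGobi literature and should be treated as an assumption rather than as the definition used in the demonstrations; the function names and toy views are also hypothetical. The data are assumed to be sphered already.

```python
import math


def centre_weight(view):
    """Average Gaussian weight exp(-|y|^2 / 2) over the projected points."""
    return sum(math.exp(-0.5 * sum(c * c for c in y)) for y in view) / len(view)


def holes_index(view):
    """High when relatively few points lie near the centre of the view."""
    p = len(view[0])
    return (1.0 - centre_weight(view)) / (1.0 - math.exp(-p / 2.0))


def central_mass_index(view):
    """High when relatively many points lie near the centre of the view."""
    p = len(view[0])
    return (centre_weight(view) - math.exp(-p / 2.0)) / (1.0 - math.exp(-p / 2.0))


# Hypothetical 2D views: four points on a ring, and four points piled at the centre.
ring = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
clump = [(0.0, 0.0)] * 4
ring_holes = holes_index(ring)
clump_holes = holes_index(clump)
```

With this normalisation the two indices sum to one for any view, so maximising one is the same as minimising the other; a guided tour would search over projections for high values of whichever index is chosen.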

The Anna Karenina principle

Happy families are all alike; every unhappy family is unhappy in its own way.

‘With this dramatic sentence, Leo Tolstoy begins his famous novel Anna Karenina about the struggles of multiple, interconnected families to find happiness.’

[Figure, panels (a) to (d): Projections of the Leptograpsus crabs data found by projection pursuit. View (a) is a random projection. View (b) was found using the natural Hermite index, view (c) by the Friedman–Tukey index and view (d) by Friedman’s (1987) index.]





Note that we do not need to use projections: we could ‘squeeze’ the point cloud to 2D (or 1D, seriation, or 3D), known to psychologists as multidimensional scaling.

Multidimensional Scaling

Aim is to represent distances between points well.

Suppose we have distances $(d_{ij})$ between all pairs of n points, or a dissimilarity matrix. Classical MDS plots the first k principal components, and minimizes

$$\sum_{i \ne j} \left( d_{ij}^2 - \tilde d_{ij}^2 \right)$$

where $(\tilde d_{ij})$ are the Euclidean distances in the kD space.

Shepard and Kruskal (1962–4) proposed only to preserve the ordering of distances, minimizing

$$\mathrm{STRESS}^2 = \frac{\sum_{i \ne j} \left[ \theta(d_{ij}) - \tilde d_{ij} \right]^2}{\sum_{i \ne j} \tilde d_{ij}^2}$$

over both the configuration of points and an increasing function θ.

The optimization task is quite difficult and this can be slow.
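The stress criterion itself is cheap to evaluate for a given configuration; it is the joint search over configurations and θ that is hard. The sketch below only evaluates the criterion, with θ fixed to the identity by default; it does not fit θ by isotonic regression or optimize the configuration. The function names and the toy points are illustrative assumptions.

```python
import math


def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(d, config, theta=lambda x: x):
    """sqrt( sum_{i != j} [theta(d_ij) - dt_ij]^2 / sum_{i != j} dt_ij^2 )."""
    dt = pairwise_distances(config)  # distances in the low-dimensional space
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum((theta(d[i][j]) - dt[i][j]) ** 2 for i, j in pairs)
    den = sum(dt[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)


# Hypothetical points that happen to lie in a plane within 3D.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
d = pairwise_distances(points)
flat = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # 2D configuration keeping all distances
line = [(0.0,), (3.0,), (0.0,)]              # 1D configuration that loses one axis
stress_flat = stress(d, flat)
stress_line = stress(d, line)
```

Here the 2D configuration reproduces every distance, so its stress is zero, while squeezing to 1D distorts two of the three distances and gives a positive stress.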

[Figure: An order-preserving MDS plot of the (raw) crabs data. Legend: blue male, blue female, orange male, orange female.]

[Figure: MDS plot of the crabs data after re-scaling to (approximately) constant carapace area. Legend: blue male, blue female, orange male, orange female.]

Now for GGobi demonstrations

A Forensic Example

Data on 214 fragments of glass collected at scenes of crimes. Each has a measured refractive index and composition (weight percent of oxides of Na, Mg, Al, Si, K, Ca, Ba and Fe).

Grouped as window float glass (70), window non-float glass (76), vehicle window glass (17) and other (containers, tableware, headlamps) (51).

[Figure: Strip plot by type of glass. Panels: WinF, WinNF, Veh, Con, Tabl, Head; rows: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe; common horizontal scale from -4 to 8.]

[Figure: Strip plot by type of analyte. Panels: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe; rows: WinF, WinNF, Veh, Con, Tabl, Head.]

[Figure: Isotonic multidimensional scaling representation. Legend: WinF, WinNF, Veh, Con, Tabl, Head.]