Data Visualization
Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley
‘Data Visualization’
Data visualization is the art of looking at data in ‘high’ (3 or more) dimensions.
The lecture will major on interactive/dynamic ways to do so such as brushing, (directed) grand tours, projection pursuit and multidimensional scaling, with demonstrations using R and GGobi.
There are several recent/forthcoming books, such as:
Cook, D. and Swayne, D. F. (2007?) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi.
Unwin, A., Theus, M. and Hofmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.
Young, F. W., Valero-Mora, P. M. and Friendly, M. (2006) Visual Statistics: Seeing Data with Dynamic Interactive Graphics. Wiley.
Why Now?
• Because we can.
Actually it is not that recent: Cleveland (1993) is about Visualizing Data, and I heard about most of the ideas in the 1980s. But now everyone can afford to do it.
• ‘Graphics for the video-game generation’. (Ross Ihaka, R wishlist ca 1998.)
• Data are increasingly being collected automatically, and in areas like data mining they are often very high-dimensional.
Prime example: genomics.
• Usable software is becoming available.
As the MSc class will find out in an assessed practical tomorrow.
A Brief History
From Young et al (2006):
1600–1699 Measurement and Theory
1700–1799 New Graphics Forms and Data
1800–1899 Modern Graphics and the Golden Age
1900–1950 The Dark Ages of Statistical Graphics: the Golden Age of Mathematical Statistics
1950–1975 Rebirth of Statistical Graphics [Tukey’s Exploratory Data Analysis.]
1975–2000 Statistical Graphics comes of Age.
[Apparently developments stopped then!]
Visualizing What?
• Three or more continuous variables.
• Contingency tables (mosaic plots, correspondence analysis).
• Mixed types of variables.
• Patterns of missingness.
• Imputations.
Two of the books mentioned have chapters on missing data.
Three or More Continuous Variables
Human beings are quite proficient in seeing in 2.5 dimensions. We don’t really do this by stereoscopic vision, but more by
• Perspective.
• Shading / lighting.
• Texture.
• Motion.
The ways we have to visualize three or more continuous variables are almost all by two- (or occasionally three-) dimensional ‘windows’ on a high-dimensional point cloud. But there are some others, e.g. via glyphs.
What are we looking for?
• Multivariate outliers.
• Subgroups (clusters).
• Gradations.
[Figure: Chernoff faces plot of the state.x77 dataset, from S-PLUS, with one face per US state.]
[Figure: Chernoff faces plot of the state.x77 dataset, from the R TeachingDemos package (faces), with one face per US state.]
[Figure: Chernoff faces plot of the state.x77 dataset, from the R TeachingDemos package (faces2), with one face per US state.]
[Figure: R version of the stars plot of the state.x77 dataset, with one star per US state. Key: Frost, Life Exp, HS Grad, Income, Murder, Illiteracy.]
Three Ways to Use Windows on Point Clouds
1. Projections of random/guided rotations, with motion and perhaps depth cues. Example: exploratory projection pursuit.
2. Parallel axes, so-called parallel coordinate plots. Think of this as multiple 1D views.
3. Multiple low-dim views.
[Categorization by Young et al (2006).]
[Figure: Parallel coordinate plot of the state.x77 dataset. Axes: Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area.]
[Figure: A better parallel coordinate plot of the state.x77 dataset. I re-ordered the variables and flipped some signs. Axes: Frost, Life Exp, HS Grad, Income, Murder, Illiteracy.]
Interacting with Plots
Parallel coordinate plots rapidly become unusable with large numbers of cases or variables. We can make some progress by highlighting cases or groups of cases.
• Colour/glyph for cases/groups, transiently or permanently.
• Shadow cases
• Unshadow cases
• Identify
The first three are done by brushing, often linked to some other display.
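As a minimal sketch of linked brushing (not how GGobi implements it), a brush can be treated as a rectangle in one view that yields a Boolean mask over cases; the same mask then drives the colour in any other view. The function name `brush`, the tiny data table and the colour names are all made up for illustration.

```python
import numpy as np

def brush(x, y, xlim, ylim):
    """Return a Boolean mask of the cases that fall inside a rectangular brush."""
    return (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])

# Four cases measured on four variables (hypothetical values).
data = np.array([
    [1.0, 1.0, 10.0, 5.0],
    [2.0, 3.0, 20.0, 6.0],
    [3.0, 2.0, 30.0, 7.0],
    [8.0, 9.0, 40.0, 8.0],
])

# Brush in the view of variables 0 and 1 ...
mask = brush(data[:, 0], data[:, 1], xlim=(1.5, 3.5), ylim=(1.5, 3.5))

# ... and link to any other view by colouring the same cases there.
colours = np.where(mask, "red", "grey")

# 'Shadowing' would keep only the unbrushed cases in the foreground.
foreground = data[~mask]
```

Because the mask indexes cases rather than plot coordinates, it can be reused unchanged in a scatterplot matrix, a parallel coordinate plot or a tour.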
Linking, as in parallel coordinate plots, can be done in other multiple views too, especially dynamically (brushing) and by colour or glyph type.
Multiple Low-Dimensional Views
The classic examples are 1D (parallel coordinate plots) and scatterplot matrices. We can link the latter by glyph, by colour, or dynamically.
Leptograpsus variegatus Crabs
200 crabs from Western Australia. Two colour forms, blue and orange; collected 50 of each form of each sex. Are the colour forms species?
Measurements of carapace (shell) length CL and width CW, the size of the frontal lobe FL, rear width RW and body depth BD.
[Figure: Scatterplot matrix of FL, RW, CL, CW and BD for the crabs data. Key: blue male, blue female, orange male, orange female.]
Transformations
Do not forget that your data may need transformation even for visualization.
• Univariate transformations, of the types Tukey promoted in EDA (and which go back a long way).
Also scaling to a common visual scale (as the parallel coordinate plots did): by range, mean/variance, median/IQR, ...
• Multivariate transformations.
– Removal of correlation: one use of principal components. (May need to do this robustly.)
– Sphering, a multivariate rescaling to a common scale. Most commonly done by changing to principal components and scaling each to unit variance.
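Two of these transformations can be sketched in a few lines of NumPy: scaling each variable by its range, and sphering via principal components. This is an illustrative, non-robust sketch; the function names and the synthetic data are assumptions, and it presumes no constant columns and a non-singular covariance matrix.

```python
import numpy as np

def scale_by_range(X):
    """Map each column linearly onto [0, 1] (assumes no constant columns)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def sphere(X):
    """Change to principal components, then scale each to unit variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(evals)[::-1]              # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    return (Xc @ evecs) / np.sqrt(evals)         # assumes all evals > 0

# Synthetic correlated data on very different scales (made up for illustration).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
X = z @ np.array([[10.0, 5.0, 0.0],
                  [0.0, 1.0, 0.2],
                  [0.0, 0.0, 0.1]])

R = scale_by_range(X)   # common visual scale for, e.g., parallel coordinates
S = sphere(X)           # uncorrelated columns with unit variance
```

After sphering, the sample covariance matrix of `S` is the identity, which is the starting point assumed by projection pursuit indices. A robust version would replace the mean and covariance by robust estimates.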
[Figure: Scatterplot matrix of the first three principal components (Comp. 1, Comp. 2, Comp. 3) on log scale. Key: blue male, blue female, orange male, orange female.]
Grand Tours
Now our first idea, of a moving 2D window into the point cloud (technically, an orthogonal projection).
In Daniel Asimov’s grand tour one chooses a random projection (a random rotation of the point cloud), and rotates towards it along the geodesic (the kD analogue of the great circle route).
Motion helps us in several ways:
• Front vs back.
• Outliers move at different speeds and directions from the bulk of the data, and will (perhaps briefly) appear at the periphery of a view.
• Groups move together.
It can be helpful to add trails, as in what Young calls an orbitplot.
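One step of such a tour can be sketched as follows, under stated assumptions: a random 2D frame is drawn by orthonormalizing a Gaussian matrix, and the path between two frames rotates each principal direction of the starting plane towards its partner in the target plane through the principal angles. The function names are hypothetical, and this is not necessarily the exact scheme GGobi uses. The path ends in the target plane, possibly with the axes rotated within that plane.

```python
import numpy as np

def random_frame(p, rng, d=2):
    """A random orthonormal p x d frame (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q

def geodesic_frame(Fa, Fz, t):
    """Frame a fraction t of the way from the plane of Fa to the plane of Fz."""
    # Principal directions and angles between the two planes.
    U, c, Vt = np.linalg.svd(Fa.T @ Fz)
    tau = np.arccos(np.clip(c, -1.0, 1.0))
    Ga, Gz = Fa @ U, Fz @ Vt.T
    # Part of each target direction orthogonal to its partner (assumes tau > 0).
    Gperp = (Gz - Ga * c) / np.sin(tau)
    # Rotate each principal direction through a fraction t of its angle.
    G = Ga * np.cos(t * tau) + Gperp * np.sin(t * tau)
    return G @ U.T   # express in the original basis of Fa

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # synthetic 5D point cloud
Fa, Fz = random_frame(5, rng), random_frame(5, rng)

# Successive 2D views as the window moves from Fa towards Fz.
views = [X @ geodesic_frame(Fa, Fz, t) for t in np.linspace(0.0, 1.0, 5)]
```

A full grand tour would repeat this, drawing a new random target each time the previous one is reached.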
Guided Tours
There are far too many possible views in k ≥ 5 dimensions to have a chance of coming close to an ‘interesting’ view and spotting it.
We need some guidance, and that is the ‘pursuit’ of (exploratory) projection pursuit.
Choose an index of ‘interestingness’, and optimize the views to high values of the index. (Lots of research ideas in the 1980s.) These are normally applied to sphered data, so random views look like samples from the standard bivariate normal distribution, and ‘interesting’ means ‘non-normal’ (but remember the Anna Karenina principle).
• ‘Holes’: look for views with relatively few points in the centre. Tends to find clusters.
• ‘Central mass’: look for relatively many points in the centre. Tends to find multivariate outliers.
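To make the two indices concrete, here is a sketch that scores a 2D view of sphered data by a Gaussian-weighted count of points near the origin. This un-normalized form is an assumption made for illustration; the indices implemented in GGobi may differ in detail and normalization. The data are synthetic: two clusters separated along one coordinate.

```python
import numpy as np

def sphere(X):
    """Principal components scaled to unit variance (largest variance first)."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(evals)[::-1]
    return (Xc @ evecs[:, order]) / np.sqrt(evals[order])

def central_mass_index(Y):
    """Large when relatively many projected points lie near the centre."""
    return float(np.mean(np.exp(-0.5 * np.sum(Y**2, axis=1))))

def holes_index(Y):
    """Large when relatively few projected points lie near the centre."""
    return 1.0 - central_mass_index(Y)

# Two well-separated clusters in 4D (made up for illustration).
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(2 * n, 4))
X[:n, 0] -= 4.0
X[n:, 0] += 4.0

S = sphere(X)
clustered_view = S[:, :2]    # contains the direction separating the clusters
noise_view = S[:, 2:4]       # looks like a bivariate normal sample

h_clustered = holes_index(clustered_view)
h_noise = holes_index(noise_view)
```

The view containing the cluster direction scores higher on the holes index than the pure-noise view; a guided tour would move the projection so as to increase the chosen index.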
The Anna Karenina principle
Happy families are all alike; every unhappy family is unhappy in its own way.
‘With this dramatic sentence, Leo Tolstoy begins his famous novel Anna Karenina about the struggles of multiple, interconnected families to find happiness.’
[Figure, panels (a) to (d): Projections of the Leptograpsus crabs data found by projection pursuit. View (a) is a random projection. View (b) was found using the natural Hermite index, view (c) by the Friedman–Tukey index and view (d) by Friedman’s (1987) index.]
Note that we do not need to use projections: we could ‘squeeze’ the point cloud to 2D (or 1D, seriation, or 3D), known to psychologists as multidimensional scaling.
Multidimensional Scaling
Aim is to represent distances between points well.
Suppose we have distances $(d_{ij})$ between all pairs of $n$ points, or a dissimilarity matrix. Classical MDS plots the first $k$ principal components, and minimizes
\[ \sum_{i \neq j} \left( d_{ij}^2 - \tilde d_{ij}^2 \right) \]
where $(\tilde d_{ij})$ are the Euclidean distances in the $k$D space.
Shepard and Kruskal (1962–4) proposed only to preserve the ordering of distances, minimizing
\[ \mathrm{STRESS}^2 = \frac{\sum_{i \neq j} \left[ \theta(d_{ij}) - \tilde d_{ij} \right]^2}{\sum_{i \neq j} \tilde d_{ij}^2} \]
over both the configuration of points and an increasing function $\theta$.
The optimization task is quite difficult and this can be slow.
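The classical solution needs no iterative optimization and can be sketched via double centring of the squared distances followed by an eigendecomposition. The STRESS function below evaluates the criterion for a fixed configuration, with $\theta$ defaulting to the identity; that default is an assumption for illustration, since the Shepard–Kruskal method would also fit $\theta$ by monotone regression and iterate, which is not attempted here. Function names and data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def classical_mds(D, k=2):
    """Classical MDS: k-dimensional configuration from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D**2) @ J                    # doubly centred inner products
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]            # k largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)         # guard against tiny negatives
    return evecs[:, idx] * np.sqrt(lam)

def stress(D, Y, theta=lambda d: d):
    """STRESS of configuration Y for distances D and a given increasing theta."""
    Dt = pairwise_distances(Y)
    off = ~np.eye(D.shape[0], dtype=bool)        # all pairs with i != j
    num = ((theta(D[off]) - Dt[off])**2).sum()
    return float(np.sqrt(num / (Dt[off]**2).sum()))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                     # synthetic 5D points
D = pairwise_distances(X)

Y2 = classical_mds(D, k=2)                       # the 2D 'squeezed' view
Y5 = classical_mds(D, k=5)                       # full dimension: exact recovery
```

With Euclidean input distances the classical configuration is an orthogonal projection of the original points, so its fitted distances never exceed the originals, and in the full dimension they match exactly.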
Multidimensional scaling
[Figure: An order-preserving MDS plot of the (raw) crabs data. Key: blue male, blue female, orange male, orange female.]
[Figure: After re-scaling to (approximately) constant carapace area. Key: blue male, blue female, orange male, orange female.]
Now for GGobi demonstrations
A Forensic Example
Data on 214 fragments of glass collected at scenes of crimes. Each has a measured refractive index and composition (weight percent of oxides of Na, Mg, Al, Si, K, Ca, Ba and Fe).
Grouped as window float glass (70), window non-float glass (76), vehicle window glass (17) and other: containers (13), tableware (9) and headlamps (29).
[Figure: Strip plot by type of glass. Panels: WinF, WinNF, Veh, Con, Tabl, Head; rows: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe.]
[Figure: Strip plot by type of analyte. Panels: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe; rows: WinF, WinNF, Veh, Con, Tabl, Head.]
[Figure: Isotonic multidimensional scaling representation. Key: WinF, WinNF, Veh, Con, Tabl, Head.]