Upload
others
View
5
Download
2
Embed Size (px)
Citation preview
ETC3250/5250: Visualisationof multivariate data (part 2)Semester 1, 2020
Professor Di Cook
Econometrics and Business Statistics Monash University
Week 5 (b)
Parallel coordinate plots
Parallel coordinate plots show the data using parallel axes
Scatterplots use orthogonal axes, and are thus limited to twovariables on the page. Turning the axes parallel allows for many more variables to bedisplayed together. Lines connecting the points show associations between variables.
2 / 19
Comparison with tours
Compare the tour of the �ea data, with three clusters:
tars1
tars2
headaede1aede2aede3
1 , 00 , 10 , 00 , 00 , 00 , 0
~indx: 11
11 20 29 38 47 56 65 74 83 92 101 110 119 128 137 146 155
Play
3 / 19
Comparison with tours
With the parallel coordinate plot:
4 / 19
How to read parallel coordinate plots
A set of points in -dimensional space map to a set of lines in parallel axes The pattern among and between the lines indicate structure in high-dimensions
Groups of lines trending together indicate clustering Single lines trending differently to other indicate outliersBetween pairs of axes intersecting lines indicate strong negativeassociation, and parallel lines indicate strong positive association
Points in Euclidean space lines in parallel coordinates
p p
≡
5 / 19
Parallel coordinate plots - controls
Details that need to be controlled:
Order of axes can affect perception of structure. Placing axes nextto each other emphasizes that association Variables need to be on a similar scale, and may need to bestandardised
6 / 19
Parallel coordinate plots- controls
Unordered, default standardisedscale.
7 / 19
Parallel coordinate plots- controls
Ordered, default standardised scale.
8 / 19
Parallel coordinate plots- controls
Ordered, centered standardisedscale.
9 / 19
Parallel coordinate plots- controls
Ordered, scaled univariately to 0-1.
10 / 19
Parallel coordinate plots- controls
Ordered, scaled globally to 0-1.
11 / 19
Andrews curves
For the variables make a fourier transform
Preceded the tour, but like a tour laid out on a page, butalgorithm is not space-�lling.
p (x1, . . . ,xp)fx(t) = x1/√2+x2 sin(t)+x3 cos(t)+x4 sin(2t)+x5 cos(2t)+. . .
(d = 1)
12 / 19
Categorical variables
Hammock plots are a variation of parallel coordinate plots forcategorical variables. They show the �ow of groups between stackedbarcharts, providing information about the association betweenmultiple categorical variables.
13 / 19
Categorical variables
Mosaic plots partition the axes sequentially on the categoricalvariables.
For this data, titanic, there is a response variable, "Survived" andgood practice would have this variable mapped to �ll colour, becausewe are interested in the proportion change in response across thepredictor categories.
14 / 19
Categorical variables
Mosaic plots can handle multiple categorical variables.
15 / 19
Data in the model space
(Chapter3/3.9.pdf)
Model in the data space
16 / 19
Inference
Its hard to read residual plots. How do you know if the residual plot"really" has no structure?
17 / 19
Boostrapping
Bootstrapped residuals. If there is still structure in the residual plot, the "true" residualplot should be identi�able from the plots of bootstrapped residuals.
18 / 19
� Made by a human with a computerSlides at https://iml.numbat.space.
Code and data at https://github.com/numbats/iml.
Reading: Wickham, Cook, Hofmann (2015) Visualizing statisticalmodels: Removing the blindfold, Statistical Analysis and Data Mining,8(4):203-225.
Created using R Markdown with �air by xaringan, andkunoichi (female ninja) style.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
19 / 19