20
ETC3250/5250: Visualisation of multivariate data (part 2) Semester 1, 2020 Professor Di Cook Econometrics and Business Statistics Monash University Week 5 (b)

of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

  • Upload
    others

  • View
    5

  • Download
    2

Embed Size (px)

Citation preview

Page 1: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

ETC3250/5250: Visualisationof multivariate data (part 2)Semester 1, 2020

Professor Di Cook

Econometrics and Business Statistics Monash University

Week 5 (b)

Page 2: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots

Parallel coordinate plots show the data using parallel axes

Scatterplots use orthogonal axes, and are thus limited to twovariables on the page. Turning the axes parallel allows for many more variables to bedisplayed together. Lines connecting the points show associations between variables.

2 / 19

Page 3: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Comparison with tours

Compare the tour of the �ea data, with three clusters:

tars1

tars2

headaede1aede2aede3

1 , 00 , 10 , 00 , 00 , 00 , 0

~indx: 11

11 20 29 38 47 56 65 74 83 92 101 110 119 128 137 146 155

Play

3 / 19

Page 4: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Comparison with tours

With the parallel coordinate plot:

4 / 19

Page 5: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

How to read parallel coordinate plots

A set of points in -dimensional space map to a set of lines in parallel axes The pattern among and between the lines indicate structure in high-dimensions

Groups of lines trending together indicate clustering Single lines trending differently to other indicate outliersBetween pairs of axes intersecting lines indicate strong negativeassociation, and parallel lines indicate strong positive association

Points in Euclidean space lines in parallel coordinates

p p

5 / 19

Page 6: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots - controls

Details that need to be controlled:

Order of axes can affect perception of structure. Placing axes nextto each other emphasizes that association Variables need to be on a similar scale, and may need to bestandardised

6 / 19

Page 7: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots- controls

Unordered, default standardisedscale.

7 / 19

Page 8: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots- controls

Ordered, default standardised scale.

8 / 19

Page 9: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots- controls

Ordered, centered standardisedscale.

9 / 19

Page 10: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots- controls

Ordered, scaled univariately to 0-1.

10 / 19

Page 11: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Parallel coordinate plots- controls

Ordered, scaled globally to 0-1.

11 / 19

Page 12: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Andrews curves

For the variables make a fourier transform

Preceded the tour, but like a tour laid out on a page, butalgorithm is not space-�lling.

p (x1, . . . ,xp)fx(t) = x1/√2+x2 sin(t)+x3 cos(t)+x4 sin(2t)+x5 cos(2t)+. . .

(d = 1)

12 / 19

Page 13: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Categorical variables

Hammock plots are a variation of parallel coordinate plots forcategorical variables. They show the �ow of groups between stackedbarcharts, providing information about the association betweenmultiple categorical variables.

13 / 19

Page 14: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Categorical variables

Mosaic plots partition the axes sequentially on the categoricalvariables.

For this data, titanic, there is a response variable, "Survived" andgood practice would have this variable mapped to �ll colour, becausewe are interested in the proportion change in response across thepredictor categories.

14 / 19

Page 15: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Categorical variables

Mosaic plots can handle multiple categorical variables.

15 / 19

Page 16: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Data in the model space

(Chapter3/3.9.pdf)

Model in the data space

16 / 19

Page 17: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Inference

Its hard to read residual plots. How do you know if the residual plot"really" has no structure?

17 / 19

Page 18: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

Boostrapping

Bootstrapped residuals. If there is still structure in the residual plot, the "true" residualplot should be identi�able from the plots of bootstrapped residuals.

18 / 19

Page 19: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,

� Made by a human with a computerSlides at https://iml.numbat.space.

Code and data at https://github.com/numbats/iml.

Reading: Wickham, Cook, Hofmann (2015) Visualizing statisticalmodels: Removing the blindfold, Statistical Analysis and Data Mining,8(4):203-225.

Created using R Markdown with �air by xaringan, andkunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

19 / 19

Page 20: of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel coordinate plots show the data using parallel axes Scatterplots use orthogonal axes,