Dr. Aijun Zhang, STAT3622 Data Visualization, 12 September 2016


Exploratory Data Analysis Simple Base Graphics Using Lattice Package

Exploratory Data Analysis

Dr. Aijun Zhang
STAT3622 Data Visualization

12 September 2016

StatSoft.org 1

Outline

1 Exploratory Data Analysis

2 Simple Base Graphics

3 Using Lattice Package

John Tukey

John Tukey (1915 – 2000)

Proposed “Exploratory Data Analysis”

Coined terms: Boxplot, Stem-and-Leaf plot, ANOVA (Analysis of Variance)

Coined terms “Bit” and “Software”

Co-developed the Fast Fourier Transform algorithm, Projection Pursuit, Jackknife estimation

Famous quote: “The best thing about being a statistician is that you get to play in everyone’s backyard.”

“The greatest value of a picture is when it forces us to notice what we never expected to see.”

John Tukey (1977)

Tables

Five-number summary

Scatter plot

Box-plot

Residual plot

Smoother

Stem-and-Leaf plot

Bag plot

Median Polish

Example 1: Anscombe Dataset

         x1     y1     x2     y2     x3     y3     x4     y4
1     10.00   8.04  10.00   9.14  10.00   7.46   8.00   6.58
2      8.00   6.95   8.00   8.14   8.00   6.77   8.00   5.76
3     13.00   7.58  13.00   8.74  13.00  12.74   8.00   7.71
4      9.00   8.81   9.00   8.77   9.00   7.11   8.00   8.84
5     11.00   8.33  11.00   9.26  11.00   7.81   8.00   8.47
6     14.00   9.96  14.00   8.10  14.00   8.84   8.00   7.04
7      6.00   7.24   6.00   6.13   6.00   6.08   8.00   5.25
8      4.00   4.26   4.00   3.10   4.00   5.39  19.00  12.50
9     12.00  10.84  12.00   9.13  12.00   8.15   8.00   5.56
10     7.00   4.82   7.00   7.26   7.00   6.42   8.00   7.91
11     5.00   5.68   5.00   4.74   5.00   5.73   8.00   6.89

Mean   9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
Sd     3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Cor           0.82          0.82          0.82          0.82
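The summary rows of the table can be reproduced in R. The sketch below assumes the `anscombe` data frame that ships with base R (its columns are ordered x1..x4, y1..y4 rather than pairwise as in the table); the helper names `summary_stats` and `set1`..`set4` are illustrative only.

```r
# Summary statistics for the four Anscombe (x, y) pairs,
# using the built-in `anscombe` data frame (assumed to match the table).
data(anscombe)

summary_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x   = sd(x),   sd_y   = sd(y),
    cor_xy = cor(x, y))
})
colnames(summary_stats) <- paste0("set", 1:4)
print(round(summary_stats, 2))

# The summaries agree, but the scatter plots do not:
# draw the four sets in a 2 x 2 layout with their least-squares lines.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 19, main = paste("Set", i),
       xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x), col = "red")
}
par(op)
```

Nearly identical means, standard deviations and correlations hide four very different patterns, which is the point of plotting before modelling.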

Exploratory Data Analysis

EDA is a statistical approach to making sense of data using a variety of techniques (mostly graphical). It may help to:

Assess assumptions about the distributions of variables

Identify relationships between variables

Extract important variables

Suggest the use of appropriate models

Detect problems in the collected data (e.g. outliers, missing data, measurement errors)

Base Statistical Graphics

Univariate
Histogram, Stem-and-Leaf, Dot, Q-Q, Density plots
Boxplot, Box-and-whisker
Bar, Pie, Polar, Waterfall charts

Bivariate
XYplot, Line, Area, Scatter, Bubble charts

Trivariate
3D Scatter, Contour, Level/Heatmap, Surface plots

Which Chart to Use?

Indeed, experience matters!

2 Simple Base Graphics

Iris Dataset

Let’s play with the Iris data in RStudio. Refer to the R Markdown source code and HTML output files (reproducible).

Histogram

Options: title(., main, xlab, ylab), hist(., breaks, freq, col)
Figure layout by par(mfrow/mfcol = c(nr, nc))
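As the slide's figure is not reproduced here, the following is a minimal sketch of these options, assuming the built-in `iris` data and the `Sepal.Length` column; the colours and break counts are arbitrary choices, not taken from the original R Markdown file.

```r
# Two histograms side by side, illustrating hist(), title() and par(mfrow).
x <- iris$Sepal.Length

op <- par(mfrow = c(1, 2))          # 1 row, 2 columns, filled row by row

# Left panel: counts; labels are suppressed in hist() and added with title()
hist(x, breaks = 10, freq = TRUE, col = "lightblue",
     main = "", xlab = "", ylab = "")
title(main = "Counts", xlab = "Sepal length (cm)", ylab = "Frequency")

# Right panel: densities with a finer (suggested) number of breaks
h <- hist(x, breaks = 20, freq = FALSE, col = "grey",
          main = "Density scale", xlab = "Sepal length (cm)")

par(op)                             # restore the previous layout
```

Note that a single number passed to `breaks` is only a suggestion; R picks "pretty" break points near it. Using `mfcol` instead of `mfrow` fills the layout column by column.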

Histogram with Density Plot

Options: hist(., freq = FALSE); lines(density(.), lty, lwd)
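A minimal sketch of overlaying a kernel density estimate on a histogram, again assuming the built-in `iris` data (here `Petal.Length`, chosen only for illustration). The histogram must be on the density scale (`freq = FALSE`) so that both share the same y-axis.

```r
# Histogram on the density scale with a kernel density curve on top.
x <- iris$Petal.Length

d <- density(x)                     # Gaussian kernel, default bandwidth
h <- hist(x, plot = FALSE)          # computed only to find the y-axis range

hist(x, freq = FALSE, col = "grey90",
     ylim = c(0, max(h$density, d$y)),
     main = "Histogram with density estimate",
     xlab = "Petal length (cm)")
lines(d, lty = 2, lwd = 2, col = "red")
```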

Boxplot

Remarks: outliers are points outside the interval [Q1 − 1.5 IQR, Q3 + 1.5 IQR]

Options: plotting x (vector), X (matrix) and x ~ c (grouping)
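The outlier rule and the three calling styles can be sketched as follows, assuming the built-in `iris` data. One caveat: R's `boxplot()` takes Q1 and Q3 to be Tukey's hinges from `fivenum()`, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Apply the 1.5 * IQR rule by hand and compare with boxplot.stats().
x <- iris$Sepal.Width

five  <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
iqr   <- five[4] - five[2]
lower <- five[2] - 1.5 * iqr
upper <- five[4] + 1.5 * iqr
manual_out <- x[x < lower | x > upper]

bs <- boxplot.stats(x)              # bs$out holds the points flagged as outliers
print(sort(manual_out))
print(sort(bs$out))

# The three input styles mentioned in the options
op <- par(mfrow = c(1, 3))
boxplot(x, main = "Vector")
boxplot(as.matrix(iris[, 1:4]), main = "Matrix (one box per column)")
boxplot(Sepal.Width ~ Species, data = iris, main = "Formula (grouping)")
par(op)
```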

Plotting Categorical Variables

Tricks: data selection/subsetting; see UCLA R-site
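A minimal sketch of subsetting followed by plotting a categorical variable, assuming the built-in `iris` data. The threshold 5.8 and the `cut()` break points are arbitrary illustrative values, not taken from the slides.

```r
# Select rows with a logical condition, then tabulate and plot a factor.
long_sepal <- iris[iris$Sepal.Length > 5.8, ]
counts <- table(long_sepal$Species)
barplot(counts, col = c("red", "green3", "blue"),
        main = "Species among flowers with sepal length > 5.8 cm",
        ylab = "Count")

# Derive a categorical variable from a numeric one with cut(),
# then cross-tabulate it against Species.
size <- cut(iris$Petal.Length, breaks = c(0, 2, 5, 7),
            labels = c("short", "medium", "long"))
tab <- table(size, iris$Species)
barplot(tab, beside = TRUE, legend.text = TRUE,
        main = "Petal length class by species", ylab = "Count")
```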

Relationship Between Variables

Tricks: mathematical annotations in plots; see plotmath.html
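A minimal sketch of mathematical annotation with `expression()` and `bquote()`, assuming the built-in `iris` data; the fitted model and the text position are illustrative choices only. The full syntax is documented under `?plotmath`.

```r
# Scatter plot with a fitted line and plotmath annotations.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
r2  <- summary(fit)$r.squared

plot(iris$Petal.Length, iris$Petal.Width, pch = 19,
     xlab = expression(paste("Petal length ", x, " (cm)")),
     ylab = expression(paste("Petal width ", y, " (cm)")),
     main = expression(hat(y) == hat(beta)[0] + hat(beta)[1] * x))
abline(fit, lwd = 2, col = "red")

# bquote() substitutes the computed value into the math expression
text(2, 2.2, labels = bquote(R^2 == .(round(r2, 2))))
```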

Tricks: color indexing by unclass(DataX$Species)
Adding legend/text at different locations
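A minimal sketch of these two tricks. `DataX` is the name used on the slide; here it is assumed to be a copy of the built-in `iris` data frame, and the legend and text positions are arbitrary.

```r
# Colour points by species using the integer codes behind the factor.
DataX <- iris                        # assumption: DataX holds the iris data

col_index <- unclass(DataX$Species)  # integer codes 1, 2, 3 for the three levels
plot(DataX$Sepal.Length, DataX$Sepal.Width,
     col = col_index, pch = 19,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

# Legend placed by keyword; text placed by data coordinates
legend("topright", legend = levels(DataX$Species),
       col = 1:3, pch = 19, title = "Species")
text(x = 7.5, y = 2.1, labels = paste("n =", nrow(DataX)))
```

`legend()` also accepts keywords such as "topleft" or "bottomright", or explicit x and y coordinates.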

Pairwise Scatter Plots

Tricks: color indexing by unclass(DataX$Species)
Adding legend/text at different locations
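A minimal sketch of a pairwise scatter plot matrix with the same colour indexing, assuming `DataX` is a copy of the built-in `iris` data. The `panel_cor` helper is a hypothetical name, adapted from the pattern shown in the examples of `?pairs`; it writes the correlation in the upper panels.

```r
# Scatter plot matrix of the four measurements, coloured by species.
DataX <- iris                        # assumption: DataX holds the iris data
col_index <- unclass(DataX$Species)

# Hypothetical helper: print the correlation in the middle of a panel
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))            # restore panel coordinates afterwards
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.5, format(cor(x, y), digits = 2), cex = 1.5)
}

pairs(DataX[, 1:4], col = col_index, pch = 19,
      upper.panel = panel_cor,
      main = "Iris data: pairwise scatter plots")
```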

3 Using Lattice Package

Lattice Package

Sarkar (2008; Springer)

Using trellis graphs for multivariate data

Multipanel conditioning and grouping

Elegant high-level data visualization

Covering most of statistical charts

Figures and code can be found at http://lmdvr.r-forge.r-project.org/

Plot customization is not straightforward

Univariate Distributions with Conditioning and Grouping

Refer to R Markdown for source codes/outputs (reproducible)
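The original figures are not reproduced here. As a minimal sketch of conditioning (`| Species`) and grouping (`groups = Species`) in lattice, the following assumes the built-in `iris` data, which may not be the data set used in the original R Markdown file.

```r
# Univariate lattice plots: conditioning puts each species in its own panel,
# grouping overlays the species in a single panel.
library(lattice)

p_hist <- histogram(~ Sepal.Length | Species, data = iris,
                    layout = c(3, 1), xlab = "Sepal length (cm)")

p_dens <- densityplot(~ Sepal.Length, data = iris, groups = Species,
                      plot.points = FALSE, auto.key = list(columns = 3),
                      xlab = "Sepal length (cm)")

p_box <- bwplot(Species ~ Sepal.Length, data = iris,
                xlab = "Sepal length (cm)")

# Lattice functions return "trellis" objects; print() draws them
print(p_hist)
print(p_dens)
print(p_box)
```

Because the plots are objects, they must be printed explicitly inside functions, loops or scripts sourced without echo.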

Exploring Bivariate Relationships
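A minimal sketch of bivariate lattice plots, assuming the built-in `iris` data (the slide's own figure and data set are not available here).

```r
# Bivariate relationships with xyplot(): grouped and conditioned versions.
library(lattice)

# Grouping: one panel, species distinguished by colour, with a key on top
p_group <- xyplot(Sepal.Width ~ Sepal.Length, data = iris,
                  groups = Species, auto.key = list(columns = 3),
                  xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

# Conditioning: one panel per species, points ("p") plus regression line ("r")
p_cond <- xyplot(Petal.Width ~ Petal.Length | Species, data = iris,
                 type = c("p", "r"), layout = c(3, 1),
                 xlab = "Petal length (cm)", ylab = "Petal width (cm)")

print(p_group)
print(p_cond)
```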

Trivariate Heatmap and 3D Plots
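A minimal sketch of trivariate lattice plots. It assumes the built-in `volcano` elevation matrix for the heatmap and surface, and the built-in `iris` data for the 3D scatter; neither is confirmed as the data behind the original figures.

```r
# Heatmap, surface and 3D scatter plots with lattice.
library(lattice)

# Level plot (heatmap) of a matrix: rows and columns become x and y
p_level <- levelplot(volcano, xlab = "Row", ylab = "Column",
                     main = "Level plot of volcano")

# Perspective surface of the same matrix
p_wire <- wireframe(volcano, shade = TRUE, zlab = "Height",
                    main = "Surface plot of volcano")

# 3D scatter plot: z ~ x * y, coloured by group
p_cloud <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
                 groups = Species, auto.key = list(columns = 3))

print(p_level)
print(p_wire)
print(p_cloud)
```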
