7/30/2019 (10) Box-Plot With Fences
Applied Statistics and Computing Lab
BOX-PLOT WITH FENCES
Applied Statistics and Computing Lab
Indian School of Business
Learning goals
Why go beyond a basic box-plot?
What are fences?
How is a box-plot with fences constructed?
How does one interpret such a plot? What are the gains and limitations?
Box-plot with fences
Can we modify the basic box-plot so that it
helps in detecting unusual observations?
A box-plot with fences can be useful
What are fences? Let us take a look at a figure!
Box-plot with fences (contd.)
Source: http://en.wikipedia.org/wiki/Boxplot
Basis for fences
From the previous figure, we see that for normally distributed data, about 99.3% of the data lies in the interval
$\left(Q_1 - 1.5\,(Q_3 - Q_1),\; Q_3 + 1.5\,(Q_3 - Q_1)\right)$
Also, only about 3 in a million observations (0.0003%) are expected to lie outside the interval
$\left(Q_1 - 3\,(Q_3 - Q_1),\; Q_3 + 3\,(Q_3 - Q_1)\right)$
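As a rough check of the two figures above, the following R sketch computes, for a standard normal distribution, the probability of falling outside the inner fences (1.5 times the interquartile range beyond the quartiles) and outside the outer fences (3 times the interquartile range). The variable names are illustrative and do not come from the slides. The inner-fence coverage comes out at roughly 99.3%, and the outer-fence tail at between 2 and 3 in a million.

```r
# Quartiles and interquartile range of the standard normal distribution
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Inner fences (1.5 * IQR beyond the quartiles) and outer fences (3 * IQR)
inner <- c(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer <- c(q1 - 3 * iqr,   q3 + 3 * iqr)

# Probability that a normal observation falls outside each pair of fences
p_outside_inner <- pnorm(inner[1]) + pnorm(inner[2], lower.tail = FALSE)
p_outside_outer <- pnorm(outer[1]) + pnorm(outer[2], lower.tail = FALSE)

print(round(1 - p_outside_inner, 4))  # share of data inside the inner fences
print(signif(p_outside_outer, 2))     # share of data outside the outer fences
```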
Box-plot with fences
[Figure: box-plot with fences; points are labelled "Outlier" and "Suspected outlier"]
Visuals from Aczel A., Sounderpandian J. Complete business statistics
Box-plot with fences
Box-plots with fences are useful in identifying unusual observations
What are unusual observations?
A box-plot serves only as a diagnostic. It is not a test of significance.
Caution: Even for a random sample from a normal distribution, about 7 in a thousand sample points can be expected to lie outside the inner fences and about 3 in a million outside the outer fences. Thus, when dealing with large data sets, one has to be careful about declaring outliers on the basis of a box-plot. Sometimes, simulation-based methods are used for this purpose. For more information, see Robert Dawson (2011).
Sometimes only the inner fences are used; this is the default for the boxplot command in R.
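The caution about large data sets can be illustrated with a simulated sample. The sketch below is only an illustration under assumed settings (a standard normal sample of 100,000 points and an arbitrary seed). It uses `boxplot.stats()`, whose `coef` argument sets the fence multiplier, and the `range` argument of `boxplot()`, which plays the same role when plotting. R computes the fences from the hinges, which can differ slightly from the quartiles in small samples.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(100000)          # a large, purely normal sample: no true outliers

# boxplot.stats() returns the points beyond the fences in $out.
# coef = 1.5 gives the inner fences (the default), coef = 3 the outer fences.
beyond_inner <- boxplot.stats(x, coef = 1.5)$out
beyond_outer <- boxplot.stats(x, coef = 3)$out

length(beyond_inner)        # several hundred points flagged by chance alone
length(beyond_outer)        # very few points, if any

# The same choice is made in the plotting command through 'range'
boxplot(x, range = 1.5, horizontal = TRUE, main = "Inner fences (R default)")
boxplot(x, range = 3,   horizontal = TRUE, main = "Outer fences")
```

Every point flagged here is a legitimate draw from the normal distribution, which is why a flagged point in a large data set should be treated as a prompt for investigation rather than as a confirmed outlier.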
Comparison of data
Visuals from Aczel A., Sounderpandian J. Complete business statistics
Box-plot of final exam scores
Box-plots of all the scores
Box-plots of three minors
Box-plots indicating means
Interpretation of the Box-plot
In the box-plot corresponding to the scores in the second semester exam, we have 3 unusual observations among 50. Under normality, we would expect only about 7 in a thousand observations to be flagged, that is, fewer than one in a sample of 50. Thus one needs to probe into these unusual observations.
The distribution of scores of the second semester exam appears to be symmetric, but may have slightly longer tails in view of the unusual observations, situated symmetrically below and above the fences.
From the box-plots corresponding to the three minors, it appears that
The distribution of scores in the First minor is skewed to the right,
The distributions of scores in the Second and Third minors are symmetric and are somewhat similar, and
The median scores of the three minors seem to be close (we shall examine this further when we deal with the notched box-plots)
There is an unusual observation in the box-plot of scores of the First semester exam, with a value of about 18. We know that the GPA is out of 10. Thus this is an outlier!
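To put "3 unusual observations among 50" in perspective, a small simulation can show how many points a box-plot typically flags in normal samples of that size. The sketch below is an illustration under assumed settings (standard normal data, 10,000 replications, an arbitrary seed). It does not use the exam data, and it is a diagnostic aid rather than a test of significance. Because the fences are estimated from each sample, the simulated average can differ from the large-sample figure of 50 × 0.007 ≈ 0.35.

```r
set.seed(2019)              # arbitrary seed
n_obs  <- 50                # sample size quoted in the interpretation
n_reps <- 10000             # number of simulated samples (an assumption)

# Number of points beyond the inner fences in each simulated normal sample
flagged <- replicate(n_reps, length(boxplot.stats(rnorm(n_obs))$out))

mean(flagged)               # average number of flagged points per sample of 50
mean(flagged >= 3)          # share of samples with 3 or more flagged points
table(flagged)              # full distribution of the counts
```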
Gains from a box-plot with fences
As we saw,
We can identify unusual observations
We can examine the tail behaviour
We can compare two or more variables or datasets more easily
However we cannot get modal information from these plots!
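The last point can be seen with a simulated example. The sketch below uses made-up data (an equal mixture of two normal distributions, chosen only for illustration) and draws a box-plot next to a histogram of the same values. The histogram shows two peaks, while the box-plot shows a single box with no hint of them.

```r
set.seed(3)                                   # arbitrary seed
# Hypothetical bimodal data: two normal clusters centred at -2 and 2
bimodal <- c(rnorm(200, mean = -2), rnorm(200, mean = 2))

op <- par(mfrow = c(1, 2))                    # two panels side by side
boxplot(bimodal, main = "Box-plot")           # a single box, modes not visible
hist(bimodal, breaks = 30, main = "Histogram")  # the two modes are visible
par(op)                                       # restore the plotting layout
```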
R-codes
Plot | R-code
Boxplot (of a single variable) | boxplot(<variable name>)
Boxplot (of all the variables in a dataset) | boxplot(<name of data as input in R>)
Boxplot (of k distinct variables from a dataset) | boxplot(<dataname>$<variable 1 name>, <dataname>$<variable 2 name>, ..., <dataname>$<variable k name>)
Boxplot with means (can be drawn for one or many variables at the same time) | boxplot(<variable specification>); points(y = colMeans(<variable specification>), x = 1:<total number of variables in the box-plot>)
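The templates above can be tried on a small made-up data set. In the sketch below, the data frame `scores` and its columns `minor1`, `minor2` and `minor3` are hypothetical stand-ins generated at random. They are not the exam data shown in the slides.

```r
set.seed(10)                                  # arbitrary seed
# Hypothetical data set standing in for the exam scores
scores <- data.frame(
  minor1 = rexp(50, rate = 1 / 10),           # right-skewed values
  minor2 = rnorm(50, mean = 12, sd = 3),      # roughly symmetric values
  minor3 = rnorm(50, mean = 12, sd = 3)
)

boxplot(scores$minor1)                        # a single variable
boxplot(scores)                               # all variables in the data set
boxplot(scores$minor2, scores$minor3)         # selected variables only

# Box-plots with the means overlaid as filled red points
boxplot(scores)
points(x = seq_len(ncol(scores)), y = colMeans(scores), pch = 19, col = "red")
```

`points()` adds to the most recent plot, so it has to follow the `boxplot()` call whose boxes it should annotate.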
Thank you