(10) Box-Plot With Fences


  • 7/30/2019 (10) Box-Plot With Fences


    Applied Statistics and Computing Lab

    BOX-PLOT WITH FENCES

    Applied Statistics and Computing Lab

    Indian School of Business


    Learning goals

    Why go beyond a basic box-plot?

    What are fences?

    How is a box-plot with fences constructed?

    How does one interpret such a plot? What are the gains and limitations?


    Box-plot with fences

    Can we modify the basic box-plot so that it helps in detecting unusual observations?

    A box-plot with fences can be useful.

    What are fences? Let us take a look at a figure!


    Box-plot with fences (contd.)


    Source: http://en.wikipedia.org/wiki/Boxplot


    Basis for fences

    For normally distributed data, the previous figure shows that about 99.3% of the data lie in the interval

    $(Q_1 - 1.5\,(Q_3 - Q_1),\; Q_3 + 1.5\,(Q_3 - Q_1))$

    Also, only about 2 to 3 observations in a million (roughly 0.0002%) are expected to lie outside the interval

    $(Q_1 - 3\,(Q_3 - Q_1),\; Q_3 + 3\,(Q_3 - Q_1))$
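
    The two coverage figures can be checked directly from the standard normal distribution. The following is a minimal sketch, assuming exact population quartiles of a standard normal (the variable names are illustrative and not from the slides):

```r
# Quartiles of the standard normal distribution (population values, not sample estimates)
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1

# Upper inner and outer fences; the lower fences are mirror images by symmetry
inner_upper <- q3 + 1.5 * iqr
outer_upper <- q3 + 3 * iqr

# Two-sided probability of an observation falling outside each pair of fences
p_outside_inner <- 2 * pnorm(inner_upper, lower.tail = FALSE)
p_outside_outer <- 2 * pnorm(outer_upper, lower.tail = FALSE)

# Share of the data inside the inner fences
p_inside_inner <- 1 - p_outside_inner

print(c(inside_inner = p_inside_inner,
        outside_inner = p_outside_inner,
        outside_outer = p_outside_outer))
```

    For sample data the quartiles are themselves estimated, so the observed proportions will differ somewhat from these population values.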


    Box-plot with fences

    [Figure: box-plot with fences, with points labelled "Outlier" and "Suspected outlier". Visuals from Aczel A., Sounderpandian J., Complete Business Statistics.]
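
    The two labels in the figure can be reproduced numerically. The sketch below assumes the usual convention that points between the inner and outer fences are suspected outliers and points beyond the outer fences are outliers. The function name `classify_by_fences` and the small data vector are hypothetical, and the quartiles come from R's default `quantile()` rule, which may differ slightly from the hinges used by `boxplot()`.

```r
# Classify each observation relative to the inner (1.5 x IQR) and outer (3 x IQR) fences.
# Hypothetical helper, written for illustration only.
classify_by_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  inner <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
  outer <- c(q[1] - 3 * iqr, q[2] + 3 * iqr)
  status <- ifelse(x < outer[1] | x > outer[2], "outlier",
             ifelse(x < inner[1] | x > inner[2], "suspected outlier", "regular"))
  list(inner = inner, outer = outer, status = status)
}

# Made-up values, chosen so that both kinds of unusual points occur
x <- c(4, 5, 5, 6, 6, 6, 7, 7, 8, 11, 18)
res <- classify_by_fences(x)

print(res$inner)
print(res$outer)
print(data.frame(value = x, status = res$status))
```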


    Box-plot with fences

    Box-plots with fences are useful in identifying unusual observations.

    What are unusual observations?

    A box-plot serves only as a diagnostic. It is not a test of significance.

    Caution: Even for a random sample from a normal distribution, about 7 out of a thousand sample points can lie outside the inner fences and about 2 to 3 out of a million can lie outside the outer fences. Thus, when dealing with large data sets, one has to be careful about declaring outliers on the basis of a box-plot. Sometimes, simulation-based methods are used for this purpose. For more information, see Robert Dawson (2011).

    Sometimes only the inner fences are used: the default boxplot command in R produces a box-plot with inner fences only.
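
    The caution above can be illustrated with a small simulation. This is only a sketch of the general idea, not the method of the cited reference. The sample size, the number of replications, the seed and the helper name `count_outside_inner` are all assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Count the points of a sample that fall outside its own inner fences
count_outside_inner <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}

n <- 1000      # size of each simulated normal sample
n_rep <- 2000  # number of simulated samples

counts <- replicate(n_rep, count_outside_inner(rnorm(n)))

# Average fraction of points flagged per sample
mean_fraction <- mean(counts) / n
# Share of perfectly normal samples that show at least one flagged point
share_with_flag <- mean(counts >= 1)

print(c(mean_fraction = mean_fraction, share_with_flag = share_with_flag))
```

    With samples this large, almost every normal sample shows some points beyond the inner fences, which is why flagged points alone should not be declared outliers.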


    Comparison of data

    Visuals from Aczel A., Sounderpandian J. Complete business statistics


    Box-plot of final exam scores


    Box-plots of all the scores


    Box-plots of three minors


    Box-plots indicating means


    Interpretation of the Box-plot

    In the box-plot corresponding to the scores in the second semester exam, we have 3 unusual observations among 50. Under normality, we expect only about 7 in a thousand observations to be unusual. Thus one needs to probe into these unusual observations.

    The distribution of scores of the second semester exam appears to be symmetric, but may have slightly longer tails in view of the unusual observations, situated symmetrically below and above the fences.

    From the box-plots corresponding to the three minors, it appears that

    The distribution of scores in the First minor is skewed to the right,

    The distributions of scores in the Second and Third minors are symmetric and are somewhat similar, and

    The median scores of the three minors seem to be close (we shall examine this further when we deal with the notched box-plots)

    There is an unusual observation in the box-plot of scores of the First semester exam, with a value of about 18. We know that the GPA is out of 10. Thus this is an outlier!
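
    The comparison of "3 unusual observations among 50" with "about 7 in a thousand" can be made concrete with a rough calculation. The sketch below assumes independent normal observations and population quartiles, so it is only an approximation for a sample of 50 (sample quartiles tend to flag somewhat more points). The variable names are illustrative.

```r
# Probability that one normal observation falls outside the inner fences
q3 <- qnorm(0.75)
iqr <- 2 * q3  # for a symmetric distribution, Q1 = -Q3
p_outside <- 2 * pnorm(q3 + 1.5 * iqr, lower.tail = FALSE)

n <- 50

# Expected number of flagged observations in a sample of 50
expected_count <- n * p_outside

# Probability of seeing 3 or more flagged observations, i.e. P(X > 2)
p_three_or_more <- pbinom(2, size = n, prob = p_outside, lower.tail = FALSE)

print(c(expected_count = expected_count, p_three_or_more = p_three_or_more))
```

    Under these assumptions, fewer than one flagged observation is expected in 50, and three or more is a rare event, which supports the advice to probe into these observations.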


    Gain from a Box-plot with fence

    As we saw,

    We can identify unusual observations

    We can examine the tail behaviour

    We can compare two or more variables or datasets more easily

    However, we cannot get modal information (the number or location of modes) from these plots!


    R-codes

    Plot | R-code
    Boxplot (of a single variable) | boxplot(variable name)
    Boxplot (of all the variables in a dataset) | boxplot(name of data as input in R)
    Boxplot (of k distinct variables from a dataset) | boxplot(dataname$variable 1 name, dataname$variable 2 name, ..., dataname$variable k name)
    Boxplot with means (can be drawn for one or many variables at the same time) | boxplot(variable specification); points(y = colMeans(variable specification), x = 1:(total number of variables in the box-plot))
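
    The templates above can be tried out on any numeric data frame. The following sketch uses the four numeric columns of R's built-in `iris` dataset purely as a stand-in. The exam-score data shown in the slides is not available here, and the object names `scores`, `bp` and `means` are illustrative.

```r
# Stand-in data: the four numeric columns of the built-in iris dataset
scores <- iris[, 1:4]

# Boxplot of a single variable
boxplot(scores$Sepal.Length)

# Boxplot of k = 2 distinct variables from the dataset
boxplot(scores$Sepal.Length, scores$Petal.Length,
        names = c("Sepal.Length", "Petal.Length"))

# Boxplots of all the variables in the dataset; boxplot() also returns
# the plotted statistics invisibly, which are kept here in bp
bp <- boxplot(scores)

# Overlay the mean of each variable on the last plot as a filled point
means <- colMeans(scores)
points(x = seq_along(means), y = means, pch = 19)
```

    The `points()` call must follow the `boxplot()` call it annotates, since it draws on whichever plot is currently open.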


    Thank you