(9) Basic Box-Plot

  • 7/30/2019 (9) Basic Box-Plot

    Applied Statistics and Computing Lab

    BASIC BOX-PLOT

    Applied Statistics and Computing Lab

    Indian School of Business

    Learning goals

    What are the components of a basic box-plot?

    How is a basic box-plot constructed?

    How to interpret it?

    What are its salient features?

    What are its limitations?

    How is it related to a histogram?

    What is the effect of translation on a box-plot?

    Where does a box-plot fit in?

    Exploratory data-analytic tool for continuous data

    Visual display of certain important summary statistics

    Why Box-plot?

    Useful in Studying

    Location

    Spread

    Distribution

    Symmetry

    Tail behaviour

    Skewness

    Useful in comparison of different batches of data, or of one batch of data across the levels of a factor

    Useful for studying observations at the tails

    Easy to compute and draw, yet informative

    User-friendly

    Constructing a Basic Box-plot

    Data set 1. Suppose we have data on a batch (variable):

    90, 41, 22, 135, 15, 72, 50, 26, 105

    Step 1: Arrange the data in increasing order:

    15, 22, 26, 41, 50, 72, 90, 105, 135

    Step 2: Get the Five-point Summary, consisting of (i) the Minimum, (ii) the First quartile (Q1), (iii) the Median, (iv) the Third quartile (Q3) and (v) the Maximum

    For the above data, the Five-point Summary is:

    Minimum = 15

    First Quartile = 26

    Median = 50

    Third Quartile = 90

    Maximum = 135
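
    Steps 1 and 2 can be checked in R. This is a minimal sketch, assuming base R's fivenum(), which returns Tukey's hinges as the quartiles; other quartile conventions may give slightly different values for other data sets. The variable names are chosen here for illustration.

```r
# Data set 1 from the slide
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

# Step 1: arrange the data in increasing order
sorted_x <- sort(x)

# Step 2: five-point summary (minimum, Q1, median, Q3, maximum)
five <- fivenum(x)
names(five) <- c("Minimum", "Q1", "Median", "Q3", "Maximum")

print(sorted_x)
print(five)
```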

    Constructing a Basic Box-plot (contd.)

    Step 3: Draw a box of length equal to (Q3 - Q1). For now, we can choose the width as per convenience. The lower and upper hinges of the box represent the first and third quartiles. (In this case, the width is the vertical distance and the hinges are the left and right edges of the box.)

    Constructing a Basic Box-plot (contd.)

    Step 4: From the middle of the lower hinge draw a line (parallel to the lines corresponding to the length of the box) up to the minimum. Similarly draw a line from the middle of the upper hinge (parallel to the lines corresponding to the length of the box) up to the maximum. These lines are called the whiskers.

    Step 5: Draw a line at the median parallel to the hinges, dividing the box into two parts.
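
    Steps 3 to 5 can be followed literally with base R graphics. The sketch below is only an illustration of the construction for Data set 1, not the code used to produce the slides; the vertical extent of the box (0.5 to 1.5) is an arbitrary choice.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)
s <- fivenum(x)  # minimum, Q1, median, Q3, maximum

# Empty canvas: the data scale is horizontal, the vertical extent is arbitrary
plot(NA, xlim = range(x), ylim = c(0, 2), xlab = "Value", ylab = "", yaxt = "n")

# Step 3: box from the first quartile to the third quartile
rect(s[2], 0.5, s[4], 1.5)

# Step 4: whiskers from the middle of each hinge to the minimum and the maximum
segments(s[1], 1, s[2], 1)
segments(s[4], 1, s[5], 1)

# Step 5: line at the median, parallel to the hinges
segments(s[3], 0.5, s[3], 1.5)
```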

    Horizontal and vertical Box-plots

    The box-plot we saw is a horizontal box-plot (here the scale is on the horizontal axis)

    One can also have a vertical plot (where the scale is on the vertical axis)

    There is no specific advantage of one over the other, in general
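
    In R, the orientation is a matter of one argument. A minimal sketch with Data set 1, assuming base R's boxplot(), which draws a vertical plot unless horizontal = TRUE is given:

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

# Two panels side by side
op <- par(mfrow = c(1, 2))
h <- boxplot(x, horizontal = TRUE, main = "Horizontal box-plot")
v <- boxplot(x, main = "Vertical box-plot")  # vertical is the default
par(op)

# Both plots are built from the same summary statistics
print(all(h$stats == v$stats))
```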

    What can we get from a basic Box-plot?

    We can obviously get the position of the location measure, the spread and where the middle 50% of observations are concentrated, in a visual display

    After all, a picture is worth a thousand words

    What else can we get?

    What can we get from a basic Box-plot? (contd.)

    Visuals from Aczel A., Sounderpandian J. Complete business statistics

    Interpretation of the basic box-plot

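    Thus the distribution of Data set 1 is right-skewed: the right whisker (90 to 135) is longer than the left whisker (15 to 26), and the median (50) lies closer to the first quartile (26) than to the third quartile (90).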
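
    The lengths behind this reading can be computed directly. The sketch below uses Data set 1; comparing the right-hand parts with the left-hand parts is an informal rule of thumb, not a formal test of skewness, and the names are chosen for illustration.

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)
s <- fivenum(x)  # minimum, Q1, median, Q3, maximum

parts <- c(
  left_whisker  = s[2] - s[1],  # Q1 - minimum
  left_box      = s[3] - s[2],  # median - Q1
  right_box     = s[4] - s[3],  # Q3 - median
  right_whisker = s[5] - s[4]   # maximum - Q3
)
print(parts)

# Right-hand parts longer than left-hand parts suggests right skew
right_longer <- parts[["right_box"]] > parts[["left_box"]] &&
  parts[["right_whisker"]] > parts[["left_whisker"]]
print(right_longer)
```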

    Basic Box-plot: Features and limitations

    Features

    A basic box-plot displays the location (median) and the intervals spanned by the first, second, third and fourth quarters of the data

    It visually shows where the middle 50% of the data is located

    It tells us whether the data is symmetric, left-skewed or right-skewed

    Limitations

    We cannot get modal information

    We cannot identify unusual observations

    It is hard to identify the tail behaviour

    Histogram and box-plot

    In a histogram, the width of the interval is fixed and the height of the vertical bar is proportional to the (relative) frequency in that interval

    In a box-plot, the relative frequency is fixed at 25% and the intervals correspond to the first, second, third and fourth 25% of the relative frequencies. More precisely:

    The left and right whiskers correspond to the first and fourth 25%

    The part of the box from the first quartile to the median corresponds to the second 25%, and

    The part of the box from the median to the third quartile corresponds to the third 25%

    Scores dataset

    Comprises the scores of 50 students in the second-semester exam of their second course in Quantitative Methods

    We also have their GPA from the first-semester exam and their scores in 3 minors of the subject

    = 60 =

    20 = 1,2,3

    = ( 10)

    Histogram and box-plot (contd.)

    [Figures: Box-plot of scores; Histogram of scores]

    Histogram and box-plot (contd.)

    A box-plot readily gives information on the 5-point summary and on skewness. However, it is not possible to get information about the mode(s)

    A histogram readily gives information about the mode(s). But it takes some effort to extract information about the 5-point summary

    Thus the two plots complement each other!
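
    The complementarity can be seen on a small simulated example. The data below are made up for illustration (two clusters of values) and are NOT the Scores data set, which is not reproduced here; the seed and the cluster parameters are arbitrary assumptions.

```r
# Made-up bimodal sample (not the Scores data set)
set.seed(1)
scores <- c(rnorm(25, mean = 20, sd = 3), rnorm(25, mean = 45, sd = 3))

op <- par(mfrow = c(1, 2))
h <- hist(scores, main = "Histogram", xlab = "Score")       # bin heights can reveal the mode(s)
b <- boxplot(scores, horizontal = TRUE, main = "Box-plot")  # shows the 5-point summary only
par(op)

print(fivenum(scores))  # read directly off the box-plot
print(h$counts)         # bin frequencies, read directly off the histogram
```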

    Effect of translation

    Consider the box-plot of a variable X. If we translate from X to Y = aX + b where a is positive, the features of the box-plot do not change

    If we translate from X to Y = aX + b where a is negative, the features of the box-plot are the same as those of the box-plot of (-X), that is, the mirror image of the box-plot of X

    We shall demonstrate this by getting the box-plots of X, (3X + 5) and (-3X + 5) for the Scores data set (X is the score in the 2nd minor)
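
    The same effect can be checked numerically. Since the Scores data set is not reproduced here, this sketch uses Data set 1 as a stand-in: for a positive multiplier each point of the five-point summary is mapped by the same rule, and for a negative multiplier the mapped summary is also reversed (minimum and maximum swap, as do the quartiles).

```r
x <- c(90, 41, 22, 135, 15, 72, 50, 26, 105)

s_x   <- fivenum(x)
s_pos <- fivenum(3 * x + 5)   # a = 3, positive
s_neg <- fivenum(-3 * x + 5)  # a = -3, negative

# a > 0: summary points are transformed, order preserved
same_shape <- isTRUE(all.equal(s_pos, 3 * s_x + 5))
# a < 0: summary points are transformed and their order is reversed
mirrored <- isTRUE(all.equal(s_neg, rev(-3 * s_x + 5)))
print(c(same_shape = same_shape, mirrored = mirrored))

op <- par(mfrow = c(1, 3))
boxplot(x, main = "X")
boxplot(3 * x + 5, main = "3X + 5")
boxplot(-3 * x + 5, main = "-3X + 5")
par(op)
```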

    Effect of translation

    [Figures: Box-plot of X; Box-plot of (3X + 5); Box-plot of (-3X + 5)]

    R-codes

    The R-code for a box-plot is boxplot(variable name)
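
    One caveat when using boxplot(): by default it draws each whisker only as far as the most extreme observation within 1.5 box lengths of the hinge, and plots anything beyond that as a separate point. Setting range = 0 extends the whiskers to the minimum and maximum, which matches the basic box-plot described in these slides. In the sketch below, the value 400 is a made-up observation appended to Data set 1 purely to make the difference visible.

```r
# Data set 1 plus one made-up extreme value (400) for illustration
x_ext <- c(15, 22, 26, 41, 50, 72, 90, 105, 135, 400)

op <- par(mfrow = c(2, 1))
default_bp <- boxplot(x_ext, horizontal = TRUE, main = "Default whiskers")
basic_bp   <- boxplot(x_ext, range = 0, horizontal = TRUE,
                      main = "Basic box-plot (range = 0)")
par(op)

print(default_bp$out)       # points drawn separately by the default plot
print(basic_bp$stats[, 1])  # whisker ends are the minimum and the maximum
```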

    Thank you