
Box plot

Figure 1. Box plot of data from the Michelson–Morley experiment

In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers.
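The five-number summary can be sketched in a few lines of Python. This is only an illustration: the function name `five_number_summary` is hypothetical, and the quartiles use the "inclusive" interpolation rule of the standard library's `statistics.quantiles`, which is one of several quartile definitions in common use and may give slightly different values from other software.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers.

    Assumption: quartiles follow the "inclusive" method of
    statistics.quantiles; other quartile conventions exist.
    """
    values = sorted(data)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]


if __name__ == "__main__":
    print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```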

Boxplots display differences between populations without making any assumptions of the underlying statistical distribution: they are non-parametric. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Boxplots can be drawn either horizontally or vertically.

Alternative forms

Figure 2. Boxplot with whiskers from minimum to maximum

Box and whisker plots are uniform in their use of the box: the bottom and top of the box are always the 25th and 75th percentile (the lower and upper quartiles, respectively), and the band near the middle of the box is always the 50th percentile (the median). But the ends of the whiskers can represent several possible alternative values, among them:

• the minimum and maximum of all the data[1] (as in Figure 2)
• the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile[2][3] (as in Figure 3)
• one standard deviation above and below the mean of the data
• the 9th percentile and the 91st percentile
• the 2nd percentile and the 98th percentile.

Any data not included between the whiskers should be plotted as an outlier with a dot, small circle, or star, but occasionally this is not done.

Some box plots include an additional character to represent the mean of the data.[2]

On some box plots a crosshatch is placed on each whisker, before the end of the whisker.
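As an illustration of the 1.5 IQR convention listed above, the following sketch finds the whisker ends and the points that would be drawn as outliers. The name `tukey_whiskers` and the parameter `k` are hypothetical, and the quartiles again assume the "inclusive" method of `statistics.quantiles`; plotting packages may compute quartiles differently and so place the whiskers slightly differently.

```python
import statistics


def tukey_whiskers(data, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    The whisker ends are the lowest and highest data values that lie
    within k * IQR of the lower and upper quartile, respectively.
    Assumption: quartiles use the "inclusive" method.
    """
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers


if __name__ == "__main__":
    print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Passing a different `k` changes how far the whiskers may extend; the other conventions in the list (minimum and maximum, fixed percentiles, standard deviations) would need their own small functions.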


Figure 3. Same boxplot with whiskers with maximum 1.5 IQR

Rarely, box plots can be presented with no whiskers at all.

Because of this variability, it is appropriate to describe the convention being used for the whiskers and outliers in the caption for the plot.

The unusual percentiles 2%, 9%, 91%, 98% are sometimes used for whisker cross-hatches and whisker ends to show the seven-number summary. If the data are normally distributed the locations of the seven marks on the box plot will be equally spaced.
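The equal-spacing claim can be checked numerically for a normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the names `seven_number_marks` and `gaps` are hypothetical helpers introduced for illustration. The gaps between neighbouring marks come out approximately equal (roughly two thirds of a standard deviation each), not exactly equal, because the percentiles are rounded to whole numbers.

```python
from statistics import NormalDist

# Percentiles that make up the seven-number summary described above.
SEVEN_NUMBER_PERCENTILES = (2, 9, 25, 50, 75, 91, 98)


def seven_number_marks(mu=0.0, sigma=1.0):
    """Locations of the seven marks for a normal N(mu, sigma^2) population."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(p / 100) for p in SEVEN_NUMBER_PERCENTILES]


def gaps(marks):
    """Distances between consecutive marks."""
    return [b - a for a, b in zip(marks, marks[1:])]


if __name__ == "__main__":
    marks = seven_number_marks()
    print([round(m, 3) for m in marks])
    print([round(g, 3) for g in gaps(marks)])
```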

Variations

Figure 4. Four box plots, with and without notches and variable width

Several variations on the traditional box plot have been described. Two of the most common are variable width box plots and notched box plots (see Figure 4).

Variable width box plots illustrate the size of each group whose data is being plotted by making the width of the box proportional to the size of the group. A popular convention is to make the box width proportional to the square root of the size of the group.[1]
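The square-root convention amounts to a one-line scaling rule. In the sketch below, `box_widths` and `max_width` are hypothetical names; the largest group is assumed to get the full width and the other boxes are scaled relative to it.

```python
import math


def box_widths(group_sizes, max_width=0.8):
    """Box widths proportional to the square root of each group's size.

    Assumption: the largest group is drawn at max_width (in plot units)
    and every other box is scaled relative to it.
    """
    roots = [math.sqrt(n) for n in group_sizes]
    largest = max(roots)
    return [max_width * r / largest for r in roots]


if __name__ == "__main__":
    # A group four times as large gets a box twice as wide.
    print(box_widths([25, 100]))
```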

Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians.[1] The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples).[1] One convention is to use ±1.58 × IQR/√n.[3]
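A notch comparison can be sketched as follows. The multiplier 1.58 is taken here as an assumption, following the convention documented for R's `boxplot.stats` (notches extending to ±1.58 × IQR/√n around the median); the function names `notch_interval` and `notches_overlap` are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`.

```python
import math
import statistics


def notch_interval(data, multiplier=1.58):
    """Return (lower, upper) notch limits around the median.

    Assumption: half-width = multiplier * IQR / sqrt(n), with the
    default multiplier 1.58 as in R's boxplot.stats convention.
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(sample_a, sample_b):
    """True if the notch intervals of the two samples overlap."""
    lo_a, hi_a = notch_interval(sample_a)
    lo_b, hi_b = notch_interval(sample_b)
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    print(notch_interval(range(1, 10)))
    print(notches_overlap(range(1, 10), range(11, 20)))
```

Non-overlapping notches are only a rough guide, as the text notes; a formal test would still be needed to establish significance.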


Visualization

Figure 5. Boxplot and a probability density function (pdf) of a Normal N(0,1σ²) Population

The boxplot is a quick way of examining one or more sets of data graphically. Boxplots may seem more primitive than a histogram or kernel density estimate but they do have some advantages. They take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data (see Figure 1 for an example). The choice of the number and width of bins can heavily influence the appearance of a histogram, and the choice of bandwidth can heavily influence the appearance of a kernel density estimate.

As looking at a statistical distribution is more intuitive than looking at a boxplot, comparing the boxplot against the probability density function (theoretical histogram) for a normal N(0,1σ²) distribution may be a useful tool for understanding the boxplot (Figure 5).
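The comparison with a normal population can also be made numerically. The sketch below assumes the 1.5 IQR whisker convention and a standard normal distribution; `normal_boxplot_landmarks` is a hypothetical helper built on `statistics.NormalDist`. It reports where the quartiles and the 1.5 IQR fences fall in units of σ, and what fraction of the population lies beyond the fences.

```python
from statistics import NormalDist


def normal_boxplot_landmarks(k=1.5):
    """Quartiles, fences and tail mass of a boxplot for N(0, 1).

    Assumption: whiskers may extend at most k * IQR beyond the quartiles.
    """
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    q1 = std_normal.inv_cdf(0.25)
    q3 = std_normal.inv_cdf(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Probability mass that falls outside the fences (both tails).
    outside = std_normal.cdf(lower_fence) + (1.0 - std_normal.cdf(upper_fence))
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "fraction_outside": outside,
    }


if __name__ == "__main__":
    for name, value in normal_boxplot_landmarks().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the box spans roughly ±0.67σ, the fences sit near ±2.7σ, and well under 1% of a normal population would be flagged as outliers.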

References
[1] Robert McGill, John W. Tukey, Wayne A. Larsen (February 1978). "Variations of Box Plots". The American Statistician 32 (1): 12–16. doi:10.2307/2683468. JSTOR 2683468.
[2] Michael Frigge, David C. Hoaglin, Boris Iglewicz (February 1989). "Some Implementations of the Boxplot". The American Statistician 43 (1): 50–54. doi:10.2307/2685173. JSTOR 2685173.
[3] "R: Box Plot Statistics" (http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/boxplot.stats.html). R manual. Retrieved 26 June 2011.

• John W. Tukey (1977). Exploratory Data Analysis. Addison-Wesley.
• Benjamini, Y. (1988). "Opening the Box of a Boxplot". The American Statistician 42 (4): 257–262. doi:10.2307/2685133. JSTOR 2685133.
• Rousseeuw, P. J.; Ruts, I.; Tukey, J. W. (1999). "The Bagplot: A Bivariate Boxplot". The American Statistician 53 (4): 382–387. doi:10.2307/2686061. JSTOR 2686061.

External links
• Visual Presentation of Data by Means of Box Plots (http://www.lcgceurope.com/lcgceurope/data/articlestandard/lcgceurope/132005/152912/article.pdf)
• On-line box plot calculator with explanations and examples (http://www.physics.csbsju.edu/stats/box2.html) (Has beeswarm example)
• Beeswarm Boxplot (http://www.r-statistics.com/2011/03/beeswarm-boxplot-and-plotting-it-with-r/) - superimposing a frequency-jittered stripchart on top of a boxplot


Article Sources and Contributors
Box plot  Source: http://en.wikipedia.org/w/index.php?oldid=466126904  Contributors: 127, 3fingeredPete, A8UDI, ALE!, AbsolutDan, Ajonlime, Allstarecho, AndrewHZ, Anooponnet, Ausinha, Baccyak4H, Berland, BlueAmethyst, Boxplot, BrettMontgomery, Caltas, Chen-Pan Liao, Cholmes75, Chrischan, Christian75, Coffee2theorems, Danharrisdanharris, David Eppstein, Dcljr, Den fjättrade ankan, DerHexer, Dougofborg, Epbr123, Evercat, Fisherjs, Fnielsen, Forerunner411, Fredrik x nilsson, Fvasconcellos, G716, GVOLTT, George Brower, Giftlite, Glane23, Glrx, Gogo Dodo, GraemeL, H.ehsaan, Hahahchris, Headbomb, Henrygb, Hgberman, Hooperbloob, Hssghj, Hu12, IRP, Ingenue Girl, Innohead, Iridescent, J.delanoy, Javidjamae, Jeepday, Jennavecia, Jhguch, Jim.belk, JoanneB, Johannes Hüsing, Johndburger, Johnjohn124, JonPeltier, Jrockley, Kareekacha, KuCM, Lambiam, Lourakis, Mack2, Melcombe, Michael Hardy, Mpt24, MrOllie, Mwtoews, Mww113, Nbarth, Nevron, Nlu, NotTheMilkman, Notreallydavid, Noyder, Ohconfucius, Oleg Alexandrov, Oliphaunt, Oxymoron83, Parametrist, Pinethicket, Piotrus, Plasmidmap, Professordreamsmasher, Qwfp, RJaguar3, RandomXYZb, Res2216firestar, RexNL, Richard001, Rl, Runningonbrains, SMC, Salvio giuliano, SamaRAWR, Sbwoodside, Schutz, Seanstock, Skagedal, Startstop123, Sunroamer, Sunwards, Super-Magician, Taganov, Talgalili, The Anome, Tkirkman, Tom Duff, Tom Lougheed, Wikidilworth, Willking1979, Wissons, ZeroOne, 220 anonymous edits

Image Sources, Licenses and Contributors
Image:Michelsonmorley-boxplot.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Michelsonmorley-boxplot.svg  License: Public Domain  Contributors: User:Mwtoews, User:Schutz
File:Box-Plot mit Min-Max Abstand.png  Source: http://en.wikipedia.org/w/index.php?title=File:Box-Plot_mit_Min-Max_Abstand.png  License: unknown  Contributors: Schlurcher
File:Box-Plot mit Interquartilsabstand.png  Source: http://en.wikipedia.org/w/index.php?title=File:Box-Plot_mit_Interquartilsabstand.png  License: unknown  Contributors: Schlurcher
File:Fourboxplots.png  Source: http://en.wikipedia.org/w/index.php?title=File:Fourboxplots.png  License: Creative Commons Zero  Contributors: User:BrettMontgomery
Image:Boxplot vs PDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Boxplot_vs_PDF.svg  License: Creative Commons Attribution-Sharealike 2.5  Contributors: Original uploader was Jhguch at en.wikipedia; Derivative work: Chen-Pan Liao (talk).

License
Creative Commons Attribution-Share Alike 3.0 Unported
//creativecommons.org/licenses/by-sa/3.0/