Boxplots (Box and Whisker Plots)

Boxplots (Box and Whisker Plots). Comparing Data Using Boxplots Each section of the boxplot represents 25% of the data. The median (50%tile) is the line


Comparing Data Using Boxplots

• Each section of the boxplot represents 25% of the data.

• The median (50th percentile) is the line in the middle of the box.

• The whiskers extend to the max and min values that aren’t outliers.

• Any outliers are dots past the end of the whiskers.


5 Number Summary for a Boxplot

– Minimum
– Q1 (Quartile 1 – 25th percentile)
– Median (50th percentile)
– Q3 (Quartile 3 – 75th percentile)
– Maximum

These are the 5 main points on the boxplot.


Finding the Median & Quartiles

• To find the median of a set of data:
– Order the data from least to greatest
– The median is the middle number
– If there is an even number of numbers and there is no one middle number, then average the two middle numbers

• To find the Quartiles:
– Q1 is the median of the lower half of the data
– Q3 is the median of the top half of the data
– Never include the actual median in the data when finding the quartiles
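
The rules above can be sketched in Python. This is only an illustration and not part of the original slides: the function names `median` and `quartiles` and the sample data are made up for the example. It follows the convention stated above of leaving the median out of both halves when the number of values is odd.

```python
def median(values):
    """Middle value of the data; average the two middle values if the count is even."""
    ordered = sorted(values)              # order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # one middle number
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the count is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    if half == 0:
        raise ValueError("need at least two values to find quartiles")
    lower = ordered[:half]                # values below the median position
    upper = ordered[-half:]               # values above the median position
    return median(lower), median(upper)


# Illustrative data (not from the slides): nine values, so the median 7 is excluded.
sample = [2, 4, 4, 5, 7, 8, 10, 12, 30]
print(median(sample))       # 7
print(quartiles(sample))    # (4.0, 11.0)
```

Other textbooks and software use slightly different quartile conventions, so results from a library function may not match this hand method exactly.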


How to Make a Boxplot

• Find the 5 Number Summary
• Scale the axis so all numbers fit appropriately
• Make the box start at Q1 and end at Q3
• Draw a line in the box marking the median
• Extend “whiskers” to the minimum and maximum
– Modified Boxplot: If there are outliers, extend whiskers to the smallest and largest values that aren’t outliers and put dots where the outliers lie


Finding Outliers

IQR or Interquartile Range = Q3 – Q1

An outlier on the low end is any point lower than Q1 – 1.5(IQR)

An outlier on the high end is any point higher than Q3 + 1.5(IQR)
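
The 1.5(IQR) rule can be written as a short Python sketch. The function names `outlier_fences` and `find_outliers` and the numbers used are illustrative assumptions, not part of the slides.

```python
def outlier_fences(q1, q3):
    """Return (low_fence, high_fence) from the 1.5 * IQR rule."""
    iqr = q3 - q1                         # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """List every value lower than the low fence or higher than the high fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Illustrative numbers (not from the slides): Q1 = 4 and Q3 = 11, so IQR = 7.
print(outlier_fences(4, 11))                                  # (-6.5, 21.5)
print(find_outliers([2, 4, 4, 5, 7, 8, 10, 12, 30], 4, 11))   # [30]
```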


Make and compare Boxplots: Poverty Rates in the Eastern US

Southern          Poverty (%)    Northern          Poverty (%)
Maryland          6.1            New Hampshire     4.3
Delaware          6.5            Wisconsin         5.6
Florida           9.0            Connecticut       6.2
North Carolina    9.0            New Jersey        6.3
Georgia           9.9            Vermont           6.3
Tennessee         10.3           Indiana           6.7
South Carolina    10.7           Massachusetts     6.7
Alabama           12.5           Michigan          7.4
Kentucky          12.7           Maine             7.8
Virginia          13.9           Ohio              7.8
West Virginia     13.9           Pennsylvania      7.8
Mississippi       16.0           Illinois          7.8
                                 Rhode Island      8.9
                                 New York          11.5


5 Number Summary & Outliers

Southern States
• Min: 6.1
• Q1: 9.0
• Median: 10.5
• Q3: 13.3
• Max: 16
• Outliers:
– < 9.0 – 1.5(13.3-9.0) = 2.55
– no poverty rates are < 2.55
– > 13.3 + 1.5(13.3-9.0) = 19.75
– no poverty rates are > 19.75
– so no outliers on either end

Northern States
• Min: 4.3
• Q1: 6.3
• Median: 7.05
• Q3: 7.8
• Max: 11.5
• Outliers:
– < 6.3 – 1.5(7.8-6.3) = 4.05
– no poverty rates are < 4.05
– > 7.8 + 1.5(7.8-6.3) = 10.05
– NY is 11.5 which is > 10.05, so NY is an outlier
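
The worked example above can be checked with a short Python sketch. The poverty rates are copied from the table in the slides; the function names `median` and `modified_boxplot` and the dictionary layout are assumptions made for this illustration. The sketch also reports where the whiskers of a modified boxplot would end.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def modified_boxplot(values):
    """Five-number summary plus whisker ends and outliers (1.5 * IQR rule)."""
    data = sorted(values)
    half = len(data) // 2                     # the median is left out of both halves
    q1, med, q3 = median(data[:half]), median(data), median(data[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "whiskers": (inside[0], inside[-1]),  # smallest and largest non-outliers
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Poverty rates (%) from the table in the slides.
southern = [6.1, 6.5, 9.0, 9.0, 9.9, 10.3, 10.7, 12.5, 12.7, 13.9, 13.9, 16.0]
northern = [4.3, 5.6, 6.2, 6.3, 6.3, 6.7, 6.7, 7.4, 7.8, 7.8, 7.8, 7.8, 8.9, 11.5]

for name, rates in (("Southern", southern), ("Northern", northern)):
    print(name, modified_boxplot(rates))
```

Up to floating-point rounding, the printed quartiles and medians should agree with the values listed above. For the Northern states the upper whisker stops at 8.9 (Rhode Island) and New York's 11.5 is reported as the only outlier.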


Boxplots in Calculator

• Enter data into List (Stat Edit)
• Choose 1st boxplot option in StatPlot
• Choose the list you used for Xlist
• Choose 1 for Freq or a 2nd list if data is stored in two lists (values in one, frequency in another)
• Zoom 9 will scale it for you to see the graph
• Press Trace and the arrow keys to see the five number summary and any outliers


Measures of Center

Mean (x̄, µ): add up the data values and divide by the number of data values

Median (M): list the data values in order and locate the middle data value; average the middle 2 if necessary

Data Set: 19, 20, 20, 21, 22

Mean = 20.4; Median = 20

Data Set: 19, 20, 20, 21, 38

Mean = 23.6; Median = 20
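
The two data sets above can be checked with Python's standard `statistics` module. This is a small sketch added for illustration; the variable names are made up.

```python
from statistics import mean, median

original = [19, 20, 20, 21, 22]
with_extreme = [19, 20, 20, 21, 38]   # same data with 22 replaced by the extreme value 38

for data in (original, with_extreme):
    print(data, "mean =", mean(data), "median =", median(data))
```

The extreme value moves the mean from 20.4 to 23.6 while the median stays at 20.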


Robust (Resistant) Statistic

• Robust or resistant: value doesn’t change dramatically when extreme values (including outliers) are added to (or taken out of) the data set.
– Median is resistant.
– Mean is NOT resistant against extreme values. Mean is pulled away from the center of the distribution toward the extreme value (“tails of graph”).


Mean or Median?


Measures of Center on Different Distribution Shapes

In each of the graphs, decide which mark represents the mean µ, median M, and mode Mo.

Remember the mean is pulled toward extreme values.

• Skewed to the left: µ, M, Mo (left to right)
• Symmetric: µ, M, Mo all equal
• Skewed to the right: Mo, M, µ (left to right)