Descriptive Statistics
Lesson 2
Ryan Safner
Department of Economics, Hood College
ECMG 212 - Statistics for Business and Economics
Spring 2017
Ryan Safner (Hood College) ECMG 212 - Lesson 2 Fall 2016 1 / 95
Lesson Plan
1 Describing Categorical Data
2 Describing Quantitative Data
Measures of Central Tendency
Measures of Locating Data
Measures of Spread
Variables and Distributions
Every variable has a distribution: the different values it takes on and how often it takes on each
We often want to display this distribution in a useful way to search for interesting patterns
Frequency Tables
A frequency table organizes data by recording counts or relative frequencies for categories
Count: the total number of occurrences for a category
Relative frequency: the proportion or percentage of a category occurring relative to all categories
\[ \text{RF (\%)} = \frac{\text{Count of Category}}{\text{Total Count}} \, (\times 100\%) \]
Example
The ads that air during the Super Bowl are very expensive: a 30-second ad during the 2013 Super Bowl cost about $4M. Polls often ask whether respondents are more interested in the game or the commercials. Here are 40 responses from one such poll:
Response               Count  Percentage
Commercials                8         20%
Game                      18         45%
Won’t Watch               12         30%
No Answer/Don’t Know       2          5%
Total                     40        100%
Responses to Survey on Super Bowl
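The percentages in this table follow directly from the relative frequency formula. A minimal Python sketch, using the counts from the table; the function name `relative_frequencies` is an illustrative choice, not something from the lesson:

```python
def relative_frequencies(counts):
    """Return each category's share of the total count, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Counts taken from the Super Bowl survey table
counts = {
    "Commercials": 8,
    "Game": 18,
    "Won't Watch": 12,
    "No Answer/Don't Know": 2,
}

for category, pct in relative_frequencies(counts).items():
    print(f"{category:<22}{counts[category]:>3}{pct:>6.0f}%")
```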
Displaying Data
Three rules of data analysis:
1 Make a graph
2 Make a graph
3 Make a graph
The Area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents
Pie Graph
A pie graph represents categories as wedges in a circle proportional to the relative frequency of that category
Wedges can show counts or relative frequencies
Bar Graph
A bar graph represents categories as bars with lengths proportional to the relative frequency of that category
Bars can show counts or relative frequencies
Categorical Data and Graphs
Pie graphs and bar graphs are only valid for categorical data!
They can only represent counts or relative frequencies of different categories
Make sure that categories do not overlap; overlapping categories are misleading
Comparing Two Variables: Contingency Tables
We can see how two categorical variables are related with a contingency table
Shows how individuals are distributed along each variable depending on the value of the other variable
Example
                     Sex
Response        Female  Male  Total
Game               198   277    475
Commercials        154    79    233
Won’t Watch        160   132    292
NA/Don’t Know        4     4      8
Total              516   492   1008
Each cell in the table gives the count for one combination of values of the two variables
The marginal distribution of a variable is the distribution of the total counts of that variable’s values alone
Focuses on the margins of the table (the Total row and the Total column)
Marginal distribution of Response
Marginal distribution of Sex
Conditional distribution of a variable is the distribution of values a variable takes conditional on another variable taking on a specific value
Conditional distribution of responses for females
Conditional distribution of sex for non-watchers
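Marginal and conditional distributions can both be read off the contingency table by summing or dividing the right cells. The sketch below assumes the table is stored as a nested dictionary `table[response][sex]`; that layout and the function names are illustrative choices, not part of the lesson.

```python
# Counts from the Super Bowl contingency table: table[response][sex]
table = {
    "Game":          {"Female": 198, "Male": 277},
    "Commercials":   {"Female": 154, "Male": 79},
    "Won't Watch":   {"Female": 160, "Male": 132},
    "NA/Don't Know": {"Female": 4,   "Male": 4},
}

def marginal_response(table):
    """Row totals: the marginal distribution of Response (as counts)."""
    return {resp: sum(by_sex.values()) for resp, by_sex in table.items()}

def marginal_sex(table):
    """Column totals: the marginal distribution of Sex (as counts)."""
    totals = {}
    for by_sex in table.values():
        for sex, count in by_sex.items():
            totals[sex] = totals.get(sex, 0) + count
    return totals

def conditional_response_given_sex(table, sex):
    """Distribution of Response among respondents of one sex (proportions)."""
    column_total = marginal_sex(table)[sex]
    return {resp: by_sex[sex] / column_total for resp, by_sex in table.items()}

def conditional_sex_given_response(table, response):
    """Distribution of Sex among respondents giving one response (proportions)."""
    row = table[response]
    row_total = sum(row.values())
    return {sex: count / row_total for sex, count in row.items()}

print(marginal_response(table))
print(marginal_sex(table))
print(conditional_response_given_sex(table, "Female"))
print(conditional_sex_given_response(table, "Won't Watch"))
```

Each conditional distribution sums to 1 because it divides by the total of the row or column being conditioned on.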
277 men plan to watch the game. What percentage is this?
Column percent vs. row percent vs. total percent
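The answer depends on which total the 277 is divided by. A short Python sketch of the three options, using the totals from the contingency table (the variable names are illustrative):

```python
cell = 277          # men who plan to watch the game
row_total = 475     # everyone who plans to watch the game
column_total = 492  # all men
grand_total = 1008  # all respondents

row_percent = 100 * cell / row_total        # share of game-watchers who are men
column_percent = 100 * cell / column_total  # share of men who will watch the game
total_percent = 100 * cell / grand_total    # share of all respondents

print(f"Row percent:    {row_percent:.1f}%")     # 58.3%
print(f"Column percent: {column_percent:.1f}%")  # 56.3%
print(f"Total percent:  {total_percent:.1f}%")   # 27.5%
```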
What percent of respondents are men who will watch the game?
What percent of women plan to watch for the commercials?
What percent of those who won’t watch are men?
Comparing Two Variables
Is there an association between the response to the survey and the respondent’s sex, or are the two independent?
Find the conditional distribution of responses by sex, and make a graph
A clustered bar chart allows us to compare the two distributions side by side
A segmented bar chart shows the responses by sex
Simpson’s Paradox
Caution
Comparing percentages across different values or groups can lead to misleading results (Simpson’s Paradox)
Example
Suppose it’s the last inning of a baseball game, your team is down by 1 with the bases loaded and 2 outs. The pitcher is due up, so you’ll be sending in a pinch-hitter. There are 2 batters available on the bench. Whom should you send in to bat?
Player  Overall     vs LHP     vs RHP
A       33 for 103  28 for 81  5 for 22
B       45 for 151  12 for 32  33 for 119
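Working out the batting averages shows the paradox: the overall comparison and the split comparisons point in opposite directions. A minimal Python sketch, with the hits and at-bats taken from the table (the data layout and helper name are illustrative):

```python
# (hits, at-bats) from the table above
records = {
    "A": {"Overall": (33, 103), "vs LHP": (28, 81), "vs RHP": (5, 22)},
    "B": {"Overall": (45, 151), "vs LHP": (12, 32), "vs RHP": (33, 119)},
}

def batting_average(hits, at_bats):
    """Proportion of at-bats that ended in a hit."""
    return hits / at_bats

averages = {
    player: {split: batting_average(*record) for split, record in splits.items()}
    for player, splits in records.items()
}

for split in ("Overall", "vs LHP", "vs RHP"):
    a, b = averages["A"][split], averages["B"][split]
    better = "A" if a > b else "B"
    print(f"{split:<8} A: {a:.3f}  B: {b:.3f}  higher: {better}")
```

Player A has the higher average overall (.320 vs .298), yet Player B has the higher average against both left-handed pitchers (.375 vs .346) and right-handed pitchers (.277 vs .227). The reversal arises because most of A's at-bats came against left-handers, where both players hit better.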
Example
Two companies have labor and management classifications of employees. Company A’s laborers have a higher average salary than company B’s, as do Company A’s managers. But overall, company B pays a higher average salary. How can that be? And which is the better way to compare earning potential at the two companies?
Cautions
Open Letter to Kansas School Board
Describing Quantitative Data
Suppose instead we have quantitative data
Example
A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Stem-and-Leaf Plots
A stem-and-leaf plot is a quick way of organizing and displaying data (best for small datasets)
Divide each observation into a stem and a leaf, with the leaf containing the final significant digit
e.g. For 53, stem 5, leaf 3
e.g. For 413, stem 41, leaf 3
Example
A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
 0 | 0
 1 |
 2 |
 3 |
 4 |
 5 |
 6 | 2 6
 7 | 1 1 4 6 9
 8 | 3 6 8
 9 | 3 5
10 |
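The stem-and-leaf construction can be sketched in a few lines of Python. This version assumes non-negative whole numbers and takes the last digit as the leaf; the function name `stem_and_leaf` is illustrative. Unlike the plot above, it stops at the largest observed stem rather than adding an empty stem for the maximum possible score.

```python
def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    The leaf is the final digit; the stem is everything before it.
    Empty stems between the smallest and largest stem are kept.
    """
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print("\n".join(stem_and_leaf(quiz)))
```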
Example
A sample of residents of Frederick report the distances from their home to their local supermarket (in miles): {0.5, 1.2, 1.4, 1.4, 1.5, 2.2, 3.7, 4.2, 4.4, 4.4, 8.2}
Create a stem-and-leaf plot.
We can quickly compare two distributions with a side-by-side stem-and-leaf plot
Example
The stock prices of Apple over 10 days are: {320, 340, 333, 321, 332, 333, 351, 329, 301, 339}
The stock prices of Microsoft over 10 days are: {290, 292, 302, 310, 303, 299, 301, 319, 319, 307}
  Apple | Stem | Microsoft
        |  29  | 0 2 9
      1 |  30  | 1 2 3 7
        |  31  | 0 9 9
  0 1 9 |  32  |
2 3 3 9 |  33  |
      0 |  34  |
      1 |  35  |
Histograms
A more visually-appealing way to present this data is a histogram, the quantitative analogue to a bar graph
We divide up the data into bins of a certain size, and count up the number of values falling within those bins, representing these as bars
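The binning step can be sketched in Python as follows. The bin width of 10 and the convention that each bin includes its left edge but not its right edge are assumptions made for illustration; the lesson's own figures may use different bins.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values per bin [lo, lo + bin_width), keyed by each bin's left edge."""
    counts = {}
    for v in values:
        lo = start + ((v - start) // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
bin_width = 10

# Print a simple text histogram: one '#' per observation in the bin
for lo, count in histogram_counts(quiz, bin_width).items():
    print(f"[{lo:>2}, {lo + bin_width:>3}) {'#' * count}")
```

Dividing each count by the number of observations would give the heights for a relative frequency histogram.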
Example
A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
[Figure: histogram of quiz grades. x-axis: Quiz Grades (0 to 100); y-axis: No. of Students]
Note: Excel essentially plots a bar graph by first turning quantitative data into categorical data
We can also make a relative frequency (percentage) histogram
A stem-and-leaf plot is shaped like a sideways histogram
Quantitative Distributions: Shape
For distributions of quantitative data, we are often interested in their shape, particularly:
Modes
Symmetry
Skewness
Center
Spread
Outliers
These have formal definitions using probability theory; for now, focus on how a histogram “looks”
Mode
The mode of a variable is its most frequent value
A variable can have more than one mode
Example
A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
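Finding the mode amounts to counting how often each value occurs. A small Python sketch using the standard library's `Counter`; the function name `modes` is illustrative, and it returns a list because a variable can have more than one mode:

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(modes(quiz))             # [71]
print(modes([1, 1, 2, 3, 3]))  # [1, 3]: two modes
```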
Looking at the distribution (histogram), the modes are the “peaks” of the distribution
May be unimodal, bimodal, trimodal, etc.
A distribution that does not have any clear mode is uniform
Symmetry
A distribution is symmetric if it looks roughly the same on either side of the “center”
Skewness
The thinner ends of a distribution (far left & far right) are called the tails of the distribution
If one tail stretches farther than the other, the distribution is said to be skewed in the direction of the longer tail
Outliers
An extreme value that does not appear to be part of the general pattern of a distribution is an outlier
Outliers can strongly affect descriptive statistics about a dataset
Outliers can be the most informative part of the data
Outliers could be the result of errors
Outliers should always be discussed in presentations about data
Arithmetic Mean
The natural measure of the center of a population’s distribution is its “average” or arithmetic mean (µ)
\[ \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i \]
For N values of variable x, “mu” is the sum of all individual x values (x_i) from 1 to N, divided by N, the number of values
When we are dealing with a sample, we compute the sample mean (X̄)
\[ \bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
For n values of variable x, “x-bar” is the sum of all individual x values (x_i) from 1 to n, divided by n, the number of values
{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
\[ \text{Mean} = \frac{0+62+66+71+71+74+76+79+83+86+88+93+95}{13} = \frac{944}{13} \approx 72.62 \]
Note the mean need not be an actual value of the data!
{62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
If we drop the outlier (0):
\[ \text{Mean} = \frac{62+66+71+71+74+76+79+83+86+88+93+95}{12} = \frac{944}{12} \approx 78.67 \]
The mean is not robust to outliers!
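The effect of the outlier on the mean is easy to check numerically. A minimal Python sketch using the quiz scores from the slides (the helper name `mean` is illustrative):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
without_outlier = [x for x in quiz if x != 0]

print(round(mean(quiz), 2))             # 72.62
print(round(mean(without_outlier), 2))  # 78.67
```

Dropping a single score moves the mean by about six points, which is what "not robust" means here.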
Median
{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
The median is the midpoint of the distribution
50% to the left of the median, 50% to the right of the median
Arrange values of data in numerical order
For odd n: median is middle observation
For even n: median is average of two middle observations
The median is robust to outliers!
Replacing the outlier 0 with 62 leaves the median unchanged at 76:
{62, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
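The odd and even rules for the median translate directly into code. A minimal Python sketch (the function name and the `adjusted` list are illustrative; the data are the quiz scores from the slides):

```python
def median(values):
    """Middle observation for odd n; average of the two middle ones for even n."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
adjusted = [62] + quiz[1:]   # outlier 0 replaced by 62

print(median(quiz))      # 76
print(median(adjusted))  # 76
print(median(quiz[1:]))  # 77.5 (12 values: average of 76 and 79)
```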
Mean, Median, & Skewness
{1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}
For a symmetric distribution, mean = median
Mean: 64/16 = 4
Median: 4
{1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7}
For a distribution skewed to the left, mean < median
Mean: 60/13 ≈ 4.6
Median: 5
{1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7}
For a distribution skewed to the right, mean > median
Mean: 44/13 ≈ 3.4
Median: 3
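The relationship between mean, median, and skew can be verified for the three example datasets with Python's standard `statistics` module. The dictionary labels are illustrative; the data are copied from the slides.

```python
from statistics import mean, median

datasets = {
    "symmetric":    [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7],
    "left-skewed":  [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7],
    "right-skewed": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7],
}

for name, data in datasets.items():
    m, med = mean(data), median(data)
    relation = "=" if m == med else ("<" if m < med else ">")
    print(f"{name:<13} mean {m:.1f} {relation} median {med}")
```

The mean is pulled toward the longer tail, so it falls below the median for left skew and above it for right skew.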
Example
A sample of the per capita consumption of gasoline (in gallons) for 10 U.S. states in the year 2017 is given below: {556, 560, 537, 409, 530, 485, 521, 486, 504, 434}
1 Find the mean
2 Find the median
3 Is this distribution symmetric, skewed to the left, or skewed to the right?
Example
A sample of the GDP growth rates for 11 developed countries in the year 2017 is given below: {0.05, 0.03, 0.02, 0.01, 0.00, 0.09, 0.11, 0.02, 0.03, 0.04, 0.01}
1 Find the mean
2 Find the median
3 Is this distribution symmetric, skewed to the left, or skewed to the right?
Percentiles
We often care about specific values in the distribution and how they relate to the rest of the distribution
A helpful measure of a data value’s location is its percentile, which measures the percentage of all data that is less than (or equal to) that value
To calculate the kth percentile, after ordering the data in numerical order, calculate:
\[ i = \frac{k}{100} (n + 1) \]
Where i is the index (rank or position) of the value & n is the total number of observations
If i comes out to a whole number, the answer is the value at that position
If i is not an integer, round up and round down, and take the average of the values at those positions in the data
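The index rule above can be sketched in Python. This version assumes k is a whole number so the test for "i is a whole number" can use exact integer arithmetic; the function name `kth_percentile` and the error for out-of-range positions are illustrative choices. The SAT scores are the sample used in the lesson's example.

```python
def kth_percentile(data, k):
    """Value at the kth percentile using the index rule i = k/100 * (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    # divmod keeps the arithmetic exact: i = whole + remainder / 100
    whole, remainder = divmod(k * (n + 1), 100)
    if whole < 1 or whole > n or (whole == n and remainder):
        raise ValueError("index falls outside the data for this k and n")
    if remainder == 0:
        return xs[whole - 1]                 # positions are 1-based
    return (xs[whole - 1] + xs[whole]) / 2   # average of positions below and above i

sat = [570, 575, 580, 590, 620, 635, 640, 645, 650, 650,
       650, 670, 675, 675, 680, 710, 720, 745, 770, 780]

print(kth_percentile(sat, 20))  # i = 4.2, so average positions 4 and 5: 605.0
print(kth_percentile(sat, 84))  # i = 17.64, so average positions 17 and 18: 732.5
```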
Example
The following are a sample of 20 SAT Math scores: {570, 575, 580, 590, 620, 635, 640, 645, 650, 650, 650, 670, 675, 675, 680, 710, 720, 745, 770, 780}
1 Find the 20th percentile
2 Find the 84th percentile
To find the percentile of a particular data value, after ordering the data in numerical order, calculate:
\[ \frac{x + 0.5y}{n} \times 100 \] then round to the nearest integer
x is the number of data values counting from the first up to the value right before the chosen value
y is the number of data values equal to the chosen value
n is the total number of data values
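A minimal Python sketch of this formula follows. The function name `percentile_of` is illustrative. The slide only says to round to the nearest integer, so rounding halves upward is an assumption here, and exact fractions are used so that values like 82.5 are not disturbed by floating-point error.

```python
from fractions import Fraction
from math import floor

def percentile_of(data, value):
    """Percentile rank of `value`: (x + 0.5y) / n * 100, rounded to an integer."""
    n = len(data)
    x = sum(1 for v in data if v < value)    # values strictly below the chosen value
    y = sum(1 for v in data if v == value)   # values equal to the chosen value
    exact = Fraction(2 * x + y, 2 * n) * 100  # same as (x + 0.5y) / n * 100
    return floor(exact + Fraction(1, 2))      # round half up (an assumption)

sat = [570, 575, 580, 590, 620, 635, 640, 645, 650, 650,
       650, 670, 675, 675, 680, 710, 720, 745, 770, 780]

print(percentile_of(sat, 675))  # x = 12, y = 2: (12 + 1) / 20 * 100 = 65
```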
Example
The following are a sample of 20 SAT Math scores: {570, 575, 580, 590, 620, 635, 640, 645, 650, 650, 650, 670, 675, 675, 680, 710, 720, 745, 770, 780}
1 What percentile is a score of 645?
2 What percentile is a score of 675?
3 What percentile is a score of 720?
Quartiles
We can divide up a distribution into four equal quartiles, each comprising a quarter (25%) of the data:
Quartile  Cumulative % of data
1         25%
2         50%
3         75%
4         100%
The 2nd quartile (Q2) is the median
The 1st quartile (Q1) is the median of all the data beneath the median
The 3rd quartile (Q3) is the median of all the data above the median
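The "median of each half" rule can be sketched in Python as below. For an odd number of observations this sketch leaves the median itself out of both halves, which is one reading of "beneath" and "above" the median; other software uses slightly different conventions. The function name `quartiles` is illustrative, and the data are the quiz scores used earlier in the lesson.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # data beneath the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # data above the median
    return median(lower), median(xs), median(upper)

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
q1, q2, q3 = quartiles(quiz)
print(q1, q2, q3)  # 68.5 76 87.0
```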
Measures of Spread
The more variation in the data, the less helpful a measure of centraltendency will tell us
So in addition to measuring the center, we also want to measure thespread
The simplest way is looking at the range, or the difference betweenthe extremes:
Range = max −min
Ryan Safner (Hood College) ECMG 212 - Lesson 2 Fall 2016 72 / 95
Measures of Spread
The more variation in the data, the less helpful a measure of centraltendency will tell us
So in addition to measuring the center, we also want to measure thespread
The simplest way is looking at the range, or the difference betweenthe extremes:
Range = max −min
Ryan Safner (Hood College) ECMG 212 - Lesson 2 Fall 2016 72 / 95
Measures of Spread
The more variation in the data, the less a measure of central tendency tells us
So in addition to measuring the center, we also want to measure the spread
The simplest way is to look at the range, the difference between the extremes:
Range = max − min
Range
Example
{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Range = 95 − 0 = 95
Example
{62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Range = 95 − 62 = 33
Note that the range is not robust to outliers: dropping the single score of 0 cuts it from 95 to 33
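The two examples can be reproduced with a minimal Python sketch. The helper name `value_range` is hypothetical. It simply subtracts the smallest observation from the largest.

```python
def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)


with_zero = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
without_zero = [x for x in with_zero if x != 0]  # same scores, outlier removed

print(value_range(with_zero))     # 95
print(value_range(without_zero))  # 33
```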
Interquartile Range
One helpful measure of spread is the interquartile range (IQR), the width of the middle 50% of the data:
IQR = Q3 − Q1
Example
{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Median = 76
Q1 = 71
Q3 = 86
IQR = 86 − 71 = 15
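As a rough check, the IQR can be computed with Python's standard library. The sketch assumes the `"inclusive"` method of `statistics.quantiles`, which is only one of several quartile conventions. It is chosen here because it reproduces Q1 = 71 and Q3 = 86 for this data set. The helper name `iqr` is hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range, Q3 - Q1, using the 'inclusive' quartile method."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(iqr(scores))                         # 15.0
print(iqr([x for x in scores if x != 0]))  # 15.5
```

Removing the score of 0 moves the IQR only from 15 to 15.5, whereas the range of the same data falls from 95 to 33.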
Five-Number Summary
Once we know the values of the quartiles, we can construct a five-number summary of a distribution, including:
1 Minimum
2 Q1
3 Median
4 Q3
5 Maximum
Five-Number Summary
Example
{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Min Q1 Median Q3 Max
0 71 76 86 95
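The same summary can be assembled programmatically. This is a minimal sketch, assuming the `"inclusive"` quartile method of Python's `statistics.quantiles`. The function name `five_number_summary` is hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles use the 'inclusive' method."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


quiz_1 = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(five_number_summary(quiz_1))  # (0, 71.0, 76.0, 86.0, 95)
```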
Boxplots
The graphical way to present the five-number summary is a boxplot (or a “box-and-whisker plot”)
The length of the box is the IQR (from Q1 to Q3)
The line within the box is the median
The “whiskers” identify data within 1.5 × IQR of the box
Points beyond the whiskers are outliers
[Figure: annotated boxplot labeling Q3, Q1, Median, Q3 + 1.5 × IQR, Q1 − 1.5 × IQR, and an Outlier]
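The 1.5 × IQR rule can be sketched as follows, using the Quiz 1 scores from the earlier slides. The function names `boxplot_fences` and `outliers` are hypothetical, and the quartiles again assume the `"inclusive"` method of `statistics.quantiles`. A different quartile convention would shift the fences slightly.

```python
from statistics import quantiles


def boxplot_fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def outliers(values):
    """Observations that fall outside the fences."""
    low, high = boxplot_fences(values)
    return [x for x in values if x < low or x > high]


quiz_1 = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(boxplot_fences(quiz_1))  # (48.5, 108.5)
print(outliers(quiz_1))        # [0]
```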
Boxplots
Quiz 1: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Quiz 2: {50, 62, 72, 73, 79, 81, 82, 82, 86, 90, 94, 98, 99}
Quiz 1
Min Q1 Median Q3 Max
0 71 76 86 95
Quiz 2
Min Q1 Median Q3 Max
50 73 82 90 99
Boxplots
Quiz 1: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Quiz 2: {50, 62, 72, 73, 79, 81, 82, 82, 86, 90, 94, 98, 99}
[Figure: side-by-side boxplots of Quiz 1 and Quiz 2; x-axis: Quiz, y-axis: Scores (0 to 100)]
Boxplots are great for quickly comparing multiple datasets
Boxplots
[Figure: boxplots of daily AIG closing stock price]
Boxplots
Alternate way of constructing a boxplot: extend “whiskers” from Q1 to the minimum and from Q3 to the maximum
But this is a less rigorous way of discovering outliers
Your textbook uses this method, as does MS Excel
Deviations
Each observation deviates from the mean of the data:
$$\text{deviation} = x_i - \mu$$
There are as many deviations as there are data points (n)
We can measure the average or standard deviation from the mean
Variance
The population variance (σ²) of a population distribution measures the average of the squared deviations from the population mean
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Why do we square deviations?
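One way to see the answer is to try averaging the raw deviations: the positive and negative ones cancel, so their sum is always zero and says nothing about spread. Squaring makes every term non-negative. The sketch below illustrates this on a small hypothetical data set, treated as a complete population. The function names are illustrative only.

```python
def deviations(values):
    """Deviation of each observation from the mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    return sum(d ** 2 for d in deviations(values)) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population, mean = 5

print(sum(deviations(data)))      # 0.0 (positive and negative deviations cancel)
print(population_variance(data))  # 4.0
```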
Standard Deviation
Take the square root of the variance to get the population standard deviation (σ), roughly the average deviation from the mean (in the same units as x)
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Variance
The sample variance (s²) of a sample distribution measures the average of the squared deviations from the sample mean
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Why divide by n − 1?
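One standard answer is that deviations are measured from the sample mean rather than the true population mean, so dividing by n tends to underestimate the population variance, and dividing by n − 1 corrects this on average. The sketch below checks that claim by brute force on a small hypothetical population. It lists every possible sample of size 2 (drawn with replacement) and averages both versions of the variance across all samples. `pvariance` and `variance` are the divide-by-n and divide-by-(n − 1) functions from Python's `statistics` module.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [-10, 0, 10, 20, 30]   # hypothetical population
sigma2 = pvariance(population)      # true population variance

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_divide_by_n = mean(pvariance(s) for s in samples)         # divides by n
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)  # divides by n - 1

print(sigma2, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Averaged over all 25 samples, the n − 1 version equals the population variance of 200, while the divide-by-n version averages only 100.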
Standard Deviation
Take the square root of the variance to get the sample standard deviation (s), roughly the average deviation from the mean (in the same units as x)
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Descriptive Statistics: Population vs. Sample
Population Parameters
Population Size: N
Mean: µ
Variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Sample Statistics
Sample Size: n
Mean: x̄
Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation: $s = \sqrt{s^2}$
Variance & Standard Deviation
Example
{-10, 0, 10, 20, 30}
1 Find the mean: $\bar{x} = \frac{-10 + 0 + 10 + 20 + 30}{5} = 10$
2 Find deviations from mean and square them:
x_i    x_i − x̄    (x_i − x̄)^2
-10    -20         400
0      -10         100
10     0           0
20     10          100
30     20          400
∑      0           1000
3 Add them all up
400 + 100 + 0 + 100 + 400 = 1000
Variance & Standard Deviation
Example
{-10, 0, 10, 20, 30}
4 Divide by n − 1 (for the sample variance): $s^2 = \frac{1000}{4} = 250$
5 Take the square root (for the standard deviation): $s = \sqrt{250} \approx 15.8$
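The whole calculation for this example can be traced in a few lines of Python. This is only a sketch of the hand computation, with illustrative variable names.

```python
from math import sqrt

data = [-10, 0, 10, 20, 30]

mean = sum(data) / len(data)                        # find the mean
squared_devs = [(x - mean) ** 2 for x in data]      # square each deviation
sum_of_squares = sum(squared_devs)                  # add them all up
sample_variance = sum_of_squares / (len(data) - 1)  # divide by n - 1
sample_sd = sqrt(sample_variance)                   # take the square root

print(mean, sum_of_squares, sample_variance, round(sample_sd, 2))
# 10.0 1000.0 250.0 15.81
```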
Variance & Standard Deviation
Example
{8, 9, 10, 11, 12}
1 Find the mean
2 Find the variance
3 Find the standard deviation
Standardizing Variables
Sometimes we want to know how far a value is from its mean
We standardize a variable, or calculate its z-score:
$$Z = \frac{x - \bar{x}}{s}$$
Z is the number of standard deviations a value is away from its mean(above +, below −)
Standardizing Variables
Example
A real estate analyst finds, from data on 350 recent sales, that the average price was $175,000 with a standard deviation of $55,000. The size of the houses averaged 2,100 sq. ft. with a standard deviation of 650 sq. ft.
Which is more unusual: a house in this town that costs $340,000, or a 5,000 sq. ft. house?
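One way to approach this question is to put both quantities on the same scale by converting each to a z-score. The sketch below does this with a hypothetical helper, `z_score`, which takes the value, the mean, and the standard deviation as plain numbers.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


z_price = z_score(340_000, 175_000, 55_000)  # price of the house, in dollars
z_size = z_score(5_000, 2_100, 650)          # size of the house, in sq. ft.

print(round(z_price, 2), round(z_size, 2))   # 3.0 4.46
```

The larger z-score belongs to the 5,000 sq. ft. house, so by this measure its size is more unusual than a $340,000 price.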
Descriptive Statistics
Most software programs can easily compute descriptive statistics (e.g., mean, median, quartiles, standard deviation) for us
MS Excel: Descriptive statistics in Data Analysis pack
TI-83+ calculators:
1 Enter data in L1: STAT → 1.Edit → input data values in column
2 CLEAR → STAT → CALC → 1.1-Var Stats, ENTER → 2nd L1 ENTER