Chapter 3 Describing Data Using Numerical Measures

  • Slide 1
  • Chapter 3 Describing Data Using Numerical Measures
  • Slide 2
  • Chapter 3 - Chapter Outcomes: After studying the material in this chapter, you should be able to: compute the mean, median, and mode for a set of data and understand what these values represent; compute the range, variance, and standard deviation and know what these values mean; construct a box and whiskers graph and interpret it.
  • Slide 3
  • Chapter 3 - Chapter Outcomes (continued): After studying the material in this chapter, you should be able to: compute the coefficient of variation and z-scores and understand how they are applied in decision-making situations; use numerical measures along with graphs, charts, and tables to describe data effectively.
  • Slide 4
  • Parameters and Statistics: A parameter is a measure computed from the entire population. As long as the population does not change, the value of the parameter will not change.
  • Slide 5
  • Parameters and Statistics: A statistic is a measure computed from a sample that has been selected from the population. The value of the statistic will depend on which sample is selected.
  • Slide 6
  • Mean: The mean is a numerical measure of the center of a set of quantitative measures, computed by dividing the sum of the values by the number of values in the data set.
  • Slide 7
  • Population Mean: \(\mu = \frac{\sum_{i=1}^{N} x_i}{N}\), where: \(\mu\) = population mean (mu); \(N\) = number of data values; \(x_i\) = \(i\)th individual value of variable \(x\)
  • Slide 8
  • Population Mean Example 3-1 Table 3-1: Foster City Hotel Data
  • Slide 9
  • Population Mean Example 3-1 The population mean for the number of rooms rented is computed as follows:
  • Slide 10
  • Sample Mean: \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\), where: \(\bar{x}\) = sample mean (pronounced x-bar); \(n\) = sample size; \(x_i\) = \(i\)th individual value of variable \(x\)
  • Slide 11
  • Sample Mean: Housing Prices Example. \(\{x_i\}\) = {house prices} = {$144,000; 98,000; 204,000; 177,000; 155,000; 316,000; 100,000}
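The result of this example did not survive in the transcript. As a minimal sketch, the sample mean of the seven listed prices can be computed directly from the definition; the function name `sample_mean` is only illustrative.

```python
# House prices from the slide, in dollars.
house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]


def sample_mean(values):
    """Sum of the values divided by the number of values (x-bar)."""
    return sum(values) / len(values)


x_bar = sample_mean(house_prices)
print(f"sum = {sum(house_prices):,}, n = {len(house_prices)}, x-bar = {x_bar:,.2f}")
# prints: sum = 1,194,000, n = 7, x-bar = 170,571.43
```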
  • Slide 12
  • Median: The median is the center value that divides data that have been arranged in numerical order (i.e., an ordered array) into two halves.
  • Slide 13
  • Median: Housing Prices Example. \(\{x_i\}\) = {house prices} = {$144,000; 98,000; 204,000; 177,000; 155,000; 316,000; 100,000}. Ordered array: $98,000; 100,000; 144,000; 155,000; 177,000; 204,000; 316,000. The middle value is the median: Median = 155,000
  • Slide 14
  • Median: Another Housing Prices Example. \(\{x_i\}\) = {house prices} = {$144,000; 98,000; 204,000; 177,000; 155,000; 316,000; 100,000; 177,000; 177,000; 170,000}. Ordered array: $98,000; 100,000; 144,000; 155,000; 170,000; 177,000; 177,000; 177,000; 204,000; 316,000. With an even number of values, the median is the average of the two middle values: Median = (170,000 + 177,000)/2 = 173,500
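The two housing examples differ only in whether the number of values is odd or even. The sketch below handles both cases; the helper name `median_of` is an illustrative choice, not something taken from the slides.

```python
def median_of(values):
    """Return the middle value of the ordered array.

    With an odd count this is the single middle value; with an even
    count it is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


seven_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]
ten_prices = seven_prices + [177_000, 177_000, 170_000]

print(median_of(seven_prices))  # 155000
print(median_of(ten_prices))    # 173500.0
```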
  • Slide 15
  • Skewed Data: Right-skewed data: Data are right-skewed if the mean for the data is larger than the median. Left-skewed data: Data are left-skewed if the mean for the data is smaller than the median.
  • Slide 16
  • Skewed Data (Figure 3-3): three panels showing the relative positions of the mean and median: a) Right-Skewed (median below mean); b) Left-Skewed (mean below median); c) Symmetric (mean = median)
  • Slide 17
  • Mode: The mode is the value in a data set that occurs most frequently. A data set may have more than one mode if two or more values tie for the highest frequency. A data set might not have a mode at all if no value occurs more than one time.
  • Slide 18
  • Mode: Housing Prices Example. \(\{x_i\}\) = {house prices} = {$144,000; 98,000; 204,000; 177,000; 155,000; 316,000; 100,000; 177,000; 177,000; 170,000}. Data array: $98,000; 100,000; 144,000; 155,000; 170,000; 177,000; 177,000; 177,000; 204,000; 316,000. Mean = 1,718,000/10 = 171,800; Median = 173,500; Mode = 177,000
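The mode rules stated earlier (possibly several modes, possibly none) can be sketched with a frequency count. The function name `modes` and the choice to return an empty list when no value repeats are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency.

    Returns an empty list when no value occurs more than once,
    matching the "no mode" case described on the slide.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

mean_price = sum(ten_prices) / len(ten_prices)
print(mean_price)         # 171800.0
print(modes(ten_prices))  # [177000]
```

Here the mean (171,800) is slightly below the median (173,500), even though the largest price sits far above the rest.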
  • Slide 19
  • Percentiles: The pth percentile in a data array is a value that divides the data into two parts. The lower segment contains at least p% and the upper segment contains at least (100 - p)% of the data. The median is the 50th percentile.
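The definition above can be turned into a check on a candidate value. This sketch assumes that the "lower segment" means values at or below the candidate and the "upper segment" means values at or above it; the slide does not spell this out, and the function name `satisfies_percentile` is hypothetical.

```python
def satisfies_percentile(values, candidate, p):
    """Check whether `candidate` meets the pth-percentile definition.

    Assumption: the lower segment counts values <= candidate and the
    upper segment counts values >= candidate.
    """
    n = len(values)
    at_or_below = sum(1 for v in values if v <= candidate)
    at_or_above = sum(1 for v in values if v >= candidate)
    # Integer comparisons avoid floating-point rounding in the percentages.
    return at_or_below * 100 >= p * n and at_or_above * 100 >= (100 - p) * n


ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

print(satisfies_percentile(ten_prices, 173_500, 50))  # True: the median
print(satisfies_percentile(ten_prices, 98_000, 50))   # False: the minimum
```

Under this reading more than one value can satisfy the definition for the same p, which is why textbooks and software add a specific rule for choosing a single percentile value.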
  • Slide 20
  • Quartiles: Quartiles in a data array are those values that divide the data set into four equal-sized groups. The median corresponds to the second quartile.
  • Slide 21
  • Measures of Variation: A set of data exhibits variation if the data values are not all the same.
  • Slide 22
  • Range: The range is a measure of variation that is computed by finding the difference between the maximum and minimum values in the data set. R = Maximum Value - Minimum Value
  • Slide 23
  • Interquartile Range: The interquartile range is a measure of variation that is determined by computing the difference between the first and third quartiles. Interquartile Range = Third Quartile - First Quartile
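As a rough illustration, the range and interquartile range can be computed for the ten house prices used earlier. The quartiles here come from Python's `statistics.quantiles`, which applies one particular interpolation convention; the textbook's own quartile rule is not shown in this transcript, so Q1 and Q3 may differ slightly from its values.

```python
import statistics

ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

# Range: maximum value minus minimum value.
data_range = max(ten_prices) - min(ten_prices)

# Quartiles: the three cut points that split the data into four groups.
# The default method is "exclusive"; other tools may interpolate differently.
q1, q2, q3 = statistics.quantiles(ten_prices, n=4)

# Interquartile range: third quartile minus first quartile.
interquartile_range = q3 - q1

print(f"range = {data_range:,}")
print(f"Q1 = {q1:,.0f}, Q2 = {q2:,.0f}, Q3 = {q3:,.0f}")
print(f"IQR = {interquartile_range:,.0f}")
```

The second quartile equals the median (173,500), as the quartile definition requires.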
  • Slide 24
  • Variance & Standard Deviation: The population variance is the average of the squared distances of the data values from the population mean. The standard deviation is the positive square root of the variance.
  • Slide 25
  • Population Variance: \(\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\), where: \(\mu\) = population mean; \(N\) = population size; \(\sigma^2\) = population variance (sigma squared)
  • Slide 26
  • Population Variance (Bryce Lumber Example)
  • Slide 27
  • Population Standard Deviation (Bryce Lumber Example)
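The Bryce Lumber figures are not reproduced in this transcript, so the sketch below uses a small hypothetical population to show the calculation: average the squared distances from the population mean, then take the positive square root. The data values and function names are assumptions for illustration only.

```python
import math
import statistics

# Hypothetical population values (NOT the Bryce Lumber data).
population = [2, 4, 4, 4, 5, 5, 7, 9]


def population_variance(values):
    """Average of the squared distances from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


print(population_variance(population))  # 4.0
print(population_std_dev(population))   # 2.0

# Cross-check against the standard library's population formulas.
assert math.isclose(population_variance(population), statistics.pvariance(population))
assert math.isclose(population_std_dev(population), statistics.pstdev(population))
```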
  • Slide 28
  • Sample Variance: \(s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}\), where: \(\bar{x}\) = sample mean; \(n\) = sample size; \(s^2\) = sample variance
  • Slide 29
  • Sample Standard Deviation: \(s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}\), where: \(\bar{x}\) = sample mean; \(n\) = sample size; \(s\) = sample standard deviation
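The sample versions divide by n - 1 rather than by the number of values. A minimal sketch, treating the seven house prices from the sample mean example as the sample (the function names are illustrative):

```python
import math
import statistics

house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]


def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


s_squared = sample_variance(house_prices)
s = sample_std_dev(house_prices)
print(f"s^2 = {s_squared:,.2f}, s = {s:,.2f}")

# Cross-check against the standard library's sample formulas.
assert math.isclose(s_squared, statistics.variance(house_prices))
assert math.isclose(s, statistics.stdev(house_prices))
```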
  • Slide 30
  • Coefficient of Variation: The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage. The coefficient of variation is used to measure the relative variation in the data.
  • Slide 31
  • Coefficient of Variation: Population Coefficient of Variation: \(CV = \frac{\sigma}{\mu}(100)\%\). Sample Coefficient of Variation: \(CV = \frac{s}{\bar{x}}(100)\%\)
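Because the coefficient of variation is a ratio, it allows data sets measured on different scales to be compared. The sketch below computes both versions; the small population is hypothetical, and the house prices are treated as a sample only for illustration.

```python
import statistics


def population_cv(values):
    """Population standard deviation over population mean, as a percentage."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


def sample_cv(values):
    """Sample standard deviation over sample mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical population: mean 5, standard deviation 2.
small_population = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"population CV = {population_cv(small_population):.1f}%")  # 40.0%

# House prices from the earlier slides, treated as a sample.
ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]
print(f"sample CV = {sample_cv(ten_prices):.1f}%")
```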
  • Slide 32
  • The Empirical Rule: If the data distribution is bell-shaped, then the interval: \(\mu \pm 1\sigma\) contains approximately 68% of the values in the population or the sample; \(\mu \pm 2\sigma\) contains approximately 95% of the values in the population or the sample; \(\mu \pm 3\sigma\) contains virtually all of the data values in the population or the sample
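The percentages in the rule can be checked against a normal distribution, which is one common model for "bell-shaped" data. Using the normal model is an assumption of this sketch; the slides do not name a specific distribution.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_share_within(k):.2%}")
# within 1 standard deviation(s): 68.27%
# within 2 standard deviation(s): 95.45%
# within 3 standard deviation(s): 99.73%
```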
  • Slide 33
  • The Empirical Rule (Figure 3-11): bell-shaped curve over x with the 68% and 95% intervals marked
  • Slide 34
  • Tchebysheff's Theorem: Regardless of how the data are distributed, at least \(1 - 1/k^2\) of the values will fall within \(k\) standard deviations of the mean. For example: at least \(1 - 1/1^2 = 0\%\) of the values will fall within \(k = 1\) standard deviation of the mean; at least \(1 - 1/2^2 = 3/4 = 75\%\) of the values will fall within \(k = 2\) standard deviations of the mean; at least \(1 - 1/3^2 = 8/9 \approx 89\%\) of the values will fall within \(k = 3\) standard deviations of the mean
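The bound can be compared with what a real data set does. This sketch treats the ten house prices as a population (an assumption for illustration) and counts how many values fall within k standard deviations of the mean; the function names are hypothetical.

```python
import statistics


def tchebysheff_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


def observed_share(values, k):
    """Actual share of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)


ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

for k in (1, 2, 3):
    print(f"k = {k}: bound = {tchebysheff_bound(k):.0%}, "
          f"observed = {observed_share(ten_prices, k):.0%}")
```

The observed share can never fall below the bound, but it is often well above it, since the theorem makes no assumption about the shape of the distribution.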
  • Slide 35
  • Standardized Data Values: A standardized data value refers to the number of standard deviations a value is from the mean. The standardized data values are sometimes referred to as z-scores.
  • Slide 36
  • Standardized Data Values: STANDARDIZED POPULATION DATA: \(z = \frac{x - \mu}{\sigma}\), where: \(x\) = original data value; \(\mu\) = population mean; \(\sigma\) = population standard deviation; \(z\) = standard score (number of standard deviations \(x\) is from \(\mu\))
  • Slide 37
  • Standardized Data Values: STANDARDIZED SAMPLE DATA: \(z = \frac{x - \bar{x}}{s}\), where: \(x\) = original data value; \(\bar{x}\) = sample mean; \(s\) = sample standard deviation; \(z\) = standard score
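A short sketch of standardizing sample data, using the ten house prices from the earlier slides as the sample. The helper name `z_score` is illustrative; the same function covers the population case when it is given the population mean and standard deviation.

```python
import statistics


def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread


ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

x_bar = statistics.mean(ten_prices)  # sample mean
s = statistics.stdev(ten_prices)     # sample standard deviation

z_scores = [z_score(x, x_bar, s) for x in ten_prices]
for price, z in zip(ten_prices, z_scores):
    print(f"{price:>9,}  z = {z:+.2f}")
```

Values above the mean get positive z-scores, values below it get negative ones, and the z-scores of a data set always average to zero.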
  • Slide 38
  • Key Terms: Coefficient of Variation, Data Array, Empirical Rule, Interquartile Range, Left-Skewed Data, Mean, Median, Parameter, Percentiles, Quartiles, Range, Right-Skewed Data, Skewed Data, Standard Deviation, Standardized Data Values, Statistic
  • Slide 39
  • Key Terms (continued): Symmetric Data, Tchebysheff's Theorem, Variance, Variation