Introduction A Review of Descriptive Statistics


Introduction

A Review of Descriptive Statistics


Charts

• When dealing with a larger set of data values, it may be clearer to summarize the data by presenting a graphical image


Intervals

• Numerical data values may be grouped or classified by defining “class intervals”:

Suppose the following data values represent the ACT test scores for 30 individuals.

8, 10, 11, 13, 13, 14, 14, 15, 15, 16,

16, 17, 17, 18, 18, 18, 18, 19, 20, 20,

21, 21, 21, 22, 22, 23, 25, 26, 28, 30

Define intervals so that each value falls into exactly one of the intervals.


Frequency

• Determine how many data scores fall in each of the intervals (the "frequency").
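The counting step can be sketched in Python. The slides do not state which class intervals were used, so the intervals below (width 4, starting at 7) are an assumption, chosen because they reproduce the counts quoted in the later examples (2 values in the first interval, 5 in the next). The helper name `frequency_table` is hypothetical.

```python
scores = [8, 10, 11, 13, 13, 14, 14, 15, 15, 16,
          16, 17, 17, 18, 18, 18, 18, 19, 20, 20,
          21, 21, 21, 22, 22, 23, 25, 26, 28, 30]

# Assumed class intervals with inclusive bounds; not given in the slides.
intervals = [(7, 10), (11, 14), (15, 18), (19, 22), (23, 26), (27, 30)]


def frequency_table(values, intervals):
    """Count how many values fall in each inclusive (low, high) interval."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        for low, high in intervals:
            if low <= value <= high:
                counts[(low, high)] += 1
                break  # each value belongs to exactly one interval
    return counts


frequencies = frequency_table(scores, intervals)
for (low, high), count in frequencies.items():
    print(f"{low:>2}-{high:<2}  {count}")
```

Because the intervals do not overlap and together cover 7 through 30, every score is counted exactly once and the frequencies add up to 30.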


Draw a bar chart (or "histogram") with the height of the bar on each interval determined by the frequency

“Histogram”


Relative Frequency

• Alternatively, give the fraction (or percentage) of scores in each interval, called the "relative frequency".

• That is, if 5 of the 30 values fall in the interval, then the relative frequency is 5/30 ≈ 0.1667.
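A minimal sketch of the same calculation for every interval at once. The counts below are an assumption: they are what the 30 ACT scores give under the hypothetical intervals 7-10, 11-14, 15-18, 19-22, 23-26, 27-30, which the slides do not confirm.

```python
# Assumed frequencies of the 30 ACT scores (intervals are hypothetical).
frequencies = [2, 5, 10, 8, 3, 2]
total = sum(frequencies)  # 30 scores in all

# Relative frequency = count in the interval / total number of values.
relative_frequencies = [count / total for count in frequencies]

for count, rel in zip(frequencies, relative_frequencies):
    print(f"{count:>2} of {total}  ->  {rel:.4f}")
```

Every count is divided by the same total, so the bars keep the same proportions relative to each other and the relative frequencies sum to 1.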


Relative to each other, the bars are the same height and the histograms have the same shape.


Cumulative Frequency

• Or we could “keep a running total”, called a “cumulative frequency”, as we go from one interval to the next.

• If there are 2 values in the first interval and 5 in the next, then the cumulative frequency is 2 + 5 = 7 for the second interval.
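The running total can be sketched with `itertools.accumulate` from the Python standard library. The list of counts is an assumption (the 30 ACT scores grouped into hypothetical intervals of width 4 starting at 7); only the first two counts, 2 and 5, are taken from the example above.

```python
from itertools import accumulate

# Assumed frequencies per interval; only the leading 2 and 5 come from the text.
frequencies = [2, 5, 10, 8, 3, 2]

# Each entry is the sum of all frequencies up to and including that interval.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 7, 17, 25, 28, 30]
```

The last cumulative frequency always equals the total number of data values, here 30.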


Cumulative Graph

• The increase in the height of the bar shows how many data values were contributed by a given interval.


The Middle

• In addition to the graphical summary, we also give numerical measurements which describe the distribution of the data.

• Where is the middle?


Set of Heights

• The heights (in inches) of 30 third-graders:

47.5 48.5 50 52 52 53 53 54 54 54 54.5 54.5 55 55 55 55.5 55.5 55.5 56 56 56 56.5 56.5 57 57 57 57 57.5 58 58

• How should we describe the "middle height"?

• For numerical data, we commonly compute the "arithmetic average" of the values, also called the mean value.


The Mean Value

• To compute the average: find the sum of the values and divide by the number of values in the set.

• For our 30 third-graders, we find the sum of the 30 heights and then divide by 30: 1641 / 30 = 54.7 inches.

Compare this to “the middle” of the histogram.
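As a quick check, the mean of the 30 heights can be computed directly. This is only a sketch of the sum-then-divide recipe; the variable names are illustrative.

```python
heights = [47.5, 48.5, 50, 52, 52, 53, 53, 54, 54, 54,
           54.5, 54.5, 55, 55, 55, 55.5, 55.5, 55.5, 56, 56,
           56, 56.5, 56.5, 57, 57, 57, 57, 57.5, 58, 58]

# Mean = sum of the values divided by the number of values.
total = sum(heights)
mean_height = total / len(heights)

print(f"sum = {total}, n = {len(heights)}, mean = {mean_height:.1f} inches")
```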


The “Middle Height”

• Looks to be in the middle!

Mean = 54.7


Sampling a Population

• We distinguish between a sample and the entire population.

• A population consists of all the members of the set under consideration (e.g., all third-graders in the United States).

• A sample consists of a subset of members selected from a population (e.g., the 30 third-graders in our example).


Notation

The notation used depends on whether we are working with the entire population or a sample.

If a selected sample is representative of the population, we expect the sample mean to be nearly equal to the population mean.
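In the conventional notation (assumed here, since the symbols themselves are not shown above), the population mean is written μ and the sample mean is written x̄, with N the population size and n the sample size:

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad \text{(population mean)}

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad \text{(sample mean)}
```

For the 30 third-graders, who are a sample, this gives x̄ = 1641 / 30 = 54.7 inches.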


Median Value

• The median value is literally defined to be the middle data value. You may need to "split the difference" by averaging two middle values.

• Half the data lies at or below the median and the other half lies at or above the median.

• Median is another “measure of the middle” but is less affected by non-typical data values.


Median third-grader?

• Consider our previous data for 30 third-graders.

47.5 48.5 50 52 52 53 53 54 54 54 54.5 54.5 55 55 55 55.5 55.5 55.5 56 56 56 56.5 56.5 57 57 57 57 57.5 58 58

• An even number of data values, so we average the two middle values.

• The median is (55 + 55.5)/2 = 55.25 inches.


Mean vs. Median

• In smaller samples, the median value is often a better measure; it is less affected by a non-typical score and is more representative of the middle.

• Suppose test scores were 23, 58, 64, 68, 75, 79, 83, 85, 87, 91, 94.

• The median is 79.

• The mean is about 73.36.
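To see how much the single low score of 23 pulls each measure, the sketch below computes the mean and median with and without it. Dropping the score is only an illustration of sensitivity, not a step the slides perform.

```python
from statistics import mean, median

scores = [23, 58, 64, 68, 75, 79, 83, 85, 87, 91, 94]
without_low_score = scores[1:]  # the same scores with the non-typical 23 removed

print(f"all scores:  mean = {mean(scores):.2f}, median = {median(scores)}")
print(f"without 23:  mean = {mean(without_low_score):.2f}, "
      f"median = {median(without_low_score)}")
```

The mean moves from about 73.36 to 78.4 when the 23 is removed, while the median moves only from 79 to 81.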


The Spread

• Another characteristic of a data set is how widely the data values are spread.

• Find a way to measure how widely the values vary.

• The measurement we use is called the "standard deviation".


The Deviations

• Having determined the mean value, we can measure how far each data value $x_i$ varies from the middle.

• The difference, or "deviation", from the middle is computed as $x_i - \bar{x}$.

• Our goal is to compute a sort of average of these deviations from the middle.


“16 ounce drink”

• Suppose a sample of 8 medium colas was measured. The volumes, measured in ounces, are given by the data below.

16.2 16.5 15.9 15.7 15.9 16.1 16.3 15.8

The volumes have an average, or mean value, of 16.05 ounces.


Deviations in Colas

• Recall the contents of our 8 colas, where the mean value is 16.05 ounces.

data value    deviation from middle
15.7          15.7 - 16.05 = -0.35
15.8          15.8 - 16.05 = -0.25
15.9          15.9 - 16.05 = -0.15
15.9          15.9 - 16.05 = -0.15
16.1          16.1 - 16.05 =  0.05
16.2          16.2 - 16.05 =  0.15
16.3          16.3 - 16.05 =  0.25
16.5          16.5 - 16.05 =  0.45


Squared Deviations

• To prevent the negative and positive values from cancelling each other out, we square them.

data value    deviation from middle     deviation squared
15.7          15.7 - 16.05 = -0.35      (-0.35)^2 = 0.1225
15.8          15.8 - 16.05 = -0.25      (-0.25)^2 = 0.0625
15.9          15.9 - 16.05 = -0.15      (-0.15)^2 = 0.0225
15.9          15.9 - 16.05 = -0.15      (-0.15)^2 = 0.0225
16.1          16.1 - 16.05 =  0.05      ( 0.05)^2 = 0.0025
16.2          16.2 - 16.05 =  0.15      ( 0.15)^2 = 0.0225
16.3          16.3 - 16.05 =  0.25      ( 0.25)^2 = 0.0625
16.5          16.5 - 16.05 =  0.45      ( 0.45)^2 = 0.2025
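The table can be reproduced with a short sketch. It also shows why squaring is needed: the raw deviations sum to zero (up to floating-point rounding), so averaging them directly would say nothing about spread. Variable names are illustrative.

```python
volumes = [15.7, 15.8, 15.9, 15.9, 16.1, 16.2, 16.3, 16.5]
mean_volume = sum(volumes) / len(volumes)  # 16.05 ounces

# Deviation of each value from the mean, and its square.
deviations = [x - mean_volume for x in volumes]
squared_deviations = [d ** 2 for d in deviations]

for x, d, sq in zip(volumes, deviations, squared_deviations):
    print(f"{x:5.1f}  {d:+6.2f}  {sq:7.4f}")

print(f"sum of deviations         = {sum(deviations):.4f}")
print(f"sum of squared deviations = {sum(squared_deviations):.4f}")
```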


Avg. of Squared Deviations

• To average the squared deviations: add them up and divide by one less than the number of data values in the sample.

• Finally, we "undo the squaring" by computing the square root.


data value    deviation squared
15.7          0.1225
15.8          0.0625
15.9          0.0225
15.9          0.0225
16.1          0.0025
16.2          0.0225
16.3          0.0625
16.5          0.2025

total = 0.5200 = sum of squared deviations


Average Spread

s = √(0.5200 / 7) ≈ 0.2726 ounces is a sort of average of how far the data values vary from the middle.
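The whole recipe (square the deviations, divide their sum by one less than the number of values, take the square root) can be sketched in a few lines and compared with `statistics.stdev`, which computes the sample standard deviation with the same n - 1 divisor.

```python
import math
import statistics

volumes = [15.7, 15.8, 15.9, 15.9, 16.1, 16.2, 16.3, 16.5]
n = len(volumes)
mean_volume = sum(volumes) / n

# Sum of squared deviations, divided by n - 1 because this is a sample.
sum_of_squares = sum((x - mean_volume) ** 2 for x in volumes)
sample_variance = sum_of_squares / (n - 1)

# "Undo the squaring" with a square root.
s = math.sqrt(sample_variance)

print(f"s = {s:.4f} ounces")
print(f"statistics.stdev gives {statistics.stdev(volumes):.4f}")
```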


Notation

• As with the mean value, the notation depends on whether the data represents the population or a sample.
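In the conventional notation (an assumption here, since the symbols themselves are not shown above), the population standard deviation is written σ and the sample standard deviation is written s. The sample version divides by one less than the number of values, as described earlier; the population version divides by the population size N:

```latex
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
\qquad \text{(population standard deviation)}

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad \text{(sample standard deviation)}
```

For the 8 colas, which are a sample, s = √(0.5200 / 7) ≈ 0.2726 ounces.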


Compare

• The standard deviation describes the “distribution of the data”.

• Which of the following distributions would you expect to have the larger standard deviation?


Match the statistics with the histograms


Bell-shaped Distribution

• For reasonably large random samples, we often observe a "bell-shaped" distribution.

• In such cases, we expect to find about 68% of the data within one standard deviation of the mean.

• Also, about 95% of the data is expected to lie within 2 standard deviations of the mean.


“Empirical Rule”
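The 68% and 95% figures can be checked on simulated bell-shaped data. This is only a sketch: the data are generated with `random.gauss`, and the mean of 100, standard deviation of 15, sample size, and seed are all arbitrary assumptions that do not come from the slides. The helper `fraction_within` is hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(10_000)]  # simulated bell-shaped sample

center = statistics.mean(data)
spread = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)


def fraction_within(values, center, spread, k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - center) <= k * spread)
    return inside / len(values)


within_1 = fraction_within(data, center, spread, 1)
within_2 = fraction_within(data, center, spread, 2)

print(f"within 1 standard deviation:  {within_1:.1%}")
print(f"within 2 standard deviations: {within_2:.1%}")
```

For a sample this large the two fractions should come out close to 68% and 95%; small samples, or data that are not bell-shaped, can differ noticeably.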