
Unit 12 – Alg 2


Name:_________________________

Period:______

Unit 12: Statistics

List of topics for this unit/Assignment tracker

Date Topic Assignment & Due Date

12.1 Data Gathering

12.2 Data Displays Part 1

12.3 Measures of Center and Spread

12.4 Data Displays Part 2

12.5 Data Distribution and Outliers

12.6 Data Distribution (No Assignment Included)

12.7 Normal Distribution Part 1

12.8 Normal Distribution Part 2


12.1 Data Gathering Date:

Data-Gathering Techniques COMMON CORE

1 ENGAGE Understanding Data-Gathering Techniques

In the branch of mathematics known as statistics, you work with data. Data can be ____________________, such as heights or salaries,

or ____________________, such as eye color or political affiliation. You collect data about a ____________________ by surveying or

studying some or all of the ____________________ in the population.

When all the individuals in a population are surveyed or studied, the data-gathering technique is called a ____________________. A

____________________ is a number that summarizes a characteristic of the population.

When only some of the individuals in a population are surveyed or studied, the data-gathering technique is called ___________________.

A ____________________ is a number that summarizes a characteristic of a sample. Statistics can be used to estimate parameters.

Samples that result in accurate estimates are said to be ____________________ of the population. There are a variety of sampling

methods, characterized by how the individuals in the sample are chosen. The table below lists a few.

REFLECT

1a. Give an example of numerical data and an example of categorical data other than the examples listed in the first paragraph.

1b. Asking your friends is an example of what type of sampling method? Explain.

1c. Which sampling method do you think is most likely to result in a representative sample? Why?

1d. Which sampling method do you think would be least likely to result in a representative sample? Why?

1e. Explain why a researcher might use a sampling method rather than a census to gather information about a population.


2 EXPLORE Finding Statistics Using Various Sampling Methods

The salaries (in thousands of dollars) of all 30 employees at a small company are listed in the table.

Use the table to generate a sample of 6 individuals using each sampling method, and then use the sample to predict the mean of the population.

A Suppose individuals whose salaries are 51, 57, 58, 65, 70, and 73 volunteer to be in the sample. Compute the self-selected sample’s mean, rounding

to the nearest whole number.

B Take a convenience sample by choosing the 6 numbers in the first two columns of the table. Record the salaries, and then compute the sample’s

mean, rounding to the nearest whole number.

C Take a systematic sample by choosing every fifth number in the list, reading from left to right in each row. Record the salaries, and then compute the

sample’s mean, rounding to the nearest whole number.

D Take a random sample. Begin by labeling the data in the table with the identifiers 1–10 for the first row, 11–20 for the second row, and 21–30 for the

third row. Then use a graphing calculator’s random integer generator to generate 6 identifiers between 1 and 30, as shown. (If any identifiers are

repeated, simply generate replacements for them until you have 6 unique identifiers.) Record the corresponding salaries, and then compute the sample’s

mean, rounding to the nearest whole number.

REFLECT

2a. Compute the mean of the population. Then list the four samples from best to worst in terms of how well each sample mean estimates the population

mean.

2b. How do the best and worst sampling methods from your list compare with your answers to Reflect Questions 1c and 1d?
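The four sampling methods in the Explore can be sketched in code. This is only an illustration: the salary table is not reproduced here, so the 30 values below are hypothetical stand-ins (chosen so that the volunteers' salaries from part A appear in the list), and the seed for the random generator is arbitrary.

```python
import random

# Hypothetical salaries (in thousands of dollars) standing in for the
# worksheet's table, laid out as 3 rows of 10 and read left to right.
salaries = [
    21, 24, 26, 28, 30, 32, 33, 35, 37, 41,
    44, 45, 48, 50, 51, 52, 54, 55, 57, 58,
    58, 59, 60, 61, 63, 65, 69, 70, 73, 80,
]

def mean_rounded(sample):
    """Mean of a sample of positive values, rounded to the nearest whole number."""
    return int(sum(sample) / len(sample) + 0.5)

# Self-selected sample: the volunteers named in part A.
self_selected = [51, 57, 58, 65, 70, 73]

# Convenience sample: the first two columns of each of the three rows.
convenience = [salaries[row * 10 + col] for row in range(3) for col in range(2)]

# Systematic sample: every fifth value, reading left to right in each row.
systematic = salaries[4::5]

# Random sample: 6 unique identifiers from 1 to 30, then look up the salaries.
rng = random.Random(12)  # arbitrary seed so the run is repeatable
identifiers = rng.sample(range(1, 31), 6)
random_sample = [salaries[i - 1] for i in identifiers]

population_mean = sum(salaries) / len(salaries)
for name, sample in [("self-selected", self_selected), ("convenience", convenience),
                     ("systematic", systematic), ("random", random_sample)]:
    print(name, sample, mean_rounded(sample))
print("population mean:", round(population_mean, 1))
```

`random.sample` draws without replacement, so it never produces repeated identifiers and no replacements need to be generated by hand.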

Some statistics, such as the mean, apply only to numerical data. For categorical data, an appropriate statistic is a __________________, which is the

relative frequency of a category.


EXAMPLE Making Predictions from a Sample

A community health center surveyed a small random sample of adults in the community about their exercise habits. The survey asked

whether the person engages in regular cardio exercise (running, walking, swimming, or other) and, if so, what the duration and frequency of exercise

are. Of the 25 people surveyed, 10 said that they do engage in regular cardio exercise. The table lists the data for those 10 people.

Calculate statistics from the sample, and use the statistics to make predictions about the exercise habits of the approximately 5000 adults living in the

community.

A Calculate the following two proportions from the sample data.

Proportion of adults who get regular cardio exercise

Proportion of runners among those who get regular cardio exercise

B Use the proportions from part A and the verbal model below to predict the number of runners among all adults living in the community.

C Calculate the following two means from the sample data.

Mean duration of exercise for those who get regular cardio exercise = ____________________________

Mean frequency of exercise for those who get regular cardio exercise = ____________________________

D Use the means from part C to predict, for those who get regular cardio exercise, the number of hours spent exercising each week.

Show your calculations and include units.
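The prediction in part B can be sketched as a product of the population size and the two sample proportions. The counts of 25 surveyed, 10 who exercise, and 5000 adults come from the example; the number of runners is a hypothetical placeholder, because the data table is not reproduced here.

```python
from fractions import Fraction

surveyed = 25   # adults in the sample (from the example)
cardio = 10     # of those, adults who get regular cardio exercise (from the example)
adults = 5000   # approximate number of adults in the community (from the example)
runners = 4     # HYPOTHETICAL count of runners among the 10; read the real value from the table

# Proportion = relative frequency of a category within its group.
p_cardio = Fraction(cardio, surveyed)
p_runner_given_cardio = Fraction(runners, cardio)

# Predicted runners = adults x (proportion who exercise) x (proportion of exercisers who run)
predicted_runners = adults * p_cardio * p_runner_given_cardio
print(p_cardio, p_runner_given_cardio, predicted_runners)
```

Exact fractions are used so that the result is not affected by floating-point rounding.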

REFLECT

3a. One of the categorical variables in the survey was regular cardio exercise. That variable had only two possible values: yes or no. What was the

other categorical variable, and what were its possible values?

3b. What were the two numerical variables in the survey, and what were their possible values?

3c. How much confidence do you have in the predictions made from the results of the survey? Explain your reasoning.


12.1 Data Gathering Assignment

A student council wants to know whether students would like the council to sponsor a mid-winter dance or a mid-winter carnival this year.

Classify each sampling method.

1. Survey every tenth student on the school’s roster.

2. Survey all freshmen and all juniors.

3. Survey 20 freshmen, 20 sophomores, 20 juniors, and 20 seniors.

4. Survey those who ask the council president for a questionnaire.

5. Survey those who happen to be in the cafeteria at noon.

Use the following information for Exercises 6–9.

The officers of a neighborhood association want to know whether residents are interested in beautifying the neighborhood and, if so, how much money

they are willing to contribute toward the costs involved. The officers are considering the three sampling methods below.

A. Call and survey every tenth resident on the association’s roster.

B. Randomly select and survey 10 residents from among those who come to the neighborhood block party.

C. Mail a survey to every resident with instructions to complete and mail the survey back.

6. Identify the population.

7. Which sampling method is most likely to result in a representative sample of the population? Explain.

8. Describe another sampling method that is likely to result in a representative sample of the population.

9. Describe the categorical and numerical data that the officers of the neighborhood association want to gather through a survey.

Use the following information for Exercises 10–14.

A community theater association plans to produce three plays for the upcoming season. The association surveys a random sample of the approximately 7000 households in the community to see if an adult member of the household is interested in attending plays and, if so, what type of plays the person prefers (comedy, drama, or musical), how many members of the household (including the person surveyed) might attend plays, and how many of the three plays those household members might attend.

Of the 50 adults surveyed, 12 indicated an interest in attending plays. The table lists the data for those 12 people.

10. Describe the categorical and numerical data gathered in the survey.

11. Calculate the proportion of adults who indicated an interest in attending plays.

Then calculate the proportion of those interested in attending plays who prefer dramas.

12. Approximately 15,000 adults live in the community. Predict the number of adults who prefer plays that are dramas. Show your calculations.

13. For an adult with an interest in attending plays, calculate the mean number of household members who might attend plays.

Then calculate the mean number of plays that those household members might attend. Round each mean to the nearest tenth.

14. The theater association plans to sell tickets to the plays for $40 each. Predict the amount of revenue from ticket sales. Show your calculations and include units.


12.2 Data Displays Part 1 Date:

Warm Up


12.2 Data Displays Part 1 Assignment

Keep Going!


12.3 Measures of Center and Spread Date:

Measures of Center and Spread COMMON CORE

Essential question: What statistics can you use to characterize and compare the center and spread of data sets? CC.9-12.S.ID.2*

Two commonly used measures of the center of a set of numerical data are the _______________ and _______________. Let n be the number of data

values.

The mean is:

The median is:

The median divides the data set into two halves. The first quartile (Q1) of a data set is the median of the lower half of the data. The third quartile (Q3)

is the median of the upper half.

Two commonly used measures of the spread of a set of numerical data are the _______________ and ______________________.

The range is:

The interquartile range (IQR) is:

EXAMPLE Finding Mean, Median, Range, and Interquartile Range

The April high temperatures (in degrees Fahrenheit) for five consecutive years in Boston are listed below. Find the mean, median, range, and interquartile range for this data set.

77 86 84 93 90

A Find the mean.

B Find the median. Write the data values from least to greatest:

C Find the range.

D Find the interquartile range.

Find the first and third quartiles. Do not include the median as part of either the lower half or the upper half of the data.

Find the difference between Q3 and Q1: IQR = ______ − ______ = ______

REFLECT

1a. If 90°F is replaced with 92°F, will the median or mean change? Explain.

1b. Why is the IQR less than the range?
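The four statistics in this example can be checked with a short script. It is a sketch that follows the convention stated in part D (the median is left out of both halves when the number of values is odd); calculators and software that use other quartile conventions may report slightly different quartiles. The helper name `quartiles` is made up for this illustration.

```python
from statistics import mean, median

temps = [77, 86, 84, 93, 90]  # April high temperatures (°F) from the example

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of values is odd, the median itself belongs to
    neither half, as the example instructs.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

q1, q3 = quartiles(temps)
data_range = max(temps) - min(temps)
iqr = q3 - q1

print("mean:", mean(temps))
print("median:", median(temps))
print("range:", data_range)
print("Q1, Q3, IQR:", q1, q3, iqr)
```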

Another measure of spread is ____________________. It is found by squaring the deviations of the data values from the mean of the data values, then

finding the mean of those squared deviations, and finally taking the square root of the mean of the squared deviations.


Steps:

1. Calculate the mean, x̄.

2. Calculate each data value's deviation from the mean by finding x − x̄ for each data value x.

3. Find (x − x̄)², the square of each deviation.

4. Find the mean of the squared deviations.

5. Take the square root of the mean of the squared deviations.

EXAMPLE Calculating the Standard Deviation

Calculate the standard deviation for the data from the previous example.

A Complete the table using the fact that the mean of the data is x̄ = 86.

B Find the mean of the squared deviations.

C Take the square root of the mean of the squared deviations. Use a calculator, and round to the nearest tenth.
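The five steps for the standard deviation can be traced in code, one line per step. This sketch uses the temperature data from the example and divides by n, as these notes do; the variable names are chosen for the illustration.

```python
from math import sqrt

temps = [77, 86, 84, 93, 90]  # data from the previous example

# Step 1: the mean.
x_bar = sum(temps) / len(temps)

# Step 2: each value's deviation from the mean.
deviations = [x - x_bar for x in temps]

# Step 3: the square of each deviation.
squared = [d ** 2 for d in deviations]

# Step 4: the mean of the squared deviations (dividing by n).
mean_sq = sum(squared) / len(squared)

# Step 5: the square root of that mean.
std_dev = sqrt(mean_sq)

print(deviations)
print(squared)
print(mean_sq, round(std_dev, 1))
```

Printing the deviations before squaring also shows that they add to 0, which is relevant to Reflect question 2a.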

REFLECT

2a. What is the mean of the deviations before squaring?

Use your answer to explain why squaring the deviations is reasonable.

2b. In terms of the data values used, what makes calculating the standard deviation different from calculating the range?

2c. What must be true about a data set if the standard deviation is 0? Explain.

Numbers that characterize a data set, such as measures of center and spread, are called ____________________. They are useful when comparing large

sets of data.

EXAMPLE Comparing Statistics for Related Data Sets

The tables below list the average ages of players on 15 teams randomly selected from the 2010 teams in the National Football League (NFL) and Major League Baseball (MLB). Compare the average ages of NFL players to the average ages of MLB players.


A On a graphing calculator, enter the two sets of data into two lists, L1 and L2. Examine the data as you enter the values, and record your general

impressions about how the data sets compare before calculating any statistics.

B Calculate the statistics for the NFL data in list L1. Then do the same for the MLB data in L2. Record the results in the table below. Your calculator

may use the following notations and abbreviations for the statistics you're interested in.

C Compare the corresponding statistics for the NFL data and the MLB data. Are your comparisons consistent for the two measures of center and the

two measures of spread?

Do your comparisons agree with your general impressions from Part A?

REFLECT

3a. Based on a comparison of the measures of center, what conclusion can you draw about the typical age of an NFL player and of an MLB player?


3b. Based on a comparison of the measures of spread, what conclusion can you draw about variation in the ages of NFL players and of MLB players?

3c. What do you notice about the mean and median for the NFL?

For the MLB?

3d. What do you notice about the IQR and standard deviation for the NFL?

For the MLB?


12.3 Measures of Center and Spread Assignment

The numbers of students in each of a school's six Algebra 1 classes are listed below. Find each statistic for this data set.

28 30 29 26 31 30

1. Mean = 2. Median =

3. Range = 4. IQR =

5. Find the standard deviation of the Algebra 1 class data by completing the table and doing the calculations below it.

Mean of squared deviations =

Standard deviation =

6. Error Analysis: Suppose a student in the Algebra 1 class with 31 students transfers to the class with 26 students. The student claims that the measures of center and the measures of spread will all change. Correct the student's error.

7. The table lists the heights (in centimeters) of 8 males and 8 females on the U.S. Olympic swim team, all randomly selected from swimmers on the team who participated in the 2008 Olympic Games held in Beijing, China.

a. Use a graphing calculator to complete the table below.

b. Discuss the consistency of the measures of center for male swimmers and the measures of center for female swimmers, and then compare the

measures of center for male and female swimmers.

c. What do the measures of spread tell you about the variation in the heights of the male and female swimmers?


12.4 Data Displays Part 2 Date:

For Step 3, estimate instead of using a protractor.


12.4 Data Displays Part 2 Assignment

Keep Going!


12.5 Data Distribution and Outliers Date:

WARM UP

Data Distributions and Outliers

Essential question: Which statistics are most affected by outliers, and what shapes can data distributions have?

EXAMPLE Using Line Plots to Display Data

Twelve employees at a small company make the following annual salaries (in thousands of dollars):

25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60.

A Create a line plot of the data by putting an X above the number line to represent each data value. Stack the Xs for repeated data values.

B Complete the table. Round to the nearest hundredth, if necessary.
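The line plot from part A can be sketched as text, with one X stacked for each occurrence of a value. This is a rough illustration, not a replacement for drawing the plot by hand; the function name `line_plot` is made up here, and it assumes every data value falls on a tick of the number line (multiples of 5 in this data set).

```python
from collections import Counter

salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60]  # thousands of dollars

def line_plot(values, step=5, width=4):
    """Return the rows of a text line plot, top row first and the axis last.

    Assumes every value lies on a tick: min(values), min(values) + step, ...
    """
    counts = Counter(values)
    ticks = range(min(values), max(values) + 1, step)
    rows = []
    # Build rows from the tallest stack down to the row just above the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(
            ("X" if counts[t] >= level else " ").center(width) for t in ticks
        ))
    rows.append("".join(str(t).center(width) for t in ticks))
    return rows

for row in line_plot(salaries):
    print(row)
```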

REFLECT

1a. ____________________ data are numbers, such as counts or measurements. ____________________ data are categories, such as attributes or

preferences. For example, employees' salaries are quantitative data while employees' positions within a company are qualitative data.

Is it appropriate to use a line plot for displaying quantitative data, qualitative data, or both? Explain.

1b. The line plot allows you to see how the data are distributed. Describe the overall shape of the distribution of employees' salaries.

1c. When you examine the line plot, do any data values appear to be different than the others? Explain.


An ____________________ is a value in a data set that is relatively much greater or much less than most of the other values in the data set. Outliers are

determined using either the IQR or the standard deviation.

Below is one way to determine whether a data value is an outlier.
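The worksheet's own inequalities are not reproduced in this text. As an assumption, the sketch below uses one widely taught IQR-based rule: a value x is an outlier if x < Q1 − 1.5(IQR) or x > Q3 + 1.5(IQR). The function names are made up for this illustration, and the data are the expanded salary list from the Explore in this section.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves,
    leaving the median out of both halves when the count is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

def find_outliers(values, k=1.5):
    """Assumed rule: x is an outlier if x < Q1 - k*IQR or x > Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Salaries (in thousands of dollars), including the owner's salary of 150.
salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, 150]
print(quartiles(salaries))
print(find_outliers(salaries))
```

If the worksheet's rule uses a different multiplier or is based on the standard deviation instead, only `find_outliers` would need to change.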

EXPLORE Investigating the Effect of an Outlier in a Data Set

Suppose the list of salaries in the previous example is expanded to include the owner's salary, which is $150,000. Now the list of salaries is:

25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, 150.

A Create a line plot for the revised data set. Choose an appropriate scale for the number line.

B Complete the table. Use a calculator and round to the nearest hundredth, if necessary.

C Complete each sentence by stating whether the statistic increased, decreased, or stayed the same when the data value 150 was added to the original

data set. If the statistic increased or decreased, say by what amount.

The mean

The median

The range

The IQR

The standard deviation
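A before-and-after comparison like the one in part C can be sketched with Python's `statistics` module. The IQR is left out of this sketch because quartile conventions differ between tools; `pstdev` divides by n, matching the standard deviation used in these notes. The helper name `summarize` is made up for the illustration.

```python
from statistics import mean, median, pstdev

original = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60]  # thousands of dollars
revised = original + [150]  # owner's salary added

def summarize(data):
    """Collect the statistics from the table, except the IQR."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "std_dev": pstdev(data),  # population form: divides by n
    }

before = summarize(original)
after = summarize(revised)
for name in before:
    change = after[name] - before[name]
    print(f"{name:8} {before[name]:8.2f} {after[name]:8.2f} {change:+8.2f}")
```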

REFLECT

2a. Show that the data value 150 is an outlier, but the data value 60 is not. Use the inequalities given at the top of the previous page to support your

answer.

2b. What effect does the outlier have on the overall shape of the distribution?

2c. For the original data set, you can conclude that the salary of a typical employee is $40,000 regardless of whether you used the mean or the median.

For the revised data set, you could say that the salary of a typical employee is either $48,500 or $40,000 depending on whether you used the mean or

the median. Which average salary is more reasonable for the revised data set? Explain your reasoning.

2d. Based on how the IQR and standard deviation are calculated, explain why the IQR was only slightly affected by the addition of the outlier while the standard deviation was dramatically changed.

2e. Because the median and the IQR are based on quartiles while the standard deviation is based on the mean, the center and spread of a data set are

usually reported either as the median and IQR or as the mean and standard deviation. Which pair of statistics would you use for a data set that includes

one or more outliers? Explain.


A data distribution can be described as ____________________, ____________________, or ____________________ depending on the general shape

of the distribution in a line plot or other data display.

EXAMPLE Comparing Data Distributions

The tables list Sierra's and Jacey's scores on math tests in each quarter of the school year. Create a line plot for each student's scores and identify the distribution as symmetric, skewed to the left, or skewed to the right.

A Create and examine a line plot for Sierra's scores.

The distribution is centered on one value (91) with the data values to the left of the center balanced with the data values to the right, so the distribution

is symmetric.

B Create and examine a line plot for Jacey's scores.

The data values cluster on the right with a few data values spread out to the left of the cluster, so the distribution is skewed to the left.

REFLECT

3a. Find the mean and median for Sierra's test scores. How do they compare?

3b. Will the mean and median in a symmetric distribution always be equal or approximately equal? Explain.

3c. Find the mean and median for Jacey's test scores. How do they compare?

3d. Will the mean and median in a skewed distribution always be different? Explain.


12.5 Data Distribution and Outliers Assignment

1. a. Rounded to the nearest $50,000, the values (in thousands of dollars) of homes sold by a realtor are listed below. Use the number line to create a

line plot for the data set.

300 250 200 250 350

400 300 250 400 300

b. Suppose the realtor sells a home with a value of $650,000. Which statistics are affected when 650 is included in the data set?

c. Would 650 be considered an outlier? Explain.

2. In Exercise 1, find the mean and median for the data set with and without the data value 650. Why might the realtor want to use the mean instead of

the median when advertising the typical value of homes sold?

3. The table shows Chloe's scores on math tests in each quarter of the

school year.

a. Use the number line below to create a line plot for Chloe's

scores.

b. Complete the table below for the data set.

c. Identify any outliers in the data set.

Which of the statistics from the table above would change if the outliers were removed?

d. Describe the shape of the distribution.

e. Which measure of center and which measure of spread should be used to characterize the data? Explain.

4. Give an example of a data set with a symmetric distribution that also includes one or more outliers.

5. Suppose that a data set has an approximately symmetric distribution, with one outlier. What could you do if you wanted to use the mean and standard

deviation to characterize the data?


12.6 Data Distribution Date:

WARM UP

Data Distributions

Essential question: How can you use shape, center, and spread to characterize a data distribution?

A ____________________ is a set of numerical data that you can graph using a data display that involves a number line, such as a line plot, histogram,

or box plot. The graph will reveal the shape of the distribution.

EXPLORE Seeing the Shape of a Data Distribution

The table gives data about a random sample of 20 babies born at a hospital.

A Make a line plot for the distribution of birth months.

B Make a line plot for the distribution of birth weights.

C Make a line plot for the distribution of mothers' ages.

REFLECT

1a. Describe the shape of the distribution of birth months.

1b. Describe the shape of the distribution of birth weights.

1c. Describe the shape of the distribution of mothers' ages.


ENGAGE Understanding Shape, Center, and Spread

As you saw in the Explore, data distributions can have various shapes. Some of these shapes are given names in statistics.

• A distribution whose shape is basically level (that is, it looks like a rectangle) is called a _________________________.

• A distribution that is mounded in the middle with symmetric “tails” at each end (that is, it looks bell-shaped) is

called a ________________________________.

• A distribution that is mounded but not symmetric because one “tail” is much longer than the other is called a ______________________.

When the longer “tail” is on the left, the distribution is called _________________________.

When the longer “tail” is on the right, the distribution is called _________________________.

The figures below show the general shape of normal and skewed distributions.

Shape is one way of characterizing a data distribution. Another way is by identifying the distribution's center and spread. You should already be

familiar with the following measures of center and spread:

• The _______________ of n data values is the sum of the data values divided by n. If x1, x2, ..., xn are data values from a sample, then the mean x̄ is given by:

x̄ = (x1 + x2 + ... + xn) / n

• The _______________ of n data values written in ascending order is the middle value if n is odd and is the mean of the two middle values if n is even.

• The ____________________ of n data values is the square root of the mean of the squared deviations from the distribution's mean. If x1, x2, ..., xn are data values from a sample, then the standard deviation s is given by:

s = √( [(x1 − x̄)² + (x2 − x̄)² + ... + (xn − x̄)²] / n )

• The _________________________, or IQR, of data values written in ascending order is the difference between the median of the upper half of the

data, called the third quartile or Q3 , and the median of the lower half of the data, called the first quartile or Q1 .

So, IQR = _______________.

The first quartile, the median, and the third quartile divide a set of data into four groups that each contain about 25% of the data, so the IQR tells you

how spread out the middle 50% (or so) of the data are.

To distinguish a population mean from a sample mean, statisticians use the Greek letter mu, written μ, instead of x̄. Similarly, they use the Greek

letter sigma, written σ, instead of s to distinguish a population standard deviation from a sample standard deviation.

Also, for a reason best left to a statistics course, the formula for the sample standard deviation sometimes has n − 1 rather than n in the denominator of

the radicand. (In this book, n will always be used.)
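The difference between the two denominators can be seen with Python's `statistics` module, where `pstdev` divides by n (the form used in these notes) and `stdev` divides by n − 1. The data below are the five temperatures from the earlier standard deviation example, reused here only as an illustration.

```python
from statistics import pstdev, stdev

data = [77, 86, 84, 93, 90]  # sum of squared deviations from the mean is 150

with_n = pstdev(data)          # sqrt(150 / 5), the form used in these notes
with_n_minus_1 = stdev(data)   # sqrt(150 / 4), the form some calculators label Sx

print(round(with_n, 2), round(with_n_minus_1, 2))
```

The n − 1 version is always the larger of the two, and the gap shrinks as n grows.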

REFLECT

2a. Describe the shape of each distribution in the Explore using the vocabulary defined on the previous page.

2b. When the center and spread of a distribution are reported, they are generally given either as the mean and standard deviation or as the median and

IQR. Why do these pairings make sense?

No Data Distribution Assignment…


12.7 Normal Distribution Part 1 Date: WARM UP


Example


Example

Check Point


12.7 Normal Distribution Part 1 Assignment


12.8 Normal Distribution Part 2 Date:

Suppose the scores on a certain test have an approximately normal distribution with a mean of 68 and a standard deviation of 9.

1. Use this information to sketch a graph of the normal curve, including labels for 3 standard deviations on either side of the mean.

2. What percentage of scores are higher than 68?

3. What percentage of scores are higher than 59?

4. What percentage of scores are lower than 86?

5. What percentage of scores are between 50 and 86?

6. What percentage of scores are between 50 and 68?

7. What percentage of scores are between 50 and 77?

8. What percentage of scores are between 41 and 77?

9. What score would it require to make the top 2.5%?

10. Approximately what score would it take to make the bottom 10%?

11. If a class of 31 students took the test, approximately how many students would you expect to score over 59?

12. If a class of 29 students took the test, approximately how many students would you expect to score between 59 and 68?

13. What percentage of scores are below 55? Above 55?

14. What percentage of scores are between 57 and 72? Between 48 and 60 or above 88?
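Answers to percentage questions like these can be checked with `statistics.NormalDist` from the Python standard library (version 3.8 or later). This is a sketch for checking work, using the mean of 68 and standard deviation of 9 given above; the helper names are made up for this illustration. The exact values it reports differ slightly from the rounded 68-95-99.7 percentages that are usually used when reading values from a sketch of the curve.

```python
from statistics import NormalDist

scores = NormalDist(mu=68, sigma=9)  # mean and standard deviation from the problem

def percent_below(x):
    """Percentage of scores lower than x."""
    return 100 * scores.cdf(x)

def percent_above(x):
    """Percentage of scores higher than x."""
    return 100 * (1 - scores.cdf(x))

def percent_between(a, b):
    """Percentage of scores between a and b, with a < b."""
    return 100 * (scores.cdf(b) - scores.cdf(a))

print(round(percent_above(68), 1))        # at the mean
print(round(percent_above(59), 1))        # one standard deviation below the mean
print(round(percent_between(50, 86), 1))  # within two standard deviations
print(round(scores.inv_cdf(0.975), 1))    # score that cuts off the top 2.5%
```

For values that are not a whole number of standard deviations from the mean, such as 55 in question 13, the same functions apply without any change.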


12.8 Normal Distribution Part 2 Assignment

Suppose the class average on a test was 80, with a standard deviation of 5.

1. Use this information to sketch a graph of the normal curve, including labels for 3 standard deviations on either side of the mean.

2. Between which 2 values does the middle 68% of the scores fall?

3. Find the probability that a student received between 75 and 90 on the test.

4. What percentage of scores are lower than 70?

5. What percentage of scores are between 65 and 70?

6. What percentage of scores are between 70 and 80?

7. What percentage of scores are between 70 and 90?

8. What percentage of scores are below 90?

9. What score would it require to make the top 2.5%?

10. Approximately what score would it take to make the bottom 10%?

11. If a class of 31 students took the test, approximately how many students would you expect to score over 85?

12. If a class of 29 students took the test, approximately how many students would you expect to score between 75 and 85?

13. What percentage of scores are below 77? Above 77?

14. What percentage of scores are between 72 and 84? Between 68 and 78 or above 92?
