HOW MANY FISH ARE IN THAT LAKE? CCSS STATISTICS AND SAMPLING Presented by Mark Goldstein [email protected] The Center for Mathematics and Teaching, Inc. www.mathandteaching.org CMC South Mathematics Conference Palm Springs, CA Nov, 2013


HOW MANY FISH ARE IN THAT LAKE? CCSS STATISTICS AND SAMPLING

Presented by

Mark Goldstein

[email protected]

The Center for Mathematics and Teaching, Inc. www.mathandteaching.org

CMC South Mathematics Conference Palm Springs, CA

Nov, 2013


• Teachers / Grade level(s); Administrators (school site, district office, etc.); Instructional coaches; Other

• Unless you’re from Alaska, Texas, Nebraska, Minnesota, Virginia, or Puerto Rico, you’re likely interested in what’s going on with the Common Core State Standards in Mathematics (CCSS-M).

• The CCSS-M is a starting point by which we can help our students to become mathematical thinkers, and not just regurgitate facts and procedures. The standards and the new curricular materials won’t do it alone, though. As always, teachers are the number one drivers of curriculum, the greatest influence in the classroom, and the way teachers deliver instruction will be the key in this transformation.


COMMON CORE STATE STANDARDS – MATHEMATICS

On a scale of 0-5, what is your knowledge of your grade level(s) CCSS Content Standards? (5 = “I can recite them for you;” and 0 = “Huh, what is the common core?”)

In Grade 7, instructional time should focus on four critical areas: (1) developing understanding of and applying proportional relationships; (2) developing understanding of operations with rational numbers and working with expressions and linear equations; (3) solving problems involving scale drawings and informal geometric constructions, and working with two- and three-dimensional shapes to solve problems involving area, surface area, and volume; and (4) drawing inferences about populations based on samples.

STANDARDS FOR MATHEMATICAL CONTENT

7.SP.1 Understand that statistics can be used to gain information about a population by examining a sample of the population; generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random sampling tends to produce representative samples and support valid inferences.

7.SP.2 Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be.

7.SP.3 Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either team; on a dot plot, the separation between the two distributions of heights is noticeable.

7.SP.4 Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations. For example, decide whether the words in a chapter of a seventh-grade science book are generally longer than the words in a chapter of a fourth-grade science book.


COMMON CORE STATE STANDARDS – MATHEMATICS

On a scale of 0-5, what is your knowledge of the CCSS Standards for Mathematical Practice (SMP)?

The SMP are our new basic skills. Educational organizations must collectively teach their students to be critical thinkers, and much of this comes from students doing mathematics and talking about mathematics. We need to teach our students how to talk to each other, and really, no one is a native speaker of mathematics. Many of the SMP simply do not happen without discussion, and not just teacher-student, but also student-student.

STANDARDS FOR MATHEMATICAL PRACTICE

MP2: Reason abstractly and quantitatively. Data for fish lengths is given for exploration. Students are expected to take these numbers in context, do calculations and manipulate them as needed as if they have no meaning, and then reconsider these numbers within the original context for the purpose of analysis.

MP4: Model with mathematics. Students perform a simulation experiment to estimate the number of fish in a lake.

MP5: Use appropriate tools strategically. Spreadsheet software, calculators, and internet applets may be used to calculate various statistics. One advantage of using these tools is that students can modify data sets and observe the effect on statistics without being distracted by cumbersome computations.

MP6: Attend to precision. Precision of language is important for clear communication of ideas in this lesson. This lesson includes vocabulary from 6th grade standards, but since this is vocabulary heavy, students may need review. Students compare precise theoretical probability values to those obtained from an experiment. As students use calculators for computations, they will need to round values to an appropriate decimal. For example when computing the MAD statistic for fish lengths, these values should probably be rounded. An appropriate level of precision can be discussed in class.

• Is anyone here from a district doing CCSS testing at the end of this school year?


MATH NOTES

MN 1: Finding Quartiles

Suppose the numbers of siblings for 12 different students are:

4, 5, 2, 2, 3, 3, 2, 2, 5, 7, 1, 1

To find quartiles, first put the numbers in numerical order. Since the number of data points is divisible by four, simply locate the gaps that divide the data set into four equal parts.

1 1 2 | 2 2 2 | 3 3 4 | 5 5 7

The first value is the minimum and the last value is the maximum; the three gaps mark Q1, Q2 (the median), and Q3, in that order.

For this data set, Q1 = 2 (the first quartile), Q2 = 2.5 (the second quartile), and Q3 = 4.5 (the third quartile). Note that Q1 is the median of the first half of the data set, Q2 = M is the median of the entire data set, and Q3 is the median of the second half of the data set.
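The median-of-halves description above can be checked with a short Python sketch. The function name `quartiles` is made up for illustration, and the handling of an odd number of data points (leaving the middle value out of both halves) is an assumption, since the note only covers a data set whose size is divisible by four.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # first half of the ordered data
    upper = ordered[(n + 1) // 2 :]    # second half (middle value skipped when n is odd)
    return median(lower), median(ordered), median(upper)

siblings = [4, 5, 2, 2, 3, 3, 2, 2, 5, 7, 1, 1]
q1, q2, q3 = quartiles(siblings)
print(q1, q2, q3)
```

Other quartile conventions exist (spreadsheets and calculators may interpolate differently), so values from software can differ slightly from the hand method.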

MN 2: “The Quartile” or “In the Quartile”

The word “quartile” is used in statistics in two different ways. Most often, it is used to denote numbers that separate the data set into four equal parts. In the data set above, Q3 (or the third quartile) is 4.5.

The word “quartile” can also refer to the set of values in one of the four equal parts of the data set. For the sibling data set above, the “fourth quartile” (or “top quartile”) is {5, 5, 7}, the “first quartile” (or “bottom quartile”) is {1, 1, 2}, and the “third quartile” is {3, 3, 4}.

Thus, “the first quartile” is 2, but the value 2 lies both in the “first quartile” and in the “second quartile.”

This ambiguous use of terms occurs often in mathematics, and the precise meaning must be determined from context. For instance, a “triangle” is defined to be three segments joined end to end to enclose an area, but the word “triangle” may also refer to the triangular area enclosed by the three segments.

MN 3: The Tale is in the Tail

Roughly speaking, a data set is symmetric if the part of the data to the right of the median is the mirror-image of the data to the left of the median. Any discussion about the overall shape of the data should explore whether the data distribution is symmetric or not.

A data set that is not symmetric is said to be skewed. One way a data set can be skewed is to have a longer tail on one end of the distribution than on the other, as indicated in the figures below. We say the data is skewed to the right if the data has a longer tail on the right than on the left. We say the data is skewed to the left if the data has a longer tail on the left than on the right. The tail may provide valuable insight into the data set when interpreted in context.

[Figures: three distributions, labeled “skewed to the left” (tail on the left), “skewed to the right” (tail on the right), and “symmetric.”]

When data is skewed to the left, the mean will be less than the median, since values far out in the tail tend to “pull” the mean to the left. When data is skewed to the right, the mean will be greater than the median, since values far out in the tail “pull” the mean to the right. Consider the data set: {1, 1, 2, 2, 4, 6, 12}. This data is skewed to the right. The median is 2. The mean is 4.
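As a quick check of the example above, the following sketch computes the mean and median of the data set, then recomputes them with the tail value 12 removed to show how much that one value pulls the mean. The variable names are illustrative only.

```python
from statistics import mean, median

data = [1, 1, 2, 2, 4, 6, 12]
without_tail = [x for x in data if x != 12]   # drop the value far out in the right tail

print("with 12:   ", "mean =", mean(data), "median =", median(data))
print("without 12:", "mean =", round(mean(without_tail), 2), "median =", median(without_tail))
```

The median stays at 2 in both cases, while the mean drops once the tail value is removed.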


MATH NOTES (Continued)

MN 4: Mean Absolute Deviation

The standard deviation σ is the measure of spread of a numerical data set preferred by statisticians. It occurs naturally in many applications, such as in the analysis of variance. The standard deviation is difficult for middle school students to calculate by hand, since it involves squares and a square root.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

For a normal distribution, about 68% of data will fall within one standard deviation of the mean, and about 95% within two standard deviations of the mean.

The mean absolute deviation (MAD) is an alternative measure of spread, which has the same general form as the standard deviation but is easier to calculate by hand. The MAD is computed by adding the distances between each data value and the mean, and then dividing by the number of data values.

It turns out that σ ≥ MAD. In fact, σ > MAD except in the very special cases that there is only one data value, or that there are exactly two data values, each with probability one half. There are data sets for which σ is quite large while MAD is close to zero. For a normal distribution, σ is obtained by multiplying MAD by the square root of π/2: $\sigma = \sqrt{\pi/2} \cdot \text{MAD}$. In a sense that can be made mathematically precise, σ corresponds to a round world while MAD corresponds to a square world.
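The claim σ ≥ MAD can be explored numerically. The sketch below is only an illustration: the helper name `mean_absolute_deviation` and the three data sets are chosen for this example, and it uses the population standard deviation (dividing by n), which is an assumption made so that the two-value equality case shows up; the n − 1 version is never smaller, so the inequality still holds for it.

```python
from statistics import mean, pstdev

def mean_absolute_deviation(data):
    """Average distance from each data value to the mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)

examples = [
    [1, 1, 2, 2, 4, 6, 12],                 # skewed data set
    [0, 2],                                 # two values: sigma equals MAD
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 100],       # one extreme value
]
for data in examples:
    sigma = pstdev(data)                    # population standard deviation
    mad = mean_absolute_deviation(data)
    print(data, "sigma =", round(sigma, 3), "MAD =", round(mad, 3))
```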


TEACHING NOTES (Continued)

TN 1: How to Construct a Box Plot

A box plot, or box-and-whiskers plot, is a graphical representation of the five-number summary of a data set. For a box plot, the median, quartiles, minimum, and maximum of the data set are indicated on a number line, a box shows the middle 50% of the data set, and whiskers reach out from the box to the minimum and maximum. To construct a box plot based on the data set in TN2, we proceed as follows: Locate the five-number summary values on a number line, and indicate with vertical segments. Create a “box” to show the interquartile range from Q1 to Q3, and draw “whiskers” that extend to the minimum and maximum values.

Error alert: Be sure students properly scale their box plot. This plot is INCORRECT:

TN 2: Telling a Story with Data

At its heart, statistics seeks to interpret data to tell a story. Although the 7th grade Common Core State Standards in mathematics do not explicitly require the creation of data displays, they are an essential tool for telling a story with data. Not only do data displays represent a story visually, they enable students to grasp better the concepts of center, shape, and spread. Some typical data displays are dot plots, stem-and-leaf diagrams, box plots (or box-and-whisker plots), and histograms. In these lessons we include dot plots and box plots, but teachers are encouraged to introduce other displays as well.

[Figures for TN 1: box plots drawn on a number line from 0 to 8, with the minimum, Q1, median, Q3, and maximum labeled; the values shown on the plots are 1, 2, 3, 4.5, and 7.]


SAMPLING

Goals:

• Use measures of center and spread to compare two populations.

• Use line plots, box plots, and other visual tools to understand and compare numerical data sets.

• Measure the spread of data using mean absolute deviation.

• Use random sampling to make valid inferences about populations.

Here is a sample of student quiz scores selected randomly from Ms. Burkhart’s math classes:

13 18 13 14 13 16 14 21 13

Choose a word from the word bank below to match each description to its word. Then calculate each statistic using the scores above.

WORD BANK

minimum (min)
number of observations (n)
mean ($\bar{x}$)
median (med = Q2)
interquartile range
mode(s)
maximum (max)
1st Quartile (Q1)
five-number summary
range
3rd Quartile (Q3)

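| Word, Phrase, or Symbol | Description | Statistical Value |
|---|---|---|
| ______ | The number of student scores | ______ |
| ______ | The middle number, when the observations are in order | ______ |
| ______ | The arithmetic average | ______ |
| ______ | The lowest score | ______ |
| ______ | The highest score | ______ |
| ______ | The difference between the highest and lowest score | ______ |
| ______ | The middle of the bottom half of the data set | ______ |
| ______ | The middle of the top half of the data set | ______ |
| ______ | The following five numbers: (min, Q1, med, Q3, max) | ______ |
| ______ | The most common observation(s) | ______ |
| ______ | The difference between Q3 and Q1 | ______ |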
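For checking answers, the statistics in the table can be computed with Python's standard library. This is a sketch, not an answer key format from the handout; with nine scores, the middle score is left out of both halves when finding Q1 and Q3, which matches the "bottom half" and "top half" descriptions but is still an assumption about the intended convention.

```python
from statistics import mean, median, multimode

scores = [13, 18, 13, 14, 13, 16, 14, 21, 13]
ordered = sorted(scores)
n = len(ordered)
bottom_half = ordered[: n // 2]          # scores below the middle position
top_half = ordered[(n + 1) // 2 :]       # scores above the middle position

summary = {
    "n": n,
    "min": ordered[0],
    "max": ordered[-1],
    "range": ordered[-1] - ordered[0],
    "mean": mean(scores),
    "median": median(scores),
    "mode": multimode(scores),           # list of the most common value(s)
    "Q1": median(bottom_half),
    "Q3": median(top_half),
}
summary["IQR"] = summary["Q3"] - summary["Q1"]

for name, value in summary.items():
    print(name, "=", value)
```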


ESTIMATING FISH POPULATIONS

You will do an experiment that simulates a method scientists use to estimate fish population in a lake. Follow the directions given by your teacher to complete the fish population experiment. Use this sheet to record and analyze data.

S_tagged = number of tagged fish in Sample
P_tagged = number of tagged fish in Population
S_total = total number of fish in Sample
P_total = total number of fish in Population

1. How many fish did you tag in your random sample? ______

What variable described above does this represent? ______________

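2. Record your observations for S_tagged and S_total. Then write and solve proportions to estimate the total number of fish (P_total) in the population based on each sample.

| SAMPLE # | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| S_tagged | | | | | | |
| S_total | | | | | | |
| Find P_total | | | | | | |

To find P_total, use the proportion

$$\frac{S_{\text{tagged}}}{S_{\text{total}}} = \frac{P_{\text{tagged}}}{P_{\text{total}}}$$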
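Solving the proportion for the unknown population total gives P_total = P_tagged × S_total ÷ S_tagged. A minimal sketch of that calculation follows; the function name and every number in it are made up for illustration and are not data from the experiment.

```python
def estimate_population(p_tagged, s_tagged, s_total):
    """Solve s_tagged / s_total = p_tagged / p_total for p_total."""
    if s_tagged == 0:
        return None  # no tagged fish in the sample: the proportion cannot be solved
    return p_tagged * s_total / s_tagged

p_tagged = 20                             # made-up number of fish tagged in the population
samples = [(4, 25), (5, 25), (3, 25)]     # made-up (s_tagged, s_total) pairs

estimates = [estimate_population(p_tagged, s_tagged, s_total)
             for s_tagged, s_total in samples]
print([round(e, 1) for e in estimates])
```

Each sample gives a different estimate, which is the variation that 7.SP.2 asks students to gauge.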


ESTIMATING FISH POPULATIONS (Continued)

3. Based on your experiment, estimate the number of fish in the population. _______

4. How did you decide on this estimate?

5. Count all the fish. Find the actual number of fish in the population. _______

6. Find your error as a percent. Use this formula:

$$100 \left( \frac{\text{estimate} - \text{actual}}{\text{actual}} \right)$$

7. Did you overestimate or underestimate? Was your estimation close?

8. What might you do to improve your estimate of the number of fish in the lake?

9. In real life, what could weaken the results of collecting a representative sample using this capture-recapture method?
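The percent error formula from item 6 can be sketched in a few lines. The function name and the example numbers are hypothetical; the sign of the result shows whether the estimate was too high or too low.

```python
def percent_error(estimate, actual):
    """Return 100 * (estimate - actual) / actual."""
    return 100 * (estimate - actual) / actual

# Made-up values for illustration only.
error = percent_error(estimate=125, actual=110)
direction = "overestimate" if error > 0 else "underestimate" if error < 0 else "exact"
print(round(error, 1), direction)
```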


FISHY STATISTICS

Suppose that when fish were tagged, they were also measured. Here are fish lengths in centimeters from two different random samples:

Sample A: 75, 32, 38, 42, 47, 68, 51, 51, 58, 31, 51, 65

Sample B: 49, 45, 51, 49, 63, 56, 51, 48, 52, 42, 51, 52

1. Rewrite each list in order from least to greatest:

Sample A:

Sample B:

Calculate statistics for the two data sets.

Statistic Sample A Sample B

2. number of observations

3. minimum

4. maximum

5. range

6. mean

7. median

8. mode

9. five-number summary

10. interquartile range (IQR)
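For teachers who want to check the table quickly, here is a sketch that computes the five-number summary and IQR for both samples with the median-of-halves method. The function name `five_number_summary` is an assumption of this example, and software using a different quartile convention may report slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

sample_a = [75, 32, 38, 42, 47, 68, 51, 51, 58, 31, 51, 65]
sample_b = [49, 45, 51, 49, 63, 56, 51, 48, 52, 42, 51, 52]

for name, sample in (("Sample A", sample_a), ("Sample B", sample_b)):
    low, q1, med, q3, high = five_number_summary(sample)
    print(name, (low, q1, med, q3, high), "IQR =", q3 - q1)
```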


A FISHY DATA DISPLAY: BOX PLOT

A box plot (or box-and-whisker plot) is a visual display of data associated with the median (a measure of center). The display is based on the five-number summary.

[Figure: a box plot with the positions of min, Q1, med, Q3, and max labeled.]

To make a box plot:
• Write the values of the data set from least to greatest.
• Find the five-number summary.
• Locate the five-number summary values on a number line, and indicate with vertical segments.
• Create a “box” by connecting segments between Q1 and Q3.
• Create “whiskers” by extending segments from Q1 to the minimum value and Q3 to the maximum value.

1. Create box plots for Sample A and Sample B using the scale provided.

Sample A: [number line; the scale begins 30, 35, ...]

Sample B: [number line; the scale begins 30, 35, ...]

2. Why is it important to use the same scale when comparing the data?

3. What percent of the data is included in the IQR? ___________________

4. Calculate the IQR for Sample A. __________________

5. Calculate the IQR for Sample B. ________________

6. What do the box plot and interquartile range tell us about fish lengths in the lake?


ANOTHER FISHY STATISTIC: MEAN ABSOLUTE DEVIATION

The mean absolute deviation (MAD) is a measure of spread that is associated with the mean (a measure of center). It is the arithmetic average of the distance of each data point to the mean (absolute value). To calculate the MAD statistic:

• Find the mean of the sample.
• Find the distance (absolute value) from each data point to the mean.
• Find the sum of the distances.
• Divide by the number of data points to find the average distance from the mean.
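The four steps above translate directly into code. The sketch below is one way to write them in Python (the function name `mean_absolute_deviation` is chosen for this example); it can serve as a check on the hand calculations for the two fish samples.

```python
def mean_absolute_deviation(data):
    mean = sum(data) / len(data)                   # step 1: find the mean
    distances = [abs(x - mean) for x in data]      # step 2: distance from each point to the mean
    total = sum(distances)                         # step 3: sum of the distances
    return total / len(data)                       # step 4: average distance from the mean

sample_a = [75, 32, 38, 42, 47, 68, 51, 51, 58, 31, 51, 65]
sample_b = [49, 45, 51, 49, 63, 56, 51, 48, 52, 42, 51, 52]

print("Sample A MAD:", round(mean_absolute_deviation(sample_a), 1))
print("Sample B MAD:", round(mean_absolute_deviation(sample_b), 1))
```

Both samples have the same mean, so the difference between the two MAD values reflects spread alone.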

Calculate the MAD statistics for the lengths of fish in Sample A and Sample B.

1. What is the mean length of the fish in Sample A? _______ in Sample B? _______

2. Use the tables below to help you calculate the MAD statistics for Samples A and B.

Sample A

| data | Distance from data point to mean |
|---|---|
| 75 | $\lvert 75 - 50.75 \rvert = 24.25$ |
| 32 | |
| 38 | |
| 42 | |
| 47 | |
| 68 | |
| 51 | |
| 51 | |
| 58 | |
| 31 | |
| 51 | |
| 65 | |
| Sum | |
| Average | |

For sample A, the mean is _____,

The MAD is _____ (round to the nearest 10th).

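Sample B

| data | Distance from data point to mean |
|---|---|
| 49 | $\lvert 49 - 50.75 \rvert = 1.75$ |
| 45 | |
| 51 | |
| 49 | |
| 63 | |
| 56 | |
| 51 | |
| 48 | |
| 52 | |
| 42 | |
| 51 | |
| 52 | |
| Sum | |
| Average | |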

For sample B, the mean is _____,

The MAD is _____ (round to the nearest 10th).


MORE ABOUT MEAN ABSOLUTE DEVIATION

1. Here are the lengths of fish for Sample A (in order). Recall the mean ($\bar{x}$) for Sample A was 50.75 cm. Draw an arrow to locate $\bar{x}$.

31 32 38 42 47 51 51 51 58 65 68 75

2. Recall the MAD statistic for Sample A was 10.6. Determine which fish lengths fall within this distance from the mean.

$\bar{x}$ – MAD = 50.75 – 10.6 = ________ (1)

$\bar{x}$ + MAD = _______ + _______ = ________ (2)

Interval representing one MAD unit from the mean: [ ___(1)___ , ___(2)___ ]

3. Draw an oval around the fish lengths that fall in this interval.

4. What percent of the fish lengths fall in this interval? ______

5. Here are the lengths of fish for Sample B (in order). Recall the mean ($\bar{x}$) for Sample B was 50.75 cm. Draw an arrow to locate $\bar{x}$.

42 45 48 49 49 51 51 51 52 52 56 63

6. Recall the MAD statistic for Sample B was 3.5. Determine which points fall within this distance from the mean.

$\bar{x}$ – MAD = 50.75 – 3.5 = ________ (1)

$\bar{x}$ + MAD = _______ + _______ = ________ (2)

Interval representing one MAD unit from the mean: [ ___(1)___ , ___(2)___ ]

7. Draw an oval around the fish lengths that fall in this interval.

8. What percent of the fish lengths fall in this interval? ______

9. What percent of the data always fall in the interquartile range (IQR)? _____ Is this always true for MAD? _____

10. How does the MAD statistic help us describe the lengths of fish in the lake?
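To explore whether a fixed percent of the data falls within one MAD of the mean, the interval can be computed for both samples. This sketch uses the unrounded MAD rather than the rounded worksheet values, which is an assumption of the example; the function name is also made up.

```python
def percent_within_one_mad(data):
    """Percent of values in the interval [mean - MAD, mean + MAD]."""
    mean = sum(data) / len(data)
    mad = sum(abs(x - mean) for x in data) / len(data)
    inside = [x for x in data if mean - mad <= x <= mean + mad]
    return 100 * len(inside) / len(data)

sample_a = [31, 32, 38, 42, 47, 51, 51, 51, 58, 65, 68, 75]
sample_b = [42, 45, 48, 49, 49, 51, 51, 51, 52, 52, 56, 63]

print("Sample A:", round(percent_within_one_mad(sample_a), 1), "%")
print("Sample B:", round(percent_within_one_mad(sample_b), 1), "%")
```

The two samples give different percents, which suggests that, unlike the IQR, one MAD on either side of the mean does not capture a fixed share of the data.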


CEREAL BOX PROBLEMS

This problem requires the use of simulation as a tool for experimenting in the context of determining your chance of winning a prize. Simulation is the imitation of the operation of a real-world process or system over time. It is an effective tool for finding experimental probability when conducting an actual trial is difficult.

1. Consider this problem: Munchy Crunchy cereal offers six different prizes in their cereal boxes. Assume there is an equal chance of getting any of the six prizes when you buy a box of cereal. How many boxes of cereal would you expect to have to buy in order to collect all six different toys? Since it would be impractical to purchase cereal with prizes inside, it makes sense to design and execute a simulation to answer the question. Use the tools you have developed through the study of probability and sampling to design and execute an experiment to answer this question. Then write up your results and conclusions.

Some factors to consider:

• How many times do I need to conduct the experiment to acquire meaningful data?
• How can I simulate the purchase of cereal boxes with prizes inside?
• How can I organize the information that I collect from the simulation?
• How can I write up my work so that the experiment and its conclusions are clear to the reader?
• What are some limitations of the experiment?
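One possible simulation design, offered only as a sketch: treat each box as a random draw from the six prizes and count boxes until all six have appeared. In class a number cube would do the same job. The function name, the seed, and the number of trials are arbitrary choices for this example, not part of the task.

```python
import random

def boxes_until_complete(num_prizes, rng):
    """Simulate buying boxes until every prize has been found; return the box count."""
    seen = set()
    boxes = 0
    while len(seen) < num_prizes:
        seen.add(rng.randrange(num_prizes))   # each prize is equally likely
        boxes += 1
    return boxes

rng = random.Random(2013)                     # fixed seed so the run is repeatable
trials = 10_000
results = [boxes_until_complete(6, rng) for _ in range(trials)]
average = sum(results) / trials

print("average boxes needed:", round(average, 2))
print("fewest:", min(results), "most:", max(results))
```

For comparison, the theoretical expected value for six equally likely prizes is 6(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7 boxes, so a simulated average near that value is a good sign the design works.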

2. Now consider this problem variation.

Crispy Chunk cereal offers two different prizes in their cereal boxes. Assume there is an equal chance of getting either prize when you buy a box of cereal. What is the probability that it will take more than 3 cereal box purchases at the store to get both prizes?

• Design a simulation to estimate the probability. Then execute the simulatio
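A sketch of one way to simulate the two-prize variation: it takes more than 3 purchases exactly when the first three boxes all contain the same prize, so each trial only needs three random draws (three coin flips would work equally well in class). The function name, seed, and trial count are arbitrary choices for illustration.

```python
import random

def needs_more_than_three(rng):
    """Return True if the first three boxes do not contain both prizes."""
    first_three = [rng.randrange(2) for _ in range(3)]   # prize 0 or prize 1 in each box
    return len(set(first_three)) < 2

rng = random.Random(7)                                   # fixed seed so the run is repeatable
trials = 10_000
estimate = sum(needs_more_than_three(rng) for _ in range(trials)) / trials

print("estimated probability:", round(estimate, 3))
```

The theoretical probability is 2 × (1/2)³ = 1/4, which gives students a value to compare against their experimental result.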


NAME   EMAIL  (PLEASE  PRINT  NEATLY!!!)