
Wednesday, August 11 (131 minutes) - stevewillott.com ap stats notes in word/1 no…  · Web viewInterpol. 2007. 3:35. Alternative. 11. 4. Shake It Out. Florence + The Machine. 2011


Name _____________________________

Chapter 1 Learning Objectives | Section | Related Example on Page(s) | Relevant Chapter Review Exercise(s) | Can I do this?

Identify the individuals and variables in a set of data. | Intro | 3 | R1.1 |
Classify variables as categorical or quantitative. | Intro | 3 | R1.1 |
Display categorical data with a bar graph. Decide whether it would be appropriate to make a pie chart. | 1.1 | 9 | R1.2, R1.3 |
Identify what makes some graphs of categorical data deceptive. | 1.1 | 10 | R1.3 |
Calculate and display the marginal distribution of a categorical variable from a two-way table. | 1.1 | 13 | R1.4 |
Calculate and display the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table. | 1.1 | 15 | R1.4 |
Describe the association between two categorical variables by comparing appropriate conditional distributions. | 1.1 | 17 | R1.5 |
Make and interpret dotplots and stemplots of quantitative data. | 1.2 | Dotplots: 25; Stemplots: 31 | R1.6 |
Describe the overall pattern (shape, center, and spread) of a distribution and identify any major departures from the pattern (outliers). | 1.2 | Dotplots: 26 | R1.6, R1.9 |
Identify the shape of a distribution from a graph as roughly symmetric or skewed. | 1.2 | 28 | R1.6, R1.7, R1.8, R1.9 |
Make and interpret histograms of quantitative data. | 1.2 | 33 | R1.7, R1.8 |
Compare distributions of quantitative data using dotplots, stemplots, or histograms. | 1.2 | 30 | R1.8, R1.10 |
Calculate measures of center (mean, median). | 1.3 | Mean: 49; Median: 52 | R1.6 |
Calculate and interpret measures of spread (range, IQR, standard deviation). | 1.3 | IQR: 55; Std. dev: 60 | R1.9 |
Choose the most appropriate measure of center and spread in a given setting. | 1.3 | 65 | R1.7 |
Identify outliers using the 1.5 × IQR rule. | 1.3 | 56 | R1.6, R1.7, R1.9 |
Make and interpret boxplots of quantitative data. | 1.3 | 57 | R1.7 |
Use appropriate graphs and numerical summaries to compare distributions of quantitative variables. | 1.3 | 65 | R1.8, R1.10 |


1.1 Analyzing Categorical Data

Read 2–4

Fr/Soph/Jr/Sr, g.p.a, Email address
Name, Bus route
Phone number, Days absent
Address, Credits earned
Allergies, Current on immunizations

Exterior color, mileage, Total car length
Number of cylinders, Cost
Model, VIN
Type of sound system, Size of fuel tank

What do we call these two kinds of variables? What’s the difference?

Why do people sometimes confuse the two kinds of variables?

What is a distribution? It’s all the values that a variable can take on and how often.


Alternate Example: Willott’s music

Here is information about 12 randomly selected songs in Willott’s music library.

Song Title | Artist | Album year | Track Length | Genre | Tracks on the album | Track Number
Double Dare | Bauhaus | 1980 | 4:54 | Gothic | 9 | 1
Carpe Noctum | Tiesto | 2007 | 7:03 | Dance/Electronic | 12 | 4
She Wolf | Shakira | 2009 | 3:10 | Latin | 12 | 1
Come as You Are | Nirvana | 1991 | 3:39 | Alternative | 12 | 3
The Heinrich Maneuver | Interpol | 2007 | 3:35 | Alternative | 11 | 4
Shake It Out | Florence + The Machine | 2011 | 4:38 | Alternative | 12 | 2
My Songs Know What You Did in the Dark (Light Em Up) | Fall Out Boy | 2013 | 3:07 | Alternative | 11 | 2
Locked Out of Heaven | Bruno Mars | 2012 | 3:53 | Pop | 10 | 2
Womanizer | Britney Spears | 2008 | 3:44 | Pop | 13 | 1
Iceolate | Front Line Assembly | 1990 | 5:13 | Industrial | 10 | 7
I Bet You Look Good On The Dancefloor | Arctic Monkeys | 2006 | 2:54 | Indie | 13 | 2
Meat is Murder | The Smiths | 1985 | 6:06 | Alternative | 9 | 9

(a) Who are the individuals in this data set?

(b) What variables are measured? Identify each as categorical or quantitative. In what units were the quantitative variables measured?

(c) Describe the individual in the first row.

Read 7–11

What's the difference between a data table, a frequency table, and a relative frequency table?

Data table: tells values of variables for individuals

Frequency table: tells distribution of 1 variable in table form

Relative frequency table: tells distribution of 1 variable as a %, decimal, or fraction
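To make the three kinds of table concrete, here is a minimal Python sketch that turns the Genre column of the 12-song data table into a frequency table and a relative frequency table. The variable names are illustrative only, and the genre list is typed in by hand from the song table.

```python
from collections import Counter

# Genre column copied from the 12-song data table, in row order.
genres = [
    "Gothic", "Dance/Electronic", "Latin", "Alternative", "Alternative",
    "Alternative", "Alternative", "Pop", "Pop", "Industrial", "Indie",
    "Alternative",
]

# Frequency table: how many songs fall in each genre.
frequency = Counter(genres)

# Relative frequency table: each count as a fraction of all songs.
total = len(genres)
relative_frequency = {genre: count / total for genre, count in frequency.items()}

for genre, count in frequency.most_common():
    print(f"{genre:<18}{count:>3}{relative_frequency[genre]:>8.1%}")
```

The song table itself is the data table (one row per individual); the printed output is the distribution of one variable, first as counts and then as percents.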

Which one was the previous example?

When making pie charts and bar graphs, what do people often mess up?


Bar Graphs
Pros: Quick & easy
Cons: part-whole relationships are hard to see

Pie Charts
Pros: Show part-whole relationships well
Cons: They’re hard to make by hand. Don't use when percents don't add up to 100%.

Let's search "misleading graph" and see some examples. Identify some particular problems many of these graphs share.

HW #11: page 7 (1, 3, 5, 7, 8), page 22 (11, 13, 15, 17, 18)

Read 12–18

Examples of:

…two-way table (2 variables are shown with counts or frequencies)

        Senior  Non-senior
Boy     8       3
Girl    15      4

…marginal distribution (totals for rows & columns; the distribution for each variable)

        Senior  Non-senior  Totals
Boy     8       3           11
Girl    15      4           19
Totals  23      7           30

…conditional distribution (distribution of one variable as a % of the other variable)

        Senior  Non-senior
Boy     35%     43%
Girl    65%     57%
Totals  100%    100%

How do we know which variable to condition on? Divide by the explanatory variable totals.


        Senior  Non-senior  Totals
Boy     73%     27%         100%
Girl    79%     21%         100%
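The marginal and conditional distributions for the boy/girl, senior/non-senior counts can be checked with a short Python sketch. The dictionary layout and variable names are assumptions made for illustration; the counts are the ones in the two-way table.

```python
# Counts from the two-way table: rows are sex, columns are class.
counts = {
    "Boy": {"Senior": 8, "Non-senior": 3},
    "Girl": {"Senior": 15, "Non-senior": 4},
}

# Marginal totals for each variable.
row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {
    col: sum(counts[sex][col] for sex in counts)
    for col in ("Senior", "Non-senior")
}

# Conditional distribution of class within each sex (divide by row totals).
class_given_sex = {
    sex: {col: n / row_totals[sex] for col, n in row.items()}
    for sex, row in counts.items()
}

# Conditional distribution of sex within each class (divide by column totals).
sex_given_class = {
    col: {sex: counts[sex][col] / col_totals[col] for sex in counts}
    for col in col_totals
}

for sex, dist in class_given_sex.items():
    print(sex, {col: f"{p:.0%}" for col, p in dist.items()})
for col, dist in sex_given_class.items():
    print(col, {sex: f"{p:.0%}" for sex, p in dist.items()})
```

Dividing by row totals gives the percents that add to 100% across each row (73% and 27% for boys); dividing by column totals gives the percents that add to 100% down each column (35% and 65% for seniors).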

            Died  Survived
Hospital A
Hospital B


What is a segmented (or stacked) bar graph?

Use a segmented bar graph to compare conditional distributions, to look for differences, and to look for patterns.

When knowing the value of one variable helps predict the value of the other, we say that the variables are associated. Association appears in a segmented bar graph when we see big differences in the proportions. The proportions may be “flipped” or reversed.

Careful! An association does NOT automatically mean that there is a cause-and-effect relationship.

The boy/girl senior/non-senior graphs did not show much association.

Alternate Example: Horseshoe Crabs

Two members of the University of Florida at Gainesville Department of Zoology collected data on Horseshoe Crabs on a Delaware beach during 4 days in the late spring of 1992. Based on the color of the shells, they classified each crab as Young, Intermediate, or Old and whether the crabs could right themselves when flipped on their backs or whether they were stranded for at least a certain period of time. Here are the results.

              Young  Intermediate  Old  Total
Stranded      214    384           295  893
Not Stranded  1668   1204          216  3088
Total         1882   1588          511  3981

(a) Explain what it would mean if there was no association between age and strandedness.

(b) Does there appear to be an association between age and strandedness in this sample? Justify.
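One way to approach part (b) is to compare the conditional distribution of strandedness within each age group. The sketch below is only an illustration of that calculation: it treats age as the explanatory variable (an assumption about how to condition), and the names are hypothetical.

```python
# Counts from the horseshoe crab table.
stranded = {"Young": 214, "Intermediate": 384, "Old": 295}
not_stranded = {"Young": 1668, "Intermediate": 1204, "Old": 216}

# Condition on age: divide each stranded count by that age group's total.
pct_stranded = {
    age: stranded[age] / (stranded[age] + not_stranded[age])
    for age in stranded
}

# Overall (marginal) proportion stranded, for reference.
total_stranded = sum(stranded.values())
overall = total_stranded / (total_stranded + sum(not_stranded.values()))

for age, p in pct_stranded.items():
    print(f"{age:<13} stranded: {p:.1%}   not stranded: {1 - p:.1%}")
print(f"Overall stranded: {overall:.1%}")
```

With no association, the three age groups would show roughly the same percent stranded as the overall figure (about 22%). Here the percents come out near 11%, 24%, and 58%, which is the kind of large difference the notes describe as evidence of association.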


HW #12: page 22 (19, 21, 23, 25, 27–34)

And now, we change from categorical data to quantitative data…

1.2 Displaying Quantitative Data with Graphs

Elmer and Ethel have retired and want to move someplace warm. The couple is considering nine different cities. The dotplots below show the distribution of average daily high temperatures in December, January, and February for each of these cities. Help them pick a city by answering the questions below, based on the data shown in the graph.

[Figure: Average High Temperatures Dot Plot. One dotplot per city (Palm Springs, Atlanta, Phoenix, San Diego, Orlando, Miami, Key West, Honolulu, San Juan) on a shared axis from 60 to 90.]

1. What is the typical high temperature for these months in Phoenix, Orlando, and San Juan? Which of those 3 cities is most similar in this respect to Palm Springs? (Look for the center: the average, median, or typical value.)

2. Are daily high temperatures for these months more predictable in Palm Springs or in Orlando? (Look at the spread: the variation, including the range.)

3. What might be unique to Atlanta, San Diego, and Honolulu? (Look for outliers: unusual values.)

4. What makes San Juan and San Diego somewhat similar to one another? Likewise, Palm Springs, Phoenix, and Orlando are similar to one another in this way, but different from the first group. (Look at the shape: symmetry vs. asymmetry.)


Read 25–27

Notice that we are now looking at quantitative data!

How should we describe the distribution of a quantitative variable? Use “SOCS”.

Center - Typical value, such as the mean or the median

Spread - Range for now (we'll also use standard deviation and interquartile range "IQR")

Outliers - Unusual values for now (we'll eventually use the "1.5IQR Rule")

Shape - Address the graph's # of peaks and its symmetry (unimodal, bimodal, multimodal, uniform, symmetric, asymmetric, skewed left, skewed right)

Read 27–29

Examples and descriptions of various shapes of distributions:

Unimodal Symmetric
Curve: Heights of adult women
Dotplot: Expected sums on 36 rolls of two 6-sided dice
Histogram: Length of growing seasons in St. Louis

Bimodal
Curve: Heights of men and women
Dotplot: Observed sums on 35 rolls of a 4-sided die and an 8-sided die
Histogram: Maximum angle of a sample of roller coasters

Unimodal Skewed Left
Curve: Heights of kids at a middle school dance
Dotplot: Time to finish a difficult test
Histogram: Heights in my extended family

Unimodal Skewed Right
Curve: Salaries of MLB players
Dotplot: Selling prices of homes in a new subdivision
Histogram: Scores on a multiple choice pre-test over completely new material

Uniform
Curve: Expected outcomes of spins of a spinner with equally-sized spaces numbered 1-10
Dotplot: Outcomes of 36 rolls of a 6-sided die
Histogram: Ages of students in a school district


Here are the calorie counts for 16 convenience store sandwiches, along with a dotplot of the data.

360 430 440 440 440 450 450 460
470 480 480 490 490 490 500 510

Describe the shape, center, and spread of the distribution. Are there any outliers?
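Since the dotplot image did not survive, here is a small Python sketch that prints a rough text dotplot of the 16 calorie values and reports a median and range to support a shape, center, and spread description. The 10-calorie grid and the variable names are choices made for this illustration.

```python
from collections import Counter
from statistics import median

calories = [360, 430, 440, 440, 440, 450, 450, 460,
            470, 480, 480, 490, 490, 490, 500, 510]

# Text dotplot: one row per value on a 10-calorie grid, one dot per sandwich.
counts = Counter(calories)
for value in range(min(calories), max(calories) + 10, 10):
    print(f"{value:>4} | {'•' * counts[value]}")

center = median(calories)                    # a typical value
data_range = max(calories) - min(calories)   # spread, as a single number
print("median:", center, "range:", data_range)
```

The empty rows between 360 and 430 make the gap visible, which is what suggests treating 360 as a possible outlier when describing the distribution.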

Read 29–30

When asked to compare two distributions, be sure that you compare and don’t just describe!

Be sure that you use “less”, “more”, and “-er” words.

How does the annual energy consumption (kWh/year) compare for top-loading washing machines and front-loading washers? The data below is from the Home Depot website. There are 26 front-loaders and 32 top-loaders included.

[Figure: dotplots of kWh_per_year for front-loading and top-loading washers on a shared axis from 100 to 500.]

Read 31–32 Caution! Remember to include a key when making a stemplot (stem-and-leaf-plot).

If you write "19 | 7", is that 197, 19.7, 1970, ...?


How do gas prices in St. Charles County compare to those in Madison County, where Alton, Illinois is located? A sample of gas prices was taken on several days in July 2015. Make a back-to-back stemplot and compare the distributions.

St. Charles Co.: 2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58, 2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61

Madison Co.: 2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71, 2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74
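A back-to-back stemplot for the two counties can be sketched in Python as follows. The choice of stems (dollars and dimes, so 25 stands for $2.5_) and the helper name `stems_and_leaves` are assumptions for this illustration; a hand-drawn plot could split the stems differently.

```python
from collections import defaultdict

st_charles = [2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58,
              2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61]
madison = [2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71,
           2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74]

def stems_and_leaves(prices):
    """Split each price into a stem (dollars and dimes) and a leaf (cents)."""
    plot = defaultdict(list)
    for price in sorted(prices):
        cents = round(price * 100)   # whole cents avoid floating-point error
        plot[cents // 10].append(cents % 10)
    return plot

left = stems_and_leaves(st_charles)
right = stems_and_leaves(madison)

for stem in range(min(min(left), min(right)), max(max(left), max(right)) + 1):
    # Leaves on the left side read outward from the stem, so reverse them.
    left_leaves = "".join(str(leaf) for leaf in reversed(left.get(stem, [])))
    right_leaves = "".join(str(leaf) for leaf in right.get(stem, []))
    print(f"{left_leaves:>12} | {stem} | {right_leaves}")

# Without a key, "25 | 6" could be read as 256, 25.6, or 2.56.
print("Key: 25 | 6 means $2.56")
```

The printed key is the point of the caution about stemplots: it tells the reader how to turn a stem and a leaf back into a price.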

HW #13: page 41 (37, 39, 43, 45, 47)

1.2 Histograms

The following table presents the total number of triples (3B) for the 30 MLB teams in the 2014 regular season. Make a dotplot to display the distribution of triples for the season. Then, use your dotplot to make a histogram of the distribution.

Team           3B   Team          3B   Team        3B
Arizona        47   Pittsburgh    30   Toronto     24
San Francisco  42   San Diego     30   Tampa Bay   24
Colorado       41   Kansas City   29   Cleveland   23
LA Dodgers     38   Milwaukee     28   Atlanta     22
Miami          36   Texas         28   St. Louis   21
Oakland        33   Minnesota     27   Boston      20
Chicago Sox    32   Washington    27   Cincinnati  20
Seattle        32   Philadelphia  27   Houston     19
LA Angels      31   Detroit       26   NY Mets     19
Chicago Cubs   31   NY Yankees    26   Baltimore   16

Read 33–36

When you make a histogram...

...you can turn a dotplot into a histogram.

...be consistent with "fence sitters".

...be consistent with spacing and bin width.

Read 38–41

When might we want a relative frequency histogram rather than a frequency histogram?

…to see part-whole relationships or to compare 2 groups
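The points about fence sitters, equal bin widths, and relative frequencies can be illustrated with the 2014 triples data. In this Python sketch the bin width of 5, the starting edge of 15, and the rule that a boundary value goes into the bin on its right are all assumptions chosen for the example; other consistent choices are just as valid.

```python
triples = [47, 30, 24, 42, 30, 24, 41, 29, 23, 38, 28, 22, 36, 28, 21,
           33, 27, 20, 32, 27, 20, 32, 27, 19, 31, 26, 19, 31, 26, 16]

BIN_WIDTH = 5    # assumed width; every bin gets the same width
START = 15       # assumed left edge of the first bin

# Fence-sitter rule (an assumption): a value on a boundary goes in the bin to
# its right, so each bin is [left, left + width). For example, 20 counts in
# the 20-to-25 bin, not the 15-to-20 bin.
counts = {}
for value in triples:
    left = START + ((value - START) // BIN_WIDTH) * BIN_WIDTH
    counts[left] = counts.get(left, 0) + 1

for left in range(START, max(triples) + 1, BIN_WIDTH):
    n = counts.get(left, 0)
    share = n / len(triples)    # relative frequency for this bin
    print(f"{left:>2} to <{left + BIN_WIDTH:<3} {'#' * n:<9} {n:>2}  {share:.0%}")
```

The count column is what a frequency histogram would plot and the percent column is what a relative frequency histogram would plot; the shape is the same, only the vertical scale changes.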

HW #14: page 45 (51, 53, 55, 59–62)

1.3 Describing Quantitative Data with Numbers

Read 48–50

x̄ is a statistic; "x bar" is the sample mean. μ is a parameter; "mu" is the population mean.

When adding a very large or very small data value to a data set (or changing a data value to something very large or very small) does not change the value of a statistic very much, or at all, we say that the statistic is resistant.

The mean is not a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will change the value of the mean quite a bit. Think about what happens to the average age of people in the classroom when Mr. Willott walks in.

The mean is the balancing point.

Approximately where will the mean be located, when looking at a histogram or dotplot?

[Figures: Average High Temperatures Histogram and Average High Temperatures Dot Plot, both of StL_winter_Avg_High_Temps.]

Read 51–53

The median is a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will not change the value of the median much, if at all. Think about what happens to the median age of people in the classroom when Mr. Willott walks in.
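The classroom thought experiment can be tried with numbers. In this Python sketch both the student ages and the teacher's age are made up for illustration; they are not taken from the notes.

```python
from statistics import mean, median

# Hypothetical ages of students in a classroom (made up for illustration).
student_ages = [15, 16, 16, 16, 17, 17, 17, 18]
# Hypothetical age for the teacher who walks in (also made up).
with_teacher = student_ages + [50]

print("mean:  ", mean(student_ages), "->", round(mean(with_teacher), 2))
print("median:", median(student_ages), "->", median(with_teacher))
```

With these made-up ages the mean jumps from 16.5 to about 20.2 while the median only moves from 16.5 to 17, which is what "not resistant" and "resistant" mean in practice.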

If we know the shape of a distribution, as shown below, then where are the mean and the median located in relation to one another?

roughly symmetric    exactly symmetric    skewed

[Figures: histograms of Tides at Birch Islands Maine (height in feet), theoretical sum on 2 dice, and Tides at Alcatraz Island CA (height in feet).]

Read 53–55

The range = highest data value minus lowest data value. The range is a single number and it is not a resistant measure of spread. An extreme value will affect the value of the range. Think about what happens to the range of ages of people in the classroom when Mr. Willott walks in.

The median divides an ordered list of data into two equal groups.

The quartiles divide an ordered list of data into four equal groups.

The interquartile range (IQR) is the spread of the middle 50% of the data. The IQR is a resistant measure of spread. Think about what happens to the range of the middle 50% of ages of people in the classroom when Mr. Willott walks in.

Here are data on the amount of fat (in grams) in 9 different Taco Bell menu items. Calculate the median, quartiles, and IQR.

Item                              Fat (g)
Crunchy Taco                      10
Nachos Supreme                    24
Cheese Quesadilla                 26
Chicken Quesadilla                27
Mexican Pizza                     31
Taco Salad (steak)                37
Nachos BellGrande                 39
XXL Grilled Stuft Burrito – Beef  41
Taco Salad (original)             42

Read 57–58

What is the 1.5 IQR Rule for identifying outliers?

Illustration by Kelly Boles

How many fat grams would qualify as an outlier for the Taco Bell items?

Are there outliers among the 9 Taco Bell items?
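The median, quartiles, IQR, and outlier cutoffs for the Taco Bell fat data can be checked with the Python sketch below. It assumes one common textbook convention for quartiles (medians of the lower and upper halves, leaving the overall median out when the count is odd); calculators and software sometimes use other conventions and can give slightly different quartiles. The function name `quartiles` is illustrative.

```python
from statistics import median

fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]   # grams of fat per menu item

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumes the convention where the overall median is left out of both
    halves when the number of values is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

q1, q3 = quartiles(fat)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
high_fence = q3 + 1.5 * iqr    # values above this are flagged as outliers
outliers = [x for x in fat if x < low_fence or x > high_fence]

print("median:", median(fat), "Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", low_fence, high_fence, "outliers:", outliers)
```

Under that convention Q1 = 25, Q3 = 40, and IQR = 15, so an item would need fewer than 2.5 g or more than 62.5 g of fat to be flagged, and none of the nine items is.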

Here are data for the calories for 12 McDonald’s menu items. Are there any outliers?

Read 56–58

The five-number summary: Minimum, Q1, Median, Q3, Maximum

A boxplot is a graph that is related to the five-number summary.

Draw a boxplot for the Taco Bell data. Check yours against the one that the graphing calculator makes.

Here are parallel boxplots for the heights of baseball players for 5 of the 2005 MLB teams. Compare these distributions.


Sandwich                    Calories
32 oz. Chocolate Shake      1160
Big Breakfast®              740
Big Mac®                    540
Sausage Biscuit with Egg    510
McRib®                      500
10 pc. McNuggets®           460
Double Cheeseburger         440
Quarter Pounder®            410
Filet-O-Fish®               380
McChicken®                  360
Large Caramel Latte         330
Large Vanilla Iced Coffee   270
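For the 12 McDonald's items, the five-number summary and the 1.5 × IQR check can be sketched in Python as follows. The helper name `five_number_summary` is hypothetical, and the quartile convention (medians of the lower and upper halves of the ordered data) is an assumption; other conventions may shift Q1 and Q3 slightly.

```python
from statistics import median

calories = [1160, 740, 540, 510, 500, 460, 440, 410, 380, 360, 330, 270]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes quartiles are the medians of the lower and upper halves of the
    ordered data, with the overall median excluded when the count is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return (ordered[0], median(ordered[:half]), median(ordered),
            median(ordered[-half:]), ordered[-1])

minimum, q1, med, q3, maximum = five_number_summary(calories)
iqr = q3 - q1
flagged = [x for x in calories if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR:", iqr, "flagged by the 1.5 x IQR rule:", flagged)
```

These five numbers are what a boxplot draws. With this convention the upper cutoff is 525 + 1.5 × 155 = 757.5, so only the 1160-calorie shake is flagged; the 740-calorie Big Breakfast falls just inside.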


HW #15: page 47 (65, 69–74), page 69 (79, 81, 83, 85, 86, 87, 89, 91, 93)

1.3 Standard Deviation

Arnold ran each afternoon for 5 days. His distances (in miles) were 10, 10, 10, 10, and 10.

Find the mean (or average) number of miles that Arnold ran each day. ____________________

Complete the table:

Table for Arnold's distances
Distances    Difference from the mean    Square of difference from the mean
10
10
10
10
10

Sum of squared differences:

Sum of squared differences divided by 4 (since there were 5 distances):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Arnold ran. What are the units? ____________

The number above it is the variance for the distances. What are the units? ____________

Becky ran each afternoon for 5 days. Her distances (in miles) were 8, 9, 10, 11, and 12.

Find the mean (or average) number of miles that Becky ran each day. ____________________

Complete the table:

Table for Becky's distances
Distances    Difference from the mean    Square of difference from the mean
8
9
10
11
12

Sum of squared differences:

Sum of squared differences divided by 4 (since there were 5 distances):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Becky ran. What are the units? ____________

The number above it is the variance for the distances. What are the units? ______________

Caleb ran each afternoon for 5 days. His distances (in miles) were 7, 9, 10, 11, and 13.

Find the mean (or average) number of miles that Caleb ran each day. ____________________

Complete the table:

Table for Caleb's distances
Distances    Difference from the mean    Square of difference from the mean
7
9
10
11
13

Sum of squared differences:

Sum of squared differences divided by 4 (since there were 5 distances):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Caleb ran. What are the units? _____________

The number above it is the variance for the distances. What are the units? _________________

Donna ran each afternoon for 5 days. Her distances (in miles) were 3, 3, 4, 5, and 35.

Find the mean (or average) number of miles that Donna ran each day. ____________________

Complete the table:

Table for Donna's distances
Distances    Difference from the mean    Square of difference from the mean
3
3
4
5
35

Sum of squared differences:

Sum of squared differences divided by 4 (since there were 5 distances):

Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Donna ran. What are the units? ___________

The number above it is the variance for the distances. What are the units? ____________

The standard deviation measures the typical distance the data are from the mean.

The range, IQR, and standard deviation all measure variation or spread, but only the IQR is resistant.
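The table-by-table procedure used for Arnold, Becky, Caleb, and Donna can be written as a short Python sketch. The function name `sample_variance_and_sd` is hypothetical; the steps mirror the worksheet tables (differences from the mean, squares, sum, divide by one less than the number of distances, square root).

```python
from math import sqrt

runners = {
    "Arnold": [10, 10, 10, 10, 10],
    "Becky": [8, 9, 10, 11, 12],
    "Caleb": [7, 9, 10, 11, 13],
    "Donna": [3, 3, 4, 5, 35],
}

def sample_variance_and_sd(distances):
    """Follow the table: deviations, squares, sum, divide by n - 1, square root."""
    mean = sum(distances) / len(distances)
    squared_diffs = [(d - mean) ** 2 for d in distances]
    variance = sum(squared_diffs) / (len(distances) - 1)   # units: miles squared
    return variance, sqrt(variance)                         # SD units: miles

results = {name: sample_variance_and_sd(d) for name, d in runners.items()}
for name, (variance, sd) in results.items():
    print(f"{name:<7} variance = {variance:6.2f}   s = {sd:5.2f}")
```

All four runners average 10 miles per day, yet the standard deviations run from 0 (Arnold) to 14 (Donna). Donna's single 35-mile day drives most of her value, which is a concrete picture of the standard deviation not being resistant.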

Read 60–62

If s = 4, then s² = 16. If s² = 9, then s = 3. If σ² = 25, then σ = 5. If σ = 6, then σ² = 36.

Four important properties of the standard deviation:

Standard deviation ≥ 0. (0 means no variation, a large number means lots of variation.)
Standard deviation units are the same as the units for the data.
Standard deviation is not resistant.
Standard deviation measures spread around the mean.

s=5 s=6.22 s=9.52 s=10.7

A random sample of 5 students was asked how many minutes they spent listening to music outside school hours the previous day. They responded: 20, 30, 60, 90, 120. Calculate and interpret the standard deviation.
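One way to lay out the calculation for the five listening times is shown below as a worked sketch. It uses the sample formula (dividing by n − 1 = 4), matching the running tables earlier in this section.

```latex
\begin{align*}
\bar{x} &= \frac{20 + 30 + 60 + 90 + 120}{5} = \frac{320}{5} = 64 \\
\sum (x_i - \bar{x})^2 &= (-44)^2 + (-34)^2 + (-4)^2 + 26^2 + 56^2 \\
 &= 1936 + 1156 + 16 + 676 + 3136 = 6920 \\
s^2 &= \frac{6920}{5 - 1} = 1730 \\
s &= \sqrt{1730} \approx 41.6 \text{ minutes}
\end{align*}
```

A possible interpretation: the listening times of these 5 students typically differ from the mean of 64 minutes by about 41.6 minutes.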

Read 63–66

Of mean, median, IQR, and standard deviation, which summary statistics will we typically use for each situation?

Symmetric Skewed

Center


Standard deviation                  Variance
Square root of variance             Square of standard deviation
s = sample standard deviation       s² = sample variance
σ = population standard deviation   σ² = population variance


Spread


HW #16: page 71 (95, 97, 99, 101–105, 107–110)
FRAPPY! page 74
HW #17: page 76 Chapter Review Exercises
Review Chapter 1
HW #18: page 78 Chapter 1 AP Statistics Practice Test
Chapter 1 Test
