
AP Statistics 15 t -Test for Slope I. Transforming Data to achieve … - tTest Handout.pdf · 2013-04-23 · 1 AP Statistics 15 t -Test for Slope I. Transforming Data to achieve linearity



AP Statistics 15 t-Test for Slope

I. Transforming Data to Achieve Linearity

If a scatterplot shows an obvious curve, you can sometimes transform one or both variables to achieve a linear relationship.

A. Exponential Model

1. Procedure

a. Take ln(y) and make a scatterplot of x versus ln(y). If this takes out the curve, then an exponential model is appropriate.

b. You can still use the linear regression from this graph to predict values, but you must transform the predictions back. For example, suppose you evaluate the line at x = 2 and find that $\widehat{\ln(y)} = 2$. By the definition of the natural log, this translates to $\hat{y} = e^{2} \approx 7.389$.

B. Power Model

1. Procedure

a. Make a scatterplot of ln(x) versus ln(y). If this takes out the curve, then a power model is appropriate.

b. You can still use the linear regression from this graph to predict values, but you must transform the predictions back. For example:

Regression line: $\widehat{\ln(\text{period})} = 0.0002544 + 1.49986\ln(\text{distance})$

Use properties of exponents to simplify:

$$
\begin{aligned}
e^{\widehat{\ln(\text{period})}} &= e^{0.0002544 + 1.49986\ln(\text{distance})}\\
\widehat{\text{period}} &= e^{0.0002544}\cdot\left(e^{\ln(\text{distance})}\right)^{1.49986}\\
\widehat{\text{period}} &\approx 1.0002544\,(\text{distance})^{1.49986}
\end{aligned}
$$

C. Examples

1. Imagine you have been put in charge of organizing a fishing tournament in which prizes will be given for the heaviest Atlantic Ocean rockfish caught. You know that many of the fish caught during the tournament will be measured and released. You are also aware that trying to weigh a fish that is flopping around in a moving boat will probably yield inaccurate results. You want to measure each fish and convert its length to its weight. You get the following information on length (cm) and weight (g) from the nearby marine research laboratory:

Length (cm): 5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7, 28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7

Weight (g): 2, 8, 21, 28, 69, 117, 148, 190, 264, 293, 318, 371, 455, 504, 518, 537, 651, 719, 726, 810

2. Gordon Moore, one of the founders of Intel Corporation, predicted in 1965 that the number of transistors on an integrated circuit chip would double every 18 months. This is Moore's law, one way to measure the revolution in computing. Here are the data on the dates and the numbers of transistors for Intel microprocessors (K = thousands):

| Processor | Year | Transistors |
|---|---|---|
| 4004 | 1971 | 2250 |
| 8008 | 1972 | 2500 |
| 8080 | 1974 | 5000 |
| 8086 | 1979 | 29000 |
| 286 | 1982 | 120K |
| 386 | 1985 | 275K |
| 486 DX | 1989 | 1180K |
| Pentium | 1993 | 3100K |
| Pentium 2 | 1997 | 7500K |
| Pentium 3 | 1999 | 24000K |
| Pentium 4 | 2000 | 42000K |
| Itanium 2 | 2003 | 220,000K |
| Itanium 2 w/9MB cache | 2004 | 592,000K |
| Dual-core Itanium 2 | 2006 | 1,700,000K |
| 6-core Xeon 7400 | 2008 | 1,900,000K |
| 8-core Xeon | 2010 | 2,300,000K |
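The handout does these fits by calculator or regression software. As a rough cross-check, here is a minimal Python sketch (standard library only) that fits both transformed regressions and back-transforms predictions, using the data from Examples 1 and 2. The function names (`least_squares`, `exponential_fit`, `power_fit`) are hypothetical, the counts given in K are assumed to mean thousands, and comparing r for the two transformed fits is a supplementary check assumed here; the scatterplot and residual plot remain the real test of whether the curve is gone.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / math.sqrt(sxx * syy)

def exponential_fit(xs, ys):
    """Regress ln(y) on x; back-transform with y-hat = e^(a + b*x)."""
    a, b, r = least_squares(xs, [math.log(y) for y in ys])
    return (lambda x: math.exp(a + b * x)), b, r

def power_fit(xs, ys):
    """Regress ln(y) on ln(x); back-transform with y-hat = e^a * x^b."""
    a, b, r = least_squares([math.log(x) for x in xs], [math.log(y) for y in ys])
    return (lambda x: math.exp(a) * x ** b), b, r

# Back-transforming single values, as in parts A and B
print(round(math.exp(2), 3))          # 7.389
print(round(math.exp(0.0002544), 7))  # 1.0002544

# Example 1 (rockfish): length (cm) and weight (g)
length = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
          28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight = [2, 8, 21, 28, 69, 117, 148, 190, 264, 293,
          318, 371, 455, 504, 518, 537, 651, 719, 726, 810]
_, _, r_exp = exponential_fit(length, weight)
predict_weight, power_b, r_pow = power_fit(length, weight)
print(f"r for (x, ln y): {r_exp:.4f}; r for (ln x, ln y): {r_pow:.4f}")
print(f"power exponent b: {power_b:.3f}; predicted weight at 30 cm: {predict_weight(30):.0f} g")

# Example 2 (Moore's law): counts given in K are multiplied by 1,000
years = [1971, 1972, 1974, 1979, 1982, 1985, 1989, 1993, 1997, 1999, 2000,
         2003, 2004, 2006, 2008, 2010]
transistors = [2_250, 2_500, 5_000, 29_000, 120_000, 275_000, 1_180_000,
               3_100_000, 7_500_000, 24_000_000, 42_000_000, 220_000_000,
               592_000_000, 1_700_000_000, 1_900_000_000, 2_300_000_000]
predict_count, growth_b, r_moore = exponential_fit(years, transistors)
print(f"r for (year, ln count): {r_moore:.4f}; "
      f"doubling time ln(2)/b: {math.log(2) / growth_b:.2f} years")
```

Under the exponential model the slope b of ln(count) on year corresponds to a growth factor of e^b per year, so ln(2)/b estimates the doubling time in years, which can be set against Moore's 18-month prediction.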


II. t-Test for Slope (LinRegTTest)

A. Procedure

1. Hypotheses: $H_0: \beta = 0$ (there is no linear relationship between the variables); $H_a: \beta \neq 0$ OR $H_a: \beta > 0$ OR $H_a: \beta < 0$

2. Conditions for Inference (Confidence Interval and t-Test)

a. R: Random. The data are produced from a well-designed random sample or a randomized experiment.

b. L: Linear relationship. Check the scatterplot and residual plot (no pattern in the residual plot indicates a linear relationship).

c. I: Independence. Repeated responses y are independent of each other.

d. N: Normal distribution. For each value of x, the response y varies according to a Normal distribution (check a modified boxplot of the residuals; look for outliers and strong skewness).

e. E: Equal variance. The standard deviation of y is the same for all values of x (check the residual plot to see whether the spread of the residuals stays approximately the same as x increases).

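3. Calculations: $df = n - 2$

$$t = \frac{b - \beta}{SE_b} = \frac{b}{s/\left(s_x\sqrt{n-1}\right)}$$

where $b$ is the sample slope, $s = \sqrt{\sum (y - \hat{y})^2/(n-2)}$ is the standard deviation of the residuals, and $s_x$ is the standard deviation of the x-values.

NOTE: Under $H_0$, $\beta = 0$ (a horizontal line, meaning no linear association), which gives the second form of $t$.

NOTE: $SE_b = \dfrac{s}{s_x\sqrt{n-1}}$ rarely has to be calculated by hand; it is usually given by regression software.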
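A minimal Python sketch of the calculations in step 3, assuming the usual least-squares definitions of b and s. The function name `slope_t_statistic` and its `beta` parameter are hypothetical, and the five-point data set in the usage line is toy data, not from the handout.

```python
import math

def slope_t_statistic(xs, ys, beta=0.0):
    """Return (b, se_b, t, df) for testing H0: slope = beta."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired (x, y) values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                           # sample slope
    a = y_bar - b * x_bar                   # sample intercept
    # s: standard deviation of the residuals, dividing by n - 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    s_x = math.sqrt(sxx / (n - 1))          # sample standard deviation of x
    se_b = s / (s_x * math.sqrt(n - 1))     # standard error of the slope
    t = (b - beta) / se_b
    return b, se_b, t, n - 2

b, se_b, t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # toy data
print(b, round(se_b, 4), round(t, 3), df)  # 0.6 0.2828 2.121 3
```

Since $s_x\sqrt{n-1} = \sqrt{\sum (x - \bar{x})^2}$, the same standard error can also be computed as s divided by that square root.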

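4. P-value: depends on $H_a$, where $t$ is the observed test statistic and $T$ has a t distribution with $df = n - 2$:

$H_a: \beta \neq 0$ → $P(T \le -|t| \text{ or } T \ge |t|) = 2P(T \ge |t|)$

$H_a: \beta > 0$ → $P(T \ge t)$

$H_a: \beta < 0$ → $P(T \le t)$

5. Interpretation: Same as for other tests. If the P-value is less than the significance level ($\alpha$), reject $H_0$; there is sufficient evidence to conclude that there is a linear relationship between the variables.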
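The three P-value cases in step 4 can be sketched in Python with only the standard library. This sketch approximates the t distribution's CDF by integrating its density with Simpson's rule, a numerical technique chosen for illustration and not the method a calculator or table uses. The names `t_pdf`, `t_cdf`, `slope_p_value`, and the `alternative` strings are hypothetical.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on the density from 0 to |t| (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 <= T <= |t|); the density is symmetric about 0
    return 0.5 + area if t >= 0 else 0.5 - area

def slope_p_value(t, df, alternative):
    """P-value for the slope t statistic."""
    if alternative == "two-sided":  # Ha: beta != 0
        return 2 * (1 - t_cdf(abs(t), df))
    if alternative == "greater":    # Ha: beta > 0
        return 1 - t_cdf(t, df)
    if alternative == "less":       # Ha: beta < 0
        return t_cdf(t, df)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Hypothetical values: t = 2.228 is the upper 2.5% point of the t distribution with df = 10
print(round(slope_p_value(2.228, 10, "two-sided"), 3))  # 0.05
```

Checking against a t table (df = 10: 1.812 for an upper tail of 0.05, 2.228 for 0.025) is a quick way to confirm the approximation before trusting it on real data.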

B. Examples

1. Back to the problem from yesterday.


2. Do Longer Drives Mean Lower Scores on the PGA Tour? Recent advances in technology have led to golf balls that fly farther, clubs that generate more speed at impact, and swings that have been perfected through computer video and analysis. In addition, today's professional golfers are fitter than ever. The net result is that many more players routinely hit drives traveling 300 yards or more. Does greater distance off the tee translate to better (lower) scores? Data on mean drive distance (in yards) and mean score per round from an SRS of 19 of the 197 players on the 2008 Professional Golfers Association (PGA) Tour are shown in the table below. Do these data give convincing evidence that the slope of the true regression line for all 197 PGA Tour golfers is negative?

| Player | Mean distance (yards) | Mean score per round |
|---|---|---|
| Paul Claxton | 275.4 | 72.1 |
| Robert Gamez | 281.6 | 71.64 |
| K.J. Choi | 286.1 | 70.26 |
| Nick Watney | 302.9 | 70.79 |
| John Huston | 291.5 | 70.81 |
| Tom Scherrer | 287.7 | 71.98 |
| Cliff Kresge | 280.9 | 70.95 |
| J. B. Holmes | 310.3 | 70.6 |
| Davis Love III | 301.3 | 70.3 |
| Bill Haas | 296.7 | 70.64 |
| Chris Riley | 279.1 | 71.24 |
| Glen Day | 274.9 | 70.62 |
| Nick Flanagan | 291.8 | 71.79 |
| Scott McCarron | 283.1 | 71.31 |
| Pat Perez | 294.2 | 70.08 |
| Ryan Palmer | 294.2 | 70.62 |
| Olin Browne | 272.1 | 71.58 |
| Charles Warren | 301.1 | 72.11 |
| Omar Uresti | 274.9 | 71.27 |
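A rough Python check of the calculations for this example, testing $H_0: \beta = 0$ against $H_a: \beta < 0$ with the 19 players from the golf table. It is a sketch under stated assumptions: only the standard library is used, the t CDF is approximated by Simpson's rule rather than taken from a calculator, and the names `t_cdf`, `distance`, and `score` are hypothetical. It covers only step 3 and step 4; the conditions in part A.2 still have to be checked with a scatterplot, a residual plot, and a boxplot of the residuals.

```python
import math

# Golf table: mean drive distance (yards) and mean score per round, same player order
distance = [275.4, 281.6, 286.1, 302.9, 291.5, 287.7, 280.9, 310.3, 301.3, 296.7,
            279.1, 274.9, 291.8, 283.1, 294.2, 294.2, 272.1, 301.1, 274.9]
score = [72.1, 71.64, 70.26, 70.79, 70.81, 71.98, 70.95, 70.6, 70.3, 70.64,
         71.24, 70.62, 71.79, 71.31, 70.08, 70.62, 71.58, 72.11, 71.27]

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, via Simpson's rule on the density from 0 to |t|."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3                     # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

n = len(distance)
x_bar, y_bar = sum(distance) / n, sum(score) / n
sxx = sum((x - x_bar) ** 2 for x in distance)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score))
b = sxy / sxx                                # sample slope: change in mean score per extra yard
a = y_bar - b * x_bar
s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(distance, score)) / (n - 2))
se_b = s / math.sqrt(sxx)                    # equals s / (s_x * sqrt(n - 1))
df = n - 2
t = (b - 0) / se_b                           # H0: beta = 0
p = t_cdf(t, df)                             # Ha: beta < 0, so P-value = P(T <= t)
print(f"b = {b:.5f}, SE_b = {se_b:.5f}, t = {t:.3f}, df = {df}, P-value = {p:.4f}")
```

The printed P-value is then compared with the chosen significance level, as in step 5, and the conclusion stated in the context of PGA Tour golfers.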


3. 2001 #6


4. 2007B #6
