Stat 217 – Day 25 Regression

Last Time - ANOVA

When? Comparing 2 or more means (one categorical and one quantitative variable)

Research question

Null hypothesis: μ1 = μ2 = … = μI (no association between the two variables)

Alternative hypothesis: at least one μ differs (there is an association between the two variables)

Example (with 3 groups…)

[Figure: two example plots, one labeled "Not significant" and one labeled "Significant"]

How?

Compare differences in means vs. the natural variability in the data (s)

Compare test statistic to F distribution to get the p-value

Output: test statistic (ratio of variability between groups to variability within groups) and p-value

Demo

Strong evidence (p-value = .03 < .05) that the type of disability affected the ratings, on average, of these 70 students
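
To make the "ratio of variability between groups to variability within groups" concrete, here is a minimal sketch of the F statistic computed by hand. The three groups and their values are made up for illustration and are not the class data; turning F into a p-value requires the F distribution, which this sketch leaves to statistical software.

```python
# Hypothetical ratings for three groups (made-up numbers, not the class data).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [5.0, 6.0, 7.0],
}


def f_statistic(samples):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for s in samples for x in s]
    n_total = len(all_values)
    k = len(samples)
    grand_mean = sum(all_values) / n_total

    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-group variability: how far observations are from their own group mean.
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_stat = f_statistic(list(groups.values()))
print(round(f_stat, 2))
```

A larger F means the group means differ by more than the within-group spread would suggest.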

Technical Conditions

Randomness: random sampling or random assignment

Sample sizes: normal populations

Equal standard deviations: check the ratio of sample standard deviations

Need roughly the same shape and spread in each group for a comparison of just the means to be reasonable

Technical Conditions

1) Randomness: random assignment

2) Each population follows a normal distribution

3) Each population has the same standard deviation: 1.794/1.482 < 2
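
The equal-standard-deviation check in item 3 is a one-line calculation. The sketch below assumes that 1.794 and 1.482 are the largest and smallest sample standard deviations among the groups (the slide only shows the two numbers) and that "ratio below 2" is the rule of thumb being applied.

```python
# Sample standard deviations quoted on the slide
# (assumed to be the largest and the smallest of the groups).
sd_largest = 1.794
sd_smallest = 1.482

ratio = sd_largest / sd_smallest
# Rule of thumb from the slide: the condition is reasonable if the ratio is below 2.
equal_sd_ok = ratio < 2

print(round(ratio, 2), equal_sd_ok)
```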

Summary: Comparing several groups

Categorical response
H0: π1 = π2 = … = πI (no association between variables)
Ha: at least one differs (there is an association between variables)
Is test statistic large? Chi-square test
Expands 2 sample z-test

Quantitative response
H0: μ1 = μ2 = … = μI (no association between variables)
Ha: at least one differs (there is an association between variables)
Is test statistic large? ANOVA
Expands 2 sample t-test

Exam 2 comments

Pet owners and CPR

(a) Make sure to interpret the calculated interval

“55% of pet owners” – sample or population?

(b) Technical conditions: use the ones for categorical data

(c) See whether .5 is inside CI

(d) Interpretation of p-value: chance of data at least this extreme if null hypothesis is true

(e) Why is sample size information important? Sampling variability

Exam 2 comments

Anchoring

(a) Make sure it is clear which is which

(b) “TC met”, TOS applet with 2 means

(c) Chicago average estimate is 51K to 1.6 million higher than Green Bay average (direction!)

(d) What does it mean to say it’s significant? What is the actual conclusion to the research question?

Exam 2 comments

Lab 6

Exam 2 comments

Multiple choice

1. B

2. C – either is possible

3. B – small p-value eliminates “random chance” as a plausible explanation

4. B – it’s only unusual if she’s guessing (7s and 11s are only unusual for fair dice)

Extra Credit

More likely to get a value far from the mean with a smaller sample size (e.g., n = 1)
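
The extra credit point can be checked numerically. This is a rough sketch under an assumption the slide does not state: the population is normal, so the sample mean has standard deviation σ/√n. The sample sizes 1, 4, and 25 and the "more than one σ away" cutoff are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_far_from_mu(n, distance_in_sd=1.0):
    """P(|sample mean - mu| > distance_in_sd * sigma), assuming a normal population."""
    # The sample mean has standard deviation sigma / sqrt(n), so in
    # standardized units the cutoff is distance_in_sd * sqrt(n).
    z = distance_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(z))


# Chance that the sample mean lands more than one sigma from mu.
probs = {n: prob_mean_far_from_mu(n) for n in (1, 4, 25)}
for n, p in probs.items():
    print(n, round(p, 4))
```

The probability shrinks quickly as n grows, which is why a single observation (n = 1) is the most likely to fall far from the mean.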

Next Topic: Two quantitative variables

1) Graphical summary

2) Numerical summary

3) Model to allow predictions

4) Inference beyond sample data

Activity 26-1 (p. 532)

Have a sample of 20 homes for sale in Arroyo Grande in 2007

Variable 1 = house price

Variable 2 = house size

Is there a relationship between these 2 variables?

Does knowing the house size help us predict its price?

1) Graphical summary: scatterplot

Price vs. size

1. Direction: positive or negative?

2. Strength: how closely do the points follow the pattern?

3. Form: linear?

Describing Scatterplots

Activity 26-3 (p. 536)

[Figure: example scatterplots labeled by direction (Positive, None, Negative) and by strength (Strong, Weak, Strong)]

Direction, strength, form (linear or not)

2) Numerical summary: Correlation coefficient (Act 27-1)

[Figure: series of scatterplots with r = .994, .889, .510, -.081, -.450, -.721, -.907]

Temperatures vs. Month

Direction: positive then negative

Form: nonlinear

Strength: very strong

r = .257
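
The temperature example shows that r measures only linear association: a very strong curved pattern can still give a small r. A minimal sketch with made-up points (not the temperature data) computes r by hand for a perfect line and for a perfect rise-then-fall curve.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data: the same x values paired with two different patterns.
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]    # points exactly on a line
y_curved = [-(v ** 2) for v in x]    # rises, then falls (positive then negative)

r_linear = correlation(x, y_linear)
r_curved = correlation(x, y_curved)
print(r_linear, r_curved)
```

Both patterns are perfectly predictable from x, yet only the linear one has r near 1. Always look at the scatterplot before trusting r.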

Example 1: Price vs. Size

r = .780

What do you learn from these numerical and graphical summaries?

Turn in, with partner: Activity 26-6 parts b, c, and e

For Thursday: Pre-lab for Lab 9

For Monday: Activity 26-7, HW 7

2) Guess the correlation

Applet

3) Model

IF it is linear, what line best summarizes the relationship?

Demo

Moral: The “least squares regression line” minimizes the sum of the squared residuals
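
The "minimizes the sum of the squared residuals" claim can be checked on a small example. The sketch below uses five made-up points (not the house data), computes the least squares slope and intercept from the usual formulas, and confirms that nudging either value makes the sum of squared residuals larger.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Try nearby lines: shift the intercept and/or slope a little.
others = [
    sum_squared_residuals(a + da, b + db, xs, ys)
    for da in (-0.5, 0, 0.5)
    for db in (-0.2, 0, 0.2)
    if (da, db) != (0, 0)
]
print(round(a, 2), round(b, 2), round(best, 2), round(min(others), 2))
```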

Interpreting the equation (p. 577)

ŷ = a + bx, where x is the explanatory variable and ŷ is the predicted value of the response variable

a = intercept, b = slope

Slope = predicted change in response associated with a one-unit increase in the explanatory variable

Intercept = predicted value of response when explanatory variable = 0

3) Model?

Price-hat = 265222 + 169 size

Slope = each additional square foot in house size is associated with a $169 increase in predicted price (price per foot)

Be a little careful here: don’t sound too “causal”. I really do like the “predicted” in here

Intercept = a house of size zero (empty lot?) is predicted to cost $265,222

Be a little careful here: we don’t have any houses in the data set with size near 0…

Using the model

Price-hat = 265222 + 169 size

Predicted price for a 1250 square foot house?

Predicted price for a 3000 square foot house?

Extrapolation: Very risky to use regression equation to predict values far outside the range of x values used to derive the line!
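
The two predictions are just the fitted equation evaluated at each size. A minimal sketch using the rounded coefficients from the slide (the function name is only for illustration). It does not check for extrapolation, because the slides do not list the range of house sizes in the sample.

```python
INTERCEPT = 265222  # a: predicted price (dollars) at size 0
SLOPE = 169         # b: predicted dollars per additional square foot


def predicted_price(size_sq_ft):
    """Price-hat = 265222 + 169 * size, using the rounded slide coefficients."""
    return INTERCEPT + SLOPE * size_sq_ft


price_1250 = predicted_price(1250)
price_3000 = predicted_price(3000)
print(price_1250, price_3000)
```

Whether either prediction is trustworthy depends on whether 1250 and 3000 square feet fall inside the range of sizes used to fit the line.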

4) Is this relationship statistically significant?

Is it possible there is no relationship between house price and size in the population of all homes for sale at that time, and we just happened to coincidentally obtain this relationship in our random sample?

Or is this relationship strong enough to convince us it didn’t happen just by chance but reflects a genuine relationship in the population?

p. 605

Let β represent the slope of the population regression line

H0: β = 0; no relationship between price and size in population

Ha: β ≠ 0; is a relationship (β < 0: negative; β > 0: positive)

Idea: Want to compare the observed sample slope to zero, does it differ more than we would expect by chance?

Assume β = 0

[Figure: distribution of sample slopes, showing the variation in sample slopes, the standard error SE(b), and our slope, 169]

How many standard deviations away is our slope?

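Minitab

The regression equation is
Price = 265222 + 169 Size (sq ft)

Predictor        Coef  SE Coef     T      P
Constant       265222    42642  6.22  0.000
Size (sq ft)   168.59    31.88  5.29  0.000

Reading the output: the first line is the regression equation (add the hat to Price); the Constant coefficient is a; the Size coefficient is b; SE Coef for Size is SE(b); P is the two-sided p-value

t = (observed slope – hypothesized slope)/(standard error of slope) = (b – 0)/SE(b) = (168.59 – 0)/31.88 = 5.29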
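
The T column can be reproduced from the Coef and SE Coef columns. A minimal sketch using the Minitab values above. The degrees of freedom line uses n – 2 with the sample of 20 homes; the p-value itself comes from a t distribution and is left to Minitab here.

```python
def t_statistic(observed, hypothesized, standard_error):
    """How many standard errors the observed value is from the hypothesized value."""
    return (observed - hypothesized) / standard_error


# Values copied from the Minitab output.
t_slope = t_statistic(168.59, 0, 31.88)
t_intercept = t_statistic(265222, 0, 42642)

# Simple linear regression uses n - 2 degrees of freedom (n = 20 homes).
df = 20 - 2

print(round(t_slope, 2), round(t_intercept, 2), df)
```

A slope more than 5 standard errors from zero would be very unusual if β = 0, which matches the reported p-value of 0.000.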

Turn in, with partner: Price vs. pages: Interpret slope/evaluate p-value

For Tuesday: Activities 26-7, 28-5

Be working on Lab 9 and HW 7

The regression equation is
Price = -3.4 + 0.147 Pages

Predictor        Coef  SE Coef      T      P
Constant        -3.42    10.46  -0.33  0.746
Pages         0.14733  0.01925   7.65  0.000

Describing Scatterplots

Activity 26-6 (p. 539)

Positive, nonlinear, fairly strong

Causation?

Strength: How closely do the points follow the pattern?

Direction, strength, form (linear or not)

For Monday

Activities 26-7, 28-5

Be working on Lab 9 and HW 7