Lesson 3 - 3 Correlation and Regression Wisdom


DESCRIPTION

Lesson 3 - 3. Correlation and Regression Wisdom. Knowledge Objectives. Recall the three limitations on the use of correlation and regression. Explain what is meant by an outlier in bivariate data. Explain what is meant by an influential observation and how it relates to regression. - PowerPoint PPT Presentation


Page 1: Lesson 3 - 3

Lesson 3 - 3

Correlation and Regression Wisdom

Page 2: Lesson 3 - 3

Knowledge Objectives

• Recall the three limitations on the use of correlation and regression.

• Explain what is meant by an outlier in bivariate data.

• Explain what is meant by an influential observation and how it relates to regression.

• Define a lurking variable.

• Give an example of what it means to say “association does not imply causation.”

Page 3: Lesson 3 - 3

Construction Objectives

• Given a scatterplot in a regression setting, identify outliers and influential observations

• Explain how correlations based on averages differ from correlations based on individuals

Page 4: Lesson 3 - 3

Vocabulary

• Influential Observation – an observation that if removed would markedly change the result of the regression calculation

Page 5: Lesson 3 - 3

Limitations

• Correlation and regression describe only linear relationships

• Extrapolation (using the model outside the range of the data) often produces unreliable predictions

• Correlation is not resistant (to outliers!)
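Two of these limitations can be checked with a few lines of arithmetic. The sketch below uses small made-up data sets (not from the lesson) and a hand-written `correlation` helper: a perfectly curved relationship gives r near 0, and a single outlier added to a perfect line pulls r far below 1.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r, computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data 1: a perfect but curved relationship, y = x^2.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# Made-up data 2: a perfect line, then the same line plus one outlier at (6, 0).
line_x = [1, 2, 3, 4, 5]
line_y = [1, 2, 3, 4, 5]
r_line = correlation(line_x, line_y)
r_with_outlier = correlation(line_x + [6], line_y + [0])

print(f"curved data:       r = {r_curve:.3f}")
print(f"perfect line:      r = {r_line:.3f}")
print(f"line + 1 outlier:  r = {r_with_outlier:.3f}")
```

The curved data are perfectly predictable from x, yet r does not show it, which is why the scatterplot should be checked before trusting r.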

Page 6: Lesson 3 - 3

Outliers vs Influential Observation

• Outlier is an observation that lies outside the overall pattern of the other observations

– Outliers in the Y direction will have large residuals, but may not influence the slope of the regression line

– Outliers in the X direction are often influential observations

• Influential observation is one whose removal would markedly change the result of the regression calculation

Page 7: Lesson 3 - 3

Example 1

Does the age at which a child begins to talk predict a later score on a test of mental ability? A study of the development of 21 children recorded the age in months at which they spoke their first word and their later Gesell Adaptive Score (GAS).

Child  Age  GAS     Child  Age  GAS     Child  Age  GAS
  1     15   95       8     11  100      15     11  102
  2     26   71       9      8  104      16     10  100
  3     10   83      10     20   94      17     12  105
  4      9   91      11      7  113      18     42   57
  5     15  102      12      9   96      19     17  121
  6     20   87      13     10   83      20     11   86
  7     18   93      14     11   84      21     10  100

Page 8: Lesson 3 - 3

Example 1 cont

a) What is the equation of the LS regression line used to model this data?

b) What is the interpretation of this data?

y-hat = 109.8738 - 1.127x, with r = -0.64

The scatterplot and the slope of the regression line indicate a negative association: children who begin to speak later tend to have lower test scores than early talkers. The slope suggests that for each additional month of age at first word, the predicted Gesell score decreases by about 1.13 points. The y-intercept has no practical meaning here, because no child speaks a first word at 0 months.
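The line and correlation quoted above can be reproduced from the table with the standard least-squares formulas. This is a minimal sketch in plain Python; the helper names `fit_line` and `predict` are illustrative, not part of the lesson.

```python
from math import sqrt

# (age in months at first word, Gesell score) for children 1-21, from the table.
data = [(15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93),
        (11, 100), (8, 104), (20, 94), (7, 113), (9, 96), (10, 83), (11, 84),
        (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86), (10, 100)]

def fit_line(points):
    """Return (slope, intercept, r) of the least-squares line of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / sqrt(sxx * syy)

slope, intercept, r = fit_line(data)

def predict(age_months):
    """Predicted Gesell score from the fitted line."""
    return intercept + slope * age_months

print(f"y-hat = {intercept:.4f} + ({slope:.4f})x,  r = {r:.2f}")
print(f"prediction at 12 months:  {predict(12):.1f}")
# Extrapolation far outside the observed ages (7 to 42 months) gives nonsense:
print(f"prediction at 100 months: {predict(100):.1f}")
```

The last line illustrates the extrapolation warning: at 100 months the line predicts a negative score, which no test can produce.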

Page 9: Lesson 3 - 3

Example 1 cont

c) Are there any outliers?

d) Are there any influential observations?

Child #19 is an outlier in the Y-direction and child #18 is an outlier in the X-direction.

Child #18 is an outlier in the X-direction and also an influential observation because it has a strong influence on the positioning of the regression line.
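One way to check these answers numerically is to refit the line with each suspect child removed and compare slopes, and to look for the largest residual. The sketch below repeats the table data so that it runs on its own; `fit_line` is an illustrative helper, not part of the lesson.

```python
# (age in months at first word, Gesell score) for children 1-21, from the table.
data = [(15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93),
        (11, 100), (8, 104), (20, 94), (7, 113), (9, 96), (10, 83), (11, 84),
        (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86), (10, 100)]

def fit_line(points):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def without(child):
    """All observations except the given child (children are numbered from 1)."""
    return [p for i, p in enumerate(data, start=1) if i != child]

slope_all, intercept_all = fit_line(data)
slope_no18, _ = fit_line(without(18))
slope_no19, _ = fit_line(without(19))

shift18 = abs(slope_no18 - slope_all)   # effect of dropping the X-direction outlier
shift19 = abs(slope_no19 - slope_all)   # effect of dropping the Y-direction outlier

# Residual = observed - predicted, using the line fitted to all 21 children.
residuals = {i: y - (intercept_all + slope_all * x)
             for i, (x, y) in enumerate(data, start=1)}
largest = max(residuals, key=lambda i: abs(residuals[i]))

print(f"slope, all children:      {slope_all:.3f}")
print(f"slope without child 18:   {slope_no18:.3f}")
print(f"slope without child 19:   {slope_no19:.3f}")
print(f"largest residual: child {largest}")
```

Dropping child 18 moves the slope much more than dropping child 19, even though child 19 has the largest residual; that contrast is the difference between an influential observation and an outlier in the Y direction.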

Page 10: Lesson 3 - 3

Example 1 cont

[Figures: Scatterplot w/ Regression Line; Residual Plot]

Page 11: Lesson 3 - 3

Lurking or Extraneous Variable

• The relationship between two variables can often be misunderstood unless you take other variables into account

• Association does not imply causation!

• Instances of Rocky Mt spotted fever and drownings reported per month are highly correlated, but neither causes the other; both rise in the warm months (season is the lurking variable)
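A lurking variable can be mimicked with invented numbers. In the sketch below, every value is hypothetical: a monthly "warmth" index drives two series, labeled `fever` and `drownings` only to echo the example above, and each series also gets its own unrelated wobble. The two series are strongly correlated, yet once the warmth contribution is subtracted, what is left is uncorrelated.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r, computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical monthly warmth index, Jan..Dec (the lurking variable).
warmth = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]

# Invented month-to-month wobbles, built to be unrelated to warmth and to each other.
wobble_a = [0.5, -0.5] * 6
wobble_b = [0.5, 0.5, -0.5, -0.5] * 3

# Each series depends on warmth, never on the other series.
fever = [w + a for w, a in zip(warmth, wobble_a)]
drownings = [2 * w + b for w, b in zip(warmth, wobble_b)]

r_raw = correlation(fever, drownings)

# Subtract the part of each series that comes from warmth.
fever_left = [f - w for f, w in zip(fever, warmth)]
drownings_left = [d - 2 * w for d, w in zip(drownings, warmth)]
r_left = correlation(fever_left, drownings_left)

print(f"r between the two series:          {r_raw:.3f}")
print(f"r after removing the warmth part:  {r_left:.3f}")
```

In real data the lurking variable's contribution is not known in advance, so this subtraction is only possible here because the data were constructed.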

Page 12: Lesson 3 - 3

Remember Sampling Distributions

• When we looked at individual values, they had much broader spreads (variances) than when we looked at the distributions of x-bar

• Same is true with correlations based on averaged data – strong correlations may exist between averages, but individuals will have much greater variances

• Correlations based on averages are usually too high when applied to individuals.
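The effect of averaging can be seen with a small constructed data set. The numbers below are invented for illustration: five groups of five individuals, where the group averages fall exactly on a line but individuals scatter widely around their group average.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r, computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented within-group deviations; each list sums to zero.
x_dev = [-4, -2, 0, 2, 4]
y_dev = [30, -30, 0, -30, 30]

# Five groups centered at (0, 0), (10, 10), ..., (40, 40).
groups = [[(10 * g + dx, 10 * g + dy) for dx, dy in zip(x_dev, y_dev)]
          for g in range(5)]

# Correlation across all 25 individuals.
individuals = [p for group in groups for p in group]
r_individuals = correlation([x for x, _ in individuals],
                            [y for _, y in individuals])

# Correlation across the 5 group averages.
mean_x = [sum(x for x, _ in group) / len(group) for group in groups]
mean_y = [sum(y for _, y in group) / len(group) for group in groups]
r_averages = correlation(mean_x, mean_y)

print(f"r based on individuals: {r_individuals:.3f}")
print(f"r based on averages:    {r_averages:.3f}")
```

Averaging removes the within-group scatter, so the correlation between averages overstates how well x predicts y for any one individual.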

Page 13: Lesson 3 - 3

Summary and Homework

• Summary

– Correlation and regression must be interpreted with caution

– Plot data to be sure that the relationship is roughly linear and to detect outliers

– Check for influential observations that substantially change the regression line

– Lurking variables may explain the relationship between the explanatory and response variables

• Homework

– pg 242-3, problems 3.63-67