Chapter 7 Regression

Difference between correlation and regression

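• Regression (named for the tendency of extreme scores to regress toward the mean)

• In correlation, there is no distinction between the dependent variable (DV) and the independent variable (IV).

• In regression, Y is the DV (the outcome being predicted) and X is the IV (the predictor).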
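The asymmetry can be checked with a minimal Python sketch on a small made-up data set (the values in `xs` and `ys` and the helper names are hypothetical). The correlation is the same whichever variable comes first, but the least-squares slope changes depending on which variable is treated as the DV.

```python
# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Sample covariance (n - 1 denominator)
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    # Pearson correlation: symmetric in a and b
    return cov(a, b) / (cov(a, a) ** 0.5 * cov(b, b) ** 0.5)

def slope(y, x):
    # Least-squares slope when y is the DV and x is the IV
    return cov(x, y) / cov(x, x)

print(corr(xs, ys), corr(ys, xs))    # the same value both ways
print(slope(ys, xs), slope(xs, ys))  # different: which variable is the DV matters
```

Here the slope of Y on X is 0.6 and the slope of X on Y is 1.0; their product equals r squared (0.6).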

Make sure you use the right graphic: a scatterplot with a regression line

Regression equation

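• R-square = r × r (in simple regression) = proportion of variance explained = coefficient of determination

• Y = a + bX + e
  – a = intercept: the predicted value of Y when X = 0, i.e., where the regression line crosses the Y-axis
  – b = slope = regression coefficient = parameter estimate (called a beta weight when the variables are standardized)
  – e = error (residual), assumed to average zero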
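A minimal Python sketch of the equation above, using hypothetical data: it estimates a and b with the standard least-squares formulas, computes the residuals e, and checks that R-square (1 minus the residual sum of squares over the total sum of squares) matches r × r in simple regression.

```python
# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx            # slope (regression coefficient)
a = y_bar - b * x_bar    # intercept: predicted Y when X = 0

# Residuals e = Y - (a + bX)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
ss_res = sum(e ** 2 for e in residuals)

r_squared = 1 - ss_res / syy      # proportion of variance explained
r = sxy / (sxx * syy) ** 0.5      # Pearson correlation

print(f"Y = {a:.2f} + {b:.2f}X, R^2 = {r_squared:.3f}, r*r = {r * r:.3f}")
# prints: Y = 2.20 + 0.60X, R^2 = 0.600, r*r = 0.600
```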

How can we get the slope?

• Slope = rise / run
  – Rise = change in Y
  – Run = change in X
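Rise over run fits in a tiny Python function. The function name and the two points are hypothetical; the points are assumed to lie on the same straight line (here Y = 2 + 0.5X).

```python
def slope_from_points(p1, p2):
    """Slope = rise / run between two points (x, y) on a straight line."""
    (x1, y1), (x2, y2) = p1, p2
    rise = y2 - y1   # change in Y
    run = x2 - x1    # change in X
    if run == 0:
        raise ValueError("run is zero: a vertical line has no defined slope")
    return rise / run

# Two points on the hypothetical line Y = 2 + 0.5X
print(slope_from_points((2, 3), (6, 5)))  # rise 2 / run 4 = 0.5
```

For a straight line, any two points give the same ratio. Two raw data points generally do not, which is why a regression slope is estimated from all of the data at once.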

How can we get the regression line?

• The best-fitting line is the one that minimizes the sum of the squared residuals (the least-squares criterion).
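A minimal Python sketch of the least-squares criterion, assuming made-up data and a made-up "hand-fit" line chosen by eye: it computes the sum of squared residuals (SSR) for the least-squares line, obtained from the usual closed-form formulas, and for the hand-fit line.

```python
# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [1.5, 3.1, 3.9, 5.2, 5.8, 7.4]

def ssr(a, b):
    """Sum of squared residuals for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least-squares estimates
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b_ols = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a_ols = y_bar - b_ols * x_bar

# A line picked by eye (hypothetical values)
a_hand, b_hand = 0.0, 1.3

print(ssr(a_ols, b_ols), ssr(a_hand, b_hand))  # the least-squares line has the smaller SSR
```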

• To find the best fit, I can look at the scatterplot and fit a line by hand (the brown line).

• The bottom panel shows the residuals. I try to balance the residuals above zero against those below zero.

• The green line is the one calculated by the software. I was wrong! My hand-fit line is off; it is not the best fit.

• When the line is fitted correctly, the sum of the squared residuals is the smallest among all possible lines, and the points in the residual plot are scattered evenly around zero.
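Both properties can be checked numerically. In this minimal sketch (hypothetical data), the residuals from the least-squares line sum to approximately zero, so those above and below zero balance out, and nudging the intercept or slope in any direction gives a larger sum of squared residuals.

```python
import itertools

# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [1.5, 3.1, 3.9, 5.2, 5.8, 7.4]

# Least-squares intercept (a) and slope (b)
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

def ssr(a_, b_):
    """Sum of squared residuals for the line Y = a_ + b_ X."""
    return sum((y - (a_ + b_ * x)) ** 2 for x, y in zip(xs, ys))

# With an intercept in the model, residuals above and below zero cancel out
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals))  # approximately 0, up to floating-point rounding

# Nudge the intercept and slope in every direction: each nudged line fits worse
best = ssr(a, b)
all_worse = all(
    ssr(a + da, b + db) > best
    for da, db in itertools.product([-0.1, 0.0, 0.1], repeat=2)
    if (da, db) != (0.0, 0.0)
)
print(all_worse)  # True
```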

Correlation does not necessarily imply causation

• "Many children who received vaccines have autism. Therefore, vaccines cause autism!"

• Christopher Hitchens argued that because so much violence in history was committed by religious people, religiosity inspired cruelty.
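One common way a correlation appears without any causal link is a lurking (confounding) variable that drives both X and Y. The simulation below is a hypothetical sketch, not a model of either example above: Z affects both X and Y, X has no effect on Y, yet X and Y are correlated. Once Z is regressed out of both, the correlation essentially disappears.

```python
import random

random.seed(1)
n = 10_000

# Hypothetical lurking variable Z drives both X and Y; X has no effect on Y
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

def corr(a, b):
    """Pearson correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

def residualize(v, on):
    """Residuals from the least-squares regression of v on `on`."""
    mv, mo = sum(v) / len(v), sum(on) / len(on)
    b = sum((o - mo) * (w - mv) for o, w in zip(on, v)) / sum((o - mo) ** 2 for o in on)
    a = mv - b * mo
    return [w - (a + b * o) for o, w in zip(on, v)]

print(corr(x, y))                                  # clearly positive (0.5 in theory)
print(corr(residualize(x, z), residualize(y, z)))  # near 0 once Z is controlled
```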

Misinterpretation of regression model: Ecological fallacy

• This regression model shows a negative relationship between GNI per capita and happiness scores, which seems to imply that the more money you earn, the less happy you are.

• Should I ask my boss to cut my salary?
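The ecological fallacy can be reproduced with a few invented numbers. In this hypothetical Python sketch, income and happiness are positively related among individuals within every country, yet the country averages line up along a negative slope. The countries and values are made up for illustration; they are not GNI or happiness data.

```python
# Hypothetical individuals grouped by country: (income, happiness)
countries = {
    "A": [(1, 7), (2, 8), (3, 9)],
    "B": [(4, 5), (5, 6), (6, 7)],
    "C": [(7, 3), (8, 4), (9, 5)],
}

def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)

# Aggregate (country-level) data: one mean point per country
country_means = [
    (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    for pts in countries.values()
]

print("country-level slope:", ols_slope(country_means))  # negative (-2/3)
for name, pts in countries.items():
    print(f"within {name}:", ols_slope(pts))             # 1.0 in every country
```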

• If I remove two outliers, the regression line becomes flat. Does that mean that whatever you earn has no impact on your happiness?

• Should I sit here, enjoy my life, and do nothing?
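How much two points can move a least-squares line is easy to check. In this sketch the data are hypothetical: six points with no trend plus two extreme cases, fitted with and without the extremes.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical data: a trendless cloud of six points plus two extreme cases
xs = [1, 2, 3, 4, 5, 6, 10, 11]
ys = [4, 6, 5, 5, 6, 4, 1, 0]

print(fit_line(xs, ys))            # negative slope, driven by the last two points
print(fit_line(xs[:-2], ys[:-2]))  # (5.0, 0.0): the line is flat without them
```

Whether excluding an outlier is justified is a substantive decision about the data; the code only shows how sensitive the slope is to those two points.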

Using summary data to make inferences about individuals

• Another well-known example is a Wall Street Journal report (June 22, 1995) showing a negative correlation between the rank of each state's average SAT score and its average expenditure on education. At first glance, this implies that spending less on education would improve SAT scores. However:
  – SAT rank is ordinal.
  – Cost of living and expenditures vary from state to state.
  – Not everyone takes the SAT; some students take the ACT instead.

• When we examine achievement data from the National Assessment of Educational Progress (NAEP), which is based on a representative sample, we find a positive relationship between NAEP scores and expenditures.
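The point that not everyone takes the SAT can be illustrated with a made-up example. The sketch assumes, purely for illustration, that every state has the same students, that each unit of spending adds 5 points to every student's score, and that higher-spending states have a larger share of students taking the test, with the strongest students the most likely to take it. Under those assumptions, test-taker averages fall as spending rises, while averages over all students (the population a representative sample is meant to reflect) rise.

```python
# Hypothetical illustration: identical ability distributions in every state
abilities = list(range(0, 100, 10))  # 10 students: 0, 10, ..., 90
EFFECT_PER_UNIT = 5                  # assumed benefit of one unit of spending

# state: (spending units, share of students taking the test), made-up values
states = {"S1": (1, 0.2), "S2": (2, 0.5), "S3": (3, 0.8)}

def mean(v):
    return sum(v) / len(v)

results = {}
for name, (spending, share) in states.items():
    # Every student's score rises with spending
    scores = sorted(ability + EFFECT_PER_UNIT * spending for ability in abilities)
    k = round(share * len(scores))
    takers = scores[-k:]  # only the strongest students take the test
    results[name] = (mean(takers), mean(scores))
    print(name, "spending:", spending,
          "test-taker mean:", mean(takers), "all-student mean:", mean(scores))
```

Test-taker means come out as 90, 80, and 70, while all-student means are 50, 55, and 60: the aggregate relationship flips depending on who is included.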

Misinterpretation of regression model

• An alien civilization visited our planet and collected data about our physical growth. They observed our children (from 1 to 10 years old) and constructed a regression model of height on age. The aliens concluded that humans are a dangerous species that will threaten them. What’s wrong with their regression model?

• In the 1980s, many experts predicted that by the end of the 20th century Japan would overtake the US to become the world’s largest economy. Today many experts make similar predictions about China. What is the shortcoming of this predictive model?
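Both questions involve extrapolation: applying a regression line far outside the range of X used to fit it. A minimal Python sketch with made-up heights for ages 1 to 10 (illustrative values, not measured data) shows what the fitted line predicts for adults.

```python
# Hypothetical average heights (cm) for children aged 1 to 10
ages = list(range(1, 11))
heights = [75, 87, 95, 102, 109, 115, 121, 127, 132, 138]

# Least-squares fit of height on age
n = len(ages)
mx, my = sum(ages) / n, sum(heights) / n
b = sum((x - mx) * (y - my) for x, y in zip(ages, heights)) / sum((x - mx) ** 2 for x in ages)
a = my - b * mx

def predict(age):
    """Predicted height (cm) from the fitted line."""
    return a + b * age

for age in (5, 10, 40, 80):
    print(age, round(predict(age)))  # ages beyond 10 are extrapolations
```

Within ages 1 to 10 the line tracks the data reasonably well, but at age 80 it predicts a height of more than 5 meters, because nothing in the fitted range tells the model that growth stops.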

Black swan vs. elephant in the room

• The book "The Signal and the Noise" by Nate Silver also used the collapse of Japan in the early 1990s as an example. Japan's boom in the 1980s was unsustainable because real estate prices could not go up forever.

• Before 2008, most US experts did not predict that a crash like Japan's could happen in the US. But Silver asserted that the 2008 crash was not a Black Swan; rather, it was an elephant in the room.

• It was right there, but no one saw it, or people refused to see it. This is a basic lesson of regression from Statistics 101: nothing can keep rising forever!

In-class activity (2 points)

• Download the data set “visualization_data.jmp” from http://www.creative-wisdom.com/teaching/299/ch1/.

• Use Fit Y by X to run a simple regression model. Use scores as the dependent variable (Y) and GPA as the independent variable (X). Select Fit Line from the red triangle menu to get the regression result. Can GPA predict test scores?

• Are there any outliers? If so, exclude them and rerun the regression model. Is the result different?