MATH 1107 Elementary Statistics Lecture 7 Regression Analysis

Page 1: MATH 1107 Elementary Statistics

MATH 1107 Elementary Statistics

Lecture 7: Regression Analysis

Page 2: MATH 1107 Elementary Statistics

Without question, Regression Analysis is the most heavily used tool in Statistical Modeling.

This is true because it enables you to predict or explain a dependent variable based upon one or more independent variables.

Regression Analysis is used in almost every industry.

MATH 1107 – Regression Analysis

Page 3: MATH 1107 Elementary Statistics

For example:

• If you were a sports agent, how would you propose a “reasonable” contract salary for your client?
• If you are interested in selling your house, how can you determine an appropriate market price?
• If you are the head of the admissions department at a university, how do you decide who gets accepted?
• If you are an investment banker, how do you decide which funds to hold in your portfolio?

Page 4: MATH 1107 Elementary Statistics

All of the “variables” underlined in these examples (contract salary, market price, who gets accepted, which funds to hold) would be the dependent variables. What would be the associated independent variables that we might use to predict or explain them?

Page 5: MATH 1107 Elementary Statistics

The first step in predicting or explaining a dependent variable using an independent variable* is to evaluate the correlation between the two variables using a scatterplot.

Let's return to Median Household Income and Death Rate…

* Although many independent variables can be used in regression analysis, these notes use only one.

Page 6: MATH 1107 Elementary Statistics

STATE          Median Household Income
Alabama        25,891
Alaska         41,969
Arizona        29,593
Arkansas       23,893
California     35,173
Colorado       32,716
Connecticut    41,059
Delaware       35,739
Florida        27,456
Georgia        28,889
Hawaii         42,171
Idaho          27,784
Illinois       31,707

Page 7: MATH 1107 Elementary Statistics

[Figure: Median Household Income and Death Rate. Scatterplot of Death Rate (y-axis, 3.0 to 13.0) against Median Income (x-axis, 20,000 to 45,000) with a linear trendline: y = -0.0002x + 13.255, R2 = 0.376]

Page 8: MATH 1107 Elementary Statistics

Using the =CORREL(array1, array2) function in Excel, we can determine that the correlation between Median Income and Death Rate is -.61.

This indicates three things:

1. The relationship is fairly strong: the value of -.61 is closer to -1 than it is to 0.
2. The direction is negative/inverse, meaning that as one variable goes up, the other tends to go down.
3. The R2 value of a predictive regression equation using these two variables is about .37, which is the square of the correlation.
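The link between the correlation and R2 can be checked with a few lines of Python. This is a minimal sketch, not part of the original lecture: `pearson_r` is a hypothetical helper that computes the same Pearson correlation that CORREL returns, and the value -0.61 is the rounded correlation quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Rounded correlation reported for Median Income vs. Death Rate.
r = -0.61

# With a single independent variable, R2 is the square of r.
r_squared = round(r ** 2, 2)
print(r_squared)
```

Because the quoted correlation is itself rounded, the squared value agrees with the chart's R2 only to about two decimal places.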

Page 9: MATH 1107 Elementary Statistics

Since the correlation is fairly strong, we can use these two variables to create a linear model. A linear model:

• will have an equation in the form y=mx+b;
• will be the “best fit” of the data;
• will minimize the overall (squared) vertical distances between the “actual” data points and the “predicted” points (each such distance is called a “residual”);
• will enable us to predict the death rates in other states that were NOT included in the original dataset.
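The “best fit” idea can be sketched in Python using the standard least-squares formulas for a line with one independent variable. This is an illustration under stated assumptions: `fit_line` and `residuals` are hypothetical helper names, and the toy numbers are made up for demonstration; they are NOT the state data from the lecture.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: the line passes through the means
    return m, b


def residuals(xs, ys, m, b):
    """Actual minus predicted value for every data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical toy data (invented for illustration only).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [9.0, 7.5, 5.0, 4.5]

m, b = fit_line(toy_x, toy_y)
res = residuals(toy_x, toy_y, m, b)
print(m, b)
print(res)
```

For a least-squares line with an intercept, the residuals sum to zero (up to floating-point error), which is a quick sanity check on any fit.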

Page 10: MATH 1107 Elementary Statistics

From this analysis, the best fit line is:

y = -0.0002x + 13.255

This equation was provided by Excel (tick the “Display Equation on Chart” option under the “Add Trendline” function).

A better way to represent this equation is:

State Death Rate = (-0.0002 * Median State Income) + 13.255
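Written as code, the fitted equation is a one-line function. This is a minimal sketch: the constant and function names are hypothetical, the coefficients are the ones displayed on the chart (which are likely rounded), and the income of 30,000 is an arbitrary example value rather than a figure from the dataset.

```python
SLOPE = -0.0002      # change in death rate per dollar of median income
INTERCEPT = 13.255   # value of the line at an income of 0


def predict_death_rate(median_income):
    """Predicted state death rate from the fitted line y = mx + b."""
    return SLOPE * median_income + INTERCEPT


# Arbitrary example income, chosen only to show how the function is used.
example_prediction = predict_death_rate(30_000)
print(round(example_prediction, 3))
```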

Page 11: MATH 1107 Elementary Statistics

Let's interpret these values directly:

-.0002 is the slope of the line. It can be translated directly to mean “For every one dollar of additional median income, the predicted death rate decreases by .0002”.

The slope tells you how much the dependent variable changes for a one-unit change in the independent variable.
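Since a one-dollar change is tiny, it can help to scale the slope to larger income differences. The sketch below assumes the rounded slope shown on the chart; `predicted_change` is a hypothetical helper name.

```python
SLOPE = -0.0002  # rounded slope from the chart equation


def predicted_change(income_change):
    """Predicted change in death rate for a given change in median income."""
    return SLOPE * income_change


# Show the effect of income differences of $1, $1,000 and $10,000.
for delta in (1, 1_000, 10_000):
    print(delta, round(predicted_change(delta), 4))
```

Under this model, a state with a median income $10,000 higher has a predicted death rate about 2.0 lower.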

Page 12: MATH 1107 Elementary Statistics

Let's interpret these values directly:

13.255 is the y-intercept. Algebraically, this is the point at which the line crosses the y-axis, where the x-value is 0. Since it is not reasonable to have a state with 0 median income, the intercept is not really interpreted directly.

Page 13: MATH 1107 Elementary Statistics

Now, using the model we developed, predict the death rates for the states below:

STATE            MEDIAN INCOME
Virginia         38,223
Washington       34,064
West Virginia    20,301
Wisconsin        33,415
Wyoming          30,379
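One way to work this exercise is to plug each of the five median incomes (Virginia through Wyoming) into the fitted equation. The sketch below uses the rounded coefficients from the chart, so the results are approximate; the variable and function names are hypothetical.

```python
SLOPE = -0.0002
INTERCEPT = 13.255

# Median incomes for the five states listed in the exercise.
median_income = {
    "Virginia": 38_223,
    "Washington": 34_064,
    "West Virginia": 20_301,
    "Wisconsin": 33_415,
    "Wyoming": 30_379,
}


def predict_death_rate(income):
    """Predicted death rate from the line y = -0.0002x + 13.255."""
    return SLOPE * income + INTERCEPT


predictions = {state: predict_death_rate(x) for state, x in median_income.items()}

for state, rate in predictions.items():
    print(f"{state}: {rate:.2f}")
```

With these coefficients the predictions come out to roughly 5.61 (Virginia), 6.44 (Washington), 9.19 (West Virginia), 6.57 (Wisconsin) and 7.18 (Wyoming).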

Page 14: MATH 1107 Elementary Statistics

Now, let's determine our “residuals”, or how far off we were for each prediction.
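A residual is usually computed as the actual value minus the predicted value. The sketch below shows the calculation; the actual death rates are not given in these notes, so the “actual” value and the income used here are hypothetical placeholders, not real state data.

```python
SLOPE = -0.0002
INTERCEPT = 13.255


def predict_death_rate(income):
    """Predicted death rate from the line y = -0.0002x + 13.255."""
    return SLOPE * income + INTERCEPT


def residual(actual, predicted):
    """Positive when the model under-predicts, negative when it over-predicts."""
    return actual - predicted


# Hypothetical placeholder values, for illustration only.
hypothetical_income = 30_000
hypothetical_actual_rate = 8.0

predicted = predict_death_rate(hypothetical_income)
error = residual(hypothetical_actual_rate, predicted)
print(round(predicted, 3), round(error, 3))
```

Repeating this for each state, with the true death rates in place of the placeholder, gives one residual per prediction.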