MATH 1107 Elementary Statistics
Lecture 7 Regression Analysis
Without question, Regression Analysis is the most heavily used tool in Statistical Modeling.
This is true because it enables you to predict or explain a dependent variable based upon one or more independent variables.
Regression Analysis is used in almost every industry.
MATH 1107 – Regression Analysis
For example:
• If you were a sports agent, how would you propose a “reasonable” contract salary for your client?
• If you are interested in selling your house, how can you determine an appropriate market price?
• If you are the head of the admissions department in a university, how do you decide who gets accepted?
• If you are an investment banker, how do you decide which funds to hold in your portfolio?
In each of these examples, the quantity to be determined (the contract salary, the market price, who gets accepted, which funds to hold) is the dependent variable. What independent variables might we use to predict or explain each one?
The first step in predicting or explaining a dependent variable using an independent variable* is evaluating the correlation of the two variables using a scatterplot.
Let's return to Median Household Income and Death Rate…
* Although many independent variables can be used in regression analysis, in these notes we will use only one.
STATE          Median Household Income
Alabama        25,891
Alaska         41,969
Arizona        29,593
Arkansas       23,893
California     35,173
Colorado       32,716
Connecticut    41,059
Delaware       35,739
Florida        27,456
Georgia        28,889
Hawaii         42,171
Idaho          27,784
Illinois       31,707
[Figure: Median Household Income and Death Rate. Scatterplot with Median Income on the x-axis (20,000 to 45,000) and Death Rate on the y-axis (3.0 to 13.0), with fitted trendline y = -0.0002x + 13.255 and R2 = 0.376.]
Using the =CORREL(array1, array2) function in EXCEL, we can determine that the correlation between Median Income and Death Rate is -.61.
This indicates three things:
1. The relationship is fairly strong: the value of -.61 is closer to -1 than it is to 0.
2. The direction is negative/inverse, meaning that as one variable goes up, the other goes down.
3. The R2 value of a predictive regression equation using these two variables is .37 (the square of the correlation).
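The link between the correlation and R2 can be checked with a few lines of Python. This is only a sketch: `pearson_r` is a hypothetical helper name, the small `xs`/`ys` lists are synthetic points used as a sanity check (not the state data), and -0.61 is the correlation reported in these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Sanity check on SYNTHETIC points that lie exactly on a downward-sloping line:
# a perfect inverse relationship should give a correlation of -1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-2.0 * x + 1.0 for x in xs]
r_check = pearson_r(xs, ys)

# With one independent variable, R2 is the square of the correlation.
r = -0.61            # correlation reported in the notes
r_squared = r ** 2
print(round(r_squared, 2))
```

Squaring removes the sign, so R2 describes the strength of the linear relationship but not its direction.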
Since the correlation is fairly strong, we can use these two variables to create a linear model. A linear model:
• will have an equation in the form y=mx+b;
• will be the “best fit” of the data;
• will minimize the sum of the squared distances between the “actual” data points and the “predicted” points (each such distance is called a “residual”);
• will enable us to predict the death rates in other states that were NOT included in the original dataset.
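As a rough illustration of how a "best fit" line can be computed, the sketch below uses the standard least-squares formulas for m and b. The function name `fit_line` is hypothetical. The incomes are taken from the table earlier in these notes, but the y-values are synthetic: they are generated from the trendline equation itself, purely to check that the fit recovers a known line. They are not real death rates.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Median incomes for five states from the notes.
incomes = [25891, 41969, 29593, 23893, 35173]

# SYNTHETIC y-values placed exactly on the line y = -0.0002x + 13.255.
synthetic_rates = [-0.0002 * x + 13.255 for x in incomes]

slope, intercept = fit_line(incomes, synthetic_rates)
print(slope, intercept)
```

With real data the points do not sit exactly on a line, and the same formulas return the line whose squared residuals add up to the smallest possible total.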
From this analysis, the best fit line is:
y = -0.0002x + 13.255
This equation was provided by EXCEL (tick the “Display Equation on Chart” option under the “Add Trendline” function).
A better way to represent this equation is:
State Death Rate = (-0.0002 * Median State Income) + 13.255
Let's interpret these values directly:
-.0002 is the slope of the line. It can be translated directly to mean “For every one dollar of additional median income, the predicted death rate decreases by .0002”.
The slope tells you how the dependent variable changes with one unit change in the independent variable.
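The slope interpretation can be checked numerically. In this sketch the function name `predicted_death_rate` and the starting income of 30,000 are assumptions chosen only for illustration. Because the model is a straight line, any other starting income gives the same differences.

```python
SLOPE = -0.0002      # from the trendline equation in the notes
INTERCEPT = 13.255

def predicted_death_rate(median_income):
    """Death rate predicted by the linear model for a given median income."""
    return SLOPE * median_income + INTERCEPT

base = predicted_death_rate(30000)        # hypothetical starting income

# One extra dollar changes the prediction by the slope.
change_per_dollar = predicted_death_rate(30001) - base

# One thousand extra dollars changes it by 1000 times the slope.
change_per_thousand = predicted_death_rate(31000) - base

print(round(change_per_dollar, 4), round(change_per_thousand, 1))
```

Thinking in steps of $1,000 is often easier to read: a $1,000 increase in median income corresponds to a predicted decrease of about 0.2 in the death rate.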
13.255 is the Y-intercept. Algebraically, this is the point at which the line crosses the y-axis, where the x-value is 0. Since it is not reasonable to have a state with 0 Median Income, the intercept is not interpreted directly.
Now, using the model we developed, predict the death rates for the states below:
STATE            MEDIAN INCOME
Virginia         38,223
Washington       34,064
West Virginia    20,301
Wisconsin        33,415
Wyoming          30,379
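One way to work through this exercise is to plug each state's median income into the equation. The sketch below does that in Python. The names `median_income`, `predict_death_rate` and `predictions` are chosen for illustration, while the coefficients and incomes come from the notes.

```python
SLOPE = -0.0002      # from the trendline equation in the notes
INTERCEPT = 13.255

# Median incomes for the five states listed in the exercise.
median_income = {
    "Virginia": 38223,
    "Washington": 34064,
    "West Virginia": 20301,
    "Wisconsin": 33415,
    "Wyoming": 30379,
}

def predict_death_rate(income):
    """State Death Rate = (-0.0002 * Median State Income) + 13.255"""
    return SLOPE * income + INTERCEPT

predictions = {state: predict_death_rate(income)
               for state, income in median_income.items()}

for state, rate in predictions.items():
    print(f"{state}: {rate:.2f}")
```

Because the chart displays the slope rounded to four decimal places, these predictions should be treated as approximate.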
Now, let's determine our “residuals”, or how far off we were for each prediction.
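A residual is usually computed as the actual value minus the predicted value. The sketch below shows the calculation for one state. The actual death rates for these states are not given in this section, so `hypothetical_actual` is a made-up placeholder used only to demonstrate the arithmetic, and the function names are likewise assumptions.

```python
def predict_death_rate(income):
    """Prediction from the model in the notes: y = -0.0002x + 13.255."""
    return -0.0002 * income + 13.255

def residual(actual, predicted):
    """How far off the prediction was: actual minus predicted."""
    return actual - predicted

predicted = predict_death_rate(30379)   # Wyoming's median income from the notes

# PLACEHOLDER value for illustration only; NOT a real observed death rate.
hypothetical_actual = 7.5

wyoming_residual = residual(hypothetical_actual, predicted)
print(round(wyoming_residual, 4))
```

With this sign convention, a positive residual means the model predicted too low and a negative residual means it predicted too high.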