Linear Regression and Correlation
Dr. Rick Jerz
Correlation and Regression Analysis
• Correlation Analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.
• Regression Analysis is a technique used to express the relationship between two variables. If the relationship is assumed to be a straight line, this is called “linear regression.”
• Variables must be measured on either an interval or a ratio scale.
Three Questions
1. Are two variables related? (correlation analysis)
2. Is there a linear relationship between two variables? (linear regression analysis)
3. How strong are these relationships?
Correlation and Linear Regression
Correlation & Regression Example
• The sales manager of Copier Sales of America, which has a large sales force throughout the United States and Canada, wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 15 representatives and determines the number of sales calls each representative made last month and the number of copiers sold.
Step 1: Look at the Data (Plot the Data)
• A Scatter Diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.
• The Dependent Variable is the variable being predicted or estimated. It is plotted on the y-axis.
• The Independent Variable provides the basis for estimation. It is the predictor variable. It is plotted on the x-axis.
Step 2: Are they correlated?
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables. It requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00
• Values of -1.00 or 1.00 indicate perfect and strong correlation
• Values close to 0.0 indicate weak correlation
• Negative values indicate an inverse relationship and positive values indicate a direct relationship
Correlation Coefficient – Interpretation
Correlation Coefficient Equation, r
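$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
r = 0.865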
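The sums formula for r can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own calculation: the function name `pearson_r` and the five data points are hypothetical, and the copier sales sample itself is not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data for illustration only (not the copier sales sample).
calls = [1, 2, 3, 4, 5]
sold = [2, 4, 5, 4, 5]
print(round(pearson_r(calls, sold), 4))  # 0.7746
```

A positive result close to 1 indicates a fairly strong direct relationship in this toy sample.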
Correlation Coefficient Calculation
“The Variables are Related”
• A strong coefficient of correlation only means that there is a strong relationship.
• It doesn’t mean that one “causes” the other
• Be careful of spurious relationships
Coefficient of Determination
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation.
• It ranges from 0 to 1
• It does not give any information on the direction of the relationship between the variables
Coefficient of Determination (r²) - Example
• The coefficient of determination, r², is 0.748, found by (0.865)²
• This is a proportion or a percent; we can say that 75 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.
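The "proportion of variation explained" reading can be checked numerically. The sketch below assumes a least squares line is fitted to the data and computes 1 minus the ratio of unexplained to total variation. The function name `r_squared` and the data are hypothetical, chosen only for illustration.

```python
def r_squared(xs, ys):
    """Proportion of the total variation in Y explained by X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # least squares slope
    a = mean_y - b * mean_x       # least squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total variation
    return 1 - sse / sst

# Hypothetical data for illustration only.
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.6

# The slide's own figure: squaring r = 0.865.
print(round(0.865 ** 2, 3))  # 0.748
```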
Step 3: Testing the Significance of the Correlation Coefficient
• H0: ρ = 0 (the correlation in the population is 0)
• H1: ρ ≠ 0 (the correlation in the population is not 0)
• Reject H0 if t > tα/2, n-2 or t < -tα/2, n-2
t-Test Using Tables
The computed t (6.216) falls within the rejection region (4.320); therefore, we reject H0. This means the correlation in the population is not zero. From a practical standpoint, it indicates to the sales manager that there is a correlation between the number of sales calls made and the number of copiers sold in the population of salespeople.
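The extracted slides do not show how the computed t was obtained. A minimal sketch, assuming the standard test statistic for a correlation coefficient, t = r√(n − 2) / √(1 − r²), reproduces the reported value from r = 0.865 and n = 15. The function name `correlation_t` is hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether the population correlation is zero.

    Assumes the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2),
    with n - 2 degrees of freedom.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values taken from the copier sales example: r = 0.865, n = 15.
t = correlation_t(0.865, 15)
print(round(t, 3))  # 6.216
```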
Step 4: Is there a linear relationship?
Regression Analysis
• An equation?
• A linear equation
Example: Robot Repeatability Data
Bivariate Regression Analysis
• Bivariate Regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• This hypothesized relationship may be linear, quadratic, or some other form.
Linear Regression
• Unknown parameters are
  β0 (or a): Intercept
  β1 (or b): Slope
• The assumed model for a linear relationship is
  yi = β0 + β1xi + ei, for all observations (i = 1, 2, …, n)
  or,
  yi = a + bxi + ei, for all observations (i = 1, 2, …, n)
• The error term is not observable and is assumed to be normally distributed with a mean of 0 and a standard deviation of σ.
Linear Regression Model
Ŷ = a + bX
• Ŷ (“Y hat”) is the estimate of Y given X
• a is the Y-intercept
• b is the slope of the line
• X is any value of the independent variable
Which Line is Correct?
Least Squares Principle
• The least squares principle determines a regression equation by minimizing the sum of the squared vertical distances between the actual Y values and the predicted values of Y.
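To make the principle concrete, the sketch below scores two candidate lines by their sum of squared vertical distances. The line with the smaller sum fits better under this criterion. The function name `sum_squared_errors`, the data, and both candidate lines are hypothetical and serve only as an illustration.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between actual and predicted Y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Y-hat = 2.2 + 0.6X is the least squares line for this toy data;
# Y-hat = 2.0 + 0.5X is an arbitrary alternative.
least_squares_sse = sum_squared_errors(2.2, 0.6, xs, ys)
other_sse = sum_squared_errors(2.0, 0.5, xs, ys)
print(least_squares_sse < other_sse)  # True
```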
Calculating the y-Intercept, a, and Slope, b
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
$$a = \bar{Y} - b\bar{X}$$
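The slope and intercept formulas translate directly into code. This is a minimal sketch using the means form of the formulas. The function name `fit_line` and the data are hypothetical, not the copier sales sample.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for Y-hat = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data for illustration only.
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 2.2 0.6
```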
Example: Finding the Regression Equation
Ŷ = 19.9632 + 0.2608X
Example: Using the Regression Equation
• How many copiers would we expect to sell if we made 96 sales calls?
Ŷ = 19.9632 + 0.2608(96)
Ŷ = 45
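The same substitution can be written as a small function. The coefficients come from the regression equation in the example. The function name `predict_copiers` is hypothetical.

```python
def predict_copiers(sales_calls, a=19.9632, b=0.2608):
    """Estimated copiers sold for a given number of sales calls (Y-hat = a + bX)."""
    return a + b * sales_calls

print(round(predict_copiers(96), 2))  # 45.0
```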
Step 3: Testing the Significance of the Slope
• H0: β = 0 (the population slope is 0)
• H1: β ≠ 0 (the population slope is not 0)
• Reject H0 if t > tα/2, n-2 or t < -tα/2, n-2
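The slides do not show the test statistic for the slope. A minimal sketch, assuming the standard form t = b / s_b with s_b = s_y·x / √Σ(X − X̄)² and n − 2 degrees of freedom, is given below. The function name `slope_t` and the data are hypothetical.

```python
import math

def slope_t(xs, ys):
    """t statistic for H0: population slope = 0 (assumes t = b / s_b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
    s_b = s_yx / math.sqrt(sxx)       # standard error of the slope
    return b / s_b

# Hypothetical data for illustration only.
print(round(slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 2.1213
```

With a single predictor, this statistic equals the t computed from the correlation coefficient, so the two tests lead to the same decision.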
Assumptions Underlying Linear Regression
• For each value of X, there is a group of Y values, and these Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
Assumptions: Graphic
Confidence and Prediction Intervals
• Confidence interval
  • Used to estimate the mean value of Y for a given value of X
• Prediction interval
  • Used to predict an individual value of Y for a given value of X
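The interval formulas themselves did not survive extraction. As a sketch, the standard textbook forms are given below, assuming s_{y·x} denotes the standard error of estimate and t the critical value with n − 2 degrees of freedom (symbols not defined in the source):

```latex
% Confidence interval for the mean of Y at a given X
\hat{Y} \pm t \, s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}

% Prediction interval for an individual Y at a given X
\hat{Y} \pm t \, s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}
```

The extra 1 under the square root is why a prediction interval is always wider than the confidence interval at the same X.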
Confidence Interval Estimate
• Use the formula above by substituting the numbers computed in previous slides
• Thus, the 95 percent confidence interval for the average sales of all sales representatives who make 50 calls is from 27.4 up to 38.6 copiers.
Prediction Interval Estimate
• Use the formula above by substituting the numbers computed in previous slides
• Thus, the 95 percent prediction interval for the sales of an individual sales representative who makes 50 calls is from 17.4 up to 48.6 copiers.
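Both interval types can be computed side by side to see why the prediction interval is the wider one. The sketch below assumes the standard textbook formulas and takes the critical t value as an argument rather than looking it up. The function name `interval_half_widths` and the data are hypothetical, and 3.182 is the two-tailed 95 percent critical value for 3 degrees of freedom.

```python
import math

def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y_hat, confidence half-width, prediction half-width) at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))               # standard error of estimate
    y_hat = a + b * x0
    spread = 1 / n + (x0 - mean_x) ** 2 / sxx     # term shared by both intervals
    ci_half = t_crit * s_yx * math.sqrt(spread)       # mean of Y at x0
    pi_half = t_crit * s_yx * math.sqrt(1 + spread)   # individual Y at x0
    return y_hat, ci_half, pi_half

# Hypothetical data for illustration only; n = 5, so df = 3 and t = 3.182.
y_hat, ci_half, pi_half = interval_half_widths(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182
)
print(round(y_hat, 2), round(ci_half, 2), round(pi_half, 2))
```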
Graphic Illustration