Prepared by:
Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia
Serdang
– An extension to Pearson correlation analysis
– Used to:
1. Determine relationship between
variables
2. Make prediction
– Regression analysis is a parametric statistic
– To apply regression analysis
1. DV must be interval or ratio
2. IV must also be interval or ratio.
– If IV is non-metric, need to transform into
dummy variable (assign as 0 and 1)
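The dummy-variable transformation mentioned above can be sketched in a few lines of Python. This is only an illustration: the `gender` data and the helper name `dummy_code` are hypothetical and do not come from the slides.

```python
def dummy_code(values, reference):
    """Code a two-category (non-metric) variable as 0/1.

    The reference category becomes 0 and the other category becomes 1.
    """
    categories = set(values)
    if len(categories) != 2 or reference not in categories:
        raise ValueError("expects exactly two categories, one of them the reference")
    return [0 if v == reference else 1 for v in values]


# Hypothetical non-metric IV with two categories
gender = ["male", "female", "female", "male"]
gender_dummy = dummy_code(gender, reference="male")
print(gender_dummy)  # [0, 1, 1, 0]
```

The coded variable can then be entered into the regression like any other IV.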
To apply regression analysis
1. The independent and dependent variables
are bivariately normally distributed in the
population
2. The cases represent a random sample from
the population
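Derive regression/prediction equation
Hypothesis Testing
– Regression Model
– Slope
– Calculate b1 and b0
$b_1 = \frac{SXY}{SSX} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$
$b_0 = \bar{y} - b_1\bar{x}$
$\hat{Y} = b_0 + b_1X_1$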
21 884.184.533.7ˆ XXY
Regression Model
Slope
Regression equation: Ŷ = b0 + b1X1 + b2X2
Multiple correlation coefficient (R)
Coefficient of Determination (R2)
Descriptive
Inferential
Components of
Hypothesis Tests:
Assessing Regression Model-Fit
Assessing the Predictor Variable (Slope)
Regression Model
HO: Y = β0 + ei
HA: Y = β0 + β1X1 + ei
Slope
HO: β1 = 0
HA: β1 ≠ 0
β1 > 0
β1 < 0
– Equation of a straight line:
Y = mx + c
– Regression assumes a linear relationship
between variables
– Regression equation:
Yi = b0 + b1Xi + ei
[Figure: straight line of Y against X, showing the Y-intercept b0 and the slope b1 = ΔY/ΔX]
[Figure: a plot of paired observations of X and Y; X-axis: average assignment scores (4 to 11), Y-axis: test scores (40 to 100)]
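The best fit line
– Use the least squares method to identify the line
– The line is called the least squares regression line
– This method will minimize SSE
Which one is the best-fit line?
– The line that minimizes the sum of squared differences between the observed data and the line
[Figure: scatter plot of Y against X with a fitted line; points above the line have positive (+) differences and points below it have negative (-) differences]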
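Steps in Regression Analysis
– Calculate b1
$b_1 = \frac{SXY}{SSX} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$
– Calculate b0
$b_0 = \bar{y} - b_1\bar{x}$
– Present the regression/prediction equation
$\hat{Y} = b_0 + b_1X$
Ŷ Predicted value of Y
b0 Y-intercept
b1 Slope (regression coefficient)
[Figure: regression line of Y against X, showing the Y-intercept b0 and the slope b1 = ΔY/ΔX]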
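As a rough sketch of the two calculation steps, the Python below computes b1 and b0 from summary sums. The function name `fit_line_from_sums` is illustrative only; the numbers passed in are the summary statistics of Example 1 in these notes.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) of the least squares line from summary sums."""
    sxy = sum_xy - sum_x * sum_y / n   # SXY
    ssx = sum_x2 - sum_x ** 2 / n      # SSX
    b1 = sxy / ssx                     # slope
    b0 = sum_y / n - b1 * sum_x / n    # intercept: y-bar - b1 * x-bar
    return b0, b1


# Summary statistics of Example 1 (n = 10)
b0, b1 = fit_line_from_sums(n=10, sum_x=72, sum_y=775, sum_x2=544.5, sum_xy=5795.5)
print(f"Y-hat = {b0:.3f} + {b1:.3f}X")  # Y-hat = 18.052 + 8.257X
```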
– Once you have determined the best fit line,
you may want to assess how well the line fits
the actual data
– You are assessing the Goodness-of-Fit of
the regression model
– Use the F-ratio to test the regression model
fit
– Calculate three sums of squares:
1. Sum of squares Total
2. Sum of squares Regression
3. Sum of squares Residual/Error
[Figure: three scatter plots of Y against X illustrating SST, SSR and SSE]
SST uses the differences between the observed
data and the mean value of Y
$SST = \sum (Y - \bar{Y})^2$
SSE uses the differences between the observed
data and the regression line
$SSE = \sum (Y - \hat{Y})^2$
SSR uses the differences between the mean
value of Y and the regression line
$SSR = \sum (\hat{Y} - \bar{Y})^2$
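The three sums of squares can be checked numerically from their definitions. The sketch below uses the raw data of Example 1 in these notes and fits the line with the deviation form of the least squares formulas, which is algebraically the same as the SXY/SSX form. It also confirms that SST = SSR + SSE.

```python
# Raw data of Example 1: X = average assignment score, Y = test score
x = [8.5, 6, 9, 10, 8, 7, 5, 6, 7.5, 5]
y = [88, 66, 94, 98, 87, 72, 45, 63, 85, 77]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept (deviation form)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # observed vs mean
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # observed vs line
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # line vs mean

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
# SST = 2378.500, SSR = 1779.320, SSE = 599.180
```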
– The basic approach to predict the outcome (Y) is
to use the mean
– If the value of SSR is large, the regression
model gives a greater improvement in
prediction over using the mean
– If the value of SSR is small, the regression model is
no better than using the mean
– Coefficient of determination, R2, is a measure
of the proportion of improvement due to the
regression model
– Indicates the proportion of improvement due
to the regression model
– R2 ranges from 0 to 1
– To express as a percentage, multiply R2 by 100
– Represents the amount of variance in the
dependent variable explained by the model
OR independent variable
– Formula to calculate R2:
$R^2 = \frac{SSR}{SST}$
Steps in Hypothesis test:
1. State the null and alternative hypotheses:
HO: Y = β0 + ei
HA: Y = β0 + β1X1 + ei
2. Calculate the test statistic
F-ratio
3. Determine critical value
4. Decision making
5. Conclusion
1. Calculate Sum of Squares
a. Total sum of squares
$SST = \sum Y^2 - \frac{(\sum Y)^2}{n}$
b. Regression sum of squares
$SSR = \frac{(SXY)^2}{SSX}$
c. Error (Residual) sum of squares
$SSE = SST - SSR$
2. Calculate Degrees of Freedom (df)
a. Regression
dfReg = p
b. Error (Residual)
dfError = n – p – 1
c. Total
dfTotal = n – 1
3. Calculate Mean Squares
a. Mean squares Regression (MSR)
$MSR = \frac{SSR}{p}$
b. Mean squares Error (MSE)
$MSE = \frac{SSE}{n - p - 1}$
4. Prepare Summary ANOVA Table
Source      SS   df     MS   F
Regression  SSR  p      MSR  MSR/MSE
Error       SSE  n-p-1  MSE
Total       SST  n-1
Critical value: $F_{p,\,n-p-1}(\alpha)$
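The ANOVA steps above can be sketched as a small helper. The function name `anova_table` is illustrative; the inputs are the SST and SSR of Example 1 in these notes, and the critical value 5.32 is the tabled value quoted there, not something the code computes.

```python
def anova_table(sst, ssr, n, p=1):
    """Build the summary ANOVA quantities for a regression with p predictors."""
    sse = sst - ssr
    df_reg, df_err, df_tot = p, n - p - 1, n - 1
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "df_reg": df_reg, "df_err": df_err, "df_tot": df_tot,
        "MSR": msr, "MSE": mse, "F": msr / mse,
    }


table = anova_table(sst=2378.5, ssr=1779.320, n=10, p=1)
f_critical = 5.32  # F(1, 8) at alpha = .05, taken from the F table as in Example 1
print(f"MSE = {table['MSE']:.3f}, F = {table['F']:.3f}")  # MSE = 74.898, F = 23.757
print("Reject HO" if table["F"] > f_critical else "Fail to reject HO")  # Reject HO
```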
Decision Criteria:
Manual
Criteria            Decision
Fcal > Fcritical    Reject HO
Fcal ≤ Fcritical    Fail to reject HO
SPSS
Criteria            Decision
Sig-F < α           Reject HO
Sig-F ≥ α           Fail to reject HO
Reject HO:
Regression model fits the data
(or there is a significant relationship
between X and Y)
Fail to reject HO:
Regression model does not fit data
(or there is no significant relationship
between X and Y)
From the Summary ANOVA table:
Coefficient of determination, R2
$R^2 = \frac{SSR}{SST}$
Amount of variance in Y explained by X
Ranges: 0 ≤ R2 ≤ 1
Multiple correlation coefficient, R
$R = \sqrt{R^2}$ OR $R = \sqrt{\frac{b_1(SXY)}{SSY}}$
Relationship between X and Y
Ranges: 0 ≤ R ≤ 1
– Test on the regression coefficient (b1) or Slope
i.e. Testing contribution of X on Y
– b1 represents the change in Y resulting from a
unit change in X
– Use t-test for the hypothesis test
– Steps in hypothesis testing
1. State the null and alternative hypotheses
HO: β1 = 0
HA: β1 ≠ 0 (or β1 > 0, or β1 < 0)
2. Calculate the test statistic
t-value
3. Determine critical value
4. Decision making
5. Conclusion
$t = \frac{b_1 - \beta_1}{\sqrt{\frac{MSE}{SSX}}}$
MSE is taken from the Error row of the Summary ANOVA Table
Critical value: $t_{\alpha/2,\,df}$ (two-tailed) OR $t_{\alpha,\,df}$ (one-tailed)
df = n - 2
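A minimal sketch of the slope test statistic follows. The helper name `slope_t` is illustrative; the inputs are the rounded values of Example 1 in these notes. With a single predictor, t squared equals the F-ratio of the model test (up to rounding), which is why the two tests reach the same decision in simple regression.

```python
import math


def slope_t(b1, mse, ssx, beta1=0.0):
    """Return (t, standard error of b1) for testing HO: slope = beta1."""
    se_b1 = math.sqrt(mse / ssx)
    return (b1 - beta1) / se_b1, se_b1


t, se_b1 = slope_t(b1=8.257, mse=74.898, ssx=26.1)
print(f"SE(b1) = {se_b1:.3f}, t = {t:.3f}")  # SE(b1) = 1.694, t = 4.874
print(f"t squared = {t * t:.2f}")            # t squared = 23.76 (F-ratio was 23.757)
```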
Decision Criteria:
Manual
Criteria               Decision
|tcal| > |tcritical|   Reject HO
|tcal| ≤ |tcritical|   Fail to reject HO
SPSS
Criteria               Decision
Sig-t < α              Reject HO
Sig-t ≥ α              Fail to reject HO
Reject HO:
There is significant contribution of X
towards Y (or there is a significant
relationship between X and Y)
Fail to reject HO:
There is no significant contribution of
X towards Y (or there is no significant
relationship between X and Y)
Example 1:
Data were collected from a randomly selected sample to
determine the relationship between average assignment scores
and test scores in statistics. The data are
presented in the table below. Data set:
Scores
ID Assign Test
1 8.5 88
2 6 66
3 9 94
4 10 98
5 8 87
6 7 72
7 5 45
8 6 63
9 7.5 85
10 5 77
Data: 5950 SL Regression 1 Class
1. Calculate b1 and b0 and
derive the prediction equation
2. Test the hypothesis for the
regression model at α = .05
3. Calculate coefficient of
determination and multiple
correlation coefficient.
Interpret the two values.
4. Test hypothesis for the slope
at .05 level of significance.
1. Derive Regression/Prediction equation
ID X Y
1 8.5 88
2 6 66
3 9 94
4 10 98
5 8 87
6 7 72
7 5 45
8 6 63
9 7.5 85
10 5 77
Summary stat:
n 10
ΣX 72
ΣY 775
ΣX2 544.5
ΣY2 62,441
ΣXY 5,795.5
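Ex 1 – Deriving prediction equation
$b_1 = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{5{,}795.5 - \frac{(72)(775)}{10}}{544.5 - \frac{(72)^2}{10}} = \frac{215.5}{26.1} = 8.257$
$b_0 = \bar{y} - b_1\bar{x} = 77.5 - 8.257(7.2) = 18.050$
Prediction equation:
$\hat{y} = 18.05 + 8.257x$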
Interpretation of the regression equation
$\hat{y} = 18.05 + 8.257x$
[Figure: regression line of Y against X with Y-intercept 18.05 and slope ΔY/ΔX = 8.257]
For every 1 unit increase
in X, Y will increase by
8.257 units
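To illustrate how the prediction equation is used, the sketch below plugs values of X into the Example 1 equation, using the SPSS-precision coefficients 18.052 and 8.257. The function name `predict` is illustrative only.

```python
def predict(x, b0=18.052, b1=8.257):
    """Predicted test score for an average assignment score x (Example 1 equation)."""
    return b0 + b1 * x


# Student ID 5 in Example 1 has X = 8 and an observed test score of 87
y_hat = predict(8)
residual = 87 - y_hat
print(f"predicted = {y_hat:.3f}, residual = {residual:.3f}")
# predicted = 84.108, residual = 2.892

# A 1 unit increase in X changes the prediction by exactly b1
print(f"{predict(9) - predict(8):.3f}")  # 8.257
```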
2. Hypothesis test – Regression model
a. Hypotheses
HO: Y = β0 + ei
HA: Y = β0 + β1X1 + ei
b. Calculate test statistic
Sum of squares
$SST = \sum Y^2 - \frac{(\sum Y)^2}{n} = 62{,}441 - \frac{(775)^2}{10} = 62{,}441 - 60{,}062.5 = 2{,}378.5$
$SSR = \frac{(SXY)^2}{SSX} = \frac{(215.5)^2}{26.1} = 1{,}779.320$
$SSE = SST - SSR = 2{,}378.5 - 1{,}779.320 = 599.180$
Prepare Summary ANOVA table
Source      SS         df  MS         F
Regression  1,779.320  1   1,779.320  23.757
Error       599.180    8   74.898
Total       2,378.500  9
c. Critical value
$F_{1,8}(.05) = 5.32$
d. Decision
Decision criteria
Criteria            Decision
Fcal > Fcritical    Reject HO
Fcal ≤ Fcritical    Fail to reject HO
Since Fcal (23.757) is bigger than Fcritical (5.32), reject HO
e. Conclusion
The regression model fits the data
i.e. There is a significant contribution of X
towards Y
Ex 1 – R and R2
3. R2 and R
$R^2 = \frac{SSR}{SST} = \frac{1{,}779.320}{2{,}378.5} = .748$
About 75% of variance in test scores
is explained by assignment scores
$R = \sqrt{R^2} = \sqrt{.748} = .865$
OR
$R = \sqrt{\frac{b_1(SXY)}{SSY}} = \sqrt{\frac{8.257(215.5)}{2{,}378.5}} = \sqrt{.748} = .865$
There is a positive and high correlation
between assignment scores and test
scores
4. Hypothesis test – Slope
a. Hypotheses
HO: β1 = 0
HA: β1 ≠ 0
b. Calculate test statistic
$t = \frac{b_1 - \beta_1}{\sqrt{\frac{MSE}{SSX}}} = \frac{8.257 - 0}{\sqrt{\frac{74.898}{26.1}}} = \frac{8.257}{1.694} = 4.874$
c. Critical value
$t_{.025,\,8} = 2.306$
d. Decision
Decision criteria
Criteria               Decision
|tcal| > |tcritical|   Reject HO
|tcal| ≤ |tcritical|   Fail to reject HO
Since |t cal| (4.874) is bigger than |t critical| (2.306), reject HO
e. Conclusion
There is a significant contribution of assignment
scores towards test scores
(i.e. there is a significant relationship between
assignment scores and test scores)
SPSS: Regression Analysis
Ex 1 – SPSS analysis output
Variables Entered/Removed^b
Model  Variables Entered            Variables Removed  Method
1      Average assignment scores^a  .                  Enter
a. All requested variables entered.
b. Dependent Variable: Test scores
Model Summary
Model  R       R Square  Adjusted R Square  Std. Error of the Estimate
1      .865^a  .748      .717               8.65433
a. Predictors: (Constant), Average assignment scores
– The method used in the regression analysis is ENTER
– Independent variable: Average assignment scores
– Dependent variable: Test scores
– Multiple correlation coefficient: R = .865
– Coefficient of determination: R Square = .748
Ex 1 – SPSS analysis output
ANOVA^b
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  1779.320        1   1779.320     23.757  .001^a
Residual    599.180         8   74.898
Total       2378.500        9
a. Predictors: (Constant), Average assignment scores
b. Dependent Variable: Test scores
Hypothesis – Regression model (Summary ANOVA table)
– Report F-ratio; however, the decision is based on sig-F
– Since sig-F (.001) is smaller than α (.05), reject HO
– Conclude that the regression model fits the data
Coefficients^a
Model 1                    B       Std. Error  Beta  t      Sig.
(Constant)                 18.052  12.500            1.444  .187
Average assignment scores  8.257   1.694       .865  4.874  .001
B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients
a. Dependent Variable: Test scores
Prediction equation
– b0 = 18.052, b1 = 8.257
– Ŷ = 18.052 + 8.257X
Hypothesis - Slope
– Report t-value; the decision is based on sig-t
– Since sig-t (.001) < α (.05), reject HO
– Conclude assignment scores (X) contribute significantly
towards test scores (Y)
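Several figures in the SPSS output are not derived by hand in these notes. As a rough cross-check, the sketch below reproduces them from the Example 1 quantities. The formulas for the standard error of the estimate, adjusted R Square and the standard error of the constant are the usual textbook ones and are an assumption here, since the slides do not state them.

```python
import math

# Quantities from the manual calculation of Example 1
n, p = 10, 1
sst, sse, ssx = 2378.5, 599.180, 26.1
x_bar, b0 = 7.2, 18.052

mse = sse / (n - p - 1)

# "Std. Error of the Estimate" is the square root of MSE
std_error_estimate = math.sqrt(mse)

# "Adjusted R Square" penalises R Square for the number of predictors
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Standard error and t-value of the constant (b0)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ssx))
t_b0 = b0 / se_b0

print(f"{std_error_estimate:.5f}")         # 8.65433
print(f"{r2:.3f} {adj_r2:.3f}")            # 0.748 0.717
print(f"{se_b0:.3f} {t_b0:.3f}")           # 12.500 1.444
```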
Example 2:
Dr Imran is conducting a study on subordinates’ perception of their superior as
autocratic and their job satisfaction. Summary data collected from a
randomly selected sample are presented in the table below.
1. Calculate b1 and b0 and derive the prediction equation
2. Test the hypothesis for the regression model at α = .01
3. Calculate coefficient of determination and multiple correlation coefficient.
Interpret the two values.
4. Test hypothesis for the slope
at .01 level of significance.
Descriptive Statistics
                    N   Sum      Mean
Perception          12  143.00   11.9167
Job satisfaction    12  168.00   14.0000
X_SQ                12  1785.00  148.7500
Y_SQ                12  2396.00  199.6667
XY                  12  1962.00  163.5000
Valid N (listwise)  12
Data: 5950 SL Regression2 Class
Ex 2 – Deriving prediction equation
1. Derive Regression/Prediction equation
$b_1 = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{1{,}962 - \frac{(143)(168)}{12}}{1{,}785 - \frac{(143)^2}{12}} = \frac{-40}{80.9167} = -.494$
$b_0 = \bar{y} - b_1\bar{x} = 14 - (-.49434)(11.9167) = 19.891$
Prediction equation:
$\hat{y} = 19.891 - .494x$
Interpretation of the regression equation
$\hat{y} = 19.891 - .494x$
[Figure: regression line of Y against X with Y-intercept 19.891 and slope ΔY/ΔX = -.494]
For every 1 unit increase in
X, Y will decrease by .494
unit
Ex 2 – Hypothesis test (Regression model)
2. Hypothesis test – Regression model
a. Hypotheses
HO: Y = β0 + ei
HA: Y = β0 + β1X1 + ei
b. Calculate test statistic
Sum of squares
$SST = \sum Y^2 - \frac{(\sum Y)^2}{n} = 2{,}396 - \frac{(168)^2}{12} = 2{,}396 - 2{,}352 = 44$
$SSR = \frac{(SXY)^2}{SSX} = \frac{(-40)^2}{80.9167} = 19.773$
$SSE = SST - SSR = 44 - 19.773 = 24.227$
Prepare Summary ANOVA table
Source      SS      df  MS      F
Regression  19.773  1   19.773  8.162
Error       24.227  10  2.423
Total       44.000  11
c. Critical value
$F_{1,10}(.01) = 10.04$
d. Decision
Decision criteria
Criteria            Decision
Fcal > Fcritical    Reject HO
Fcal ≤ Fcritical    Fail to reject HO
Since Fcal (8.162) is smaller than Fcritical (10.04), fail to reject HO
e. Conclusion
The regression model does not fit the data at .01 level
of significance.
i.e. There is no significant contribution of X towards Y
Ex 2 – R and R2
3. R2 and R
$R^2 = \frac{SSR}{SST} = \frac{19.773}{44.0} = .449$
About 45% of variance in job satisfaction is explained
by perception towards superior as autocratic
$R = \sqrt{R^2} = \sqrt{.449} = .670$
OR
$R = \sqrt{\frac{b_1(SXY)}{SSY}} = \sqrt{\frac{(-.494)(-40)}{44}} = \sqrt{.449} = .670$
There is a negative (b1 is negative) and moderate correlation between
perception towards superior as autocratic and job satisfaction
Ex 2 – Hypothesis test (Slope)
4. Hypothesis test – Slope
a. Hypotheses
HO: β1 = 0
HA: β1 ≠ 0
b. Calculate test statistic
$t = \frac{b_1 - \beta_1}{\sqrt{\frac{MSE}{SSX}}} = \frac{-.494 - 0}{\sqrt{\frac{2.423}{80.9167}}} = \frac{-.494}{.1730} = -2.855$
c. Critical value
$t_{.005,\,10} = 3.169$
d. Decision
Decision criteria
Criteria               Decision
|tcal| > |tcritical|   Reject HO
|tcal| ≤ |tcritical|   Fail to reject HO
Since |t cal| (2.855) is smaller than |t critical| (3.169), fail to reject HO
e. Conclusion
There is no significant contribution of perception
towards superior on job satisfaction at .01 level of
significance
i.e. there is no significant relationship between
perception towards superior and job satisfaction
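The whole manual calculation for Example 2 can be retraced from the summary sums in one short script. This is only a sketch: the critical values 10.04 and 3.169 are the tabled values quoted in the notes, not values the code derives. Carrying full precision gives t = -2.857, the figure SPSS reports, whereas the hand calculation with rounded inputs gives -2.855.

```python
import math

# Summary data of Example 2 (n = 12)
n = 12
sum_x, sum_y = 143, 168
sum_x2, sum_y2, sum_xy = 1785, 2396, 1962

sxy = sum_xy - sum_x * sum_y / n   # -40
ssx = sum_x2 - sum_x ** 2 / n      # 80.9167
sst = sum_y2 - sum_y ** 2 / n      # 44

b1 = sxy / ssx
b0 = sum_y / n - b1 * sum_x / n

ssr = sxy ** 2 / ssx
sse = sst - ssr
mse = sse / (n - 2)

f_ratio = ssr / mse
t_value = b1 / math.sqrt(mse / ssx)

F_CRITICAL = 10.04  # F(1, 10) at alpha = .01, from the F table
T_CRITICAL = 3.169  # t(.005, 10), from the t table

reject_model = f_ratio > F_CRITICAL
reject_slope = abs(t_value) > T_CRITICAL

print(f"Y-hat = {b0:.3f} + ({b1:.3f})X")          # Y-hat = 19.891 + (-0.494)X
print(f"F = {f_ratio:.3f}, t = {t_value:.3f}")    # F = 8.162, t = -2.857
print(reject_model, reject_slope)                 # False False
```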
SPSS: Regression Analysis
Ex 2 – SPSS analysis output
Variables Entered/Removed^b
Model  Variables Entered  Variables Removed  Method
1      Perception^a       .                  Enter
a. All requested variables entered.
b. Dependent Variable: Job satisfaction
Model Summary
Model  R       R Square  Adjusted R Square  Std. Error of the Estimate
1      .670^a  .449      .394               1.55649
a. Predictors: (Constant), Perception
– The method used in the regression analysis is ENTER
– Independent variable: Perception
– Dependent variable: Job satisfaction
– Multiple correlation coefficient: R = .670
– Coefficient of determination: R Square = .449
Ex 2 – SPSS analysis output
ANOVA^b
Model 1     Sum of Squares  df  Mean Square  F      Sig.
Regression  19.773          1   19.773       8.162  .017^a
Residual    24.227          10  2.423
Total       44.000          11
a. Predictors: (Constant), Perception
b. Dependent Variable: Job satisfaction
Hypothesis – Regression model (Summary ANOVA table)
– Report F-ratio; however, the decision is based on sig-F
– Since sig-F (.017) is larger than α (.01), fail to reject HO
– Conclude that the regression model does not fit the data at the .01 level of significance
Coefficients^a
Model 1      B       Std. Error  Beta   t       Sig.
(Constant)   19.891  2.110              9.425   .000
Perception   -.494   .173        -.670  -2.857  .017
B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients
a. Dependent Variable: Job satisfaction
Prediction equation
– b0 = 19.891, b1 = -.494
– Ŷ = 19.891 - .494 X
Hypothesis - Slope
– Report t-value; the decision is based on sig-t
– Since sig-t (.017) > α (.01), fail to reject HO
– Conclude perception towards superior (X) does not
contribute significantly towards job satisfaction (Y)