Thursday, December 18, 2014

Day 10 prediction and regression



These slides discuss:

• Extending the concept of correlation from Day 9 and showing how it can be used for prediction. The statistical technique introduced in these slides is called regression.

• The process of using one variable to predict another when the two are correlated.

• How to determine how accurate a prediction is likely to be, using the standard error of estimate (SE).

• A numerical example that demonstrates and applies the concepts and terms. The statistical and practical significance of the results is also explained and discussed.


Three kinds of relationships between variables

• Association (correlation, covariation): Both variables tend to be high or low together (positive relationship), or one tends to be high when the other is low (negative relationship). The variables do not have independent and dependent roles.

• Prediction: Variables are assigned independent and dependent roles. Both variables are observed. There is a weak causal implication that the independent (predictor) variable is the cause and the dependent variable is the effect.

• Causal: Variables are assigned independent and dependent roles. The independent variable is manipulated and the dependent variable is observed. Strong causal statements are allowed.


Regression Analyses

Regression: a technique concerned with predicting some variables from knowledge of others.

Regression is the process of predicting variable Y from variable X, that is, of using one variable to predict another when the two are correlated. It makes sense to expect that the stronger the correlation between the variables (the closer r is to +1 or -1), the more accurate the prediction.


Regression

Uses a variable (x) to predict some outcome variable (y)

Tells you how values in y change as a function of changes in values of x


Examples of Prediction

When we hear thunder and see lightning, we often predict they will be followed by rain.

We might also predict the expected crowd at the movie theatre from the day of the week.

We might predict that a bright elementary school student will do well in high school.

A student who is having difficulties on the midterm examination is probably going to get a low grade on the final examination.


Correlation and Regression

Correlation describes the strength and direction of a linear relationship between two variables

Linear means “straight line”

Regression tells us how to draw the straight line described by the correlation


Simple Regression

Prediction is based on the assumption that when two variables are correlated, we can use one of them to predict the other.

The variable used as a predictor is the independent variable (X). The predicted variable (Y) is called the criterion variable or dependent variable.

The technique used for prediction is called regression.

When only one variable is used to predict another, the procedure is called simple regression, and when two or more variables are used as predictors, the procedure is called multiple regression.


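The Formula Used in Simple Regression

1. Regression equation: $Y' = bX + a$

2. Slope: $b = r\,\dfrac{S_Y}{S_X}$

3. Intercept: $a = \bar{Y} - b\bar{X}$

where $Y'$ is the predicted score, $X$ is the predictor score, $r$ is the correlation between X and Y, $S_X$ and $S_Y$ are their standard deviations, and $\bar{X}$ and $\bar{Y}$ are their means.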
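As a rough illustration, the slope and intercept formulas can be applied to summary statistics alone. The minimal sketch below assumes b = r(S_Y/S_X) and a = Ȳ - bX̄; the helper name `regression_from_summary` is hypothetical, and the inputs are the practice-test (X) and state-test (Y) figures reported in the SPSS descriptives of the example that follows.

```python
def regression_from_summary(r, sd_x, sd_y, mean_x, mean_y):
    """Return (b, a) for Y' = bX + a from the correlation, SDs and means.

    Hypothetical helper for illustration only.
    """
    b = r * (sd_y / sd_x)    # slope: correlation scaled by the ratio of SDs
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_X, mean_Y)
    return b, a


# Practice test (X) and state-mandated test (Y), values from the SPSS descriptives
b, a = regression_from_summary(r=0.764, sd_x=7.843, sd_y=6.883,
                               mean_x=41.80, mean_y=41.60)
print(f"b = {b:.3f}, a = {a:.3f}")
```

Because r and the standard deviations are rounded before multiplying, this gives b of about 0.670 and a of about 13.57, slightly off from the .671 and 13.572 that SPSS computes from unrounded values.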


An Example of Simple Regression

Ms. Wright, an eighth-grade language arts teacher, wants to know whether she can use a practice test she constructed to predict her students' scores on the state-mandated end-of-year language arts test. The teacher hypothesizes that the practice test, administered at the beginning of the second semester, is a good predictor of the state-mandated test. If so, she could administer the practice test to her students and then use the results to design early intervention and remediation programs for students who are expected to do poorly on the state-mandated test. To ascertain whether the practice test is a good predictor of the state-mandated test, the teacher uses the scores of last year's students on the practice test (the predictor, or independent variable) and on the state-mandated test (the criterion, or dependent variable) to generate the regression equation. Since the state-mandated language arts test is scored on a scale of 1 to 50, the teacher has designed her test to use the same scale.


Tasks


Data SPSS Outcome

Descriptive Statistics

                      Mean    Std. Deviation   N
State-mandated test   41.60   6.883            10
Practice test         41.80   7.843            10

Coefficients(a)

Model 1           B (unstandardized)   Std. Error   Beta (standardized)   t       Sig.
(Constant)        13.572               8.500                              1.597   .149
Practice test     .671                 .200         .764                  3.349   .010

a. Dependent Variable: State-mandated test

Model Summary (Change Statistics)

Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .584              11.218     1     8     .010


Task

1. Research Question: Do practice test scores influence state-mandated test scores?

2. Hypotheses
H0: β = 0: Practice test scores do not influence state-mandated test scores.
HA: β ≠ 0: Practice test scores influence state-mandated test scores.
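One way to reach a decision on H0 is to compare the t value for the slope with a critical value. The minimal sketch below assumes a two-tailed test at α = .05, uses the values printed in the SPSS Coefficients table above, and takes the critical value t(8) = 2.306 from a standard t table (df = n - 2 = 8 for ten students).

```python
# Values copied from the SPSS Coefficients table for the practice-test example
B = 0.671           # unstandardized slope
SE_B = 0.200        # standard error of the slope
T_REPORTED = 3.349  # t as printed by SPSS (computed from unrounded B and SE)
N = 10              # number of students

df = N - 2              # residual degrees of freedom in simple regression
T_CRITICAL = 2.306      # two-tailed critical t for alpha = .05, df = 8 (t table)

t_approx = B / SE_B     # about 3.355; differs slightly because B and SE are rounded
reject_h0 = abs(T_REPORTED) > T_CRITICAL
f_from_t = T_REPORTED ** 2  # with one predictor, F = t^2 (SPSS reports F = 11.218)

print(f"df = {df}, t = {t_approx:.3f}, reject H0: {reject_h0}, F = {f_from_t:.2f}")
```

The same decision follows from the printed Sig. value: .010 is below .05, so H0: β = 0 is rejected.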


Student   Practice Test (X)   State Test (Y)
A         45                  40
B         45                  46
C         46                  37
D         50                  49
E         35                  31
F         47                  50
G         23                  32
H         46                  48
I         40                  44
J         41                  39
Mean      41.80               41.60
SD        7.843               6.883

Using the Pearson product-moment correlation coefficient, the teacher finds that the correlation between the two tests is $r_{xy} = 0.764$. Next, the teacher computes the b coefficient, followed by the value of a.
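The correlation, slope, and intercept can also be computed directly from the ten pairs of scores. This is a minimal sketch using deviation scores (a standard least-squares computation, not necessarily the exact steps shown on the original slides); `simple_regression` is a hypothetical helper name.

```python
from math import sqrt

# Scores of last year's ten students (A-J) from the table above
practice = [45, 45, 46, 50, 35, 47, 23, 46, 40, 41]  # X, predictor
state = [40, 46, 37, 49, 31, 50, 32, 48, 44, 39]     # Y, criterion


def simple_regression(x, y):
    """Return (r, b, a) for the least-squares line Y' = bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # sum of squared X deviations
    syy = sum((yi - mean_y) ** 2 for yi in y)  # sum of squared Y deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)   # Pearson product-moment correlation
    b = sxy / sxx               # slope; equals r * (SD_Y / SD_X)
    a = mean_y - b * mean_x     # intercept
    return r, b, a


r, b, a = simple_regression(practice, state)
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
```

Working from the raw scores reproduces the SPSS values (r = .764, b = .671, a = 13.572) to three decimals.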


After finding the values of b (the slope) and a (the intercept), they can be entered into the regression equation. With the coefficients from the SPSS output:

$$Y' = 0.671X + 13.572$$


Now, after administering the practice test to her students, the teacher can use the equation to predict their scores on the state-mandated language arts test. For example, the teacher can predict that a student with a practice test (X) score of 30 is expected to score about 33.7 on the state test:

$$Y' = 0.671(30) + 13.572 \approx 33.7$$
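In code, the prediction step is simply the regression equation evaluated at a new X. A minimal sketch, assuming the rounded SPSS coefficients b = 0.671 and a = 13.572; the function name `predict_state_score` is hypothetical.

```python
# Coefficients as reported in the SPSS output for Ms. Wright's data
B_SLOPE = 0.671
A_INTERCEPT = 13.572


def predict_state_score(practice_score):
    """Predicted state-mandated test score, Y' = bX + a."""
    return B_SLOPE * practice_score + A_INTERCEPT


for x in (30, 40, 45):
    print(f"practice = {x}: predicted state score = {predict_state_score(x):.1f}")
```

These predictions are only as good as the assumption that new students resemble last year's students, whose scores produced the coefficients.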


Of course, using this equation to predict the scores of new students on the state-mandated language arts test is predicated on the assumption that the new students taking the practice test are similar to those whose scores were used to derive the regression equation.

Using the formula for the standard error of estimate, $S_E = S_Y\sqrt{1 - r^2}$, the standard error of estimate for the data in the table above is about 4.44: $6.883\sqrt{1 - 0.764^2} \approx 4.44$.

This means that, on average, the teacher's prediction for a student is likely to overestimate or underestimate the state-mandated language arts score by about 4.4 points. For example, for students whose predicted score (Y') is about 42, about 68 percent of the time the actual Y score will lie within 4.44 points above or below Y' (i.e., between approximately 37.6 and 46.4), assuming the prediction errors are roughly normally distributed.
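The arithmetic behind the standard error of estimate can be checked in a few lines. This minimal sketch uses the slide formula S_E = S_Y√(1 - r²) with the rounded values S_Y = 6.883 and r = 0.764. For comparison it also computes the sample-based version, sqrt(SS_residual / (n - 2)), which is what SPSS prints as "Std. Error of the Estimate"; for only ten students it comes out somewhat larger.

```python
from math import sqrt

SD_Y = 6.883  # SD of the state-mandated test (SPSS descriptives)
R_XY = 0.764  # correlation between practice test and state test
N = 10        # number of students

# Slide formula: S_E = S_Y * sqrt(1 - r^2)
se_est = SD_Y * sqrt(1 - R_XY ** 2)

# Sample-based version reported by SPSS: sqrt(SS_residual / (n - 2)),
# where SS_residual = (1 - r^2) * SD_Y^2 * (n - 1)
se_spss_style = sqrt((1 - R_XY ** 2) * SD_Y ** 2 * (N - 1) / (N - 2))

# Band of +/- one standard error around a predicted score of 42;
# roughly 68% of actual scores fall inside it if errors are about normal
predicted = 42.0
low, high = predicted - se_est, predicted + se_est

print(f"S_E = {se_est:.2f} (SPSS-style: {se_spss_style:.2f})")
print(f"68% band around Y' = 42: {low:.1f} to {high:.1f}")
```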


The Coefficient of Determination, r²

The teacher found that the correlation between the two tests was $r_{xy} = 0.764$. To find the coefficient of determination, we square the correlation ($r_{xy}^2$). With $r_{xy} = 0.764$, the coefficient of determination is $0.764^2 = 0.584$ (about 58%).

This coefficient means that about 58 percent of the variation in performance on the state test (Y) can be accounted for by individual differences in performance on the practice test (X); the remaining 42 percent of the variation is due to other factors. In other words, 58 percent of the total variation in state test scores (Y) can be explained by the linear relationship between X (practice test) and Y (state-mandated test) as described by the regression equation. The other 42 percent of the total variation in Y remains unexplained.
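The "explained variation" reading of r² can be made concrete: fit the line, then compare the squared prediction errors with the total squared deviations of Y from its mean. A minimal sketch using the ten students' scores from the earlier table (variable names are illustrative):

```python
practice = [45, 45, 46, 50, 35, 47, 23, 46, 40, 41]  # X
state = [40, 46, 37, 49, 31, 50, 32, 48, 44, 39]     # Y

n = len(practice)
mean_x = sum(practice) / n
mean_y = sum(state) / n
sxx = sum((x - mean_x) ** 2 for x in practice)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(practice, state))
b = sxy / sxx              # least-squares slope
a = mean_y - b * mean_x    # least-squares intercept

predicted = [a + b * x for x in practice]                             # Y' per student
ss_total = sum((y - mean_y) ** 2 for y in state)                      # total variation in Y
ss_residual = sum((y - yp) ** 2 for y, yp in zip(state, predicted))   # unexplained part
r_squared = 1 - ss_residual / ss_total                                # explained proportion

print(f"explained: {r_squared:.1%}, unexplained: {1 - r_squared:.1%}")
```

The explained proportion matches the squared correlation, about 58 percent.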

The coefficient of determination (r²) can be used to describe the relationship between the variables. In our example, the language arts teacher used her own practice test to predict her students' scores on the end-of-year state-mandated language arts test.


Graphing the Regression Equation

Figure: A regression line for predicting the scores of ten students on the state-mandated test, using the practice test scores as a predictor


Regression Equation

The regression equation describes the regression line mathematically through its intercept and slope.

Figure: Regression line of SBP (mmHg) plotted against Wt (kg)


Linear Equations

$$Y = bX + a$$

where a is the Y-intercept and b is the slope:

$$b = \frac{\text{change in } Y}{\text{change in } X}$$
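The slope as "change in Y over change in X" can be checked numerically by evaluating a line at two X values. This minimal sketch borrows the practice-test coefficients (b = 0.671, a = 13.572) purely as example numbers; `y_on_line` is a hypothetical helper.

```python
def y_on_line(x, b, a):
    """Y = bX + a."""
    return b * x + a


b, a = 0.671, 13.572  # example coefficients (practice-test example)
x1, x2 = 30, 40

slope = (y_on_line(x2, b, a) - y_on_line(x1, b, a)) / (x2 - x1)  # change in Y / change in X
y_intercept = y_on_line(0, b, a)                                   # value of Y where X = 0

print(round(slope, 3), round(y_intercept, 3))
```

Any two distinct X values give the same slope, because a straight line rises by b units of Y for every one-unit increase in X.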


Task (work in groups of 5-6): An English teacher at an Islamic boarding school wants to know whether students' grammar scores influence their speaking ability.

No   Students' grammar scores (X)   Students' speaking ability (Y)
1    70                             80
2    75                             90
3    60                             70
4    55                             65
5    45                             65
6    65                             75
7    65                             70
8    65                             80
9    45                             60
10   60                             65
11   55                             60
12   75                             70
13   80                             75
14   70                             60
15   75                             80
16   85                             85
17   70                             75
18   60                             65
19   60                             75
20   55                             80


Tasks


Hours studying and grades


Regressing grades on hours

Figure: Linear regression of final grade in course on number of hours spent studying

Final grade in course = 59.95 + 3.17 × study; R² = 0.88

Predicted final grade in class = 59.95 + 3.17 × (number of hours you study per week)


Predict the final grade of…

Someone who studies for 12 hours: Final grade = 59.95 + (3.17 × 12) = 97.99

Someone who studies for 1 hour: Final grade = 59.95 + (3.17 × 1) = 63.12

Predicted final grade in class = 59.95 + 3.17*(hours of study)
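The two predictions above can be reproduced with a one-line function. As an extra, hedged check, the sketch flags predictions outside an assumed observed range of study hours; the bounds of 2 to 10 hours are an assumption taken from the plotted axis, not from the underlying data, and both function names are hypothetical.

```python
INTERCEPT = 59.95  # predicted grade at 0 hours of study
SLOPE = 3.17       # grade points per additional weekly hour of study

# ASSUMPTION: observed study hours are taken to span the 2-10 hours shown on the
# plot's horizontal axis; these bounds are hypothetical, not given in the data.
OBSERVED_MIN, OBSERVED_MAX = 2.0, 10.0


def predicted_final_grade(hours_per_week):
    """Predicted final grade = 59.95 + 3.17 * hours."""
    return INTERCEPT + SLOPE * hours_per_week


def is_extrapolation(hours_per_week):
    """True if the input lies outside the assumed observed range."""
    return not (OBSERVED_MIN <= hours_per_week <= OBSERVED_MAX)


for hours in (1, 12):
    grade = predicted_final_grade(hours)
    print(f"{hours} h: {grade:.2f} (extrapolation: {is_extrapolation(hours)})")
```

Under that assumption both 1 and 12 hours fall outside the plotted range, so those predictions lean on the line continuing unchanged beyond the data.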


An additional way to interpret Pearson r

• Coefficient of determination (r²)
  – The proportion of the variability of Y accounted for by X

Figure: Overlapping areas for the variability of Y and of X; the area of overlap represents the proportion of the variability of Y accounted for by X (value is expressed as a %)


Exercise

A sample of 6 persons was selected, and their ages (x variable) and weights (y variable) are shown in the following table. Find the regression equation and the predicted weight when age is 8.5 years.


Serial no.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13


Answer

Serial no.   Age (x)   Weight (y)   xy    x²    y²
1            7         12           84    49    144
2            6         8            48    36    64
3            8         12           96    64    144
4            5         10           50    25    100
5            6         11           66    36    121
6            9         13           117   81    169
Total        41        66           461   291   742


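$$\bar{x} = \frac{41}{6} = 6.83 \qquad \bar{y} = \frac{66}{6} = 11$$

$$b = \frac{461 - \dfrac{41 \times 66}{6}}{291 - \dfrac{(41)^2}{6}} = 0.92$$

Regression equation

$$\hat{y}(x) = 11 + 0.92\,(x - 6.83)$$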


$$\hat{y}(x) = 4.72 + 0.92x$$

$$\hat{y}(8.5) = 4.72 + 0.92 \times 8.5 = 12.54 \text{ kg}$$

$$\hat{y}(7.5) = 4.72 + 0.92 \times 7.5 = 11.62 \text{ kg}$$
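The whole exercise can be reproduced from the raw table. This minimal sketch uses the same computational formula for b as the answer and carries full precision throughout; it gives b of about 0.923 and a of about 4.69, so the predicted weights come out at about 12.54 kg (age 8.5) and 11.62 kg (age 7.5). Variable and function names are illustrative.

```python
ages = [7, 6, 8, 5, 6, 9]          # x, in years
weights = [12, 8, 12, 10, 11, 13]  # y, in kg

n = len(ages)
sum_x, sum_y = sum(ages), sum(weights)
sum_xy = sum(x * y for x, y in zip(ages, weights))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in weights)  # not needed for b, but a column of the table

# Computational formula for the slope, as in the answer
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n         # intercept: y-bar minus b times x-bar


def predicted_weight(age):
    """Predicted weight in kg, y-hat = a + b * age."""
    return a + b * age


print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"y = {a:.3f} + {b:.3f}x")
for age in (7.5, 8.5):
    print(age, round(predicted_weight(age), 2))
```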


Figure: Regression line of weight (in kg) plotted against age (in years)

We create a regression line by plotting the two estimated values of y against their x values, then extending the line to the right and left.


Data SPSS Outcome: Grammar Test and Speaking Test

Descriptive Statistics

                           Mean    Std. Deviation   N
Students Speaking Scores   72.25   8.656            20
Students' grammar scores   64.50   10.748           20

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .621a   .385       .351                6.972

a. Predictors: (Constant), Students' grammar scores

Coefficients(a)

Model 1                    B (unstandardized)   Std. Error   Beta (standardized)   t       Sig.
(Constant)                 40.000               9.724                              4.113   .001
Students' grammar scores   .500                 .149         .621                  3.360   .003

a. Dependent Variable: Students Speaking Scores
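The SPSS figures above can be reproduced from the raw grammar and speaking scores in the group-task table. This minimal sketch uses deviation scores; the adjusted R² line is the usual formula for a single predictor, and the standard error of the estimate divides the residual sum of squares by n - 2, as SPSS does. Variable names are illustrative.

```python
from math import sqrt

grammar = [70, 75, 60, 55, 45, 65, 65, 65, 45, 60,
           55, 75, 80, 70, 75, 85, 70, 60, 60, 55]   # X
speaking = [80, 90, 70, 65, 65, 75, 70, 80, 60, 65,
            60, 70, 75, 60, 80, 85, 75, 65, 75, 80]  # Y

n = len(grammar)
mean_x = sum(grammar) / n
mean_y = sum(speaking) / n
sxx = sum((x - mean_x) ** 2 for x in grammar)
syy = sum((y - mean_y) ** 2 for y in speaking)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(grammar, speaking))

b = sxy / sxx                                # B for grammar scores
a = mean_y - b * mean_x                      # B for (Constant)
r = sxy / sqrt(sxx * syy)                    # R
r2 = r ** 2                                  # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)    # Adjusted R Square (one predictor)
ss_residual = syy - b * sxy                  # unexplained sum of squares
see = sqrt(ss_residual / (n - 2))            # Std. Error of the Estimate
se_b = see / sqrt(sxx)                       # Std. Error of the slope
t_slope = b / se_b                           # t for grammar scores

print(f"a = {a:.3f}, b = {b:.3f}, R = {r:.3f}, R2 = {r2:.3f}, "
      f"adj R2 = {adj_r2:.3f}, SEE = {see:.3f}, t = {t_slope:.3f}")
```

The resulting equation is predicted speaking score = 40 + 0.5 × grammar score, matching the Coefficients table.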


