Research Is A Partnership Of Questions And Data

What Types Of Data Are Collected?
“Continuous” Data
“Categorical” Data

What Kinds Of Question Can Be Asked Of Those Data?

Questions That Require Us To Describe Single Features of the Participants
“Continuous” data: How tall are class members, on average? How many hours a week do class members report that they study? … ?
“Categorical” data: How many members of the class are women? What proportion of the class is full-time? … ?

Questions That Require Us To Examine Relationships Between Features of the Participants
“Continuous” data: Do people who say they study for more hours also think they’ll finish their doctorate earlier? Are computer literates less anxious about statistics? … ?
“Categorical” data: Are men more likely to study part-time? Are women more likely to enroll in CCE? … ?
© Willett, Harvard University Graduate School of Education, 04/21/23 S010Y/C10 – Slide 1
S010Y: Answering Questions with Quantitative Data Class 10&11/III.3: Summarizing Relationships Between Continuous Variables
Just to remind you, here’s the codebook for the WALLCHT data …

Dataset: WALLCHT.txt
Overview: Summary information on selected aspects of state educational performance outcomes, resource inputs, and population characteristics, in 1988.
Source: US Department of Education and the National Center for Education Statistics.
Sample size: 50 states
Updated: December 5, 2003

Col  Variable Name  Description                                        Metric
1    STATE          Name of the State                                  words
2    TCHRSAL        Average teacher salary in the State                dollars
3    STRATIO        Average number of students per teacher statewide   ratio
4    PPEXPEND       Average expenditure per pupil in the State         dollars
5    HSGRADRT       Average high-school graduation rate statewide      percentage
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25), one point per state]
This is my “best guess” for a summary linear trend line to represent the HSGRADRT vs. STRATIO relationship. I obtained it by a mysterious process called ordinary least-squares (OLS) regression analysis.
After I have conducted my “OLS regression analysis,” I just pick some sensible values of STRATIO (the minimum and maximum, perhaps?) … and the output from the analysis gives me its best prediction for the values of HSGRADRT at those values (the “predicted values”).
The line that joins up the predicted values is known as the “fitted regression line.” Here it runs from a predicted value of 78.8 at STRATIO = 13.3 to a predicted value of 66.0 at STRATIO = 24.7.
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio, with the fitted trend line]
The “OLS” method that the regression analysis actually used to provide this “best guess” for the trend …
Both the thumbtack-and-elastic-band approach and ordinary least-squares regression find the fitted linear trend line for which the sum of the squared vertical distances of the data points from the line is smallest.
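The least-squares criterion can be written out as a short worked formula. The notation here is standard but not taken from the slides: for state i (i = 1, …, n, with n = 50), let X_i be its STRATIO value and Y_i its HSGRADRT value, and let b_0 and b_1 be a candidate intercept and slope. OLS picks the b_0 and b_1 that make the sum of squared vertical distances as small as possible, and those minimizers have a familiar closed form:

$$
\text{SS}(b_0, b_1) = \sum_{i=1}^{n} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2
$$

$$
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
$$

The second expression for b_0 also shows that the OLS line always passes through the point of means, (X̄, Ȳ).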
Here are a couple of things to help you develop better intuition about the nature of the fitted trend lines produced by OLS regression analysis.
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio]
A simulation that lets you try out the OLS regression fitting algorithm for yourself. It:
1. Provides data examples,
2. Lets you draw your own version of the fitted trend line, and
3. Then shows you what an OLS regression analysis would produce, for comparison.
Of course, you can also get PC-SAS to tell you where the OLS-fitted regression line is …

OPTIONS Nodate Pageno=1;
TITLE1 'S010Y: Answering Questions with Quantitative Data';
TITLE2 'Class 10/Handout 1: Summarizing Relationships Between Continuous Variables';
TITLE3 'The Infamous Wallchart Data';
TITLE4 'Data in WALLCHT.txt';

*--------------------------------------------------------------------------------*
 Input data, name and label variables in the dataset
*--------------------------------------------------------------------------------*;
DATA WALLCHT;
  INFILE 'C:\DATA\A010Y\WALLCHT.txt';
  INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
  LABEL TCHRSAL  = '1988 Average Teacher Salary'
        STRATIO  = '1988 Student/Teacher Ratio'
        PPEXPEND = '1988 Expenditure/Student'
        HSGRADRT = '1988 Statewide H.S. Graduation Rate';

*--------------------------------------------------------------------------------*
 Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;

*--------------------------------------------------------------------------------*
 Plotting the relationship between HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC PLOT DATA=WALLCHT;
  TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
  PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5 VAXIS = 50 TO 100 BY 10;
RUN;

The DATA step holds the usual data input statements. The PROC REG step holds the PC-SAS regression analysis commands, which we dissect in detail below. The PROC PLOT step creates another scatterplot of the data, for use later.
Here’s the part of the PC-SAS program that deals specifically with the OLS regression analysis of the HSGRADRT versus STRATIO relationship …

*--------------------------------------------------------------------------------*
 Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;

PROC REG is the command in PC-SAS that requests an OLS regression analysis.
You request the analysis by specifying a “Regression Model” that identifies the “Outcome” and the “Predictor(s)” to include: MODEL HSGRADRT = STRATIO;
You identify the outcome variable (HSGRADRT) by placing it to the left of the “equals” sign in the MODEL statement.
You identify the predictor variable (STRATIO) by placing it to the right of the “equals” sign in the MODEL statement.
Here’s output from the OLS regression analysis of outcome HSGRADRT on predictor STRATIO …

The REG Procedure
Model: MODEL1
Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

Analysis of Variance
                            Sum of          Mean
Source            DF       Squares        Square    F Value    Pr > F
Model              1     337.52168     337.52168       6.07    0.0174
Error             48    2669.04952      55.60520
Corrected Total   49    3006.57120

Root MSE            7.45689    R-Square    0.1123
Dependent Mean     74.27600    Adj R-Sq    0.0938
Coeff Var          10.03943

Parameter Estimates
                                             Parameter    Standard
Variable    Label                       DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                    1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio   1    -1.12140     0.45516      -2.46      0.0174

The Parameter Estimates table is the major part of the regression output. I unpack it below.
Ignore the rest of the output (the Analysis of Variance table and the statistics beneath it). When you go on to S030, you’ll learn what it all means.
The core part of the OLS regression output describes the fitted regression line …

Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate
Parameter Estimates
                                             Parameter    Standard
Variable    Label                       DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                    1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio   1    -1.12140     0.45516      -2.46      0.0174

These “Parameter Estimates” tell you where PROC REG thinks the fitted trend line should be drawn. By listing them, it’s telling you that the fitted trend line has the following algebraic equation:

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

But how do you work with this “fitted model”?
$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

You substitute reasonable values of the predictor, STRATIO, into the fitted equation and then use it to compute the best predictions (or predicted values) for HSGRADRT. Remember that the fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. Let’s try a couple:
1. When STRATIO = 13.3 (the minimum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 - 14.90 = 78.79 ≈ 78.8
2. When STRATIO = 24.7 (the maximum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 - 27.66 = 66.03 ≈ 66.0
Recognize these values?
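The same two predictions can be sketched in SAS. This is a minimal example, assuming you copy the unrounded estimates from the Parameter Estimates table (intercept 93.69187, slope -1.12140); the dataset name MINMAX and the variable names PRED_HSGRADRT and PRED_1DP are hypothetical, chosen only for illustration. It shows that using the unrounded estimates changes the hand-computed values only in the second decimal place.

```sas
/* Hypothetical sketch: predicted HSGRADRT at the sample minimum and maximum
   of STRATIO, using the unrounded PROC REG estimates. */
data minmax;
  b0 = 93.69187;                 /* intercept estimate from PROC REG */
  b1 = -1.12140;                 /* STRATIO slope estimate from PROC REG */
  do stratio = 13.3, 24.7;       /* sample minimum and maximum of STRATIO */
    pred_hsgradrt = b0 + b1 * stratio;      /* fitted equation */
    pred_1dp = round(pred_hsgradrt, 0.1);   /* rounded to one decimal place */
    output;
  end;
  keep stratio pred_hsgradrt pred_1dp;
run;

proc print data=minmax;
  format pred_hsgradrt 6.2;
run;
```

The printed predictions are 78.78 and 65.99, which round to the 78.8 and 66.0 marked on the fitted line.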
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio, with the fitted regression line annotated at its endpoints: predicted HSGRADRT = 78.8 at STRATIO = 13.3, and 66.0 at STRATIO = 24.7]
Here they are … and, of course, by choosing other values of STRATIO, the fitted equation can also tell us the location of every other point on the fitted line in between.
To reproduce the fitted line, I just need to:
1. Systematically substitute all possible values of STRATIO into the fitted equation, and
2. Compute the corresponding predicted values of HSGRADRT.
If I then plotted them all, I would see the fitted line in the plot above.
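The “substitute all possible values” recipe can be sketched as a SAS DATA step loop. This sketch approximates “all possible values” by a grid of STRATIO values 0.1 apart, from the sample minimum (13.3) to the maximum (24.7), again using the unrounded estimates. The dataset name FITLINE is hypothetical, and PROC SGPLOT assumes a SAS release with ODS Graphics (it is not the PROC PLOT used in the handout).

```sas
/* Hypothetical sketch: trace the fitted line by computing a predicted
   HSGRADRT for a fine grid of STRATIO values. */
data fitline;
  b0 = 93.69187;                 /* intercept estimate from PROC REG */
  b1 = -1.12140;                 /* STRATIO slope estimate from PROC REG */
  do i = 0 to 114;               /* 115 grid points, 0.1 apart */
    stratio = 13.3 + 0.1 * i;    /* runs from 13.3 up to 24.7 */
    pred_hsgradrt = b0 + b1 * stratio;
    output;
  end;
  keep stratio pred_hsgradrt;
run;

/* Join the predicted values up: the result is the fitted regression line */
proc sgplot data=fitline;
  series x=stratio y=pred_hsgradrt;
  xaxis label='1988 Student/Teacher Ratio';
  yaxis label='Predicted 1988 Statewide H.S. Graduation Rate';
run;
```

Stepping an integer counter and computing STRATIO from it avoids accumulating floating-point error in a fractional BY increment. If the WALLCHT dataset is loaded, PROC SGPLOT’s REG statement (reg x=STRATIO y=HSGRADRT;) overlays the data points and an OLS line in one step.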
$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance:
1. When STRATIO = 0 (a value of STRATIO that does not exist in the dataset, but that provides an interesting anchor point nevertheless),
Predicted value of HSGRADRT = (93.69) + (-1.12)(0) = 93.69 - 0 = 93.69
Recognize this value?
$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance:
1. When STRATIO = 20 (or any other ad hoc value of STRATIO within the sample range),
Predicted value of HSGRADRT = (93.69) + (-1.12)(20) = 93.69 - 22.40 = 71.29
2. When STRATIO = 21 (just one unit higher than the previous value of 20),
Predicted value of HSGRADRT = (93.69) + (-1.12)(21) = 93.69 - 23.52 = 70.17
Recognize the difference in these values?
70.17 - 71.29 = -1.12
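The same difference appears no matter which starting value of STRATIO you pick, because the intercept cancels. Writing x for any starting value and Δ for any difference in STRATIO (symbols introduced here for illustration), the algebra is:

$$
\bigl[93.69 + (-1.12)(x + 1)\bigr] - \bigl[93.69 + (-1.12)\,x\bigr] = -1.12
$$

$$
\bigl[93.69 + (-1.12)(x + \Delta)\bigr] - \bigl[93.69 + (-1.12)\,x\bigr] = (-1.12)\,\Delta
$$

For example, two states whose student/teacher ratios differ by Δ = 5 have predicted graduation rates that differ by (-1.12)(5) = -5.60 percentage points.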
This means that each term in the fitted regression model has a specific interpretation …

$$\widehat{\text{HSGRADRT}} = 93.69 + (-1.12)\,\text{STRATIO}$$

HSGRADRT with a “hat”: This is the predicted value of HSGRADRT, based on the OLS regression fit; its “hat” indicates that it is a prediction. The predicted value represents the value of HSGRADRT that you would expect for a State, based solely on its value of STRATIO.
93.69: This is the estimated intercept of the fitted regression line. It tells you the predicted value of HSGRADRT when STRATIO is zero. In the current context, it doesn’t make much sense to interpret it (why?).
-1.12: This is the estimated slope of the fitted regression line. It summarizes the relationship between HSGRADRT and STRATIO: it tells you the difference in the predicted value of HSGRADRT per unit difference in STRATIO. Here, the slope is negative, meaning that States with student/teacher ratios that are one child bigger tend to have graduation rates that are 1.12 percentage points lower, on average.
STRATIO: This represents the actual values of the predictor, STRATIO.
It’s the estimated slope in a regression analysis that captures the relationship between outcome and predictor …
What would the scatterplot look like, and what would the slope be, if states with larger student/teacher ratios tended to have higher graduation rates?
[Sketch: empty axes labeled STRATIO and HSGRADRT]
What would the scatterplot look like, and what would the slope be, if there were no relationship between high-school graduation rate and student/teacher ratio?
[Sketch: empty axes labeled STRATIO and HSGRADRT]
Here’s a simulation that lets you create datasets with your mouse, and then shows you the OLS fitted line.
Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate
                                             Parameter    Standard
Variable    Label                       DF    Estimate       Error    t Value    Pr > |t|
Intercept   Intercept                    1    93.69187     7.95093      11.78      <.0001
STRATIO     1988 Student/Teacher Ratio   1    -1.12140     0.45516      -2.46      0.0174

As in our categorical data analysis, we can ask whether we could have reached this same conclusion by an accident of sampling.
Could we have obtained a slope of -1.12 by sampling from a population in which there was no relationship between HSGRADRT and STRATIO (i.e., by sampling from a null population in which the slope was zero)?
And again, as in categorical data analysis, PROC REG provides a p-value to help you check out the effects of the idiosyncrasies of sampling:
The p-value for the HSGRADRT/STRATIO regression slope (the Pr > |t| entry in the STRATIO row) is 0.0174.
Since 0.0174 is less than .05, we can reject the null hypothesis that there is no relationship between HSGRADRT and STRATIO in the population.
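Where the p-value comes from can be sketched in SAS. This is a minimal example, assuming the standard t test for a regression slope: the t statistic is the parameter estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the error degrees of freedom (48 = 50 states minus 2 estimated parameters, matching the Error DF in the output). The dataset name SLOPE_TEST is hypothetical; PROBT is the SAS t-distribution CDF function.

```sas
/* Hypothetical sketch: recompute the slope's t statistic and two-sided
   p-value from the values printed by PROC REG. */
data slope_test;
  estimate = -1.12140;              /* STRATIO parameter estimate */
  stderr   = 0.45516;               /* its standard error */
  df       = 48;                    /* error degrees of freedom */
  t = estimate / stderr;            /* t statistic for H0: slope = 0 */
  p = 2 * (1 - probt(abs(t), df));  /* two-sided p-value */
run;

proc print data=slope_test;
run;
```

The result reproduces the printed t Value of -2.46 and Pr > |t| of 0.0174. In a one-predictor regression, squaring this t (about 6.07) also gives the F Value in the Analysis of Variance table.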
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio]
The Story So Far …
In our investigation of state-level aggregate statistics, we have found that, on average, the percentage of seniors graduating from high school is lower in states with a higher student/teacher ratio.
When statewide high-school graduation rate (HSGRADRT) is treated as the outcome and statewide student/teacher ratio (STRATIO) as the predictor, the trend line estimated by ordinary least-squares regression analysis has a slope of -1.12 (p = 0.0174).
This tells us that two states whose student/teacher ratios differ by 1 student per teacher will tend to have graduation rates that differ by 1.12 percentage points, with states that enjoy lower student/teacher ratios tending to have the higher high-school graduation rates.