Human Capital Policies in Education: Further Research on Teachers and
Principals
5th Annual CALDER Conference, January 27th, 2012
ASSESSING TEACHER PREPARATION IN WASHINGTON STATE BASED ON STUDENT ACHIEVEMENT*
Dan Goldhaber, Stephanie Liddle, & Roddy Theobald
Center for Education Data and Research
University of Washington Bothell
The research presented here utilizes confidential data from the Washington State Office of the Superintendent of Public Instruction (OSPI). We gratefully acknowledge the receipt of these data. We wish to thank the Carnegie Corporation of New York for support of this research. This paper has benefitted from helpful comments from Joe Koski, John Krieg, Jim Wyckoff, & Dale Ballou. We thank Jordan Chamberlain for editorial assistance, and Margit McGuire for thoughtful feedback. The views expressed in this paper do not necessarily reflect those of UW
Bothell, Washington State, or the study’s sponsor. Responsibility for any and all errors rests solely with the authors.
Context
“Under the existing system of quality control, too many weak programs have achieved state approval and been granted accreditation” (Levine, 2006, p. 61)
The U.S. DOE is debating reporting requirements that would include value-added assessments of program graduates (Sawchuk, 2012)
All 2010 RttT grantees committed to public disclosure of student-achievement outcomes of program graduates
Recent research (e.g., Boyd et al., 2009) has used administrative data to study teacher training and, in some cases, to rank institutions (Noell et al., 2008)
The Questions… and (Quick) Answers
1. How much of the variation in teacher effectiveness is associated with different teacher training programs in Washington State?
   Less than 1% in both reading and math
2. Do training effects decay over time?
   Yes, a bit faster for reading than math
3. What are the magnitudes of differences in student achievement effects associated with initial training programs?
   The majority of programs produce teachers who cannot be distinguished from one another, but there are some educationally meaningful differences in the effectiveness of teachers who received training from different programs
4. Do we see evidence of institutional change or specialization?
   Yes: some evidence of institutional change. No: little evidence of specialization (geographic or by student subgroups)
Caveats
It’s not necessarily appropriate to consider program effects an indicator of the value of training:
- Selection of teacher candidates into programs (may not matter for program accountability)
- Teacher candidates who graduate from different training programs may be selected into school districts, schools, or classrooms that are systematically different from each other in ways that are not accounted for by statistical models
  - But district or school fixed effects may (Mihaly et al., 2011) or may not (Jim Wyckoff) be appropriate
- Precision of training program estimates is contingent on sample size; we also have to worry about non-random attrition from the teacher labor market
Data and Sample
Information on teachers and students is derived from five administrative databases prepared by Washington State’s Office of the Superintendent of Public Instruction (OSPI)
Sample includes 8,732 elementary (4th, 5th, & a few 6th grade) teachers who received a credential anytime between 1959 and 2011 and were teaching in Washington during the 2006-07 to 2009-10 school years
These teachers are linked (mainly through proctors) to 293,994 students (391,922 student-years) who have valid WASL scores in both reading and math for at least two consecutive years
Analytic Approach
We estimate training program effects in two stages:
1. Stage 1: estimate VAMs designed to net out student background factors from student gains, and use these to obtain teacher-classroom-year effect estimates
   - Teacher-classroom-year effects are shrunk toward the mean using Empirical Bayes methods
2. Stage 2: model the stage 1 teacher-classroom-year effects as a function of teacher credentials (including training program) and school district/school covariates
   - Standard errors are corrected by clustering at the program level
   - In some specifications we include district/school fixed effects in stage 2
   - An innovation of the model is that we allow the effects of teacher training to decay exponentially with a teacher’s time in the labor market
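The two-stage logic can be sketched in code. This is a minimal illustration on simulated data, not the authors’ implementation; the variable names, the simulated distributions, and the simple method-of-moments variance estimate are our assumptions.

```python
# Illustrative sketch of the two-stage approach on simulated data
# (not the authors' code; names and distributions are assumptions).
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1 stand-in: noisy teacher-classroom-year effect estimates.
# In the paper these come from value-added models that net out student
# background factors; here we simply simulate estimates with known SEs.
n = 200
true_effects = rng.normal(0.0, 0.10, n)    # "true" effects, SD = 0.10
sampling_se = rng.uniform(0.05, 0.15, n)   # estimate-specific standard errors
raw_estimates = true_effects + rng.normal(0.0, sampling_se)

# Empirical Bayes shrinkage: pull each noisy estimate toward the grand
# mean in proportion to its reliability (signal variance / total variance).
grand_mean = raw_estimates.mean()
signal_var = max(raw_estimates.var() - (sampling_se ** 2).mean(), 1e-6)
reliability = signal_var / (signal_var + sampling_se ** 2)
shrunk = grand_mean + reliability * (raw_estimates - grand_mean)

# Shrunk estimates are strictly less dispersed than the raw ones.
print(shrunk.std() < raw_estimates.std())  # True
```

In stage 2 the shrunken effects would then be regressed on training-program indicators (plus district/school covariates or fixed effects), with standard errors clustered at the program level, e.g. via statsmodels’ `OLS(...).fit(cov_type='cluster', ...)`.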
Program Estimates: MATH
• The difference between the average program and the top program is 4% of a SD of student performance
• The difference between the top program and the bottom program is 10% of a SD of student performance
Program Estimates: READING
• The difference between the average program and the top program is 9% of a SD of student performance
• The difference between the top program and the bottom program is 16% of a SD of student performance
Findings (1)
Training program effects do decay over time
- Half-life of program effects varies by specification (9-50 years)
- Half-life in reading is smaller for each specification
Individual training program estimates are directionally robust to model specification
- Programs graduating teachers who are effective in math also tend to graduate teachers who are effective in reading (r = 0.4)
Training program indicators explain only a small percentage of the variation in teacher effectiveness
- The effect size of a one standard deviation change in program effectiveness is roughly a quarter of the effect size of a one standard deviation change in teacher effectiveness
- There is much more variation within than between programs, but even the small percentage explained is comparable to the percent explained by teachers’ experience and degree level
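Under the exponential-decay specification, a decay rate λ translates into a half-life of ln(2)/λ. A quick illustration (the function is ours; the λ values are taken from the appendix table of program effects, and smaller λ implies a longer half-life):

```python
import math

def half_life(lam):
    """Years for an effect decaying as exp(-lam * t) to fall to half its initial size."""
    return math.log(2) / lam

print(round(half_life(0.051), 1))  # math, base + decay specification -> 13.6
print(round(half_life(0.078), 1))  # reading, selectivity + decay specification -> 8.9
```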
Findings (2)
Differences between program indicators are educationally meaningful
- In math, the largest differences (0.10) are roughly twice the difference in:
  - Student achievement explained by limited English proficiency (0.06)
  - Productivity gains associated with early career experience (0.06)
- In reading, the largest differences (0.16) are roughly two or three times the difference in:
  - Student achievement explained by student poverty status (0.08)
  - Productivity gains associated with early career experience (0.06)
There is not much evidence of programs specializing, either geographically or in the students they serve
There is evidence that teachers trained in Washington State within the last five to ten years are relatively more effective than those credentialed in-state prior to 2000, at least as compared to teachers trained out-of-state
Assessing Program Specialization: Proximity to Training Institution
Linear Proximity Effects

            MATH     READING
10 Miles    0.012    -0.005
25 Miles    0.014    -0.002
50 Miles   -0.004    -0.007
Teachers teaching in districts that are close to where they received their initial credential are not found to be differentially effective relative to teachers who teach in districts further from their training program.
Assessing Institutional Change
Recent cohorts of in-state trained teachers are relatively more effective than prior cohorts (both measured relative to teachers credentialed by OSPI)
Some program indicator estimates have changed substantially over time in both math and reading
All programs are measured relative to teachers who received training out of state, so we cannot say whether these findings reflect the effectiveness of the in-state credential or a change in the quality of teachers coming in from out of state
Summary/Conclusions/Feedback
Program indicators do provide some information about teacher effectiveness
- Magnitudes of effects are sensitive to specification
Study is a starting point in a conversation/research agenda
- Admissions requirements, sequences of courses, different strands of teacher education programs, student teaching, etc.
We are modeling decay toward out-of-state teachers (close to the mean teacher); can we better capture acculturation (decay toward localized peers)?
Backup Slides
Comparison Across Model Specifications: MATH
Comparison Across Model Specifications: READING
Tension with School FE model
[Figure sequence: overlapping effectiveness distributions for Training Program A and Training Program B]
• Mean difference in program effects
• What if a school only hires teachers from a narrow range of the performance distribution?
• Estimated difference based upon a within-school comparison
ANOVA Results
Main Program Effects

Columns (1)-(5): MATH; columns (6)-(10): READING
Specifications: (1)/(6) Base; (2)/(7) Base + Decay; (3)/(8) Selectivity + Decay; (4)/(9) District FE + Decay; (5)/(10) School FE + Decay
Decay rates: math λ = 0, 0.051, 0.051, 0.012, 0.008; reading λ = 0, 0.060, 0.078, 0.060, 0.029

Antioch -3.11*** -3.87*** -4.1*** -2.51*** -1.94* 0.19 0.01 -0.17 0.25 1.09
(0.31) (0.72) (0.52) (0.42) (0.83) (0.28) (0.59) (0.89) (0.75) (0.72)
Central Washington -0.77*** -0.26 -2.36 -0.38 0.03 0.33*** 0.72 -0.05 0.89 1.02**
(0.11) (0.63) (1.38) (0.32) (0.39) (0.08) (0.49) (1.76) (0.76) (0.32)
City -0.84** -0.34 -0.6 -0.04 -0.2 -0.26 -0.21 -0.28 -0.18 0.07
(0.29) (0.74) (0.50) (0.46) (0.52) (0.26) (0.59) (0.89) (0.75) (0.48)
Eastern Washington -0.38* 0.54 -1.58 -2.95*** -2.45** -1.63*** -3.37*** -4.73* -3.39*** -3.08***
(0.16) (0.56) (1.18) (0.40) (0.61) (0.15) (0.61) (1.78) (0.51) (0.66)
Gonzaga 1.28*** 3.8*** 2.37* -1.18** -2.31*** -2.22*** -3.91*** -6.97** -4.18*** -4.14***
(0.18) (0.63) (0.92) (0.33) (0.42) (0.16) (0.54) (1.96) (0.60) (0.56)
Heritage -0.2 0.95 0.71 -0.6 -0.02 -0.4 -0.18 -0.1 -1.26 -0.15
(0.38) (0.77) (0.69) (0.52) (0.74) (0.43) (0.88) (1.14) (1.05) (0.78)
Northwest -3.77*** -4.61*** -4.54*** -3.42*** 1.61 -1.96*** -5.37*** -6.41*** -4.69*** -0.77
(0.24) (0.63) (1.00) (0.55) (1.22) (0.09) (0.42) (1.07) (0.79) (1.27)
Pacific Lutheran 0.65** 1.98* 0.29 1.28* 1.61** -1.56*** -4.54*** -8.17*** -4.78*** -2.61***
(0.17) (0.82) (1.06) (0.58) (0.47) (0.09) (0.38) (1.89) (0.87) (0.54)
St Martin's -2.17*** -2.98*** -5.06*** -0.88 -1.31* -3.8*** -7.49*** -9.15*** -4.58*** -3.03***
(0.15) (0.58) (1.18) (0.63) (0.58) (0.09) (0.42) (1.77) (0.77) (0.73)
Seattle Pacific 0.47*** 2.62** 0.4 1.06* 0.27 0.65*** 2.24*** -0.49 2.48*** 2.25**
(0.10) (0.64) (1.05) (0.39) (0.44) (0.10) (0.43) (2.00) (0.54) (0.63)
Seattle -2.03*** -0.4 -1.75 -2.4*** -2.5*** -0.44** 1.39* 0.37 0.36 0.3
(0.18) (0.66) (1.18) (0.44) (0.63) (0.13) (0.56) (2.08) (0.62) (0.83)
Evergreen State -3.93*** -6.4*** -9.08*** -4.2*** -3.45*** -0.82** -3.15*** -3.56 -3.07*** -2.54**
(0.31) (0.95) (1.95) (0.66) (0.81) (0.24) (0.60) (2.83) (0.60) (0.66)
U of Puget Sound 1.22*** 3.8** 3.55 1.49* 1.66** 1.24*** 2.4*** 1.67 1.21 1.83*
(0.20) (1.05) (1.94) (0.58) (0.58) (0.11) (0.52) (2.48) (1.09) (0.73)
UW Seattle 1.27*** 4.36*** 3.74* 1.04** 0.69 1.03*** 2.42*** 0.64 0.17 0.53
(0.14) (0.77) (1.43) (0.30) (0.45) (0.08) (0.48) (1.94) (0.59) (0.60)
UW Bothell 0.76** 1.84** 1.57*** 0.81* 1.41 3.36*** 4.67*** 4.93*** 2.57*** 2.77***
(0.23) (0.52) (0.35) (0.37) (0.75) (0.18) (0.52) (0.80) (0.54) (0.55)
UW Tacoma 1.2*** 2.1** 1.85** 0.45 -0.27 2.11*** 2.52*** 2.46** 1.63* 0.01
(0.27) (0.72) (0.51) (0.43) (0.98) (0.21) (0.42) (0.66) (0.62) (0.61)
Walla Walla 0.77** 1.79** 1.46* 0.5 -4.15** 5.12*** 9.2*** 10.07*** 7.1** 5.38**
(0.30) (0.59) (0.60) (1.38) (1.27) (0.31) (0.77) (1.19) (2.19) (1.90)
Washington State 0.18** 0.52 -1.29 -0.09 -0.31 0.69*** 0.92 -0.79 0.17 0.16
(0.06) (0.56) (0.91) (0.31) (0.25) (0.11) (0.56) (1.89) (0.62) (0.56)
Western Washington -0.14** 0.96 -0.41 -0.46 -0.65* 0.77*** 1.49* -1.52 0.13 -0.07
(0.04) (0.57) (1.02) (0.32) (0.26) (0.04) (0.42) (1.79) (0.65) (0.43)
Whitworth 1.02*** 3.56*** 1.76 -0.88 -1.16 -0.93*** -0.03 -2.1 0.16 -0.12
(0.22) (0.54) (0.99) (0.45) (0.67) (0.22) (0.67) (2.06) (0.65) (0.60)
Observations 17677 17677 17677 17677 17677 17677 17677 17677 17677 17677
R2 0.0427 0.0431 0.0435 0.0934 0.2136 0.0339 0.0346 0.0355 0.0826 0.2026
Examples of Program Cohort Effects (Referent group: out-of-state teachers credentialed before 2000)
Pearson/Spearman Rank Correlations of Program Effects by Subject and Model
Decay Curves
• Dots represent math models
• Lines represent reading models
• Colors represent model specifications