TECHNICAL AND CONSEQUENTIAL VALIDITY IN THE DESIGN AND USE OF VALUE-ADDED SYSTEMS
LAFOLLETTE SCHOOL OF PUBLIC AFFAIRS & VALUE-ADDED RESEARCH CENTER, UNIVERSITY OF WISCONSIN-MADISON
Robert Meyer, Research Professor and Director
VARC Partner Districts and States
Design of Wisconsin State Value-Added System (1989)
Minneapolis (1992)
Milwaukee (1996)
Chicago (2006)
Department of Education: Teacher Incentive Fund (TIF) (2006 and 2010)
Madison (2008)
Wisconsin Value-Added System (2009)
Milwaukee Area Public and Private Schools (2009)
Racine (2009)
New York City (2009)
Minnesota, North Dakota & South Dakota: Teacher Education Institutions and Districts (2009)
Illinois (2010)
Hillsborough (2010)
Atlanta (2010)
Los Angeles (2010)
Tulsa (2010)
Collier County (2012)
New York (2012)
California Charter Schools Association (2012)
Oklahoma Gear Up (2012)
Districts and States Working with VARC
[Map. Districts: Minneapolis, Milwaukee, Chicago, Madison, Tulsa, Atlanta, New York City, Los Angeles, Hillsborough County, Collier County. States: North Dakota, South Dakota, Minnesota, Wisconsin, Illinois, New York.]
Context and Research Questions
Components to Educator Effectiveness Systems
[Diagram: components of Educator Effectiveness Systems]
Value-Added System
Data Requirements and Data Quality
Professional Development (Understanding and Application)
Evaluating Instructional Practices, Programs, and Policies
Alignment with School, District, State Policies and Practices
Embed within a Framework of Data-Informed Decision-Making
Uses of a Value-Added System
[Diagram: uses of Value-Added]
Evidence that All Students Can Learn
Set School Performance Standards
Triage: Identify Low Performing Schools
Contribute to District Knowledge about “What Works”
Data-Informed Decision-Making / Performance Management
Development of a Value-Added System
Clarity: What is the objective?
Dimensions of validity and reliability
Why? Achieve accuracy, fairness, improved teaching and learning
How complex should a value-added model be?
Possible rule: “Simpler is better, unless it is wrong.”
Dimensions of Validity and Reliability
Accuracy
  Criterion validity
  Technical (causal) validity
  Reliability (precision)
Consequential validity
Transparency
Technical validity
Technical validity measures the degree to which the statistical model and data used in the model (for example, student outcomes, student characteristics, and student-classroom-teacher linkages) provide consistent (unbiased) estimates of performance using the available student outcomes/assessments
Requires development of a quasi-experimental model that captures (to the extent possible) the structural factors that determine student achievement and growth in student achievement
Consequential validity
Consequential validity addresses the incentives and decisions that are triggered by the design and use of performance measures and performance systems
Transparency
Transparency addresses the consequences of simplicity versus complexity in the design (and clarity of explanation) of value-added models and reports
Criterion Validity
Criterion validity captures the degree to which effect estimates based on available student outcome data fully align with estimates based on the complete spectrum of student outcomes valued by stakeholders
Reliability
Reliability (or precision) captures statistical error due to the fact that effectiveness estimates are based on finite samples of students, which in the context of estimating classroom and teacher performance are generally small
Application of Framework
Develop a value-added model that incorporates important structural factors that determine growth in student achievement and specify performance parameters that represent educational units (classrooms) and agents (teachers)
Identify and address threats to validity that could cause bias in the estimation of desired performance parameters
Specify data uses, including the design of reports intended to inform decision making
Technical vs. Consequential Validity I
Consider the consequences of controlling for prior achievement and other predictors – switching from measurement of attainment (as in NCLB) to growth
Positive from the standpoint of technical validity because the estimates are more accurate
Possibly negative from the perspective of consequential validity if controlling for prior achievement and other predictors inevitably leads to reduced expectations for poor and minority students.
Technical vs. Consequential Validity II
Consequences of inclusion of demographic variables?
Possibly positive from the standpoint of technical validity because the estimates are more accurate
Possibly negative from the perspective of consequential validity because the inclusion of these variables inevitably leads to reduced expectations for poor and minority students.
Or, the reverse is true
Value-Added Model
Generally Recommended Value-Added Model Features
Longitudinal student outcome/assessment data
Flexible (data-driven) posttest-on-pretest link, including possible nonlinearities in this relationship
Contextual covariates
Adjust for test measurement error
Address changes in assessments over time
Allow for end-of-grade & end-of-course exams
Dosage/student mobility
Allow differential effects by student characteristics
Statistical shrinkage: address noise due to small samples
Measures of precision and confidence ranges
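The last two features (shrinkage and confidence ranges) can be illustrated with a minimal sketch. It assumes a simple empirical-Bayes style rule in which an estimate is pulled toward zero in proportion to its noise; the function names and the normal-approximation interval are illustrative assumptions, not a description of any particular production system.

```python
from math import sqrt

def std_error_of_mean(student_sd, n_students):
    """Standard error of a classroom average based on n_students students."""
    return student_sd / sqrt(n_students)

def shrink(estimate, std_error, effect_variance):
    """Shrink a noisy estimate toward zero (the assumed mean of all effects).

    reliability = signal variance / (signal variance + noise variance).
    Returns the shrunken estimate and the reliability weight.
    """
    reliability = effect_variance / (effect_variance + std_error ** 2)
    return reliability * estimate, reliability

def confidence_range(estimate, std_error, z=1.96):
    """Approximate 95% confidence range under a normal approximation."""
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical inputs: the same raw estimate from a small and a large classroom.
small_class = shrink(1.0, std_error_of_mean(1.0, 4), effect_variance=1.0)
large_class = shrink(1.0, std_error_of_mean(1.0, 100), effect_variance=1.0)
print(small_class, large_class)
```

The small classroom's estimate is shrunk more than the large classroom's, because fewer students means a larger standard error and a smaller reliability weight.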
Model Simplifications
Longitudinal data for two time periods (appropriate for early grades)
Model will be defined in terms of true test scores. Estimation method controls for test measurement error
Posttest-on-pretest relationship is assumed to be linear; this can be generalized
Student mobility within the school year is ignored in order to simplify notation
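Why measurement error in the pretest matters can be sketched with a small simulation. It assumes classical (independent, additive) measurement error and a known test reliability, and it uses the textbook errors-in-variables correction of dividing the naive slope by the reliability. The slides do not say which correction the estimation method actually uses, so this is an illustration only; all names and parameter values are hypothetical.

```python
import random
from statistics import mean

def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def attenuation_demo(n=20000, true_slope=0.8, error_sd=0.5, seed=3):
    rng = random.Random(seed)
    true_pre = [rng.gauss(0, 1) for _ in range(n)]
    # The observed pretest is the true score plus test measurement error.
    observed_pre = [t + rng.gauss(0, error_sd) for t in true_pre]
    post = [true_slope * t + rng.gauss(0, 0.5) for t in true_pre]
    naive = slope(observed_pre, post)
    # Reliability = var(true score) / var(observed score); true variance is 1 here.
    reliability = 1.0 / (1.0 + error_sd ** 2)
    corrected = naive / reliability
    return naive, corrected

naive, corrected = attenuation_demo()
print(f"naive slope {naive:.3f}, corrected slope {corrected:.3f}")
```

The naive slope is pulled toward zero relative to the true posttest-on-pretest relationship, while the corrected slope is close to the value used to generate the data.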
Structural Determinants of Achievement and Achievement Growth
Student level
  Prior achievement
  Student and family contribution
  Within-classroom allocation of resources (including student performance expectations)
  School contributions external to classroom (supplemental in-school instruction, after school instruction, summer school)
Structural Determinants of Achievement and Achievement Growth
Classroom level
  Peer effects
  Contributions external to teacher (school resources, policies, and climate, class size)
  Contributions internal to teacher (teacher resources, policies, and instructional practices, alignment with standards implied by assessments) (factors that may be covered by observational rubrics)
Preview of Alternative Performance Parameters
Teacher performance: $\eta_{Tjk}$
Classroom performance: $\eta_{Cjk}$
  Includes contributions in classroom from student peers and resources external to teacher (such as other staff and class size)
Factors external to the classroom (supplemental in-school instruction, after school instruction, summer school): $\eta_{Xjk}$
Classroom/school performance: $\eta_{Sjk} = \eta_{Cjk} + \eta_{Xjk}$
  Includes contributions from classroom and resources external to the classroom
Model Specification Strategy
Include in the model all structural determinants of achievement and achievement growth
Be explicit how demographic variables and prior achievement contribute directly or indirectly (via other determinants) to achievement and growth
Two types of student and demographic variables:
  Level I (Student level): $y_{0i}$, $X_i$
  Level II (Classroom level): $\bar{y}_{0jk}$, $\bar{X}_{jk}$
Subscripts: student i, teacher j, and school k
I: Student-Level Equation
$y_{1i} = \gamma y_{0i} + b_i + c_{i(jk)} + d_{i(jk)}$
Posttest: $y_{1i}$
Pretest: $y_{0i}$, with durability/decay parameter: $\gamma$
Student and family contribution: $b_i$
Within classroom contribution: $c_{i(jk)}$
Supplemental contribution: $d_{i(jk)}$
Measures of supplemental factors not observed
Subscripts: student i, teacher j, and school k
Alternative Student-Level Equation
Include explicit measures of supplemental resources in the model, producing a multiple-input (crossed effects) model
This model is tractable if the crossed effects are not highly collinear. If the crossed effects are highly (or completely) collinear, then it may be possible to address provision of supplemental resources in the second level of the model as a factor external to the teacher.
Our focus is on the conventional one-input model
Condition Factors on Student-Level Demographic Variables
Student and family factor: $b_i = b_0 + b_1 y_{0i} + b_2 X_i + e_{1i}$
Within classroom factor: $c_{i(jk)} = c_0 + c_1 y_{0i} + c_2 X_i + \eta_{Cjk} + e_{2i}$
Supplemental factor: $d_{i(jk)} = d_0 + d_1 y_{0i} + d_2 X_i + \eta_{Xjk} + e_{3i}$
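Defines a VAM of Student Growth and Classroom/School Performance
Combine student-level structural factors: $y_{1i} = \lambda y_{0i} + \beta X_i + \eta_{Sjk} + e_i$
Pretest coefficient: $\lambda = \gamma + b_1 + c_1 + d_1$
Effect of student-level characteristics: $\beta = b_2 + c_2 + d_2$
Classroom/school performance: $\eta_{Sjk} = \eta_{Cjk} + \eta_{Xjk}$
Error term: $e_i = e_{1i} + e_{2i} + e_{3i}$; the intercept $b_0 + c_0 + d_0$ is suppressed (variables are centered around zero)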
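A minimal sketch of estimating a model of this form on simulated data: posttest regressed on pretest, one student-level demographic indicator, and a classroom/school effect. It assumes the within-classroom (fixed effects) estimator and a single 0/1 demographic variable. The data, parameter values, and function names are hypothetical, and the simplified model omits measurement-error correction and shrinkage.

```python
import random
from statistics import mean

def simulate(n_classes=30, n_students=40, lam=0.8, beta=0.5, seed=1):
    """Return rows of (classroom, pretest, demographic, posttest) and true effects."""
    rng = random.Random(seed)
    rows, true_effects = [], []
    for j in range(n_classes):
        eta = rng.gauss(0, 1)  # hypothetical classroom/school performance
        true_effects.append(eta)
        for _ in range(n_students):
            y0 = rng.gauss(0, 1)
            x = 1.0 if rng.random() < 0.5 else 0.0
            y1 = lam * y0 + beta * x + eta + rng.gauss(0, 0.5)
            rows.append((j, y0, x, y1))
    return rows, true_effects

def fit_vam(rows):
    """Within-classroom OLS for (lam, beta), then centered classroom effects."""
    by_class = {}
    for j, y0, x, y1 in rows:
        by_class.setdefault(j, []).append((y0, x, y1))
    # Classroom means of pretest, demographic, and posttest.
    means = {j: tuple(mean(col) for col in zip(*obs)) for j, obs in by_class.items()}
    s00 = s0x = sxx = s0y = sxy = 0.0
    for j, y0, x, y1 in rows:
        m0, mx, my = means[j]
        a, b, c = y0 - m0, x - mx, y1 - my  # deviations from classroom means
        s00 += a * a
        s0x += a * b
        sxx += b * b
        s0y += a * c
        sxy += b * c
    det = s00 * sxx - s0x * s0x
    lam = (s0y * sxx - s0x * sxy) / det   # 2x2 normal equations via Cramer's rule
    beta = (s00 * sxy - s0x * s0y) / det
    raw = {j: my - lam * m0 - beta * mx for j, (m0, mx, my) in means.items()}
    center = mean(raw.values())
    return lam, beta, {j: v - center for j, v in raw.items()}

rows, true_effects = simulate()
lam_hat, beta_hat, effects = fit_vam(rows)
print(f"pretest coefficient {lam_hat:.2f}, demographic effect {beta_hat:.2f}")
```

The estimated classroom effects are centered around zero, matching the convention that value-added ratings are reported relative to the average classroom.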
Decomposition of Average Achievement
Predicted achievement = Prior achievement + Student growth: $\bar{y}_1^p = \lambda \bar{y}_0 + \beta \bar{X}$
Average post achievement = Predicted achievement + Classroom/school performance: $\bar{y}_1 = \bar{y}_1^p + \eta_S$
Teacher subscripts jk dropped
Technical Validity
Classroom/school performance from the value-added model that includes demographic variables is the structural parameter of interest: $\eta_{Sjk}$
The performance parameter obtained from a model that excludes demographic variables is (approximately) $\eta_{Sjk} + \beta \bar{X}_{jk}$
This parameter is biased
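One way to see where the bias term comes from is to average the student-level equation within a classroom. The sketch below assumes the linear model with pretest coefficient $\lambda$, demographic coefficient $\beta$, and classroom/school performance $\eta_{Sjk}$, and assumes $\lambda$ is estimated without error; $\bar{e}_{jk}$ (the classroom average of the student error) is notation introduced here for illustration.

```latex
\begin{align*}
\bar{y}_{1jk} &= \lambda \bar{y}_{0jk} + \beta \bar{X}_{jk} + \eta_{Sjk} + \bar{e}_{jk} \\
\underbrace{\bar{y}_{1jk} - \lambda \bar{y}_{0jk}}_{\text{credited to the classroom when } X \text{ is excluded}}
  &= \underbrace{\eta_{Sjk}}_{\text{performance}}
   + \underbrace{\beta \bar{X}_{jk}}_{\text{bias}}
   + \bar{e}_{jk}
\end{align*}
```

The result is only approximate in practice, because excluding demographics can also change the estimated pretest coefficient when pretest scores and demographics are correlated.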
II. Classroom/School Level Equation
$\eta_{Sjk} = p_{jk} + q_{jk} + \eta_{Tjk}$
Classroom/school performance: $\eta_{Sjk}$
Peer effects: $p_{jk}$
Contributions external to teacher: $q_{jk}$
Contributions internal to teacher: $\eta_{Tjk}$
Condition Factors on Average Classroom-Level Demographic Variables
Peer effects: $p_{jk} = p_0 + p_1 \bar{y}_{0jk} + p_2 \bar{X}_{jk}$
Contributions external to teacher: $q_{jk} = q_0 + q_1 \bar{y}_{0jk} + q_2 \bar{X}_{jk} + u_{1jk}$
Contributions internal to teacher: $\eta_{Tjk} = r_0 + r_1 \bar{y}_{0jk} + r_2 \bar{X}_{jk} + u_{2jk}$
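Defines a Model of Classroom/School Performance
Preferred model (but not identified): $\eta_{Sjk} = (p_0 + q_0) + (p_1 + q_1)\bar{y}_{0jk} + (p_2 + q_2)\bar{X}_{jk} + \eta_{Tjk} + u_{1jk}$
Teacher parameter (not identified): $\tilde{\eta}_{Tjk} = \eta_{Tjk} + u_{1jk}$
Bias: productivity external to teacher = $u_{1jk}$
Feasible model (biased): $\hat{\eta}_{Tjk} = \eta_{Sjk} - \eta^{p}_{Sjk} = \eta_{Tjk} + u_{1jk} - (r_1 \bar{y}_{0jk} + r_2 \bar{X}_{jk})$, where $\eta^{p}_{Sjk}$ is the part of classroom/school performance predicted by the classroom averages (intercept $r_0$ normalized to zero)
Bias is caused by “over-controlling”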
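The over-controlling problem can be sketched with a small simulation. It assumes that true teacher performance is correlated with a classroom-average demographic variable (a nonzero $r_2$), sets peer effects to zero for simplicity, and uses the residual from a regression of classroom/school performance on that average as the feasible teacher estimate. All parameter values and names are hypothetical.

```python
import random
from statistics import mean

def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_over_control(n=2000, r2=-1.0, q2=-0.5, seed=7):
    rng = random.Random(seed)
    x_bar = [rng.random() for _ in range(n)]  # classroom-average demographic
    # True teacher performance depends on classroom composition (r2 != 0).
    eta_t = [r2 * (x - 0.5) + rng.gauss(0, 0.3) for x in x_bar]
    u1 = [rng.gauss(0, 0.3) for _ in range(n)]  # productivity external to teacher
    eta_s = [q2 * (x - 0.5) + t + u for x, t, u in zip(x_bar, eta_t, u1)]
    # Feasible model: regress classroom/school performance on the average,
    # and treat the residual as teacher performance.
    slope = covariance(eta_s, x_bar) / covariance(x_bar, x_bar)
    intercept = mean(eta_s) - slope * mean(x_bar)
    feasible = [s - intercept - slope * x for s, x in zip(eta_s, x_bar)]
    return x_bar, eta_t, feasible

x_bar, eta_t, feasible = simulate_over_control()
print("cov(true teacher performance, X):", round(covariance(eta_t, x_bar), 4))
print("cov(feasible estimate, X):", round(covariance(feasible, x_bar), 4))
```

By construction the feasible estimate is uncorrelated with the classroom average, so any real relationship between teacher performance and classroom composition is removed along with the external factors.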
Dilemma in the Choice of Models from the Perspective of Technical Validity
Option A: Use classroom/school performance as a proxy measure of teacher performance; commit an error of “omission”
Option B: Use the feasible, but biased, estimate of teacher performance; commit an error of “commission”
Option C: Use a more complicated model to control for the factors external to the teacher
Consequential Validity: Uses and Decisions
Parental choice of schools
Teachers' willingness to teach in given schools
Identification of master teachers
Identification of teachers for professional development
Performance based compensation
Provision of supplemental services
Avoid bubble effects: incentives to deploy resources to students as an artifact of statistical measures (Statistics based on means rather than medians can be affected by all students)
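The bubble-effect point can be illustrated with a toy example. It assumes a hypothetical proficiency cutoff of 50 and made-up scores for seven students: helping a student far below the cutoff moves the mean but not the median or the percent proficient, while helping a student just below the cutoff moves the percent proficient.

```python
from statistics import mean, median

def percent_proficient(scores, cutoff):
    """Percentage of students scoring at or above the cutoff."""
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)

def summarize(scores, cutoff=50):
    return {
        "mean": mean(scores),
        "median": median(scores),
        "percent_proficient": percent_proficient(scores, cutoff),
    }

# Hypothetical scores; the cutoff of 50 is an assumption for illustration.
baseline = [10, 20, 48, 49, 60, 70, 90]
help_lowest = [25, 20, 48, 49, 60, 70, 90]   # lowest student gains 15 points
help_bubble = [10, 20, 48, 51, 60, 70, 90]   # student just below cutoff gains 2

for label, scores in [("baseline", baseline),
                      ("help lowest", help_lowest),
                      ("help bubble", help_bubble)]:
    print(label, summarize(scores))
```

A measure based on the mean rewards gains for every student, which is the sense in which mean-based statistics avoid an incentive to concentrate resources on students near a threshold.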
Key Point: the Power of Two
Decisions need to be informed by:
  Measure of school/classroom or teacher performance
  Measures of student achievement
    Actual average student achievement
    Student achievement target (e.g., proficiency status)
Options
  Use only information on student attainment (NCLB)
  Use only information on value-added performance
  Use both pieces of data to inform decisions
Achievement Target, Performance, and Achievement Shortfall – Retrospective View
Example with two teachers
Focus on use of classroom/school indicator
Scale of parameters:
  Value-added ratings are centered around zero with a standard deviation of one, and thus range from approximately -3 to 3
  All other parameters (average achievement and the average contribution of demographic characteristics) are centered around zero and have been transformed to the value-added scale, although the standard deviations of these parameters are not constrained to equal one
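A minimal sketch of this scaling, assuming that raw value-added estimates and the other parameters start out in the same units (for example, test-score points) and that the population standard deviation is used. The function name and input values are hypothetical.

```python
from statistics import mean, pstdev

def to_value_added_scale(value_added, other):
    """Center both series at zero and divide both by the value-added SD.

    Ratings end up with mean 0 and SD 1; the other parameter is centered
    and expressed in value-added SD units, so its SD need not equal 1.
    """
    va_mean, va_sd = mean(value_added), pstdev(value_added)
    ratings = [(v - va_mean) / va_sd for v in value_added]
    other_mean = mean(other)
    rescaled = [(o - other_mean) / va_sd for o in other]
    return ratings, rescaled

# Hypothetical raw values in test-score points.
raw_value_added = [2.0, 4.0, 6.0, 8.0]
raw_prior_achievement = [400.0, 420.0, 410.0, 450.0]
ratings, prior_scaled = to_value_added_scale(raw_value_added, raw_prior_achievement)
print(ratings, prior_scaled)
```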
How to Read the Scatter Plots
[Scatter plot: schools in your district. x-axis: Value-Added (2009-2010); y-axis: Percent Prof/Adv (2009). Example points are labeled A through E.]
A. Students know a lot and are growing faster than predicted
B. Students are behind, but are growing faster than predicted
C. Students know a lot, but are growing slower than predicted
D. Students are behind, and are growing slower than predicted
E. Students are about average in how much they know and how fast they are growing
Achievement Target, Performance, and Achievement Shortfall – Retrospective View
Achievement  Average Prior  Student  Classroom/School  Average   Achievement
Target       Achievement    Factor   Performance       Posttest  Shortfall
4            3              1        -1                3         1
4            0              -1       -1                -2        -6
Symbols, in column order: $\bar{y}_1^T$, $\bar{y}_0$, $\beta\bar{X}$, $\eta_S$, $\bar{y}_1$
Achievement Target, Performance, and Achievement Shortfall – Prospective View
Achievement  Average Prior  Student  Classroom/School  Average   Achievement
Target       Achievement    Factor   Performance       Posttest  Shortfall
4            3              1        -1                3         1
4            3              1        0                 4         0
4            3              1        1                 5         NA
4            3              1        2                 6         NA
4            3              1        3                 7         NA
Symbols, in column order: $\bar{y}_1^T$, $\bar{y}_0$, $\beta\bar{X}$, $\eta_S$, $\bar{y}_1$
Achievement Target, Performance, and Achievement Shortfall – Prospective View
Achievement  Average Prior  Student  Classroom/School  Average   Achievement
Target       Achievement    Factor   Performance       Posttest  Shortfall
4            0              -1       -1                -2        -6
4            0              -1       0                 -1        -5
4            0              -1       1                 0         -4
4            0              -1       2                 1         -3
4            0              -1       3                 2         -2
4            0              -1       4                 3         -1
4            0              -1       5                 4         0
Symbols, in column order: $\bar{y}_1^T$, $\bar{y}_0$, $\beta\bar{X}$, $\eta_S$, $\bar{y}_1$
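The arithmetic behind the two example teachers can be sketched as follows, assuming the additive decomposition (average posttest = prior achievement + student factor + classroom/school performance) with all quantities on the value-added scale. The function names are illustrative. The prospective question is how much classroom/school performance would be needed to reach the target.

```python
def predicted_achievement(prior, student_factor):
    """Predicted achievement before any classroom/school contribution."""
    return prior + student_factor

def average_posttest(prior, student_factor, performance):
    """Average posttest = predicted achievement + classroom/school performance."""
    return predicted_achievement(prior, student_factor) + performance

def required_performance(target, prior, student_factor):
    """Classroom/school performance needed for the average posttest to hit the target."""
    return target - predicted_achievement(prior, student_factor)

# Teacher 1: target 4, prior achievement 3, student factor 1.
print(average_posttest(3, 1, -1), required_performance(4, 3, 1))
# Teacher 2: target 4, prior achievement 0, student factor -1.
print(average_posttest(0, -1, -1), required_performance(4, 0, -1))
```

For the second teacher the required performance is 5 on a scale where ratings run from roughly -3 to 3, which is why a high value-added target alone cannot close the gap and supplemental resources come into play.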
The Pros and Cons of Using Attainment Only
It is straightforward to connect actual attainment with achievement targets and maintain a universal target
Average achievement and related attainment indicators such as percent proficient are severely biased as measures of classroom/school performance
Given a universal achievement target, the achievement shortfalls vary enormously across teachers and schools
The Pros and Cons of Using Value-Added Only
The value-added model provides an unbiased/consistent estimate of classroom/school performance
High value-added targets do not eliminate achievement shortfalls if prior achievement (or more correctly, predicted achievement, which includes student growth) is extremely low
The Power of Using Both Indicators
The value-added model provides an unbiased/consistent estimate of classroom/school performance
Achievement shortfalls can be identified prospectively and thus can trigger supplemental resource allocations designed to eliminate them
Include Student-Level Demographics?
Yes, to provide more accurate measures of classroom/school performance
Does this reduce expectations?
  No, achievement targets are set independently
  Predicted achievement shortfalls are not reduced in a model that includes student demographics. In fact, they are identical
  Supplemental resource allocations can be triggered to eliminate achievement shortfalls
Does Including Demographic Variables Matter?
[Bar chart: State Wide Data for Grade 3 Math - VA Tier Difference After Removing Demographics. x-axis: Value Added Difference (-0.7 to 0.3); y-axis: Percent of Schools (0 to 0.4).]
Data behind the chart, by value-added difference (all values are proportions, so 1 = 100%):
Value Added Difference  -0.7      -0.6      -0.5      -0.4      -0.3      -0.2      -0.1      0         0.1       0.2       0.3
Percent of Schools      0.000894  0.004468  0.001787  0.028597  0.033959  0.064343  0.161752  0.341376  0.308311  0.053619  0.000894
Percent of Students     0.000637  0.004155  0.001839  0.026487  0.029387  0.060667  0.141844  0.323455  0.341526  0.068783  0.00122
Female                  0.38889   0.5234    0.60577   0.54272   0.49097   0.50277   0.50274   0.49702   0.47864   0.46581   0.44928
African American        0.97222   0.83404   0.86538   0.75501   0.52828   0.29875   0.12653   0.04729   0.02879   0.01851   0.01449
Hispanic                0         0.08085   0.01923   0.14219   0.30987   0.29991   0.16181   0.07987   0.05307   0.04704   0.01449
Asian                   0         0.025532  0         0.039386  0.054152  0.045468  0.067564  0.03843   0.029459  0.021851  0.072464
Indian                  0         0.012766  0.009615  0.004005  0.004813  0.009035  0.024433  0.021101  0.015636  0.004627  0
White                   0.02778   0.04681   0.10577   0.05941   0.10289   0.34684   0.61967   0.81321   0.87305   0.90797   0.89855
Free Reduced Lunch      1         0.93617   0.82692   0.90053   0.90433   0.73535   0.56008   0.40638   0.29889   0.23805   0.15942