Measuring Teacher Effectiveness: Challenges and Opportunities
Laura Goe, Ph.D., Research Scientist, ETS, and Principal Investigator for the
National Comprehensive Center for Teacher Quality
National Association of Latino Elected and Appointed Officials (NALEO) Education Fund
NALEO Audio Conference, April 10, 2012
2
Laura Goe, Ph.D.
• Former teacher in rural & urban schools
  - Special education (7th & 8th grade, Tunica, MS)
  - Language arts (7th grade, Memphis, TN)
• Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program
• Principal Investigator for the National Comprehensive Center for Teacher Quality
• Research Scientist in the Performance Research Group at ETS
3
The National Comprehensive Center for Teacher Quality
• A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA
• Vanderbilt University
• Learning Point Associates, an affiliate of American Institutes for Research
• Educational Testing Service
4
Today’s presentation available online
• To download a copy of this presentation, go to www.lauragoe.com
  - Go to the Publications and Presentations page
  - Today’s presentation is at the bottom of the page
5
The goal of teacher evaluation
The ultimate goal of all teacher evaluation should be…
TO IMPROVE TEACHING AND LEARNING
6
Questions to be considered
• What is teacher effectiveness and why should we measure it?
• How do you measure teacher effectiveness?
• What are strengths and cautions to keep in mind when using these measures?
7
Differentiating among teachers
• “It is nearly impossible to discover and act on performance differences among teachers when documented records show them all to be the same.” (Glazerman et al., 2011, pg 1)
8
Trends in teacher evaluation
• The policy imperative to change teacher evaluation has outstripped the research
  - Though we don’t yet know which model and combination of measures will identify effective teachers, many states and districts feel compelled to move forward at a rapid pace
• Inclusion of student achievement growth data represents an important “culture shift” in evaluation
  - Communication and teacher/administrator participation and buy-in are crucial to ensure change
• The implementation challenges are considerable
  - Few models exist for states and districts to adopt or adapt
  - Many districts have limited capacity to implement comprehensive systems, and states have limited resources to help them
9
It’s an equity issue
• Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005).
• The Widget Effect report (Weisberg et al., 2009) found that 90% of teachers were rated “good” or better in districts where students were failing at high levels
10
A simple definition of teacher effectiveness
Anderson (1991) stated that “… an effective teacher is one who quite consistently achieves goals which either directly or indirectly focus on the learning of their students” (p. 18).
11
Race to the Top definition of effective & highly effective teacher
Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (pg 7)
Highly effective teacher: students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).
12
Race to the Top definition of student growth
• Student growth means the change in student achievement (as defined in this notice) for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (pg 11)
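As a rough illustration of the definition above, growth for an individual student can be computed as the difference between scores at two points in time. The function name, the score scale, and the sample values are hypothetical; the Race to the Top notice does not prescribe this particular calculation.

```python
# Minimal sketch: student growth as the change in achievement between
# two points in time. All names and scores here are hypothetical.

def student_growth(earlier_score: float, later_score: float) -> float:
    """Return the change in one student's score between two time points."""
    return later_score - earlier_score


# Hypothetical (fall, spring) scale scores for three students.
scores = {
    "student_1": (410.0, 455.0),
    "student_2": (520.0, 540.0),
    "student_3": (300.0, 365.0),
}

# Growth per student, then the average growth for the class.
growth = {
    name: student_growth(fall, spring)
    for name, (fall, spring) in scores.items()
}
mean_growth = sum(growth.values()) / len(growth)

print(growth)
print(round(mean_growth, 1))
```

Note that the growth is computed per student first; a class-level figure is only a summary of those individual changes.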
13
Goe, Bell, & Little (2008) definition of teacher effectiveness
1. Have high expectations for all students and help students learn, as measured by value-added or alternative measures.
2. Contribute to positive academic, attitudinal, and social outcomes for students, such as regular attendance, on-time promotion to the next grade, on-time graduation, self-efficacy, and cooperative behavior.
3. Use diverse resources to plan and structure engaging learning opportunities; monitor student progress formatively, adapting instruction as needed; and evaluate learning using multiple sources of evidence.
4. Contribute to the development of classrooms and schools that value diversity and civic-mindedness.
5. Collaborate with other teachers, administrators, parents, and education professionals to ensure student success, particularly the success of students with special needs and those at high risk for failure.
14
Measures and models: Definitions
• Measures are the instruments, assessments, protocols, rubrics, and tools that are used in determining teacher effectiveness
• Models are the state or district systems of teacher evaluation including all of the inputs and decision points (measures, instruments, processes, training, and scoring, etc.) that result in determinations about individual teachers’ effectiveness
15
Multiple measures of teacher effectiveness
• Evidence of growth in student learning and competency
  - Standardized tests, pre/post tests in untested subjects
  - Student performance (art, music, etc.)
  - Curriculum-based tests given in a standardized manner
  - Classroom-based tests such as DIBELS
• Evidence of instructional quality
  - Classroom observations
  - Lesson plans, assignments, and student work
  - Student surveys such as Harvard’s Tripod
  - Electronic portfolios/evidence binders
• Evidence of professional responsibility
  - Administrator/supervisor reports, parent surveys
  - Teacher reflection and self-reports, records of contributions
16
Teacher observations: strengths and weaknesses
• Strengths
  - Great for teacher professional growth
    - If observation is followed by opportunity to discuss results
    - If support is provided for those who need it
  - Helps evaluators (principals or others) understand teachers’ needs across school or across district
• Weaknesses
  - Essential to have alignment between teaching standards and observation instrument
  - Resource intensive (personnel time, training, calibrating)
  - Validity of observation results may vary with who is doing them, depending on how well trained and calibrated they are
17
Example: University of Virginia’s CLASS observation tool
• Emotional Support
  - Positive Climate
  - Negative Climate
  - Teacher Sensitivity
  - Regard for Student (Adolescent) Perspectives
• Classroom Organization
  - Behavior Management
  - Productivity
  - Instructional Learning Formats
• Instructional Support
  - Pre-K and K-3: Concept Development, Quality of Feedback, Language Modeling
  - Upper Elementary/Secondary: Content Understanding, Analysis and Problem Solving, Quality of Feedback
18
Example: Charlotte Danielson’s Framework for Teaching
Domain 1: Planning and Preparation includes comprehensive understanding of the content to be taught, knowledge of the students’ backgrounds, and designing instruction and assessment.
Domain 2: The Classroom Environment addresses the teacher’s skill in establishing an environment conducive to learning, including both the physical and interpersonal aspects of the environment.
Domain 3: Instruction is concerned with the teacher’s skill in engaging students in learning the content, and includes the wide range of instructional strategies that enable students to learn.
Domain 4: Professional Responsibilities addresses a teacher’s additional professional responsibilities, including self-assessment and reflection, communication with parents, participating in ongoing professional development, and contributing to the school and district environment.
19
Validity of classroom observations is highly dependent on training
• A teacher should get the same score no matter who observes him
This requires that all observers be trained on the instruments and processes
Occasional “calibrating” should be done; more often if there are discrepancies or new observers
Who the evaluators are matters less than the fact that they are trained to recognize evidence and score it consistently
• Teachers should also be trained on the observation forms and processes so they can participate actively and fully in the process
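One simple way to check whether observers are calibrated is to compare the scores two observers give to the same lessons. The sketch below computes exact agreement and within-one-point agreement. The 1-4 rubric scale, the function name, and the scores are hypothetical, and formal calibration studies typically rely on more rigorous reliability statistics than these two rates.

```python
# Minimal sketch: agreement between two observers who scored the same
# lessons on a hypothetical 1-4 rubric. Not a formal reliability study.

def agreement_rates(rater_a, rater_b):
    """Return (exact agreement rate, within-one-point agreement rate)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of lessons")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical scores from a principal and a peer observer on 8 lessons.
principal_scores = [3, 2, 4, 3, 1, 2, 3, 4]
peer_scores = [3, 3, 4, 2, 1, 2, 1, 4]

exact_rate, adjacent_rate = agreement_rates(principal_scores, peer_scores)
print(exact_rate, adjacent_rate)
```

Low exact agreement, or any lesson where the two scores differ by more than one point, could be used as a trigger for the additional calibration described above.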
20
Risk management vs. one-size-fits-all in teacher observations
• Conducting high-quality observations is a resource-intensive process
  - A more efficient use of resources is for teachers who have not yet demonstrated competence to be on a more intensive observation schedule
    - New teachers
    - Teachers who have changed teaching assignments or schools
• Other measures are less resource intensive and can be used routinely (surveys, student outcomes, portfolios)
21
Reliability results when using different combinations of raters and lessons
Figure 2. Errors and Imprecision: the reliability of different combinations of raters and lessons. From Hill et al., 2012 (see references list). Used with permission of author.
22
Formal vs. informal observations
• Formal observations are likely to
  - Be announced and scheduled in advance according to a pre-determined yearly schedule
  - Include pre- and post-conferences with review of lesson plans and artifacts
  - Last an entire class period
  - Result in a set of scores on multiple indicators
• Informal observations are likely to
  - Be unannounced, drop-in visits
  - Last less than an entire class period
  - Result in informal verbal or written feedback to the teacher, perhaps on only one indicator
23
Questions to ask about observations
• How many observations per year?
  - Vary by new vs. experienced?
  - Vary by demonstrated competence?
  - Combination of formal and informal?
• Who should conduct the observations?
• Will multiple observers be required?
• How will they be trained?
  - Workshops?
  - Online (video-based)?
• Will they need to be certified?
24
Value-added models
• Many variations on value-added models
  - TVAAS (Sanders’ original model) typically uses 3+ years of prior test scores to predict the next score for a student
    - Used since the 1990s for teachers in Tennessee, but not for high-stakes evaluation purposes
    - Most states and districts that currently use VAMs use the Sanders model, also called EVAAS
  - There are other models that use less student data to make predictions
  - Considerable variation in “controls” used
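The general idea of predicting a student's next score from prior scores, and then comparing actual to predicted, can be sketched in a few lines. This is a deliberately simplified illustration and not TVAAS/EVAAS: it uses a single prior score, a one-variable least-squares line, and no controls. All names and data are hypothetical.

```python
# Minimal sketch of the value-added idea (NOT the TVAAS/EVAAS model):
# 1) predict each student's current score from one prior score,
# 2) average (actual - predicted) over each teacher's students.
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("Prior scores must vary to fit a line")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples."""
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    # A teacher's estimate is the mean amount by which students beat
    # (positive) or fell short of (negative) their predicted scores.
    return {teacher: mean(res) for teacher, res in residuals.items()}


# Hypothetical data: both classes start at the same levels, but
# Teacher A's students end 10 points higher than Teacher B's.
records = [
    ("A", 40.0, 55.0), ("A", 50.0, 65.0), ("A", 60.0, 75.0),
    ("B", 40.0, 45.0), ("B", 50.0, 55.0), ("B", 60.0, 65.0),
]
estimates = value_added(records)
print(estimates)
```

Because the estimate is an average over whoever is in the classroom, it reflects everything that affected those students' scores that year, which is one reason the choice of controls matters so much.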
25
Growth vs. Proficiency Models
[Figure: achievement (vertical axis) plotted from start of school year to end of year (horizontal axis), with a “Proficient” threshold. Teacher A: “Success” on achievement levels. Teacher B: “Failure” on achievement levels.]
In terms of growth, Teachers A and B are performing equally.
Slide courtesy of Doug Harris, Ph.D., University of Wisconsin-Madison
26
Growth vs. Proficiency Models (2)
[Figure: achievement (vertical axis) plotted from start of school year to end of year (horizontal axis), with a “Proficient” threshold, showing Teacher A and Teacher B.]
A teacher with low-proficiency students can still be high in terms of GROWTH (and vice versa).
Slide courtesy of Doug Harris, Ph.D., University of Wisconsin-Madison
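The contrast between growth and proficiency in the two slides above can be made concrete with a small calculation. The cut score, the function name, and the student scores below are hypothetical and chosen only to show that two classes can have identical growth while landing on opposite sides of a proficiency threshold.

```python
# Minimal sketch: the same class data summarized two ways.
# The proficiency cut score and all student scores are hypothetical.
from statistics import mean

PROFICIENT_CUT = 500.0  # hypothetical cut score


def summarize(start_scores, end_scores, cut=PROFICIENT_CUT):
    """Return (average growth, share of students proficient at year end)."""
    avg_growth = mean(end - start for start, end in zip(start_scores, end_scores))
    share_proficient = sum(end >= cut for end in end_scores) / len(end_scores)
    return avg_growth, share_proficient


# Teacher A's students start near the cut score; Teacher B's start far below it.
teacher_a = summarize([480.0, 500.0, 520.0], [510.0, 530.0, 550.0])
teacher_b = summarize([400.0, 420.0, 440.0], [430.0, 450.0, 470.0])

print("Teacher A:", teacher_a)  # same growth, all students proficient
print("Teacher B:", teacher_b)  # same growth, no students proficient
```

Under a proficiency-only view Teacher B looks like a failure, yet both teachers moved their students the same distance over the year.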
27
Colorado Growth Model
• Colorado Growth Model
  - Focuses on “growth to proficiency”
  - Measures students against “academic peers”
  - Also called criterion-referenced growth-to-standard models
• The student growth percentile is “descriptive” whereas value-added seeks to determine the contribution of a school or teacher to student achievement (Betebenner 2008)
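The notion of measuring a student against “academic peers” can be sketched as a percentile rank among students with similar prior scores. This is a simplified illustration under stated assumptions, not the Colorado Growth Model's actual statistical procedure: the peer band of 10 points, the function name, and the data are all hypothetical.

```python
# Minimal sketch of a growth percentile relative to "academic peers".
# Peers are defined here (hypothetically) as students whose prior score
# is within `band` points of the student's prior score.

def growth_percentile(student, cohort, band=10.0):
    """student and cohort entries are (prior_score, current_score) tuples.

    Returns the percentage of academic peers whose current score is
    below the student's current score.
    """
    prior, current = student
    peer_scores = [c for p, c in cohort if abs(p - prior) <= band]
    if not peer_scores:
        raise ValueError("No academic peers found within the band")
    below = sum(score < current for score in peer_scores)
    return 100.0 * below / len(peer_scores)


# Hypothetical cohort of (prior, current) scores.
cohort = [
    (400.0, 420.0), (405.0, 430.0), (395.0, 410.0), (410.0, 450.0),
    (600.0, 640.0), (610.0, 615.0),
]

low_start = growth_percentile((402.0, 440.0), cohort)
high_start = growth_percentile((605.0, 620.0), cohort)
print(low_start, high_start)
```

The result only describes where a student's current score falls among similar students; as the slide notes, it does not by itself attribute that growth to a school or teacher.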
29
What value-added and growth models cannot tell you
• Value-added and growth models are really measuring classroom, not teacher, effects
• Value-added models can’t tell you why a particular teacher’s students are scoring higher than expected
  - Maybe the teacher is focusing instruction narrowly on test content
  - Or maybe the teacher is offering a rich, engaging curriculum that fosters deep student learning
• How the teacher is achieving results matters!
30
Recommendation from NBPTS Task Force (Linn et al., 2011)
Recommendation 2: Employ measures of student learning explicitly aligned with the elements of curriculum for which the teachers are responsible. This recommendation emphasizes the importance of ensuring that teachers are evaluated for what they are teaching.
31
School-wide VAM illustration
[Figure: bar chart with a vertical axis from 0 to 8 and one bar per teaching assignment (SS1, SS2, Sci1, Sci2, Sped, FL, PE, ELL, Math 1, Math 2, ELA 1, ELA 2); legend: Obs/Surv, VAM.]
32
Measuring teachers’ contributions to student learning growth: A summary of current models
Model: Description
• Student learning objectives: Teachers assess students at beginning of year and set objectives, then assess again at end of year; principal or designee works with teacher and determines success
• Subject & grade alike team models (“Ask a Teacher”): Teachers meet in grade-specific and/or subject-specific teams to consider and agree on appropriate measures that they will all use to determine their individual contributions to student learning growth
• Content Collaboratives: Content experts (external) identify measures, and groups of content teachers consider the measures from the perspective of classroom use; may not include pre- and post-measures
• Pre- and post-tests model: Identify or create pre- and post-tests for every grade and subject
• School-wide value-added: Teachers in tested subjects & grades receive their own value-added score; all other teachers get the school-wide average
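The school-wide value-added approach described above can be sketched as a simple assignment rule. The sketch assumes that the “school-wide average” is the unweighted mean of the tested teachers' scores; actual systems may compute it differently. The function name, teacher labels, and scores are hypothetical.

```python
# Minimal sketch of the school-wide value-added assignment rule.
# Assumption: the school-wide average is the unweighted mean of the
# value-added scores of teachers in tested subjects and grades.
from statistics import mean


def assign_scores(tested_scores, all_teachers):
    """Give tested teachers their own score and everyone else the school average."""
    school_average = mean(tested_scores.values())
    return {
        teacher: tested_scores.get(teacher, school_average)
        for teacher in all_teachers
    }


# Hypothetical value-added scores for teachers in tested subjects.
tested_scores = {"Math 1": 4.0, "Math 2": 6.0, "ELA 1": 3.0, "ELA 2": 7.0}
all_teachers = ["Math 1", "Math 2", "ELA 1", "ELA 2", "PE", "Sped", "FL"]

assigned = assign_scores(tested_scores, all_teachers)
print(assigned)
```

In this example the PE, Sped, and FL teachers all receive the same score, which reflects students and subjects they may never have taught.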
33
Tripod Survey domains
• Harvard’s Tripod Survey – the 7 C’s
  – Caring about students (nurturing productive relationships)
  – Controlling behavior (promoting cooperation and peer support)
  – Clarifying ideas and lessons (making success seem feasible)
  – Challenging students to work hard and think hard (pressing for effort and rigor)
  – Captivating students (making learning interesting and relevant)
  – Conferring (eliciting students’ feedback and respecting their ideas)
  – Consolidating (connecting and integrating ideas to support learning)
34
Tripod Survey results
• Control is the strongest correlate of value added gains
• However, it is important to keep in mind that a good teacher achieves control by being good on the other dimensions
• Available in English & Spanish, in paper or online versions, at three levels: K-2, 3-5, 6-12
• For more info: http://www.tripodproject.org/index.php/index/
35
Why you should keep (and provide support to) the less effective teachers
• With the right instructional strategies and guidance, motivated teachers can improve practice and student outcomes
• The teachers you hire to replace your less effective teachers are not necessarily going to be more effective
• You may not be able to find better replacements!
• You may not be able to find any replacements!
• The replacements you find may not stay
36
Measures that help teachers grow
• Measures which include protocols and processes that teachers can examine and comprehend
• Measures that are directly and explicitly aligned with teaching standards
• Measures that motivate teachers to examine their own practice against specific standards
• Measures that allow teachers to participate in or co-construct the evaluation (such as portfolios)
• Measures that give teachers opportunities to discuss the results for formative purposes with evaluators, administrators, teacher learning communities, mentors, coaches, etc.
• Measures that are aligned with and used to inform professional growth and development offerings
37
Evaluating Teacher Preparation Programs (TPPs)
Evaluate teacher performance (including student outcomes)
Use results as a measure of TPP success (for evaluation purposes)
Use results to improve TPP curriculum and instruction
K-12 Teaching and learning improves as a result of changes made by TPPs
38
Meeting the “standards”
• It’s possible to be meeting accreditation standards (NCATE, TEAC) but still not be preparing fully effective teachers
• If TPPs are not adequately preparing teachers for the contexts and communities which they serve, their effectiveness may be hampered
39
Final thoughts
• The limitations:
  - There are no perfect measures
  - There are no perfect models
  - Changing the culture of evaluation is hard work
• The opportunities:
  - Evidence can be used to trigger support for struggling teachers and acknowledge effective ones
  - Multiple sources of evidence can provide powerful information to improve teaching and learning
  - Evidence is more valid than “judgment” and provides better information for teachers to improve practice
40
References
Anderson, L. (1991). Increasing teacher effectiveness. Paris: UNESCO, International Institute for Educational Planning.
Glazerman, S., D. Goldhaber, et al. (2011). Passing muster: Evaluating evaluation systems. Washington, DC, Brown Center on Education Policy at Brookings.
http://www.brookings.edu/reports/2010/1117_evaluating_teachers.aspx
Goe, L., C. Bell, et al. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality: 1-103.
http://www.tqsource.org/publications/teacherEffectiveness.php
Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56-64.
Linn, R., Bond, L., Darling-Hammond, L., Harris, D., Hess, F., & Shulman, L. (2011). Student learning, student achievement: How do teachers measure up? Arlington, VA: National Board for Professional Teaching Standards.
http://www.nbpts.org/index.cfm?t=downloader.cfm&id=1305
Race to the Top Application
http://www2.ed.gov/programs/racetothetop/resources.html
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417 - 458.
http://www.econ.ucsb.edu/~jon/Econ230C/HanushekRivkin.pdf
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New Teacher Project.
http://widgeteffect.org/downloads/TheWidgetEffect.pdf
41
Questions?
42
Laura Goe
[email protected]
twitter.com/GoeLaura
National Comprehensive Center for Teacher Quality
1000 Thomas Jefferson Street, NW
Washington, D.C. 20007
www.tqsource.org