Washington State Teacher and Principal Evaluation Project

1

Maximizing Rater Agreement

June 2013

2

Entry Task: Confidence Conversation

As you enter, please have a brief discussion with your district team to decide your level of confidence in whether the following statements are true for your district:

1. Our evaluators demonstrate accuracy and strong rater agreement when using observation data to score teacher performance.

2. Our district’s new evaluation system includes frequent, structured opportunities for evaluators to practice and calibrate their observation and rating skills.

3. Our teachers and principals trust their evaluators to rate their performance accurately and reliably.

Write your district name on three sticky notes and place them on the confidence scales posted on [INSERT LOCATION] for each statement.

3

Agenda: Connecting, Learning I, Implementing, Reflecting, Wrap-Up

Welcome!
• Introductions
• Logistics
• Agenda

4

Modules
• Introduction to Educator Evaluation in Washington
• Using Instructional and Leadership Frameworks in Educator Evaluation
• Preparing and Applying Formative Multiple Measures of Performance: An Introduction to Self-Assessment, Goal Setting, and Criterion Scoring
• Including Student Growth in Educator Evaluation
• Conducting High-Quality Observations and Maximizing Rater Agreement
• Providing High-Quality Feedback for Continuous Professional Growth and Development
• Combining Multiple Measures into a Summative Rating

5

The Evaluation System Components

6

TPEP Core Principles
“We Can’t Fire Our Way to Finland”

1. The critical importance of teacher and leadership quality

2. The professional nature of teaching and leading a school

3. The complex relationship between the system for teacher and principal evaluation and district systems and negotiations

4. The belief in professional learning as an underpinning of the new evaluation system

5. The understanding that the career continuum must be addressed in the new evaluation system

6. The system must determine the balance of “inputs or acts” and “outputs or results”

7

Session Norms
• Pausing
• Paraphrasing
• Posing Questions
• Putting Ideas on the Table
• Providing Data
• Paying Attention to Self and Others
• Presuming Positive Intentions

What Else?

Connecting

Builds community, prepares the team for learning, and links to prior knowledge, other modules, and current work

8

9

Module Overview: 2 Parts
A. Conducting High-Quality Observations
B. Maximizing Rater Agreement

Reminder!
• This module provides an orientation to the basic concepts.
• This module does not go into great depth about evidence relating to any of the specific instructional or leadership frameworks and instead leaves it up to the districts to seek additional training.

10

Overview of Intended Participant Outcomes
Participants will know and be able to:
• Describe the OSPI working definition of rater agreement and the stages for development.
• Identify common rating errors in their own and others’ practice.
• Utilize appropriate strategies for minimizing bias and error in the observation and rating process.
• Understand the elements of high-quality training required to achieve maximum rater agreement.

11

Connecting Content: Importance of Rater Agreement

Even if you select a high-quality instructional or leadership framework AND observers use best practices in collecting the observation data:

The results will be meaningless if observers are unable to demonstrate accuracy and consistency in scoring using the framework.

KEY POINT: An educator’s observation scores should be the same regardless of the observer.

12

Importance of Rater Agreement
Demonstrating rater agreement is critical to ensuring that:
• Educators can trust the new evaluation system.
• Educators receive relevant, useful information for professional growth.
• The new system is legally defensible for personnel decisions.

13

Rater Agreement Background
• The new law requires that evaluators of both teachers and principals “must engage in professional development designed to implement the revised systems and maximize rater agreement.”
• The Teacher and Principal Evaluation Project (TPEP) has relied heavily on the growing body of research, the framework authors, and the practical input from practitioners in pilot sites to create a “working definition” of rater agreement for the 2012-13 school year.

14

OSPI Definition of Rater Agreement

The extent to which the scores between the raters have consistency and accuracy against predetermined standards. The predetermined standards are the instructional and leadership frameworks and rubrics that define the basis for summative criterion-level scores.

15

OSPI Definition of Rater Agreement
• Consistency: A measure of observer data quality indicating the extent to which an observer is assigning scores that agree with scores assigned to the same observation of practice by another typical observer.
• Accuracy: A measure of observer data quality indicating the extent to which an observer is assigning scores that agree with scores assigned to the same observation by an expert rater; the extent to which a rater’s scores agree with the true or “correct” score for the performance.
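The two definitions differ only in who the observer is compared with. A minimal sketch of that distinction follows, assuming agreement is summarized as the share of observations with an exact score match (one common summary, not a measure the slides prescribe). The function name and all scores are hypothetical.

```python
def agreement_rate(scores, reference_scores):
    """Return the share of observations on which two score lists match exactly."""
    if len(scores) != len(reference_scores):
        raise ValueError("Both raters must score the same observations.")
    if not scores:
        raise ValueError("At least one scored observation is required.")
    matches = sum(s == r for s, r in zip(scores, reference_scores))
    return matches / len(scores)


# Hypothetical scores for four observations (not taken from the slides).
observer = [3, 2, 4, 3]
typical_peer = [3, 3, 4, 2]
expert_rater = [3, 2, 4, 4]

# Consistency: agreement with another typical observer.
consistency = agreement_rate(observer, typical_peer)
# Accuracy: agreement with an expert rater.
accuracy = agreement_rate(observer, expert_rater)

print(f"consistency={consistency:.2f}, accuracy={accuracy:.2f}")
```

In this made-up example the observer is more accurate than consistent, which shows why the two measures are reported separately.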

16

Calculating Rater Agreement
Table 1. Illustrating Rater Agreement

             Component Score
Component    Rater A    Rater B    Master Scorer    Type of Agreement
1            4          4          4                Exact Agreement
2            3          2          3                Adjacent Agreement
3            1          4          4                ?
4            3          3          1                ?
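One way to work through Table 1 is to compare the scores pair by pair. The sketch below assumes a simple rule: a difference of 0 is "exact", a difference of 1 is "adjacent", and anything larger is labeled "discrepant". The "discrepant" label and the pairwise layout are assumptions for illustration. The slide leaves components 3 and 4 as open questions, so the output is an aid to that discussion and not an official answer.

```python
def classify(score_x, score_y):
    """Label the agreement between two scores (assumed rule: 0 exact, 1 adjacent)."""
    difference = abs(score_x - score_y)
    if difference == 0:
        return "exact"
    if difference == 1:
        return "adjacent"
    return "discrepant"  # hypothetical label for differences greater than 1


# Scores from Table 1 as (Rater A, Rater B, Master Scorer).
TABLE_1 = {1: (4, 4, 4), 2: (3, 2, 3), 3: (1, 4, 4), 4: (3, 3, 1)}


def pairwise_report(rows):
    """Classify every pair of scores for each component."""
    report = {}
    for component, (rater_a, rater_b, master) in rows.items():
        report[component] = {
            "A vs B": classify(rater_a, rater_b),
            "A vs master": classify(rater_a, master),
            "B vs master": classify(rater_b, master),
        }
    return report


report = pairwise_report(TABLE_1)
for component, pairs in report.items():
    print(component, pairs)
```

Component 4 is the instructive case: the two raters agree with each other exactly, yet neither is within one point of the master scorer.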

17

Calculating Rater Agreement
Table 1. Illustrating Rater Agreement (Cont.)

                             Component Score
Component                    Rater A    Rater B    Master Scorer    More than 1pt Off
Subcomponent 1               4          3          4                No
Subcomponent 2               2          1          3                Yes
Subcomponent 3               1          3          4                Yes
Subcomponent 4               4          3          1                Yes
Subcomponent 5               3          4          2                Yes
Component Score (Average)    2.8        2.8        2.8
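The table shows that identical averages can hide large subcomponent disagreements. The sketch below reproduces both columns from the table's scores. It assumes "more than 1pt off" means the highest and lowest of the three scores differ by more than one point, a rule that matches every row of the table but is not stated on the slide. The function names are hypothetical.

```python
# Subcomponent scores from the table as (Rater A, Rater B, Master Scorer).
SUBCOMPONENT_SCORES = {
    1: (4, 3, 4),
    2: (2, 1, 3),
    3: (1, 3, 4),
    4: (4, 3, 1),
    5: (3, 4, 2),
}


def more_than_one_point_off(scores):
    """Assumed rule: the spread between highest and lowest score exceeds 1."""
    return max(scores) - min(scores) > 1


def column_averages(rows):
    """Average each scorer's column across all subcomponents."""
    columns = zip(*rows.values())
    return tuple(sum(column) / len(column) for column in columns)


flags = {sub: more_than_one_point_off(s) for sub, s in SUBCOMPONENT_SCORES.items()}
averages = column_averages(SUBCOMPONENT_SCORES)

print(flags)     # which subcomponents are more than 1 point off
print(averages)  # component score (average) for Rater A, Rater B, Master Scorer
```

All three averages come out to 2.8 even though four of the five subcomponents are flagged, so agreement checked only at the component level would miss the problem.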

18

Connecting Activity: Where Can You Assess Rater Agreement?

[Figure: diagram relating Evidence (observation evidence, student growth data, artifacts, other relevant evidence), the Framework Score on framework scales (e.g., components, domains, dimensions), Criteria 1 through 8, and the Summative Criterion Score.]

Learning

• Understand common sources of rater error and strategies for minimizing their influence in observer ratings
• Understand the role of high-quality observation training in achieving rater agreement

19

20

Learning Content I. Avoiding Rater Error
Recall that a skilled observer:

1. Understands each component and indicator on the district rubric thoroughly and deeply.

2. Gathers and sorts sufficient evidence of practice as it happens in the classroom or school.

3. Recognizes and puts aside preferences and biases.

4. Interprets the evidence appropriately to give an accurate rating using the evaluation instrument.

(McClellan, Atkinson, & Danielson, 2012)

21

Avoiding Common Rater Errors
Central Tendency

A rater evaluates the observation using points in the middle of the scale and avoids extremely high or low ratings.

Strategy to avoid this error?
• Pay careful attention to behavioral anchors that define performance at each scale point.
• Compare observation evidence with the behavioral anchors.
• Keep in mind that behavioral anchors are examples: you do not have to have observational evidence for every single anchor for a particular rating.

22

Avoiding Common Rater Errors
Contrast Effect

A rater directly compares the performance of one educator to that of another educator.

This is particularly problematic when a group of educators select a common criterion on a focused evaluation cycle.

Strategy to avoid this error?
• When assigning observation ratings, do not use another educator’s performance as a point of reference. Raters should only compare the observation evidence against the anchors on the rating scale.

23

Avoiding Common Rater Errors
Focusing on One or Two Incidents

Ratings are based on only a small sample of observation evidence that typically includes either very strong or weak examples of practice.

Strategy to avoid this error?
• Be sure to take into account the full range of performance described in the observation evidence. Assess the frequency and depth of the behaviors recorded against the behavioral indicators in the rubric.

24

Avoiding Common Rater Errors
Halo Error

A rater allows ratings on one component/scale to influence ratings on another component/scale.

Strategy to avoid this error?
• Remember that framework components are scored separately. Your ratings on one component should not influence ratings on another component.
• Consider the observation evidence for each component separately and only use the information that is relevant to the component you are considering.

25

Avoiding Common Rater Errors
Potential Error

A rater gives higher or lower ratings to an educator than is warranted by the observation evidence because he or she believes the educator has (or does not have) the potential to be an excellent educator.

Strategy to avoid this error?
• Remember to consider all instances of an educator’s actual observation data. Ratings should be made based only on the observation evidence collected, not on anticipated improvements or declines.

26

Avoiding Common Rater Errors
Leniency and Severity Errors

A rater gives mostly high (lenient) or low (severe) ratings to an educator in a manner inconsistent with the observation data collected.

Strategy to avoid this error?
• Pay careful attention to the scale anchors when making your ratings. Also, review the anchors in order to understand how performance is defined at each scale point. Do not try to be intentionally “easy” or “hard” in your ratings.

27

Avoiding Common Rater Errors
Recency Bias

A rater is inclined to remember recent events better than those that occurred earlier; thus, raters often place greater weight or emphasis on evidence collected near the end of the observation.

Strategy to avoid this error?
• Consider all of the observation evidence collected over the entire class period. Remind yourself that the educator’s performance at the beginning of the observation is just as important as his or her performance at the end.

28

Avoiding Common Rater Errors
Similar-to-me Bias

A rater provides higher ratings to educators who are similar to themselves and lower ratings to educators who are dissimilar.

Strategy to avoid this error?
• Avoid incorporating personal preferences, feelings, or perceptions about the educator into your ratings. Only actual observation evidence should be used to make an observation rating.

29

Learning Activity I. Practicing Observation Rating

You will need the following:
• Your observation notes from the Conducting High-Quality Observations module
• Your district’s instructional framework

Identify sections of your framework aligned to Criteria 5:

Instructional Framework & Criteria 5 Alignment

CEL 5D+          Danielson             Marzano
CEC 1, 2, 4-7    2a, 2c, 2d, and 2e    5.1 through 5.6

30


Step 1: As a group:
• Select two indicators from the list, read them through, and discuss the key differences between the performance levels in each.

Step 2: As an individual:
• Read your observation notes and code the evidence relevant to each indicator (e.g., use highlighters, make notations, etc.).
• Select a rating for each indicator based on your coded evidence.

31


Step 3:
• Select one person to be the “recorder” and write down ratings in Handout 4: Ratings Record.
• Share your indicator ratings for recording.

32

Learning Activity I. Practicing Observation Rating

Step 4:
• Identify any indicator without exact rater agreement.
• Discuss and attempt to achieve a rating consensus on each (e.g., explain your ratings with reference to evidence).
• Note and record, during the discussion, any common rating errors you find in your own ratings or those of others in your group (see Handout 3: Common Rating Errors for reference).
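For groups that keep their ratings record electronically, the first part of Step 4 can be sketched as a simple check. The layout of the record, the indicator names, the rater names, and the function name below are all hypothetical; Handout 4 itself is not reproduced here.

```python
def indicators_without_exact_agreement(ratings_record):
    """Return indicators on which the group's ratings are not all identical."""
    return [
        indicator
        for indicator, ratings in ratings_record.items()
        if len(set(ratings.values())) > 1
    ]


# Hypothetical ratings record for two indicators and three group members.
ratings_record = {
    "Indicator A": {"Rater 1": 3, "Rater 2": 3, "Rater 3": 3},
    "Indicator B": {"Rater 1": 2, "Rater 2": 3, "Rater 3": 2},
}

to_discuss = indicators_without_exact_agreement(ratings_record)
print(to_discuss)
```

The check only identifies where to focus; reaching consensus still depends on the evidence-based discussion the step describes.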

33

Learning Activity I: Debrief/Wrap-Up
• Did anyone achieve exact rater agreement on at least one indicator? Both?
• Were you able to achieve consensus on ratings where you did not have exact rater agreement?
• What rater errors did you identify and what strategies could you utilize in the future to avoid the error?

34

Learning Content II: Observer Training to Achieve Rater Agreement

Intensive Training to Achieve Rater Agreement
• Orientation and deep understanding of standards and framework, components, and tools
• Practice rating using a combination of videos and live observations
• Feedback, coaching, and discussion of ratings
• Assessment of rater agreement (e.g., certification testing)

35

OSPI’s Stages of Rater Agreement Training
• 2-3 Day Foundational Training
• Ongoing Rater Agreement Training

36

Certification and Calibration Exams
• The certification exam should cover all grades and subjects the observer will observe.
• There are a variety of ways to reduce the time burden of certification:
  • Include a knowledge assessment of the observation rubric
  • Mix shorter videos of practice with longer, full-lesson videos of practice
• The calibration exam should test the observer on a representative selection of skills and content to ensure continued accuracy in rating.
• Certification and calibration exams are high-stakes exams.

37

Ongoing Calibration, Practice, and Monitoring

Rater agreement is NOT ensured by a single training or certification test.

38

Ongoing Calibration, Practice, and Monitoring

Rater drift will naturally occur unless evaluators have:
• Periodic opportunities to re-calibrate.
• Access to practice videos for difficult-to-score domains/components.
• Expectations that their ratings will be monitored.
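Monitoring for drift could be as simple as tracking each evaluator's exact agreement with a master scorer across calibration sessions. The sketch below is one possible approach under stated assumptions: the session labels, the scores, and the 0.75 threshold are hypothetical placeholders, not values set by OSPI or TPEP.

```python
def exact_agreement_rate(observer_scores, master_scores):
    """Share of scores that match the master scorer exactly."""
    if len(observer_scores) != len(master_scores) or not master_scores:
        raise ValueError("Score lists must be non-empty and the same length.")
    matches = sum(o == m for o, m in zip(observer_scores, master_scores))
    return matches / len(master_scores)


def sessions_below_threshold(sessions, threshold):
    """Return labels of calibration sessions whose agreement rate falls below the threshold."""
    return [
        label
        for label, observer_scores, master_scores in sessions
        if exact_agreement_rate(observer_scores, master_scores) < threshold
    ]


# Hypothetical calibration sessions: (label, observer scores, master scores).
calibration_sessions = [
    ("Fall", [3, 2, 4, 3], [3, 2, 4, 3]),
    ("Winter", [3, 3, 4, 3], [3, 2, 4, 3]),
    ("Spring", [4, 3, 4, 2], [3, 2, 4, 3]),
]

# The 0.75 threshold is an arbitrary placeholder; a district would set its own.
flagged = sessions_below_threshold(calibration_sessions, threshold=0.75)
print(flagged)
```

A flagged session would be a prompt for re-calibration or targeted practice, in line with the supports listed above.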

39

Ongoing Calibration, Practice, and Monitoring

Lessons from TPEP Pilots:
• Informal calibration through discussion forums where observers share challenges and best practices has a big impact.
• Use pre-existing professional learning groups (such as principal PLCs) to practice and calibrate.
• To practice, co-observe a classroom lesson, score separately, and meet to compare scores.

40

Learning Activity II: Identifying Opportunities for Calibration

Discuss with your team: What opportunities already exist in your district for ongoing calibration? Identify at least two.

Share: What opportunities have you identified?

Implementing

41

• Develop a district plan for ongoing assessment and monitoring of rater agreement
• Develop a district plan for ongoing rater calibration and practice

42

Implementing Activity: Monitoring and Maintaining Rater Agreement
• Read “Maximizing Rater Agreement: A Primer” and “Rater Agreement in Washington State’s Evaluation System” (20 minutes)
• Use the Implementation Planning Tool (Handout #5) to begin developing your district’s plan for monitoring and maintaining rater agreement over time

43

Implementing Activities Debrief

Each team shares two things to debrief our implementing tasks:

1. One decision you made today (could be a key decision, a preliminary decision, a change of course, etc.)

2. One of the immediate next steps you are taking when you return to your district

Reflecting

44

45

Revisiting Our Confidence Conversations

1. Our evaluators demonstrate accuracy and strong rater agreement when using observation data to score teacher performance.

2. Our district’s new evaluation system includes frequent, structured opportunities for evaluators to practice and calibrate their observation and rating skills.

3. Our teachers and principals trust their evaluators to rate their performance accurately and reliably.

46

What’s Next

Homework options:
• District or school teams: use your observation notes to practice scoring additional indicators in your framework and discuss ratings to achieve agreement.
• Identify specific components or dimensions you think will be particularly hard for your observers to score. Prioritize those components or dimensions in your ongoing calibration and practice sessions.

47

Thank you!

Presenter Name
XXX-XXX-XXXX
[email protected]
1234 Street Address
City, State 12345-1234
800-123-1234
