
Evaluation of the Wisconsin Educator Effectiveness System

Results of the Teacher Practice Rating System Pilot

Curtis J. Jones

University of Wisconsin in Milwaukee Socially Responsible Evaluation in Education

January 2015


Evaluation Team

Curtis J. Jones is a senior scientist in the School of Education at the University of Wisconsin, Milwaukee, and director of Socially Responsible Evaluation in Education.

Jessica Arrigoni is a researcher at the Consortium for Policy Research in Education in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.

Mikhail Pyatigorsky is an economist and researcher at the Value-Added Research Center in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.

Clarissa Steele is a survey researcher and the lead for communications and professional development for the Value-Added Research Center in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.

Robin Worth is a researcher at the Consortium for Policy Research in Education in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.

Acknowledgments

We would like to thank the many individuals who contributed to the development of this report, especially Katharine Rainey and Laura Ruckert at the Wisconsin Department of Public Instruction, and Steven Kimball at the University of Wisconsin, Madison.

We would also like to thank the following individuals who provided feedback on the report and contributed to the evaluation: Bradley Carl, Herb Heneman, Jacob Hollnagel, Rachel Lander, Tony Milanowski, Samuel Purdy, Logan Roman, and Steven Smith.

For more information about this report or the evaluation of EE, [email protected]



Key Findings

In 2013-2014, the state of Wisconsin piloted aspects of its Educator Effectiveness System. The pilot of the teacher practice evaluation process, using the Department of Public Instruction (DPI) model,¹ involved evaluators rating the quality of teaching according to the Charlotte Danielson 2013 Framework for Teaching rubric. Several results from the pilot will inform the state-wide implementation of the DPI teacher practice evaluation process. However, these results should be interpreted cautiously. It is not known whether they will hold up when more evaluators are certified and the analyses are based on final ratings for the whole state rather than single observations for a selection of schools. Further, the lack of available teacher data makes many of these findings difficult to interpret.

The great majority of pilot teachers and principals felt they knew how to implement the DPI teacher practice evaluation process. This suggests that the training and information developed by DPI have been effective.

Piloting educators generally believed the Framework for Teaching accurately defines instructional quality and that it is fair to use as part of a teacher evaluation system.

Piloting educators expressed concerns about the consistency of the implementation of the teacher practice evaluation process. There was some concern that the Framework for Teaching may not be fair to teachers with different evaluators, types of students, and in different subject areas.

The time and resource burden to schools represents the largest single barrier to implementing EE, both generally and specifically for the teacher practice evaluation process.

The Teachscape data collection platform was viewed by many piloting educators as a serious deficiency in the system. Many expressed frustration with it and felt it was wasting a great deal of time.

Even considering the time and resource burden districts experienced, pilot findings suggest that most participants believe that standardizing the teacher practice evaluation process is a worthwhile endeavor.

Nearly all pilot teachers were rated as Proficient. 74% of teachers did not receive any ratings below Proficient on any components. Although it is possible that these results mostly reflect the quality of teaching among pilot teachers, the disproportionate number of teachers who were rated as exactly three across all 22 Framework for Teaching components suggests that other factors, such as school culture, may have also influenced ratings.

¹ Data were not available for analysis to inform the implementation of other teacher practice models used in Wisconsin, such as the CESA 6 model.

There were significant differences in the ratings assigned to teachers in different districts and schools. These differences may reflect real differences or they may reflect differences in the application of the Framework for Teaching.

Teachers in districts that plan to use EE results for high-stakes purposes and teachers in less crowded schools were found to have higher practice ratings. Although these relationships were entirely explained by school free/reduced lunch participation rates, these factors may still partially explain why school free/reduced lunch participation rates predict practice ratings. Future evaluation work is needed to explore this further.

School free/reduced lunch participation rates were found to be a strong predictor of teacher practice ratings. This finding may indicate that more effective teachers are selecting into higher income schools. It may also suggest that ratings using the Framework for Teaching are related to the types of students in classrooms. There was evidence for both in the pilot. This finding is consistent with what has been found in other Educator Effectiveness evaluations, including the Measures of Effective Teaching (MET) Project and a recent report by the Brookings Institution.

Until the relationship of F/R lunch rates and practice ratings is better understood, it is important that districts not use practice ratings to compare teachers. It is not yet clear how the Framework for Teaching could be used in a valid way for this purpose. Districts should, instead, focus on analyzing growth of individual educators across the year. The proper use of practice ratings will be revisited after the first full year of implementation in 2014-2015.


Table of Contents

Evaluation of Wisconsin Educator Effectiveness (EE) System Piloting the Teacher Practice Rating System in Wisconsin

The Wisconsin teacher practice evaluation process
Evaluation questions and methods
Data
Analysis

Teacher practice evaluation results
How well do educators understand the teacher practice evaluation process?
How do educators feel about the Framework for Teaching being used to evaluate teachers?
Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?
How do educators perceive the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?
Who participated and to what degree in the pilot of teacher practice evaluations?
How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?
How were teachers rated overall?
How were teachers rated on components?
What evidence is there that teachers are being rated differently in different contexts?

Summary and discussion
Appendix


Table of tables

Table 1. Teacher final ratings: Domain correlations
Table 2. Descriptive statistics of announced, unannounced and final ratings
Table 3. Descriptive statistics of school characteristics and ratings
Table 4. Correlations of school characteristics and teacher ratings
Table 5. Covariance parameters of unconditional multi-level models predicting practice ratings
Table 6. Results of models predicting ratings
Table 7. Average multi-level model component coefficients
Table 8. Frequencies of components rated by type of observation
Table 9. Internal consistency of final teacher ratings
Table 10. Component correlations within Framework for Teaching domains
Table 11. CFA Final rating component results
Table 12. Descriptive statistics of component ratings
Table 13. Results of multi-level models predicting Announced observation component ratings
Table 14. Results of multinomial generalized linear mixed models predicting ratings results: Change in probabilities of ratings as a function of school F/R lunch rate (ratings are compared to Distinguished)


Table of figures

Figure 1. Confidence of principals in working through the steps of the teacher practice evaluation process
Figure 2. Understanding of the Framework for Teaching
Figure 3. Perceived fairness of using the Framework for Teaching to evaluate teacher practice
Figure 4. Perceptions about the quality of the Framework for Teaching
Figure 5. Educator impressions of the time and resource barriers to completing teacher practice evaluations
Figure 6. Perception of the impact of evaluating teacher practice on instruction
Figure 7. Teacher perception of the impact of the evaluation on their practice
Figure 8. Cumulative percentage of average teacher ratings
Figure 9. Final ratings: Few teachers received a final rating of Basic on any components while none were rated as Unsatisfactory
Figure 10. Observation ratings: More teachers were rated as Basic than were in final ratings
Figure 11. Plot of teacher ratings and school F/R lunch participation
Figure 12. 2d (Managing Student Behavior) - Probabilities of being rated Unsatisfactory, Basic, and Proficient compared to Distinguished in schools with different F/R lunch rates
Figure 13. Confirmatory factor analysis Framework for Teaching structural model with standardized coefficients
Figure 14. Distribution of average overall ratings


Evaluation of Wisconsin Educator Effectiveness (EE) System
Results of the Teacher Practice Rating System Pilot

This report presents the results of the teacher practice evaluation process of the Wisconsin Educator Effectiveness (EE) Pilot and explores the implications of these results for Wisconsin’s efforts to implement components of the system across all Wisconsin districts in 2014-2015. In preparation for the state-wide implementation, Wisconsin piloted four aspects of its EE system during the 2012-2013 and 2013-2014 school years: evaluations of teacher practice, Student Learning Objectives, School Learning Objectives, and evaluations of principal practice. The purpose of this pilot was both to give districts the opportunity to practice and learn these processes and to inform the state’s efforts to improve their implementation. As part of this learning process, during the 2013-2014 school year, the Wisconsin Department of Public Instruction (DPI) contracted the University of Wisconsin, Milwaukee, Socially Responsible Evaluation in Education to conduct an independent evaluation of the pilot of their EE System. This report focuses exclusively on the results of the teacher practice evaluation pilot.

The Wisconsin teacher practice evaluation process

Districts working with DPI² on the teacher practice evaluation component of the EE system use the Charlotte Danielson 2013 Framework for Teaching to measure instructional quality.³ The inclusion of the 2011 version in the Measures of Effective Teaching (MET) Project⁴ showed that educator ratings using this rubric were related to both student growth on achievement tests and student ratings of teacher quality. The 2013 version is virtually identical to the 2011 version. The Framework for Teaching separates teacher practice into four domains and, within those domains, 22 components (presented below).

² Districts are permitted to use other systems for evaluating teacher practice. Although most Wisconsin districts have opted to work with DPI, a number have decided to work with Cooperative Educational Service Agency (CESA) 6. CESA 6 has not made their teacher practice ratings data available for independent analysis.
³ http://www.danielsongroup.org/userfiles/files/downloads/2013EvaluationInstrument.pdf
⁴ http://www.metproject.org/



2013 Framework for Teaching

Domain 1: Planning and Preparation
1a Demonstrating Knowledge of Content and Pedagogy
1b Demonstrating Knowledge of Students
1c Setting Instructional Outcomes
1d Demonstrating Knowledge of Resources
1e Designing Coherent Instruction
1f Designing Student Assessments

Domain 2: Classroom Environment
2a Creating an Environment of Respect and Rapport
2b Establishing a Culture for Learning
2c Managing Classroom Procedures
2d Managing Student Behavior
2e Organizing Physical Space

Domain 3: Instruction
3a Communicating With Students
3b Using Questioning and Discussion Techniques
3c Engaging Students in Learning
3d Using Assessment in Instruction
3e Demonstrating Flexibility and Responsiveness

Domain 4: Professional Responsibilities
4a Reflecting on Teaching
4b Maintaining Accurate Records
4c Communicating with Families
4d Participating in a Professional Community
4e Growing and Developing Professionally
4f Showing Professionalism

During the teacher practice evaluation pilot, teachers were asked to participate in at least one 40-minute Announced observation of their instruction and at least three 15-minute Unannounced observations. Although the Wisconsin EE system does not recommend recording ratings after observations, Teachscape, the online EE progress tracking platform, required evaluators to enter ratings to record that they had done an observation. This has since been changed. Thus, although ratings data are available for pilot observations, in subsequent years ratings for specific observations may not be available.

Based on these observations, teachers were typically rated on the ten components of the Framework for Teaching that comprise the Classroom Environment and Instruction domains as either one (Unsatisfactory), two (Basic), three (Proficient), or four (Distinguished).⁵,⁶ At the end of the year, based on these observations and other sources of evidence, such as teacher conferences and document review, evaluators assigned final ratings on all 22 components to teachers.
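Footnote 5 of this report states that, for the purpose of the report, overall scores were calculated by averaging component scores. A minimal Python sketch of that calculation follows. The function name `overall_score`, the dictionary layout, and the example ratings are hypothetical illustrations, not pilot data or the evaluation team's code.

```python
# Rating scale used in the pilot, as described in the report.
RATING_LABELS = {1: "Unsatisfactory", 2: "Basic", 3: "Proficient", 4: "Distinguished"}


def overall_score(component_ratings):
    """Average a teacher's component ratings (each an integer from 1 to 4).

    component_ratings maps a component code such as "2a" to its rating.
    """
    values = list(component_ratings.values())
    if not values:
        raise ValueError("no component ratings supplied")
    for value in values:
        if value not in RATING_LABELS:
            raise ValueError(f"rating out of range: {value}")
    return sum(values) / len(values)


# Hypothetical ratings on the ten observable components (Domains 2 and 3).
example = {
    "2a": 3, "2b": 3, "2c": 4, "2d": 3, "2e": 3,
    "3a": 3, "3b": 2, "3c": 3, "3d": 3, "3e": 3,
}
print(overall_score(example))
```

Because an average treats the four categories as equally spaced numbers, a teacher with one Basic and one Distinguished rating receives the same overall score as a teacher rated Proficient on both components.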

Certified evaluators (often principals) received intensive training (30 hours of professional development) and then were tested to determine if they could correctly rate the quality of instruction seen in example videos. The purpose of this training was to calibrate and normalize scoring across evaluators. Piloting districts also received training and support from DPI and their local CESA throughout the year to ensure they understood the process and the timing for completing the steps in the teacher practice evaluation process. These supports and resources were designed to increase the likelihood that teacher practice ratings would be both valid and reliable across the piloting districts.

Evaluation questions and methods

The overall purpose of the ongoing evaluation of the Wisconsin Educator Effectiveness (EE) system is to identify the state’s progress toward implementing the parts of the EE system: teacher practice evaluations, principal practice evaluations, student learning objectives, school learning objectives, and eventually principal and teacher value-added. This report presents the results of the 2013-2014 teacher practice evaluation pilot and explores the conditions that may both promote and inhibit its full implementation in 2014-2015.⁷

⁵ Although overall and domain scores were not calculated in the pilot, for the purpose of this report, overall scores were calculated by averaging component scores.
⁶ For a detailed description of the different rating categories, see http://ee.dpi.wi.gov/teacher/t-levels-performance

The results of the evaluation are organized to address each of the questions outlined below. To promote the successful implementation of EE systems, previous research has emphasized the need for educators to both understand the system and have an open mind about its usefulness and fairness.⁸ As such, this report first presents evidence from surveys about how well Wisconsin educators feel they understand the teacher practice evaluation process. It also summarizes educator attitudes about the teacher practice evaluation process, specifically how fair they believe it is and how the evaluation of teacher practice may impact the quality of instruction in Wisconsin. It also summarizes educator opinions about the feasibility of the work necessary to complete the teacher practice evaluation process.

After summarizing educator opinions about the process, this report then presents the results of the teacher practice evaluation pilot and summarizes evidence suggesting how well schools implemented the process. In addition, the results of ad hoc statistical analyses designed to identify district and school factors that may influence teacher practice ratings are presented.

⁷ While this report was not publicly available until January 2015, Jones updated DPI biweekly on trends, findings, and feedback to inform modifications to the System and its resources, as necessary.
⁸ Milanowski, A., & Kimball, S. (April, 2003). The Framework-Based Teacher Performance Assessment Systems in Cincinnati and Washoe. CPRE-UW Working Paper Series.



Data

Invitations to complete end-of-year surveys were e-mailed to the 329 administrators and 388 teachers who had originally agreed to pilot the system in 2013-2014. Of these, 190 (58%) administrators and 171 (44%) teachers responded. All but four administrators were principals, but 11% of principals also held other positions in their school districts. Responding teachers represented a range of grade levels including K-2nd (23%), 3rd to 5th (24%), 6th to 8th (29%), and high school (35%).⁹ Although the surveys captured many aspects of the EE pilot, only the questions that address the teacher practice evaluation pilot are presented in this report.

⁹ Some teachers taught more than one grade level.
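The response rates quoted above follow directly from the invitation and response counts. The short Python sketch below reproduces that arithmetic and shows why the grade-level percentages sum to more than 100%. The counts and percentages come from the report; the function and variable names are illustrative assumptions.

```python
def response_rate(responded, invited):
    """Share of invited educators who completed the survey."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return responded / invited


# Counts reported for the 2013-2014 end-of-year surveys.
invited = {"administrators": 329, "teachers": 388}
responded = {"administrators": 190, "teachers": 171}

for group in invited:
    rate = response_rate(responded[group], invited[group])
    print(f"{group}: {rate:.0%}")

# Grade-level shares of responding teachers, as reported. Teachers who
# taught more than one grade level are counted in each, so the shares
# are not mutually exclusive.
grade_shares = {"K-2nd": 23, "3rd to 5th": 24, "6th to 8th": 29, "high school": 35}
print(sum(grade_shares.values()))
```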

Evaluation Questions

1. How well do educators understand the teacher practice evaluation process?
2. How do educators feel about the Framework for Teaching being used to evaluate teachers?
3. Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?
4. How do educators perceive the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?
5. Who participated and to what degree in the pilot of teacher practice evaluations?
6. How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?
7. How were teachers rated overall?
8. How were teachers rated on components?
9. What evidence is there that teachers are being rated differently in different contexts?



Pilot participation and ratings data were obtained from Teachscape, the online platform used by DPI to document pilot participation activities across EE system components. These data were supplemented with additional data collected from the National Center for Education Statistics (NCES) and the state of Wisconsin’s WISEdash system.

The specific data used in this report and the sources are presented below.

Teachscape: teacher practice ratings; evaluator; school; school district; date of observation
WISEdash: 2013-2014 school free or reduced lunch participation; school size
NCES: 2011-2012 school teacher/student ratio; equivalent full-time teachers in each school
Surveys: planned use of teacher ratings (high stakes or not); understanding of the evaluation of teacher practice; attitudes toward the evaluation of teacher practice; perceived impact of the evaluation of teacher practice on instruction

Analysis

Most of the analyses involved simple descriptive statistics and frequencies to present educator attitudes and teacher practice ratings. Various psychometric methods were used to explore how well the ratings on individual components and domains fit together to define instructional practice. These included correlations, internal consistency analysis, and confirmatory factor analysis.¹⁰ Finally, statistical modeling¹¹ was used to explore the relationships of teacher practice ratings with school and district characteristics. Models nested practice ratings within schools and then within districts. Only school- and district-level variables were tested in the models since individual teacher and classroom data were not available. Future evaluation work will analyze the relationships of teacher and classroom factors with effectiveness ratings.

¹⁰ Matsunaga, M. (2010). How to factor-analyze your data right: Do’s, don’ts and how to’s. International Journal of Psychological Research, 3, 97-110.
¹¹ Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
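As an illustration of what an internal consistency analysis of component ratings can look like, the sketch below computes Cronbach's alpha, one common internal consistency statistic. This section does not name the statistic that was used, so treating alpha as the measure is an assumption, and the function name and example ratings are hypothetical rather than pilot data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-teacher rating lists.

    Each inner list holds one teacher's ratings on the same k components.
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two components and two teachers")
    if any(len(row) != k for row in ratings):
        raise ValueError("every teacher needs a rating on every component")
    # Sample variance of each component across teachers.
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each teacher's summed score.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("summed scores have no variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (1-4) for five teachers on three components.
example = [
    [3, 3, 3],
    [2, 3, 2],
    [4, 4, 3],
    [3, 3, 4],
    [2, 2, 2],
]
print(round(cronbach_alpha(example), 2))
```

Values closer to 1 indicate that the components move together across teachers. When nearly every teacher receives the same rating on every component, the summed scores vary little and the statistic becomes unstable or undefined.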

Teacher practice evaluation results

How well do educators understand the teacher practice evaluation process?

As mentioned previously, it is critical for the success of the implementation of EE that educators understand the system. It appears that the work DPI has devoted to developing this understanding in educators has paid off. Specifically, the great majority of principals (74%) expressed confidence that they could work through the various steps of the teacher evaluation process using the Framework for Teaching and only 2% did not feel confident (Figure 1).

Figure 1. Confidence of principals in working through the steps of the teacher practice evaluation process
[Chart. Survey item (Administrators): "How confident do you feel working through the steps of the teacher evaluation process using the Danielson Framework?" Very confident 25%; confident 49%; somewhat confident 24%; not confident 2%.]



In addition, the majority of both principals (71%) and teachers (70%) agreed with the statement that they understood the Framework for Teaching while only 1% of principals and 10% of teachers disagreed (Figure 2). Based on these survey results it appears that the great majority of respondents felt they know how to implement the teacher practice evaluation process.

Figure 2. Understanding of the Framework for Teaching
[Chart. "I understand the Danielson Framework." (Administrators): agree 71%; somewhat agree 28%; somewhat disagree 1%. "My evaluator understood the Danielson Framework." (Teachers): agree 70%; somewhat agree 20%; somewhat disagree 6%; disagree 4%.]

How do educators feel about the Framework for Teaching being used to evaluate teachers?

Although educators generally reported understanding how to evaluate teacher practice using the Framework for Teaching, it is still important for the acceptance of the system that educators feel it will result in fair evaluations of teacher practice. Survey results suggest that both teachers (89%) and administrators (95%) at least somewhat agreed that the Framework for Teaching is a fair method for partially determining the effectiveness of teachers (Figure 3). As one respondent expressed:

“I commend the state in selecting an evaluation process that is highly prescribed - the Teachscape training we received on the Danielson framework was lengthy, but it truly helped my understanding of the ratings.”

There still remains some degree of concern, though, that you can never completely remove the human element from the evaluation process.

“The evaluation can still be biased based on who is looking for what. The lines between a 3 or a 4 or a 2a versus a 2b can be very blurred. There is no clear cut system but because of this no system will truly be 100% fair. This can be a partial determination of effectiveness yes... however I am leery about districts that look to only use this model.”

“Sure, it can be a fair tool, but there is still the human element in the evaluator. I understand that calibration is supposedly taking place, but to what end. Next year, our district may only have one of 3 administrators doing evals because 2 of them are struggling to get calibrated. When they finally get the rubber stamp, how consistent do you think the evals will be across our district?”

There was further concern that ratings may be somewhat dependent on the characteristics of students in the classroom and of the subject being taught.

“Many of the "look fors" in the framework are somewhat dependent on the student population that teachers or principals are working with.”

Finally, there was also some concern that the observation (especially Announced observations) would be staged and not entirely reflect the actual quality of instruction occurring in classrooms.

“I feel that if evaluators only use evidence gleaned in one observation and nothing else it can and will skew scores both positively and negatively. Some evaluators will need to use what they know about the educator and his/her practice to fairly evaluate the teacher but according to Danielson if not seen it doesn't exist. That will likely make the one formal observation a "dog and pony show" which is not an accurate demonstration of practice and is no better than what was in place before.”

These concerns relate to how districts implement the System and the Framework for Teaching: by analyzing an individual’s growth across time (as intended) or by comparing teachers.

When asked about the quality of the Framework for Teaching rubric specifically, the great majority of both administrators and teachers generally felt that it measures the most important aspects of effective teaching, adequately articulates performance levels within each component, and identifies the knowledge and skills that reflect performance levels within each component (Figure 4).

Taken together, these results suggest that piloting educators generally believe the Framework for Teaching accurately defines instructional quality and that it is fair to use as part of a system of evaluating teachers. However, educators also expressed concerns about the consistency of its implementation.

Figure 3. Perceived fairness of using the Framework for Teaching to evaluate teacher practice
[Chart. Survey item: "The evaluation of teacher practice using the Danielson Framework is a fair method for partially determining the effectiveness of teachers." Administrators: agree 50%; somewhat agree 45%; somewhat disagree 3%; disagree 2%. Teachers: agree 45%; somewhat agree 44%; somewhat disagree 9%; disagree 3%.]

Figure 4. Perceptions about the quality of the Framework for Teaching
[Chart: Administrator and Teacher responses to three statements: "The Danielson Framework measures the most important aspects of being an effective teacher."; "The Danielson Framework adequately articulates the performance levels for each knowledge and skills area."; "The Danielson Framework accurately identifies the key knowledge and skills that reflect teacher effectiveness."]

Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?

An ongoing concern expressed by educators across Wisconsin has been whether they will have the necessary time and resources to complete the EE process, both generally and especially for the evaluation of teacher practice. To complete this process well and with fidelity takes a considerable time and resource commitment. This is especially true for evaluators, generally principals, who need to conduct multiple observations of teacher practice and hold several meetings with as many as 30 or 40 teachers being evaluated each year. This resource burden is further complicated by the reality that many principals actually fill other roles within their districts that further tax their time. The concern is that although educators generally feel that using the Framework for Teaching to evaluate teacher practice is fair and they understand how to do it, if they do not have the time to do it well then the results may be unreliable. This fear partially explains why many educators still have some doubts about the fairness of the teacher practice evaluation process.




“I believe the Danielson Framework is a valid and reasonable basis for teacher evaluation, I don't believe we are given the time or the resources to carry it through and apply the evaluation in the classroom.”

There is also a concern that, after setting aside enough time to complete EE, principals will be overwhelmed and not have enough time left over to meet all of their other responsibilities.

“The amount of time needed for evaluation will make it difficult to complete other aspects of my job. It will become 75-80% of my practice serving a staff of 60 and a student population of 465.”

“In the small rural school district the Principal has too many "hats." My concern is that I will spend so much time on EE that the rest of the job will be short changed. I started my career 34 years ago to work with kids and parents, the job has turned into a sea of paperwork.”

Finally, an unexpected point of concern regarding the time commitment for implementing the teacher practice pilot was voiced by many principals attempting to utilize Teachscape, the online tracking platform.

“There were several glitches that caused data to be re-entered several times which took time. I am concerned that once we go live in the state the load on the system will increase the glitches. That will decrease the motivation of persons to use the system. People will need a back up plan to use when the system won't allow them to add data to help increase efficiency of time usage.”

“The platform does not work as a whole system. The section of Teachscape for the Danielson models is sufficient. The frustration comes from the fact the artifacts, walkthroughs, evaluation, and other pieces of data do not link (i.e. all 2d data does not come together). It is very time consuming to have to do into the system and retype the artifacts by name and then score. We do not have that type of time. This is not an efficient or effective use of administrator time.”

The survey results reflect these concerns. Only 40% of administrators either agreed or somewhat agreed that they had enough time to provide good evaluations of practice to their teachers and only half felt they had enough resources (Figure 5). Resources could include assistant principals, effectiveness coaches, or support from CESAs such as implementation coaches. Even with these resources though, the time and resource burden to schools is real and represents the largest single barrier for implementing EE, both generally, and specifically for the teacher practice evaluation process.

Figure 5. Educator impressions of the time and resource barriers to completing teacher practice evaluations

[Figure 5: stacked bar chart of the percentage of respondents answering Agree, Somewhat Agree, Somewhat Disagree, or Disagree. Administrator items: "I have enough time to provide good evaluations of practice to my teachers." and "I have enough resources to provide good evaluations of practice to my teachers." Teacher items: "I had enough time to receive a good evaluation of my practice.", "I had enough resources to receive a good evaluation of my practice.", and "I had enough support from my administrator/supervisor to receive a good evaluation of my practice."]

How do educators perceive the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?

Given the large time and resource limitations facing schools with the implementation of Common Core, Smarter Balanced Assessments, and a myriad of other policies, it is easy to lose sight of the reasoning for why the state of Wisconsin is standardizing the evaluation of teacher practice. The hope is that the process will provide educators with a clearer picture of their instructional strengths and weaknesses and that this information will eventually empower teachers to improve their overall quality of instruction. The survey results reflect this optimism. Nearly all administrators and teachers at least somewhat agreed that the teacher practice evaluation process will improve teachers’ abilities to measure their effectiveness and that it will improve teacher practice (Figure 6). Furthermore, between 70% and 80% of teachers at least somewhat agreed that the evaluation of teacher practice improved their performance on each of the four Framework for Teaching domains (Figure 7).

Thus, even considering the large time and resource burden, these results seem to suggest that most participants believe that standardizing the teacher practice evaluation process is a worthwhile endeavor. However, there seems to be an understanding among many Wisconsin educators that it is going to take time and practice to implement it well.

“I think EE will improve my practice and teacher practice. I think it is going to take some time to create a system that is fair, responsive, and effectively improves student learning.”

With the 2013-2014 EE pilot, districts began to practice these systems. The results of the teacher practice pilot are presented in the next sections and establish baseline results as the implementation of the teacher practice evaluation process expands to all Wisconsin districts in 2014-15.



Figure 6. Perception of the impact of evaluating teacher practice on instruction

[Figure 6: stacked bar chart comparing Teachers and Administrators on two items: "The evaluation of teacher practice improves the ability of teachers to measure their effectiveness." and "The evaluation of teacher practice will improve teacher practice."]

Figure 7. Teacher perception of the impact of the evaluation on their practice

[Figure 7: stacked bar chart of teacher responses to the item "My evaluation of professional practice improved my ..." for each of the four domains: Planning and Preparation, Classroom Environment, Instruction, and Professional Responsibilities.]




Who participated and to what degree in the pilot of teacher practice evaluations?

Originally, 192 principals and 402 teachers across 195 school districts volunteered and were trained to participate in the 2013-2014 pilot. DPI had asked these districts to recruit two higher performing teachers to participate in the pilot so that the pilot would run more smoothly and there would be less concern among participants that the results could reflect negatively on participants.

Ultimately, 385 schools across 123 school districts participated, with ratings data only available for 135 of the 402 teachers who originally volunteered to pilot the process. However, within these districts, far more educators (449 evaluators and 2,595 teachers) piloted the teacher practice evaluation component of the EE system than were originally planned, because many districts decided to pilot aspects of the new DPI teacher practice evaluation process more widely. Seventy-nine of the pilot schools reported ratings for at least ten teachers, which represents 71% (1,839) of all the teachers engaged in the pilot, and 131 schools reported ratings for at least five teachers, representing 84% (2,173) of pilot teachers. Clearly, the teachers engaged in the pilot represent a wider distribution of teacher skill levels than was originally intended.

Of the 2,595 teachers involved in the pilot, Announced first observation results were recorded for 2,191 teachers and Unannounced for 1,466, but final ratings were only recorded for 507 teachers across 82 schools (47 elementary, 16 middle, 21 high, and 8 combined) and 43 districts. Announced observations were typically recorded in late February, Unannounced in late March, and final ratings in early June.12

12 A total of 304 and 385 teachers were evaluated in more than one Announced and Unannounced observation, respectively.

[Map: Locations of districts with final teacher ratings]



It is not clear why final rating data were not reported. It may be that participants did not do all of the steps necessary to complete the process. There is likely some truth to this since, as discussed earlier, most principals feel that they do not have enough time to provide good evaluations of practice to teachers.13 It is also possible that many of the schools did not intend to complete the process and instead wanted simply to familiarize themselves with the process. Another possibility for the low pilot completion rate is that results were simply not entered into Teachscape, the on-line tracking platform. This is also a likely partial explanation, as many principals expressed frustration with the usability of Teachscape and 48% on the end-of-year survey reported being at least somewhat dissatisfied with the platform. Another possibility is that participants were aware that the Teachscape system was being upgraded to better align with the DPI EE process and did not want to take the time during the end of the pilot year when they knew the on-line platform was changing.

As an additional data-cleaning step, the number of components scored within each type of observation was analyzed (Appendix: Table 8). These results suggest that during Announced and Unannounced observations, teachers were typically rated on between seven and ten components, while the great majority of final ratings included all 22 components. The ten components in the Classroom Environment and Instruction domains were typically assessed during observations, while effectiveness in the other domain components was assessed primarily through artifact analysis, discussions with the teacher, and some observation. There were ten teachers whose final ratings included five or fewer rated components. The ratings for these teachers were not included in subsequent analyses.

13 Note: DPI intended districts to implement the pilot with smaller numbers of educators to specifically address capacity concerns and have a clear understanding of the actual time required. By increasing the size of the pilot locally, districts may have impacted their capacity.



How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?

Internal consistency

Focusing on final ratings, the internal consistency of the components within each domain suggests each domain measured separate-but-related constructs that together defined an overall construct of teacher practice (Table 9: Appendix). Perfect consistency would be represented by a Cronbach’s Alpha value of 1.0 while no consistency would be 0. Alpha levels should be at least 0.6 to suggest that the individual components are related within a domain, while levels of 1.0 would suggest that evaluators did not differentiate teacher performance on the separate items. These results suggest that ratings on the components that comprise all four instructional domains are consistent; teachers scoring higher on one are likely to score higher on the others. There was one item which, if removed, would result in more consistent results. Removing 2e (Organizing Physical Space) from the Classroom Environment Domain would result in a small increase in the internal consistency of that domain. The removal of two other items, 1d (Demonstrating Knowledge of Resources) and 4b (Maintaining Accurate Records), resulted in minimal changes to internal consistency.
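As a rough illustration of the statistic discussed above, the sketch below computes Cronbach's Alpha and an "alpha if item deleted" value in plain Python. The ratings matrix is hypothetical (it is not pilot data), the function names are invented for this example, and the software used for the report's own alpha values is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's Alpha for rows of component scores (one row per teacher).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Assumes at least two items and a total score that varies across teachers.
    """
    k = len(ratings[0])
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(ratings, index):
    """Alpha recomputed after dropping the component at position `index`."""
    reduced = [[s for i, s in enumerate(row) if i != index] for row in ratings]
    return cronbach_alpha(reduced)


# Hypothetical ratings for six teachers on five components of one domain.
# The last column never varies, loosely mimicking a component that does not
# track the others.
hypothetical_ratings = [
    [3, 3, 2, 3, 3],
    [3, 3, 3, 3, 3],
    [4, 3, 3, 4, 3],
    [2, 3, 2, 2, 3],
    [3, 4, 3, 3, 3],
    [4, 4, 3, 4, 3],
]

alpha_all = cronbach_alpha(hypothetical_ratings)
alpha_without_last = alpha_if_item_deleted(hypothetical_ratings, 4)
print(f"alpha (all items): {alpha_all:.2f}")
print(f"alpha (last item removed): {alpha_without_last:.2f}")
```

In this made-up example, removing the component that does not vary with the others raises alpha, which is the same kind of comparison the text describes for component 2e.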

Correlations

Component relationships were further explored through correlational analysis (Table 10: Appendix). As was found with internal consistencies, components 1d, 2e, and 4b were the least correlated with the other components in their domains. Correlations between the four domain factors were also analyzed (Table 1). These results suggest that all four domains were closely related, with correlations above .5.



Table 1. Teacher final ratings: Domain correlations

                                     1       2       3       4
Domain 1: Planning and Prep          1
Domain 2: Classroom Environment      0.560   1
Domain 3: Instruction                0.669   0.656   1
Domain 4: Prof Responsibility        0.704   0.503   0.588   1
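The claim that every domain pair correlates above .5 can be checked directly against Table 1. The short sketch below stores the six off-diagonal correlations and reports the weakest and strongest pairs, along with the squared correlation (shared variance) for each pair. The dictionary layout and variable names are choices made for this illustration only.

```python
# Off-diagonal correlations copied from Table 1 (final ratings).
table1 = {
    ("Domain 1", "Domain 2"): 0.560,
    ("Domain 1", "Domain 3"): 0.669,
    ("Domain 2", "Domain 3"): 0.656,
    ("Domain 1", "Domain 4"): 0.704,
    ("Domain 2", "Domain 4"): 0.503,
    ("Domain 3", "Domain 4"): 0.588,
}

weakest_pair = min(table1, key=table1.get)
strongest_pair = max(table1, key=table1.get)
all_above_half = all(r > 0.5 for r in table1.values())

# Squared correlation: share of variance two domain scores have in common.
shared_variance = {pair: r ** 2 for pair, r in table1.items()}

print("weakest:", weakest_pair, table1[weakest_pair])
print("strongest:", strongest_pair, table1[strongest_pair])
print("all above .5:", all_above_half)
for pair, r2 in shared_variance.items():
    print(pair, f"{r2:.0%} shared variance")
```

Read this way, the weakest link is between Classroom Environment and Professional Responsibilities and the strongest is between Planning and Preparation and Professional Responsibilities.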

Confirmatory factor analysis (CFA)

CFA was used to determine how consistent the overall structure of the Framework for Teaching theoretical model is with the actual practice ratings results. SPSS AMOS 22 was used to build the ratings structural model (Figure 13: Appendix). The covariances between domains were constrained according to the bi-variate correlations between domains. Measures of model fit are used to determine the relative consistency of the measured practice ratings with the Framework for Teaching. The RMSEA of .057 is not ideal but still within the acceptable levels. However, the comparative fit index (CFI) of .89 is below even a liberal threshold for model fit. An analysis of the component loadings provides some indications of why practice ratings are not entirely consistent with the Framework for Teaching (Table 11: Appendix). Consistent with the internal consistency and correlation analyses, components 1d, 2e, and 4b do not load onto the latent constructs as well as the other components in each domain. Interestingly, removing these three components from the model increased CFI to .91, which is within an acceptable range for model fit.

Taken together, these results suggest that the Framework for Teaching’s theoretical grounding may not be entirely consistent with the empirical results. Specifically, ratings on components 1d, 2e, and 4b do not relate well to the other components in their domains. Although these results might suggest that these three items should be removed from practice ratings, it is important to note that these results are based on a relatively small number of teachers. Stronger conclusions about the structure of final ratings data will be possible at the end of the 2014-2015 school year.

How were teachers rated overall?

The results of Announced and Unannounced observations and final ratings suggest that teachers were typically rated as Proficient overall (Table 2). It is worth noting that final ratings were somewhat higher on average than observation ratings. However, these differences were somewhat mitigated when comparing ratings only for teachers with final ratings and observational ratings.

Table 2. Descriptive statistics of announced, unannounced and final ratings

                                             N      Min    Max    Mean   Std. Deviation
All recorded ratings
  Final ratings                              497    2.05   3.91   3.13   0.25
  1st Announced Observations                 2186   1      4      3.03   0.37
  1st Unannounced Observations               1460   1      4      2.99   0.38
Ratings for teachers with all three ratings
  Final Ratings                              308    2.05   3.91   3.11   0.25
  1st Announced Observations                 308    2      3.9    3.04   0.26
  1st Unannounced Observations               308    1.89   3.9    3.07   0.27

The distribution of final ratings suggests that the great majority of teachers were rated as at least Proficient.14 It is not clear exactly what rating will be considered as Basic or Proficient in the Wisconsin system, but if the standard for Proficient is that the teacher must average greater than 2.5 across all component scores, then 98.4% of teachers would have met this standard. If the standard is increased to 2.75 (allowing for only five of 22 standards to be rated as Basic), then 94.3% would have been rated as Proficient (Figure 14).
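The parenthetical about the 2.75 standard can be checked with a short calculation. The sketch below assumes, for illustration only, that every component not rated Basic (2) is rated exactly Proficient (3), with no Distinguished (4) ratings to offset the Basic ones.

```latex
\[
\text{five Basic ratings:}\quad
\frac{17 \times 3 + 5 \times 2}{22} = \frac{61}{22} \approx 2.77 \;>\; 2.75
\]
\[
\text{six Basic ratings:}\quad
\frac{16 \times 3 + 6 \times 2}{22} = \frac{60}{22} \approx 2.73 \;<\; 2.75
\]
```

Under that assumption, a teacher could carry at most five Basic ratings and still clear a 2.75 average, which matches the text.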

14 Evaluators were told that if teachers received low ratings (e.g., level 1 Unsatisfactory) they should be taken out of the pilot and placed on the district’s typical intervention plan.



A deeper look into the distribution of ratings shows more clearly the skewed nature of results and suggests a possible partial explanation. Figure 8 presents the cumulative percentage of teachers across average overall ratings. Only 84 teachers (16.6%) received final ratings less than an average score of 3 (Proficient), while 95 (18.7%) were rated as exactly 3. Although not as positively skewed as final ratings, the results of observations follow a similar pattern.

This pattern is somewhat puzzling and suggests that there may be a cultural barrier resulting in evaluators adjusting their component ratings so that an individual teacher’s average rating is not below Proficient (3).

Figure 8. Cumulative percentage of average teacher ratings

[Figure 8: line chart. Horizontal axis: average rating, from 1 to 4. Vertical axis: cumulative percentage of teachers, from 0% to 100%. Series: Announced, Unannounced, Final Ratings.]



How were teachers rated on components?

Frequencies of component ratings provide more detail about how teachers were rated on each component. Few teachers received final ratings of Basic (Figure 9). Further, 74% received no final ratings of Basic on any components. Although a small percentage, more teachers received Basic ratings on individual components based on observations (Figure 10). Still, only 34% of teachers received any Basic component ratings on Announced or Unannounced observations.

Within each domain, specific components were identified as strengths and weaknesses (Table 12). Within the Planning and Preparation domain, final ratings of Component 1f (Designing Student Assessments) were statistically lower than all the other components in that domain (p<.05), while teachers were rated higher on 1a (Demonstrating Knowledge of Content and Pedagogy) than all other Domain 1 Components except 1b (Demonstrating Knowledge of Students) and 1e (Designing Coherent Instruction) (p<.05). Within the Classroom Environment Domain, Component 2a (Creating an Environment of Respect and Rapport) was identified as an area of strength and was the highest rated component across all domains regardless of rating type. Within the Instruction Domain, Component 3b (Using Questioning and Discussion Techniques) was the lowest rated skill across all four domains and was statistically lower than the other Instruction Domain components (p<.05). Component 3a (Communicating with Students) was consistently rated as a relative strength. Finally, within the Professional Responsibilities Domain, both Components 4b (Maintaining Accurate Records) and 4c (Communicating with Families) were identified as growth areas, while Component 4f (Showing Professionalism) was rated as a strength.



Figure 9. Final ratings: Few teachers received a final rating of Basic on any components while none were rated as Unsatisfactory

(Values are the bar labels from the chart; rows may not sum to 100% because of rounding.)

Component   Basic   Proficient   Distinguished
1a          2%      77%          21%
1b          6%      74%          21%
1c          6%      81%          13%
1d          3%      86%          11%
1e          4%      77%          18%
1f          6%      88%          6%
2a          2%      57%          41%
2b          3%      77%          20%
2c          3%      81%          16%
2d          4%      70%          27%
2e          1%      89%          10%
3a          3%      75%          22%
3b          11%     82%          7%
3c          6%      81%          13%
3d          4%      87%          9%
3e          3%      82%          15%
4a          5%      71%          24%
4b          5%      89%          6%
4c          8%      86%          7%
4d          6%      67%          27%
4e          6%      77%          17%
4f          3%      62%          35%



Figure 10. Observation ratings: More teachers were rated as Basic than were in final ratings

What evidence is there that teachers are being rated differently in different contexts?

As mentioned, all Wisconsin evaluators volunteering to participate in the pilot received the same rigorous training designed to promote valid and reliable assessments, and therefore, improve the comparability of ratings across contexts. However, because of the large number of additional teachers and evaluators in the pilot, it is not clear who received what level of training. Therefore, there is an increased risk that the ratings presented here may not be based on consistent standards of evidence. Ideally, inter-rater reliability could specifically define the consistency of observations across contexts. This could be done by using independent evaluators to conduct observations in tandem with local evaluators. However, the Wisconsin EE system does not require districts to use this resource.

Figure 10 values (percentage of teachers at each observation rating level, by component; rows may not sum to 100% because of rounding):

Component                                             Unsatisfactory   Basic   Proficient   Distinguished
2a: Creating an Environment of Respect and Rapport    1%               6%      63%          30%
2b: Establishing a Culture for Learning               1%               9%      76%          14%
2c: Managing Classroom Procedures                     1%               10%     73%          16%
2d: Managing Student Behavior                         1%               9%      65%          24%
2e: Organizing Physical Space                         0%               4%      85%          11%
3a: Communicating with Students                       0%               9%      77%          14%
3b: Using Questioning and Discussion Techniques       1%               21%     72%          6%
3c: Engaging Students in Learning                     1%               13%     77%          9%
3d: Using Assessment in Instruction                   1%               15%     78%          7%
3e: Demonstrating Flexibility and Responsiveness      1%               7%      82%          10%



Nonetheless, there are still some analyses possible that can explore the possibility that teachers were rated differently in different settings. Multi-level modeling15 was used to explore school and district effects on Announced and Unannounced ratings.16 To be included in these models, schools had to have at least five teachers with ratings data. This selection criterion resulted in 129 schools. These 129 schools provided ratings for 84% (2,173) of pilot teachers. However, as a result of this criterion, school effects on final ratings could not be modeled since no districts included more than one school meeting this threshold.

One school-level factor tested in the models was Free or Reduced (F/R) lunch participation. This was done in response to concerns expressed by pilot participants and a growing national debate about whether teacher ratings across different schools with different types of students are comparable.17 With the idea that teachers may have a more difficult time demonstrating their skills in crowded classrooms, schools’ students per teacher ratios were also included as a factor in the models.18

Another factor tested was the percentage of teachers rated. This was done to determine if the schools involved in the analysis with lower inclusion rates selected the highest performing teachers for the pilot. The inclusion of this factor helps determine how representative the results presented here are of the districts and schools involved in the pilot.

The final factor included in the models was the district planned use of teacher ratings, gathered from surveys. Districts using ratings for high-stakes decisions put additional pressures on raters that may influence how stringently they rate teachers. End-of-year principal survey results were used to identify districts planning to use the results for high-stakes decisions like promotion or pay bonuses.

15 Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
16 Final ratings were not modeled because too few schools completed them.
17 http://www.brookings.edu/~/media/research/files/reports/2014/05/13%20teacher%20evaluation/evaluating%20teachers%20with%20classroom%20observations.pdf
18 Certainly having specific classroom data would allow for a more direct test of these effects but school data can at least provide approximate classroom characteristics.

The multi-level model presented below summarizes how these factors were statistically tested.

Descriptive statistics results

Table 3 presents the descriptive statistics of school factors for the sample used in the models. Fifty-five schools reported they were going to use EE data to make high-stakes decisions while 53 reported they were not or did not know if they were. Schools ranged from a small free/reduced lunch participation rate (7.8%) to high (100%). Also, across the 130 schools about half (49%) of the teachers in the schools participated in the pilot.

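The following multi-level model equation was used to identify factors that predict teacher practice ratings:

Level 1: Teacher-level model:

\[ \text{Practice ratings}_{ijk} = \pi_{0jk} + e_{ijk} \]

Level 2: School-level model:

\[ \pi_{0jk} = \beta_{00k} + \beta_{01k}\,(\text{F/R lunch rate})_{jk} + \beta_{02k}\,(\text{Teacher/student ratio})_{jk} + \beta_{03k}\,(\text{Percent of teachers rated})_{jk} + r_{0jk} \]

\[ \beta_{0pk} = \gamma_{0p0} \quad \text{for } p = 1 \text{ to } 3 \]

Level 3: District-level model:

\[ \beta_{00k} = \gamma_{000} + \gamma_{001}\,(\text{High-stakes use of results}) + u_{00} \]

The multi-level model corresponds to the following mixed model:

\[ \text{Practice ratings}_{ijk} = \gamma_{000} + \beta_{01k}\,(\text{F/R lunch rate})_{jk} + \beta_{02k}\,(\text{Teacher/student ratio})_{jk} + \beta_{03k}\,(\text{Percent of teachers rated})_{jk} + \gamma_{001}\,(\text{High-stakes use of results}) + r_{0jk} + u_{00} + e_{ijk} \]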



Table 3. Descriptive statistics of school characteristics and ratings

                                        N     Min    Max    Mean   Std. Deviation
Announced observation ratings           129   1.9    3.4    3.00   0.25
Unannounced observation ratings         110   2      3.5    2.95   0.28
Final ratings*                          28    2.8    3.5    3.14   0.15
Use of EE for high stakes decisions     108   0      1      0.51   0.50
F/R lunch rate                          128   7.8    100    42.2   27.5
Teacher to student ratio                126   8.1    41.7   16.5   4.6
Ratio of teachers to teachers rated     126   0.06   1      0.49   0.28

*Final ratings not modelled due to low N.

Bivariate correlations present the unadjusted relationships between school factors. These show that free/reduced lunch rates were strongly negatively correlated with both Announced and Unannounced ratings, i.e. teachers in lower income schools had lower ratings. Further, teachers in schools that reported they were using ratings to make high stakes decisions had somewhat higher ratings. Teachers in schools not using ratings for high stakes decisions received average ratings of 2.94 while those in schools using them for high stakes averaged 3.05 (F=4.7, p=.032). However, it is important to note that schools using ratings for high stakes decisions also had lower F/R lunch participation rates and fewer students per teacher. Thus, it is not entirely clear that using ratings for high-stakes decisions, on its own, would result in teachers receiving higher scores. The statistical modeling presented in the next section tests which of these factors uniquely predict teacher ratings.



Table 4. Correlations of school characteristics and teacher ratings

                                                      (1)        (2)        (3)      (4)        (5)        (6)      (7)       (8)
(1) Announced ratings (n=129)                         1
(2) Unannounced ratings (n=110)                       0.544**    1
(3) Final ratings (n=28)                              0.342      0.347      1
(4) Use of EE for high stakes decisions (n=108)       0.227*     0.260*     0.058    1
(5) F/R lunch rate (n=128)                            -0.578**   -0.321**   -0.07    -0.232*    1
(6) Student to teacher ratio (n=126)                  -0.242**   -0.220*    -0.195   -0.300**   0.261**    1
(7) Ratio of teachers to teachers rated (n=126)       0.201*     0.135      0.111    -0.064     -0.256**   -0.021   1
(8) Number of teachers rated (n=384)                  0.13       0.09       0.08     0.013      -0.335**   0.008    0.583**   1

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).



Model results

The statistical modeling used isolates the unique relationship of each school factor with ratings. Results from the null (unconditional) models suggest that roughly 12% of the variance in Announced ratings and 8% of the variance in Unannounced ratings is attributable to district-level factors, while 16% of the variance in Announced ratings and 15% of the variance in Unannounced ratings is attributable to school-level factors (Table 5).

Table 5. Covariance parameters of unconditional multi-level models predicting practice ratings

             Announced                  Unannounced
             Estimate    Std. Error     Estimate    Std. Error
Residual     0.096091    0.003354       0.113213    0.00489
District     0.016488    0.00647        0.011593    0.005365
School       0.022071    0.004885       0.021789    0.006393
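The district and school percentages quoted in the text follow from the variance components in Table 5: each level's share is its variance estimate divided by the sum of the residual, district, and school estimates. The sketch below reproduces that arithmetic; the dictionary layout and function name are choices made for this illustration.

```python
# Variance component estimates copied from Table 5 (unconditional models).
components = {
    "Announced": {"residual": 0.096091, "district": 0.016488, "school": 0.022071},
    "Unannounced": {"residual": 0.113213, "district": 0.011593, "school": 0.021789},
}


def variance_shares(parts):
    """Share of total variance at each level (an intraclass-correlation style split)."""
    total = sum(parts.values())
    return {level: value / total for level, value in parts.items()}


shares = {name: variance_shares(parts) for name, parts in components.items()}

for name, s in shares.items():
    print(
        f"{name}: district {s['district']:.1%}, "
        f"school {s['school']:.1%}, "
        f"teacher-level residual {s['residual']:.1%}"
    )
```

Rounded to whole percentages, the shares match the roughly 12% and 8% (district) and 16% and 15% (school) reported above, with the remaining variance lying between teachers within schools.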

The results of both conditional models were consistent; school teacher to student ratio, percent of teachers rated in a school, and the intended use of EE results did not uniquely explain teacher practice ratings and were therefore removed from the models. However, school free/reduced lunch participation was a strong predictor of practice ratings (Table 6). The inclusion of this explained 9% and 7% of the school effect on Announced and Unannounced ratings respectively and the majority of the district-level variance in both models (Announced pseudo R-squared = .60; Unannounced pseudo R-squared = .75). These results suggest that for every 10 percentage points greater school F/R lunch rate, the predicted Unannounced rating is reduced by .04 and Announced by .05. Given that a standard deviation for both types of ratings is between .35 and .40, the difference in the predicted ratings between a teacher in an affluent school with 10% F/R lunch and a low-income school with 90% F/R lunch participation is close to one full standard deviation.



Table 6. Results of models predicting ratings

                                   Estimate    Robust Std. Error   df       t        Sig.
School F/R Lunch (Unannounced)     -0.00394    0.000905            11.395   -4.36    0.001
School F/R Lunch (Announced)       -0.0048     0.000905            33.897   -5.298   <.0001
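The "close to one full standard deviation" statement can be traced from the Table 6 coefficients. The sketch below multiplies each coefficient by the 80-point gap between a 10% and a 90% F/R lunch school and expresses the result in standard deviation units. Using the teacher-level standard deviations from Table 2 (0.37 and 0.38) as the yardstick is an assumption made here; the text only says the standard deviation is between .35 and .40.

```python
# Fixed-effect estimates from Table 6: change in rating per one
# percentage point of school F/R lunch participation.
coefficients = {"Announced": -0.0048, "Unannounced": -0.00394}

# Assumed yardstick: standard deviations of first observations in Table 2.
rating_sd = {"Announced": 0.37, "Unannounced": 0.38}


def predicted_gap(coefficient, low_rate=10.0, high_rate=90.0):
    """Predicted rating difference between a high and a low F/R lunch school."""
    return coefficient * (high_rate - low_rate)


per_ten_points = {k: c * 10 for k, c in coefficients.items()}
gaps = {k: predicted_gap(c) for k, c in coefficients.items()}
gaps_in_sd = {k: gaps[k] / rating_sd[k] for k in gaps}

for k in coefficients:
    print(
        f"{k}: {per_ten_points[k]:+.3f} per 10 points, "
        f"{gaps[k]:+.3f} for 10% vs 90% F/R lunch, "
        f"{gaps_in_sd[k]:+.2f} SD"
    )
```

Under these assumptions the predicted gap is about 0.38 rating points for Announced and about 0.32 for Unannounced observations, or roughly 0.8 to 1.0 standard deviations.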

To further explore this effect, teacher ratings were plotted according to the F/R lunch participation rate in the school (Figure 11), thus demonstrating a clear connection between the free/reduced lunch participation of students in the school and the practice ratings assigned to teachers.

Figure 11. Plot of teacher ratings and school F/R lunch participation

The relationship of F/R lunch participation rates with teacher practice ratings may be the result of three factors. One possibility is that more effective teachers are selecting into districts with fewer low-income students. Thus, these results may indicate that more affluent schools simply have better teachers. A second possibility is that evaluators are rating teachers more harshly in lower SES schools. This may be related to the previously mentioned concern that it may be difficult for teachers to demonstrate their skills in larger classrooms with more lower-income students who are likely lower achieving and more prone to behavior problems as well. A third possibility is that teachers simply do not perform as well in lower-income schools according to the rubric.



To test these three possibilities, the relationships of F/R lunch with individual component ratings were modeled. If the relationship between F/R lunch and ratings is being driven by classroom characteristics, one would expect that components that rely on student behavior would be more sensitive to changes in F/R lunch rates. If the relationship is being driven more by teacher selection, then one would expect that all component ratings would be similarly affected. Ratings for each component were predicted using the same model as was used to test total practice ratings (page 28). The results of these models show that F/R lunch rates predicted all Domain 2 (Classroom Environment) and Domain 3 (Instruction) components. However, school F/R lunch participation only predicted five of 11 components in Domains 1 (Planning and Preparation) and 4 (Professional Responsibilities) (Table 13: Appendix).

To further explore the magnitude of the relationships of F/R lunch rates with Domains 2 and 3 practice ratings, additional models (generalized linear mixed models with multinomial distributions) were used to determine the likelihood of receiving different ratings according to the F/R lunch rate in the school. The results produce log odds ratios for being rated as Unsatisfactory, Basic, or Proficient as compared to Distinguished. Log odds ratios were translated into probabilities (Table 14: Appendix). For instance, there is a 3% probability that a teacher in a hypothetical school with no students eligible for F/R lunch will be rated as Basic on component 2a, as compared to being rated as Distinguished. However, a teacher in a school with 100% F/R lunch has a 57% probability. Figure 12 below presents the results for one component (2d: Managing Student Behavior) that is by its definition related to student behavior. These results show that as the F/R lunch rate in a school increases, the probability of being rated as Distinguished decreases. In a school with 50% F/R lunch eligible students, teachers are three times as likely (74% versus 26%) to be rated as Proficient compared to Distinguished and almost twice as likely to be rated as Distinguished than Basic (65% versus 36%). However, increase the F/R lunch rate to 100% and teachers are over five times more likely to be rated as Proficient than Distinguished (84% versus 16%) and are almost twice as likely to be rated as Basic than Distinguished (65% versus 35%). These results would seem to suggest that either the way evaluators are applying the Framework for Teaching rubric or the rubric itself may result in teachers being rated lower in lower SES schools.
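The "times as likely" statements above are odds computed from pairwise probabilities, and the probabilities themselves come from log odds through the logistic function. The sketch below shows both conversions using the percentages quoted in the text. The function names are invented for this illustration, and the model's actual log-odds estimates (Table 14: Appendix) are not reproduced here.

```python
import math


def probability_from_log_odds(log_odds):
    """Pairwise probability of a rating versus Distinguished, given log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_from_probability(p):
    """Odds of the rating versus Distinguished: p / (1 - p)."""
    return p / (1.0 - p)


# Pairwise probabilities quoted in the text for component 2d.
quoted = {
    "50% F/R: Proficient vs Distinguished": 0.74,
    "100% F/R: Proficient vs Distinguished": 0.84,
    "100% F/R: Basic vs Distinguished": 0.65,
}

odds = {label: odds_from_probability(p) for label, p in quoted.items()}
for label, value in odds.items():
    print(f"{label}: {value:.2f} times as likely")

# Round trip: converting a probability to log odds and back recovers it.
round_trip = probability_from_log_odds(math.log(odds_from_probability(0.74)))
```

The odds come out just under 3, just over 5, and just under 2, which lines up with "three times", "over five times", and "almost twice" in the paragraph above.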

Figure 12. 2d (Managing Student Behavior) - Probabilities of being rated Unsatisfactory, Basic, and Proficient compared to Distinguished in schools with different F/R lunch rates.

[Figure 12: line chart. Horizontal axis: school F/R lunch rate, from 0% to 100%. Vertical axis: probability, from 0% to 90%. Series: Unsatisfactory, Basic, Proficient.]

Although these results would seem to suggest that practice ratings are more affected by F/R lunch in Domains 2 and 3, it is important to note that the magnitude of the effects on many of the components in Domains 1 and 4 is consistent with that of many of the components in Domains 2 and 3 (Table 11: Appendix). The relative lack of statistical significance for components in Domains 1 and 4 is related to the smaller sample of teachers with ratings on these components (Table 12: Appendix). Further, when the component coefficients are averaged, the coefficients in Domains 1 and 4 are not much smaller than they are in 2 and 3 (Table 7). These results would seem to suggest that most of the ratings difference is due to more skilled teachers selecting into schools with lower F/R lunch rates. However, it is important to note that 2a (Creating an Environment of Respect and Rapport) and 2d (Managing Student Behavior) have two of the three largest magnitude relationships among all components and, intuitively, they would seem to be more related to the characteristics of students within a classroom.

Table 7. Average multi-level model component coefficients

Including all components                     Averaged Coefficients
Domain 1: Planning and Preparation           -0.0037
Domain 2: Classroom Environment              -0.0043
Domain 3: Instruction                        -0.0038
Domain 4: Professional Responsibilities      -0.0033

Summary and discussion

The results of the teacher practice evaluation pilot suggest that many of the conditions necessary for its full implementation to be successful are in place; educators reported understanding both the evaluation process and the Framework for Teaching, that it was fair, and that the process would likely empower teachers to better understand their instructional skills and that it would lead to improved teaching in Wisconsin. However, the largest concern expressed by educators was that the process is time consuming and that implementing it may leave little time for principals to fulfill all their other duties. The hope is that as principals become more familiar with the process it will become more routinized and will take less time to complete. Also, fixing the Teachscape platform will further reduce the time demands.

The results of the teacher practice evaluation pilot indicated that, although ratings data were recorded for only a minority of the 400 educators who had originally volunteered to pilot EE, ratings data were available for over 2,500 teachers across the state. However, final practice ratings were recorded for relatively few of these (503). The lack of completion may be due to a number of factors including difficulties with the Teachscape platform and principals not having enough time to complete the process.

The results of Announced observations, Unannounced observations and final ratings all suggest that teacher ratings were skewed positively. A national debate continues about why, even with the new, more rigorous, evaluation methods, new teacher evaluation systems still rate the vast majority of teachers as effective, with anywhere from 94%19 to 98%20 of teachers receiving overall ratings of Proficient/Effective. This finding has been interpreted as evidence that the great majority of teachers are, in fact, Effective or that the culture of schools forces evaluators to continue to rate most teachers as “good”, depending on one’s political leanings.21,22 However, the finding in the current evaluation that a disproportionate number of teachers received an overall rating of exactly 3.0 would seem to suggest that some amount of the skewness is due to exogenous factors unrelated to teacher practice quality. Further, although there were significant differences between teacher practice component ratings, suggesting that evaluators did at least somewhat differentiate ratings, individual component frequencies indicate that it was rare for a teacher to be rated below Proficient on any components.

The results also suggest that there were significant differences in practice ratings for teachers in different districts and different schools within districts. This may suggest that evaluators are not being consistent in their ratings of teachers or it may reflect real differences in the proficiency of teachers. Teacher to student ratio, district planned use of EE results, school F/R lunch participation rate and the percent of the teachers evaluated in a school were all tested as possible explanations for these differences. Although districts' planned use of ratings for high-stakes purposes, the percentage of the school participating in the pilot, and teacher to student ratio were each correlated with practice ratings, only school F/R lunch participation rates uniquely predicted ratings, explaining the majority of the district-level rating variance.

19 http://www.gadoe.org/School-Improvement/Teacher-and-Leader-Effectiveness/Documents/Pilot%20Report_Overview%20and%20Report%20Combined%201-10-13.pdf
20 http://www.michigan.gov/documents/mde/Educator_Effectiveness_Ratings_Policy_Brief_403184_7.pdf
21 http://www.ajc.com/news/news/new-evaluation-pilot-skewed-with-too-few-unsatisfa/nTpKN/
22 http://www.nyssba.org/news/2013/12/12/on-board-online-december-16-2013/why-are-most-teachers-rated-effective-when-most-students-test-below-standards/

In regard to the relationship with F/R lunch participation, there is a growing body of research suggesting that teacher practice ratings are somewhat dependent on the types of students in their classrooms.23,24 F/R lunch rates may be related to teacher ratings through their relationship to student achievement and behavior. Lower income students, on average, are lower achieving than higher income students and may exhibit more behavior problems. It may be more challenging for teachers in these classrooms to elicit the high-level student performance necessary to be rated as Distinguished. Thus, the relationship of practice ratings with F/R lunch rates in the Wisconsin EE pilot may be driven by its relationship with other student characteristics.

On the other hand, the relationship between F/R lunch and practice ratings may be driven by teacher characteristics. It is important to note that most components in the Domain 1: Planning and Preparation and Domain 4: Professional Responsibilities domains were related to school SES at a similar magnitude compared to the components in the Domain 2: Classroom Environment and Domain 3: Instruction domains. Because there is less evidence that student behavior can influence ratings in Domains 1 and 4, this seems to suggest that some amount of the relationship with F/R lunch rates may be due to teacher selection. In Wisconsin and across the U.S. there are clear indications that districts with more lower-income families have a much harder time recruiting effective teachers.

Finally, it is possible that teacher-to-student ratio and planned use of EE data partially explain the relationship of F/R lunch rates with practice ratings. Less affluent schools have larger class sizes and may be less likely to use ratings for high-stakes decisions. So although neither factor was found to uniquely predict practice ratings, future evaluation work will focus on these factors as potential mediators.

Taken together, the ratings differences between schools with different numbers of F/R lunch students are likely due to a combination of student, teacher, and school factors. Future evaluation work will continue to explore this issue. Until more can be learned about this relationship and fair comparisons of ratings can be established, it is not recommended that schools and districts use practice ratings to compare teachers.

23 Polikoff, M. (March, 2013). The Stability of Observational and Student Survey Measures of Teaching Effectiveness. Paper presented at the annual meeting of American Education Finance and Policy.
24 Whitehurst, G., Chingos, M., & Lindquist, K. (March, 2014). Evaluating Teachers with Classroom Observations: Lessons Learned in Four Districts. Brown Center on Education Policy.

The results of the evaluation suggest that the pilot of teacher practice evaluations provided educators with useful experience and practice that will help them implement the teacher practice process fully in 2014-2015. However, several issues identified in this report should continue to be explored as more data become available through the full implementation of the process. It is not known to what degree the skewness of ratings and their relationship with exogenous factors like F/R lunch participation will hold up when more evaluators are certified and the analyses are based on final ratings rather than ratings based more on single observations. Further, one limitation of this report was that it relied on school-level information to predict ratings. In subsequent reports, obtaining data specific to teachers, such as their experience, and their classrooms will provide a clearer picture of what predicts teacher ratings and how to ensure the process is as fair as possible for all Wisconsin teachers.



Appendix

Table 8. Frequencies of components rated by type of observation

Components scored   Announced   Unannounced   Final Ratings
1                   15          7             8
2                   17          2             0
3                   15          8             1
4                   39          12            0
5                   63          37            1
6                   66          61            0
7                   100         104           0
8                   322         347           0
9                   318         278           0
10                  698         445           0
11                  72          51            0
12                  78          49            0
13                  51          38            0
14                  65          40            0
15                  45          37            0
16                  49          41            1
17                  43          29            0
18                  32          21            1
19                  50          22            4
20                  54          21            8
21                  88          32            24
22                  215         69            459
Total               2495        1751          507



Table 9. Internal consistency of final teacher ratings

Domain                        Cronbach's Alpha   Component   Cronbach's Alpha if Component Deleted
Planning and Preparation      0.75               1.a         0.72
                                                 1.b         0.72
                                                 1.c         0.69
                                                 1.d         0.73
                                                 1.e         0.70
                                                 1.f         0.71
Classroom Environment         0.67               2.a         0.60
                                                 2.b         0.63
                                                 2.c         0.60
                                                 2.d         0.59
                                                 2.e         0.68
Instruction                   0.66               3.a         0.60
                                                 3.b         0.59
                                                 3.c         0.58
                                                 3.d         0.60
                                                 3.e         0.64
Professional Responsibility   0.75               4.a         0.72
                                                 4.b         0.75
                                                 4.c         0.74
                                                 4.d         0.70
                                                 4.e         0.69
                                                 4.f         0.69
Total of 22 components        0.90



Table 10. Component correlations within Framework for Teaching domains

Planning and Preparation
       1.a     1.b     1.c     1.d     1.e     1.f
1.a    1
1.b    0.282   1
1.c    0.332   0.383   1
1.d    0.342   0.207   0.308   1
1.e    0.322   0.367   0.434   0.322   1
1.f    0.38    0.365   0.411   0.214   0.357   1

Classroom Environment
       2.a     2.b     2.c     2.d     2.e
2.a    1
2.b    0.352   1
2.c    0.313   0.316   1
2.d    0.443   0.262   0.365   1
2.e    0.106   0.196   0.297   0.247   1

Instruction
       3.a     3.b     3.c     3.d     3.e
3.a    1
3.b    0.305   1
3.c    0.313   0.344   1
3.d    0.285   0.31    0.321   1
3.e    0.228   0.215   0.246   0.227   1

Professional Responsibility
       4.a     4.b     4.c     4.d     4.e     4.f
4.a    1
4.b    0.225   1
4.c    0.253   0.294   1
4.d    0.327   0.222   0.278   1
4.e    0.323   0.309   0.321   0.528   1
4.f    0.484   0.252   0.277   0.468   0.452   1
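The domain-level alphas in Table 9 can be roughly cross-checked against the correlations in Table 10. The sketch below is an illustration only, not the evaluation team's procedure: it computes the standardized form of Cronbach's alpha from the mean inter-component correlation, whereas the report does not say whether raw or standardized alpha was used. The function name `standardized_alpha` and the variable names are assumptions introduced here; the correlation values are copied from Table 10.

```python
from statistics import mean


def standardized_alpha(correlations, k):
    """Standardized Cronbach's alpha for k items, given their pairwise correlations."""
    r_bar = mean(correlations)  # mean inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


# Lower-triangle correlations copied from Table 10, with the number of components.
domains = {
    "Planning and Preparation": (6, [0.282, 0.332, 0.383, 0.342, 0.207, 0.308,
                                     0.322, 0.367, 0.434, 0.322,
                                     0.38, 0.365, 0.411, 0.214, 0.357]),
    "Classroom Environment": (5, [0.352, 0.313, 0.316, 0.443, 0.262, 0.365,
                                  0.106, 0.196, 0.297, 0.247]),
    "Instruction": (5, [0.305, 0.313, 0.344, 0.285, 0.31, 0.321,
                        0.228, 0.215, 0.246, 0.227]),
    "Professional Responsibility": (6, [0.225, 0.253, 0.294, 0.327, 0.222, 0.278,
                                        0.323, 0.309, 0.321, 0.528,
                                        0.484, 0.252, 0.277, 0.468, 0.452]),
}

alphas = {name: standardized_alpha(r, k) for name, (k, r) in domains.items()}
for name, alpha in alphas.items():
    print(f"{name}: {alpha:.2f}")
```

Under this assumption the four values round to 0.75, 0.67, 0.66, and 0.75, in line with the domain alphas reported in Table 9.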



Figure 13. Confirmatory factor analysis Framework for Teaching structural model with standardized coefficients



Table 11. CFA Final rating component results

Component   Squared Multiple   Standardized         Regression   S.E.    C.R.     P
            Correlation        Regression Weights   Weights
1a          0.387              0.622                0.315        0.023   13.696   ***
1b          0.337              0.58                 0.329        0.026   12.601   ***
1c          0.415              0.644                0.306        0.021   14.292   ***
1d          0.194              0.44                 0.185        0.02    9.219    ***
1e          0.381              0.617                0.316        0.023   13.566   ***
1f          0.379              0.615                0.243        0.018   13.508   ***
2a          0.321              0.567                0.366        0.032   11.469   ***
2b          0.422              0.65                 0.36         0.027   13.391   ***
2c          0.321              0.566                0.292        0.025   11.461   ***
2d          0.295              0.543                0.337        0.031   10.954   ***
2e          0.13               0.361                0.148        0.021   7.113    ***
3a          0.268              0.518                0.302        0.026   11.578   ***
3b          0.261              0.511                0.261        0.023   11.391   ***
3c          0.319              0.564                0.299        0.023   12.868   ***
3d          0.315              0.562                0.256        0.02    12.788   ***
3e          0.258              0.508                0.269        0.024   11.313   ***
4a          0.376              0.614                0.343        0.027   12.614   ***
4b          0.175              0.418                0.164        0.02    8.383    ***
4c          0.235              0.485                0.209        0.021   9.805    ***
4d          0.341              0.584                0.359        0.03    11.949   ***
4e          0.43               0.656                0.344        0.025   13.587   ***
4f          0.466              0.683                0.412        0.029   14.214   ***
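As a reading aid for Table 11, the squared multiple correlation of a component that loads on a single factor equals the square of its standardized regression weight. The sketch below checks that identity for a few rows copied from the table. It assumes each component loads on only one factor, which is consistent with the values but cannot be confirmed here because Figure 13 is not reproduced; the helper name `implied_smc` is introduced for illustration.

```python
# (component, squared multiple correlation, standardized regression weight),
# copied from Table 11.
rows = [
    ("1a", 0.387, 0.622),
    ("1d", 0.194, 0.44),
    ("2e", 0.13, 0.361),
    ("4f", 0.466, 0.683),
]


def implied_smc(standardized_weight):
    """Variance explained by the factor when a component has a single loading."""
    return standardized_weight ** 2


# Largest absolute gap between the reported and implied values.
max_gap = max(abs(implied_smc(weight) - smc) for _, smc, weight in rows)

for name, smc, weight in rows:
    print(f"{name}: reported {smc:.3f}, implied {implied_smc(weight):.3f}")
```

The small gaps that remain reflect rounding of the published weights to three decimals. Read this way, component 2e shares the least variance with its factor (about 13%) and component 4f the most (about 47%).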



Figure 14. Distribution of average overall ratings



Table 12. Descriptive statistics of component ratings

            All Announced               All Unannounced             Final Ratings
Component   Mean   Std. Deviation   N      Mean   Std. Deviation   N      Mean   Std. Deviation   N
1.a         3.09   0.49             756    3.02   0.44             372    3.19   0.44             495
1.b         3.11   0.52             732    2.98   0.49             364    3.15   0.49             491
1.c         2.98   0.55             739    2.88   0.59             344    3.08   0.43             493
1.d         3.01   0.47             629    2.94   0.53             291    3.08   0.36             493
1.e         3.04   0.56             752    2.94   0.54             360    3.14   0.46             495
1.f         2.89   0.45             474    2.85   0.53             192    3.00   0.36             492
2.a         3.25   0.58             2311   3.19   0.57             1702   3.39   0.52             497
2.b         3.06   0.49             2271   2.99   0.52             1653   3.17   0.45             496
2.c         3.05   0.54             2328   3.02   0.55             1686   3.13   0.42             497
2.d         3.15   0.60             2305   3.10   0.62             1702   3.23   0.50             497
2.e         3.09   0.41             1843   3.02   0.37             1247   3.08   0.32             494
3.a         3.08   0.48             2383   3.01   0.49             1700   3.19   0.46             497
3.b         2.86   0.55             2195   2.80   0.53             1508   2.96   0.42             494
3.c         2.97   0.50             2376   2.90   0.51             1693   3.07   0.43             497
3.d         2.92   0.49             2051   2.89   0.49             1408   3.05   0.36             495
3.e         3.02   0.45             1465   3.00   0.43             905    3.12   0.42             492
4.a         3.17   0.55             511    3.09   0.60             185    3.19   0.50             491
4.b         3.00   0.45             446    3.02   0.52             178    3.00   0.34             495
4.c         3.02   0.46             477    3.01   0.52             165    2.99   0.38             493
4.d         3.24   0.58             585    3.17   0.60             258    3.22   0.53             496
4.e         3.17   0.54             520    3.15   0.53             212    3.11   0.46             493
4.f         3.31   0.58             605    3.31   0.61             278    3.32   0.52             489



Table 13. Results of multi-level models predicting Announced observation component ratings

Component   Estimate   Std. Error   df       t        Sig.
1a          -0.00303   0.001716     8.257    -1.767   0.114
1b          -0.00462   0.001618     45.849   -2.855   0.006
1c          -0.00338   0.001524     46.411   -2.214   0.032
1d          -0.00174   0.002144     19.889   -0.813   0.426
1e          -0.00527   0.001776     5.519    -2.968   0.028
1f          -0.00426   0.002181     6.001    -1.954   0.098
2a          -0.00519   0.001461     53.131   -3.551   0.001
2b          -0.00357   0.000959     21.848   -3.722   0.001
2c          -0.00479   0.001255     57.321   -3.819   0
2d          -0.00522   0.001425     35.709   -3.662   0.001
2e          -0.0027    0.001133     33.006   -2.385   0.023
3a          -0.0031    0.001051     34.969   -2.944   0.006
3b          -0.0041    0.001037     13.734   -3.949   0.002
3c          -0.00379   0.001141     44.642   -3.316   0.002
3d          -0.00502   0.000837     16.826   -5.994   0
3e          -0.00308   0.001372     69.718   -2.243   0.028
4a          -0.00483   0.002216     12.196   -2.178   0.05
4b          0.001082   0.001883     30.816   0.574    0.57
4c          -0.00331   0.002126     44.614   -1.556   0.127
4d          -0.005     0.002222     11.832   -2.252   0.044
4e          -0.00418   0.00279      22.02    -1.498   0.148
4f          -0.00339   0.002635     16.337   -1.287   0.216
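The t column in Table 13 appears to be the estimate divided by its standard error, which is the usual test statistic for a fixed effect. The sketch below reproduces it for a few rows copied from the table; that the reported t values were computed this way is an assumption, and small differences are expected because the published estimates are rounded.

```python
# (component, estimate, standard error, reported t), copied from Table 13.
rows = [
    ("1a", -0.00303, 0.001716, -1.767),
    ("2a", -0.00519, 0.001461, -3.551),
    ("3d", -0.00502, 0.000837, -5.994),
    ("4b", 0.001082, 0.001883, 0.574),
]

# Recompute each t statistic as estimate / standard error.
recomputed = {name: estimate / se for name, estimate, se, _ in rows}

# Largest absolute difference from the reported values.
max_diff = max(abs(recomputed[name] - t) for name, _, _, t in rows)

for name, _, _, t in rows:
    print(f"{name}: reported t = {t:.3f}, recomputed t = {recomputed[name]:.3f}")
```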



Table 14. Results of multinomial generalized linear mixed models predicting ratings results: Change in probabilities of ratings as a function of school F/R lunch rate (ratings are compared to Distinguished)

                             Intercept (0% F/R lunch)       Adjusted (100% F/R lunch)
Component   Comparison       log odds ratio   Probability   log odds ratio   Probability
2a          Unsatisfactory   -6.432           0%            -2.032           12%
            Basic            -3.498           3%            0.302            57%
            Proficient       0.157            54%           1.557            83%
2b          Unsatisfactory   -6.355           0%            -0.755           32%
            Basic            -1.898           13%           0.902            71%
            Proficient       1.194            77%           2.694            94%
2c          Unsatisfactory   -6.528           0%            0.572            64%
            Basic            -2.075           11%           1.325            79%
            Proficient       1.118            75%           2.418            92%
2d          Unsatisfactory   -7.292           0%            -0.592           36%
            Basic            -2.581           7%            0.619            65%
            Proficient       0.59             64%           1.69             84%
2e          Unsatisfactory   -7.361           0%            -0.361           41%
            Basic            -2.766           6%            0.334            58%
            Proficient       1.564            83%           2.464            92%
3a          Unsatisfactory   -5.324           0%            -2.324           9%
            Basic            -1.98            12%           0.62             65%
            Proficient       1.169            76%           2.169            90%
3b          Unsatisfactory   -4.241           1%            0.359            59%
            Basic            0.182            55%           1.982            88%
            Proficient       2.158            90%           2.358            91%
3c          Unsatisfactory   -5.581           0%            -0.681           34%
            Basic            -1.107           25%           1.493            82%
            Proficient       1.748            85%           2.348            91%
3d          Unsatisfactory   -5.831           0%            0.969            72%
            Basic            -0.627           35%           2.373            91%
            Proficient       2.097            89%           3.197            96%
3e          Unsatisfactory   -5.814           0%            0.386            60%
            Basic            -2.022           12%           1.378            80%
            Proficient       1.79             86%           2.69             94%
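The Probability columns in Table 14 are consistent with applying the inverse logit to each log odds value, so each percentage can be read as the probability of receiving the listed rating rather than Distinguished, considering only those two categories. The sketch below illustrates the conversion for component 2a. It assumes this is how the percentages were derived, which the table does not state; the function name `log_odds_to_probability` is introduced here for illustration.

```python
import math


def log_odds_to_probability(log_odds):
    """Inverse logit: convert log odds (versus Distinguished) to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Component 2a, copied from Table 14:
# comparison -> (log odds at 0% F/R lunch, log odds at 100% F/R lunch)
component_2a = {
    "Unsatisfactory": (-6.432, -2.032),
    "Basic": (-3.498, 0.302),
    "Proficient": (0.157, 1.557),
}

# Convert both log odds values to whole percentages for each comparison.
percentages = {
    rating: tuple(round(100 * log_odds_to_probability(x)) for x in pair)
    for rating, pair in component_2a.items()
}

for rating, (at_0, at_100) in percentages.items():
    print(f"{rating}: {at_0}% at 0% F/R lunch, {at_100}% at 100% F/R lunch")
```

For 2a this yields 0% and 12% for Unsatisfactory, 3% and 57% for Basic, and 54% and 83% for Proficient, matching the table. Because each value compares one rating with Distinguished only, the percentages within a component are not expected to sum to 100%.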