Evaluation of the Wisconsin Educator Effectiveness System
Results of the Teacher Practice Rating System Pilot
Curtis J. Jones
University of Wisconsin in Milwaukee, Socially Responsible Evaluation in Education
January 2015
Evaluation Team
Curtis J. Jones is a senior scientist in the School of Education at the University of Wisconsin, Milwaukee, and director of Socially Responsible Evaluation in Education.
Jessica Arrigoni is a researcher at the Consortium for Policy Research in Education in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.
Mikhail Pyatigorsky is an economist and researcher at the Value-Added Research Center in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.
Clarissa Steele is a survey researcher and the lead for communications and professional development for the Value-Added Research Center in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.
Robin Worth is a researcher at the Consortium for Policy Research in Education in the Wisconsin Center for Education Research at the University of Wisconsin, Madison.
Acknowledgments
We would like to thank the many individuals who contributed to the development of this report, especially Katharine Rainey and Laura Ruckert at the Wisconsin Department of Public Instruction, and Steven Kimball at the University of Wisconsin, Madison.
We would also like to thank the following individuals who provided feedback on the report and contributed to the evaluation: Bradley Carl, Herb Heneman, Jacob Hollnagel, Rachel Lander, Tony Milanowski, Samuel Purdy, Logan Roman, and Steven Smith.
For more information about this report or the evaluation of EE, [email protected]
Key Findings
In 2013-2014, the state of Wisconsin piloted aspects of its Educator Effectiveness System. The pilot of the teacher practice evaluation process, using the Department of Public Instruction (DPI) model¹, involved evaluators rating the quality of teaching according to the Charlotte Danielson 2013 Framework for Teaching rubric. Several results from the pilot will inform the state-wide implementation of the DPI teacher practice evaluation process. However, these results should be interpreted cautiously. It is not known if these results will hold up when more evaluators are certified and the analyses are based on final ratings for the whole state rather than single observations for a selection of schools. Further, the lack of available teacher data makes many of these findings difficult to interpret.
The great majority of pilot teachers and principals felt they knew how to implement the DPI teacher practice evaluation process. This suggests that the training and information developed by DPI have been effective.
Piloting educators generally believed the Framework for Teaching accurately defines instructional quality and that it is fair to use as part of a teacher evaluation system.
Piloting educators expressed concerns about the consistency of the implementation of the teacher practice evaluation process. There was some concern that the Framework for Teaching may not be fair to teachers with different evaluators, with different types of students, and in different subject areas.
The time and resource burden to schools represents the largest single barrier for implementing EE, both generally and specifically for the teacher practice evaluation process.
The Teachscape data collection platform was viewed by many piloting educators as a serious deficiency in the system. Many expressed frustration with it and felt it was wasting a great deal of time.
Even considering the time and resource burden districts experienced, pilot findings suggest that most participants believe that standardizing the teacher practice evaluation process is a worthwhile endeavor.
Nearly all pilot teachers were rated as Proficient. 74% of teachers did not receive any ratings below Proficient on any components. Although it is possible that these results mostly reflect the quality of teaching among pilot teachers, the disproportionate number of teachers who were rated as exactly three across all 22 Framework for Teaching components suggests that other factors, such as school culture, may have also influenced ratings.
¹ Data were not available for analysis to inform the implementation of other teacher practice models used in Wisconsin, such as the CESA 6 model.
There were significant differences in the ratings assigned to teachers in different districts and schools. These differences may reflect real differences or they may reflect differences in the application of the Framework for Teaching.
Teachers in districts that plan to use EE results for high-stakes purposes and teachers in less crowded schools were found to have higher practice ratings. Although these relationships were entirely explained by school free/reduced lunch participation rates, these factors may still partially explain why school free/reduced lunch participation rates predict practice ratings. Future evaluation work is needed to explore this further.
School free/reduced lunch participation rates were found to be a strong predictor of teacher practice ratings. This finding may indicate that more effective teachers are selecting into higher income schools. It may also suggest that ratings using the Framework for Teaching are related to the types of students in classrooms. There was evidence for both in the pilot. This finding is consistent with what has been found in other Educator Effectiveness evaluations, including the Measures of Effective Teaching (MET) Project and a recent report by the Brookings Institution.
Until the relationship of F/R lunch rates and practice ratings is better understood, it is important that districts not use practice ratings to compare teachers. It is not clear yet how the Framework for Teaching could be used in a valid way for this purpose. Districts should, instead, focus on analyzing growth of individual educators across the year. The proper use of practice ratings will be revisited after the first full year of implementation in 2014-2015.
Table of Contents
Evaluation of Wisconsin Educator Effectiveness (EE) System Piloting the Teacher Practice Rating System in Wisconsin
The Wisconsin teacher practice evaluation process
Evaluation questions and methods
Data
Analysis
Teacher practice evaluation results
How well do educators understand the teacher practice evaluation process?
How do educators feel about the Framework for Teaching being used to evaluate teachers?
Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?
How do educators perceive the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?
Who participated and to what degree in the pilot of teacher practice evaluations?
How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?
How were teachers rated overall?
How were teachers rated on components?
What evidence is there that teachers are being rated differently in different contexts?
Summary and discussion
Appendix
Table of tables
Table 1. Teacher final ratings: Domain correlations
Table 2. Descriptive statistics of announced, unannounced and final ratings
Table 3. Descriptive statistics of school characteristics and ratings
Table 4. Correlations of school characteristics and teacher ratings
Table 5. Covariance parameters of unconditional multi-level models predicting practice ratings
Table 6. Results of models predicting ratings
Table 7. Average multi-level model component coefficients
Table 8. Frequencies of components rated by type of observation
Table 9. Internal consistency of final teacher ratings
Table 10. Component correlations within Framework for Teaching domains
Table 11. CFA Final rating component results
Table 12. Descriptive statistics of component ratings
Table 13. Results of multi-level models predicting Announced observation component ratings
Table 14. Results of multinomial generalized linear mixed models predicting ratings results: Change in probabilities of ratings as a function of school F/R lunch rate (ratings are compared to Distinguished)
Table of figures
Figure 1. Confidence of principals in working through the steps of the teacher practice evaluation process
Figure 2. Understanding of the Framework for Teaching
Figure 3. Perceived fairness of using the Framework for Teaching to evaluate teacher practice
Figure 4. Perceptions about the quality of the Framework for Teaching
Figure 5. Educator impressions of the time and resource barriers to completing teacher practice evaluations
Figure 6. Perception of the impact of evaluating teacher practice on instruction
Figure 7. Teacher perception of the impact of the evaluation on their practice
Figure 8. Cumulative percentage of average teacher ratings
Figure 9. Final ratings: Few teachers received a final rating of Basic on any components while none were rated as Unsatisfactory
Figure 10. Observation ratings: More teachers were rated as Basic than were in final ratings
Figure 11. Plot of teacher ratings and school F/R lunch participation
Figure 12. 2d (Managing Student Behavior) - Probabilities of being rated Unsatisfactory, Basic, and Proficient compared Distinguished in schools with different F/R lunch rates
Figure 13. Confirmatory factor analysis Framework for Teaching structural model with standardized coefficients
Figure 14. Distribution of average overall ratings
Evaluation of Wisconsin Educator Effectiveness (EE) System
Results of the Teacher Practice Rating System Pilot
This report presents the results of the teacher practice evaluation process of the Wisconsin Educator Effectiveness (EE) Pilot and explores the implications of these results for Wisconsin's efforts to implement components of the system across all Wisconsin districts in 2014-2015. In preparation for the state-wide implementation, Wisconsin piloted four aspects of its EE system during the 2012-2013 and 2013-2014 school years: evaluations of teacher practice, Student Learning Objectives, School Learning Objectives, and evaluations of principal practice. The purpose of this pilot was both to give districts the opportunity to practice and learn these processes and to inform the state's efforts to improve their implementation. As part of this learning process, during the 2013-2014 school year, the Wisconsin Department of Public Instruction (DPI) contracted the University of Wisconsin, Milwaukee, Socially Responsible Evaluation in Education to conduct an independent evaluation of the pilot of their EE System. This report focuses exclusively on the results of the teacher practice evaluation pilot.
The Wisconsin teacher practice evaluation process
Districts working with DPI² on the teacher practice evaluation component of the EE system use the Charlotte Danielson 2013 Framework for Teaching to measure instructional quality.³ The inclusion of the 2011 version in the Measures of Effective Teaching (MET) Project⁴ showed that educator ratings using this rubric were related to both student growth on achievement tests and student ratings of teacher quality. The 2013 version is virtually identical to the 2011 version. The Framework for Teaching separates teacher practice into four domains and, within those domains, 22 components (presented below).
² Districts are permitted to use other systems for evaluating teacher practice. Although most Wisconsin districts have opted to work with DPI, a number have decided to work with Cooperative Educational Service Agency (CESA) 6. CESA 6 has not made their teacher practice ratings data available for independent analysis.
³ http://www.danielsongroup.org/userfiles/files/downloads/2013EvaluationInstrument.pdf
⁴ http://www.metproject.org/
2013 Framework for Teaching
Domain 1: Planning and Preparation
1a Demonstrating Knowledge of Content and Pedagogy
1b Demonstrating Knowledge of Students
1c Setting Instructional Outcomes
1d Demonstrating Knowledge of Resources
1e Designing Coherent Instruction
1f Designing Student Assessments
Domain 2: Classroom Environment
2a Creating an Environment of Respect and Rapport
2b Establishing a Culture for Learning
2c Managing Classroom Procedures
2d Managing Student Behavior
2e Organizing Physical Space
Domain 3: Instruction
3a Communicating With Students
3b Using Questioning and Discussion Techniques
3c Engaging Students in Learning
3d Using Assessment in Instruction
3e Demonstrating Flexibility and Responsiveness
Domain 4: Professional Responsibilities
4a Reflecting on Teaching
4b Maintaining Accurate Records
4c Communicating with Families
4d Participating in a Professional Community
4e Growing and Developing Professionally
4f Showing Professionalism
During the teacher practice evaluation pilot, teachers were asked to participate in at least one 40-minute Announced observation of their instruction and at least three 15-minute Unannounced observations. Although the Wisconsin EE system does not recommend that ratings be recorded after observations, Teachscape, the online EE progress tracking platform, required evaluators to enter ratings to record that they had done an observation. This has since been changed. Thus, although ratings data are available for pilot observations, in subsequent years ratings for specific observations may not be available. Based on these observations, teachers were typically rated on the ten components of the Framework for Teaching that comprise the Classroom Environment and Instruction domains as either one (Unsatisfactory), two (Basic), three (Proficient), or four (Distinguished).⁵,⁶ At the end of the year, based on these observations and other sources of evidence, such as teacher conferences and document review, evaluators assigned final ratings on all 22 components to teachers.
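The report notes that overall scores were not part of the pilot itself and were calculated for the analysis by averaging component scores. A minimal sketch of that averaging is shown here, assuming a simple unweighted mean; the function name `overall_score` and the example ratings are hypothetical and are not pilot data.

```python
from statistics import mean

# Rating scale described in the report.
RATING_LEVELS = {1: "Unsatisfactory", 2: "Basic", 3: "Proficient", 4: "Distinguished"}


def overall_score(component_ratings: dict[str, int]) -> float:
    """Average component ratings into one overall score (unweighted mean)."""
    if not component_ratings:
        raise ValueError("at least one component rating is required")
    for component, rating in component_ratings.items():
        if rating not in RATING_LEVELS:
            raise ValueError(f"{component}: rating {rating} is not on the 1-4 scale")
    return float(mean(component_ratings.values()))


# Hypothetical ratings for one teacher on the ten observed components
# (Classroom Environment 2a-2e and Instruction 3a-3e); not pilot data.
example_ratings = {
    "2a": 3, "2b": 3, "2c": 4, "2d": 3, "2e": 3,
    "3a": 3, "3b": 2, "3c": 3, "3d": 3, "3e": 3,
}
score = overall_score(example_ratings)
print(f"overall score: {score:.2f}")
```

An unweighted mean treats every component as equally important, which is the simplest reading of "averaging component scores"; any weighting by domain would be a different design choice.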
Certified evaluators (often principals) received intensive training (30 hours of professional development) and then were tested to determine if they could correctly rate the quality of instruction seen in example videos. The purpose of this training was to calibrate and normalize scoring across evaluators. Piloting districts also received training and support from DPI and their local CESA throughout the year to ensure they understood the process and the timing for completing the steps in the teacher practice evaluation process. These supports and resources were designed to increase the likelihood that teacher practice ratings would be both valid and reliable across the piloting districts.
Evaluation questions and methods
The overall purpose of the ongoing evaluation of the Wisconsin Educator Effectiveness (EE) system is to identify the state's progress toward implementing the parts of the EE system: teacher practice evaluations, principal practice evaluations, student learning objectives, school learning objectives, and eventually principal and teacher value-added. This report presents the results of the 2013-2014 teacher practice evaluation pilot and explores the conditions that may both promote and inhibit its full implementation in 2014-2015.⁷
⁵ Although overall and domain scores were not calculated in the pilot, for the purpose of this report, overall scores were calculated by averaging component scores.
⁶ For a detailed description of the different rating categories, see http://ee.dpi.wi.gov/teacher/t-levels-performance
The results of the evaluation are organized to address each of the questions outlined below. To promote the successful implementation of EE systems, previous research has emphasized the need for educators to both understand the system and have an open mind about its usefulness and fairness.⁸ As such, this report first presents evidence from surveys about how well Wisconsin educators feel they understand the teacher practice evaluation process. It also summarizes educator attitudes about the teacher practice evaluation process, specifically how fair they believe it is and how the evaluation of teacher practice may impact the quality of instruction in Wisconsin. It also summarizes educator opinions about the feasibility of the work necessary to complete the teacher practice evaluation process.
After summarizing educator opinions about the process, this report then presents the results of the teacher practice evaluation pilot and summarizes evidence suggesting how well schools implemented the process. In addition, the results of ad hoc statistical analyses designed to identify district and school factors that may influence teacher practice ratings are presented.
⁷ While this report was not publicly available until January 2015, Jones updated DPI biweekly on trends, findings, and feedback to inform modifications to the System and its resources, as necessary.
⁸ Milanowski, A. & Kimball, S. (April, 2003). The Framework-Based Teacher Performance Assessment Systems in Cincinnati and Washoe. CPRE-UW Working Paper Series.
Data
Invitations to complete end-of-year surveys were e-mailed to the 329 administrators and 388 teachers who had originally agreed to pilot the system in 2013-2014. Of these, 190 (58%) administrators and 171 (44%) teachers responded. All but four administrators were principals, but 11% of principals also held other positions in their school districts. Responding teachers represented a range of grade levels including K-2nd (23%), 3rd to 5th (24%), 6th to 8th (29%), and high school (35%).⁹ Although the surveys captured many aspects of the EE pilot, only the questions that address the teacher practice evaluation pilot are presented in this report.
⁹ Some teachers taught more than one grade level.
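As a quick check on the response rates reported above, the arithmetic can be sketched as follows. The counts come from the report; the helper name `response_rate` is hypothetical and used only for illustration.

```python
def response_rate(responded: int, invited: int) -> float:
    """Return the survey response rate as a percentage."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    return 100.0 * responded / invited


# Counts reported for the 2013-2014 end-of-year surveys.
admin_rate = response_rate(190, 329)
teacher_rate = response_rate(171, 388)

print(f"Administrators: {admin_rate:.0f}%")
print(f"Teachers: {teacher_rate:.0f}%")
```

Rounded to whole percentages, these reproduce the 58% and 44% figures stated in the text.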
Evaluation Questions
1. How well do educators understand the teacher practice evaluation process?
2. How do educators feel about the Framework for Teaching being used to evaluate teachers?
3. Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?
4. How do educators perceive the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?
5. Who participated and to what degree in the pilot of teacher practice evaluations?
6. How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?
7. How were teachers rated overall?
8. How were teachers rated on components?
9. What evidence is there that teachers are being rated differently in different contexts?
Pilot participation and ratings data were obtained from Teachscape, the online platform used by DPI to document pilot participation activities across EE system components. These data were supplemented with additional data collected from the National Center for Education Statistics (NCES) and the state of Wisconsin's WISEdash system. The specific data used in this report and the sources are presented below.
Teachscape: teacher practice ratings; evaluator; school; school district; date of observation
WISEdash: 2013-2014 school free or reduced lunch participation; school size
NCES: 2011-2012 school teacher/student ratio; equivalent full-time teachers in each school
Surveys: planned use of teacher ratings (high stakes or not); understanding of the evaluation of teacher practice; attitudes toward the evaluation of teacher practice; perceived impact of the evaluation of teacher practice on instruction
Analysis
Most of the analyses involved simple descriptive statistics and frequencies to present educator attitudes and teacher practice ratings. Various psychometric methods were used to explore how well the ratings on individual components and domains fit together to define instructional practice. These included correlations, internal consistency analysis, and confirmatory factor analysis.¹⁰ Finally, statistical modeling¹¹ was used to explore the relationships of teacher practice ratings with school and district characteristics. Models nested practice ratings within schools and then within districts. Only school and district-level variables were tested in the models since individual teacher and classroom data were not available. Future evaluation work will analyze the relationships of teacher and classroom factors with effectiveness ratings.
¹⁰ Matsunaga, M. (2010). How to factor-analyze your data right: Do's, don'ts and how to's. International Journal of Psychological Research, 3, 97-110.
¹¹ Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
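The nesting described above (practice ratings within schools, schools within districts) can be written out as an unconditional three-level model. The form below is a sketch that assumes the standard specification in Raudenbush and Bryk (2002); the report does not give its model equations in this section, and all symbols here are assumed notation rather than the report's own.

```latex
% Assumed notation: Y_{ijk} is the practice rating of teacher i
% in school j in district k.
\begin{align}
Y_{ijk} &= \gamma_{000} + v_{k} + u_{jk} + e_{ijk}, \\
v_{k} &\sim N(0, \tau_{D}), \qquad
u_{jk} \sim N(0, \tau_{S}), \qquad
e_{ijk} \sim N(0, \sigma^{2}), \\
\rho_{D} &= \frac{\tau_{D}}{\tau_{D} + \tau_{S} + \sigma^{2}}, \qquad
\rho_{S} = \frac{\tau_{S}}{\tau_{D} + \tau_{S} + \sigma^{2}}.
\end{align}
```

Here \gamma_{000} is the grand mean rating, v_{k} and u_{jk} are district and school random effects, and \rho_{D} and \rho_{S} are the shares of rating variance lying between districts and between schools within districts. School and district-level predictors would enter as additional fixed-effect terms on the right-hand side of the first equation.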
Teacher practice evaluation results
How well do educators understand the teacher practice evaluation process?
As mentioned previously, it is critical for the success of the implementation of EE that educators understand the system. It appears that the work DPI has devoted to developing this understanding in educators has paid off. Specifically, the great majority of principals (74%) expressed confidence that they could work through the various steps of the teacher evaluation process using the Framework for Teaching and only 2% did not feel confident (Figure 1).
Figure 1. Confidence of principals in working through the steps of the teacher practice evaluation process
Survey item (Administrators): How confident do you feel working through the steps of the teacher evaluation process using the Danielson Framework?
Responses: very confident 25%; confident 49%; somewhat confident 24%; not confident 2%
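The 74% figure cited in the text combines the two highest response categories in Figure 1. A small sketch of that aggregation follows, using the percentages shown in the figure; the variable names are hypothetical.

```python
# Response percentages from Figure 1 (administrators).
figure1 = {
    "very confident": 25,
    "confident": 49,
    "somewhat confident": 24,
    "not confident": 2,
}

# The categories should account for all respondents.
total = sum(figure1.values())

# "Expressed confidence" is read here as the top two categories combined.
confident_share = figure1["very confident"] + figure1["confident"]

print(f"total: {total}%, expressed confidence: {confident_share}%")
```

Treating "somewhat confident" as a middle category rather than as agreement is an assumption inferred from the 74% total reported in the text.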
In addition, the majority of both principals (71%) and teachers (70%) agreed with the statement that they understood the Framework for Teaching, while only 1% of principals and 10% of teachers disagreed (Figure 2). Based on these survey results, it appears that the great majority of respondents felt they knew how to implement the teacher practice evaluation process.
Figure 2. Understanding of the Framework for Teaching
Administrators ("I understand the Danielson Framework."): agree 71%; somewhat agree 28%; somewhat disagree 1%
Teachers ("My evaluator understood the Danielson Framework."): agree 70%; somewhat agree 20%; somewhat disagree 6%; disagree 4%
How do educators feel about the Framework for Teaching being used to evaluate teachers?
Although educators generally reported understanding how to evaluate teacher practice using the Framework for Teaching, it is still important for the acceptance of the system that educators feel it will result in fair evaluations of teacher practice. Survey results suggest that both teachers (89%) and administrators (95%) at least somewhat agreed that the Framework for Teaching is a fair method for partially determining the effectiveness of teachers (Figure 3). As one respondent expressed:
“I commend the state in selecting an evaluation process that is highly prescribed - the Teachscape training we received on the Danielson framework was lengthy, but it truly helped my understanding of the ratings.”
There still remains some degree of concern, though, that you can never completely remove the human element from the evaluation process.
“The evaluation can still be biased based on who is looking for what. The lines between a 3 or a 4 or a 2a versus a 2b can be very blurred. There is no clear cut system but because of this no system will truly be 100% fair. This can be a partial determination of effectiveness yes... however I am leery about districts that look to only use this model.”
“Sure, it can be a fair tool, but there is still the human element in the evaluator. I understand that calibration is supposedly taking place, but to what end. Next year, our district may only have one of 3 administrators doing evals because 2 of them are struggling to get calibrated. When they finally get the rubber stamp, how consistent do you think the evals will be across our district?”
There was further concern that ratings may be somewhat dependent on the characteristics of students in the classroom and of the subject being taught.
“Many of the "look fors" in the framework are somewhat dependent on the student population that teachers or principals are working with.”
Finally, there was also some concern that the observation (especially Announced observations) would be staged and not entirely reflect the actual quality of instruction occurring in classrooms.
“I feel that if evaluators only use evidence gleaned in one observation and nothing else it can and will skew scores both positively and negatively. Some evaluators will need to use what they know about the educator and his/her practice to fairly evaluate the teacher but according to Danielson if not seen it doesn't exist. That will likely make the one formal observation a "dog and pony show" which is not an accurate demonstration of practice and is no better than what was in place before.”
These concerns reflect uncertainty about how districts will use the System and the Framework for Teaching: to analyze an individual's growth across time (as intended), or to compare teachers.
When asked about the quality of the Framework for Teaching rubric specifically, the great majority of both administrators and teachers generally felt that it measures the most important aspects of effective teaching, adequately articulates performance levels within each component, and identifies the knowledge and skills that reflect performance levels within each component (Figure 4).
Taken together, these results suggest that piloting educators generally believe the Framework for Teaching accurately defines instructional quality and that it is fair to use as part of a system of evaluating teachers. However, educators also expressed concerns about the consistency of its implementation.
Figure 3. Perceived fairness of using the Framework for Teaching to evaluate teacher practice
Survey item: The evaluation of teacher practice using the Danielson Framework is a fair method for partially determining the effectiveness of teachers.
Administrators: agree 50%; somewhat agree 45%; somewhat disagree 3%; disagree 2%
Teachers: agree 45%; somewhat agree 44%; somewhat disagree 9%; disagree 3%
Figure 4. Perceptions about the quality of the Framework for Teaching
Response categories: agree / somewhat agree / somewhat disagree (disagree was 1% on the five bars where a value is labeled)
Administrators, "The Danielson Framework measures the most important aspects of being an effective teacher.": 54% / 40% / 5%
Administrators, "The Danielson Framework adequately articulates the performance levels for each knowledge and skills area.": 51% / 43% / 6%
Administrators, "The Danielson Framework accurately identifies the key knowledge and skills that reflect teacher effectiveness.": 53% / 42% / 4%
Teachers, "The Danielson Framework measures the most important aspects of being an effective teacher.": 46% / 44% / 8%
Teachers, "The Danielson Framework accurately identifies the key knowledge and skills that reflect teacher effectiveness.": 45% / 45% / 9%
Teachers, "The Danielson Framework adequately articulates the performance levels for each knowledge and skills area.": 51% / 39% / 10%
Do educators believe they will have the time and resources necessary to complete teacher practice evaluations?
An ongoing concern expressed by educators across Wisconsin has been whether they will have the necessary time and resources to complete the EE process, both generally and especially for the evaluation of teacher practice. To complete this process well and with fidelity takes a considerable time and resource commitment. This is especially true for evaluators, generally principals, who need to conduct multiple observations of teacher practice and hold several meetings with as many as 30 or 40 teachers being evaluated each year. This resource burden is further complicated by the reality that many principals actually fill other roles within their districts that further tax their time. The concern is that although educators generally feel that using the Framework for Teaching to evaluate teacher practice is fair and they understand how to do it, if they do not have the time to do it well then the results may be unreliable. This fear partially explains why many educators still have some doubts about the fairness of the teacher practice evaluation process.
“I believe the Danielson Framework is a valid and reasonable basis for teacher evaluation, I don't believe we are given the time or the resources to carry it through and apply the evaluation in the classroom.”
There is also a concern that, after setting aside enough time to complete EE, principals will be overwhelmed and not have enough time left over to meet all of their other responsibilities.
“The amount of time needed for evaluation will make it difficult to complete other aspects of my job. It will become 75-80% of my practice serving a staff of 60 and a student population of 465.”
“In the small rural school district the Principal has too many "hats." My concern is that I will spend so much time on EE that the rest of the job will be shortchanged. I started my career 34 years ago to work with kids and parents, the job has turned into a sea of paperwork.”
Finally, an unexpected point of concern regarding the time commitment for implementing the teacher practice pilot was voiced by many principals attempting to utilize Teachscape, the online tracking platform.
“There were several glitches that caused data to be re-entered several times which took time. I am concerned that once we go live in the state the load on the system will increase the glitches. That will decrease the motivation of persons to use the system. People will need a backup plan to use when the system won't allow them to add data to help increase efficiency of time usage.”
“The platform does not work as a whole system. The section of Teachscape for the Danielson models is sufficient. The frustration comes from the fact the artifacts, walkthroughs, evaluation, and other pieces of data do not link (i.e. all 2d data does not come together). It is very time consuming to have to do into the system and retype the artifacts by name and then score. We do not have that type of time. This is not an efficient or effective use of administrator time.”
The survey results reflect these concerns. Only 40% of administrators either agreed or somewhat agreed that they had enough time to provide good evaluations of practice to their teachers and only half felt they had enough resources (Figure 5). Resources could include assistant principals, effectiveness coaches, or support from CESAs such as implementation coaches. Even with these resources though, the time and resource burden
on schools is real and represents the largest single barrier for implementing EE, both generally and specifically for the teacher practice evaluation process.
Figure 5. Educator impressions of the time and resource barriers to completing teacher practice evaluations
How do educators perceive that the inclusion of the Framework for Teaching will impact the quality of Wisconsin teaching?
Given the large time and resource limitations facing schools with the implementation of Common Core, Smarter Balanced Assessments, and a myriad of other policies, it is easy to lose sight of the reason why the state of Wisconsin is standardizing the evaluation of teacher practice. The hope is that the process will provide educators with a clearer picture of their instructional strengths and weaknesses and that this information will eventually
[Figure 5 residue: stacked bar chart (Agree, Somewhat Agree, Somewhat Disagree, Disagree). Administrators: "I have enough time to provide good evaluations of practice to my teachers"; "I have enough resources to provide good evaluations of practice to my teachers." Teachers: "I had enough time to receive a good evaluation of my practice"; "I had enough resources to receive a good evaluation of my practice"; "I had enough support from my administrator/supervisor to receive a good evaluation of my practice."]
empower teachers to improve their overall quality of instruction. The survey results reflect this optimism. Nearly all administrators and teachers at least somewhat agreed that the teacher practice evaluation process will improve teachers’ abilities to measure their effectiveness and that it will improve teacher practice (Figure 6). Furthermore, between 70% and 80% of teachers at least somewhat agreed that the evaluation of teacher practice improved their performance on each of the four Framework for Teaching domains (Figure 7).
Thus, even considering the large time and resource burden, these results seem to suggest that most participants believe that standardizing the teacher practice evaluation process is a worthwhile endeavor. However, there seems to be an understanding among many Wisconsin educators that it is going to take time and practice to implement it well.
“I think EE will improve my practice and teacher practice. I think it is going to take some time to create a system that is fair, responsive, and effectively improves student learning.”
With the 2013-2014 EE pilot, districts began to practice these systems. The results of the teacher practice pilot are presented in the next sections and establish baseline results as the implementation of the teacher practice evaluation process expands to all Wisconsin districts in 2014-15.
Figure 6. Perception of the impact of evaluating teacher practice on instruction
[Figure 6 residue: stacked bar chart of teacher and administrator agreement (agree, somewhat agree, somewhat disagree, disagree) with the statements "The evaluation of teacher practice improves the ability of teachers to measure their effectiveness" and "The evaluation of teacher practice will improve teacher practice."]
Figure 7. Teacher perception of the impact of the evaluation on their practice
"My evaluation of professional practice improved my ..."   agree  somewhat agree  somewhat disagree  disagree
Planning and Preparation                                  26%    52%             17%                5%
Classroom Environment                                     18%    57%             18%                7%
Instruction                                               26%    55%             14%                6%
Professional Responsibilities                             22%    53%             16%                9%
Who participated and to what degree in the pilot of teacher practice evaluations?
Originally, 192 principals and 402 teachers across 195 school districts volunteered and were trained to participate in the 2013-2014 pilot. DPI had asked these districts to recruit two higher performing teachers to participate in the pilot so that the pilot would run more smoothly and there would be less concern among participants that the results could reflect negatively on them.
Ultimately, 385 schools across 123 school districts participated, with ratings data only available for 135 of the 402 teachers who originally volunteered to pilot the process. However, within these districts, far more (449 evaluators and 2,595 teachers) piloted the teacher practice evaluation component of the EE system than were originally planned. Many districts decided to pilot aspects of the new DPI teacher practice evaluation process more widely than was originally planned. Seventy-nine of the pilot schools reported ratings for at least ten teachers, which represents 71% (1,839) of all the teachers engaged in the pilot, and 131 schools reported ratings for at least five teachers, representing 84% (2,173) of pilot teachers. Clearly, the teachers engaged in the pilot represent a wider distribution of teacher skill levels than was originally intended.
Of the 2,595 teachers involved in the pilot, Announced first observation results were recorded for 2,191 teachers and Unannounced for 1,466, but final ratings were only recorded for 507 teachers across 82 schools (47 elementary, 16 middle, 21 high, and 8 combined) and 43 districts. Announced observations were typically recorded in late February, Unannounced in late March, and final ratings in early June.[12]
[12] 304 and 385 teachers were evaluated in more than one Announced and Unannounced observation, respectively.
[Map: Locations of districts with final teacher ratings]
It is not clear why final rating data were not reported. It may be that participants did not do all of the steps necessary to complete the process. There is likely some truth to this since, as discussed earlier, most principals feel that they do not have enough time to provide good evaluations of practice to teachers.[13] It is also possible that many of the schools did not intend to complete the process and instead wanted simply to familiarize themselves with the process. Another possibility for the low pilot completion rate is that results were simply not entered into Teachscape, the on-line tracking platform. This is also a likely partial explanation, as many principals expressed frustration with the usability of Teachscape and 48% on the end-of-year survey reported being at least somewhat dissatisfied with the platform. Another possibility is that participants were aware that the Teachscape system was being upgraded to better align with the DPI EE process and did not want to take the time during the end of the pilot year when they knew the on-line platform was changing.
As an additional data-cleaning step, the number of components scored within each type of observation was analyzed (Appendix: Table 8). These counts suggest that during Announced and Unannounced observations, teachers were typically rated on between seven and ten components, while the great majority of final ratings included all 22 components. The ten components in the Classroom Environment and Instruction domains were typically assessed during observations, while effectiveness in the other domain components was assessed primarily through artifact analysis, discussions with the teacher, and some observation. There were ten teachers whose final ratings included five or fewer rated components. The ratings for these teachers were not included in subsequent analyses.
[13] Note: DPI intended districts to implement the pilot with smaller numbers of educators to specifically address capacity concerns and have a clear understanding of the actual time required. By increasing the size of the pilot locally, districts may have impacted their capacity.
How well did ratings assigned to teachers reflect expected relationships between the domains and components that comprise the Framework for Teaching?
Internal consistency
Focusing on final ratings, the internal consistency of the components within each domain suggests each domain measured separate-but-related constructs that together defined an overall construct of teacher practice (Table 9: Appendix). Perfect consistency would be represented by a Cronbach’s Alpha value of 1.0 while no consistency would be 0. Alpha levels should be at least 0.6 to suggest that the individual components are related within a domain, while levels of 1.0 would suggest that evaluators did not differentiate teacher performance on the separate items. These results suggest that ratings on the components that comprise all four instructional domains are consistent; teachers scoring higher on one are likely to score higher on the others. There was one item which, if removed, would result in more consistent results. Removing 2e (Organizing Physical Space) from the Classroom Environment Domain would result in a small increase in the internal consistency of that domain. The removal of two other items, 1d (Demonstrating Knowledge of Resources) and 4b (Maintaining Accurate Records), resulted in minimal changes to internal consistency.
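As a rough illustration of the statistic discussed above, Cronbach's Alpha for one domain can be computed from a teachers-by-components table of ratings. The sketch below uses the standard formula, alpha = k/(k-1) x (1 - sum of component variances / variance of teachers' total scores). The function name `cronbach_alpha` and the five-teacher, three-component ratings are hypothetical and are not pilot data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's Alpha for a list of rows (one row per teacher,
    one column per component), using sample variances."""
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least two components and two teachers")
    columns = list(zip(*ratings))  # one tuple of ratings per component
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings on a 1-4 scale (illustration only, not pilot data).
example_ratings = [
    [3, 3, 2],
    [3, 4, 3],
    [2, 2, 2],
    [4, 4, 3],
    [3, 3, 3],
]

print(round(cronbach_alpha(example_ratings), 2))
```

If every component received the same rating for each teacher, the function returns 1.0, which corresponds to the case described above in which evaluators did not differentiate performance across items.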
Correlations
Component relationships were further explored through correlational analysis (Table 10: Appendix). As was found with internal consistencies, components 1d, 2e, and 4b were the least correlated with the other components in their domains. Correlations between the four domain factors were also analyzed (Table 1). These results suggest that all four domains were closely related, with correlations above 0.5.
Table 1. Teacher final ratings: Domain correlations
                                   1      2      3      4
Domain 1: Planning and Prep        1
Domain 2: Classroom Environment    0.560  1
Domain 3: Instruction              0.669  0.656  1
Domain 4: Prof Responsibility      0.704  0.503  0.588  1
Confirmatory factor analysis (CFA)
CFA was used to determine how consistent the overall structure of the Framework for Teaching theoretical model is with the actual practice ratings results. SPSS AMOS 22 was used to build the ratings structural model (Figure 13: Appendix). The covariances between domains were constrained according to the bivariate correlations between domains. Measures of model fit are used to determine the relative consistency of the measured practice ratings with the Framework for Teaching. The RMSEA of .057 is not ideal but still within the acceptable levels. However, the comparative fit index (CFI) of .89 is below even a liberal threshold for model fit. An analysis of the component loadings provides some indications of why practice ratings are not entirely consistent with the Framework for Teaching (Table 11: Appendix). Consistent with the internal consistency and correlation analyses, components 1d, 2e, and 4b do not load onto the latent constructs as well as the other components in each domain. Interestingly, removing these three components from the model increased CFI to .91, which is within an acceptable range for model fit.
Taken together, these results suggest that the Framework for Teaching’s theoretical grounding may not be entirely consistent with the empirical results. Specifically, ratings on components 1d, 2e, and 4b do not relate well to the other components in their domains. Although these results might suggest that these three items should be removed from practice ratings, it is important to note that these results are based on a relatively small
number of teachers. Stronger conclusions about the structure of final ratings data will be possible at the end of the 2014-2015 school year.
How were teachers rated overall?
The results of Announced and Unannounced observations and final ratings suggest that teachers were typically rated as Proficient overall (Table 2). It is worth noting that final ratings were somewhat higher on average than observation ratings. However, these differences were smaller when comparing ratings only for teachers with both final ratings and observational ratings.
Table 2. Descriptive statistics of announced, unannounced and final ratings
                                N     Min   Max   Mean  Std. Deviation
All recorded ratings
Final ratings                   497   2.05  3.91  3.13  0.25
1st Announced Observations      2186  1     4     3.03  0.37
1st Unannounced Observations    1460  1     4     2.99  0.38
Ratings for teachers with all three ratings
Final Ratings                   308   2.05  3.91  3.11  0.25
1st Announced Observations      308   2     3.9   3.04  0.26
1st Unannounced Observations    308   1.89  3.9   3.07  0.27
The distribution of final ratings suggests that the great majority of teachers were rated as at least Proficient.[14] It is not clear exactly what rating will be considered as Basic or Proficient in the Wisconsin system, but if the standard for Proficient is that the teacher must average greater than 2.5 on all component scores, then 98.4% of teachers would have met this standard. If the standard is increased to 2.75 (allowing for only five of 22 standards to be rated as Basic), then 94.3% would have been rated as Proficient (Figure 14).
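The link between an average-score cutoff and the number of components that may be rated Basic can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that every component not rated Basic (2) is rated exactly Proficient (3) and that the cutoff must be strictly exceeded; the function names are hypothetical.

```python
def average_rating(n_basic, n_components=22, basic=2, proficient=3):
    """Average score when n_basic components are Basic and the rest Proficient."""
    n_proficient = n_components - n_basic
    return (n_basic * basic + n_proficient * proficient) / n_components


def max_basic_allowed(cutoff, n_components=22):
    """Largest number of Basic ratings that still averages above the cutoff."""
    allowed = 0
    for n_basic in range(n_components + 1):
        if average_rating(n_basic, n_components) > cutoff:
            allowed = n_basic
    return allowed


for cutoff in (2.5, 2.75):
    print(cutoff, max_basic_allowed(cutoff))
```

Under these assumptions, five Basic ratings give an average of 61/22 (about 2.77), which clears 2.75, while six give 60/22 (about 2.73), which does not. Any Distinguished ratings would allow more Basic ratings than this simple bound.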
[14] Evaluators were told that if teachers received low ratings (e.g., level 1 Unsatisfactory) they should be taken out of the pilot and placed on the district’s typical intervention plan.
A deeper look into the distribution of ratings shows more clearly the skewed nature of results and suggests a possible partial explanation. Figure 8 presents the cumulative percentage of teachers across average overall ratings. Only 84 teachers (16.6%) received final ratings less than an average score of 3 (Proficient), while 95 (18.7%) were rated as exactly 3. Although not as positively skewed as final ratings, the results of observations follow a similar pattern.
This pattern is somewhat puzzling and suggests that there may be a cultural barrier resulting in evaluators adjusting their component ratings so that an individual teacher’s average rating is not below Proficient (3).
Figure 8. Cumulative percentage of average teacher ratings
[Figure 8 residue: line chart. Horizontal axis: average rating, 1 to 4. Vertical axis: cumulative percentage of teachers, 0% to 100%. Series: Announced, Unannounced, Final Ratings.]
How were teachers rated on components?
Frequencies of component ratings provide more detail about how teachers were rated on each component. Few teachers received final ratings of Basic (Figure 9). Further, 74% received no final ratings of Basic on any components. Although the percentage was still small, more teachers received Basic ratings on individual components based on observations (Figure 10). Still, only 34% of teachers received any Basic component ratings on Announced or Unannounced observations.
Within each domain, specific components were identified as strengths and weaknesses (Table 12). Within the Planning and Preparation domain, final ratings of Component 1f (Designing Student Assessments) were statistically lower than all the other components in that domain (p<.05), while teachers were rated higher on 1a (Demonstrating Knowledge of Content and Pedagogy) than all other Domain 1 Components except 1b (Demonstrating Knowledge of Students) and 1e (Designing Coherent Instruction) (p<.05). Within the Classroom Environment Domain, Component 2a (Creating an Environment of Respect and Rapport) was identified as an area of strength and was the highest rated component across all domains regardless of rating type. Within the Instruction Domain, Component 3b (Using Questioning and Discussion Techniques) was the lowest rated skill across all four domains and was statistically lower than the other Instruction Domain components (p<.05). Component 3a (Communicating with Students) was consistently rated as a relative strength. Finally, within the Professional Responsibilities Domain, both Components 4b (Maintaining Accurate Records) and 4c (Communicating with Families) were identified as growth areas, while Component 4f (Showing Professionalism) was rated as a strength.
Figure 9. Final ratings: Few teachers received a final rating of Basic on any components while none were rated as Unsatisfactory
Component  Basic  Proficient  Distinguished
1a         2%     77%         21%
1b         6%     74%         21%
1c         6%     81%         13%
1d         3%     86%         11%
1e         4%     77%         18%
1f         6%     88%         6%
2a         2%     57%         41%
2b         3%     77%         20%
2c         3%     81%         16%
2d         4%     70%         27%
2e         1%     89%         10%
3a         3%     75%         22%
3b         11%    82%         7%
3c         6%     81%         13%
3d         4%     87%         9%
3e         3%     82%         15%
4a         5%     71%         24%
4b         5%     89%         6%
4c         8%     86%         7%
4d         6%     67%         27%
4e         6%     77%         17%
4f         3%     62%         35%
Figure 10. Observation ratings: More teachers were rated as Basic than were in final ratings
What evidence is there that teachers are being rated differently in different contexts?
As mentioned, all Wisconsin evaluators volunteering to participate in the pilot received the same rigorous training designed to promote valid and reliable assessments, and therefore, improve the comparability of ratings across contexts. However, because of the large number of additional teachers and evaluators in the pilot, it is not clear who received what level of training. Therefore, there is an increased risk that the ratings presented here may not be based on consistent standards of evidence. Ideally, inter-rater reliability could specifically define the consistency of observations across contexts. This could be done by using independent evaluators to conduct observations in tandem with local evaluators. However, the Wisconsin EE system does not require districts to use this resource.
[Figure 10 values: percentage of teachers at each observation rating level, by component]
Component                                            Unsatisfactory  Basic  Proficient  Distinguished
2a: Creating an Environment of Respect and Rapport   1%              6%     63%         30%
2b: Establishing a Culture for Learning              1%              9%     76%         14%
2c: Managing Classroom Procedures                    1%              10%    73%         16%
2d: Managing Student Behavior                        1%              9%     65%         24%
2e: Organizing Physical Space                        0%              4%     85%         11%
3a: Communicating with Students                      0%              9%     77%         14%
3b: Using Questioning and Discussion Techniques      1%              21%    72%         6%
3c: Engaging Students in Learning                    1%              13%    77%         9%
3d: Using Assessment in Instruction                  1%              15%    78%         7%
3e: Demonstrating Flexibility and Responsiveness     1%              7%     82%         10%
Nonetheless, there are still some analyses possible that can explore the possibility that teachers were rated differently in different settings. Multi-level modeling[15] was used to explore school and district effects on Announced and Unannounced ratings.[16] To be included in these models, schools had to have at least five teachers with ratings data. This selection criterion resulted in 129 schools. These 129 schools provided ratings for 84% (2,173) of pilot teachers. However, as a result of this criterion, school effects on final ratings could not be modeled since no districts included more than one school meeting this threshold.
One school-level factor tested in the models was Free or Reduced (F/R) lunch participation. This was done in response to concerns expressed by pilot participants and a growing national debate about whether teacher ratings across different schools with different types of students are comparable.[17] With the idea that teachers may have a more difficult time demonstrating their skills in crowded classrooms, schools’ students per teacher ratios were also included as a factor in the models.[18]
Another factor tested was the percentage of teachers rated. This was done to determine if the schools involved in the analysis with lower inclusion rates selected the highest performing teachers for the pilot. The inclusion of this factor helps determine how representative the results presented here are of the districts and schools involved in the pilot.
The final factor included in the models was the district planned use of teacher ratings, gathered from surveys. Districts using ratings for high-stakes decisions put additional pressures on raters that may influence how stringently they rate teachers. End-of-year
[15] Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
[16] Final ratings were not modeled because too few schools completed them.
[17] http://www.brookings.edu/~/media/research/files/reports/2014/05/13%20teacher%20evaluation/evaluating%20teachers%20with%20classroom%20observations.pdf
[18] Certainly having specific classroom data would allow for a more direct test of these effects but school data can at least provide approximate classroom characteristics.
principal survey results were used to identify districts planning to use the results for high-stakes decisions like promotion or pay bonuses.
The multi-level model presented below summarizes how these factors were statistically tested.
Descriptive statistics results
Table 3 presents the descriptive statistics of school factors for the sample used in the models. Fifty-five schools reported they were going to use EE data to make high-stakes decisions while 53 reported they were not or did not know if they were. Schools ranged from a small free/reduced lunch participation rate (7.8%) to high (100%). Also, across the 130 schools about half (49%) of the teachers in the schools participated in the pilot.
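The following multi-level model equation was used to identify factors that predict teacher practice ratings:
Level 1: Teacher-level model:
$$\text{Practice ratings}_{ijk} = \pi_{0jk} + e_{ijk}$$
Level 2: School-level model:
$$\pi_{0jk} = \beta_{00k} + \beta_{01k}\,(\text{F/R lunch rate})_{jk} + \beta_{02k}\,(\text{Teacher/student ratio})_{jk} + \beta_{03k}\,(\text{Percent of teachers rated})_{jk} + r_{0jk}$$
$$\beta_{0pk} = \gamma_{0p0} \quad \text{for } p = 1 \text{ to } 3$$
Level 3: District-level model:
$$\beta_{00k} = \gamma_{000} + \gamma_{001}\,(\text{High stakes use of results})_{k} + u_{00k}$$
The multi-level model corresponds to the following mixed model:
$$\text{Practice ratings}_{ijk} = \gamma_{000} + \beta_{01k}\,(\text{F/R lunch rate})_{jk} + \beta_{02k}\,(\text{Teacher/student ratio})_{jk} + \beta_{03k}\,(\text{Percent of teachers rated})_{jk} + \gamma_{001}\,(\text{High-stakes use of results})_{k} + r_{0jk} + u_{00k} + e_{ijk}$$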
Table 3. Descriptive statistics of school characteristics and ratings
                                      N    Min   Max   Mean  Std. Deviation
Announced observation ratings         129  1.9   3.4   3.00  0.25
Unannounced observation ratings       110  2     3.5   2.95  0.28
Final ratings*                        28   2.8   3.5   3.14  0.15
Use of EE for high stakes decisions   108  0     1     0.51  0.50
F/R lunch rate                        128  7.8   100   42.2  27.5
Teacher to student ratio              126  8.1   41.7  16.5  4.6
Ratio of teachers to teachers rated   126  0.06  1     0.49  0.28
*Final ratings not modelled due to low N.
Bivariate correlations present the unadjusted relationships between school factors. These show that free/reduced lunch rates were strongly negatively correlated with both Announced and Unannounced ratings, i.e. teachers in lower income schools had lower ratings. Further, teachers in schools that reported they were using ratings to make high stakes decisions had somewhat higher ratings. Teachers in schools not using ratings for high stakes decisions received average ratings of 2.94 while those in schools using them for high stakes averaged 3.05 (F=4.7, p=.032). However, it is important to note that schools using ratings for high stakes decisions also had lower F/R lunch participation rates and fewer students per teacher. Thus, it is not entirely clear that using ratings for high-stakes decisions, on its own, would result in teachers receiving higher scores. The statistical modeling presented in the next section tests which of these factors uniquely predict teacher ratings.
Table 4. Correlations of school characteristics and teacher ratings
Columns: (1) Announced ratings; (2) Unannounced ratings; (3) Final ratings; (4) Use of EE for high stakes decisions; (5) F/R lunch rate; (6) Student to teacher ratio; (7) Ratio of teachers rated to teachers; (8) Number of teachers rated
                                              (1)       (2)       (3)     (4)       (5)       (6)     (7)      (8)
Announced ratings (n=129)                     1
Unannounced ratings (n=110)                   0.544**   1
Final ratings (n=28)                          0.342     0.347     1
Use of EE for high stakes decisions (n=108)   0.227*    0.260*    0.058   1
F/R lunch rate (n=128)                        -0.578**  -0.321**  -0.07   -0.232*   1
Student to teacher ratio (n=126)              -0.242**  -0.220*   -0.195  -0.300**  0.261**   1
Ratio of teachers to teachers rated (n=126)   0.201*    0.135     0.111   -0.064    -0.256**  -0.021  1
Number of teachers rated (n=384)              0.13      0.09      0.08    0.013     -0.335**  0.008   0.583**  1
** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).
Model results
The statistical modeling used isolates the unique relationship of each school factor with ratings. Results from the null (unconditional) models suggest that roughly 12% of the variance in Announced and 8% of the variance in Unannounced ratings is attributable to district-level factors, while 16% of the variance in Announced and 15% of the variance in Unannounced ratings is attributable to school-level factors (Table 5).
Table 5. Covariance parameters of unconditional multi-level models predicting practice ratings
           Announced              Unannounced
           Estimate  Std. Error   Estimate  Std. Error
Residual   0.096091  0.003354     0.113213  0.00489
District   0.016488  0.00647      0.011593  0.005365
School     0.022071  0.004885     0.021789  0.006393
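The district and school shares quoted in the text can be approximately reproduced from the Table 5 estimates by dividing each variance component by the total variance. The sketch below assumes the reported percentages are simple variance proportions of this kind; `variance_shares` is an illustrative helper name, not part of the original analysis.

```python
def variance_shares(residual, district, school):
    """Share of total variance at each level of an unconditional model."""
    total = residual + district + school
    return {
        "residual": residual / total,
        "district": district / total,
        "school": school / total,
    }


# Variance estimates copied from Table 5.
announced = variance_shares(residual=0.096091, district=0.016488, school=0.022071)
unannounced = variance_shares(residual=0.113213, district=0.011593, school=0.021789)

for label, shares in (("Announced", announced), ("Unannounced", unannounced)):
    print(label, {level: f"{share:.1%}" for level, share in shares.items()})
```

The remaining share in each model is teacher-level (residual) variance within schools.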
The results of both conditional models were consistent; school teacher to student ratio, percent of teachers rated in a school, and the intended use of EE results did not uniquely explain teacher practice ratings and were therefore removed from the models. However, school free/reduced lunch participation was a strong predictor of practice ratings (Table 6). The inclusion of this factor explained 9% and 7% of the school effect on Announced and Unannounced ratings respectively and the majority of the district-level variance in both models (Announced pseudo R-squared = .60; Unannounced pseudo R-squared = .75). These results suggest that for every 10 percentage points greater school F/R lunch rate, the predicted Unannounced rating is reduced by .04 and Announced by .05. Given that a standard deviation for both types of ratings is between .35 and .40, the difference in the predicted ratings between a teacher in an affluent school with 10% F/R lunch and a low-income school with 90% F/R lunch participation is close to one full standard deviation.
Table 6. Results of models predicting ratings
                                 Estimate  Robust Std. Error  df      t       Sig.
School F/R Lunch (Unannounced)   -0.00394  0.000905           11.395  -4.36   0.001
School F/R Lunch (Announced)     -0.0048   0.000905           33.897  -5.298  <.0001
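The comparison between a school with 10% F/R lunch participation and one with 90% follows directly from the Table 6 coefficients, which are expressed per percentage point of F/R lunch. The sketch below works through that arithmetic; the standard deviations used to rescale the gap are the first-observation values from Table 2, and the function name is illustrative.

```python
# Coefficients per percentage point of school F/R lunch (Table 6).
COEFFICIENTS = {"Announced": -0.0048, "Unannounced": -0.00394}

# Standard deviations of first observations, all recorded ratings (Table 2).
STD_DEVIATIONS = {"Announced": 0.37, "Unannounced": 0.38}


def predicted_gap(coefficient, low_frl=10, high_frl=90):
    """Predicted rating difference between a high and a low F/R lunch school."""
    return coefficient * (high_frl - low_frl)


for rating_type, coefficient in COEFFICIENTS.items():
    gap = predicted_gap(coefficient)
    gap_in_sd = gap / STD_DEVIATIONS[rating_type]
    print(f"{rating_type}: gap = {gap:.3f} points, {gap_in_sd:.2f} SD")
```

The 80-point difference in F/R lunch rate corresponds to a predicted gap of about 0.38 rating points for Announced observations and about 0.32 for Unannounced observations.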
To further explore this effect, teacher ratings were plotted according to the F/R lunch participation rate in the school (Figure 11), thus demonstrating a clear connection between the free/reduced lunch participation of students in the school and the practice ratings assigned to teachers.
Figure 11. Plot of teacher ratings and school F/R lunch participation
The relationship of F/R lunch participation rates with teacher practice ratings may be the result of three factors. One possibility is that more effective teachers are selecting into districts with fewer low-income students. Thus, these results may indicate that more affluent schools simply have better teachers. A second possibility is that evaluators are rating teachers more harshly in lower SES schools. This may be related to the previously mentioned concern that it may be difficult for teachers to demonstrate their skills in larger classrooms with more lower-income students who are likely lower achieving and more prone to behavior problems as well. A third possibility is that teachers simply do not perform as well in lower-income schools according to the rubric.
To test these three possibilities, the relationships of F/R lunch with individual component ratings were modeled. If the relationship between F/R lunch and ratings is being driven by classroom characteristics, one would expect that components that rely on student behavior would be more sensitive to changes in F/R lunch rates. If the relationship is being driven more by teacher selection, then one would expect that all component ratings would be similarly affected. Ratings for each component were predicted using the same multi-level model as was used to test total practice ratings. The results of these models show that F/R lunch rates predicted all Domain 2 (Classroom Environment) and Domain 3 (Instruction) components. However, school F/R lunch participation only predicted five of 11 components in Domains 1 (Planning and Preparation) and 4 (Professional Responsibilities) (Table 13: Appendix).
To further explore the magnitude of the relationships of F/R lunch rates with Domains 2 and 3 practice ratings, additional models (generalized linear mixed models with multinomial distributions) were used to determine the likelihood of receiving different ratings according to the F/R lunch rate in the school. The results produce log odds ratios for being rated as Unsatisfactory, Basic, or Proficient as compared to Distinguished. Log odds ratios were translated into probabilities (Table 14: Appendix). For instance, there is a 3% probability that a teacher in a hypothetical school with no students eligible for F/R lunch will be rated as Basic on component 2a, as compared to being rated as Distinguished. However, a teacher in a school with 100% F/R lunch has a 57% probability. Figure 12 below presents the results for one component (2d: Managing Student Behavior) that is by its definition related to student behavior. These results show that as the F/R lunch rate in a school increases, the probability of being rated as Distinguished decreases. In a school with 50% F/R lunch eligible students, teachers are about three times as likely (74% versus 26%) to be rated as Proficient compared to Distinguished and almost twice as likely to be rated as Distinguished than Basic (65% versus 36%). However, increase the F/R lunch rate to 100% and teachers are over five times more likely to be rated as Proficient than Distinguished (84% versus 16%) and are almost twice as likely to be rated as Basic than
Distinguished (65% versus 35%). These results would seem to suggest that either the way evaluators are applying the Framework for Teaching rubric or the rubric itself may result in teachers being rated lower in lower SES schools.
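The translation from log odds to the paired probabilities quoted above can be sketched with the logistic function. This assumes each reported pair (for example, 84% versus 16%) is the probability of one rating relative to the Distinguished reference category only; the function names are illustrative, and the log odds value is back-calculated from the reported 84% rather than taken from the models.

```python
import math


def probability_from_log_odds(log_odds):
    """Probability of the comparison rating versus the reference rating."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def times_as_likely(probability):
    """Odds implied by a pairwise probability, e.g. 0.84 versus 0.16."""
    return probability / (1.0 - probability)


# Back-calculated illustration: an 84% versus 16% split implies these odds.
odds = times_as_likely(0.84)
log_odds = math.log(odds)

print(f"odds = {odds:.2f}, log odds = {log_odds:.2f}")
print(f"recovered probability = {probability_from_log_odds(log_odds):.2f}")
```

Read this way, 74% versus 26% corresponds to odds of roughly 2.8 to 1, and 84% versus 16% to odds of roughly 5.3 to 1.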
Figure 12. 2d (Managing Student Behavior) - Probabilities of being rated Unsatisfactory, Basic, and Proficient compared to Distinguished in schools with different F/R lunch rates.
Although these results would seem to suggest that practice ratings are more affected by F/R lunch in Domains 2 and 3, it is important to note that the magnitude of the effects on many of the components in Domains 1 and 4 is consistent with those of many of the components in Domains 2 and 3 (Table 11: Appendix). The relative lack of statistical significance for components in Domains 1 and 4 is related to the smaller sample of teachers with ratings on these components (Table 12: Appendix). Further, when the component coefficients are averaged, the coefficients in Domains 1 and 4 are not much smaller than they are in 2 and 3 (Table 7). These results would seem to suggest that most of the ratings difference is due to more skilled teachers selecting into schools with lower F/R lunch rates. However, it is important to note that 2a (Creating an Environment of Respect and Rapport) and 2d (Managing Student Behavior) have two of the three largest magnitude relationships
[Figure 12 residue: chart of the probability of being rated Unsatisfactory, Basic, or Proficient (each compared to Distinguished) at 0% and 100% F/R lunch. Vertical axis: 0% to 90%.]
among all components and, intuitively, they would seem to be more related to the characteristics of students within a classroom.
Table 7. Average multi-level model component coefficients
Summary and discussion
The results of the teacher practice evaluation pilot suggest that many of the conditions necessary for its full implementation to be successful are in place: educators reported that they understood both the evaluation process and the Framework for Teaching, that the process was fair, and that it would likely empower teachers to better understand their instructional skills and lead to improved teaching in Wisconsin. However, the largest concern expressed by educators was that the process is time consuming and that implementing it may leave little time for principals to fulfill all their other duties. The hope is that as principals become more familiar with the process it will become more routinized and will take less time to complete. Also, fixing the Teachscape platform will further reduce the time demands.
The results of the teacher practice evaluation pilot indicated that, although ratings data were recorded for only a minority of the 400 educators who had originally volunteered to pilot EE, ratings data were available for over 2,500 teachers across the state. However, final practice ratings were recorded for relatively few of these (503). The lack of completion may be due to a number of factors including difficulties with the Teachscape platform and principals not having enough time to complete the process.
The results of Announced observations, Unannounced observations and final ratings all suggest that teacher ratings were skewed positively. A national debate continues about
[Table 7 values]
Including all components                  Averaged Coefficients
Domain 1: Planning and Preparation        -0.0037
Domain 2: Classroom Environment           -0.0043
Domain 3: Instruction                     -0.0038
Domain 4: Professional Responsibilities   -0.0033
why,evenwiththenew,morerigorous,evaluationmethods,newteacherevaluation
systemsstillratethevastmajorityofteachersaseffective,withanywherefrom94%19to
98%20ofteachersreceivingoverallratingsofProficient/Effective.Thisfindinghasbeen
interpretedasevidencethatthegreatmajorityofteachersare,infact,Effectiveorthatthe
cultureofschoolsforcesevaluatorstocontinuetoratemostteachersas“good”,depending
onone’spoliticalleanings.21,22However,thefindinginthecurrentevaluationthata
disproportionatenumberofteachersreceivedanoverallratingsofexactly3.0wouldseem
tosuggestthatsomeamountoftheskewnessisduetoexogenousfactorsunrelatedto
teacherpracticequality.Further,althoughthereweresignificantdifferencesbetween
teacherpracticecomponentratings,suggestingthatevaluatorsdidatleastsomewhat
differentiateratings,individualcomponentfrequenciesindicatethatitwasrarefora
teachertoberatedbelowProficientonanycomponents.
The results also suggest that there were significant differences in practice ratings for
teachers in different districts and in different schools within districts. This may suggest
that evaluators are not being consistent in their ratings of teachers, or it may reflect real
differences in the proficiency of teachers. Teacher-to-student ratio, district planned use of
EE results, school F/R lunch participation rate, and the percent of the teachers evaluated in
a school were all tested as possible explanations for these differences. Although districts'
planned use of ratings for high-stakes purposes, the percentage of the school participating
in the pilot, and teacher-to-student ratio did predict practice ratings, only school F/R lunch
participation rates uniquely predicted ratings, explaining nearly all of the district-level
rating variance.
[19] http://www.gadoe.org/School-Improvement/Teacher-and-Leader-Effectiveness/Documents/Pilot%20Report_Overview%20and%20Report%20Combined%201-10-13.pdf
[20] http://www.michigan.gov/documents/mde/Educator_Effectiveness_Ratings_Policy_Brief_403184_7.pdf
[21] http://www.ajc.com/news/news/new-evaluation-pilot-skewed-with-too-few-unsatisfa/nTpKN/
[22] http://www.nyssba.org/news/2013/12/12/on-board-online-december-16-2013/why-are-most-teachers-rated-effective-when-most-students-test-below-standards/

In regard to the relationship with F/R lunch participation, there is a growing body of
research suggesting that teacher practice ratings are somewhat dependent on the types of
students in their classrooms [23, 24]. F/R lunch rates may be related to teacher ratings
through their relationship to student achievement and behavior. Lower-income students, on
average, are lower achieving than higher-income students and may exhibit more behavior
problems. It may be more challenging for teachers in these classrooms to elicit the
high-level student performance necessary to be rated as Distinguished. Thus, the
relationship of practice ratings with F/R lunch rates in the Wisconsin EE pilot may be
driven by their relationship with other student characteristics.
On the other hand, the relationship between F/R lunch rates and practice ratings may be
driven by teacher characteristics. It is important to note that most components in Domain 1:
Planning and Preparation and Domain 4: Professional Responsibilities were related to school
SES at a magnitude similar to that of the components in Domain 2: Classroom Environment
and Domain 3: Instruction. Because there is less evidence that student behavior can
influence ratings in Domains 1 and 4, this seems to suggest that some amount of the
relationship with F/R lunch rates may be due to teacher selection. In Wisconsin and across
the U.S. there are clear indications that districts with more lower-income families have a
much harder time recruiting effective teachers.
Finally, it is possible that teacher-to-student ratio and planned use of EE data partially
explain the relationship of F/R lunch rates with practice ratings. Less affluent schools have
larger class sizes and may be less likely to use ratings for high-stakes decisions. So
although neither factor was found to uniquely predict practice ratings, future evaluation
work will focus on these factors as potential mediators.
Taken together, the ratings differences between schools with different numbers of F/R
lunch students are likely due to a combination of student, teacher, and school factors.
Future evaluation work will continue to explore this issue. Until more can be learned about
this relationship, and fair comparisons of ratings can be established, it is not recommended
that schools and districts use practice ratings to compare teachers.

[23] Polikoff, M. (March, 2013). The Stability of Observational and Student Survey Measures of Teaching Effectiveness. Paper presented at the annual meeting of American Education Finance and Policy.
[24] Whitehurst, G., Chingos, M., & Lindquist, K. (March, 2014). Evaluating Teachers with Classroom Observations: Lessons Learned in Four Districts. Brown Center on Education Policy.
The results of the evaluation suggest that the pilot of teacher practice evaluations provided
educators with useful experience and practice that will help them implement the teacher
practice process fully in 2014-2015. However, several issues identified in this report
should continue to be explored as more data become available through the full
implementation of the process. It is not known to what degree the skewness of ratings and
their relationship with exogenous factors like F/R lunch participation will hold up when
more evaluators are certified and the analyses are based on final ratings rather than ratings
based more on single observations. Further, one limitation of this report was that it relied
on school-level information to predict ratings. In subsequent reports, obtaining data
specific to teachers, such as their experience, and their classrooms will provide a clearer
picture of what predicts teacher ratings and how to ensure the process is as fair as possible
for all Wisconsin teachers.
Appendix
Table 8. Frequencies of components rated by type of observation

Components scored   Announced   Unannounced   Final Ratings
1                   15          7             8
2                   17          2             0
3                   15          8             1
4                   39          12            0
5                   63          37            1
6                   66          61            0
7                   100         104           0
8                   322         347           0
9                   318         278           0
10                  698         445           0
11                  72          51            0
12                  78          49            0
13                  51          38            0
14                  65          40            0
15                  45          37            0
16                  49          41            1
17                  43          29            0
18                  32          21            1
19                  50          22            4
20                  54          21            8
21                  88          32            24
22                  215         69            459
Total               2495        1751          507
Table 9. Internal consistency of final teacher ratings

Domain / Component             Cronbach's Alpha   Cronbach's Alpha if Component Deleted
Planning and Preparation       0.75
  1.a                                             0.72
  1.b                                             0.72
  1.c                                             0.69
  1.d                                             0.73
  1.e                                             0.70
  1.f                                             0.71
Classroom Environment          0.67
  2.a                                             0.60
  2.b                                             0.63
  2.c                                             0.60
  2.d                                             0.59
  2.e                                             0.68
Instruction                    0.66
  3.a                                             0.60
  3.b                                             0.59
  3.c                                             0.58
  3.d                                             0.60
  3.e                                             0.64
Professional Responsibility    0.75
  4.a                                             0.72
  4.b                                             0.75
  4.c                                             0.74
  4.d                                             0.70
  4.e                                             0.69
  4.f                                             0.69
Total of 22 components         0.90
Table 10. Component correlations within Framework for Teaching domains

Planning and Preparation
       1.a     1.b     1.c     1.d     1.e     1.f
1.a    1
1.b    0.282   1
1.c    0.332   0.383   1
1.d    0.342   0.207   0.308   1
1.e    0.322   0.367   0.434   0.322   1
1.f    0.38    0.365   0.411   0.214   0.357   1

Classroom Environment
       2.a     2.b     2.c     2.d     2.e
2.a    1
2.b    0.352   1
2.c    0.313   0.316   1
2.d    0.443   0.262   0.365   1
2.e    0.106   0.196   0.297   0.247   1

Instruction
       3.a     3.b     3.c     3.d     3.e
3.a    1
3.b    0.305   1
3.c    0.313   0.344   1
3.d    0.285   0.31    0.321   1
3.e    0.228   0.215   0.246   0.227   1

Professional Responsibility
       4.a     4.b     4.c     4.d     4.e     4.f
4.a    1
4.b    0.225   1
4.c    0.253   0.294   1
4.d    0.327   0.222   0.278   1
4.e    0.323   0.309   0.321   0.528   1
4.f    0.484   0.252   0.277   0.468   0.452   1
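
As a rough cross-check between Table 9 and Table 10, the sketch below computes a standardized Cronbach's alpha for each domain from the mean of the within-domain correlations in Table 10. This is an assumption-laden approximation: Table 9 presumably reports the usual covariance-based alpha, and the standardized form used here is a substitute that needs only the correlations. The names `TABLE_10` and `standardized_alpha` are illustrative and do not come from the report.

```python
# Within-domain correlations transcribed from Table 10 (lower triangles only).
# Each entry maps a domain to (number of components, list of correlations).
TABLE_10 = {
    "Planning and Preparation": (6, [
        0.282,
        0.332, 0.383,
        0.342, 0.207, 0.308,
        0.322, 0.367, 0.434, 0.322,
        0.380, 0.365, 0.411, 0.214, 0.357,
    ]),
    "Classroom Environment": (5, [
        0.352,
        0.313, 0.316,
        0.443, 0.262, 0.365,
        0.106, 0.196, 0.297, 0.247,
    ]),
    "Instruction": (5, [
        0.305,
        0.313, 0.344,
        0.285, 0.310, 0.321,
        0.228, 0.215, 0.246, 0.227,
    ]),
    "Professional Responsibility": (6, [
        0.225,
        0.253, 0.294,
        0.327, 0.222, 0.278,
        0.323, 0.309, 0.321, 0.528,
        0.484, 0.252, 0.277, 0.468, 0.452,
    ]),
}


def standardized_alpha(k, correlations):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    if len(correlations) != k * (k - 1) // 2:
        raise ValueError("expected one correlation per pair of components")
    r_bar = sum(correlations) / len(correlations)  # mean inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


alphas = {domain: standardized_alpha(k, rs) for domain, (k, rs) in TABLE_10.items()}

for domain, alpha in alphas.items():
    print(f"{domain}: {alpha:.3f}")
```

Under these assumptions the standardized values fall within rounding distance of the domain alphas in Table 9 (0.75, 0.67, 0.66, 0.75). That suggests the two tables are internally consistent, though it does not establish how the report's values were actually computed.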
Figure 13. Confirmatory factor analysis Framework for Teaching structural model with standardized coefficients
Table 11. CFA Final rating component results

Component   Squared Multiple   Standardized         Regression   S.E.    C.R.     P
            Correlation        Regression Weights   Weights
1a          0.387              0.622                0.315        0.023   13.696   ***
1b          0.337              0.58                 0.329        0.026   12.601   ***
1c          0.415              0.644                0.306        0.021   14.292   ***
1d          0.194              0.44                 0.185        0.02    9.219    ***
1e          0.381              0.617                0.316        0.023   13.566   ***
1f          0.379              0.615                0.243        0.018   13.508   ***
2a          0.321              0.567                0.366        0.032   11.469   ***
2b          0.422              0.65                 0.36         0.027   13.391   ***
2c          0.321              0.566                0.292        0.025   11.461   ***
2d          0.295              0.543                0.337        0.031   10.954   ***
2e          0.13               0.361                0.148        0.021   7.113    ***
3a          0.268              0.518                0.302        0.026   11.578   ***
3b          0.261              0.511                0.261        0.023   11.391   ***
3c          0.319              0.564                0.299        0.023   12.868   ***
3d          0.315              0.562                0.256        0.02    12.788   ***
3e          0.258              0.508                0.269        0.024   11.313   ***
4a          0.376              0.614                0.343        0.027   12.614   ***
4b          0.175              0.418                0.164        0.02    8.383    ***
4c          0.235              0.485                0.209        0.021   9.805    ***
4d          0.341              0.584                0.359        0.03    11.949   ***
4e          0.43               0.656                0.344        0.025   13.587   ***
4f          0.466              0.683                0.412        0.029   14.214   ***
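
When each component loads on a single factor, its squared multiple correlation equals the square of its standardized regression weight. The sketch below checks that relationship on a handful of rows transcribed from Table 11. The selection of rows and the names `ROWS` and `max_gap` are illustrative assumptions, and the tolerance only allows for the three-decimal rounding in the table.

```python
# (component, squared multiple correlation, standardized regression weight),
# transcribed from a subset of the rows in Table 11.
ROWS = [
    ("1a", 0.387, 0.622),
    ("1b", 0.337, 0.580),
    ("1d", 0.194, 0.440),
    ("2b", 0.422, 0.650),
    ("2e", 0.130, 0.361),
    ("4f", 0.466, 0.683),
]

# Largest absolute gap between the reported SMC and the squared loading.
max_gap = max(abs(smc - weight ** 2) for _, smc, weight in ROWS)

for component, smc, weight in ROWS:
    print(f"{component}: reported {smc:.3f}, squared loading {weight ** 2:.3f}")
```

Read this way, component 2e (squared multiple correlation 0.13) has the least variance explained by its domain factor among the rows shown, and 4f (0.466) the most.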
Figure 14. Distribution of average overall ratings
Table 12. Descriptive statistics of component ratings

Columns, left to right: Component; All Announced (Mean, Std. Deviation, N); All Unannounced (Mean, Std. Deviation, N); Final Ratings (Mean, Std. Deviation, N)
1.a 3.09 0.49 756 3.02 0.44 372 3.19 0.44 495
1.b 3.11 0.52 732 2.98 0.49 364 3.15 0.49 491
1.c 2.98 0.55 739 2.88 0.59 344 3.08 0.43 493
1.d 3.01 0.47 629 2.94 0.53 291 3.08 0.36 493
1.e 3.04 0.56 752 2.94 0.54 360 3.14 0.46 495
1.f 2.89 0.45 474 2.85 0.53 192 3 0.36 492
2.a 3.25 0.58 2311 3.19 0.57 1702 3.39 0.52 497
2.b 3.06 0.49 2271 2.99 0.52 1653 3.17 0.45 496
2.c 3.05 0.54 2328 3.02 0.55 1686 3.13 0.42 497
2.d 3.15 0.60 2305 3.1 0.62 1702 3.23 0.50 497
2.e 3.09 0.41 1843 3.02 0.37 1247 3.08 0.32 494
3.a 3.08 0.48 2383 3.01 0.49 1700 3.19 0.46 497
3.b 2.86 0.55 2195 2.8 0.53 1508 2.96 0.42 494
3.c 2.97 0.50 2376 2.9 0.51 1693 3.07 0.43 497
3.d 2.92 0.49 2051 2.89 0.49 1408 3.05 0.36 495
3.e 3.02 0.45 1465 3 0.43 905 3.12 0.42 492
4.a 3.17 0.55 511 3.09 0.60 185 3.19 0.50 491
4.b 3 0.45 446 3.02 0.52 178 3 0.34 495
4.c 3.02 0.46 477 3.01 0.52 165 2.99 0.38 493
4.d 3.24 0.58 585 3.17 0.60 258 3.22 0.53 496
4.e 3.17 0.54 520 3.15 0.53 212 3.11 0.46 493
4.f 3.31 0.58 605 3.31 0.61 278 3.32 0.52 489
Table 13. Results of multi-level models predicting Announced observation component ratings

Component   Estimate    Std. Error   df       t        Sig.
1a          -0.00303    0.001716     8.257    -1.767   0.114
1b          -0.00462    0.001618     45.849   -2.855   0.006
1c          -0.00338    0.001524     46.411   -2.214   0.032
1d          -0.00174    0.002144     19.889   -0.813   0.426
1e          -0.00527    0.001776     5.519    -2.968   0.028
1f          -0.00426    0.002181     6.001    -1.954   0.098
2a          -0.00519    0.001461     53.131   -3.551   0.001
2b          -0.00357    0.000959     21.848   -3.722   0.001
2c          -0.00479    0.001255     57.321   -3.819   0
2d          -0.00522    0.001425     35.709   -3.662   0.001
2e          -0.0027     0.001133     33.006   -2.385   0.023
3a          -0.0031     0.001051     34.969   -2.944   0.006
3b          -0.0041     0.001037     13.734   -3.949   0.002
3c          -0.00379    0.001141     44.642   -3.316   0.002
3d          -0.00502    0.000837     16.826   -5.994   0
3e          -0.00308    0.001372     69.718   -2.243   0.028
4a          -0.00483    0.002216     12.196   -2.178   0.05
4b          0.001082    0.001883     30.816   0.574    0.57
4c          -0.00331    0.002126     44.614   -1.556   0.127
4d          -0.005      0.002222     11.832   -2.252   0.044
4e          -0.00418    0.00279      22.02    -1.498   0.148
4f          -0.00339    0.002635     16.337   -1.287   0.216
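
The averaged coefficients by domain that appear in the discussion (-0.0037, -0.0043, -0.0038, -0.0033) look like simple means of the component estimates in Table 13. The sketch below reproduces that arithmetic under this assumption. The report does not state how the averages were formed, and the names `ESTIMATES` and `domain_means` are illustrative.

```python
from collections import defaultdict

# Component estimates transcribed from Table 13.
ESTIMATES = {
    "1a": -0.00303, "1b": -0.00462, "1c": -0.00338,
    "1d": -0.00174, "1e": -0.00527, "1f": -0.00426,
    "2a": -0.00519, "2b": -0.00357, "2c": -0.00479,
    "2d": -0.00522, "2e": -0.0027,
    "3a": -0.0031, "3b": -0.0041, "3c": -0.00379,
    "3d": -0.00502, "3e": -0.00308,
    "4a": -0.00483, "4b": 0.001082, "4c": -0.00331,
    "4d": -0.005, "4e": -0.00418, "4f": -0.00339,
}

# Group estimates by domain; the domain is the leading digit of the label.
by_domain = defaultdict(list)
for component, estimate in ESTIMATES.items():
    by_domain[int(component[0])].append(estimate)

# Unweighted mean of the component estimates within each domain.
domain_means = {d: sum(vals) / len(vals) for d, vals in sorted(by_domain.items())}

for domain, mean in domain_means.items():
    print(f"Domain {domain}: {mean:.4f}")
```

The unweighted means agree with the reported domain averages to four decimal places, which is consistent with (but does not prove) the averages having been computed this way.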
Table 14. Results of multinomial generalized linear mixed models predicting ratings results: Change in probabilities of ratings as a function of school F/R lunch rate (ratings are compared to Distinguished)

                             Intercept (0% F/R lunch)        Adjusted (100% F/R lunch)
Component   Comparison       log odds ratio   Probability    log odds ratio   Probability
2a          Unsatisfactory   -6.432           0%             -2.032           12%
            Basic            -3.498           3%             0.302            57%
            Proficient       0.157            54%            1.557            83%
2b          Unsatisfactory   -6.355           0%             -0.755           32%
            Basic            -1.898           13%            0.902            71%
            Proficient       1.194            77%            2.694            94%
2c          Unsatisfactory   -6.528           0%             0.572            64%
            Basic            -2.075           11%            1.325            79%
            Proficient       1.118            75%            2.418            92%
2d          Unsatisfactory   -7.292           0%             -0.592           36%
            Basic            -2.581           7%             0.619            65%
            Proficient       0.59             64%            1.69             84%
2e          Unsatisfactory   -7.361           0%             -0.361           41%
            Basic            -2.766           6%             0.334            58%
            Proficient       1.564            83%            2.464            92%
3a          Unsatisfactory   -5.324           0%             -2.324           9%
            Basic            -1.98            12%            0.62             65%
            Proficient       1.169            76%            2.169            90%
3b          Unsatisfactory   -4.241           1%             0.359            59%
            Basic            0.182            55%            1.982            88%
            Proficient       2.158            90%            2.358            91%
3c          Unsatisfactory   -5.581           0%             -0.681           34%
            Basic            -1.107           25%            1.493            82%
            Proficient       1.748            85%            2.348            91%
3d          Unsatisfactory   -5.831           0%             0.969            72%
            Basic            -0.627           35%            2.373            91%
            Proficient       2.097            89%            3.197            96%
3e          Unsatisfactory   -5.814           0%             0.386            60%
            Basic            -2.022           12%            1.378            80%
            Proficient       1.79             86%            2.69             94%
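
The Probability columns in Table 14 appear to be the inverse logit of the adjacent log odds, that is, the probability of the named rating in a two-way comparison against Distinguished rather than a share of all four rating categories. The sketch below checks that reading on a few transcribed rows. The interpretation is an assumption, and `inverse_logit` and `ROWS` are illustrative names.

```python
import math


def inverse_logit(log_odds):
    """Convert log odds to a probability: 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# (component, comparison, log odds at 0% F/R, reported %, log odds at 100% F/R, reported %)
# transcribed from a subset of the rows in Table 14.
ROWS = [
    ("2a", "Unsatisfactory", -6.432, 0, -2.032, 12),
    ("2a", "Proficient", 0.157, 54, 1.557, 83),
    ("2b", "Basic", -1.898, 13, 0.902, 71),
    ("3d", "Proficient", 2.097, 89, 3.197, 96),
]

# Recompute each probability as a whole-number percentage.
computed = [
    (round(100 * inverse_logit(lo0)), round(100 * inverse_logit(lo100)))
    for _, _, lo0, _, lo100, _ in ROWS
]

for (component, comparison, _, p0, _, p100), (c0, c100) in zip(ROWS, computed):
    print(f"{component} {comparison}: reported {p0}%/{p100}%, computed {c0}%/{c100}%")
```

On this reading, the 2a Proficient row says that the probability of a Proficient rather than Distinguished rating rises from about 54% at 0% F/R lunch to about 83% at 100% F/R lunch.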