Validity and Reliability of
Classworks Universal Screeners
Updated May 2018
Research on Validity and Reliability of Classworks Universal Screeners • 2
Table of Contents
Executive Summary
Test Design
    Vertical Scale and Item Bank Calibration
    Score Reporting
    Establishing Cut Scores
Item Development
    Guiding Principles of Item Construction
Test Validation
    Field Testing and Analysis
National Center for Response to Intervention Review
    Reliability
    Validity
    Classification Analyses
Addendum: Classworks Universal Screeners Update
Executive Summary
Purpose
Classworks Universal Screeners are formal assessments used to measure readiness for grade level instruction, help identify baseline learning levels, and measure growth. The Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention and can be used as part of the Response to Intervention (RtI) process.
In addition to reporting an overall scaled score based on the total test, Classworks provides student strengths and weaknesses for key strands. Key strands include a minimum of four test questions to provide a reasonable estimate of student strengths and weaknesses. This information, when used in conjunction with other data such as high-stakes test results and classroom performance, can help provide a starting point for determining next steps.
Overview
Classworks Universal Screeners include multiple forms at each level for language arts and mathematics, grades K–10. The Universal Screeners are typically administered three times a year: at the beginning of the school year to assess readiness for instruction for all students, mid-year to measure progress for RtI Tiers II and III, and end-of-year to measure overall growth for the year. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach for students who may be struggling.
The Universal Screeners are between 20 and 35 items in length depending on the grade level targeted, and must be administered in a single sitting. Two parallel forms of each Screener were developed; these forms measure similar content. The kindergarten level assessments are an exception to this approach, with two different forms reflecting earlier and later kindergarten content given the rapid development at the kindergarten level.
Overall test results are reported as a scaled score. Scoring on a vertical scale provides a single point of reference to compare individual student gains from one test administration to the next, within and across school years. Measuring growth vertically serves a dual purpose: to track learning gains for individual students and to determine whether learning must be accelerated.
Classworks Universal Screeners have been evaluated by the National Center for Response to Intervention (NCRTI), and they received the highest reliability ranking.
Universal Screener Quick Guide
Item | Description
Purpose | Measure grade level readiness, help identify baseline, measure growth
Grades | K–10 Math, K–10 Reading
Levels of coverage per test | Test includes multiple grade levels of content to allow sufficient reach to help identify strugglers (exception: Kindergarten)
Audio | Audio support available for all grades
Length of test | Must be taken in one sitting; 20–35 items depending on grade level/subject
Vertical scale? | Yes. All scores are vertically scaled from K–10 for longitudinal tracking.
Output from test | Average readiness scaled score of students by class, teacher, custom group, demographic, and/or grade level
Test Design
SEG Measurement (SEG) has been instrumental in the design, development, testing, and analysis of Classworks Universal Screeners. SEG is an assessment, measurement, and research firm that provides assessment design, development, and implementation services for K–12, higher education, and credentialing programs. They have delivered over 100 million assessments to tens of thousands of schools and colleges in all 50 states.
Classworks Universal Screeners were designed and built for the particular purpose they serve. For this reason, they meet all of the criteria that define quality screeners: the assessments are brief, reliable, valid, equated, and measured on a vertical scale.
SEG initially created the assessments by hand-selecting items for each level and form of the tests. Forms were then equated through field testing and calibration so that each measures the same sets of skills at the same level of difficulty. Individual test items and the assessments themselves were designed with diversity in mind, including populations of culturally and linguistically diverse students and special needs students. Guiding principles for assessment design were integrated into the process, including ensuring all items are written in a clear, concise manner and free of age, gender, ethnic, religious, or disability bias.
There are two parallel forms for each test in grades K–10. For 2nd grade and above, the test questions include content from the target grade level as well as from two grade levels below the target. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach and enough content coverage for students who may be struggling. The tests include approximately 50% of the content from the target grade, approximately 25% of the content from the grade below, and approximately 25% of the content from two grades below. The 1st grade assessment contains content from both 1st grade and kindergarten. The kindergarten assessment contains content drawn only from kindergarten with two different forms reflecting earlier and later kindergarten content, given the rapid development at the kindergarten level.
Grade Level | Number of Test Questions Scored | Number of Test Forms | Source of Test Questions
K Early | 15 Reading/Language Arts; 15 Mathematics | 1 | 100% early K content
K Late | 15 Reading/Language Arts; 15 Mathematics | 1 | 50% later K content; 50% early K content
Grade 1 | 20 Reading/Language Arts; 20 Mathematics | 2 | 50% grade 1 content; 50% grade K content
Grade 2 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 2 content; 25% grade 1 content; 25% grade K content
Grade 3 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 3 content; 25% grade 2 content; 25% grade 1 content
Grade 4 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 4 content; 25% grade 3 content; 25% grade 2 content
Grade 5 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 5 content; 25% grade 4 content; 25% grade 3 content
Grade 6 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 6 content; 25% grade 5 content; 25% grade 4 content
Grade 7 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 7 content; 25% grade 6 content; 25% grade 5 content
Grade 8 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 8 content; 25% grade 7 content; 25% grade 6 content
Grade 9 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 9 content; 25% grade 8 content; 25% grade 7 content
Grade 10 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 10 content; 25% grade 9 content; 25% grade 8 content
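The approximate 50/25/25 content mix described above can be turned into item counts with a short calculation. The sketch below is illustrative only: the function name `content_mix`, the use of 0 to stand for kindergarten, and the rounding rule (extra items go to the higher grade level) are assumptions, since the report gives only approximate percentages.

```python
def content_mix(total_items, grade):
    """Return {grade_level: item_count} for one form; grade 0 stands for kindergarten.

    Assumed rounding: when a count does not divide evenly, the extra item
    goes to the higher grade level. The report only says "approximately".
    """
    if grade == 0:
        # Kindergarten forms draw only on kindergarten content.
        return {0: total_items}
    on_grade = (total_items + 1) // 2          # about 50% from the target grade
    remaining = total_items - on_grade
    if grade == 1:
        # Grade 1 forms split content between grade 1 and kindergarten.
        return {1: on_grade, 0: remaining}
    one_below = (remaining + 1) // 2           # about 25% from one grade below
    two_below = remaining - one_below          # about 25% from two grades below
    return {grade: on_grade, grade - 1: one_below, grade - 2: two_below}


if __name__ == "__main__":
    for grade, total in [(1, 20), (3, 25), (5, 30)]:
        print(grade, content_mix(total, grade))
```

Under these assumptions a 25-item grade 3 form would carry 13 grade 3 items and 6 items from each of grades 2 and 1; the operational forms may round differently.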
Vertical Scale and Item Bank Calibration
The vertical scale was developed through a linked testing design such that all items could be calibrated together and placed on the same continuum. The field test data were used to calibrate the items and tests. Calibration is a process that places all tests and all test items on a common scale. This was used to create a single common scale from grade K to grade 10. In this way, scores from the tests are comparable across forms of the test and over time. A given score will have the same meaning regardless of which form is administered and regardless of when the student takes the test.
The assessments developed include sets of overlapping items across test forms at the same level and across adjacent grade levels. This facilitates the calibration of the item bank. SEG calibrated the items using IRT (one-parameter Rasch model) to create a common vertical scale across grade levels.
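For readers unfamiliar with the one-parameter Rasch model named above, the following sketch shows its standard textbook form: the probability of a correct response is a logistic function of the difference between student ability and item difficulty, both expressed in logits. This is a generic illustration, not SEG's calibration software; the function and parameter names are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch (1PL) model.

    theta      -- student ability in logits
    difficulty -- item difficulty in logits, on the same scale as theta
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


if __name__ == "__main__":
    # A student whose ability equals the item difficulty has an even chance.
    print(rasch_probability(0.0, 0.0))
    # The same student facing an easier item is more likely to succeed.
    print(rasch_probability(0.0, -1.0))
```

Because every item shares this single functional form, items from different forms and grade levels can be placed on one difficulty continuum once they are linked through common items.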
The raw number of correct answers reflects a particular Rasch score (ranging from -4 to +4), which is then translated to the final scaled score for reporting purposes. When the student completes his/her screener, the scaled score and key strand level performance feedback are immediately available for reporting. The approach taken in the calibration and scoring process provides Rasch extrapolated norms.
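The two-step conversion described above (raw score to Rasch score, Rasch score to scaled score) might look like the sketch below. The report states only the Rasch range (-4 to +4) and the 200 to 800 reporting range; the bisection estimator, the linear conversion (center 500, 75 points per logit), and all function names are assumptions chosen so that -4 maps to 200 and +4 maps to 800. The operational conversion tables may differ.

```python
import math


def expected_raw_score(theta, difficulties):
    """Expected number correct for ability theta on items with the given difficulties."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def raw_to_theta(raw, difficulties, lo=-4.0, hi=4.0, tol=1e-6):
    """Find the ability whose expected raw score matches the observed raw score.

    Zero and perfect scores have no finite estimate, so they are pinned to the
    ends of the assumed -4..+4 range.
    """
    if raw <= 0:
        return lo
    if raw >= len(difficulties):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def theta_to_scaled(theta, center=500.0, slope=75.0):
    """Assumed linear conversion from logits to the 200-800 reporting scale."""
    return int(round(min(800.0, max(200.0, center + slope * theta))))


if __name__ == "__main__":
    item_difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical calibrated items
    for raw in range(len(item_difficulties) + 1):
        theta = raw_to_theta(raw, item_difficulties)
        print(raw, round(theta, 2), theta_to_scaled(theta))
```

Because the raw score determines the Rasch estimate for a given form, a lookup table of this kind can be computed once per form, which is consistent with scores being available immediately after testing.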
As a further measure to ensure that the test questions and assessments are technically sound and are performing as expected, SEG analyzes the data from the fall test takers each year.
Curriculum Advantage reviews the results from the fall to make sure the tests are performing well. SEG examines the statistics for the tests as a whole (e.g., average scores, distribution of scores) and the statistics for individual test items (e.g., question difficulty and the ability of the question to distinguish between different levels of student performance). Based on this analysis, Curriculum Advantage further refines the tests, revising and replacing questions as necessary.
During the 2014-2015 item analysis, Curriculum Advantage made the decision to update the Universal Screener. New items were created and field tested during the 2015-2016 school year and officially added to the assessment for the 2016-2017 school year. The overview, goals, and constraints of the Universal Screener update can be found in the Addendum.
Score Reporting
Score Reporting is designed to provide reliable information useful for understanding overall student readiness and estimated student strengths and weaknesses in specific strands measured by the test. Scores are based on scaled scores that allow all tests to be placed on a common scale regardless of which form is administered and at what grade level. Results are reported at the total test and key strand level. Strands assessed vary by grade level and subject of the assessment. This approach provides a reasonable balance between the need for information on student strengths and weaknesses and the need for sufficient score reliability.
Raw scores are calculated as the total number of items answered correctly on the screener. Performance on the assessments is reported as a scaled score on a vertical scale ranging from 200 to 800 spanning grades K–10 (see Vertical Scale and Item Bank Calibration above). Feedback is also provided at the key strand level.
These strands were determined based on an analysis of over 31 state standards and then re-examined with the introduction of the Common Core State Standards. Crosswalks are available to show the relationship between the Classworks strands and these state standards.
Reading:
• Grammar/Usage/Mechanics
• Reading Comprehension
• Study Skills
• Word Analysis
• Writing
• Writing Process
Math:
• Algebra
• Geometry
• Mathematical Processes
• Measurement
• Numeration
• Operations
• Patterns
• Statistics and Probability
Strands that are reported are required to include a minimum of four test questions to provide a reliable estimate of student strengths and weaknesses.
Curriculum Advantage establishes score ranges that reflect levels of student readiness on the assessments. There are various approaches that can be used to identify appropriate cut points defining levels of readiness. The method SEG recommended for creating appropriate cut points is detailed below.
Establishing Cut Scores
The cut scores for Classworks Universal Screeners for grades 3–8 were established using a two-stage standard setting process. In the first stage, a Book Marking Procedure (Cizek and Bunch, 2007) was applied. This was followed by a second stage, in which the stage one potential cut scores were reviewed in light of student performance data and expectations for student performance.
The Book Marking Procedure is an item mapping approach to standard setting developed in the 1990s (Cizek and Bunch, 2007). The Book Marking Procedure as employed for Classworks involves the review of an ordered test booklet containing all the items for a given test arranged in order of difficulty from easiest to hardest (Mitzel, Lewis, Patz, and Green, 2001). The difficulty values for this procedure were obtained from the Rasch item calibrations obtained from the original development of the screeners. Based on the procedures suggested by Mitzel et al. (2001), content experts reviewed the ordered item booklet and were asked to identify (“bookmark”) the first item that the minimally proficient student would be unlikely to answer correctly (less than 50% probability). The difficulty of the item identified served as the potential cut score emerging from stage one of the standard setting.
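The bookmark placement described above can be illustrated with a small sketch. In practice the bookmark is a judgment made by content experts; the code below stands in for that judgment by assuming a numeric ability for the minimally proficient student and applying the Rasch model with the 50% criterion from the text. The function name, the assumed ability value, and the sample difficulties are all hypothetical.

```python
import math


def bookmark_cut(difficulties, minimally_proficient_theta, criterion=0.5):
    """Return (booklet_position, item_difficulty) of the bookmarked item.

    Items are ordered from easiest to hardest; the bookmark is the first item
    the minimally proficient student would answer correctly with probability
    below `criterion`. Returns None if no item meets that condition.
    """
    ordered = sorted(difficulties)  # the "ordered item booklet"
    for position, b in enumerate(ordered, start=1):
        p_correct = 1.0 / (1.0 + math.exp(-(minimally_proficient_theta - b)))
        if p_correct < criterion:
            return position, b
    return None


if __name__ == "__main__":
    rasch_difficulties = [0.8, -1.2, 0.1, -0.3, 1.5]  # hypothetical calibrations
    print(bookmark_cut(rasch_difficulties, minimally_proficient_theta=0.0))
```

With a 50% criterion under the Rasch model, the bookmarked item is simply the first item whose difficulty exceeds the assumed ability, which is why the item's difficulty can serve directly as the stage one potential cut score.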
In the second stage, the potential cut scores produced in stage one of the process were reviewed against the distribution of scores from operational testing to evaluate the number and percentage of students that would “pass” and the number and percentage of students that would “fail” the assessment based on the stage one potential cut scores. In some cases, the stage one potential cut score was raised or lowered based on the impact rates or expected performance for the students.
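The stage two impact review amounts to counting how many students fall on each side of a candidate cut score. A minimal sketch follows, assuming that a score at or above the cut counts as passing (the report does not state how ties are handled) and using invented scaled scores.

```python
def impact(scaled_scores, cut_score):
    """Count and percentage of students who would pass or fail at a candidate cut.

    Assumption: a score equal to the cut counts as passing.
    """
    total = len(scaled_scores)
    passing = sum(1 for score in scaled_scores if score >= cut_score)
    failing = total - passing
    return {
        "pass_n": passing,
        "pass_pct": 100.0 * passing / total,
        "fail_n": failing,
        "fail_pct": 100.0 * failing / total,
    }


if __name__ == "__main__":
    operational_scores = [410, 455, 480, 500, 515, 530, 560, 610]  # invented data
    for candidate in (480, 500, 520):
        print(candidate, impact(operational_scores, candidate))
```

Comparing the pass rates at neighboring candidate cuts shows how sensitive the classification is to raising or lowering the stage one value.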
Item Development
The Classworks assessment item bank was developed by a team of content experts from a third-party developer, a leader in the creation of high-stakes content for assessments produced by states and testing companies. The test items have been reviewed and refined through a multi-step process involving members of this test development team.
The Universal Screeners are composed of 100% four-response-option multiple-choice type questions. The items were specifically developed for the Universal Screener or were selected and modified from the existing Curriculum Advantage item bank.
Guiding Principles of Item Construction
In order to ensure item reliability and validity, guiding principles were used in the item construction process.
Item Construction:
• Items are written in clear, concise language at the appropriate grade level
• Items are written without age, gender, ethnic, religious, or disability bias
• Each item set measures both basic knowledge and higher-order thinking skills
• Items adhere to the objectives being assessed
• Items are constructed in a consistent manner
• Item content is current and relevant to audience
• Items are written in the form of questions, avoiding open ended or negative stems
Item Response Measurement:
• Items show consistency of student response
• Results can be generalized to the population
• Items are calibrated to ensure that scores have similar meaning over time
• After calibration, items are placed on a developmental/vertical scale to allow for the accurate comparison of students over time and across use of the items
• Student performance can be predicted from item response
• Target goals and norms can be developed from item response measures
Questions/Stems:
• Stems and reading passages will be at grade-level readability and must assess the skill being tested according to the level of Bloom’s indicated
• Stems are free of age, gender, ethnic, religious, or disability stereotypes or bias
• Stems are written in question format and do not require sentence completion, true/false, or fill-in-the-blank responses
• Each stem has only one correct answer
Answers/Distractors:
• Answers are presented in a multiple-choice format with four answer options
• Distractors are written in a logical order (alphabetical, chronological)
• Distractors are approximately the same length and must be grammatically parallel
• Distractors are plausible and should not contain grammatical clues
• Distractors address a variety of common errors rather than the same error
• Distractor rationale is provided for each answer choice
The test items are multiple-choice questions, offering an efficient and reliable way to assess students’ knowledge and skills. All items have one single best answer and responses are scored as correct or incorrect. Multiple choice measures have advantages over other types of item response, in that they are capable of covering a large amount of content in a relatively short period of time. Moreover, they can achieve high levels of reliability, providing users with a consistent and stable measure of student knowledge and skills over time.
Test Validation
Following the creation of the tests, SEG conducted a second verification of the assessment items. The verification process consisted of a comprehensive alignment review to establish the validity of the assessment items and to determine if they were accurately aligned to the objectives they purport to measure.
Curriculum Advantage continues to partner with SEG to ensure that the tests themselves, as well as assessment-related decisions, are psychometrically sound. This ongoing process includes further statistical analysis, item calibration, adjustments to the cut scores on the vertical scale, and overall evaluation of the quality of Classworks Universal Screeners.
Field Testing and Analysis
To ensure that the test items and assessments are psychometrically sound, SEG analyzed the item and test performance data from the field tests conducted by Curriculum Advantage in the fall of 2009, the fall of 2010, and the fall of 2011. Curriculum Advantage collected information from approximately 200–300 students per test form the first year, with exponential increases in each of the following years. SEG analyzes the results each year, providing both test and item level analyses including:
• Overall test and subtest statistics
    o Mean
    o Standard Deviation
    o Reliability
    o SEM (Standard Error of Measurement)
    o Overall Model Fit
    o Frequency Distribution
• Item statistics
    o P Value (percent correct)
    o Point biserial correlation (measure of item discrimination)
    o Logit value from -3 to +3 (person and item independent measure of item difficulty)
    o Item Infit statistic
    o Item Outfit statistic
SEG reviews the item statistics, and any item that does not demonstrate suitable psychometric characteristics is recommended for replacement. These statistics help ensure ongoing relevance and validity.
Here are some of the statistics SEG calculates:
Total Test Statistics
• Average Score on the Assessment – SEG computes the average (mean) score achieved by students taking the assessment. This helps us determine if the assessment is properly targeted to the level of the students assessed.
• Variation and Distribution of Scores on the Assessment – SEG calculates the amount of variability (standard deviation) in the test scores achieved by students taking the assessment. This is another indicator of how well the test is targeted to the level of students assessed.
• Reliability – SEG computes the reliability of the test to ensure that the test is consistently measuring the knowledge and skills measured by the assessment across forms of the test and is stable over time.
• Score Accuracy – Any assessment score is subject to variation when a student takes the test multiple times. SEG estimates the amount of variation expected for a student score (Standard Error of Measurement; SEM); this is an indicator of score accuracy.
Individual Question Statistics
• Question Difficulty – SEG computes the percentage of students who answer the question correctly; this is an indicator of the difficulty of the question.
• Question Differentiation – SEG computes the relationship between student performance on each individual question and the assessment as a whole; this is an indicator of how well the question differentiates between those students who have the knowledge and skills measured by the assessment and those who do not have the knowledge and skills.
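The two individual question statistics above can be computed from a matrix of scored responses. The sketch below is a minimal illustration, not SEG's analysis code: the function name and sample data are invented, and the point-biserial is computed against the uncorrected total score (the item itself is included in the total), which is one common convention.

```python
from statistics import mean, pstdev


def item_statistics(responses):
    """Return a list of (p_value, point_biserial) tuples, one per item.

    responses -- list of per-student lists of 0/1 item scores
    point_biserial is None when the item or the total score has no variance.
    """
    totals = [sum(student) for student in responses]
    mean_total = mean(totals)
    sd_total = pstdev(totals)
    results = []
    for j in range(len(responses[0])):
        item = [student[j] for student in responses]
        p_value = mean(item)                      # proportion answering correctly
        sd_item = pstdev(item)
        if sd_item == 0 or sd_total == 0:
            results.append((p_value, None))
            continue
        # Pearson correlation between the 0/1 item score and the total score.
        covariance = mean([x * t for x, t in zip(item, totals)]) - p_value * mean_total
        results.append((p_value, covariance / (sd_item * sd_total)))
    return results


if __name__ == "__main__":
    scored = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # invented responses
    for number, (p, r_pb) in enumerate(item_statistics(scored), start=1):
        print(number, p, r_pb)
```

Items with very high or very low p values, or with low or negative point-biserial correlations, are the kind that would typically be flagged for review.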
National Center for Response to Intervention Review
The National Center for Response to Intervention used data collected during the 2009–2010 and 2010–2011 school years to further evaluate the quality of Classworks Universal Screeners. Following the implementation of the final Universal Screener forms, performance on the screeners and on high-stakes tests was used to investigate the validity and classification accuracy of the Universal Screeners.
Reliability
Test reliability refers to the test score consistency and accuracy. Reliability values range from 0 to 1.00, with higher values indicating higher reliability. Using the data collected from the multi-state field test, the average reliability for Universal Screeners for reading from grades K–10 was found to be 0.90. For mathematics in grades K–10, the average reliability coefficient is 0.88. These high internal consistency measures indicate that the Universal Screeners are able to provide a reliable measure of student performance in reading and mathematics.
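The report describes the reliability values as internal consistency measures but does not name the coefficient. As an illustration only, the sketch below uses KR-20, a common internal consistency coefficient for items scored correct/incorrect, together with the standard relationship between reliability and the standard error of measurement. The choice of KR-20, the function names, and the sample data are assumptions.

```python
import math
from statistics import pvariance


def kr20(responses):
    """KR-20 internal consistency for a matrix of 0/1 item scores.

    responses -- list of per-student lists of 0/1 item scores
    """
    n_students = len(responses)
    n_items = len(responses[0])
    total_variance = pvariance([sum(student) for student in responses])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(student[j] for student in responses) / n_students
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as the scores."""
    return score_sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    scored = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # invented responses
    print(kr20(scored))
    # With a hypothetical score standard deviation of 50 scaled-score points:
    print(standard_error_of_measurement(50.0, 0.90))
```

The second function shows why higher reliability matters for interpretation: at a fixed score spread, moving reliability closer to 1.00 shrinks the expected variation in an individual student's score.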
Validity
Test validity refers to the appropriateness of a test for its intended purpose. Evidence for validity of the tests is gathered from the item development and test development process as well as statistical analyses.
Classworks Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention. The items and tests have been field tested and evaluated using Item Response Theory to ensure that the items and tests are performing as expected. The rigorous processes followed for item and test development provide support for the content validity of the Universal Screeners.
Performance on the Universal Screeners has been compared to other high-stakes tests to ensure that performance on the Universal Screeners is consistent with performance on other assessments. During the 2010–2011 school year, Classworks Universal Screener data and high-stakes test data from over 11,300 students in a large southern state were collected to evaluate the correlation between the Universal Screener scores and the high-stakes test scores.
Rules of Thumb – Armstrong (2006), reiterating the recommendations of Smith (1984), suggests the following rules of thumb for validity data examining one measure of a construct in relation to another measure of that construct:
• Over .50 excellent
• .40 to .49 good
• .30 to .39 acceptable
• Less than .30 poor
On average, the correlation between the Classworks Universal Screener scores and the high-stakes test scores was 0.46 for mathematics and 0.63 for reading. Further, the screeners were found to agree with other measures in classifying students as “not at-risk” 93% of the time in mathematics, and 97% of the time in reading. These correlations between two tests measuring similar constructs support the construct validity of the interpretation of the Universal Screener scores.
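Applying the rules of thumb listed above to the reported correlations is a simple lookup. In the sketch below the function name is hypothetical, and the treatment of a coefficient of exactly .50 as "excellent" is an assumption, since the published bands leave that boundary unspecified.

```python
def validity_rating(correlation):
    """Rate a validity coefficient using the rules of thumb quoted above.

    Assumption: a value of exactly .50 is treated as "excellent".
    """
    if correlation >= 0.50:
        return "excellent"
    if correlation >= 0.40:
        return "good"
    if correlation >= 0.30:
        return "acceptable"
    return "poor"


if __name__ == "__main__":
    # Average correlations reported for the Universal Screeners.
    for subject, r in [("mathematics", 0.46), ("reading", 0.63)]:
        print(subject, r, validity_rating(r))
```

By these bands, the reported mathematics correlation falls in the "good" range and the reading correlation in the "excellent" range.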
Classification Analyses
In addition to the reliability and validity of the measures, the Universal Screeners were also evaluated with regard to the accuracy of classifying students as at-risk in comparison to an independent measure. It is important that the screeners are able to appropriately identify students who are at-risk and those who are not at-risk. In particular, it is critical that at-risk students are properly identified as being at-risk to get the instructional help that they need.
In order to evaluate the classification accuracy, Classworks Universal Screeners classifications were compared to the classifications determined by performance on high-stakes state assessments in reading and math. The comparisons provided a classification of students into one of four cells in a “confusion matrix.” Students could be classified as at-risk or not at-risk based on the passing status for each of the two assessments as Pass-Pass, Pass-Fail, Fail-Pass, or Fail-Fail. The classification analyses were performed by evaluating sensitivity and specificity.
Negative predictive power is a measure that estimates the accuracy of classifying students as “not at-risk.” A useful screening tool should have very high negative predictive power such that at-risk students are not misidentified as not being at-risk. Using test data for more than 11,300 students, the Universal Screeners were found to have 93% and 97% negative predictive power for math and reading, respectively.
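The classification measures named above follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions; the function name, the cell labels, and the counts in the example are invented for illustration and are not the study data. Here "at-risk" means failing the assessment in question, with the high-stakes state test treated as the reference.

```python
def classification_metrics(fail_fail, fail_pass, pass_fail, pass_pass):
    """Sensitivity, specificity, and predictive power from confusion matrix cells.

    Cell names are (screener result)_(state test result):
    fail_fail -- flagged at-risk by the screener and failed the state test
    fail_pass -- flagged at-risk by the screener but passed the state test
    pass_fail -- not flagged by the screener but failed the state test
    pass_pass -- not flagged by the screener and passed the state test
    """
    return {
        # Share of truly at-risk students that the screener flagged.
        "sensitivity": fail_fail / (fail_fail + pass_fail),
        # Share of not-at-risk students that the screener correctly cleared.
        "specificity": pass_pass / (pass_pass + fail_pass),
        # Share of students cleared by the screener who were truly not at-risk.
        "negative_predictive_power": pass_pass / (pass_pass + pass_fail),
        # Share of students flagged by the screener who were truly at-risk.
        "positive_predictive_power": fail_fail / (fail_fail + fail_pass),
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(classification_metrics(fail_fail=80, fail_pass=100, pass_fail=20, pass_pass=800))
```

Negative predictive power depends on how common at-risk students are in the sample as well as on the screener itself, so it is usually read alongside sensitivity and specificity rather than in isolation.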
Classworks Universal Screeners Update Technical Report
This document provides a summary of the tasks completed in updating the Classworks Universal Screeners for reading and math in grades K–High School.
August 2016
1©SEGMeasurement.
Contents
Overview
    Goals and Constraints
    Tasks
        Investigating strands and expectations of other assessments
        Finalizing plans for the updates to be made to the Universal Screeners
        Selecting items for replacement
        Developing new items
        Producing new items
        Creating field test forms and administering the field test
        Analyzing field test data
        Evaluating final test forms and scoring
Overview
This document provides supporting information regarding the updates to the Classworks Universal Screeners in Reading and Math for grades K–10 that will be in place officially for the 2016-2017 school year. This document provides the final plans that were executed between March 2015 and July 2016 and provides statistical information regarding the items and forms.
Goals and Constraints
The goals for this project were to modify the Universal Screeners to be more reflective of the latest multiple choice items and expectations of students in K-12 education, while at the same time keeping the Classworks Screeners consistent with the current forms. It was agreed that this project would include the development of 125 new Reading and 125 new Math items for use in the new Universal Screener forms. Further goals and guidelines are noted below.
• All items will be four-choice multiple choice with a single correct answer.
• There will be no audio or video passages associated with the items.
    o Grades K-2 forms will have text-to-speech support. Any new items written will get this applied so the entire field test form has this support.
    o There may be consideration for a passage to be a video or audio clip, if it can be fully owned and hosted (so as to avoid links expiring) and if the delivery can support it.
• The test lengths will remain consistent with the current test lengths of scoreable content (field test lengths will be longer).
• The majority of the items on the current screeners will remain on the new screeners.
• The new content should be as seamless as possible with the current content.
• As with the current forms, any strand with at least 4 items will be considered a key strand and will be linked to instructional content.
• All of the items must align to the current Classworks content hierarchy (subject/grade/strand/skill/objective – listed in the Appendix).
    o There will be no changes to, combinations of, or additions to the strands, skills, or objectives.
• Field test forms will include the entire current form plus new items for field testing. Scoring during the field test time period will continue to be based on the scored items on the current forms.
• Reporting on student performance on the final new Screeners will need to be comparable to historical performance on prior forms, whether by using the same scale or providing a translation of new to old scoring for comparison.
• The updates to the forms should be as seamless as possible.
Tasks
The following key tasks involved in updating the Universal Screeners are summarized in this report.
1. Investigating strands and expectations of other assessments
2. Finalizing plans for the updates to be made to the Universal Screeners
3. Selecting items for replacement
4. Developing new items
5. Producing new items
6. Creating field test forms and administering the field test
7. Analyzing field test data
8. Evaluating final test forms and scoring
Investigating strands and expectations of other assessments
As part of the initial planning stages, many assessments and standards were reviewed to gather information on the latest expectations of students in reading and math. This was to help meet the goal that the changes to the Universal Screeners would help to bring the forms more in line with expectations of other common assessments and standards.
The Common Core Reading/ELA strands are summarized in the following table.
Table 1: Common Core Reading/ELA Strands
Area | Strand | Grades marked Y (grade columns: K 1 2 3 4 5 6 7 8 9 10)
Reading | Literature - key ideas and details | Y Y Y Y Y Y Y Y Y Y Y
Reading | Literature - craft and structure | Y Y Y Y Y Y Y Y Y Y Y
Reading | Literature - integration of knowledge and ideas | Y Y Y Y Y Y Y Y Y Y Y
Reading | Literature - range of reading and level of text complexity | Y Y Y Y Y Y Y Y Y Y Y
Reading | Inf. Text - key ideas and details | Y Y Y Y Y Y Y Y Y Y Y
Reading | Inf. Text - craft and structure | Y Y Y Y Y Y Y Y Y Y Y
Reading | Inf. Text - integration of knowledge and ideas | Y Y Y Y Y Y Y Y Y Y Y
Reading | Inf. Text - range of reading and level of text complexity | Y Y Y Y Y Y Y Y Y Y Y
Reading | Foundational skills - print concepts | Y Y Y Y Y Y
Reading | Foundational skills - phonological awareness | Y Y Y Y Y Y
Reading | Foundational skills - phonics and word recognition | Y Y Y Y Y Y
Reading | Foundational skills - fluency | Y Y Y Y Y Y
Language | Conventions of Standard English | Y Y Y Y Y Y Y Y Y Y Y
Language | Knowledge of Language | Y Y Y Y Y Y Y Y Y
Language | Vocabulary Acquisition and Use | Y Y Y Y Y Y Y Y Y Y Y
Writing | Text types and purposes | Y Y Y Y Y Y Y Y Y Y Y
Writing | Production and Distribution of Writing | Y Y Y Y Y Y Y Y Y Y Y
Writing | Research to build and present knowledge | Y Y Y Y Y Y Y Y Y Y Y
Writing | Range of writing | Y Y Y Y Y Y Y Y
Speaking and Listening | Comprehension and Collaboration | Y Y Y Y Y Y Y Y Y Y Y
Speaking and Listening | Presentation of Knowledge and Ideas | Y Y Y Y Y Y Y Y Y Y Y
Literacy in History/Social Studies, Science, & Technical Subjects | Key ideas and details | Y Y Y Y Y
Literacy in History/Social Studies, Science, & Technical Subjects | Craft and structure | Y Y Y Y Y
Literacy in History/Social Studies, Science, & Technical Subjects | Integration of knowledge and ideas | Y Y Y Y Y
Literacy in History/Social Studies, Science, & Technical Subjects | Range of reading and level of text complexity | Y Y Y Y Y
The Georgia Milestone Assessment tests in grades 3–8 include the following high level skills for ELA:
• Reading and Vocabulary
• Writing and Language
The National Assessment of Educational Progress (NAEP) for Reading includes the following skills:
• Literary and Informational text
    o Locate and recall
    o Integrate and interpret
    o Critique and evaluate
    o Vocabulary
The Common Core Math strands are summarized in the following table.
Table 2: Common Core Mathematics Strands
Grade/Course | Strand | Grades and courses marked Y (columns: K, 1, 2, 3, 4, 5, 6, 7, 8, HS-Number & Quantity, HS-Algebra, HS-Functions, HS-Geometry, HS-Stats & Probability)
K-8 | Counting and Cardinality | Y
K-8 | Operations and Algebraic Thinking | Y Y Y Y Y Y
K-8 | Number and Operations in Base 10 | Y Y Y Y Y Y
K-8 | Number and Operations - Fractions | Y Y Y
K-8 | Measurement and Data | Y Y Y Y Y Y
K-8 | Geometry | Y Y Y Y Y Y Y Y Y
K-8 | Ratios and Proportions | Y Y
K-8 | The Number System | Y Y Y
K-8 | Expressions and Equations | Y Y Y
K-8 | Functions | Y
K-8 | Statistics and Probability | Y Y Y
HS Number and Quantity | The Real Number System | Y
HS Number and Quantity | Quantities | Y
HS Number and Quantity | Complex Number System | Y
HS Number and Quantity | Vector and Matrix Quantities | Y
HS Algebra | Seeing Structure in Expressions | Y
HS Algebra | Arithmetic with Polynomials and Rational Expressions | Y
HS Algebra | Creating Equations | Y
HS Algebra | Reasoning with Equations and Inequalities | Y
HS Functions | Interpreting Functions | Y
HS Functions | Building Functions | Y
HS Functions | Linear, Quadratic, and Exponential Models | Y
HS Functions | Trigonometric Functions | Y
HS Geometry | Congruence | Y
HS Geometry | Similarity, Right Triangles, and Trig | Y
HS Geometry | Circles | Y
HS Geometry | Expressing Geometric Properties with Equations | Y
HS Geometry | Geometric Measurement and Dimension | Y
HS Geometry | Modeling with Geometry | Y
HS Statistics and Probability | Interpreting Categorical and Quantitative Data | Y
HS Statistics and Probability | Making Inferences and Justifying Conclusions | Y
HS Statistics and Probability | Conditional Probability and Rules of Probability | Y
HS Statistics and Probability | Using Probability to Make Decisions | Y
The Georgia Milestone Assessment tests in grades 3–8 include the following strands for Math:
• Operations and Algebraic Thinking: Grades 3–5
• Number and Operations: Grade 3
• Number and Operations in Base 10: Grades 4–5
• Number and Operations: Fractions: Grades 4–5
• Measurement and Data: Grades 3–5
• Geometry: Grades 3–8
• The Number System: Grades 6–7
• Ratios and Proportions: Grades 6–7
• Statistics and Probability: Grades 6–8
• Numbers, Expressions, and Equations: Grade 8
• Expressions and Equations: Grades 6–7
• Algebra and Functions: Grade 8
The National Assessment of Educational Progress (NAEP) mathematics assessment covers the following strands:
• Algebra
• Number properties and operations
• Measurement
• Geometry
• Data analysis, statistics and probability
Finalizing plans for the updates to be made to the Universal Screeners
We made recommendations for changes to the strands (particulars on consolidating, renaming, adding, or expanding) and after internal review of the impact on the system and benefits of making the changes, Curriculum Advantage determined that the strands will remain consistent between the current Universal Screener forms and the new Universal Screener forms. Rather than changing the strands, the focus is on increasing the quality of the items included within the strands.
Table 3: Classworks Reading/ELA Strands and Current Universal Screener Coverage
Strand columns, in order: Grammar/Usage/Mechanics; Reading; Study Skills; Word Analysis; Writing; Writing Process; not covered - Listening/Speaking/Viewing
Grade | Item counts in the strand columns (nonblank cells, left to right) | Grand Total
K | 9 1 5 | 15
1 | 2 10 7 1 | 20
2 | 3 12 1 8 1 | 25
3 | 7 9 3 4 1 1 | 25
4 | 6 10 2 5 1 1 | 25
5 | 7 10 3 7 1 2 | 30
6 | 7 11 3 6 1 2 | 30
7 | 7 11 4 6 2 | 30
8 | 8 11 2 6 3 | 30
9 | 8 13 3 4 2 | 30
10 | 9 13 5 3 | 30
Table 4: Classworks Mathematics Strands and Current Universal Screener Coverage
Strand columns, in order: Algebra; Concepts of Calculus; Geometry; Mathematical Processes; Measurement; Numeration; Operations; Patterns; Statistics and Probability; Trigonometry
Grade | Item counts in the strand columns (nonblank cells, left to right) | Grand Total
K | 2 3 5 2 2 1 | 15
1 | 1 5 4 4 4 1 1 | 20
2 | 2 4 3 5 4 2 2 3 | 25
3 | 2 2 2 6 1 5 2 5 | 25
4 | 1 5 1 3 4 4 1 6 | 25
5 | 5 8 4 4 1 2 1 5 | 30
6 | 6 1 8 3 1 3 3 1 4 | 30
7 | 6 2 6 2 4 2 2 1 5 | 30
8 | 8 1 8 1 4 1 2 1 4 | 30
9 | 8 7 5 2 1 1 1 4 1 | 30
10 | 8 6 5 4 1 1 4 1 | 30
The following decisions were made in conjunction with Curriculum Advantage with regard to the updates to the Universal Screeners:
o Each form would have approximately 20% of the form replaced with new items that align to the current Classworks objectives. (The objectives for each grade and subject were gathered through the Classworks item bank and included in Appendix A.)
o Items will be considered for replacement with a new item based on the quality of the current item, the importance of the objective measured, and the ability of the item to measure on-grade readiness.
o Items that are replaced may be replaced with a new item measuring the same objective, a different objective within the same strand, or an objective in a different strand that is in need of more coverage.
o In both reading and math, all new items will be single best answer multiple choice items.
o Items may be associated with one or more passages or images. Some items may need to be administered together in sequence as a set (i.e., a group of items that are all associated with the same passage(s)).
o Items will all be independent (not relate to or build on each other), even if they relate to the same passage or stimulus.
o New items may be used on multiple forms across or within grades (to follow similar overlap of current forms), but duplicate usage will only count as one item out of the 125 that will be developed for each subject.
o The field test forms will contain the entire current scoreable forms plus additional non-scored items for field testing. This will allow for the field test forms to continue to serve as live operational forms during the 2015-2016 school year.
o The new forms will maintain the current grade level coverage of the forms.
o All of the new items will be field tested to gather data.
o A linked form design with shared items will be used so that the entire pool of new items within a subject can be calibrated with the current pool.
o After the field test, the items and planned final forms will be evaluated.
o Scoring and comparability to the current forms will be evaluated to determine whether changes are warranted.
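The counting rule for items shared across forms can be illustrated with a short sketch. The form labels and item IDs below are hypothetical placeholders, not actual Classworks items; the point is only that an item placed on more than one form counts once toward the development total.

```python
# Hypothetical assignment of new items to forms; the IDs are placeholders only.
new_items_by_form = {
    "Grade 3 Reading A": {"item_a", "item_b", "item_c"},
    "Grade 3 Reading B": {"item_c", "item_d"},  # item_c is shared with Form A
    "Grade 4 Reading A": {"item_d", "item_e"},  # item_d is shared across grades
}

# Placements count every appearance of an item on a form.
placements = sum(len(items) for items in new_items_by_form.values())

# Unique items count each item once, however many forms it appears on.
unique_items = set().union(*new_items_by_form.values())

print(placements)         # 7 placements across the three forms
print(len(unique_items))  # 5 distinct items count toward the development total
```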
Table 5 shows the planned number of items developed for each grade (roughly 20% of each form). The actual item development matched these plans. Tables 6 and 7 show the breakdown of grade level coverage for each form, which remains consistent from the current scoreable items to the new scoreable items (after field testing).
Table 5: Number of New Items Per Form
Grade   Reading                          Math
K       3                                3
1       3                                3
2       5                                5
3       7                                7
4       6                                6
5       7                                7
6       6                                6
7       7                                7
8       6 on one form, 7 on the other    6 on one form, 7 on the other
9       6                                6
10      6                                6
Total   125                              125
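The Total row of Table 5 can be checked with a short calculation. The sketch below assumes two forms (A and B) per grade, as listed in Tables 6 and 7; the name `new_items_per_form` is illustrative only.

```python
# New items per form (Form A, Form B) for one subject, transcribed from Table 5.
# Grade 8 has 6 new items on one form and 7 on the other.
new_items_per_form = {
    "K": (3, 3), "1": (3, 3), "2": (5, 5), "3": (7, 7), "4": (6, 6),
    "5": (7, 7), "6": (6, 6), "7": (7, 7), "8": (6, 7), "9": (6, 6),
    "10": (6, 6),
}

# Sum the new items over both forms of every grade for the subject.
subject_total = sum(a + b for a, b in new_items_per_form.values())

print(subject_total)      # 125, matching the Total row for each subject
print(2 * subject_total)  # 250 new items across reading and math
```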
Table 6: Item Grade Level Coverage on Reading Screeners
READING. Item grade level columns: K, 1, 2, 3, 4, 5, 6, 7, 8, HS; final column: N
Screener                     Form  Item counts by item grade level  N
Grade K Reading Screener     A     15                               15
                             B     15                               15
Grade 1 Reading Screener     A     8, 12                            20
                             B     8, 12                            20
Grade 2 Reading Screener     A     5, 7, 13                         25
                             B     5, 7, 13                         25
Grade 3 Reading Screener     A     7, 7, 11                         25
                             B     7, 7, 11                         25
Grade 4 Reading Screener     A     6, 7, 12                         25
                             B     6, 7, 12                         25
Grade 5 Reading Screener     A     7, 8, 15                         30
                             B     7, 8, 15                         30
Grade 6 Reading Screener     A     7, 8, 15                         30
                             B     7, 8, 15                         30
Grade 7 Reading Screener     A     7, 8, 15                         30
                             B     7, 8, 15                         30
Grade 8 Reading Screener     A     7, 8, 15                         30
                             B     7, 8, 15                         30
Grade 9 Reading Screener     A     7, 8, 15                         30
                             B     7, 10, 13                        30
Grade 10 Reading Screener    A     2, 8, 20                         30
                             B     2, 8, 20                         30
Table 7: Item Grade Level Coverage on Math Screeners
MATH. Item grade level columns: K, 1, 2, 3, 4, 5, 6, 7, 8, HS; final column: N
Screener                  Form  Item counts by item grade level  N
Grade K Math Screener     A     15                               15
                          B     15                               15
Grade 1 Math Screener     A     8, 12                            20
                          B     8, 12                            20
Grade 2 Math Screener     A     5, 7, 13                         25
                          B     5, 7, 13                         25
Grade 3 Math Screener     A     6, 7, 12                         25
                          B     6, 7, 12                         25
Grade 4 Math Screener     A     6, 7, 12                         25
                          B     6, 7, 12                         25
Grade 5 Math Screener     A     7, 8, 15                         30
                          B     7, 8, 15                         30
Grade 6 Math Screener     A     7, 8, 15                         30
                          B     7, 8, 15                         30
Grade 7 Math Screener     A     4, 8, 18                         30
                          B     4, 8, 18                         30
Grade 8 Math Screener     A     7, 8, 15                         30
                          B     7, 8, 15                         30
Grade 9 Math Screener     A     2, 13, 15                        30
                          B     2, 13, 15                        30
Grade 10 Math Screener    A     2, 8, 20                         30
                          B     2, 8, 20                         30
Selecting items for replacement
Each current form was exported from the Classworks system into a separate Word document in preparation for review and update. For each form, the plans for numbers of items to be replaced and the blueprint for the form were noted in the document. Each current form was reviewed by experts to identify the specific items that would provide the most value by being removed from the form and replaced with a new item. The items were reviewed along multiple facets of quality, including the general quality of the item, reflection of current expectations of the skill, importance and relevance of the item, and how well the item measures the objective within the skill/strand.
For each form, the items to be replaced were identified and item writing assignments were developed. In many cases, the new item would directly replace the current item with another item that better measured the objective within the strand. In some cases, it was determined that a different objective should be covered within the strand to better cover the focus of the particular strand.
Developing new items
After the items to be replaced and the item needs were identified, item development began. Test development experts in math and reading developed the new items to meet the item specifications. New items were written to maintain the current style of the Universal Screeners while also representing newer ways of measuring the objectives.
The draft items were reviewed and edited for style, grammar, content accuracy, appropriateness, and perceived psychometric quality. The final 250 new items were then prepared for online production into the Classworks system.
Appendix B contains the alignment information for each of the new items.
Producing new items
Once approved internally, we individually entered the items into the Classworks item bank database. Each item was coded with its subject, grade level, strand, and skill as per the requirements of the system. The system generated a unique assessment system ID number for each item. The correct answer was identified and artwork was uploaded. The items were reviewed for proper rendering on the platform.
After the items passed through the internal production review, items were released to Curriculum Advantage for review by their content experts. In addition to the items being available online, details about the items were sent externally to assist in review and tracking. After review by Curriculum Advantage, the items were finalized by SEG and approved in the system by Curriculum Advantage content experts.
Once the items were approved in our local item bank in Classworks, Curriculum Advantage programmers worked to port the items into the official Classworks item bank and activate the items for use on the field test forms. During this process, the item IDs were slightly modified to ensure the item IDs were unique within the Classworks item bank while also allowing for tracking with the original IDs when the items were created. All of the new items in the official bank are in the 15,000s. For example, item ID 8 in the local bank is now 15008 and item ID 168 is now 15168.
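The two ID examples above are consistent with adding a fixed offset of 15000 to the local ID. The sketch below assumes that rule; the report does not state the actual porting logic, and the function names and the range check are illustrative only.

```python
# Offset inferred from the two examples in the text (8 -> 15008, 168 -> 15168).
# This is an assumption, not the documented Curriculum Advantage procedure.
OFFICIAL_ID_OFFSET = 15000


def to_official_id(local_id: int) -> int:
    """Map a local item bank ID to its assumed ID in the official bank."""
    # Keeping local IDs below 1000 keeps the result within the 15,000s.
    if not 0 < local_id < 1000:
        raise ValueError(f"local ID out of assumed range: {local_id}")
    return OFFICIAL_ID_OFFSET + local_id


def to_local_id(official_id: int) -> int:
    """Recover the original local ID for tracking purposes."""
    return official_id - OFFICIAL_ID_OFFSET


print(to_official_id(8))    # 15008
print(to_official_id(168))  # 15168
print(to_local_id(15168))   # 168
```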
Creating field test forms and administering the field test
In order to allow for continued production use of the Universal Screeners while also field testing the new items, the field test forms were developed to include the entire set of scoreable items on the current form as well as additional items for field testing that did not count towards the student’s score. The field test items included items that would end up being on that official form as well as other linking items that would be dropped from the form. Items were placed strategically across forms so that all of the forms would be linked and so that students who were on grade, above grade, and below grade were exposed to the items. The following two tables summarize the plans for the field test forms. Appendix C contains the item level details on the field test forms.
Table 8: Field Test Plans for Reading/ELA
Form
Number of Scored Items on Current Form
Number of Non-Scored Items on Current Forms (these items will be dropped for new field test forms)
Current total number of items (scored and non-scored)
Number of Item Replacements (New Field Test Items that will eventually replace current scored items)
Number of linking items (additional non-scored linking items for field testing and calibration)
New Field Test Length (scored and non-scored items)
New Planned Screener Final Test Length (all scored items only, same number as current form scored item count)
Grade K Reading Screener A 15 5 20 3 3 21 15
B 15 5 20 3 3 21 15
Grade 1 Reading Screener A 20 5 25 3 3 26 20
B 20 5 25 4 2 26 20
Grade 2 Reading Screener A 25 5 30 5 3 33 25
B 25 5 30 5 3 33 25
Grade 3 Reading Screener A 25 5 30 7 3 35 25
B 25 5 30 7 3 35 25
Grade 4 Reading Screener A 25 5 30 6 4 35 25
B 25 5 30 6 4 35 25
Grade 5 Reading Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Grade 6 Reading Screener A 30 5 35 6 3 39 30
B 30 5 35 7 2 39 30
Grade 7 Reading Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Grade 8 Reading Screener A 30 5 35 6 3 39 30
B 30 5 35 7 2 39 30
Grade 9 Reading Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Grade 10 Reading Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Table 9: Field Test Plans for Math
Form
Number of Scored Items on Current Form
Number of Non-Scored Items on Current Forms (these items will be dropped for new field test forms)
Current total number of items (scored and non-scored)
Number of Item Replacements (New Field Test Items that will eventually replace current scored items)
Number of linking items (additional non-scored linking items for field testing and calibration)
New Field Test Length (scored and non-scored items)
New Planned Screener Final Test Length (all scored items only, same number as current form scored item count)
Grade K Math Screener A 15 5 20 3 3 21 15
B 15 5 20 3 3 21 15
Grade 1 Math Screener A 20 5 25 3 3 26 20
B 20 5 25 3 3 26 20
Grade 2 Math Screener A 25 5 30 5 3 33 25
B 25 5 30 5 3 33 25
Grade 3 Math Screener A 25 5 30 7 3 35 25
B 25 5 30 7 3 35 25
Grade 4 Math Screener A 25 5 30 6 4 35 25
B 25 5 30 6 4 35 25
Grade 5 Math Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Grade 6 Math Screener A 30 5 35 6 3 39 30
B 30 5 35 6 3 39 30
Grade 7 Math Screener A 30 5 35 7 2 39 30
B 30 5 35 7 2 39 30
Grade 8 Math Screener A 30 5 35 6 3 39 30
B 30 5 35 7 2 39 30
Grade 9 Math Screener A 30 5 35 6 3 39 30
B 30 5 35 7 2 39 30
Grade 10 Math Screener A 30 5 35 7 2 39 30
B 30 5 35 9 0* 39 30
*Grade 10 B form already has new field test items that are also on other forms/grades.
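The lengths in Tables 8 and 9 follow a simple relationship: the current non-scored items are dropped, and the field test form is the current scored items plus the replacement items plus the linking items. The sketch below checks that arithmetic for a few rows; the class and field names are illustrative, not part of the Classworks system.

```python
from typing import NamedTuple


class FieldTestPlan(NamedTuple):
    """One row of Table 8 or Table 9 (field names are illustrative)."""
    scored: int        # scored items on the current form
    replacements: int  # new field test items that will replace scored items
    linking: int       # additional non-scored linking items

    @property
    def field_test_length(self) -> int:
        # Current non-scored items are dropped, so they are not counted here.
        return self.scored + self.replacements + self.linking

    @property
    def final_length(self) -> int:
        # Replacements swap in one-for-one, so the final form keeps its length.
        return self.scored


grade_k_a = FieldTestPlan(scored=15, replacements=3, linking=3)
grade_4_a = FieldTestPlan(scored=25, replacements=6, linking=4)
grade_10_math_b = FieldTestPlan(scored=30, replacements=9, linking=0)

print(grade_k_a.field_test_length)        # 21
print(grade_4_a.field_test_length)        # 35
print(grade_10_math_b.field_test_length)  # 39
```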
The field test forms were administered during the 2015-2016 school year as part of operational Classworks usage until sufficient data was collected for each form. Curriculum Advantage exported the field test data for analysis in June 2016.
Analyzing field test data
SEG prepared the field test data for analyses for multiple purposes: evaluating the item quality of the new items, evaluating the item quality of the current items that will remain on the forms, calibrating the new items into the current pools of active items, evaluating the difficulty of the test forms, and reviewing the vertical scaling across the forms.
The items were reviewed first in terms of the percentage of students answering correctly. Any items answered correctly by fewer than 25 percent of students were reviewed for accuracy. Items that very few students answer correctly may simply be hard items, or they may be items that were miskeyed, did not render properly for answering correctly (particularly in the cases where images/graphs were required), or possibly had multiple correct answers. The point biserials were also reviewed for each item. The point biserial provides a measure of the relationship between performance on the item and performance on the form. All of the new items were determined to be functioning acceptably and no modifications or replacements were warranted. A small number of current items were flagged for content review internally at Curriculum Advantage for potential modification to improve the performance of the items. The items flagged for further review were items 8942, 13064, and 14628.
The detailed item statistics are provided in Appendix D. The forms were reviewed to compare the overall difficulty of the planned new forms with the difficulty of the current forms. The new forms were found to be very consistent with the current forms as shown in Appendix E. These similarities were expected based on the final design and scope of the updates to the items on the forms.
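The two item-level checks described above, percent correct and point biserial, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the report: the response matrix is made up, the point biserial is computed against the total score including the item itself, and the function name `item_statistics` is hypothetical.

```python
from math import sqrt


def item_statistics(responses, low_p_threshold=0.25):
    """Percent correct and point biserial for each item.

    responses: one list of 0/1 item scores per student, all the same length.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    sd_total = sqrt(sum((t - mean_total) ** 2 for t in totals) / n_students)

    stats = []
    for j in range(n_items):
        scores = [row[j] for row in responses]
        p = sum(scores) / n_students  # proportion answering correctly
        if 0 < p < 1 and sd_total > 0:
            # Mean total score of the students who answered this item correctly.
            mean_correct = sum(t for t, s in zip(totals, scores) if s) / sum(scores)
            r_pb = (mean_correct - mean_total) / sd_total * sqrt(p / (1 - p))
        else:
            r_pb = float("nan")  # undefined when everyone or no one is correct
        stats.append({
            "p_value": p,
            "point_biserial": r_pb,
            "flag_low_p": p < low_p_threshold,  # fewer than 25 percent correct
        })
    return stats


# Made-up responses for five students on three items (illustration only).
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
stats = item_statistics(sample)
for item_number, item in enumerate(stats, start=1):
    print(item_number, item)
```

In this made-up sample only the third item falls below the 25 percent threshold, so it is the only one that would be pulled for an accuracy review.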
Evaluating final test forms and scoring
After the field test data was evaluated and the definitions (item composition) of the new forms were confirmed, we evaluated the new forms to determine whether any changes to the scoring or use of the data would be warranted.
Using the data collected during the field testing, we calculated the estimated reliability of the new forms (including those items that will be scoreable on the final new forms). Reliability can be thought of as a measure of the consistency, stability, and accuracy of the scoring. Tests with high reliability will produce similar scores for students if they were to retake the test without further instruction or time passing. Overall, the reliabilities for the new Universal Screeners are very strong. At the tails where there are fewer students taking the forms (specifically 10th grade math), the reliabilities are a bit lower. The reliabilities are affected by the distribution of the scores and the students who took the test forms. It is expected that, with additional test takers and more consistent usage of the Screeners for those forms, we would see improved reliability for the forms where the reliability is currently a bit weaker than other forms.
Table 10: Form Reliability
Form   MATH   READING
KA     0.78   0.69
KB     0.88   0.86
1A     0.96   0.96
1B     0.77   0.82
2A     0.98   0.97
2B     0.87   0.97
3A     0.96   0.98
3B     0.97   0.97
4A     0.95   0.96
4B     0.91   0.89
5A     0.95   0.97
5B     0.93   0.93
6A     0.91   0.97
6B     0.92   0.94
7A     0.80   0.92
7B     0.89   0.93
8A     0.85   0.94
8B     0.85   0.93
9A 0.77 0.89B 0.51 0.75
10A    0.66   0.84
10B    0.48   0.82
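The report does not state which reliability coefficient underlies Table 10. One common internal-consistency estimate for right/wrong items is KR-20, sketched below as an assumption rather than as the method actually used; the response data are made up and the function name `kr20` is illustrative.

```python
def kr20(responses):
    """Kuder-Richardson 20 reliability for 0/1 scored items.

    responses: one list of 0/1 item scores per student, all the same length.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    # Population variance of total scores, matching the p * (1 - p) item terms.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if n_items < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least two items and nonzero variance")

    # Sum of item variances, p * (1 - p), over all items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Made-up responses for five students on three items (illustration only).
sample_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
reliability = kr20(sample_scores)
print(round(reliability, 2))
```

Because the coefficient depends on the variance of total scores, a form taken by few students or by students with a narrow range of scores will tend to show a lower value, which is consistent with the explanation given for the lower values on some forms.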
The items were calibrated within subject across all grades and anchored to the current item pools. This was conducted in order to evaluate whether the items fit reasonably within the pool and whether changes to the vertical scaling were warranted. Given the consistency of the new forms with the current forms, it is recommended that the current scaling and reporting be continued. This will allow for longitudinal reporting in the system without changes to the system or increased complexity for teachers to interpret the results and make decisions. The item level logit and fit data from the vertical scaling are included with the item level statistics in Appendix D.
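The reference to item level logit and fit data suggests a Rasch-family model, although this section does not name the model. Assuming a dichotomous Rasch model, an item's logit difficulty and a student's logit ability sit on the same scale, and the probability of a correct response depends only on their difference. The sketch below shows that relationship; the function and parameter names are illustrative.

```python
from math import exp


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under a dichotomous Rasch model.

    Both arguments are in logits on a common scale (assumed model).
    """
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


# A student whose ability equals the item difficulty has an even chance.
print(rasch_probability(0.0, 0.0))  # 0.5

# The same student is more likely to answer an easier item correctly.
print(rasch_probability(0.0, -1.0) > rasch_probability(0.0, 1.0))  # True
```

Under this assumption, anchoring to the current item pool means holding the existing items' difficulties fixed so that newly calibrated items are placed on the same scale.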
The new updated Universal Screener forms can be seamlessly put into production as planned and can continue to be used as an integral component of the complete Classworks system.