Classworks Universal Screeners

Validity and Reliability of

Classworks Universal Screeners

Updated May 2018

Research on Validity and Reliability of Classworks Universal Screeners • 2

Table of Contents

Executive Summary

Test Design
  Vertical Scale and Item Bank Calibration
  Score Reporting
  Establishing Cut Scores

Item Development
  Guiding Principles of Item Construction

Test Validation
  Field Testing and Analysis

National Center for Response to Intervention Review
  Reliability
  Validity
  Classification Analyses

Addendum: Classworks Universal Screeners Update

Executive Summary

Purpose

Classworks Universal Screeners are formal assessments used to measure readiness for grade level instruction, help identify baseline learning levels, and measure growth. The Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention and can be used as part of the Response to Intervention (RtI) process.

In addition to reporting an overall scaled score based on the total test, Classworks provides student strengths and weaknesses for key strands. Key strands include a minimum of four test questions to provide a reasonable estimate of student strengths and weaknesses. This information, when used in conjunction with other data such as high-stakes test results and classroom performance, can help provide a starting point for determining next steps.

Overview

Classworks Universal Screeners include multiple forms at each level for language arts and mathematics, grades K–10. The Universal Screeners are typically administered three times a year: at the beginning of the school year to assess readiness for instruction for all students, mid-year to measure progress for RtI tiers II and III, and end-of-year to measure overall growth for the year. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach for students who may be struggling.

The Universal Screeners are between 20 and 35 items in length depending on the grade level targeted, and must be administered in a single sitting. Two parallel forms of each Screener were developed; these forms measure similar content. The kindergarten level assessments are an exception to this approach, with two different forms reflecting earlier and later kindergarten content given the rapid development at the kindergarten level.

Overall test results are reported as a scaled score. Scoring on a vertical scale provides a single point of reference to compare individual student gains from one test administration to the next, within and across school years. Measuring growth vertically serves a dual purpose: to track learning gains for individual students and to determine whether learning must be accelerated.

Classworks Universal Screeners have been evaluated by the National Center for Response to Intervention (NCRTI), and they received the highest reliability ranking.

Universal Screener Quick Guide

Item | Description
Purpose | Measure grade level readiness, help identify baseline, measure growth
Grades | K–10 Math, K–10 Reading
Levels of coverage per test | Test includes multiple grade levels of content to allow sufficient reach to help identify strugglers (exception: Kindergarten)
Audio | Audio support available for all grades
Length of test | Must be taken in one sitting; 20–35 items depending on grade level/subject
Vertical scale? | Yes. All scores are vertically scaled from K–10 for longitudinal tracking.
Output from test | Average readiness scaled score of students by class, teacher, custom group, demographic, and/or grade level

Test Design

SEG Measurement (SEG) has been instrumental in the design, development, testing, and analysis of Classworks Universal Screeners. SEG is an assessment, measurement, and research firm that provides assessment design, development, and implementation services for K–12, higher education, and credentialing programs. They have delivered over 100 million assessments to tens of thousands of schools and colleges in all 50 states.

Classworks Universal Screeners were designed and built for the particular purpose they serve. For this reason, they meet all of the criteria that define quality screeners: the assessments are brief, reliable, valid, equated, and measured on a vertical scale.

SEG initially created the assessments by hand-selecting items for each level and form of the tests. Forms were then equated through field testing and calibration so that each measures the same sets of skills at the same level of difficulty. Individual test items and the assessments themselves were designed with diversity in mind, including populations of culturally and linguistically diverse students and special needs students. Guiding principles for assessment design were integrated into the process, including ensuring all items are written in a clear, concise manner and free of age, gender, ethnic, religious, or disability bias.

There are two parallel forms for each test in grades K–10. For 2nd grade and above, the test questions include content from the target grade level as well as from two grade levels below the target. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach and enough content coverage for students who may be struggling. The tests include approximately 50% of the content from the target grade, approximately 25% of the content from the grade below, and approximately 25% of the content from two grades below. The 1st grade assessment contains content from both 1st grade and kindergarten. The kindergarten assessment contains content drawn only from kindergarten, with two different forms reflecting earlier and later kindergarten content, given the rapid development at the kindergarten level.

Grade Level | Number of Test Questions Scored | Number of Test Forms | Source of Test Questions
K Early | 15 Reading/Language Arts; 15 Mathematics | 1 | 100% early K content
K Late | 15 Reading/Language Arts; 15 Mathematics | 1 | 50% later K content; 50% early K content
Grade 1 | 20 Reading/Language Arts; 20 Mathematics | 2 | 50% grade 1 content; 50% grade K content
Grade 2 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 2 content; 25% grade 1 content; 25% grade K content
Grade 3 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 3 content; 25% grade 2 content; 25% grade 1 content
Grade 4 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 4 content; 25% grade 3 content; 25% grade 2 content
Grade 5 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 5 content; 25% grade 4 content; 25% grade 3 content
Grade 6 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 6 content; 25% grade 5 content; 25% grade 4 content
Grade 7 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 7 content; 25% grade 6 content; 25% grade 5 content
Grade 8 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 8 content; 25% grade 7 content; 25% grade 6 content
Grade 9 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 9 content; 25% grade 8 content; 25% grade 7 content
Grade 10 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 10 content; 25% grade 9 content; 25% grade 8 content

Vertical Scale and Item Bank Calibration

The vertical scale was developed through a linked testing design such that all items could be calibrated together and placed on the same continuum. The field test data were used to calibrate the items and tests. Calibration is a process that places all tests and all test items on a common scale. This was used to create a single common scale from grade K to grade 10. In this way, scores from the tests are comparable across forms of the test and over time. A given score will have the same meaning regardless of which form is administered and regardless of when the student takes the test.

The assessments developed include sets of overlapping items across test forms at the same level and across adjacent grade levels. This facilitates the calibration of the item bank. SEG calibrated the items using IRT (one parameter Rasch model) to create a common vertical scale across grade levels.
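As a minimal sketch of the one-parameter Rasch model named above: the probability of a correct response depends only on the difference between student ability and item difficulty, both expressed in logits on the same scale. The function name and sample values are illustrative assumptions, not SEG's calibration code.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct answer under the one-parameter (Rasch) model.

    Both arguments are in logits on the common vertical scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Illustrative values only; they are not taken from the Classworks item bank.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(rasch_probability(theta, difficulty=0.0), 3))
```

Because items from every form and grade share one difficulty scale, the same function applies to any student and item pairing, which is what makes scores comparable across forms.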

The raw number of correct answers reflects a particular Rasch score (ranging from -4 to +4), which is then translated to the final scaled score for reporting purposes. When the student completes his/her screener, the scaled score and key strand level performance feedback are immediately available for reporting. The approach taken in the calibration and scoring process provides Rasch extrapolated norms.
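The raw-score-to-scaled-score translation can be sketched as follows. Under a Rasch model the raw score determines the ability estimate: it is the ability at which the expected number of correct answers equals the observed raw score. The item difficulties, the clamping of zero and perfect scores to -4 and +4, and the linear mapping onto a 200 to 800 reporting scale are all assumptions made for illustration; the report does not publish the actual conversion.

```python
import math


def expected_raw_score(theta, difficulties):
    """Expected number of correct answers at ability theta (Rasch model)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def raw_to_theta(raw, difficulties, lo=-4.0, hi=4.0, tol=1e-6):
    """Find the ability whose expected raw score matches the observed one.

    Scores with no finite estimate (e.g., zero or perfect) are clamped to
    the ends of the -4 to +4 range, which is an assumed convention.
    """
    if expected_raw_score(lo, difficulties) >= raw:
        return lo
    if expected_raw_score(hi, difficulties) <= raw:
        return hi
    while hi - lo > tol:  # bisection: expected score increases with theta
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def theta_to_scaled(theta):
    """Hypothetical linear map: -4 -> 200, 0 -> 500, +4 -> 800."""
    return round(500 + 75 * theta)


# Hypothetical item difficulties for a very short form.
difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
for raw in range(len(difficulties) + 1):
    theta = raw_to_theta(raw, difficulties)
    print(raw, round(theta, 2), theta_to_scaled(theta))
```

Since each raw score on a given form corresponds to one ability estimate, a conversion like this can be precomputed as a lookup table, consistent with scaled scores being available as soon as the student finishes.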

As a further measure to ensure that the test questions and assessments are technically sound and are performing as expected, SEG analyzes the data from the fall test takers each year.

Curriculum Advantage reviews the results from the fall to make sure the tests are performing well. SEG examines the statistics for the tests as a whole (e.g., average scores, distribution of scores) and the statistics for individual test items (e.g., question difficulty and the ability of the question to distinguish between different levels of student performance). Based on this analysis, Curriculum Advantage further refines the tests, revising and replacing questions as necessary.

During the 2014-2015 item analysis, Curriculum Advantage made the decision to update the Universal Screener. New items were created and field tested during the 2015-2016 school year and officially added to the assessment for the 2016-2017 school year. The overview, goals, and constraints of the Universal Screener update can be found in the Addendum.

Score Reporting

Score Reporting is designed to provide reliable information useful for understanding overall student readiness and estimated student strengths and weaknesses in specific strands measured by the test. Scores are based on scaled scores that allow all tests to be placed on a common scale regardless of which form is administered and at what grade level. Results are reported at the total test and key strand level. Strands assessed vary by grade level and subject of the assessment. This approach provides a reasonable balance between the need for information on student strengths and weaknesses and the need for sufficient score reliability.

Raw scores are calculated as the total number of items answered correctly on the screener. Performance on the assessments is reported as a scaled score on a vertical scale ranging from 200 to 800, spanning grades K–10 (see Vertical Scale and Item Bank Calibration above). Feedback is also provided at the key strand level.

The strands listed below were determined based on an analysis of over 31 state standards and then re-examined with the introduction of the Common Core State Standards. Crosswalks are available to show the relationship between the Classworks strands and these state standards.

Reading:

• Grammar/Usage/Mechanics
• Reading Comprehension
• Study Skills
• Word Analysis
• Writing
• Writing Process

Math:

• Algebra
• Geometry
• Mathematical Processes
• Measurement
• Numeration
• Operations
• Patterns
• Statistics and Probability

Strands that are reported are required to include a minimum of four test questions to provide a reliable estimate of student strengths and weaknesses.
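The four-question minimum can be expressed as a simple filter over the strand labels of the items on a form. This is an illustrative sketch: the form composition below is hypothetical, and the function name is an assumption rather than part of the Classworks system.

```python
from collections import Counter

MIN_ITEMS_FOR_KEY_STRAND = 4  # minimum stated in the report


def key_strands(item_strands):
    """Return the strands with enough items on the form to be reported."""
    counts = Counter(item_strands)
    return sorted(s for s, n in counts.items() if n >= MIN_ITEMS_FOR_KEY_STRAND)


# Hypothetical 25-item reading form, one strand label per item.
form = (
    ["Reading Comprehension"] * 9
    + ["Grammar/Usage/Mechanics"] * 7
    + ["Word Analysis"] * 4
    + ["Study Skills"] * 3
    + ["Writing"] * 2
)
print(key_strands(form))
```

In this hypothetical form, Study Skills and Writing are still measured and count toward the total score, but they would not receive separate strand-level feedback.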

Curriculum Advantage establishes score ranges that reflect levels of student readiness on the assessments. There are various approaches that can be used to identify appropriate cut points defining levels of readiness. The method SEG recommended for creating appropriate cut points is detailed below.

Establishing Cut Scores

The cut scores for Classworks Universal Screeners for grades 3–8 were established using a two-stage standard setting process. In the first stage, a Book Marking Procedure (Cizek and Bunch, 2007) was applied. This was followed by a second stage, in which the stage one potential cut scores were reviewed in light of student performance data and expectations for student performance.

The Book Marking Procedure is an item mapping approach to standard setting developed in the 1990s (Cizek and Bunch, 2007). The Book Marking Procedure as employed for Classworks involves the review of an ordered test booklet containing all the items for a given test arranged in order of difficulty from easiest to hardest (Mitzel, H.C., Lewis, D.M., Patz, R.J., and Green, D.R., 2001). The difficulty values for this procedure were taken from the Rasch item calibrations produced during the original development of the screeners. Based on the procedures suggested by Mitzel et al. (2001), content experts reviewed the ordered item booklet and were asked to identify (“bookmark”) the first item that the minimally proficient student would be unlikely to answer correctly (less than 50% probability). The difficulty of the item identified served as the potential cut score emerging from stage one of the standard setting.

In the second stage, the potential cut scores produced in stage one of the process were reviewed against the distribution of scores from operational testing to evaluate the number and percentage of students that would “pass” and the number and percentage of students that would “fail” the assessment based on the stage one potential cut scores. In some cases, the stage one potential cut score was raised or lowered based on the impact rates or expected performance for the students.
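The two stages can be sketched in simplified form. In practice the bookmark is placed by content experts using judgment; here that judgment is stood in for by an assumed ability level for the minimally proficient student. Under the Rasch model, the probability of a correct answer falls below 50% exactly when item difficulty exceeds student ability. All values and function names are hypothetical.

```python
def bookmark_cut(ordered_difficulties, minimally_proficient_theta):
    """Stage one: difficulty of the first item (easiest to hardest) that the
    minimally proficient student has less than a 50% chance of answering."""
    for difficulty in ordered_difficulties:
        if difficulty > minimally_proficient_theta:
            return difficulty
    return None  # no item is hard enough to be bookmarked


def impact(student_thetas, cut):
    """Stage two: number and percentage of students who would pass or fail."""
    total = len(student_thetas)
    n_pass = sum(1 for theta in student_thetas if theta >= cut)
    n_fail = total - n_pass
    return {
        "pass": n_pass,
        "fail": n_fail,
        "pass_pct": 100.0 * n_pass / total,
        "fail_pct": 100.0 * n_fail / total,
    }


# Hypothetical ordered item booklet (Rasch difficulties) and student abilities.
booklet = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.4]
cut = bookmark_cut(booklet, minimally_proficient_theta=0.1)
students = [-1.0, -0.4, 0.0, 0.3, 0.5, 1.2, 0.8, -0.1]
print(cut, impact(students, cut))
```

If the resulting pass and fail rates look implausible for the population, the potential cut score can be moved up or down and the impact recomputed, which mirrors the stage two review described above.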

Item Development

The Classworks assessment item bank was developed by a team of content experts from a third-party developer, a leader in the creation of high-stakes content for assessments produced by states and testing companies. The test items have been reviewed and refined through a multi-step process involving members of this test development team.

The Universal Screeners are composed entirely of multiple-choice questions with four response options. The items were either developed specifically for the Universal Screener or selected and modified from the existing Curriculum Advantage item bank.

Guiding Principles of Item Construction

In order to ensure item reliability and validity, guiding principles were used in the item construction process.

Item Construction:

• Items are written in clear, concise language at the appropriate grade level
• Items are written without age, gender, ethnic, religious, or disability bias
• Each item set measures both basic knowledge and higher-order thinking skills
• Items adhere to the objectives being assessed
• Items are constructed in a consistent manner
• Item content is current and relevant to audience
• Items are written in the form of questions, avoiding open-ended or negative stems

Item Response Measurement:

• Items show consistency of student response
• Results can be generalized to the population
• Items are calibrated to ensure that scores have similar meaning over time
• After calibration, items are placed on a developmental/vertical scale to allow for the accurate comparison of students over time and across use of the items
• Student performance can be predicted from item response
• Target goals and norms can be developed from item response measures

Questions/Stems:

• Stems and reading passages will be at grade-level readability and must assess the skill being tested according to the level of Bloom’s indicated
• Stems are free of age, gender, ethnic, religious, or disability stereotypes or bias

• Stems are written in question format and do not require sentence completion, true/false, or fill-in-the-blank responses
• Each stem has only one correct answer

Answers/Distractors:

• Answers are presented in a multiple-choice format with four answer options
• Distractors are written in a logical order (alphabetical, chronological)
• Distractors are approximately the same length and must be grammatically parallel
• Distractors are plausible and should not contain grammatical clues
• Distractors address a variety of common errors rather than the same error
• Distractor rationale is provided for each answer choice

The test items are multiple-choice questions, offering an efficient and reliable way to assess students’ knowledge and skills. All items have one single best answer and responses are scored as correct or incorrect. Multiple choice measures have advantages over other types of item response, in that they are capable of covering a large amount of content in a relatively short period of time. Moreover, they can achieve high levels of reliability, providing users with a consistent and stable measure of student knowledge and skills over time.

Test Validation

Following the creation of the tests, SEG conducted a second verification of the assessment items. The verification process consisted of a comprehensive alignment review to establish the validity of the assessment items and to determine if they were accurately aligned to the objectives they purport to measure.

Curriculum Advantage continues to partner with SEG to ensure that the tests themselves, as well as assessment-related decisions, are psychometrically sound. This ongoing process includes further statistical analysis, item calibration, adjustments to the cut scores on the vertical scale, and overall evaluation of the quality of Classworks Universal Screeners.

Field Testing and Analysis

To ensure that the test items and assessments are psychometrically sound, SEG analyzed the item and test performance data from the field tests conducted by Curriculum Advantage in the fall of 2009, the fall of 2010, and the fall of 2011. Curriculum Advantage collected information from approximately 200–300 students per test form the first year, with exponential increases in each of the following years. SEG analyzes the results each year, providing both test and item level analyses including:

• Overall test and subtest statistics
  o Mean
  o Standard Deviation
  o Reliability
  o SEM (Standard Error of Measurement)
  o Overall Model Fit
  o Frequency Distribution

• Item statistics
  o P Value (percent correct)
  o Point biserial correlation (measure of item discrimination)
  o Logit value from -3 to +3 (person and item independent measure of item difficulty)
  o Item Infit statistic
  o Item Outfit statistic

SEG reviews the item statistics, and any item that does not demonstrate suitable psychometric characteristics is recommended for replacement. These statistics help ensure ongoing relevance and validity.
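For readers unfamiliar with the infit and outfit statistics listed above, the following sketch shows the standard mean-square forms for a single item under the Rasch model. Outfit averages the squared standardized residuals, while infit weights the squared residuals by their model variance, making it less sensitive to students far from the item's difficulty. The response data and ability estimates are hypothetical; the report does not describe SEG's exact computation or flagging thresholds.

```python
import math


def rasch_p(theta, difficulty):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(responses, thetas, difficulty):
    """Return (infit, outfit) mean-square statistics for one item.

    responses: 0/1 scores on the item, one per student
    thetas:    ability estimates for the same students, in logits
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical data for one item of difficulty 0.0 logits.
responses = [1, 1, 0, 1, 0, 0]
thetas = [1.5, 0.8, 0.2, -0.1, -0.9, -1.6]
print(item_fit(responses, thetas, difficulty=0.0))
```

Values near 1.0 indicate responses about as variable as the model predicts; values well above 1.0 suggest erratic responses to the item.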

Here are some of the statistics SEG calculates:

Total Test Statistics

• Average Score on the Assessment – SEG computes the average (mean) score achieved by students taking the assessment. This helps us determine if the assessment is properly targeted to the level of the students assessed.
• Variation and Distribution of Scores on the Assessment – SEG calculates the amount of variability (standard deviation) in the test scores achieved by students taking the assessment. This is another indicator of how well the test is targeted to the level of students assessed.
• Reliability – SEG computes the reliability of the test to ensure that the test is consistently measuring the knowledge and skills measured by the assessment across forms of the test and is stable over time.
• Score Accuracy – Any assessment score is subject to variation when a student takes the test multiple times. SEG estimates the amount of variation expected for a student score (Standard Error of Measurement; SEM); this is an indicator of score accuracy.
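The total test statistics above can be illustrated with a small scored response matrix. The report does not say which reliability coefficient SEG computes, so KR-20, a common internal consistency coefficient for right/wrong items, is used here as an assumption. The SEM follows the usual formula of the standard deviation times the square root of one minus the reliability. The data are hypothetical.

```python
import math
import statistics


def total_test_statistics(responses):
    """Summarize a scored response matrix (one list of 0/1 item scores per student)."""
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]
    mean = statistics.mean(totals)
    variance = statistics.pvariance(totals)  # population variance of raw scores
    sd = math.sqrt(variance)

    # KR-20: internal consistency for dichotomous items (assumed coefficient).
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(student[i] for student in responses) / len(responses)
        sum_pq += p * (1.0 - p)
    reliability = (n_items / (n_items - 1)) * (1.0 - sum_pq / variance)

    # Standard error of measurement, in raw-score units.
    sem = sd * math.sqrt(1.0 - reliability)
    return {"mean": mean, "sd": sd, "reliability": reliability, "sem": sem}


# Hypothetical responses: 6 students by 5 items.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(total_test_statistics(responses))
```

This sketch assumes at least two items and some variation in total scores; real analyses would also handle missing responses.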

Individual Question Statistics

• Question Difficulty – SEG computes the percentage of students who answer the questions correctly; this is an indicator of the difficulty of the question.
• Question Differentiation – SEG computes the relationship between student performance on each individual question and the assessment as a whole; this is an indicator of how well the question differentiates between those students who have the knowledge and skills measured by the assessment and those who do not have the knowledge and skills.
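The two question-level statistics can be sketched as follows: the p-value is the proportion of students answering correctly, and the point-biserial is the Pearson correlation between the 0/1 item score and the total score. This version leaves the item in the total (an uncorrected point-biserial), which is a simplifying assumption; the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns None when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """Per-item difficulty (p-value) and discrimination (point-biserial)."""
    totals = [sum(student) for student in responses]
    results = []
    for i in range(len(responses[0])):
        scores = [student[i] for student in responses]
        results.append({
            "item": i + 1,
            "p_value": sum(scores) / len(scores),
            "point_biserial": pearson(scores, totals),
        })
    return results


# Hypothetical responses: 6 students by 5 items.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in item_statistics(responses):
    print(row)
```

An item with a low or negative point-biserial is answered correctly about as often by low scorers as by high scorers, which is the kind of result that would prompt review.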

National Center for Response to Intervention Review

The National Center for Response to Intervention used data collected during the 2009–2010 and 2010–2011 school years to further evaluate the quality of Classworks Universal Screeners. Following the implementation of the final Universal Screener forms, performance on the screeners and high-stakes tests was used to investigate the validity and classification accuracy of the Universal Screeners.

Reliability

Test reliability refers to the consistency and accuracy of test scores. Reliability values range from 0 to 1.00, with higher values indicating higher reliability. Using the data collected from the multi-state field test, the average reliability for Universal Screeners for reading from grades K–10 was found to be 0.90. For mathematics in grades K–10, the average reliability coefficient is 0.88. These high internal consistency measures indicate that the Universal Screeners are able to provide a reliable measure of student performance in reading and mathematics.

Validity

Test validity refers to the appropriateness of the tests for their intended purpose. Evidence for validity of the tests is gathered from the item development and test development process as well as statistical analyses.

Classworks Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention. The items and tests have been field tested and evaluated using Item Response Theory to ensure that the items and tests are performing as expected. The rigorous processes followed for item and test development provide support for the content validity of the Universal Screeners.

Performance on the Universal Screeners has been compared to performance on high-stakes tests to ensure that it is consistent with performance on other assessments. During the 2010–2011 school year, Classworks Universal Screener data and high-stakes test data from over 11,300 students in a large southern state were collected to evaluate the correlation between the Universal Screener scores and the high-stakes test scores.

Rules of Thumb – Armstrong (2006), reiterating the recommendations of Smith (1984), suggests the following rules of thumb for validity data examining one measure of a construct in relation to another measure of that construct:

• Over .50: excellent
• .40 to .49: good
• .30 to .39: acceptable
• Less than .30: poor

On average, the correlation between the Classworks Universal Screener scores and the high-stakes test scores was 0.46 for mathematics and 0.63 for reading. Under the rules of thumb above, these values fall in the “good” and “excellent” ranges, respectively. Further, the screeners were found to agree with other measures in classifying students as “not at-risk” 93% of the time in mathematics, and 97% of the time in reading. These correlations between two tests measuring similar constructs support the construct validity of the interpretation of the Universal Screener scores.
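Applying the rules of thumb to the reported correlations can be sketched as a simple lookup. The handling of values that fall exactly on or between the published bands (for example, exactly .50) is an assumption made for this illustration.

```python
def validity_rating(r):
    """Rate a validity correlation using the rules of thumb quoted above.

    Boundary handling is assumed: a value must exceed .50 to be "excellent".
    """
    if r > 0.50:
        return "excellent"
    if r >= 0.40:
        return "good"
    if r >= 0.30:
        return "acceptable"
    return "poor"


# Correlations reported for the Universal Screeners.
for subject, r in (("mathematics", 0.46), ("reading", 0.63)):
    print(subject, r, validity_rating(r))
```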

Classification Analyses

In addition to the reliability and validity of the measures, the Universal Screeners were also evaluated with regard to the accuracy of classifying students as at-risk in comparison to an independent measure. It is important that the screeners are able to appropriately identify students who are at-risk and those who are not at-risk. In particular, it is critical that at-risk students are properly identified as being at-risk to get the instructional help that they need.

In order to evaluate the classification accuracy, Classworks Universal Screeners classifications were compared to the classifications determined by performance on high-stakes state assessments in reading and math. The comparisons provided a classification of students into one of four cells in a “confusion matrix.” Students could be classified as at-risk or not at-risk based on the passing status for each of the two assessments as Pass-Pass, Pass-Fail, Fail-Pass, or Fail-Fail. The classification analyses were performed by evaluating sensitivity and specificity.

Negative predictive power is a measure that estimates the accuracy of classifying students as “not at-risk.” A useful screening tool should have very high negative predictive power such that at-risk students are not misidentified as not being at-risk. Using test data for more than 11,300 students, the Universal Screeners were found to have 93% and 97% negative predictive power for math and reading, respectively.
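The measures named in this section can all be computed from the four cells of the confusion matrix. In the sketch below, failing an assessment is treated as "at-risk" and the state assessment is treated as the reference classification. The cell counts are hypothetical and are not the Classworks study data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Screening accuracy measures, where "positive" means at-risk.

    tp: screener Fail, state test Fail    fp: screener Fail, state test Pass
    fn: screener Pass, state test Fail    tn: screener Pass, state test Pass
    """
    return {
        # Share of truly at-risk students the screener flags.
        "sensitivity": tp / (tp + fn),
        # Share of truly not-at-risk students the screener clears.
        "specificity": tn / (tn + fp),
        # Share of students cleared by the screener who are truly not at-risk.
        "negative_predictive_power": tn / (tn + fn),
    }


# Hypothetical counts for 1,000 students.
metrics = classification_metrics(tp=120, fp=80, fn=30, tn=770)
for name, value in metrics.items():
    print(name, round(value, 3))
```

High negative predictive power means that few of the students the screener clears later turn out to be at-risk on the state assessment.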

Classworks Universal Screeners Update Technical Report

This document provides a summary of the tasks completed in updating the Classworks Universal Screeners for reading and math in grades K–High School.

August 2016

© SEG Measurement.

Contents

Overview

Goals and Constraints

Tasks
  Investigating strands and expectations of other assessments
  Finalizing plans for the updates to be made to the Universal Screeners
  Selecting items for replacement
  Developing new items
  Producing new items
  Creating field test forms and administering the field test
  Analyzing field test data
  Evaluating final test forms and scoring

Overview

This document provides supporting information regarding the updates to the Classworks Universal Screeners in Reading and Math for grades K–10 that will be in place officially for the 2016-2017 school year. This document provides the final plans that were executed between March 2015 and July 2016 and provides statistical information regarding the items and forms.

Goals and Constraints

The goals for this project were to modify the Universal Screeners to be more reflective of the latest multiple choice items and expectations of students in K-12 education, while at the same time keeping the Classworks Screeners consistent with the current forms. It was agreed that this project would include the development of 125 new Reading and 125 new Math items for use in the new Universal Screener forms. Further goals and guidelines are noted below.

• All items will be four-choice multiple choice with a single correct answer.
• There will be no audio or video passages associated with the items.
  o Grades K-2 forms will have text-to-speech support. Any new items written will get this applied so the entire field test form has this support.
  o There may be consideration for a passage to be a video or audio clip, if it can be fully owned and hosted (so as to avoid links expiring) and if the delivery can support it.
• The test lengths will remain consistent with the current test lengths of scoreable content (field test lengths will be longer).
• The majority of the items on the current screeners will remain on the new screeners.
• The new content should be as seamless as possible with the current content.
• As with the current forms, any strand with at least 4 items will be considered a key strand and will be linked to instructional content.
• All of the items must align to the current Classworks content hierarchy (subject/grade/strand/skill/objective – listed in the Appendix).
  o There will be no changes to, combinations of, or additions to the strands, skills, or objectives.
• Field test forms will include the entire current form plus new items for field testing. Scoring during the field test time period will continue to be based on the scored items on the current forms.
• Reporting on student performance on the final new Screeners will need to be comparable to historical performance on prior forms, whether by using the same scale or providing a translation of new to old scoring for comparison.
• The updates to the forms should be as seamless as possible.

Tasks

The following key tasks involved in updating the Universal Screeners are summarized in this report.

1. Investigating strands and expectations of other assessments
2. Finalizing plans for the updates to be made to the Universal Screeners
3. Selecting items for replacement
4. Developing new items
5. Producing new items
6. Creating field test forms and administering the field test
7. Analyzing field test data
8. Evaluating final test forms and scoring

Investigating strands and expectations of other assessments

As part of the initial planning stages, many assessments and standards were reviewed to gather information on the latest expectations of students in reading and math. This was to help meet the goal that the changes to the Universal Screeners would help to bring the forms more in line with expectations of other common assessments and standards.

The Common Core Reading/ELA strands are summarized in the following table.

Table 1: Common Core Reading/ELA Strands

Area | Strand | Grades marked Y
Reading | Literature - key ideas and details | K–10
Reading | Literature - craft and structure | K–10
Reading | Literature - integration of knowledge and ideas | K–10
Reading | Literature - range of reading and level of text complexity | K–10
Reading | Inf. Text - key ideas and details | K–10
Reading | Inf. Text - craft and structure | K–10
Reading | Inf. Text - integration of knowledge and ideas | K–10
Reading | Inf. Text - range of reading and level of text complexity | K–10
Reading | Foundational skills - print concepts | K–5
Reading | Foundational skills - phonological awareness | K–5
Reading | Foundational skills - phonics and word recognition | K–5
Reading | Foundational skills - fluency | K–5
Language | Conventions of Standard English | K–10
Language | Knowledge of Language | 2–10
Language | Vocabulary Acquisition and Use | K–10
Writing | Text types and purposes | K–10
Writing | Production and Distribution of Writing | K–10
Writing | Research to build and present knowledge | K–10
Writing | Range of writing | 3–10
Speaking and Listening | Comprehension and Collaboration | K–10
Speaking and Listening | Presentation of Knowledge and Ideas | K–10
Literacy in History/Social Studies, Science, & Technical Subjects | Key ideas and details | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Craft and structure | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Integration of knowledge and ideas | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Range of reading and level of text complexity | 6–10

The Georgia Milestones Assessment tests in grades 3–8 include the following high level skills for ELA:

• Reading and Vocabulary
• Writing and Language

The National Assessment of Educational Progress (NAEP) for Reading includes the following skills:

• Literary and Informational text
  o Locate and recall
  o Integrate and interpret
  o Critique and evaluate
  o Vocabulary

The Common Core Math strands are summarized in the following table.

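Table 2: Common Core Mathematics Strands

Grade/Course | Strand | Grades or HS courses marked Y
K-8 | Counting and Cardinality | K
K-8 | Operations and Algebraic Thinking | K–5
K-8 | Number and Operations in Base 10 | K–5
K-8 | Number and Operations - Fractions | 3–5
K-8 | Measurement and Data | K–5
K-8 | Geometry | K–8
K-8 | Ratios and Proportions | 6–7
K-8 | The Number System | 6–8
K-8 | Expressions and Equations | 6–8
K-8 | Functions | 8
K-8 | Statistics and Probability | 6–8
HS Number and Quantity | The Real Number System | HS-Number & Quantity
HS Number and Quantity | Quantities | HS-Number & Quantity
HS Number and Quantity | Complex Number System | HS-Number & Quantity
HS Number and Quantity | Vector and Matrix Quantities | HS-Number & Quantity
HS Algebra | Seeing Structure in Expressions | HS-Algebra
HS Algebra | Arithmetic with Polynomials and Rational Expressions | HS-Algebra
HS Algebra | Creating Equations | HS-Algebra
HS Algebra | Reasoning with Equations and Inequalities | HS-Algebra
HS Functions | Interpreting Functions | HS-Functions
HS Functions | Building Functions | HS-Functions
HS Functions | Linear, Quadratic, and Exponential Models | HS-Functions
HS Functions | Trigonometric Functions | HS-Functions
HS Geometry | Congruence | HS-Geometry
HS Geometry | Similarity, Right Triangles, and Trig | HS-Geometry
HS Geometry | Circles | HS-Geometry
HS Geometry | Expressing Geometric Properties with Equations | HS-Geometry
HS Geometry | Geometric Measurement and Dimension | HS-Geometry
HS Geometry | Modeling with Geometry | HS-Geometry
HS Statistics and Probability | Interpreting Categorical and Quantitative Data | HS-Stats & Probability
HS Statistics and Probability | Making Inferences and Justifying Conclusions | HS-Stats & Probability
HS Statistics and Probability | Conditional Probability and Rules of Probability | HS-Stats & Probability
HS Statistics and Probability | Using Probability to Make Decisions | HS-Stats & Probability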

The Georgia Milestones Assessment tests in grades 3–8 include the following strands for Math:

• Operations and Algebraic Thinking: Grades 3–5
• Number and Operations: Grade 3
• Number and Operations in Base 10: Grades 4–5
• Number and Operations - Fractions: Grades 4–5
• Measurement and Data: Grades 3–5
• Geometry: Grades 3–8
• The Number System: Grades 6–7
• Ratios and Proportions: Grades 6–7
• Statistics and Probability: Grades 6–8
• Numbers, Expressions, and Equations: Grade 8
• Expressions and Equations: Grades 6–7
• Algebra and Functions: Grade 8

The National Assessment of Educational Progress (NAEP) mathematics assessment covers the following strands:

• Algebra
• Number properties and operations
• Measurement
• Geometry
• Data analysis, statistics and probability

Finalizing plans for the updates to be made to the Universal Screeners

We made recommendations for changes to the strands (particulars on consolidating, renaming, adding, or expanding) and, after internal review of the impact on the system and benefits of making the changes, Curriculum Advantage determined that the strands will remain consistent between the current Universal Screener forms and the new Universal Screener forms. Rather than changing the strands, the focus is on increasing the quality of the items included within the strands.

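Table 3: Classworks Reading/ELA Strands and Current Universal Screener Coverage

Columns, in order: Grade | Grammar/Usage/Mechanics | Reading | Study Skills | Word Analysis | Writing | Writing Process | not covered - Listening/Speaking/Viewing | Grand Total

Item counts for each grade are listed in column order; empty cells are not shown.

Grade K: 9, 1, 5 | Grand Total 15
Grade 1: 2, 10, 7, 1 | Grand Total 20
Grade 2: 3, 12, 1, 8, 1 | Grand Total 25
Grade 3: 7, 9, 3, 4, 1, 1 | Grand Total 25
Grade 4: 6, 10, 2, 5, 1, 1 | Grand Total 25
Grade 5: 7, 10, 3, 7, 1, 2 | Grand Total 30
Grade 6: 7, 11, 3, 6, 1, 2 | Grand Total 30
Grade 7: 7, 11, 4, 6, 2 | Grand Total 30
Grade 8: 8, 11, 2, 6, 3 | Grand Total 30
Grade 9: 8, 13, 3, 4, 2 | Grand Total 30
Grade 10: 9, 13, 5, 3 | Grand Total 30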


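Table 4: Classworks Mathematics Strands and Current Universal Screener Coverage

Columns: Grade; Algebra; Concepts of Calculus; Geometry; Mathematical Processes; Measurement; Numeration; Operations; Patterns; Statistics and Probability; Trigonometry; Grand Total

Each row lists the non-empty item counts in column order, followed by the Grand Total.

Grade K: 2, 3, 5, 2, 2, 1 (Grand Total 15)
Grade 1: 1, 5, 4, 4, 4, 1, 1 (Grand Total 20)
Grade 2: 2, 4, 3, 5, 4, 2, 2, 3 (Grand Total 25)
Grade 3: 2, 2, 2, 6, 1, 5, 2, 5 (Grand Total 25)
Grade 4: 1, 5, 1, 3, 4, 4, 1, 6 (Grand Total 25)
Grade 5: 5, 8, 4, 4, 1, 2, 1, 5 (Grand Total 30)
Grade 6: 6, 1, 8, 3, 1, 3, 3, 1, 4 (Grand Total 30)
Grade 7: 6, 2, 6, 2, 4, 2, 2, 1, 5 (Grand Total 30)
Grade 8: 8, 1, 8, 1, 4, 1, 2, 1, 4 (Grand Total 30)
Grade 9: 8, 7, 5, 2, 1, 1, 1, 4, 1 (Grand Total 30)
Grade 10: 8, 6, 5, 4, 1, 1, 4, 1 (Grand Total 30)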

The following decisions were made in conjunction with Curriculum Advantage with regard to the updates to the Universal Screeners:

o Each form would have approximately 20% of the form replaced with new items that align to the current Classworks objectives. (The objectives for each grade and subject were gathered through the Classworks item bank and included in Appendix A.)

o Items will be considered for replacement with a new item based on the quality of the current item, the importance of the objective measured, and the ability of the item to measure on-grade readiness.

o Items that are replaced may be replaced with a new item measuring the same objective, a different objective within the same strand, or an objective in a different strand that is in need of more coverage.

o In both reading and math, all new items will be single best answer multiple choice items.

o Items may be associated with one or more passages or images. Some items may need to be administered together in sequence as a set (i.e., a group of items that are all associated with the same passage(s)).

o Items will all be independent (not relate to or build on each other), even if they relate to the same passage or stimulus.

o New items may be used on multiple forms across or within grades (to follow similar overlap of current forms), but duplicate usage will only count as one item out of the 125 that will be developed.

o The field test forms will contain the entire current scoreable forms plus additional non-scored items for field testing. This will allow for the field test forms to continue to serve as live operational forms during the 2015-2016 school year.

o The new forms will maintain the current grade level coverage of the forms.

o All of the new items will be field tested to gather data.

o A linked form design with shared items will be used so that the entire pool of new items within a subject can be calibrated with the current pool.

o After the field test, the items and planned final forms will be evaluated.

o Scoring and comparability to the current forms will be evaluated to determine whether changes are warranted.

Table 5 shows the planned number of items developed for each grade (roughly 20% of each form). The actual item development matched these plans. Tables 6 and 7 show the breakdown of grade level coverage for each form, which remains consistent from the current scoreable items to the new scoreable items (after field testing).

Table 5: Number of New Items Per Form

Grade   Reading                          Math
K       3                                3
1       3                                3
2       5                                5
3       7                                7
4       6                                6
5       7                                7
6       6                                6
7       7                                7
8       6 on one form, 7 on the other    6 on one form, 7 on the other
9       6                                6
10      6                                6
Total   125                              125

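Table 6: Item Grade Level Coverage on Reading Screeners

Columns: Form; Item Grade Level (K, 1, 2, 3, 4, 5, 6, 7, 8, HS); N

Each row lists the non-empty item counts in ascending item grade level order, followed by N.

Grade K Reading Screener    A: 15 (N = 15)
                            B: 15 (N = 15)
Grade 1 Reading Screener    A: 8, 12 (N = 20)
                            B: 8, 12 (N = 20)
Grade 2 Reading Screener    A: 5, 7, 13 (N = 25)
                            B: 5, 7, 13 (N = 25)
Grade 3 Reading Screener    A: 7, 7, 11 (N = 25)
                            B: 7, 7, 11 (N = 25)
Grade 4 Reading Screener    A: 6, 7, 12 (N = 25)
                            B: 6, 7, 12 (N = 25)
Grade 5 Reading Screener    A: 7, 8, 15 (N = 30)
                            B: 7, 8, 15 (N = 30)
Grade 6 Reading Screener    A: 7, 8, 15 (N = 30)
                            B: 7, 8, 15 (N = 30)
Grade 7 Reading Screener    A: 7, 8, 15 (N = 30)
                            B: 7, 8, 15 (N = 30)
Grade 8 Reading Screener    A: 7, 8, 15 (N = 30)
                            B: 7, 8, 15 (N = 30)
Grade 9 Reading Screener    A: 7, 8, 15 (N = 30)
                            B: 7, 10, 13 (N = 30)
Grade 10 Reading Screener   A: 2, 8, 20 (N = 30)
                            B: 2, 8, 20 (N = 30)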

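Table 7: Item Grade Level Coverage on Math Screeners

Columns: Form; Item Grade Level (K, 1, 2, 3, 4, 5, 6, 7, 8, HS); N

Each row lists the non-empty item counts in ascending item grade level order, followed by N.

Grade K Math Screener    A: 15 (N = 15)
                         B: 15 (N = 15)
Grade 1 Math Screener    A: 8, 12 (N = 20)
                         B: 8, 12 (N = 20)
Grade 2 Math Screener    A: 5, 7, 13 (N = 25)
                         B: 5, 7, 13 (N = 25)
Grade 3 Math Screener    A: 6, 7, 12 (N = 25)
                         B: 6, 7, 12 (N = 25)
Grade 4 Math Screener    A: 6, 7, 12 (N = 25)
                         B: 6, 7, 12 (N = 25)
Grade 5 Math Screener    A: 7, 8, 15 (N = 30)
                         B: 7, 8, 15 (N = 30)
Grade 6 Math Screener    A: 7, 8, 15 (N = 30)
                         B: 7, 8, 15 (N = 30)
Grade 7 Math Screener    A: 4, 8, 18 (N = 30)
                         B: 4, 8, 18 (N = 30)
Grade 8 Math Screener    A: 7, 8, 15 (N = 30)
                         B: 7, 8, 15 (N = 30)
Grade 9 Math Screener    A: 2, 13, 15 (N = 30)
                         B: 2, 13, 15 (N = 30)
Grade 10 Math Screener   A: 2, 8, 20 (N = 30)
                         B: 2, 8, 20 (N = 30)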

Selecting items for replacement

Each current form was exported from the Classworks system into a separate Word document in preparation for review and update. For each form, the plans for numbers of items to be replaced and the blueprint for the form were noted in the document. Each current form was reviewed by experts to identify the specific items that would provide the most value by being removed from the form and replaced with a new item. The items were reviewed along multiple facets of quality, including the general quality of the item, reflection of current expectations of the skill, importance and relevance of the item, and how well the item measures the objective within the skill/strand.

For each form, the items to be replaced were identified and item writing assignments were developed. In many cases, the new item would directly replace the current item with another item that better measured the objective within the strand. In some cases, it was determined that a different objective should be covered within the strand to better cover the focus of the particular strand.

Developing new items

After the items to be replaced and the item needs were identified, item development began. Test development experts in math and reading developed the new items to meet the item specifications. New items were written to maintain the current style of the Universal Screeners while also representing newer ways of measuring the objectives.

The draft items were reviewed and edited for style, grammar, content accuracy, appropriateness, and perceived psychometric quality. The final 250 new items were then prepared for online production into the Classworks system.

Appendix B contains the alignment information for each of the new items.

Producing new items

Once approved internally, we individually entered the items into the Classworks item bank database. Each item was coded with its subject, grade level, strand, and skill as per the requirements of the system. The system generated a unique assessment system ID number for each item. The correct answer was identified and artwork was uploaded. The items were reviewed for proper rendering on the platform.

After the items passed through the internal production review, items were released to Curriculum Advantage for review by their content experts. In addition to the items being available online, details about the items were sent externally to assist in review and tracking. After review by Curriculum Advantage, the items were finalized by SEG and approved in the system by Curriculum Advantage content experts.

Once the items were approved in our local item bank in Classworks, Curriculum Advantage programmers worked to port the items into the official Classworks item bank and activate the items for use on the field test forms. During this process, the item IDs were slightly modified to ensure the item IDs were unique within the Classworks item bank while also allowing for tracking with the original IDs when the items were created. All of the new items in the official bank are in the 15,000s. For example, item ID 8 in the local bank is now 15008 and item ID 168 is now 15168.
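The ID change described above can be illustrated with a short sketch. It assumes the official ID is simply the local ID plus 15000, a rule inferred only from the two examples given (8 to 15008, 168 to 15168); the function names are hypothetical and are not part of the Classworks system.

```python
# Assumed rule, inferred from the two examples in the text:
# official ID = local ID + 15000. Names here are illustrative only.
OFFICIAL_ID_OFFSET = 15000
MAX_LOCAL_ID = 999  # keeps every mapped ID inside the 15,000s


def to_official_id(local_id: int) -> int:
    """Map a local item bank ID to its (assumed) official bank ID."""
    if not 1 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"local ID out of range: {local_id}")
    return OFFICIAL_ID_OFFSET + local_id


def to_local_id(official_id: int) -> int:
    """Recover the original local ID for tracking purposes."""
    local_id = official_id - OFFICIAL_ID_OFFSET
    if not 1 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"not a new-item official ID: {official_id}")
    return local_id


if __name__ == "__main__":
    for local in (8, 168):
        print(local, "->", to_official_id(local))
```

Because the mapping is a fixed offset, it is reversible, which is what allows tracking back to the IDs assigned when the items were created.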

Creating field test forms and administering the field test

In order to allow for continued production use of the Universal Screeners while also field testing the new items, the field test forms were developed to include the entire set of scoreable items on the current form as well as additional items for field testing that did not count towards the student's score. The field test items included items that would end up being on that official form as well as other linking items that would be dropped from the form. Items were placed strategically across forms so that all of the forms would be linked and so that the items were exposed to students who were on grade, above grade, and below grade. The following two tables summarize the plans for the field test forms. Appendix C contains the item level details on the field test forms.


Table 8: Field Test Plans for Reading/ELA

Columns:
(1) Number of scored items on current form
(2) Number of non-scored items on current forms (these items will be dropped for new field test forms)
(3) Current total number of items (scored and non-scored)
(4) Number of item replacements (new field test items that will eventually replace current scored items)
(5) Number of linking items (additional non-scored linking items for field testing and calibration)
(6) New field test length (scored and non-scored items)
(7) New planned screener final test length (all scored items only, same number as current form scored item count)

Screener                    Form  (1)  (2)  (3)  (4)  (5)  (6)  (7)
Grade K Reading Screener    A     15   5    20   3    3    21   15
                            B     15   5    20   3    3    21   15
Grade 1 Reading Screener    A     20   5    25   3    3    26   20
                            B     20   5    25   4    2    26   20
Grade 2 Reading Screener    A     25   5    30   5    3    33   25
                            B     25   5    30   5    3    33   25
Grade 3 Reading Screener    A     25   5    30   7    3    35   25
                            B     25   5    30   7    3    35   25
Grade 4 Reading Screener    A     25   5    30   6    4    35   25
                            B     25   5    30   6    4    35   25
Grade 5 Reading Screener    A     30   5    35   7    2    39   30
                            B     30   5    35   7    2    39   30
Grade 6 Reading Screener    A     30   5    35   6    3    39   30
                            B     30   5    35   7    2    39   30
Grade 7 Reading Screener    A     30   5    35   7    2    39   30
                            B     30   5    35   7    2    39   30
Grade 8 Reading Screener    A     30   5    35   6    3    39   30
                            B     30   5    35   7    2    39   30
Grade 9 Reading Screener    A     30   5    35   7    2    39   30
                            B     30   5    35   7    2    39   30
Grade 10 Reading Screener   A     30   5    35   7    2    39   30
                            B     30   5    35   7    2    39   30


Table 9: Field Test Plans for Math

Columns:
(1) Number of scored items on current form
(2) Number of non-scored items on current forms (these items will be dropped for new field test forms)
(3) Current total number of items (scored and non-scored)
(4) Number of item replacements (new field test items that will eventually replace current scored items)
(5) Number of linking items (additional non-scored linking items for field testing and calibration)
(6) New field test length (scored and non-scored items)
(7) New planned screener final test length (all scored items only, same number as current form scored item count)

Screener                 Form  (1)  (2)  (3)  (4)  (5)  (6)  (7)
Grade K Math Screener    A     15   5    20   3    3    21   15
                         B     15   5    20   3    3    21   15
Grade 1 Math Screener    A     20   5    25   3    3    26   20
                         B     20   5    25   3    3    26   20
Grade 2 Math Screener    A     25   5    30   5    3    33   25
                         B     25   5    30   5    3    33   25
Grade 3 Math Screener    A     25   5    30   7    3    35   25
                         B     25   5    30   7    3    35   25
Grade 4 Math Screener    A     25   5    30   6    4    35   25
                         B     25   5    30   6    4    35   25
Grade 5 Math Screener    A     30   5    35   7    2    39   30
                         B     30   5    35   7    2    39   30
Grade 6 Math Screener    A     30   5    35   6    3    39   30
                         B     30   5    35   6    3    39   30
Grade 7 Math Screener    A     30   5    35   7    2    39   30
                         B     30   5    35   7    2    39   30
Grade 8 Math Screener    A     30   5    35   6    3    39   30
                         B     30   5    35   7    2    39   30
Grade 9 Math Screener    A     30   5    35   6    3    39   30
                         B     30   5    35   7    2    39   30
Grade 10 Math Screener   A     30   5    35   7    2    39   30
                         B     30   5    35   9    0*   39   30

*Grade 10 B form already has new field test items that are also on other forms/grades.
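The field test lengths in Tables 8 and 9 follow from simple arithmetic: the current scored items are kept, the current non-scored items are dropped, and the replacement and linking items are added. The sketch below checks that relationship for a few rows copied from the tables; the `FormPlan` type and function name are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class FormPlan(NamedTuple):
    """One row of a field test plan table (illustrative structure)."""
    scored: int        # scored items on the current form
    replacements: int  # new field test items that will replace scored items
    linking: int       # additional non-scored linking items


def field_test_length(plan: FormPlan) -> int:
    """Scored items plus all non-scored field test items."""
    return plan.scored + plan.replacements + plan.linking


# Rows copied from Tables 8 and 9: (plan, reported field test length).
rows = {
    "Grade K Reading A": (FormPlan(15, 3, 3), 21),
    "Grade 1 Reading B": (FormPlan(20, 4, 2), 26),
    "Grade 4 Math A": (FormPlan(25, 6, 4), 35),
    "Grade 10 Math B": (FormPlan(30, 9, 0), 39),
}

if __name__ == "__main__":
    for name, (plan, reported) in rows.items():
        assert field_test_length(plan) == reported, name
        print(f"{name}: {field_test_length(plan)} items")
```

The final screener length equals the scored count because each replacement item takes the place of one current scored item.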


The field test forms were administered during the 2015-2016 school year as part of operational Classworks usage until sufficient data was collected for each form. Curriculum Advantage exported the field test data for analysis in June 2016.

Analyzing field test data

SEG prepared the field test data for analyses for multiple purposes: evaluating the item quality of the new items, evaluating the item quality of the current items that will remain on the forms, calibrating the new items into the current pools of active items, evaluating the difficulty of the test forms, and reviewing the vertical scaling across the forms.

The items were reviewed first in terms of the percentage of students answering correctly. Any item answered correctly by fewer than 25 percent of students was reviewed for accuracy. Items that very few students answer correctly may simply be hard items, or they may be items that were miskeyed, did not render properly (particularly where images or graphs were required), or possibly had multiple correct answers. The point biserials were also reviewed for each item. The point biserial provides a measure of the relationship between performance on the item and performance on the form. All of the new items were determined to be functioning acceptably, and no modifications or replacements were warranted. A small number of current items were flagged for internal content review at Curriculum Advantage for potential modification to improve their performance. The items flagged for further review were items 8942, 13064, and 14628.
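The two item statistics described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the field test: it assumes responses are scored 0/1, computes the point biserial as the Pearson correlation between the item score and the total form score (the report does not say whether the item was excluded from the total), and uses hypothetical function names.

```python
from math import sqrt


def proportion_correct(item_scores):
    """Share of students answering the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total form score."""
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_t = sum(total_scores) / n
    cov = sum((x - mean_i) * (t - mean_t)
              for x, t in zip(item_scores, total_scores))
    var_i = sum((x - mean_i) ** 2 for x in item_scores)
    var_t = sum((t - mean_t) ** 2 for t in total_scores)
    if var_i == 0 or var_t == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / sqrt(var_i * var_t)


def flag_for_accuracy_review(responses, threshold=0.25):
    """Return IDs of items answered correctly by fewer than `threshold` of students."""
    return [item_id for item_id, scores in responses.items()
            if proportion_correct(scores) < threshold]


if __name__ == "__main__":
    # Made-up responses for two hypothetical items.
    demo = {"item_a": [1, 1, 0, 1, 1], "item_b": [0, 0, 0, 0, 1]}
    print(flag_for_accuracy_review(demo))
    print(round(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]), 3))
```

A flagged item is not necessarily defective; as noted above, it is a prompt to check the key, the rendering, and the answer options before concluding that the item is simply hard.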

The detailed item statistics are provided in Appendix D. The forms were reviewed to compare the overall difficulty of the planned new forms with the difficulty of the current forms. The new forms were found to be very consistent with the current forms, as shown in Appendix E. These similarities were expected based on the final design and scope of the updates to the items on the forms.

Evaluating final test forms and scoring

After the field test data was evaluated and the definitions (item composition) of the new forms were confirmed, we evaluated the new forms to determine whether any changes to the scoring or use of the data would be warranted.

Using the data collected during the field testing, we calculated the estimated reliability of the new forms (including those items that will be scoreable on the final new forms). Reliability can be thought of as a measure of the consistency, stability, and accuracy of the scoring. A test with high reliability will produce similar scores for students if they were to retake the test without further instruction or time passing. Overall, the reliabilities for the new Universal Screeners are very strong. At the tails where there are fewer students taking the forms (specifically 10th grade math), the reliabilities are a bit lower. The reliabilities are affected by the distribution of the scores and the students who took the test forms. It is expected that with additional test takers and more consistent usage of the Screeners for those forms, we would see improved reliability for those forms where the reliability is currently a bit weaker than for other forms.
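The report does not state which reliability coefficient was computed. As an illustration only, the sketch below uses KR-20, a common internal-consistency coefficient for forms made of right/wrong items; the choice of KR-20, the function name, and the sample data are all assumptions rather than the method behind Table 10.

```python
def kr20(score_matrix):
    """KR-20 reliability for a students x items matrix of 0/1 scores.

    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / var(total)),
    using the population variance of the total scores.
    """
    n_students = len(score_matrix)
    n_items = len(score_matrix[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have no variance")

    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in score_matrix) / n_students
        sum_pq += p * (1 - p)

    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Made-up scores for four students on three items.
    demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(kr20(demo))
```

The formula makes the dependence on the score distribution visible: when few students take a form or their total scores cluster together, the total-score variance shrinks and the coefficient drops, which is consistent with the lower values reported for the less-used forms.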

Table 10: Form Reliability

Form   Math   Reading
KA     0.78   0.69
KB     0.88   0.86
1A     0.96   0.96
1B     0.77   0.82
2A     0.98   0.97
2B     0.87   0.97
3A     0.96   0.98
3B     0.97   0.97
4A     0.95   0.96
4B     0.91   0.89
5A     0.95   0.97
5B     0.93   0.93
6A     0.91   0.97
6B     0.92   0.94
7A     0.80   0.92
7B     0.89   0.93
8A     0.85   0.94
8B     0.85   0.93
9A     0.77   0.8
9B     0.51   0.75
10A    0.66   0.84
10B    0.48   0.82

The items were calibrated within subject across all grades and anchored to the current item pools. This was done to evaluate whether the items fit reasonably within the pool and whether changes to the vertical scaling were warranted. Given the consistency of the new forms with the current forms, it is recommended that the current scaling and reporting be continued. This will allow for longitudinal reporting in the system without changes to the system or increased complexity for teachers in interpreting the results and making decisions. The item level logit and fit data from the vertical scaling are included with the item level statistics in Appendix D.

The updated Universal Screener forms can be seamlessly put into production as planned and can continue to be used as an integral component of the complete Classworks system.
