GAML4/REF/14

Measurement Options for Development of Sustainable Development Goal Indicator 4.2.1

Memo, Global Alliance to Monitor Learning, Taskforce on Target 4.2

Prepared by Hirokazu Yoshikawa, Abbie Raikes, and Alice Wuermli¹

June 2017 DRAFT

In this memo we describe options for developing a measurement strategy at the global level for Target 4.2, and specifically Indicator 4.2.1 (“Proportion of children under 5 years of age who are developmentally on track in health, learning and psychosocial well-being, by sex”). Goals for such a strategy include development of a measure that is appropriate for cross-country comparison of levels of development in each of these major domains (health, learning and its subdomains, and psychosocial well-being); feasible to use in national monitoring and evaluation, in terms of cost and human capital resources (e.g., training intensity and ease in achieving reliability); vertically equitable in its units² across the span of birth through 60 months; and psychometrically sound in reliability and validity from standpoints of both classical test theory and other approaches such as item response theory (IRT).

¹ Yoshikawa and Wuermli, NYU Global TIES for Children Center; Raikes, University of Nebraska Medical Center. We thank Elise Legault, Baela Raza Jamil, select members of the Taskforce on Target 4.2 of GAML, and Ivelina Borisova for comments on previous drafts.
² I.e., for a particular construct, a unit at one point of the scale’s distribution is comparable to a unit at another point of the scale’s distribution.

Our memo has three parts: 1) three options for a global measurement strategy; 2) weighing options at the intersection of validity and use; and 3) options for next steps.

A. Three Options for A Global Measurement Strategy.

Option 1. An existing measure could be chosen without adaptation as a single global measure for Indicator 4.2.1.

The SDGs whenever possible aim to achieve global indicators that are comparable across countries. In contrast to the MDGs, the countries in question include all UN member nations, and thus cut across high-, middle- and low-income countries. The choice to select a single global indicator for 4.2.1 is thus quite formidable; and setting a goal by 2030 of a single assessment, drawn from previous learnings across other measures that have been analyzed for validity across multiple countries, is important to weigh carefully.

What domains of child development should such a measure assess? The language for Indicator 4.2.1 reflects a global consensus in the field of early childhood development regarding the
multi-domain nature of development in the first years of life. Domains of physical, cognitive, language, numeracy, and socio-emotional development are typically inter-related, yet distinct, within the age range covered (birth through 60 months). Although great variability occurs in the nature of behaviors and skills in these overall domains of development, both within and across countries, the consensus concerning the meaningfulness of these domains in contexts of national ECD policy and planning has been shown across multiple regions and nations (e.g., Kagan & Britto, 2005).

The inclusion of multiple domains including physical, cognitive, learning (e.g., language and numeracy) and socio-emotional domains represents a relatively strong consensus in measures that have recently been assessed across multiple nations (Raikes & Anderson, 2017). These include instruments based on adult/caregiver report, such as the EDI (Janus & Offord, 2007), the CREDI (McCoy et al., 2017) and the UNICEF Early Childhood Development Index (Bornstein et al., 2012; McCoy et al., 2016); as well as instruments that directly assess children, such as the PRIDI (Verdisco, Cueto, & Thompson, 2016), IDELA (Wolf et al., 2017), EAP-ECDS (Rao et al., 2014), the MELQO MODEL measure (UNESCO, 2017), and others.

Despite this range of recent efforts to measure, in coordinated fashion, multiple domains of early childhood development, currently no consensus measure exists for Indicator 4.2.1 that is measured across a large number of countries (across, e.g., low-, middle- and high-income countries) and meets other criteria for a Tier I indicator of the UN Statistical Commission (2016, 2017). Thus, although Option 1 would be ideal if all conditions were met for feasibility, relevance and validity across countries, in the current context of the SDG indicators, there is no alternative that meets these criteria. As discussed below, Option 3 would work towards creating such a single criterion measure in the future.

Option 2. Use an existing common set of items or identify a set of anchor items to integrate into national and regional assessments.

At present, there is a range of measures that have been developed and tested within countries and regions. An overview of these measures appears in the first background paper (Anderson & Raikes, 2017). Many countries have also expressed the desire to build (either by adapting or creating new) nationally-specific measures to promote ongoing monitoring of child development in a manner that is aligned with national standards and cultural expectations. Below we present two ideas on how common item sets or anchor items could be used in global measurement.

Common Outcome Sets. In various fields, the use of Common Outcome Sets (COS’s) has been implemented to establish common sets of measures or items across a set of evaluation or other research studies (e.g., Gershon et al., 2013; Schmitt et al., 2015; Williamson et al., 2012). Across these initiatives, a typical multi-phase process includes the following. First, a consensus group of experts and practitioners/policy leaders is brought together to establish agreement on the constructs that will constitute a measurement domain. Second, criteria for measures that may be considered as candidates to contribute items or entire scales/assessments are agreed upon. Third, an inventory of measures meeting these criteria is assembled, and common items or tasks are identified. Fourth, depending on the initiative, a single “consensus” measure may be developed (some aspects of which may be newly developed, with others drawn from existing measures). Fifth, phases of pilot testing, psychometric analyses, and revision may occur iteratively until a final consensus measure is agreed upon. Finally, a measure and its guidelines for administration may be disseminated to a wide range of potential users, with continued input and refinement as the measure enters general use.

In the area of Indicator 4.2.1, a recent initiative to develop a common set of items was carried out as part of the Measuring Early Learning and Quality Outcomes project (MELQO; UNESCO, 2017). MELQO began with the intent of clarifying if one measure would be sufficient for measurement in all countries (Option 1), but moved quickly in the direction of Option 2. Option 2, finding a common item set, was desirable for two main reasons: 1) it would allow countries to build on existing measures that were already developed and validated in each region; and 2) it would allow a greater degree of flexibility to add more culturally-responsive, nationally-specific items that are not possible to include when relying on only one measure. Ultimately the common item set was distilled into a single new measure named the MODEL, covering developmental domains of social, cognitive, language and literacy, numeracy, and executive function. Cross-country analyses are underway on this measure (Raikes et al., 2017).

Measures harmonization. Multiple existing measures may be harmonized using some form of standardization to allow for scoring on a common scale. Two approaches are outlined below (but others may be relevant): 1) crosswalk samples; and 2) identification of anchor items. The process of harmonizing across measures is typically done through identifying common items that can help link the different assessments (“anchor items”) (Chan et al., 2015), and/or by administering multiple measures on the same sample to synchronize measurements and establish the basis for comparing children’s learning and development on the same scale, but with data collected through different measures (“crosswalk samples”). To investigate the feasibility of this option, either multiple datasets with a set of common items are needed (“anchor items”) or multiple measures must be administered to the same children (“crosswalk sample”), so that calibration across different measures can take place. For example, anchor items could be created from those measures used in multiple countries where certain items have shown evidence of cross-country invariance. It would then be necessary to ensure that this scale works in similar ways across countries, a related but distinct step in building international comparability.
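
As a minimal sketch of how anchor items can place two separately calibrated instruments on a common scale, the Python example below applies mean-sigma linking, one standard IRT linking technique. The memo does not prescribe a specific linking method, so this choice is an assumption made for illustration. The function names and all item difficulty values are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_link(anchor_b_from, anchor_b_to):
    """Return (slope, intercept) so that b_to is approximately slope * b_from + intercept.

    Both arguments hold difficulty estimates for the SAME anchor items,
    each estimated separately on its own instrument's scale.
    """
    if len(anchor_b_from) != len(anchor_b_to) or len(anchor_b_from) < 2:
        raise ValueError("need at least two paired anchor-item difficulties")
    # Match the spread, then the centre, of the anchor difficulties.
    slope = pstdev(anchor_b_to) / pstdev(anchor_b_from)
    intercept = mean(anchor_b_to) - slope * mean(anchor_b_from)
    return slope, intercept

def to_common_scale(values, slope, intercept):
    """Apply the linear linking transformation to scores or difficulties."""
    return [slope * v + intercept for v in values]

# Hypothetical anchor-item difficulties (illustrative values only).
b_national = [-1.0, 0.0, 1.0]    # anchor items as calibrated in a national measure
b_reference = [-1.5, 0.5, 2.5]   # the same items as calibrated in a reference measure

slope, intercept = mean_sigma_link(b_national, b_reference)

# Child ability estimates from the national measure, moved to the reference scale.
theta_national = [0.25, -0.75]
theta_common = to_common_scale(theta_national, slope, intercept)
```

With these illustrative values the slope is approximately 2 and the intercept 0.5. The transformation is only as trustworthy as the anchor items themselves, which is why evidence of cross-country invariance for those items matters before any linking is attempted.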

Crosswalk samples. Crosswalk samples (single samples that incorporate assessment of multiple instruments) are useful for facilitating decisions about what to include and exclude from particular instruments in developing a single set of common items or reduced consensus assessment. This is because in a single sample, multiple alternative measures are assessed, allowing for direct calculations of correlations among measures, and of differences in predictive or other forms of validity, that do not confound sample with measure. This approach has been used recently in a study that aimed to harmonize multiple measures of depression and subjective health among older adults (Gatz et al., 2015).
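
The following Python sketch illustrates what a crosswalk sample makes possible: because the same children complete both measures, the correlation between measures can be computed directly and scores from one measure can be placed on the scale of the other. It uses simple linear (mean and standard deviation) linking, which is an assumption chosen for brevity rather than the procedure used in the cited study. The function names and score values are hypothetical.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

def linear_link(scores_from, scores_to):
    """Return a function mapping a score on one measure to the other's scale.

    The mapping matches the mean and standard deviation observed in the
    crosswalk sample, where every child has a score on both measures.
    """
    slope = pstdev(scores_to) / pstdev(scores_from)
    intercept = mean(scores_to) - slope * mean(scores_from)
    return lambda score: slope * score + intercept

# Hypothetical crosswalk sample: five children assessed with both measures.
measure_a = [10, 12, 14, 16, 18]
measure_b = [31, 34, 41, 44, 50]

r = pearson_r(measure_a, measure_b)          # agreement between the two measures
a_to_b = linear_link(measure_a, measure_b)   # crosswalk from measure A to measure B
linked = [a_to_b(s) for s in measure_a]      # measure A scores expressed on B's scale
```

A high correlation supports treating the two measures as tapping the same construct. A real crosswalk would need a much larger sample and would also compare validity evidence across the measures, as described above.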

Anchor items. A challenge is how to identify anchor items in the absence of any universal measure or indicator. For example, a recent effort in the United States to harmonize state-level standardized assessments relied on an anchor measure that is administered across the country, namely the National Assessment of Educational Progress (NAEP). Based on the distribution of districts on that national assessment, a standardization procedure was used to link the state-level assessments (Reardon, Kalogrides, & Ho, 2016). In the absence of a criterion or audit test such as the NAEP in the U.S. example, a mix of conceptual and traditional empirical criteria from both classical test theory and other approaches such as item-response analysis may be utilized. However, variation in task requirements, item language, assessors across datasets, order of items, and response categories all create daunting differences in sources of measurement error even when considered within domain (e.g., language or numeracy).

Option 3. Create a new universal “criterion” scale of child development against which many other possible measures could be placed.

The development of a new universal criterion scale could proceed following established procedures somewhat similar to the ones that led to the MELQO, IDELA or regional measures, but with learnings synthesized from all of them. It could start with the two leading initiatives in this field, which are the UNICEF ECDI (longstanding, for 3-6 year olds) and the WHO consortium instrument for 0-3 year olds (more recently developed). The first step of pooling items from measures used in multiple countries, categorizing them by outcome domain, and compiling information on validity studies, samples, and countries, was also recently completed in the first phase of work of the MELQO project (UNESCO, 2017). Across these initiatives and others, ECD measurement analyses have advanced to cross-country invariance analyses on some existing measures. Thus some information is emerging both on the psychometric structure of multi-domain assessments of child development within countries and on whether these measures function well enough across countries to permit comparisons. Such information could be used in the process of creating a new “criterion” scale across the full range from birth to age 5 or 6 that would advance the field towards testing a single measure.

Two examples can illustrate the efforts to create a single criterion scale. WHO is leading a consortium to create a single criterion scale for 0-3 year olds. UNICEF, with its infrastructure for conducting multiple nationally representative samples in a sustained manner across years, could adapt and extend its ECDI measure (thus far for 3-6 year olds) with input from the initiatives of the past decade that have led the field towards cross-country comparable measures. Such an effort could integrate the WHO scale on some constructs that may be suitable for measurement across the 0-3 and 3-6 year old age ranges. The advantages of this process include the leveraging of resources for large samples at the country level for many countries that may not otherwise currently have these resources; experience in a range of regions; and the large number of LMICs within which the MICS is currently fielded.

We note that the process of creating a single criterion measure could also benefit from the experience of the creation and revision of the PISA cross-national assessments or others such as TIMSS, PIRLS, etc. (and currently the supplementation of the PISA with the PISA for Development, aimed at LMIC use). This is particularly relevant to the extension to rich countries required in any development of a new criterion measure. In the PISA development process, for example, initial wide-ranging expert consensus on a measurement framework at the level of constructs occurred, followed by convening panels of experts to develop items in specific domains; phases of pre-piloting in multiple countries with relatively small samples to ascertain meaning of items and variation in response to tasks, assessors, and administration formats; more systematic piloting of items across countries; item revision and selection for large-scale piloting; and finally nationally representative administration across countries (OECD, 2000a, 2000b). Many of these steps have been carried out in the ECD work of the MELQO initiative and others, but not all.

However, there are challenges facing such an effort as well, which should be taken into account. They include:

1) The need to continue to support country-level adaptation processes. For country-level use, the stakeholder process to build consensus towards national measurement of early childhood assessment for monitoring purposes and to inform policies in areas such as quality improvement and teaching and learning can and should be comprehensive. A single criterion measure can be used but could also be supplemented, for example, in particular countries with culturally specific constructs of child development that are relevant to goals for children’s learning, behavior and development. Some countries may choose to use their own measures, and this is supported within the SDG process.

2) The definition of “on track” and “off track.” No current widely used early childhood development measure among those mentioned in this document has established cutoffs for on vs. off track. This is in part because these are not designed as screening measures. However, the development of national norms can be done without expectation of a cross-country, uniform definition of on and off track. Technical work to establish a consensus on this process within and across countries is necessary in the field of ECD.

3) The unprecedented range of country contexts. None of the current initiatives or existing measures in the field of ECD have been widely administered or analyzed across both LMIC and rich-country contexts. Should a single measure be developed from the basis of the UNICEF ECDI and other existing measures with cross-country data, it will be vital to consider cross-country measures that have been fielded in rich countries, including current initiatives of the OECD, the EU, and other entities. Some of the ECD measures recently fielded in multiple LMICs are starting to be applied in rich countries; these learnings should also be integrated.

4) Need to include both caregiver/adult report and direct child assessment. A consensus is building in the ECD field that measurement of some domains of development benefits from the integration of information from adults who spend substantial time with children (caregivers/parents; teachers) and direct child assessment. For example, adults familiar with children’s behaviors in home and/or care settings may be in a better position to observe low-frequency behaviors such as aggression than can be assessed in an assessor-administered task. Conversely, direct assessments may be more appropriate when certain skills are not ones that adults in children’s lives are used to noticing, but may nevertheless be predictive of later outcomes (e.g., aspects of executive function), or when complex skills benefit from standard stimuli (e.g., comprehension of a sentence). It is undeniable that direct child assessment is more costly, with training to reliability more difficult at scale than with more straightforward survey-based measures. However, options such as random subsamples, within a larger nationally representative sample, for direct child assessment modules should be considered. This approach may reduce the overall costs of adding a direct child assessment portion to an adult-reported measure in a nationally representative sample.

5) A measure that vertically equates some domains of development across wider age ranges than 0-3 and 3-6. The vertical equating required to achieve measures of some areas of development that are meaningful to measure across the full age span is very challenging, given the rapidity of development in these years and the qualitative changes in skills that occur, not just quantitative. Yet existing measures collected across countries have for the most part been restricted to the 0-3 vs. the 3-6 year old age range. The integration of the WHO consortium on 0-3 measurement and current efforts in the 3-6 year old age range would be critical for this effort. It is likely that only some constructs of development are suited to integration across 0-3 and 3-6.

6) Alignment with later learning targets and indicators in SDG 4. The continuum of learning and development stretches from birth to adulthood. Alignment of 4.2.1 with indicator 4.1.1, in particular (and especially the grade 2 or 3 indicator), is important to enable nations to track how learning unfolds in the first 8 years of life.

7) Alignment with other SDG targets and indicators. The alignment of SDG 4.2.1 with other goals, targets and indicators in the areas of health, mental health, nutrition, and child protection is signaled in the wording of 4.2.1, which (unlike the primary schooling indicators) integrates health and psychosocial well-being. Such alignment can move beyond health and psychosocial well-being to consider relationships with other SDG indicators outside of Goal 4.

8) Integration of member nation input into development of a criterion measure. The input of UN member nations into the development of the SDGs, including Target 4.2, was unprecedented in history. Continued input into the development of a criterion measure for 4.2.1 is vital for ultimate use of the measure at the country level and the global processes to track SDG 4.
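
The random-subsample option raised under challenge 4 can be sketched in a few lines of Python. This is a simplified illustration, assuming a simple random draw from a list of sampled children and flat per-child costs; a real survey would draw the subsample within its stratified, clustered design and adjust the weights. The function names, sample size, subsample fraction, and unit costs are all hypothetical.

```python
import random

def draw_subsample(child_ids, fraction, seed=0):
    """Draw a simple random subsample of children for the direct-assessment module."""
    rng = random.Random(seed)              # fixed seed so the draw is reproducible
    k = round(len(child_ids) * fraction)
    return sorted(rng.sample(child_ids, k))

def survey_cost(n_report, n_direct, cost_report, cost_direct):
    """Total field cost: caregiver report for n_report children, direct assessment for n_direct."""
    return n_report * cost_report + n_direct * cost_direct

# Hypothetical figures, not estimates from any actual survey.
child_ids = list(range(1, 5001))           # 5,000 children in the full sample
subsample = draw_subsample(child_ids, fraction=0.2, seed=42)

cost_report, cost_direct = 4.0, 15.0       # assumed cost per child for each module
full_cost = survey_cost(len(child_ids), len(child_ids), cost_report, cost_direct)
sub_cost = survey_cost(len(child_ids), len(subsample), cost_report, cost_direct)
```

Under these assumed figures, assessing every child directly costs 95,000 units, while assessing a 20 percent subsample costs 35,000. The trade-off is lower precision for estimates that depend on the direct-assessment module.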

B. Assessing our options: Which measurement strategy maximizes both validity and use?

A single organizing question to help in assessing these options might be: What approach maximizes validity, feasibility and productive national and cross-national use to inform policy and practice? To answer such a question, agreement on the meaning and evidence to support validity must be a starting point.

What does “validity” mean in cross-national measurement of early childhood development?

A central goal of assessment in the fields of child development and education is to achieve measurement with evidence of validity. Current notions of validity consider it a unitary construct supported by evidence in the context of use. With the new demand for policy-relevant data, the types of evidence that should be weighed in assessing validity include but go beyond older conceptualizations of validity (for example, the trio of content, criterion-related, and construct validity and their subtypes; Cronbach & Meehl, 1955). “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999). Note that this overall single conception of validity supersedes the subtypes of content, criterion-related and construct validity.

Five kinds of evidence have been put forward to support validity according to this definition (Goodwin & Leech, 2003). These include many of the traditional subtypes of content, criterion-related and construct validity.

First, evidence based on test content requires that expert consensus be achieved on the match between item and task content and the construct that is being measured, and whether the content reflects bias or differential match between content and construct for particular groups (e.g., as defined by gender, language, culture, etc.). In order to maximize this kind of evidence, the “group consensus” approach in test development and refinement, as well as testing explicitly for bias through qualitative research, cognitive testing, understanding of variation in experience of the testing context, analyses to test for measurement invariance across diverse populations and other methods can be employed.

Second, evidence based on responses of test takers examines potential unintended responses such as those based on social desirability, unfamiliarity with the administration approach or testing context. This form of evidence in the case of measures of Indicator 4.2.1 includes, for example, analyzing reasons for non-response that go beyond lack of underlying ability, to discomfort, anxiety, or response to the assessor and setting.

Third, evidence on internal structure taps whether the components of a measure adequately reflect the subdomains of a construct. In the case of the multi-domain measures of early childhood development, this includes whether the instrument adequately reflects physical, learning, and psychosocial domains. The learning domain is often considered to include potential subdomains such as language/early literacy; numeracy, spatial and quantitative skills; and executive function or approaches to learning. This kind of evidence is most often analyzed through exploratory and confirmatory factor analyses.

Fourth, evidence based on relations to other measures includes traditional forms of criterion-related and construct validity, such as concurrent, predictive, convergent, and discriminant validity. Such evidence also includes the important criterion of sensitivity to intervention. This is particularly important in the SDG 4.2 context, as the overall target links early childhood development to the quality of policies and programs that support it.

Finally, evidence based on the consequences of testing focuses centrally on uses of the assessment. The ability of a measure to inform practice and policy involves not only feasibility of administration at large scale with regular periodicity, but also the links to practice and policy decision-making. In this regard, the MELQO initiative, building on prior efforts, proposed that both measurement of quality of early learning environments and measurement of early childhood development outcomes were necessary to most powerfully inform policy and practice (UNESCO, 2017).

C. Next Steps and Questions to Guide Discussion

The SDGs provide a unique opportunity for building a global ECD measurement strategy that significantly enhances the reliability, feasibility and comparability of existing ECD data. Moving forward on any of the strategies outlined above will require a greater degree of systematic data collection, coordination among measures developers and experts, and input from stakeholders than has taken place in the past. Below we have outlined options for pursuing consensus on these strategies:

- Next Step 1. Building on prior consensus work, begin by agreeing on a general conceptual framework (e.g., of early childhood development domains) to guide measurement. This will serve as the groundwork for the next steps of clarifying what could be measured across countries, and where existing data may be available to help inform decisions on constructs and items. Several such consensus meetings have occurred recently; however, there are still areas of lack of consensus such as some of the eight areas of challenges listed above.

- Next Step 2. Address questions of validity in light of the strong policy emphasis of SDG-related data, the unique aspects of early childhood development relative to later phases of learning, and the need to ensure equity and cultural relevance across a range of countries. A convening on technical standards and guidelines for use, synthesis and harmonization of data from existing measures; standards for cross-country comparability of the option of a single criterion measure; and the criteria for dimensions of validity in such an effort is critical. Engaging a wide range of psychometricians, researchers and other expert stakeholders who can help to assess the feasibility of each approach, including the cost and coordination requirements for pursuing each of the options, is required. No convening to date, for example, has brought together the small group of psychometric experts who have worked on cross-country analyses of existing ECD measures. Building a technical consensus for the next phase of work would be an important step in the next advances towards global measurement of SDG Indicator 4.2.1.

Questions to Guide Discussion:

1) Among Options 1, 2 and 3, which seem feasible in the short run (next 12-18 months)? In the medium run (1.5-3 years)? In the longer run (3-5 years)?

2) What is the best plan for making progress on the Next Steps noted immediately above in this section? Are any important Next Steps missing?

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Educational Research Association.

Anderson, K., & Raikes, A. (2017). Key measurement questions for Indicator 4.2.1 (Discussion paper for GAML Taskforce 4.2).

Bornstein, M. H., Britto, P. R., Nonoyama-Tarumi, Y., Ota, Y., Petrovic, O., & Putnick, D. L. (2012). Child development in developing countries: introduction and methods. Child Development, 83(1), 16-31.

Chan, K. S., Gross, A. L., Pezzin, L. E., Brandt, J., & Kasper, J. D. (2015). Harmonizing measures of cognitive performance across international surveys of aging using item response theory. Journal of Aging and Health, 27(8), 1392-1414.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281.

Gatz, M., Reynolds, C. A., Finkel, D., Hahn, C. J., Zhou, Y., & Zavala, C. (2015). Data harmonization in aging research: Not so fast. Experimental Aging Research, 41(5), 475-495.

Gershon, R. C., Wagster, M. V., Hendrie, H. C., Fox, N. A., Cook, K. F., & Nowinski, C. J. (2013). NIH toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Supplement 3), S2-S6.

Goodwin, L. D., & Leech, N. L. (2003). The meaning of validity in the new standards for educational and psychological testing. Measurement and Evaluation in Counseling and Development, 36(3), 181-192.

Jacobusse, G., Van Buuren, S., & Verkerk, P. H. (2006). An interval scale for development of children aged 0–2 years. Statistics in Medicine, 25(13), 2272-2283.

Janus, M., & Offord, D. R. (2007). Development and psychometric properties of the Early Development Instrument (EDI): A measure of children's school readiness. Canadian Journal of Behavioural Science, 39(1), 1-22.

Kagan, S. L., & Britto, P. R. (2005). Going global with indicators of child development. New York: UNICEF.

McCoy, D. C., Peet, E. D., Ezzati, M., Danaei, G., Black, M. M., Sudfeld, C. R., ... & Fink, G. (2016). Early childhood developmental status in low- and middle-income countries: national, regional, and global prevalence estimates using predictive modeling. PLoS Med, 13(6), e1002034.

McCoy, D. C., Sudfeld, C. R., Bellinger, D. C., Muhihi, A., Ashery, G., Weary, T. E., ... & Fink, G. (2017). Development and validation of an early childhood development scale for use in low-resourced settings. Population Health Metrics, 15(1), 3.

OECD (2000a). Measuring student knowledge and skills: The PISA 2000 assessment of reading, mathematical and scientific literacy. Paris: Author.

OECD (2000b). PISA 2000: Technical Report. Paris: Author.

Raikes, A., & Anderson, K. L. (2017). Key measurement questions for SDG 4.2.1 (discussion paper). Montreal: UNESCO Institute for Statistics, Global Alliance to Monitor Learning, Target 4.2 Task Force.

Rao, N., Sun, J., Ng, M., Becher, Y., Lee, D., Ip, P., & Bacon-Shone, J. (2014). Validation, Finalization and Adoption of the East Asia-Pacific Early Child Development Scales (EAP-ECDS). UNICEF, East and Pacific Regional Office.

Ravens-Sieberer, U., Erhart, M., Rajmil, L., Herdman, M., Auquier, P., Bruil, J., ... & Mazur, J. (2010). Reliability, construct and criterion validity of the KIDSCREEN-10 score: a short measure for children and adolescents’ well-being and health-related quality of life. Quality of Life Research, 19(10), 1487-1500.

Reardon, S. F., Kalogrides, D., & Ho, A. D. (2016). Linking U.S. school district test score distributions to a common scale, 2009-2013. Palo Alto, CA: Stanford Center for Education Policy Analysis.

Schmitt, J., Apfelbacher, C., Spuls, P. I., Thomas, K. S., Simpson, E. L., Furue, M., ... & Williams, H. C. (2015). The Harmonizing Outcome Measures for Eczema (HOME) roadmap: a methodological framework to develop core sets of outcome measurements in dermatology. Journal of Investigative Dermatology, 135(1), 24-30.

UN Statistical Commission (2016). Provisional proposed tiers for global SDG indicators. New York: Author.

UN Statistical Commission (2017). Revised list of global Sustainable Development Goal indicators. New York: Author.

UNESCO (2017). Overview: Measuring Early Learning and Quality Outcomes (MELQO). Paris: Author.

Verdisco, A., Cueto, S., & Thompson, J. (2016). Early Childhood Development: Wealth, the Nurturing Environment and Inequality First Results from the PRIDI Database. Inter-American Development Bank.

Williamson, P. R., Altman, D. G., Blazeby, J. M., Clarke, M., Devane, D., Gargon, E., & Tugwell, P. (2012). Developing core outcome sets for clinical trials: issues to consider. Trials, 13(1), 132.

Wolf, S., Halpin, P., Yoshikawa, H., Pisani, L., Dowd, A. J., & Borisova, I. (2017). Assessing the construct validity of Save the Children’s International Development and Early Learning Assessment (IDELA). Manuscript under review.
