16
Longitudinal Predictions Using Regression-Corrected Grouping to Reduce Regression to the Mean S. Stephens and M. Marder Department of Physics The University of Texas at Austin Coarse-graining procedures make it possible to model the movement of large numbers of objects, such as particles in a fluid. Longitudinal student performance data can be modeled similarly by sorting students into score bins and following the flow of scores through time: trajectories depict the average scores over time for initial score bins and streamlines provide an approximate way to calculate the flow of student scores over many years based on only two consecutive years of data. However, due to the partially stochastic nature of observed scores, the coarse-graining procedure that sorts students into score bins amplifies a statistical phenomenon known as regression to the mean (RTM). As a result, streamlines do not provide an accurate prediction for the future per- formance of students. Here we discuss a new coarse-graining procedure, regression-corrected (RC) grouping, which reduces RTM in the streamlines. We apply RC streamlines to the Texas State Lon- gitudinal Data System, which contains standardized testing data for students throughout primary and secondary school since 2003. We show that the RC streamlines accurately predict trajectories, using two or three years of data. Therefore RC streamlines can be used to identify the eects of academic interventions on a time scale comparable to that of policy changes. We illustrate this assertion by examining a particular policy intervention, Texas’ Student Success Initiative. I. INTRODUCTION The No Child Left Behind Act of 2001 (NCLB) mandated widespread, high-stakes standardized testing for primary and secondary students through- out the United States [1]. Struggling students may be required to receive additional instruction to pass these exams and failure can result in grade re- tention [2]. Teachers evaluations and placements are often determined by the performance gains of their students on standardized exams [3–5]. Schools can be closed or restructured due to persistent low scores, whether overall or for disaggregated sub- groups [6]. These accountability measures for stu- dents, teachers, and schools create a strong incentive for policy makers to find interventions that raise stu- dent performance. However, evaluating the conse- quences of legislative actions is dicult as the poli- cies aect large numbers of students and may last only as long as an election cycle, whereas the impacts of the policies on valued metrics, such as high school graduation rates and college readiness, require up to 13 years to reach fruition. A common way to analyze longitudinal datasets is to use hierarchical linear modeling or struc- tural equation modeling to model outcomes with respect to observed variables or unobserved latent variables [7]. Marder and Bansal (2009) [8] and Bendinelli and Marder (2012) [9] developed an al- ternative approach that makes use of standard ideas from statistical mechanics. Trajectories can repre- sent the position of any extended object over time. An example is the parabolic trajectory of a projec- tile. This is a useful idea especially when the object is composed of many correlated sub-units. It can be extended to fluid systems by coarse-graining the fluid at some point in time and following the motion of packets of fluid as the flow evolves. This provides a model for student score trajectories in an education context, showing average observed scores over time. A coarse-grained fluid is also frequently described by a velocity vector field that depicts the average velocity of particles in a fluid in each coarse-grained cell. Streamlines can be interpolated from the veloc- ity vector field to show the anticipated movement of a particle throughout the space, given some initial position. Similarly, in the context of student scores, streamlines can be interpolated from a vector field that represents the change in score for each grade transition. Thus, streamlines depict the anticipated flow of scores over time. For fluids, streamlines are good approximations to trajectories when the flow is steady. In the education context, the testing en- vironment must be stable for streamlines to provide a good estimate of trajectories, but this condition is only necessary and not sucient. Instead of analyzing individual student outcomes or the average outcomes for the aggregation of all students, outcomes are evaluated after coarse- graining into score bins by sorting students into groups by their percent score (90-100%, 80-90%, etc.) in some initial grade. When dealing with stu- dent scores, we will use the terms ‘coarse graining’ and ‘binning’ interchangeably. A consequence of the partially stochastic nature of observed scores and the distribution of observed scores is the regression of the scores toward the mean score. This is exacerbated when students are binned

Longitudinal Predictions Using Regression-Corrected

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Longitudinal Predictions Using Regression-Corrected

Longitudinal Predictions Using Regression-Corrected Grouping to Reduce

Regression to the Mean

S. Stephens and M. MarderDepartment of Physics

The University of Texas at Austin

Coarse-graining procedures make it possible to model the movement of large numbers of objects,such as particles in a fluid. Longitudinal student performance data can be modeled similarly bysorting students into score bins and following the flow of scores through time: trajectories depictthe average scores over time for initial score bins and streamlines provide an approximate way tocalculate the flow of student scores over many years based on only two consecutive years of data.However, due to the partially stochastic nature of observed scores, the coarse-graining procedurethat sorts students into score bins amplifies a statistical phenomenon known as regression to themean (RTM). As a result, streamlines do not provide an accurate prediction for the future per-formance of students. Here we discuss a new coarse-graining procedure, regression-corrected (RC)grouping, which reduces RTM in the streamlines. We apply RC streamlines to the Texas State Lon-gitudinal Data System, which contains standardized testing data for students throughout primaryand secondary school since 2003. We show that the RC streamlines accurately predict trajectories,using two or three years of data. Therefore RC streamlines can be used to identify the e↵ects ofacademic interventions on a time scale comparable to that of policy changes. We illustrate thisassertion by examining a particular policy intervention, Texas’ Student Success Initiative.

I. INTRODUCTION

The No Child Left Behind Act of 2001 (NCLB)mandated widespread, high-stakes standardizedtesting for primary and secondary students through-out the United States [1]. Struggling students maybe required to receive additional instruction to passthese exams and failure can result in grade re-tention [2]. Teachers evaluations and placementsare often determined by the performance gains oftheir students on standardized exams [3–5]. Schoolscan be closed or restructured due to persistent lowscores, whether overall or for disaggregated sub-groups [6]. These accountability measures for stu-dents, teachers, and schools create a strong incentivefor policy makers to find interventions that raise stu-dent performance. However, evaluating the conse-quences of legislative actions is di�cult as the poli-cies a↵ect large numbers of students and may lastonly as long as an election cycle, whereas the impactsof the policies on valued metrics, such as high schoolgraduation rates and college readiness, require up to13 years to reach fruition.A common way to analyze longitudinal datasets

is to use hierarchical linear modeling or struc-tural equation modeling to model outcomes withrespect to observed variables or unobserved latentvariables [7]. Marder and Bansal (2009) [8] andBendinelli and Marder (2012) [9] developed an al-ternative approach that makes use of standard ideasfrom statistical mechanics. Trajectories can repre-sent the position of any extended object over time.An example is the parabolic trajectory of a projec-tile. This is a useful idea especially when the object

is composed of many correlated sub-units. It canbe extended to fluid systems by coarse-graining thefluid at some point in time and following the motionof packets of fluid as the flow evolves. This providesa model for student score trajectories in an educationcontext, showing average observed scores over time.A coarse-grained fluid is also frequently describedby a velocity vector field that depicts the averagevelocity of particles in a fluid in each coarse-grainedcell. Streamlines can be interpolated from the veloc-ity vector field to show the anticipated movement ofa particle throughout the space, given some initialposition. Similarly, in the context of student scores,streamlines can be interpolated from a vector fieldthat represents the change in score for each gradetransition. Thus, streamlines depict the anticipatedflow of scores over time. For fluids, streamlines aregood approximations to trajectories when the flowis steady. In the education context, the testing en-vironment must be stable for streamlines to providea good estimate of trajectories, but this condition isonly necessary and not su�cient.

Instead of analyzing individual student outcomesor the average outcomes for the aggregation ofall students, outcomes are evaluated after coarse-graining into score bins by sorting students intogroups by their percent score (90-100%, 80-90%,etc.) in some initial grade. When dealing with stu-dent scores, we will use the terms ‘coarse graining’and ‘binning’ interchangeably.

A consequence of the partially stochastic natureof observed scores and the distribution of observedscores is the regression of the scores toward the meanscore. This is exacerbated when students are binned

Page 2: Longitudinal Predictions Using Regression-Corrected

2

according to their observed scores. Regression tothe mean (RTM) is present in both trajectories andstreamlines; however, the multiple binning processesin streamlines exaggerate RTM, resulting in inaccu-rate depictions of student scores over time.Here we present a new binning technique, called

regression-corrected (RC) grouping, which reducesRTM in the streamlines so that they accurately rep-resent the flow of student scores over time. RCstreamlines can follow a single cohort of studentsover time, or they can be constructed using an ac-celerated longitudinal design to predict longitudinaloutcomes with only two or three years of data. Inthis way, RC streamlines can predict longitudinaloutcomes within a time suitable for informing pol-icy; in addition, the comparison of streamlines fromdi↵erent periods can identify the e↵ects of interme-diate interventions.Section II summarizes some techniques in educa-

tion research that are currently used to analyze stu-dent performance over time. While RC groupingand streamlines could be used in many contexts, Sec-tion III discusses the setting used for this paper: theTexas State Longitudinal System and standardizedtesting in Texas. Section IV lays the foundation forthe RC streamlines by summarizing the trajectoryand streamline techniques described in Bendinelliand Marder (2012) [9]. Section V presents a simpletheory for understanding the consequences of RTM,specifically in regards to trajectories and stream-lines. Section VI describes the RC grouping tech-nique and uses the same theory to demonstrate howRTM is reduced. Section VII applies RC streamlinesto identify the e↵ects of a state-wide interventioncalled the Student Success Initiative, while SectionVIII explores additional applications including dis-aggregation by demographic factors, coarse-grainingby alternate exams, and future predictions of stu-dent performance. In Section IX, we summarize ourconclusions.

II. BACKGROUND

Longitudinal data analysis methods have been de-veloped in many fields to study both within-personchanges and between-person di↵erences in outcomesover time [10]. Within-person changes refer to out-comes for a single individual and are particularlyimportant for the analysis of educational policy im-pacts because they can establish a connection be-tween an intervention and the resulting changes forthe a↵ected individuals. Between-person di↵erencesare also important to study because they can showhow interventions a↵ect students on a larger scale,and they can shed light on the di↵erential impacts

of an intervention on di↵erent groups of students,which can lead to more e�ciently targeted or moreequitable interventions.

Most longitudinal studies utilize a statistical tech-nique known as hierarchical linear modeling [11, 12],or a similar technique called structural equationmodeling [13–15], to analytically describe both thewithin-person changes and the between-person dif-ferences in the longitudinal data. The analyticalmodel may use observed variables or unobservedlatent variables combined with fitted parametersto best represent the growth of the outcome vari-able. Individual growth curves represent the within-person component of the model, and person-specificparameters (often within a normal distribution ofvalues) that adjust the intercepts or slopes accountfor the between-person variation. The model oftentakes the form of a linear, polynomial, or piece-wisefunction, although it can have any non-parametricform.

This idea can be extended to grouped individu-als through techniques such as group-based trajec-tory modeling [16] and latent class growth model-ing [17]. These techniques assume that the popula-tion is mixed, containing several distinct groups thatare categorized by a latent variable. Using the longi-tudinal data, individuals are first sorted into groupsbased on similar initial growth patterns and thentheir average outcomes over time are modeled with atrajectory. Intervention e↵ects can be compared fortreated groups and control groups within the samepre-treatment growth trajectory. This technique re-quires several years of data both before and afteran intervention to establish comparison groups andthen to observe the intervention e↵ects.

The regression techniques in hierarchical linearmodeling or structural equation modeling are famil-iar to statisticians. However, the language, equa-tions, and coe�cients in these frameworks can bedi�cult to understand for policy makers and educa-tors. These techniques often involve computationalpackages such as HLM [18] and Mplus [19], whichcan act as a black box, potentially leading to misuseor misinterpretation. The techniques in this paperare designed to be more intuitive to policy makersand educators by using visualizations rather thanrelying on parametric solutions with tables of com-puted coe�cients. Nonetheless, the standard regres-sion methods provide a foundation and a source ofcomparison for the new methods described in thispaper.

Page 3: Longitudinal Predictions Using Regression-Corrected

3

A. Age-Period-Cohort E↵ects

We now develop some of the concepts neededin order to discuss educational data gathered overtime. Between-person di↵erences and within-personchanges in the data can be observed as any of threetime-related variations: age e↵ects, period e↵ects,and cohort e↵ects. Age e↵ects represent changesrelated to aging although in the context of educa-tion, age e↵ects could be substituted by grade ef-fects, changes that occur between each grade. Pe-riod e↵ects represent changes occurring during a spe-cific time period, a↵ecting people of all ages simi-larly. Cohort e↵ects represent formative experiences,changes that are unique to people who experiencethe same events at the same time, often because theywere born in the same time period. In an educationcontext, cohort e↵ects may relate to changes that af-fect a cohort of students progressing through schooltogether, completing one grade each year and grad-uating in the same year.It can be di�cult to separate age, period, and

cohort e↵ects from each other. This is due tothe inherent relationship between age, time, andcohort. For studies using birth cohorts—peopleborn in the same time period—the relationship be-tween age, year, and birth cohort is Y ear � Age =Birth Cohort. For students in primary or secondaryschool, the relationship between the year, grade,and graduating cohort is Y ear � Grade + 12 =High School Graduation Cohort. These relation-ships make it di�cult to design a study to isolategrade, period, or cohort e↵ects because it is di�-cult to control for more than one of these e↵ects.In addition, there may be several concurrent influ-ences, causing a combination of grade, period, andcohort e↵ects. Educational policy changes are usu-ally either period e↵ects or cohort e↵ects, dependingon the process of implementation and the intendedrecipients.To minimize or isolate the potential sources of

time-related change, studies may focus on a sin-gle time period or a single cohort. Studies analyz-ing data from one time period are known as cross-sectional models. By using data from only one time,period e↵ects are removed from the analysis and thetime required for data collection is minimized, anattractive feature for costly studies. Cross-sectionalmodels show the range of outcomes at a momentin time and can be helpful for identifying between-person variation. The students within a cross-section can be aggregated into a synthetic cohort,which contains students at every grade throughoutschool in a single year. If there were no cohort ef-fects, the outcomes from a synthetic cohort wouldaccurately identify grade e↵ects. However, without

other time periods for comparison, grade e↵ects andcohort e↵ects are completely confounded. Extend-ing the study longitudinally can help to separate thegrade e↵ects from the cohort e↵ects, although thiscan then introduce period e↵ects.By contrast, single-cohort studies use the longitu-

dinal data from a single graduating cohort of stu-dents to study within-person change [20]. Cohortstudies are helpful for studying the evolution of in-dividuals, especially considering that a person mayhave many experiences that build upon one anotherto cause a particular growth pattern. Single-cohortstudies require several data points for each individ-ual, which can take years of data collection and cansu↵er from attrition. In addition, while cohort stud-ies remove cohort e↵ects from the analysis, gradee↵ects and period e↵ects are confounded. By com-paring outcomes from multiple cohorts, grade e↵ectsand period e↵ects can be separated, while possiblyreintroducing cohort e↵ects.Accelerated longitudinal design (ALD) studies are

a compromise between cross-sectional studies andlongitudinal single-cohort studies [21]. ALDs usedata from multiple overlapping cohorts beginning atdi↵erent ages to span a large age range while usingonly a few years of data. For example, Miyazakiand Raudenbush used the National Youth Surveythat contained data for 7 adjacent cohorts over 5years to study the development of antisocial atti-tudes from ages 11 to 21 [22]. ALDs study growthover a large age/grade range without needing to waitthe full time period as in longitudinal studies. Thisreduces the cost of the study as well as the attri-tion due to missing data, while producing results ina time period that allows for more political influ-ence. The techniques discussed in this paper can becategorized as single cohort designs or ALDs.

III. SETTING

Although we expect the methods of this paper tohave general utility, the possibility of testing themis due to a unique resource provided by the TexasEducation Research Center (ERC), which holds ex-tensive longitudinal information about Texas stu-dents, teachers, and schools [23]. The Texas Edu-cation Agency (TEA) provides the ERC with muchof their student-level data, including demographicsand standardized testing information. Starting in2003, the state-wide exam was the Texas Assess-ment of Knowledge and Skills (TAKS), which con-sisted of annual mathematics and reading exams be-tween 3rd and 11th grade, as well as periodic examsin other subjects [24]. In 2012, the exams startedtransitioning to the State of Texas Assessments of

Page 4: Longitudinal Predictions Using Regression-Corrected

4

Academic Readiness (STAAR), which had mostlythe same schedule of exams before high school butreplaced grade-specific high school exams with end-of-course subject exams [25, 26]. To use a consistentmetric, we mostly limit the focus of this paper tothe TAKS mathematics exams (2003-2012), exceptfor some future predictions that we make with theSTAAR data.

In our research, we have decided to use the rawpercent scores instead of the scaled scores providedby the TEA. While, psychometrically, scaled scoresare preferable to raw scores when comparing the re-sults from multiple exams, we feel justified in us-ing raw scores for several reasons. First, the examswere compiled from banks of items that were de-signed and tested to be similar in content and di�-culty [27]. Each year the exams were designed to bevery similar, resulting in a consistent raw score met-ric. Second, the raw score conversion tables [28] wereexamined and the raw scores corresponding with apassing score within each grade for the mathematicsTAKS exams was fairly consistent; the score usu-ally varied by only one question, occasionally bytwo questions, and rarely by three (out of approx-imately 40-50 questions). The maximum possibleraw score within a grade and subject was consistentover time. Third, the TEA used a one-parameterlogistic function as the item-response theory [27],which resulted in a one-to-one mapping of the rawscores to scaled scores; in this model, students withthe same raw score have the same “ability”. Fourth,the iterative process of estimating the abilities us-ing the one-parameter logistic model requires pro-prietary software designed for item response the-ory calculations and student-level data for specifictest items that are often unavailable for researchers.Fifth, the TEA converted raw to scaled scores usingscaling constants with a proprietary method basednot only on item response theory, but also subjectto the constraints that a scaled score of 2100 be apanel-recommended passing score, and 2400 a panel-recommended commended score. Sixth, the TEA’sscaling method switched from a horizontal scalingmethod (comparable between years within grade andsubject) to a vertical scaling method (comparablebetween years and grades within subject) in 2008due to changes in the Texas Education Code [6]; thisresulted in a large shift in scaled score ranges, with-out the ability to track longitudinal progress before2008. Therefore, on the basis of simplicity, trans-parency, and consistency, we have chosen to use rawscore percentages as our metric.

IV. FOUNDATION

The methods detailed in Bendinelli and Marder(2012) [9] establish a foundation for the RC stream-lines developed in this paper. We briefly describethem again here to provide a background for themodifications that follow.

A. Trajectories

In physics, trajectories are used to represent theposition of a physical object over time. In an educa-tion context, trajectories can be used to representscores over time. Using a coarse-grained binningprocedure, students are sorted into score bins de-termined by their TAKS mathematics score in 3rdgrade, the earliest exam in primary school. Specifi-cally, the score bin limits are determined by the rawpercent scores (90-100%, 80-90%, etc.). Groupingstudents into decile score bins provides a compro-mise between an overwhelming number of individ-ual student trajectories and a single aggregate tra-jectory, which would fail to convey most of the infor-mation available in the data. Once the groups havebeen established, the average scores for each groupare plotted for each grade, up to the exit mathe-matics exams in 11th grade. Therefore, trajecto-ries are single-cohort studies, depicting the observedaverage scores for the score bins over time. Decilescore bins were chosen in part because they are fa-miliar and correspond to intuition about ‘A’, ‘B’,‘C’, ‘D’, and ‘F’ students. For a discussion of group-ing from a more general point of view, see Guthrie(2018).[29] Only by grouping students according toprior academic performance and other characteris-tics are we able to understand e↵ects of educationalpolicy changes, for these work di↵erently on studentsof di↵erent backgrounds. However, as we will see,the process of grouping creates a set of technicalproblems that we have set ourselves in this paperto resolve.Figure 1 shows the trajectories for the cohort of

students who graduated high school in 2012. Thethickness of each line is proportional to the numberof students in that score bin. The average scoresin each grade are connected by linear segments, al-though the connection does not need to be linear.The performance peaks in 5th and 8th grade willbe discussed in Section VII; otherwise, the trajecto-ries are fairly flat, without intersections. This is notsurprising when averaging thousands of students, asprior performance is the strongest indicator of futureperformance. Students could be further aggregatedby demographic variables or course taking, for ex-ample, which would likely result in more movement

Page 5: Longitudinal Predictions Using Regression-Corrected

5

FIG. 1. Trajectories for the cohort of students who grad-uated in 2012. The students are sorted into percent scoredeciles by their maximum 3rd grade mathematics TAKSscore in 2003. The average scores for these groups of stu-dents are then plotted for the subsequent grades in thefollowing years, and these average scores are connectedin linear segments. The thickness of the trajectory isproportional to the number of students in that group.There are approximately 250,000 students included inthis analysis.

in the trajectories (see Section VIII).Trajectory plots are of fundamental interest be-

cause they track the average performance of a co-hort of students at all grades and performance levelsexactly. Therefore, trajectories provide an accuratedepiction of longitudinal student performance on alarge scale and they can be used to analyze the out-comes of policies in the long term. However, policiestypically have a shorter duration than the nine yearsof data that it takes to construct a full trajectory, soeducational policy has likely already changed by thetime the results can be analyzed with this method.In addition, attrition can introduce bias to the sam-ple and this becomes more of an issue with longerstudies. It is therefore necessary to identify othertechniques that permit more timely analysis.

B. Streamlines

Streamlines are used in fluid mechanics to repre-sent the motion of particles in a fluid. Velocity vec-tor fields are constructed from the average velocity ofparticles at various positions in a fluid. Streamlinesbegin at an initial position and then follow the flowcreated by the velocity vector field, conveying theanticipated motion of a particle in the fluid from thestarting position. In an education context, vectorfields can be constructed from the change in scoreover time to represent the flow of scores through the

FIG. 2. Vector field of score changes and correspond-ing streamlines for the cohort of 2012. The students aresorted into new bins each grade, and the average changein score for each group is calculated from that grade tothe next. The slope of an arrow is equal to the changein score for that grade and bin. Streamlines are con-structed, beginning with the average score in 3rd gradefor each bin and then interpolating changes in score fromthe vector field to determine the slope of each segmentof the streamline. These streamlines should capture theflow of student scores throughout the grades, however,the evident convergence of the streamlines due to RTMprevents this method from producing more meaningfulresults.

grades. Streamlines then represent the anticipatedscores over time for a student with an observed ini-tial score.Streamlines can utilize a single cohort design or an

ALD. Cohort streamlines, as in Figure 2, use datafrom a single cohort of students as they progressthrough school. The students are sorted into scorebins in each grade, and the change in score is calcu-lated for those groups from that grade to the next.This di↵ers from a trajectory where the students aresorted in only the initial grade, as the students instreamlines are re-grouped into score bins in eachgrade. The changes in score for each score bin andgrade transition comprise a vector field from whichstreamlines can be interpolated. The streamlinesbegin at the average scores for each score bin in3rd grade and then the slope of each segment is de-termined by a linear interpolation function for thatgrade transition, which is derived from the observedscore changes (it does not need to be linear).Snapshot streamlines, as in Figure 3, use an ALD

with data from eight sequential cohorts, followingeach for two consecutive years. Snapshot stream-lines are constructed in the same way as the cohortstreamlines, except each grade transition is repre-sented by a di↵erent cohort of students. In the first

Page 6: Longitudinal Predictions Using Regression-Corrected

6

FIG. 3. Vector field and corresponding streamlines foreach grade transition between 2003 and 2004. The stu-dents in the synthetic cohort are sorted into groups ac-cording to their score in 2003, and the average change inscore for each group is calculated from 2003 to the sub-sequent grade in 2004. The slope of an arrow is equal tothe average change in score between the two neighbor-ing grades for that bin. The streamlines begin with theaverage score for each bin in 3rd grade and then followthe flow designated by the vector field. The convergenceis a result of RTM.

year, the students from each cohort are sorted intoscore bins in their grade and the score change iscalculated for those groups from that year to thenext, when the students have progressed to the nextgrade. Therefore snapshot streamlines represent thescore flow throughout the grades while using onlytwo years of data. Assuming that di↵erent cohortsperform similarly, snapshot streamlines can be usedto predict longitudinal results from a single cohort.

Figures 4 and 5 show the snapshot streamlines for2003-2004 and 2012 cohort streamlines, each com-pared to the trajectories for the cohort of 2012. It isevident in both cases that the streamlines are con-verging, di↵ering from the relatively flat trajectories.The trajectories directly plot the average observedscores, so the discrepancies between the trajectoriesand streamlines points to an issue with the stream-lines, as they do not accurately represent the flowof scores through the grades. This inaccuracy is dueto RTM. The next section presents a simple theoryto explain why the iterative score binning processis causing strong RTM in the streamlines, and thenwe present a simple solution to create more accuratestreamlines.

FIG. 4. Snapshot streamlines for 2003-2004 and trajec-tories for the cohort of 2012. The snapshot streamlinesrepresent the synthetic cohort of 2003, whereas the tra-jectories represent the cohort of 2012. In the absenceof cohort e↵ects, the snapshot streamlines could be usedto predict the longitudinal performance of the students,which is represented by the trajectories. The conver-gence of the streamlines due to RTM makes the predic-tions from the snapshot streamlines inaccurate.

FIG. 5. Cohort streamlines and trajectories for the co-hort of 2012. Both plotting schemes are representingthe same cohort of students, however the streamlinesare more inclusive of students who join the cohort af-ter 3rd grade. Due to RTM, the streamlines convergeconsiderably compared to the trajectories. The stream-lines therefore do not accurately represent the flow ofstudent scores.

V. REGRESSION TO THE MEAN

Regression to the mean (RTM) is a bulk statisticalphenomenon, and is a consequence of the stochasticcomponent of the observed scores. RTM dependson the magnitude of these random fluctuations and

Page 7: Longitudinal Predictions Using Regression-Corrected

7

on the score distributions. RTM is exacerbated byselection or classification processes. In particular,by sorting students into score bins by their observedscores on a single exam, the random component oftheir score may force the student into a di↵erent binthan expected. For a score distribution with morestudents performing near the mean than at the ex-tremes, most of the “wrongly binned” students typi-cally perform closer to the mean of the distribution.On a second exam, the average for the score bin willregress toward the mean as the expected scores arecloser to the distribution’s average.

We use a simple theory to demonstrate the in-fluences of RTM on the trajectories and streamlineswith the understanding that observed data are neverso simple, but general behavior is best elucidated inthe simple case. Using conditional expectation val-ues for the exam scores [30], we can better under-stand the significance of RTM in the coarse-grainingprocedures. We invoke classical test theory [31, 32]to establish the relationship between the observedscore, the true score, and the random component. Ifxi

are the raw scores for exam i, then we can saythat x

i

= ti

+ ei

where ti

is the true score and ei

is the random component, or error score. The truescore, E(x

i

) = ti

, is unobserved and is defined to bethe expected score if a student were to be tested re-peatedly to average over short-term fluctuations dueto temporary circumstances such as a bad night’ssleep. The error scores for the repeated measure-ments come from a normal distribution with a meanof zero, and they are uncorrelated with each otherand with the true score. We can also define z-scores,zi

, such that zi

= (xi

�µ)/�x

i

, where �x

i

is the stan-dard deviation of x

i

and µ is the mean raw score orexpected score, E(x

i

) = µ. For z-scores, E(zi

) = 0and �

z

i

= 1 for all i.

To make concrete computations, we assume thatstudent scores on exams in successive years are lin-early correlated. This assumption holds for a bi-variate normal distribution, and it can also hold fornon-normal distributions. Figure 6 shows the scoredistributions and bivariate distributions for the 3rd,4th, and 5th grade scores of the 2012 cohort, as ex-amples of typical distributions in the ERC dataset.All of the score distributions are skewed towardhigher scores, with a ceiling e↵ect from the maxi-mum score. The 5th and 8th grade distributions, inparticular, had floor e↵ects due to a program calledStudent Success Initiative (see Section VII). Despitethese floor and ceiling e↵ects, the assumption of lin-ear conditional expectation values is not unreason-able.

Assuming linear conditional expectation values,the expected score on exam y given a score a on

0

5000

10000

15000

20000

0 25 50 75 100Percent Score

Freq

uenc

y

3rd Grade

0

5000

10000

15000

20000

0 25 50 75 100Percent Score

Freq

uenc

y

4th Grade

0

5000

10000

15000

20000

0 25 50 75 100Percent Score

Freq

uenc

y

5th Grade

0

25

50

75

100

0 25 50 75 100Grade 3 (2003)

Gra

de 4

(200

4)

0

25

50

75

100

0 25 50 75 100Grade 3 (2003)

Gra

de 5

(200

5)

Joint Distributions

0

25

50

75

100

0 25 50 75 100Grade 4 (2004)

Gra

de 5

(200

5)

FIG. 6. Score distributions for TAKS mathematics ex-ams in 3rd, 4th, and 5th grade, and correlations betweenstudents’ scores for each grade pair. The score distribu-tions for most of the exams are skewed as a result ofthe ceiling e↵ects from the maximum score. The 5thgrade and 8th grade exams have an additional floor ef-fect due to the Student Success Initiative, discussed inSection VII. The relationship between the scores for mostexams is roughly linear.

exam x is:

E(y|x = a) = E(y) + ⇢x,y

�y

�x

(a� E(x)), (V.1)

where ⇢x,y

is the Pearson correlation coe�cient be-tween the exams and � is the standard deviation.For z-scores, this expression simplifies to:

E(zy

|zx

= a) = ⇢z

x

,z

y

a. (V.2)

Following from the Cauchy-Schwarz inequality,|⇢

z

i

,z

j

| 1. Therefore, this expression explainsRTM; given the score on an exam, the expectedscore on another exam is closer to zero, the meanfor z-scores. The less correlated the exams, the moreRTM.This theory can be expanded to multiple years in

several ways. Trajectories are modeled by condi-tioning every exam by the initial exam, similar tohow the students are sorted into bins by their initialscores. The necessary conditional expectation valuesare:

E(z2|z1 = a) = ⇢z1,z2a

E(z3|z1 = a) = ⇢z1,z3a

E(z4|z1 = a) = ⇢z1,z4a

...

The sequence of expected scores is shown in Table Iand depends on the correlation coe�cients between

Page 8: Longitudinal Predictions Using Regression-Corrected

8

Trajectories Streamlines

Expression Equivalent CorrelationCoe�cients

Expression Equivalent CorrelationCoe�cients

&3 = a a &3 = a a

&4 = ⇢3,4a ⇢a &4 = ⇢3,4a ⇢a

&5 = ⇢3,5a ⇢a &5 = ⇢4,5⇢3,4a ⇢2a

&6 = ⇢3,6a ⇢a &6 = ⇢5,6⇢4,5⇢3,4a ⇢3a

&7 = ⇢3,7a ⇢a &7 = ⇢6,7⇢5,6⇢4,5⇢3,4a ⇢4a

TABLE I. Anticipated sequences of scores for trajectories and streamlines with respect to the initial score. The fullexpression and the expression for equivalent correlation coe�cients are given for several exams.

each exam and the first exam. If it is assumedthat all of the exam pairs have the same correlationcoe�cient ⇢

z

i

,z

j

= ⇢, this sequence would becomea, ⇢a, ⇢a, ⇢a, etc. Therefore the RTM takes placeduring the first grade transition but not thereafter.The assumption of equivalent correlation coe�cientsis not completely borne out in the data, as examstend to be less correlated with time, but it is illus-trative of the basic patterns of RTM.Streamlines are constructed using a first-order

Markov process, with each exam being conditionedonly by the previous exam. This process is describedby the following conditional expectation values:

E(z2|z1 = a) = ⇢z1,z2a

E(z3|z2 = b) = ⇢z2,z3b

E(z4|z3 = c) = ⇢z3,z4c

...

To create a continuous streamline, the score for oneexam is used as the initial condition for the next, sothat the anticipated values for the scores are

&1 = a

&2 = ⇢z1,z2a

&3 = ⇢z2,z3&2 = ⇢

z2,z3⇢z1,z2a

&4 = ⇢z3,z4&3 = ⇢

z3,z4⇢z2,z3⇢z1,z2a

...

For streamlines, the scores are proportional to theproduct of previous correlation coe�cients. Forequivalent correlation coe�cients ⇢, the sequence be-comes a, ⇢a, ⇢2a, ⇢3a, etc. It is obvious in this casethat the scores continually regress towards the meanof zero. The sequences for trajectories and stream-lines can be compared in Table I. While this theory issimple and requires assumptions that may not holdwith observed scores, Figure 5 exhibits RTM mainlyin the first segment of the trajectories but through-out the streamlines, aligning with the observationsfrom the theory.

VI. REGRESSION-CORRECTEDGROUPING

The sequence of anticipated scores in the tra-jectory framework demonstrates the magnitude ofRTM each year after a binning process. The ma-jority of the RTM occurs between the binning examand the next, in the first grade transition. Stream-lines are essentially several short trajectories strungtogether; each segment is the first grade transitionin a new trajectory. Since this is exactly the por-tion of the trajectory that contains the majority ofthe RTM, it is no surprise that the streamlines havesuch extreme RTM. However, the RTM in the tra-jectories is mostly resolved by the second grade tran-sition. Therefore, if the streamlines instead utilizedthe second grade transition for each short trajectory,the RTM would have already been mostly resolved.This is the premise of the new binning procedure,called regression-corrected (RC) grouping.For the uncorrected streamlines described above,

students are sorted by their scores in grade i fori 2 [3, 10] and the average change in score is calcu-lated for those groups between grades i to i+1. Thescores from the binning exams are included in thechange-in-score calculations. With RC grouping, wedelay the change-in-score calculation by one year. InRC streamlines, students are sorted by their scoresin grade i for i 2 [3, 9] and the changes in score arecalculated for those groups between grades i + 1 toi + 2. Excluding the binning exam scores from thechange-in-score calculations reduces the magnitudeof RTM. After the vector field is established fromthe change-in-score calculations, continuous stream-lines are constructed. In essence, the students foreach grade transition are sorted into score bins by aseparate exam from the exams used to calculate thechange in score. We have chosen to use the previ-ous exam of the same subject (mathematics) but itcould be any other exam that is reasonably well cor-related with the exams used to calculate the changein score. An example of using the reading score forsorting is discussed in Section VIII and it demon-

Page 9: Longitudinal Predictions Using Regression-Corrected

9

strates that RC streamlines can be constructed withonly two years of data.We can analyze RC streamlines using the condi-

tional expectation values, as above. For each bin-ning process, we calculate a pair of expectation val-ues conditioned on the binned scores. The necessarypairs of expectation values are:

⇢E(z2|z1 = a) = ⇢

z1,z2a

E(z3|z1 = a) = ⇢z1,z3a

⇢E(z3|z2 = b) = ⇢

z2,z3b

E(z4|z2 = b) = ⇢z2,z4b

⇢E(z4|z3 = c) = ⇢

z3,z4c

E(z5|z3 = c) = ⇢z3,z5c

...

The di↵erences of the two scores in each pair areused as the slopes for each of the vectors to con-struct the RC streamlines. A continuous stream-line is strung together by using the previous scoreas the initial condition for the next pair. The ini-tial conditions (b, c, etc.) can be calculated by set-ting the expectation values for the same exam equalto each other. For example, we would need to findb such that E(z3|z1 = a) = E(z3|z2 = b). Thisgives b =

z1,z3⇢

z2,z3a, which then allows us to calculate

E(z4|z2 = b). Using this method iteratively gives asequence of scores in the RC streamlines of

&1 = a (Not shown on the plot)

&2 = E(z2|z1 = a) = ⇢z1,z2a

&3 = E(z3|z1 = a) = ⇢z1,z3a

&4 = E(z4|z2 : E(z3|z2) = &3)

= E(z4|z2 : ⇢z2,z3z2 = ⇢

z1,z3a)

= E(z4|z2 =⇢z1,z3

⇢z2,z3

a) =⇢z2,z4⇢z1,z3⇢z2,z3

a

&5 = E(z5|z3 : E(z4|z3) = &4)

= E(z5|z3 : ⇢z3,z4z3 =

⇢z2,z4⇢z1,z3⇢z2,z3

a)

= E(z5|z3 =⇢z2,z4⇢z1,z3

⇢z3,z4⇢z2,z3

a) =⇢z3,z5⇢z2,z4⇢z1,z3⇢z3,z4⇢z2,z3

a

...

&n

=⇢z

n�2,zn⇢zn�3,zn�1⇢zn�4,zn�2 ...⇢z1,z3⇢z

n�2,zn�1⇢zn�3,zn�2 ...⇢z2,z3a

=

n�2Y

p=1

(⇢p,p+2)

n�2Y

q=2

(⇢q,q+1)

a (8n � 4)

Scores for Equivalent Correlation Coe�cients

Trajectories Streamlines Regression-CorrectedStreamlines

&3 a a a

&4 ⇢a ⇢a ⇢a

&5 ⇢a ⇢2a ⇢a

&6 ⇢a ⇢3a ⇢a

&7 ⇢a ⇢4a ⇢a

&8 ⇢a ⇢5a ⇢a

&9 ⇢a ⇢6a ⇢a

&10 ⇢a ⇢7a ⇢a

&11 ⇢a ⇢8a ⇢a

TABLE II. Comparison between the score sequenceswithin the trajectory, cohort streamline, and RC cohortstreamline frameworks, assuming every pair of exams hasthe same correlation coe�cient.

In this sequence, the coe�cients of the scores areratios of Pearson correlation coe�cients between theexams. Again, if we were to assume the same cor-relation coe�cients between all of the exams, thevalues in the sequence would be a, ⇢a, ⇢a, ⇢a, etc.,which is identical to the sequence for the trajectory.Table II shows the sequences for the special case

where the correlation coe�cients are the same be-tween every exam. The trajectory and RC stream-line sequences only have RTM between the 3rd and4th grade exams, whereas the streamline sequencecontinuously regresses toward the mean. Despite theunlikely assumption of equivalent correlation coef-ficients, this is an indication that the RC stream-lines may successfully reduce RTM. Table III showsthe expressions and values for the expected scoresin the trajectory, streamline, and RC streamlineframeworks, using the Pearson correlation coe�-cients from the ERC dataset for the cohort of 2012.While the correlation decreases as the exams are fur-ther apart in time, the RTM in the trajectories andRC streamlines are nearly identical, whereas the un-corrected streamlines exhibit extreme RTM.Figure 7 shows the RC cohort streamlines and the

trajectories for the cohort of students that gradu-ated in 2012. Overall, the two frameworks havevery similar results. The slight di↵erences betweenthe two frameworks are meaningful; RC streamlinescan include students who join the cohort after 3rdgrade. Trajectories require students to have a 3rdand 4th grade score whereas RC streamlines requireany three consecutive scores (grades i, i + 1, andi+2) within the cohort to be included for a segment(from grade i + 1 to i + 2). Therefore, trajectoriesare more inclusive of joiners before 5th grade and RCstreamlines are more inclusive after 5th grade. In the

Page 10: Longitudinal Predictions Using Regression-Corrected

10

Trajectories Streamlines Regression-Corrected Streamlines

Expression Value Expression Value Expression Value

&3 = a a &3 = a a &3 = a a

&4 = ⇢3,4a .73a &4 = ⇢3,4a .73a &4 = ⇢3,4a .73a

&5 = ⇢3,5a .67a &5 = ⇢4,5&4 .53a &5 = ⇢3,5a .67a

&6 = ⇢3,6a .65a &6 = ⇢5,6&5 .38a &6 =⇢4,6⇢4,5

&5 .64a

&7 = ⇢3,7a .63a &7 = ⇢6,7&6 .30a &7 =⇢5,7⇢5,6

&6 .62a

&8 = ⇢3,8a .62a &8 = ⇢7,8&7 .23a &8 =⇢6,8⇢6,7

&7 .59a

&9 = ⇢3,9a .58a &9 = ⇢8,9&8 .18a &9 =⇢7,9⇢7,8

&8 .57a

&10 = ⇢3,10a .58a &10 = ⇢9,10&9 .15a &10 =⇢8,10⇢8,9

&9 .56a

&11 = ⇢3,11a .54a &11 = ⇢10,11&10 .12a &11 =⇢9,11⇢9,10

&10 .53a

TABLE III. Comparison between the score sequences within the trajectory, cohort streamline, and RC cohort stream-line frameworks, using the Pearson correlation coe�cients computed from the data for the cohort of 2012.

FIG. 7. RC cohort streamlines and trajectories for thecohort of 2012. The RC streamlines and the trajecto-ries correspond very well. The slight di↵erences are dueto the students who join or leave the cohort. Before5th grade, trajectory plots are slightly more inclusive ofjoiners. After 5th grade, RC cohort streamlines are moreinclusive.

data, this corresponds with lower performance in theRC streamlines after 5th grade, as mobility has beenshown to be correlated with lower performance [33].Not only do RC cohort streamlines produce accuratelongitudinal results, but they can better capture theperformance of temporary cohort members.

RC streamlines can also utilize an AcceleratedLongitudinal Design, reducing the required years ofdata to only three years. When using mathematicsscores exclusively, the students are sorted by theirscore in a first year and then the changes in scoreare calculated between a second and third year. Fig-ure 8 shows the RC snapshot streamlines for 2003-2005 compared to the trajectory for the cohort of

FIG. 8. RC snapshot streamlines for 2003-2005 and tra-jectories for the 2012 cohort. The 8th grade peaks ob-served in the lower-performing trajectories (2008), whichare missing in the RC snapshot streamlines (2003-2005),are a result of the Student Success Initiative and the Ac-celerated Mathematics Instruction, which was providedto low-performing 8th grade students starting in 2008.

2012. While the two frameworks are fairly com-parable, there is an 8th grade performance peak inthe trajectories that is missing in the RC snapshotstreamlines. Again, this di↵erence is meaningful andit demonstrates a change in policy that a↵ected the8th graders by 2008 (trajectories) but not in 2005(RC snapshot streamlines) called the Student Suc-cess Initiative.

VII. APPLICATION: SSI

The Student Success Initiative (SSI), establishedin 1999, encompasses several initiatives intended

Page 11: Longitudinal Predictions Using Regression-Corrected

11

to promote grade-level performance in mathemat-ics and reading in Texas. One component of theSSI is a set of grade promotion requirements in 5thand 8th grade; if students fail the mathematics orreading exam three times then they will be retainedin 5th or 8th grade for another year unless theyare excused by a committee [2]. In addition, strug-gling students in every grade, especially in the high-stakes 5th and 8th grades, can receive additional in-struction called Accelerated Mathematics/ReadingInstruction (AMI/ARI) [34].

SSI was implemented gradually along with the co-hort of 2012. In the 1999-2000 school year, SSI onlyimpacted Kindergarten students, and the programexpanded to include an additional grade each year[35]. Therefore, the first 5th grade retention andARI/AMI occurred in the 2004-2005 school year,and the first 8th grade retention and ARI/AMI oc-curred in the 2007-2008 school year. This timelineof implementation may explain the similarities anddi↵erences in performance in Figure 8. The studentsin the trajectories were the first cohort of studentsto be a↵ected by SSI in each grade. The students inthe RC snapshot streamlines were only impacted bySSI up through 5th grade. Therefore, a performancepeak can be seen as a result of the 5th grade high-stakes exam in both frameworks, but the 8th gradepeak is only seen in the trajectories.

Figure 9 shows the RC snapshot streamlines from2007-2009, after the SSI impacted students up to8th grade. The 8th grade performance peak is evi-dent in both the trajectories and the RC snapshotstreamlines. It should be mentioned that the fund-ing for SSI has been reduced dramatically in recentyears, dropping from over $150 million annually in2011 [35] to $5.5 million in the 2018-2019 bienniumbudget [36].

Overall, the RC snapshot streamlines and trajec-tories are remarkably similar despite the significantdi↵erence in the number of years of testing datarequired for the plot: trajectories use 9 years ofdata whereas RC snapshot streamlines only use 3years. Therefore, in the absence of policy changes,RC snapshot streamlines can make accurate longitu-dinal predictions and in the event of a policy change,RC snapshot streamlines before and after can becompared to identify the e↵ects of the intermediateintervention. RC snapshot streamlines could be apowerful tool for education researchers, particularlythose who work at State Education Agencies and arerequired to produce rapid results to influence policy[37].

FIG. 9. RC snapshot streamlines for 2007-2009 and tra-jectories for the 2012 cohort. The double peaks in 5thand 8th grade, due to SSI and AMI, are observed inboth the trajectories and the RC snapshot streamlines.These interventions were provided to low-performing K-8th grade students.

VIII. OTHER APPLICATIONS

A. Disaggregation

In addition to grouping students by their observedscores, the students in RC streamlines and trajec-tories can be grouped according to other variablesof interest. For example, students could be disag-gregated by demographic variables to study equityand performance disparities. In the Texas Longi-tudinal Data System, some of the available demo-graphic variables include gender, socio-economic sta-tus (SES) with respect to free or reduced lunch sta-tus, and race/ethnicity.Figure 10 shows the trajectories for the cohort of

2012 disaggregated by gender. Male and female stu-dents perform similarly, with female students per-forming slightly better on the TAKS mathematicsexams. Figure 11 shows the trajectories for the low-income students who receive free or reduced lunch,disaggregated by race/ethnicity. The trajectories foreach score bin are separated into subplots to high-light the within-bin di↵erences.Students can also be disaggregated by course tak-

ing. Figure 12 shows the trajectories for the studentsin the cohort of 2012, disaggregated by the highestlevel of physics that they took in high school. InTexas, the students could take advanced placement(AP) physics, basic physics, or integrated physicsand chemistry (IPC). For the cohort of 2012, thestudents were required to take at least basic physicsto satisfy the Recommended High School Program,although IPC was su�cient for the Minimum High

Page 12: Longitudinal Predictions Using Regression-Corrected

12

FIG. 10. Trajectories for the cohort of 2012 disaggre-gated by gender. There are minimal performance dis-parities associated with di↵erences in gender.

School Program. The high school science require-ments in Texas changed in 2014, removing the basicphysics requirement [38].

B. Binning by Alternate Exam

One unfortunate consequence of using the RCgrouping process for snapshot streamlines is the ne-cessity of a third year of data. However, RC snap-shot streamlines can be constructed using only twoyears of data if an alternate exam is used for binningthe students. Students in Texas were assessed inboth mathematics and reading annually; therefore,the reading exam is a consistent alternate exam forsorting the students into score bins. Figure 13 showsthe RC snapshot streamlines of 2008-2009, showingthe flow in mathematics scores with students sortedby their reading scores in 2008. This demonstratesthat accurate RC snapshot streamlines can be con-structed with only two years of data; the quanti-tative improvements to the streamlines through theRC process do not require additional data collectiontime.

C. Future Predictions

To make future predictions of student perfor-mance in Texas it is necessary to use data from anew exam whose roll-out started in 2011-2012, theState of Texas Assessments of Academic Readiness(STAAR). Not enough time has passed since STAARwas first implemented to construct a full trajectoryplot, although RC snapshot streamlines can be con-structed. Figure 14 shows the partial trajectory for

FIG. 11. Trajectories for the low-income students in thecohort of 2012, disaggregated by race/ethnicity. Lat-inx/Hispanic and White low-income students performvery similarly.

the first cohort of students who took STAAR in3rd grade (6th graders in 2015) and the RC snap-shot streamlines from 2012-2014. The dip in per-formance in 2015 compared to the predictions maybe real. The STAAR mathematics performances in2015 were so low compared to the State’s expecta-tions that the Texas Education Commissioner de-cided not to use the mathematics exams for any ac-countability measures [39]. The cause for the dropin performance could be a change in mathematicscurriculum standards that was implemented in the2014-2015 school year [40], a↵ecting the spring 2015scores, or it could be the delayed result of budgetcuts that started in 2011 [41].

IX. CONCLUSIONS

We have developed a new method that can visual-ize, analyze, and predict longitudinal student testingdata. RC streamlines improve on the streamlinescreated by Bendinelli and Marder [9] by reducingthe e↵ects of RTM. Consequently, the RC stream-lines can be used to predict long-term student test-

Page 13: Longitudinal Predictions Using Regression-Corrected

13

FIG. 12. The trajectories for the cohort of 2012 disaggre-gated by highest level of physics course taking. Despitehaving similar 3rd grade scores, AP physics students out-perform their basic physics and IPC counterparts.

FIG. 13. RC snapshot streamlines for 2008-2009 andtrajectories for the 2012 cohort. The students are sortedinto score bins by their 2008 reading exams and thechanges in score are calculated between their 2008-2009mathematics exams.

ing performance with only two or three consecutiveyears of data. RC streamlines can also be used todetermine the outcomes of interventions by compar-ing predictions derived before and after the inter-vention. This technique has the potential to analyzeoutcomes rapidly enough for the results to influencepolicy decisions.Trajectory plots are a visualization tool to ob-

serve long-term trends in the scores for groups ofstudents that are determined by the initial scores.Trajectories are a fundamental longitudinal analy-sis technique, as they show the average observedscores over time for the students in each initial scorebin. By coarse-graining the students into score bins,

FIG. 14. Partial trajectories for the students whowere 6th graders in 2015 and RC snapshot streamlinesfor 2012-2014. The dip in observed performance in2015 caused the Texas Education Commissioner to can-cel accountability measures related to the mathematicsscores [39].

trajectories provide a compromise between the over-whelming information at the individual student leveland the loss of information in the full aggregate,which groups completely di↵erent types of studentstogether. Trajectories act as a source of comparisonfor longitudinal prediction methods, such as snap-shot streamlines. The longitudinal nature of thetrajectory plots causes some limitations. The tra-jectories are not inclusive of students who join thecohort after the initial grade, and similar to otherlongitudinal techniques, trajectories su↵er from at-trition. Full trajectories require almost a decade ofdata to follow the scores of students throughout pri-mary and secondary school; therefore, they are onlypossible to assemble after the students have alreadygraduated.Streamline plotting, as described in Bendinelli and

Marder [9], is a tool inspired by techniques in statis-tical mechanics that models the movement of parti-cles in a fluid. Streamlines in an education contextuse the changes in score for students in each score binand for each grade transition to represent the flow ofscores through the grades. The streamlines can havea single-cohort design, known as cohort streamlines,or they can utilize an ALD with two years of data,called snapshot streamlines. However, the repeatedsorting of students into score bins results in an ex-aggerated RTM, rendering the streamlines quantita-tively inaccurate.RC streamlines use a separate exam to create the

score bins from the exams used to calculate thechanges in score. In this way, the RTM is reducedconsiderably, resulting in acceptably accurate depic-

Page 14: Longitudinal Predictions Using Regression-Corrected

14

tions of score flow. RC cohort streamlines are analternative to trajectories, as both follow a singlecohort of students throughout school; however, theRC cohort streamlines include additional studentswho join the cohort after the initial exam. RC snap-shot streamlines use three consecutive years of datato make predictions of the score flow throughout thegrades; the first year is used to sort the students intoscore bins and the second and third years are used tocalculate the change in score. RC snapshot stream-lines can also be constructed from only two years ofdata, as long as the sorting exams are not used inthe change-in-score calculations. We demonstratedthis by using the reading exams to form score binsand the mathematics exams for the changes in score.Deviations between the RC snapshot streamlines

and trajectories are indications of a policy change.Similarly, di↵erences between RC snapshot stream-lines from di↵erent years identify the e↵ects of in-termediate interventions. This was demonstrated byidentifying the e↵ects of SSI in the Texas State Lon-gitudinal Data System. SSI was implemented alongwith the cohort of students who graduated in 2012and it mandated especially high-stakes mathematicsand reading exams in 5th and 8th grade. By com-paring the RC snapshot streamlines from 2003-2005to those from 2007-2009, the e↵ects of the 8th gradeSSI policies can be identified.RC grouping, trajectories, and RC streamlines

can be expanded in several ways. Students canbe further disaggregated by other variables, suchas demographics or course taking, to identify thebetween-person di↵erences in within-person changesover time. Furthermore, RTM is not only a con-cern when creating streamline plots. Texas is imple-menting new school accountability measures begin-ning partially in the 2018-2019 school year and fullyby the 2019-2020 school year that use the relativescore classifications of the students from one yearto the next as a component of the metric [42]. Asdemonstrated with the streamlines and trajectories,classifying students based on observed scores leadsto RTM. Therefore, RTM can be expected to a↵ectthe new school accountability measures and to havepotentially profound consequences for Texas schools.

X. ACKNOWLEDGEMENTS

This work was supported by the National Mathand Science Initiative and by the National ScienceFoundation Grants No. 1557410, No. 1557273, No.1557295, No. 1557294, No. 1557276, No. 1557278,No. 1557286, and No. 1557290. Any opinions, find-ings, and conclusions or recommendations expressedin this material are those of the authors and do not

necessarily reflect the views of the National ScienceFoundation.

Page 15: Longitudinal Predictions Using Regression-Corrected

15

[1] Editorial Projects in Education ResearchCenter. Issues a-z: No child left be-hind: An overview. education week. http:

//www.edweek.org/ew/section/multimedia/

no-child-left-behind-overview-definition-summary.

html/, April 2015.[2] Texas Education Agency. Student Success Initiative

Manual (2017). http://tea.texas.gov/student.

assessment/SSI/.[3] Cory Koedel, Kata Mihaly, and Jonah Rocko↵.

Value-added modeling: A review. Economics of Ed-ucation Review, 2015.

[4] Eric Hanushek. The economic value of higherteacher quality. Economics of Education Review,30(3):266–479, 2011.

[5] Steven Glazerman, Ali Protik, Bing ru Teh, JulieBruch, and Je↵rey Max. Transfer incentives forhigh-performing teachers: Final results from a mul-tisite randomized experiment. United States Depart-ment of Education, 2013.

[6] Texas Education Code. Title 2, Subtitle H, Chapter39: Public School System Accountability.

[7] G. Fitzmaurice, M. Davidian, G. Verbeke, andG. Molenberghs, editors. Longitudinal Data Analy-sis. Chapman and Hall/CRC, 2009.

[8] M. Marder and D. Bansal. Flow and di↵usion ofhigh-stakes test scores. Proceedings of the NationalAcademy of Sciences, 106(41):17267–17270, 2009.

[9] Anthony Bendinelli and Michael Marder. Visual-ization of Longitudinal Student Data. Physical Re-view Special Topics - Physics Education Research,8(020119), 2012.

[10] Lesa Ho↵man. Longitudinal Analysis: ModelingWithin-Person Fluctuation and Change. Routledge,2015.

[11] Stephen Raudenbush and Anthony Bryk. Hierarchi-cal Linear Models: Applications and Data AnalysisMethods. Sage Publications, 2 edition, 2002.

[12] Judith Singer and John Willett. Applied Longitu-dinal Data Analysis: Modeling Change and EventOccurence. Oxford University Press, 2003.

[13] John McArdle and John R. Nesselroade. Longitudi-nal Data Analysis Using Structural Equation Mod-els. American Psychological Association, 2014.

[14] Patrick Curran and Bengt Muthen. The applica-tion of latent curve analysis to testing developmen-tal theories in intervention research. American Jour-nal of Community Psychology, 27(4), 1999.

[15] Bengt Muthen and Siek-Toon Khoo. Longitudi-nal studies of achievement growth using latent vari-able modeling. Learning and Individual Di↵erences,10(2):73–101, 1998.

[16] Daniel Nagin. Analyzing developmental trajecto-ries: A semiparametric, group-based approach. Psy-chological Methods, 4(2):139–157, 1999.

[17] Heather Andru↵, Natasha Carraro, AmandaThompson, Patrick Gaudreau, and Benoit Louvet.Latent Class Growth Modelling: A Tutorial. Tutori-als in Quantitative Methods for Psychology, 5(1):11–

24, 2009.[18] S. Raudenbush, A. Bryk, and R. Congdon. HLM

7.01 for Windows [Computer Software]. ScientificSoftware International Inc., Skokie, IL, 2013.

[19] L.K. Muthen and B.O. Muthen. Mplus User’sGuide. Muthen and Muthen, Los Angeles, CA, 8edition, 2017.

[20] Norman B. Ryder. The cohort as a concept in thestudy of social change. American Sociological Re-view, 30(6):843–861, December 1965.

[21] Richard Bell. Convergence: An Accelerated Longi-tudinal Approach. Child Development, 24(2):145–152, June 1953.

[22] Yasuo Miyazaki and Stephen Raudenbush. Tests forlinkage of multiple cohorts in an accelerated longi-tudinal design. Psychological Methods, 5(1):44–63,2000.

[23] The University of Texas at Austin. Texas educa-tion research center. https://research.utexas.

edu/erc/.[24] Student Assessment Division. Technical Digest

2009-2010, 2010.[25] Texas Education Agency. State of Texas Assess-

ment of Academic Readiness. http://tea.texas.

gov/student.assessment/staar/.[26] Texas Education Agency. House Bill 3 Tran-

sition Plan. http://tea.texas.gov/student.

assessment/hb3plan/.[27] Student Assessment Division. Technical Digest

2014-2015, 2015.[28] Texas Education Agency. Taks raw score con-

version tables. https://tea.texas.gov/student.

assessment/taks/convtables/.[29] Matthew Guthrie. Grouping and Comparing Texas

High Schools Through Machine Learning and Vi-sualization Techniques. PhD thesis, University ofTexas at Austin, 2018.

[30] John R. Nesselroade, Stephen M. Stigler, andPaul B. Baltes. Regression toward the mean and thestudy of change. Psychological Bulletin, 88(3):622–637, 1980.

[31] Melvin R. Novick. The axioms and principal re-sults of classical test theory. Journal of Mathemat-ical Psychology, 3:1–18, 1966.

[32] Frederic M. Lord and Melvin R. Novick. Statisti-cal Theories of Mental Test Scores. Addison-WesleyPublication Corporation, 1968.

[33] Simon Tidd and Sonia Dominguez. Chronicabsenteeism and student mobility. http://

e3alliance.org/wp-content/uploads/2017/04/

E3-3D-Mobility-Chronic-Absence-040417.pdf,2017.

[34] Texas Education Agency. The StudentSuccess Initiative: An Evaluation Report(2009). https://tea.texas.gov/WorkArea/

DownloadAsset.aspx?id=2147490898.[35] Texas Education Agency. The Student Success

Initiative: 2009-2010 Biennium Evaluation Report,2011.

Page 16: Longitudinal Predictions Using Regression-Corrected

16

[36] Texas 85th Legislative Session. SB1 General Ap-propriations Bill. https://capitol.texas.gov/

tlodocs/85R/billtext/pdf/SB00001F.pdf, 2017.[37] Carrie Conaway, Venessa Keesler, and Nathaniel

Schwartz. What Research Do State EducationAgencies Really Need? The Promise and Limita-tions of State Longitudinal Data Systems. Educa-tional Evaluation and Policy Analysis, 37(1S):16S–28S, May 2015.

[38] Texas Education Agency. State Graduation Re-quirements. http://tea.texas.gov/graduation.

aspx.[39] Je↵rey Weiss. Three things you should know about

the 2015 STAAR results. The Dallas Morning News,

May 2015.[40] Texas Education Agency. Mathematics Texas Es-

sential Knowledge and Skills. https://tea.texas.gov/index2.aspx?id=2147499971.

[41] M. Marder and Chandra K. Villanueva. Conse-quences of the texas public school funding holeof 2011-2016. https://forabettertexas.org/

images/EO_2017_09_SchoolFinance_ExecSum.pdf,October 2017.

[42] Texas Education Agency. 2015-16 A-F Ratings, De-cember 2016.