BLUE ENGINE | 1
1 See Appendix A for disaggregated results over the past three years and Appendix B for an explanation of predictive models.
2 Results in ELA were mixed, with students performing roughly as predicted by counterfactual models. See the English Language Arts (ELA) section for a more in-depth discussion of performance on the ELA Regents exams.
EXECUTIVE SUMMARY
This report provides an overview of student performance in Year 4 (AY 2013-2014), as measured by New York State Regents Examinations and analytical writing growth, across three subjects, five school sites, and 944 students. These results provide evidence of impact and rank among our strongest to date.
Across Algebra and Geometry, we see clear, convincing growth in performance, evidence that Blue Engine has become more effective at accelerating student achievement over the past three years. Most of this year’s gains over predicted levels are the largest to date.1
In English Language Arts, results show growth in the percentage of students performing at college ready levels compared with last year’s cohort at BLA2, the only school where students took the exam this year.2
To assess impact, Blue Engine uses a rigorous modeling standard: it compares actual performance on end-of-year exams “with Blue Engine” to counterfactual performance, i.e., how students would have performed “without Blue Engine.” We aim to increase the margin by which actual performance exceeds predicted performance on state standardized tests across three key measures: (1) average scores, (2) passing rates, and (3) college ready rates, as part of a mission to prepare dramatically greater numbers of students for college-level work. Over time, our results will shift from “single-year snapshots” to cumulative college readiness, measured by the percentage of graduating seniors who are prepared for college-level work.
Table 1. 2013-14 Actual vs. Predicted Scores among Blue Engine students, including results from paired t-tests (statistical significance) on the differences between actual and predicted scores/rates. (Note: “CC Algebra” refers to Common Core Algebra.)
Columns, left to right: Average Scores (Without BE, With BE, Impact Δ); % Passing (Without BE, With BE, Impact Δ); % College Ready (Without BE, With BE, Impact Δ)
Algebra 64.0 71.6 7.6** 59.1 81.4 22.3** 9.4 29.8 20.4**
CC Algebra 57.5 61.6 4.1** 34.2 55.0 20.8** 2.1 0.0 -2.1**
Geometry 59.2 64.8 5.6** 37.4 52.5 15.1** 5.9 8.1 2.2
ELA 66.7 66.6 -0.1 75.2 71.4 -3.8 29.9 29.6 -0.3
**p<.01, demonstrating strong statistical significance; see Appendix E for a discussion of statistical significance based on paired t-tests
At the same time, new academic challenges have emerged. The central challenge remains moving more students above the college ready threshold while also improving performance overall. While average gains (i.e., the shift in the bell curve) were our strongest to date, we must focus on bringing increasing numbers of students to college readiness benchmarks. Additionally, the shift to teaching Common Core standards poses significant instructional challenges, compounded by school-by-school variation in which standards (traditional or Common Core) were prioritized this year. While these transitions continue to challenge educators and students, Blue Engine is committed to taking a leading role in defining and achieving measurable success in the Common Core era.
3 Readers may notice that some Year Three (2012-2013) predictions have changed since last year’s report. We now apply the same models to every year so that results can be compared across years. Please refer to Appendix B for more detail.
INTRODUCTION
ABOUT BLUE ENGINE
Blue Engine partners with public high schools serving low-income communities to increase academic rigor and prepare greater numbers of students for postsecondary success. We recruit, train, and support Blue Engine Teaching Assistants (BETAs), recent college graduates who work alongside teachers in daily math and literacy classes, dramatically reducing instructor-to-student ratios from 1:30 to 1:6. By rethinking how human capital is used in schools, Blue Engine helps raise levels of college readiness in low-income communities while training a new wave of educators to enter the teaching profession. Founded in 2009, Blue Engine is backed by leading performance-based social impact funders including Robin Hood Foundation, Michael and Susan Dell Foundation, New Schools Venture Fund, Tiger Foundation, Heckscher Foundation for Children, Edna McConnell Clark Foundation’s PropelNext Initiative, Echoing Green, and Draper Richards Kaplan Foundation.
Blue Engine’s mission is defined by a clear and measurable long-term outcome: dramatic increases in the number of graduating seniors at partner high schools who complete one year of postsecondary education without the need for remedial coursework and who enroll in a second year of postsecondary education. A highly data-driven organization, Blue Engine is developing a multidimensional approach to defining college readiness, measuring student performance over the high school career in terms of standards mastery (state exams, writing rubric); enrollment in advanced coursework; and college-access activity (SAT/ACT preparation, FAFSA, and application supports). Each of these progress indicators is aligned to postsecondary degree completion.
MEASURING IMPACT
With cohorts in the 9th and 10th grades across five district schools, Blue Engine measured impact in the 2013-14 school year using scores on the New York State Regents Examinations in Integrated Algebra, Geometry, and English Language Arts. Blue Engine aims to increase the average score and the number of students meeting both the passing and college ready benchmarks.
Blue Engine partners with Glass Frog Solutions (Rebecca Casciano, PhD) to compare actual performance on end-of-year exams “with Blue Engine” to counterfactual performance, i.e., how students would have performed in our absence. Rather than comparing our students to an external comparison group that might differ in composition for any number of reasons, Glass Frog uses district-wide data to build models that incorporate students’ historical academic and demographic data to predict performance “without Blue Engine.”
Why predict? Counterfactual forecasts give Blue Engine a baseline that serves a dual purpose: (1) to set goals for student performance that are rigorous yet realistic, because they are rooted in knowledge of how similar students have performed historically, and (2) to evaluate the program’s impact with statistical evidence that score improvements are due to the program. The method lets Blue Engine set goals and measure impact by looking at average scores, distribution curves, and the share of students reaching the passing and college ready benchmarks.3 As our program expands into the 11th and 12th grades, this will allow a more targeted approach to student support, aimed at maximizing the number of college ready graduates.
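To make the counterfactual logic concrete, here is a minimal sketch in Python. It assumes a single predictor (a student’s 8th grade scale score) and a straight-line least-squares fit on hypothetical district data. Glass Frog’s actual models also incorporate demographic data, IEP/ELL status, and school characteristics; the function names and all numbers below are illustrative only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor only)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def average_impact(intercept, slope, prior_scores, actual_scores):
    """Mean of (actual - predicted): the 'with' minus 'without' gap for program students."""
    predicted = [intercept + slope * x for x in prior_scores]
    return mean(a - p for a, p in zip(actual_scores, predicted))


# Hypothetical district history: 8th grade scale score -> Regents scale score.
district_prior = [600, 650, 700]
district_regents = [60, 70, 80]
intercept, slope = fit_line(district_prior, district_regents)

# Hypothetical program students: prior scores and actual Regents scores.
impact = average_impact(intercept, slope, [620, 680], [70, 80])
print(f"average gain over predicted: {impact:.1f} scale points")
```

The average of actual minus predicted across program students is the quantity reported in the “Impact Δ” columns of this report’s tables.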
4 These 242 students represent 57% of our total IA exam takers this year; the other 43% (discussed in the Common Core Algebra section) took the “new” Common Core exam. This number excludes 44 students labeled “truant” because they were chronically absent (i.e., absent for 50% of the year or more), as well as an additional 12 students who were absent on the day of the test (no change in method from last year, enabling comparisons). Appendix C provides a breakdown of students’ year in school for each subject and school, as well as a detailed accounting of students in our program versus students who took the exam.
** Strong statistical significance based on paired t-tests. Refer to Appendix E.
5 Peer schools are those New York City public high schools with the most similar student populations based on the following characteristics: average 8th grade ELA proficiency, average 8th grade math proficiency, percent students with disabilities, percent students with self-contained placements, and percent over-age students.
INTEGRATED ALGEBRA (IA)
Student performance in non-Common Core Algebra showed gains in average scores, passing rates, and college ready rates, as well as clear evidence that Blue Engine has become more effective at accelerating student achievement over the past three years. On the Integrated Algebra Regents Examination, raw scores are scaled to fall between 0 and 100, with passing set at 65+ and college ready at 80+. (See the Common Core Algebra section for performance on Common Core Algebra.)
A total of 242 students took the exam this year.4
The passing rate exceeded predicted levels by 22.3 percentage points**, compared with last year’s 7-point gain over predicted and a 2-point loss two years ago.
The college ready rate exceeded predicted levels by 20.4 percentage points**, compared with last year’s 7-point gain over predicted and a 2-point loss two years ago.
On average, students were 3.7 times more likely to pass
the Integrated Algebra exam and 7.6 times more likely
to score college ready than their peers districtwide, after
controlling for prior scores, students’ unique learning
needs (IEP and ELL status), and school characteristics.
The average score on the IA exam this year was 71.6, compared with a predicted score of 64.0, a gain of 7.6 scale points** over predicted levels. Overall growth (the rightward shift in the distribution curve) was substantial this year, equaling .76 standard deviations: a 162% increase over last year’s growth of .29 standard deviations and well above two years ago, when results were approximately equal to predictions.
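The standard-deviation figures above can be reproduced with simple arithmetic. The report does not state the standard deviation it uses; dividing the 7.6-point gain by .76 SD implies roughly 10 scale points, which this sketch assumes. The helper names are illustrative.

```python
def effect_size(gain_points, sd_points):
    """Express a gain over predicted in standard-deviation units."""
    return gain_points / sd_points


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Assumption: SD inferred from the reported pair (7.6 points <-> .76 SD).
implied_sd = 7.6 / 0.76
this_year = effect_size(7.6, implied_sd)
print(round(this_year, 2))                     # 0.76
print(round(percent_change(0.29, this_year)))  # 162
```

Note that the 162% is a relative change in effect size, not a change in percentage points.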
Columns, left to right: Average Score, % Pass, % College Ready
Blue Engine Schools 71.6 81.4 29.8
District Peer Schools5 62.2 55.9 5.9
[Figures: College Readiness & Pass Rates on NYS Regents Exams; Score Distribution on NYS Regents Exams]
6 These 222 students represent 79% of total Geometry students this year; the other 21% (58 students) are excluded for two reasons: 28 are labeled “truant” because of chronic absence (missing 50% of the year or more), and the remaining 30 were absent on the day of the test (no change in method from last year).
** Strong statistical significance based on paired t-tests. Refer to Appendix E.
7 Note: although the graphic / distribution curves make the actual college ready rate appear considerably higher than the predicted rate, this is an artifact of the model used to predict student scores; the 8% actual vs. 6% predicted figures are accurate. Why the difference between the numbers and the graphics? We use OLS regression models to predict students’ scores on the exam, the correct model for predicting a continuous outcome like test scores; when predicting movement above a fixed threshold (like 80+), we use a different model (logistic regression) designed to predict binary outcomes like passing vs. not passing.
GEOMETRY
Student performance in Geometry showed gains in average scores and passing rates, as well as evidence that Blue Engine has become more effective at accelerating student achievement over the past three years. On the Geometry Regents Examination, raw scores are scaled to fall between 0 and 100, with passing set at 65+ and college ready at 80+.
A total of 222 students took the exam.6
The passing rate exceeded predicted levels by 15.1 percentage points**, compared with a 13.7-point gain over predicted last year and a 3-point gain two years ago. On average, students were more than twice as likely to pass the Geometry exam as their peers districtwide, after controlling for the variables in the model.
The college ready rate exceeded predicted levels by 2.2 percentage points, compared with a 3-point gain over predicted last year and a 1-point gain the year before. We must continue to find ways to bring greater shares of students to college ready levels (80+) in Geometry, and we have a team dedicated to understanding the issue in greater detail and making recommendations.
The average score on the Geometry exam this year was 64.8, compared with a predicted score of 59.2, a gain of 5.6 scale points** over predicted levels. As in Algebra, this rightward shift in the distribution curve is gaining strength over time: it equals a .51 standard deviation gain over predicted scores, compared with last year’s gain of .30 standard deviations over predicted and a .15 standard deviation gain two years ago at WHEELS.7
Columns, left to right: Average Score, % Pass, % College Ready
Blue Engine Schools 64.8 52.5 8.1
District Peer Schools 59.9 39.3 5.8
[Figures: College Readiness & Pass Rates on NYS Regents Exams; Score Distribution on NYS Regents Exams]
8 Please refer to Appendix D for context on how Blue Engine tracked progress for students who did not take the state exam.
9 These 98 students represent 83% of total eligible ELA students this year; the other 17% (20 students) are excluded for one reason: they are labeled “truant” because of chronic absence (missing 50% of the year or more). No students were absent on the day of the test (no change in method from last year).
ENGLISH LANGUAGE ARTS (ELA)
Students at BLA2 (n=98) were the only Blue Engine students who took the ELA exam this year. That is, only one teacher, former BETA and current Teach For America corps member Sabine Chisty, prepared students (10th graders) for and administered the ELA Regents Exam.8 For English Language Arts, raw scores are scaled to fall between 0 and 100, with passing set at 65+ and college ready at 75+.
Regents scores this year were considerably stronger than last year at BLA2, especially at the higher “college ready” threshold: 30% of students at BLA2 scored college ready this year, compared with 16% last year, an increase of 14 percentage points.
A total of 98 students took the exam.9
On a predicted vs. actual basis (Table 2), students performed as predicted on the college ready rate (0.3 percentage points lower than predicted) and on average scores (0.1 scale points lower than predicted). The passing rate was slightly lower than predicted (by 3.8 percentage points).
With actual achievement nearly identical to predicted levels, the data suggest that Blue Engine helped BLA2 students “catch up” to their predictions, whereas previous cohorts had underperformed. Compared with 10th graders at peer horizon schools and districtwide, BLA2 10th graders performed better relative to their predictions; their peers, on average, fell further short of their predicted results.
[Figures: College Readiness & Pass Rates on NYS Regents Exams, 2012-13 Cohort vs. 2013-14 Cohort at BLA2; Score Distribution on NYS Regents Exams, Predicted vs. Actual Performance]
Table 2. 2013-14 Actual vs. Predicted Scores
Columns, left to right: Average Scores (Predicted, Actual, Impact Δ); % Passing (Predicted, Actual, Impact Δ); % College Ready (Predicted, Actual, Impact Δ)
Blue Engine (10th) 66.7 66.6 -0.1 75.2 71.4 -3.8 29.9 29.6 -0.3
Peer Horizon (10th) 69.8 67.5 -2.3 79.3 71.6 -7.7 41.3 36.5 -4.8
District (10th) 73.9 72.1 -1.8 85.0 79.5 -5.5 54.2 50.6 -3.6
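The comparison with peer horizon schools and the district can be checked directly from Table 2. This short sketch computes actual minus predicted for each group and measure, then reports which group fell least short of its prediction; the dictionary layout and names are illustrative.

```python
# (predicted, actual) pairs from Table 2, 10th grade ELA, 2013-14.
TABLE_2 = {
    "Blue Engine": {"avg": (66.7, 66.6), "pass": (75.2, 71.4), "college_ready": (29.9, 29.6)},
    "Peer Horizon": {"avg": (69.8, 67.5), "pass": (79.3, 71.6), "college_ready": (41.3, 36.5)},
    "District": {"avg": (73.9, 72.1), "pass": (85.0, 79.5), "college_ready": (54.2, 50.6)},
}


def impact(predicted, actual):
    """Actual minus predicted, rounded to one decimal as in the table."""
    return round(actual - predicted, 1)


gaps = {
    group: {measure: impact(*pair) for measure, pair in measures.items()}
    for group, measures in TABLE_2.items()
}

for measure in ("avg", "pass", "college_ready"):
    # The group whose actual result came closest to (or furthest above) its prediction.
    closest = max(gaps, key=lambda g: gaps[g][measure])
    print(measure, closest, gaps[closest][measure])
```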
COMMON CORE ALGEBRA
INSTRUCTIONAL CHALLENGE
Common Core standards require new methods of instruction (productive struggle, greater critical thinking, and emphasis on compound word problems) that represent a significant departure from traditionally taught standards and pose challenges to teachers and students alike. For the first time this year (AY 2013-14), schools faced a decision: should teachers align instruction with the new Common Core Integrated Algebra Exam or stay with the traditional one?
For student and school accountability purposes, state officials allowed principals to count whichever score was higher for high school graduation this year. Ultimately, the choice of curriculum rested on (a) each school’s comfort level teaching new vs. old material and (b) whether schools believed their students would be better served by accelerating or delaying Common Core Exam results. Blue Engine does not control school-level decisions regarding curriculum and standards adoption, which left our teams reacting to each school site’s choices and heightened instructional challenges during this transition.
While some students took both exams, no student was actively prepared for both; thus, Blue Engine reports results only for the primary exam that teachers and BETAs prepared students to take in each classroom, with no double counting. In the end, about half of Blue Engine classrooms aligned instruction to the non-CC exam (57% of students), while the other half (43% of students) aligned with the CC exam.
STUDENT PERFORMANCE
As expected, student performance was somewhat lower on Common Core exams than on traditional exams.
Columns, left to right: Non-Common Core, Common Core
# test takers 242 180
% passing 81 55
% college ready10 29 1
Average score 71.6 61.6
However, students showed gains on the Common Core Algebra test relative to their predictions:11 average scores were 4.1 scale points higher and the pass rate was 20.8 percentage points higher. On average, Blue Engine students were three times more likely to pass the Common Core Algebra exam than their peers districtwide. No students scored college ready.
Columns, left to right: Average Scores (Without BE, With BE, Impact Δ); % Passing (Without BE, With BE, Impact Δ); % College Ready (Without BE, With BE, Impact Δ)
CC Algebra 57.5 61.6 4.1** 34.2 55.0 20.8** 2.1 0.0 -2.1**
**p<.01, demonstrating strong statistical significance
10 This threshold has limited meaning in this context because research has yet to identify new “college ready” thresholds for the new CC exams, and we do not yet have citywide (or peer horizon school) data on Common Core Regents exams to help benchmark our students’ scores. The latter will be available by spring 2015.
11 In the absence of any historical data on the new Common Core-aligned Regents Exams, Blue Engine was not able to model predicted performance until after the NYC Department of Education released district-wide student scores in February 2015. As a result, counterfactuals were not used to set goals before the exam; they were used after the test was administered and the data were made public, to measure student gains and compare the performance of Blue Engine students with that of students district-wide.
LOOKING AHEAD
INTERPRETIVE CHALLENGE
These instructional challenges are compounded by confusion about how to interpret Common Core exam results, in two ways.
Predictive Models. For Year 4 (the first year of Common Core exams in high schools), Glass Frog could not immediately generate predicted scores for this subset of students because there was no population on which to base the predictions. Predictions require a “link” between prior 8th grade test scores (a key driver of the predictions) and the “new” exams, which is a two-part process, and no such link existed yet: there had never been a Common Core Integrated Algebra Regents Exam to link those scores to, which delayed our predictions.
Note: Once student-level citywide data for AY 2013-14 became available to our research team in spring 2015, we ran counterfactual models retroactively, arriving at baseline predictions to compare with actual student performance. These counterfactual models provide the basis for the data on gains presented in the Student Performance section above.
College Readiness. The second interpretive challenge involves state- and city-level debate about establishing a “new bar” for what college ready means in the Common Core era. It will take 5+ years for researchers to conclusively link Common Core Regents results to college persistence and success rates, because current students, by definition, have not yet graduated and enrolled in college; they are rising 10th graders now. Recall that our current college ready benchmarks (80+ and 75+ on the “old” math and ELA exams, respectively) were developed by Harvard researchers in 2010 based on longitudinal analyses linking college remediation and persistence rates in the CUNY system with students’ actual Regents scores in high school in New York; that is how the college ready cut points were originally identified and publicized. Over time, the same analyses will need to be completed for the “new” exams. In the short term, we will rely on the new college ready cut-point scores currently being released by the NY State Board of Regents (e.g., 74 in Algebra, 79 in ELA) and will set ambitious goals accordingly.
In 2014-15, Common Core tests will be required in Algebra (only that exam will count); the choice between the “old” and “new” exams will continue for one more year in Geometry and for the next two years in ELA. For the present year, we have opted to report all of the information available to us, and we plan to set aggressive growth goals over current school-level baselines in the coming year, especially given what we expect to be low baseline performance at our current schools.
Blue Engine is confident in these outcomes and proud of the teams of educators, students, and families that worked immeasurably hard to make academic progress in 2013-14. Our students made impressive gains compared with students districtwide and at peer horizon schools.
We also remain committed to improving results and preparing greater numbers of students for the rigors of higher education in the years to come. Growth in the percentage of students passing and scoring college ready on Common Core exams is a central focus for our site-based teams going forward. We are reevaluating existing programming, training, and support structures through the lens of the new academic standards, and introducing professional development opportunities for lead teachers and BETAs to equip them with instructional tools and skills suited to fostering critical thinking and productive struggle in students. Our ability to meet these challenges and play a leadership role in the Common Core era is a critical benchmark of organizational progress and partnership strength in schools. This year brings new hurdles, but Blue Engine is stronger than ever.
APPENDIX A | BLUE ENGINE, PREDICTED VS. ACTUAL STUDENT PERFORMANCE, 2011-2014
Blue Engine: historical performance indicators on three categories of NYS Regents Examinations, 2011-2014.
Table 1. Actual scores and predicted scores among Blue Engine students, by year.
Columns, left to right: 2011-2012 Actual, 2011-2012 Predicted, 2012-2013 Actual, 2012-2013 Predicted, 2013-2014 Actual, 2013-2014 Predicted
AVERAGE SCORES
Algebra 65.1 65.6 69.3 66.7 71.6 64.0
Geometry 69.4 67.9 66.4 62.6 64.8 59.2
ELA 73.1 74.6 74.3 70.8 66.6 66.7
PASS RATES
Algebra 66.8 69.3 76.0 68.3 81.4 59.1
Geometry 69.0 66.3 63.0 49.3 52.5 37.4
ELA 80.4 87.2 91.5 80.9 71.4 75.2
COLLEGE READY RATES
Algebra 8.7 10.6 19.9 12.6 29.8 9.4
Geometry 20.9 20.4 14.3 9.6 8.1 5.9
ELA 56.7 56.6 57.5 41.4 29.6 29.9
Table 2. Differences between actual and predicted scores among Blue Engine students, by year (positive numbers indicate outperforming predictions; negative numbers indicate underperforming).
Columns, left to right: Actual minus Predicted for 2011-2012, 2012-2013, 2013-2014
AVERAGE SCORES
Algebra -0.5 2.6 7.6
Geometry 1.5 3.8 5.6
ELA -1.5 3.5 -0.1
PASS RATES
Algebra -2.5 7.7 22.3
Geometry 2.7 13.7 15.1
ELA -6.8 10.6 -3.8
COLLEGE READY RATES
Algebra -1.9 7.3 20.4
Geometry 0.5 4.7 2.2
ELA 0.1 16.1 -0.3
Table 3. Differences between actual and predicted scores among Blue Engine students, expressed in standard deviations, by year.
Columns, left to right: Actual minus Predicted for 2011-2012, 2012-2013, 2013-2014
AVERAGE SCORES
Algebra -0.06 0.31 0.76
Geometry 0.15 0.38 0.51
ELA -0.14 0.34 -0.01
APPENDIX B | A NOTE ON PREDICTIVE MODELING
Some predictions for the 2012-2013 cohort of test takers have changed since Blue Engine last published results. This is not because of changes in the number or composition of students but because of changes in the modeling strategy. Last year, Glass Frog used an OLS model to predict students’ scores and then computed the share of students predicted to pass and score college ready from these predicted scores. OLS models are well suited to predicting actual scores because they are designed for continuous outcomes (ranging, for example, from 0 to 100), but they are less useful for predicting whether students will score above or below a particular threshold (like 80+). To predict whether students clear a threshold, it is preferable to use a probit or logit model, which is what we now use to estimate binary outcomes (such as whether a student passed (yes) or failed (no)).
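A small sketch illustrates why counting predicted scores above a cut point can misstate the share of students expected to clear it. For illustration only, it assumes that each student’s actual score equals the OLS prediction plus normally distributed error with a known residual standard deviation (a probit-style calculation); Glass Frog’s current models instead fit a logit directly to the pass/fail outcome. The predicted scores and residual SD are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def share_of_predictions_above(predicted, cut):
    """Naive approach: the fraction of predicted scores that clear the cut point."""
    return sum(p >= cut for p in predicted) / len(predicted)


def expected_share_above(predicted, cut, residual_sd):
    """Average each student's probability of clearing the cut, assuming normal residuals."""
    return sum(1 - normal_cdf((cut - p) / residual_sd) for p in predicted) / len(predicted)


# Hypothetical OLS-predicted scale scores for one class, and a hypothetical residual SD.
predicted = [60, 65, 70, 72, 75]
print(share_of_predictions_above(predicted, 80))  # 0.0
print(round(expected_share_above(predicted, 80, 10.0), 3))
```

Because OLS predictions cluster near the middle of the distribution, none of these predicted scores reaches 80, yet roughly 15% of students would be expected to score 80+ once the spread of actual scores around the predictions is taken into account.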
This is a more precise method of modeling outcomes that keeps the same basic choice of which variables (“coefficients”) to include. A logit model estimates the association between one or more independent variables (such as whether a student has an IEP or is female) and the dependent variable (e.g., whether the student passed the test). The model estimates the change in the “log odds” of passing, given that a student has an IEP or is female. These log odds can then be converted into probabilities (which are easier to understand), enabling interpretive statements like “being female is associated with a 25% increase in the probability of passing” or “having an IEP decreases the probability of passing by 20%.” Then, based on each student’s overall characteristics, we can compute the overall probability that they will pass the test: for example, given that Emily is female, does not have an IEP, and scored 650 on her eighth grade English test, her probability of passing the test is X%. Once we have probabilities for all students in Emily’s class, we average them to compute the “mean probability of passing the test,” which is the metric we use as the predicted pass rate. In other words, if the mean probability of passing is 50%, our estimate is that 50% of students will pass the exam. (If the mean probability of passing is 80%, our estimate is that 80% of students will pass, and so on.)
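The steps described above (log odds, then probabilities, then a class average) can be sketched as follows. The coefficients and student records are hypothetical placeholders chosen only to make the example run; they are not estimates from Glass Frog’s models, and in practice the coefficients would come from fitting a logit model to district data.

```python
import math


def logistic(log_odds):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


def pass_probability(student, coefs):
    """Predicted probability of passing for one student under a logit model."""
    log_odds = coefs["intercept"]
    for name, value in student.items():
        log_odds += coefs[name] * value
    return logistic(log_odds)


def predicted_pass_rate(students, coefs):
    """Mean predicted probability across students, used as the predicted pass rate."""
    return sum(pass_probability(s, coefs) for s in students) / len(students)


# Hypothetical coefficients on the log-odds scale (illustrative only).
COEFS = {"intercept": -7.5, "grade8_ela": 0.012, "female": 0.3, "iep": -0.9}

emily = {"grade8_ela": 650, "female": 1, "iep": 0}
classmates = [
    emily,
    {"grade8_ela": 620, "female": 0, "iep": 1},
    {"grade8_ela": 680, "female": 0, "iep": 0},
]
print(f"Emily: {pass_probability(emily, COEFS):.0%}")
print(f"predicted pass rate: {predicted_pass_rate(classmates, COEFS):.0%}")
```

Averaging probabilities, rather than counting students whose single best-guess score clears the bar, is what lets the predicted pass rate reflect uncertainty about each individual student.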
For more information on results modeling and associated changes, please contact Rebecca Casciano, PhD, at
APPENDIX C | TEST-TAKING POPULATION, 2013-14
GRADE-LEVEL BREAKDOWN FOR REGENTS TEST TAKERS BY SUBJECT AND SCHOOL*
Columns, left to right: Grade 9, Grade 10, Grade 11, Grade 12
Integrated Algebra
School 1 (non-CC) 94 2 0 0
School 2 (CC) 75 0 0 0
School 3 (non-CC) 73 25 12 3
School 4 (CC) 64 0 0 0
School 5 (non-CC) 21 11 1 0
School 5 (CC) 26 14 0 1
Geometry
School 1 4 98 25 3
School 3 4 33 3 1
School 5 15 29 6 0
ELA
School 1 0 98 0 0
*only includes non-truant students.
INTEGRATED ALGEBRA: STUDENT POPULATION VS. TEST-TAKING POPULATION
Columns: A = Total # students*; B = # dropped because truant**; C = # absent from test; D = # used in analysis; E = % of non-truant students included in analysis, D/(A-B)
School A B C D E
School 5 40 5 2 33 94.3
School 3 146 24 9 113 92.6
School 1 112 15 1 96 99.0
Total 298 44 12 242 95.3
*This includes all students who were preparing for the Regents exam this year. It excludes students in a two-year algebra class who are scheduled to take the Regents next year.
**Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.
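The arithmetic behind columns D and E can be checked with a short script. This sketch uses the Integrated Algebra rows above; the same check applies to the other tables in this appendix. The variable and function names are illustrative.

```python
# Rows from the Integrated Algebra table: (school, A total, B truant, C absent, D used).
IA_ROWS = [
    ("School 5", 40, 5, 2, 33),
    ("School 3", 146, 24, 9, 113),
    ("School 1", 112, 15, 1, 96),
]


def inclusion_rate(total, truant, used):
    """Column E: share of non-truant students included in the analysis, D / (A - B)."""
    return round(100 * used / (total - truant), 1)


for school, a, b, c, d in IA_ROWS:
    # Students used = all students minus truant students minus those absent on test day.
    assert d == a - b - c, f"{school}: D should equal A - B - C"
    print(school, inclusion_rate(a, b, d))

# Column totals for A, B, C, D.
totals = [sum(col) for col in zip(*(row[1:] for row in IA_ROWS))]
print("Total", totals, inclusion_rate(totals[0], totals[1], totals[3]))
```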
CC ALGEBRA: STUDENT POPULATION VS. TEST-TAKING POPULATION
Columns: A = Total # students; B = # dropped because truant*; C = # absent from test; D = # used in analysis; E = % of non-truant students included in analysis, D/(A-B)
School A B C D E
School 5 43 1 1 41 97.6
School 2 81 0 6 75 92.6
School 4 71 0 7 64 90.1
Total 195 1 14 180 92.8
*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.
GEOMETRY: STUDENT POPULATION VS. TEST-TAKING POPULATION
Columns: A = Total # students; B = # dropped because truant*; C = # absent from test; D = # used in analysis; E = % of non-truant students included in analysis, D/(A-B)
School A B C D E
School 5 56 3 3 50 94.3
School 3 63 6 16 41 71.9
School 1 161 19 11 131 92.3
Total 280 28 30 222 88.1
*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.
ELA: STUDENT POPULATION VS. TEST-TAKING POPULATION
Columns: A = Total # students; B = # dropped because truant*; C = # absent from test; D = # used in analysis**; E = % of non-truant students included in analysis, D/(A-B)
School A B C D E
School 1 118 20 0 98 100
*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics.
**Two students scored higher on the CC exam than on the non-CC exam. Because that group is too small to report statistics for, and because we have no basis on which to predict CC scores, we used their non-CC scores in order to include them in comparisons with predictions.
APPENDIX D | WRITING RUBRIC
The majority of ELA support was provided this year for non-Regents exam students, including those enrolled in the 9th
grade (Renaissance, Bronx River, BLA2, and MHBX) as well as the 10th grade at MHBX and a pull-out “writing-center”
model for grades 9-12 at WHEELS.
To track progress for ELA students who did not end the year with a Regents exam, we used a 6-point writing rubric
that was administered over the course of the year. We administered a baseline assessment in September, quarterly
assessments throughout the year, and an end-of-year (EOY) assessment in June. This is our first year using the rubric
and we are still working with teachers and principals to determine the best way to benchmark student progress against
it; in the meantime, we are comparing how students demonstrated growth on the four writing traits we explicitly taught
this year (ideas, organization, sentence fluency, and conventions) versus the two writing traits we did not teach (voice
and word choice).
The table below shows baseline and EOY rubric scores among ELA students, by school.12 We show results separately for the four traits we taught and the two traits we did not teach. As the table shows, although students performed similarly on both sets of traits at baseline, on average they showed more growth on the taught traits than on the untaught traits (47% vs. 30%, a difference of 17 percentage points). This was true across all schools, suggesting that Blue Engine continues to create focused opportunities to develop student writing in ways that align with college-ready standards. Overall growth on both sets of traits from baseline to end of year was statistically significant.
School N Baseline End of year Growth (raw points) Growth (%) Sig.?a
ALL TRAITS EXCLUDING VOICE AND WORD CHOICE
School 1 56 2.7 4.1 1.4 52.7 **
School 2 88 2.1 3.3 1.2 55.7 **
School 3 64 3.2 4.0 0.7 21.7 **
School 4 59 2.0 3.4 1.4 71.9 **
Total 267 2.5 3.6 1.2 47.2 **
ONLY VOICE AND WORD CHOICE
School 1 56 2.9 3.9 1.1 37.0 **
School 2 88 2.3 3.2 1.0 42.2 **
School 3 64 3.2 3.0 -0.2 -7.1 +
School 4 59 2.1 3.4 1.3 63.1 **
Total 267 2.6 3.3 0.7 29.9 **
a ** = p<.01; * = p<.05; + = p<.10
12 For students missing final assessments, we used the most recent assessment as the “final” score.
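The claim that taught traits grew more than untaught traits at every school can be checked against the Growth (%) column in the table above. This is a minimal sketch; the dictionary names are illustrative.

```python
# Growth (%) from baseline to end of year, by school, from the table above.
TAUGHT = {"School 1": 52.7, "School 2": 55.7, "School 3": 21.7, "School 4": 71.9, "Total": 47.2}
UNTAUGHT = {"School 1": 37.0, "School 2": 42.2, "School 3": -7.1, "School 4": 63.1, "Total": 29.9}

for school in TAUGHT:
    # Difference in growth rates, in percentage points.
    gap = round(TAUGHT[school] - UNTAUGHT[school], 1)
    print(f"{school}: taught traits grew {gap} percentage points more")
```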
APPENDIX E | A NOTE ON THE MEANING OF SUBSTANTIVE VS. STATISTICAL “SIGNIFICANCE”
When considering Blue Engine’s impact on student performance, we account for (and present results showing) both
the substantive and statistical significance of our impact. Combined, these practices present a holistic account of both
the size of the impact and our confidence in the results. Presenting substantive effects alongside tests of statistical
significance is common practice in the social sciences.
To demonstrate the substantive significance of our impact, we examine the size of the difference between predicted and actual scores or between predicted and actual pass rates. We also consider the “statistical significance” of the difference between actual and predicted scores. In the analyses presented in this document, we perform two-tailed, paired t-tests on the difference between actual and predicted scores. The t-test is a significance test on the t-value: a ratio whose numerator is the mean difference between actual and predicted scores and whose denominator is the standard error of that difference, a measure of how much variability there is in the data given the number of students. If the difference in the numerator is large and the variability in the denominator is low, the result is a high t-value. If the difference is small and/or the variability is high, the t-value will be lower.
A test of statistical significance estimates the probability of observing a difference between predicted and actual scores at least as large as the one in the data if there were actually no difference between predicted and actual scores. To test a t-value for statistical significance, we begin by assuming that the true difference between actual and predicted scores (the numerator) is zero. This is the so-called “null hypothesis.” We then apply a decision rule that sets the threshold above which the null hypothesis is rejected (and below which we fail to reject it). The higher the threshold, the more difficult it becomes to reject the null hypothesis. A commonly used threshold in the social sciences is about 2.0 (1.96 for large samples), which corresponds to a significance level of 5% (p<.05): a difference this large would occur by chance only 5% of the time if the null hypothesis were true.
As an example, among Blue Engine students, the average algebra score was 71.6 and the predicted score was 66.9. The numerator of the t-value is thus 4.7 (71.6 - 66.9). The denominator (the measure of variability) is .68. The resulting t-value is therefore 6.9 (4.7/.68). Since 6.9 is greater than 2.0, we can be confident that this observed difference between actual and predicted scores is unlikely to be due to chance.
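A minimal paired t-test can be written with the Python standard library alone. The denominator is the standard error of the mean difference: the standard deviation of the student-level differences divided by the square root of the number of students. The student-level scores below are hypothetical. With a sample this small the 2.0 rule of thumb is too lenient (the two-tailed 5% critical value for 3 degrees of freedom is about 3.18), so the sketch reports the t-value without applying a cut-off.

```python
import math
from statistics import mean, stdev


def paired_t(actual, predicted):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


# Reproduce the worked example: numerator 4.7, denominator .68.
print(round(4.7 / 0.68, 1))  # 6.9

# Hypothetical actual and predicted scores for four students.
actual = [70, 75, 68, 80]
predicted = [65, 70, 66, 72]
print(round(paired_t(actual, predicted), 2))  # 4.08
```

In practice a statistics package would also return the exact p-value for the appropriate degrees of freedom.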
In the analysis for this report, we tested the t-values at different thresholds. The table below shows the symbol we use to denote the level at which a finding is significant.
Columns: If the difference were due to chance alone (i.e., no impact), we would expect to observe a difference this large only…; Symbol
10% of the time (p<.10) +
5% of the time (p<.05) *
1% of the time (p<.01) **