BLUE ENGINE PROGRAM REPORT

2013 - 2014 Student Achievement - Year 4

BLUE ENGINE | 1

1 See Appendix A for disaggregated results over the past three years and Appendix B for an explanation of predictive models.

2 Results in ELA were mixed, with students performing roughly as predicted by counterfactual models. See page 5 for a more in-depth discussion of performance on the ELA Regents exams.

EXECUTIVE SUMMARY

This report provides an overview of student performance as measured by New York State

Regents Examinations and analytical writing growth for Year 4 (AY 2013-2014) across three subjects, five school sites,

and 944 students. These results provide evidence of impact and rank among our strongest to date.

Across Algebra and Geometry, we see clear, convincing growth in performance gains, evidence that Blue Engine has

become more effective at accelerating student achievement over the past three years. The majority of gains, over

average predicted levels, are the highest to date.1

In English Language Arts, student results show evidence of growth in the number of students performing at

college ready levels compared to last year’s cohort of students at BLA2, the only school where students took the

exam this year.2

To assess impact, Blue Engine employs a rigorous modeling standard of comparing actual performance on end of

year exams “with Blue Engine” to counterfactual performance, i.e. how students would have performed “without Blue

Engine.” We aim to increase the magnitude of actual performance over predicted performance on state standardized

tests across three key measures: (1) average scores, (2) passing rates, and (3) college ready rates as part of a mission to

prepare dramatically greater numbers of students for college-level work. Over time, our results will shift from “single-

year snapshots” to cumulative college readiness as measured by the percent of graduating seniors who are prepared

for college-level work.

Table 1. 2013-14 Actual vs. Predicted Scores among Blue Engine students, including results from paired t-tests (statistical significance) on the differences between actual and predicted scores/rates. (Note: “CC Algebra” refers to Common Core Algebra.)

                  Average Scores                  % Passing                       % College Ready
              Without BE  With BE  ImpactΔ    Without BE  With BE  ImpactΔ    Without BE  With BE  ImpactΔ
Algebra          64.0       71.6     7.6**       59.1       81.4    22.3**        9.4       29.8    20.4**
CC Algebra       57.5       61.6     4.1**       34.2       55.0    20.8**        2.1        0.0    -2.1**
Geometry         59.2       64.8     5.6**       37.4       52.5    15.1**        5.9        8.1     2.2
ELA              66.7       66.6    -0.1         75.2       71.4    -3.8         29.9       29.6    -0.3

**p<.01, demonstrating strong statistical significance; see Appendix E for a discussion of statistical significance based on paired t-tests

At the same time, new academic challenges have emerged. The central challenge remains moving students above the

college ready threshold in addition to helping improve performance overall. While gains – measured as average gains

(i.e. shifting the bell curve) – were our strongest to date, we must focus on pushing increasingly significant numbers

of students to meet college readiness benchmarks. Additionally, the shift to teaching Common Core standards causes

significant instructional challenges, with the added complication of school-by-school variation in which standards

(traditional or Common Core) were prioritized this year. While these transitions continue to challenge educators and

students, Blue Engine is committed to taking an elevated leadership role in defining and achieving measurable success

in the Common Core era.

BLUE ENGINE | 2

3 Readers may notice that some Year Three (2012-2013) predictions changed from last year to this year in order to enable and implement the same models to compare results across multiple years. Please refer to Appendix B for more detail.

INTRODUCTION

Blue Engine partners with public high schools serving low-income communities to increase

academic rigor and prepare greater numbers of students for postsecondary success. We recruit,

train, and support Blue Engine Teaching Assistants (BETAs), recent college graduates who work

alongside teachers in daily math and literacy classes, dramatically reducing instructor-to-student

ratios from 1:30 to 1:6. By rethinking how human capital is used in schools, Blue Engine helps elevate

levels of college readiness in low-income communities while training a new wave of educators to

enter the teaching profession. Founded in 2009, Blue Engine is backed by leading performance-

based social impact funders including Robin Hood Foundation, Michael and Susan Dell Foundation,

New Schools Venture Fund, Tiger Foundation, Heckscher Foundation for Children, Edna McConnell

Clark Foundation’s PropelNext Initiative, Echoing Green, and Draper Richards Kaplan Foundation.

Blue Engine’s mission is marked by the pursuit of a clear and measurable long-term outcome:

dramatic increases in the number of graduating seniors at partner high schools who complete

one year of postsecondary education without the need for remedial coursework and who enroll

in a second year of postsecondary education. A highly data-driven organization, Blue Engine

is developing a multidimensional approach to defining college readiness, measuring student

performance during the high school career in terms of: standards mastery (state exams, writing

rubric); enrollment in advanced coursework; and college-access activity (SAT/ACT preparation,

FAFSA, and application supports). Each of these progress indicators is aligned to postsecondary

degree completion.

With cohorts in the 9th and 10th grade across five district schools, Blue Engine measured impact

in the 2013-14 school year using scores on the New York State Regents Examinations in Integrated

Algebra, Geometry, and English Language Arts. Blue Engine aims to increase the average score

and the number of students meeting both the passing and college ready benchmarks.

Blue Engine partners with Glass Frog Solutions (Rebecca Casciano, PhD) to compare actual

performance on end of year exams “with Blue Engine” to counterfactual performance, i.e. how

students would have performed in our absence. To avoid comparing our students to an external

comparison group that might be dissimilar in composition for any number of reasons, Glass Frog

uses district-wide data to develop models that incorporate historical academic and demographic

student data to predict performance “without Blue Engine.”

Why predict? Counterfactual forecasts provide Blue Engine with a baseline that serves a dual

purpose: (1) to enable Blue Engine to set goals for student performance that are rigorous yet

realistic because they are rooted in knowledge of how similar students have performed historically

and (2) to evaluate the program’s impact on performance with statistical evidence of improvement

in scores due to the program. The method enables Blue Engine to set goals and measure impact

by looking at average scores, distribution curves, and the portion of students hitting passing and

college ready benchmarks.3 In the 11th and 12th grades, as our program expands, this will allow

for a more limited, targeted approach to student support based on maximizing the numbers of

college ready graduates.

ABOUT BLUE ENGINE

MEASURING IMPACT

BLUE ENGINE | 3

4 These 242 total students represent 57% of our total IA exam takers this year; the other 43% (discussed on page 6) took the “new” Common Core exam. This number also excludes 44 students labeled “truant” who were chronically absent, i.e. 50% of the year or more; this number also excludes an additional 12 students who were absent the day of the test (no change in method from last year, enabling comparisons). Appendix C provides a breakdown of students’ year in school for each subject and school, as well as a detailed accounting of students in our program versus students who took the exam.

** Strong statistical significance based on paired t-tests. Refer to Appendix E.

5 Peer schools are those New York City public high schools with the most similar student populations based on the following characteristics: average 8th grade ELA proficiency, average 8th grade math proficiency, percent students with disabilities, percent students with self-contained placements, and percent over-age students.

INTEGRATED ALGEBRA (IA)

Student performance in non-Common Core Algebra demonstrated gains on an average, passing, and college ready

basis, as well as clear evidence that Blue Engine has become more effective at accelerating student achievement over

the past three years. For Integrated Algebra Regents Examinations, raw scores are “scaled” to fall between 0-100, with

passing set at 65+ and college ready at 80+. (See page 6 for performance on Common Core Algebra.)

242 total students took the exam this year.4

The passing rate increased by 22.3 percentage points**,

compared to last year’s 7 point gain over predicted and

a 2 point loss two years ago.

The college ready rate increased by 20.4 percentage

points**, compared to last year’s gain of 7 points over

predicted and a 2 point loss two years ago.

On average, students were 3.7 times more likely to pass

the Integrated Algebra exam and 7.6 times more likely

to score college ready than their peers districtwide, after

controlling for prior scores, students’ unique learning

needs (IEP and ELL status), and school characteristics.

The average score on the IA exam this year was

71.6 relative to predicted scores of 64, a gain of 7.6

scale points** over predicted levels. Overall growth

– the rightward shift in the distribution curve – was

substantial this year, equaling .76 standard deviations,

a 162% increase over last year’s growth of .29 standard

deviations and significantly higher than two years ago

when gains were approximately equal to predictions.
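The 162% figure can be reproduced from the two standard-deviation gains quoted in this paragraph. The following is a minimal sketch of that arithmetic; the function name is illustrative and is not part of the report's methodology.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Standard-deviation gains over predicted scores, as quoted in the text.
this_year_sd_gain = 0.76
last_year_sd_gain = 0.29

increase = percent_increase(this_year_sd_gain, last_year_sd_gain)
print(f"{increase:.0f}% increase over last year's growth")
```

The same calculation applies to any pair of year-over-year gains in this report, provided both are expressed in the same units.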

                          Average Score   % Pass   % College Ready
Blue Engine Schools           71.6         81.4         29.8
District Peer Schools5        62.2         55.9          5.9

[Figures: College Readiness & Pass Rates on NYS Regents Exams; Score Distribution on NYS Regents Exams]

BLUE ENGINE | 4

6 These 222 total students represent 79% of total Geometry students this year; the other 21% (58 students) are excluded for two reasons. 28 are labeled “truant” because of chronic absence issues defined by missing 50% of the year or more; the remaining 30 students were absent the day of the test (no change in method from last year)

** Strong statistical significance based on paired t-tests. Refer to Appendix E.

7 Note: though it appears in the graphic / distribution-curves that the actual college ready rate is higher than the predicted rate, this is an artifact of the model used to predict student scores. In other words, the 8% actual vs. 6% predicted figures are accurate. Why the difference between numbers and graphics? Specifically, we use OLS regression models to predict students’ scores on the exam – the correct model to use when predicting a continuous outcome like test scores; when predicting movement above a fixed threshold (like 80+) we use a different model (logistic regression) designed to predict binary outcomes like passing vs. not passing.

GEOMETRY

Student performance in Geometry demonstrated gains on an average and passing basis, as well as evidence that

Blue Engine has become more effective at accelerating student achievement over the past three years. For Geometry

Regents Examinations, raw scores are “scaled” to fall between 0-100, with passing set at 65+ and college ready at 80+.

222 total students took the exam.6

The passing rate increased by 15.1 percentage points**,

compared to a 13.7 point gain over predicted last year and

a 3 point gain two years ago. On average, students were

more than twice as likely to pass the geometry exam as

their peers districtwide, after controlling for the variables

in the model.

The college ready rate increased by 2.2 percentage

points, compared to a 3 point gain over predicted last

year and a 1 point gain the year prior. We must continue

to focus on finding ways of getting greater shares of

students to college ready levels (80+) in Geometry, and

have a team dedicated to understanding the issue in

greater detail and making recommendations.

The average score on the Geometry exam was 64.8

this year relative to predicted scores of 59.2, a gain

of 5.6 scale points** over predicted levels.

This rightward shift on the distribution curve is, like

Algebra, gaining strength over time, equaling a .51

standard deviation gain over predicted scores relative

to last year’s increase of .30 standard deviations over

predicted and a .15 standard deviation gain posted two

years ago at WHEELS.7

                          Average Score   % Pass   % College Ready
Blue Engine Schools           64.8         52.5          8.1
District Peer Schools         59.9         39.3          5.8

[Figures: College Readiness & Pass Rates on NYS Regents Exams; Score Distribution on NYS Regents Exams]

BLUE ENGINE | 5

8 Please refer to Appendix D for context on how Blue Engine tracked progress for students who did not take the state exam.

9 These 98 total students represent 83% of total eligible ELA students this year; the other 17% (20 students) are excluded for one reason: they are labeled “truant” because of chronic absence issues defined by missing 50% of the year or more; zero students were absent the day of the test (no change in method from last year).

ENGLISH LANGUAGE ARTS (ELA)

Students at BLA2 (n=98) were the only Blue Engine students who took the ELA exam this year. That is, only one teacher

– former BETA, current Teach For America corps member Sabine Chisty – prepared students (10th graders) for and

administered the ELA Regents Exam.8 For English Language Arts, raw scores are “scaled” to fall between 0-100, with

passing set at 65+ and college ready at 75+.

Regents scores this year were significantly stronger than last year at BLA2, especially at the higher “college ready”

threshold; 30% of students at BLA2 scored college ready this year, compared to 16% last year at BLA2 – a rate increase

of 14 percentage points.

98 total students took the exam.9

On a predicted vs. actual basis (Table 2), students

performed as predicted on the college ready rate (0.3

percentage points lower than predicted) and in terms

of average scores (0.1 scale points lower than

predicted). The passing rate was slightly (3.8 percentage

points) lower than predicted.

With predicted achievement nearly identical to actual

achievement levels, the data suggests that Blue Engine

helped BLA2 students “catch up” to their predictions,

when previous cohorts had underperformed. When

compared to 10th graders at peer horizon schools and

districtwide, BLA2 10th grade students performed

better than their peers, who underperformed more

dramatically relative to their predictions, on average.

[Figures: College Readiness & Pass Rates on NYS Regents Exams, 2012-13 Cohort vs. 2013-14 Cohort at BLA2; Score Distribution on NYS Regents Exams, Predicted vs. Actual Performance]

Table 2. 2013-14 Actual vs. Predicted Scores

                          Average Scores                 % Passing                      % College Ready
                       Predicted  Actual  ImpactΔ    Predicted  Actual  ImpactΔ    Predicted  Actual  ImpactΔ
Blue Engine (10th)       66.7      66.6    -0.1        75.2      71.4    -3.8        29.9      29.6    -0.3
Peer Horizon (10th)      69.8      67.5    -2.3        79.3      71.6    -7.7        41.3      36.5    -4.8
District (10th)          73.9      72.1    -1.8        85.0      79.5    -5.5        54.2      50.6    -3.6

BLUE ENGINE | 6

COMMON CORE ALGEBRA

Common Core standards require new methods of instruction – productive struggle, greater critical

thinking, and emphasis on compound word problems – that represent a significant departure from

traditionally taught standards and pose challenges to teachers and students alike. For the first

time this year (AY 2013-14), schools faced a decision: would teachers align instruction with

the new Common Core Integrated Algebra Exam or stick with the traditional one?

From a student and school accountability perspective, state officials allowed principals to count

whichever score was higher for the purposes of high school graduation this year; ultimately, the

decision on which curriculum to deploy rested on (a) each school’s comfort level teaching new vs.

old material and (b) whether schools believed students would be better served by accelerating vs.

delaying Common Core Exam results for their student population. Blue Engine does not control

school-level decisions regarding curriculum and standards adoption, which left our teams reacting

to the mandates of each school site, causing elevated instructional challenges during this transition.

While some students took both exams, no student was actively prepared for both; thus, Blue Engine

has reported results for only the primary exam that teachers and BETAs “prepared” students to

take in each classroom, with no double counts. In the end, about half of Blue Engine classrooms

aligned instruction to the non-CC exam (57% of students), while the other half (43%) chose to align

with the CC exam. As expected, when compared to traditional exams, student performance was

somewhat lower on Common Core exams.

                     Non-Common Core   Common Core
# test takers              242             180
% passing                   81              55
% college ready10           29               1
Average score             71.6            61.6

However, students showed gains on the Common Core Algebra test relative to their predictions:11

average scores were 4.1 points higher and the pass rate was 20.8 percentage points higher. On

average, Blue Engine students were three times more likely to pass the Common Core algebra

exam than their peers districtwide. No students scored college ready.


                  Average Scores                  % Passing                       % College Ready
              Without BE  With BE  ImpactΔ    Without BE  With BE  ImpactΔ    Without BE  With BE  ImpactΔ
CC Algebra       57.5       61.6     4.1**       34.2       55.0    20.8**        2.1        0.0    -2.1**

**p<.01, demonstrating strong statistical significance

10 This threshold is somewhat meaningless in this context because research has yet to identify new “college ready” thresholds for new CC exams, and we don’t have city-wide (and peer horizon school) data on Common Core Regents to help benchmark our students’ scores. The latter will come by spring 2015.

11 In the absence of any historical data on the new Common Core aligned Regents Exams, Blue Engine was not able to model predicted performance until after the NYC Department of Education released district-wide student scores in February of 2015. In turn, counterfactuals were not used to set goals before the exam; they were used to measure student gains and compare the performance of Blue Engine students to that of students district-wide after the test was administered and data was made public.

INSTRUCTIONAL CHALLENGE

STUDENT PERFORMANCE

BLUE ENGINE | 7

LOOKING AHEAD

These instructional challenges are compounded by confusion surrounding the interpretation of

Common Core exam results in the following ways.

Predictive Models. For Year 4 (the first under Common Core in high schools) Glass Frog could

not immediately generate predicted scores for this subset of students because there existed no

population on which to base the predictions (i.e. there was not yet a sufficient “link” between

prior 8th grade test scores – a key driver of the predictions – and the “new” exams; it’s a two-part

process). In other words, there’d never been a Common Core Integrated Algebra Regents Exam

to “link” those scores to, which delayed our predictions.

Note: Once student-level citywide data for AY13-14 became available to our research team in spring

of 2015, we were able to run counterfactual models retroactively, arriving at baseline predictions

to compare to actual student performance. These counterfactual models provide the basis for the

data on gains presented in the Student Performance section on the previous page.

College Readiness. The second interpretive challenge involves state and city level debate around

establishing a “new bar” for what college ready really means in the Common Core era. It will take

5+ years for researchers to conclusively link Common Core Regents results to college persistence

and success rates because current students, by definition, haven’t graduated and enrolled in college

yet; they are rising 10th graders now. Recall: our current college ready benchmarks (e.g. 80+ and

75+ thresholds on “old” math and ELA exams, respectively) were developed by Harvard researchers

in 2010 based on longitudinal analyses linking college remediation and persistence rates in the

CUNY system with actual student Regents scores during high school in New York: that’s how the

college ready cut-points were discovered and publicized initially. Over time, the same analyses will

need to be completed for the “new” exams. In the short term, we will rely on the new “cutpoint”

scores for college ready levels currently being released by the NY State Board of Regents (e.g. 74

in Algebra, 79 in ELA, etc) and will set ambitious goals, accordingly.

In 2014-15, the Common Core exam will be required in Algebra (only that exam will count); in Geometry,

the choice between the “old” and “new” exams will extend for one year only, and in ELA the Common Core exam

will remain optional for the next two years. For the present year, we’ve opted to report all of the information that is

available to us and plan to set aggressive growth goals over current school-level baselines in the

coming year, especially given what we expect to be low levels of baseline performance at our

current schools.

Blue Engine is confident in these outcomes and proud of the teams of educators, students, and

families that worked immeasurably hard to make academic progress in 2013-14. Our students made

impressive gains compared to students districtwide and at peer horizon schools.

We also remain committed to improving results and preparing greater numbers of students for the

rigors of higher education in the years to come. Growth in the percentage of students passing and

scoring college ready on Common Core exams is a central focus for our site-based teams going

forward. We are reevaluating existing programming, training, and support structures through the

lens of the new academic standards, and introducing professional development opportunities for

lead teachers and BETAs to equip them with instructional tools and skills suited to fostering critical

thinking and productive struggle in students. Our ability to meet challenges and play a leadership

role in the new Common Core era is a critical benchmark of organizational progress and partnership

strength in schools. This year brings new hurdles, but Blue Engine is stronger than ever.

INTERPRETIVE CHALLENGE

BLUE ENGINE | 8

APPENDIX A | BLUE ENGINE, PREDICTED VS. ACTUAL STUDENT PERFORMANCE, 2011-2014

Blue Engine: historical performance indicators on three categories of NYS Regents Examinations, 2012-2014.

Table 1. Actual scores and predicted scores among Blue Engine students, by year.

                         2011-2012             2012-2013             2013-2014
                      Actual  Predicted     Actual  Predicted     Actual  Predicted
AVERAGE SCORES
  Algebra              65.1     65.6         69.3     66.7         71.6     64.0
  Geometry             69.4     67.9         66.4     62.6         64.8     59.2
  ELA                  73.1     74.6         74.3     70.8         66.6     66.7
PASS RATES
  Algebra              66.8     69.3         76.0     68.3         81.4     59.1
  Geometry             69.0     66.3         63.0     49.3         52.5     37.4
  ELA                  80.4     87.2         91.5     80.9         71.4     75.2
COLLEGE READY RATES
  Algebra               8.7     10.6         19.9     12.6         29.8      9.4
  Geometry             20.9     20.4         14.3      9.6          8.1      5.9
  ELA                  56.7     56.6         57.5     41.4         29.6     29.9

Table 2. Differences between actual scores and predicted scores among Blue Engine students, by year (positive

numbers correspond to outperforming predictions; negative numbers to underperforming).

                          2011-2012             2012-2013             2013-2014
                      Actual – Predicted    Actual – Predicted    Actual – Predicted
AVERAGE SCORES
  Algebra                   -0.5                   2.6                   7.6
  Geometry                   1.5                   3.8                   5.6
  ELA                       -1.5                   3.5                  -0.1
PASS RATES
  Algebra                   -2.5                   7.7                  22.3
  Geometry                   2.7                  13.7                  15.1
  ELA                       -6.8                  10.6                  -3.8
COLLEGE READY RATES
  Algebra                   -1.9                   7.3                  20.4
  Geometry                   0.5                   4.7                   2.2
  ELA                        0.1                  16.1                  -0.3
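Each entry in Table 2 is the actual value from Table 1 minus the corresponding predicted value. As a minimal sketch, the average-score rows can be recomputed as follows; the dictionary layout and the `impact` helper are illustrative choices rather than anything the report prescribes.

```python
# (actual, predicted) average scores by subject and year, copied from Table 1.
average_scores = {
    "Algebra": {"2011-2012": (65.1, 65.6), "2012-2013": (69.3, 66.7), "2013-2014": (71.6, 64.0)},
    "Geometry": {"2011-2012": (69.4, 67.9), "2012-2013": (66.4, 62.6), "2013-2014": (64.8, 59.2)},
    "ELA": {"2011-2012": (73.1, 74.6), "2012-2013": (74.3, 70.8), "2013-2014": (66.6, 66.7)},
}


def impact(actual: float, predicted: float) -> float:
    """Actual minus predicted, rounded to one decimal place as in the tables."""
    return round(actual - predicted, 1)


differences = {
    subject: {year: impact(actual, predicted) for year, (actual, predicted) in years.items()}
    for subject, years in average_scores.items()
}

for subject, by_year in differences.items():
    print(subject, by_year)
```

Positive values indicate that students outperformed their predictions; negative values indicate underperformance.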

Table 3. Differences between actual scores and predicted scores among Blue Engine students expressed in Standard

Deviation, by year.

                          2011-2012             2012-2013             2013-2014
                      Actual – Predicted    Actual – Predicted    Actual – Predicted
AVERAGE SCORES
  Algebra                  -0.06                  0.31                  0.76
  Geometry                  0.15                  0.38                  0.51
  ELA                      -0.14                  0.34                 -0.01

BLUE ENGINE | 9

APPENDIX B | A NOTE ON PREDICTIVE MODELING

Some predictions for the 2012-2013 cohort of test takers changed since the last time Blue Engine published results. This

is not because of changes in the number or composition of students but rather because of changes in the modeling

strategy. Last year, Glass Frog used an OLS model to predict students’ scores and then later computed the portion of

students predicted to pass and score college ready based on these predicted scores. The OLS models are well-suited

for predicting actual scores because they are designed to predict continuous outcomes (ranging, for example, from

0-100), but they are not as useful for predicting whether students will score above or below particular thresholds (like

80+). To predict whether students are above a certain threshold, it is preferable to use a probit or logit model, which

is what we now use to estimate binary outcomes (such as whether someone passed (yes) or failed (no)).

This represents a more precise method of modeling outcomes while maintaining the basic logic of which “coefficients”

to include. A logit model estimates the association between one or more independent variables (such as whether a

student has an IEP or is a female) and the dependent variable (e.g., whether the student passed the test). The model

estimates the change in the “logged odds” of passing, given that a student has an IEP or is a female. These logged odds

can then be converted into probabilities (which are easier to understand), enabling interpretive statements like: “being

a female is associated with a 25% increase in the probability of passing” or “having an IEP decreases the probability of

passing by 20%.” Then, based on each student’s overall characteristics, we can compute the overall probability that

they will pass the test: as an example, given that Emily is a female and she does not have an IEP and she scored a 650

on her eighth grade English test, her probability of passing the test is X%. Once we have probabilities for all students

in Emily’s class, we can then average them to compute the “mean probability of passing the test,” which is also the

metric we’re using to determine the predicted pass rate. In other words, if the mean probability of passing is 50%,

then our estimate is that 50% of students will pass the exam. (If the mean probability of passing is 80%, then our

estimate is that 80% of students will pass the exam, and so on.)
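The steps described above (logged odds, conversion to a probability, and averaging across students) can be sketched in a few lines of Python. This is an illustration only: the intercept, the coefficients, and the three student records are hypothetical values chosen for the example and are not Glass Frog's estimates or real student data.

```python
import math


def logistic(log_odds: float) -> float:
    """Convert logged odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical logit coefficients (assumptions for illustration only).
INTERCEPT = -13.0
COEFFICIENTS = {"female": 0.4, "iep": -0.9, "grade8_ela": 0.02}

# Hypothetical students; the first mirrors "Emily" from the text
# (female, no IEP, 650 on the eighth grade English test).
students = [
    {"female": 1, "iep": 0, "grade8_ela": 650},
    {"female": 0, "iep": 1, "grade8_ela": 640},
    {"female": 0, "iep": 0, "grade8_ela": 665},
]


def pass_probability(student: dict) -> float:
    """Probability of passing implied by the hypothetical logit model."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * student[name] for name in COEFFICIENTS)
    return logistic(log_odds)


probabilities = [pass_probability(s) for s in students]

# The predicted pass rate is the mean of the individual probabilities.
predicted_pass_rate = sum(probabilities) / len(probabilities)
print(f"Predicted pass rate: {predicted_pass_rate:.1%}")
```

Averaging probabilities, rather than counting students whose predicted score crosses a threshold, is what distinguishes this approach from the earlier OLS-based method described in this appendix.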

For more information on results modeling and associated changes, please contact Rebecca Casciano, PhD, at

[email protected].

BLUE ENGINE | 10

APPENDIX C | TEST-TAKING POPULATION, 2013-14

GRADE-LEVEL BREAKDOWN FOR REGENTS TEST TAKERS BY SUBJECT AND SCHOOL*

                          Grade
                        9     10     11     12
Integrated Algebra
  School 1 (non-CC)    94      2      0      0
  School 2 (CC)        75      0      0      0
  School 3 (non-CC)    73     25     12      3
  School 4 (CC)        64      0      0      0
  School 5 (non-CC)    21     11      1      0
  School 5 (CC)        26     14      0      1
Geometry
  School 1              4     98     25      3
  School 3              4     33      3      1
  School 5             15     29      6      0
ELA
  School 1              0     98      0      0

*only includes non-truant students.

INTEGRATED ALGEBRA: STUDENT POPULATION VS. TEST-TAKING POPULATION

               A             B                 C             D             E
            Total #      # dropped         # absent      # used in     % non-truant students
            students*    because truant**  from test     analysis      included in analysis D/(A-B)
School 5       40            5                 2             33            94.3
School 3      146           24                 9            113            92.6
School 1      112           15                 1             96            99.0
Total         298           44                12            242            95.3

*This includes all students who were preparing for the Regents exam this year. It excludes students in a two-year algebra class who are scheduled to take the Regents next year.

**Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.
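Column E follows directly from the other columns: truant students (B) are removed from the denominator, and students absent on test day (C) account for the gap between the non-truant population and the analysis sample (D). A minimal sketch of that calculation for the Integrated Algebra table above, with an illustrative helper name:

```python
def pct_included(total: int, truant: int, used: int) -> float:
    """Column E: share of non-truant students included in the analysis, D / (A - B)."""
    return round(used / (total - truant) * 100, 1)


# (A total, B truant, C absent from test, D used in analysis), from the table above.
integrated_algebra = {
    "School 5": (40, 5, 2, 33),
    "School 3": (146, 24, 9, 113),
    "School 1": (112, 15, 1, 96),
    "Total": (298, 44, 12, 242),
}

for school, (total, truant, absent, used) in integrated_algebra.items():
    # D should equal A minus truant students minus test-day absences.
    assert total - truant - absent == used
    print(school, pct_included(total, truant, used))
```

The same check can be run against the Common Core Algebra, Geometry, and ELA tables in this appendix.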

BLUE ENGINE | 11

APPENDIX C | TEST-TAKING POPULATION, 2013-14

CC ALGEBRA: STUDENT POPULATION VS. TEST-TAKING POPULATION

               A             B                 C             D             E
            Total #      # dropped         # absent      # used in     % non-truant students
            students     because truant*   from test     analysis      included in analysis D/(A-B)
School 5       43            1                 1             41            97.6
School 2       81            0                 6             75            92.6
School 1       71            0                 7             64            90.1
Total         195            1                14            180            92.8

*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.

GEOMETRY: STUDENT POPULATION VS. TEST-TAKING POPULATION

               A             B                 C             D             E
            Total #      # dropped         # absent      # used in     % non-truant students
            students     because truant*   from test     analysis      included in analysis D/(A-B)
School 5       56            3                 3             50            94.3
School 3       63            6                16             41            71.9
School 1      161           19                11            131            92.3
Total         280           28                30            222            88.1

*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics. The figures reported here also include a small number of kids who missed class consistently due to being tardy.

ELA: STUDENT POPULATION VS. TEST-TAKING POPULATION

               A             B                 C             D             E
            Total #      # dropped         # absent      # used in     % non-truant students
            students     because truant*   from test     analysis**    included in analysis D/(A-B)
School 1      118           20                 0             98           100

*Truant students are defined as those with an attendance rate lower than 50%. Because they are not receiving the Blue Engine “treatment,” they are subtracted from the denominator for purposes of computing performance metrics.

**Two students scored numerically higher on CC than non-CC, but that’s too small a group to report statistics for, so in order to include them in comparisons with predictions, we used their non-CC scores, since we have no basis on which to predict CC scores.

BLUE ENGINE | 12

APPENDIX D | WRITING RUBRIC

The majority of ELA support was provided this year for non-Regents exam students, including those enrolled in the 9th

grade (Renaissance, Bronx River, BLA2, and MHBX) as well as the 10th grade at MHBX and a pull-out “writing-center”

model for grades 9-12 at WHEELS.

To track progress for ELA students who did not end the year with a Regents exam, we used a 6-point writing rubric

that was administered over the course of the year. We administered a baseline assessment in September, quarterly

assessments throughout the year, and an end-of-year (EOY) assessment in June. This is our first year using the rubric

and we are still working with teachers and principals to determine the best way to benchmark student progress against

it; in the meantime, we are comparing how students demonstrated growth on the four writing traits we explicitly taught

this year (ideas, organization, sentence fluency, and conventions) versus the two writing traits we did not teach (voice

and word choice).

The table below shows baseline and EOY rubric scores among ELA students, by school.12 We show results separately

for the four traits we taught and the two traits we did not teach. As the table shows, on average, though students

performed similarly on both sets of traits at baseline, students showed more growth (47% - 30% = 17 percentage

points) on the taught traits than on the un-taught traits. This was true across all schools, suggesting that Blue Engine

continues to create focused opportunities to develop student writing in ways that align with college-ready standards.

Growth on both sets of standards was statistically significant from baseline to end of year.

                                               N    Baseline   End of year   Growth         Growth   Sig?a
                                                                             (raw points)   (%)
ALL TRAITS EXCLUDING VOICE AND WORD CHOICE
  School 1                                    56      2.7         4.1           1.4          52.7     **
  School 2                                    88      2.1         3.3           1.2          55.7     **
  School 3                                    64      3.2         4.0           0.7          21.7     **
  School 4                                    59      2.0         3.4           1.4          71.9     **
  Total                                      267      2.5         3.6           1.2          47.2     **
ONLY VOICE AND WORD CHOICE
  School 1                                    56      2.9         3.9           1.1          37.0     **
  School 2                                    88      2.3         3.2           1.0          42.2     **
  School 3                                    64      3.2         3.0          -0.2          -7.1     +
  School 4                                    59      2.1         3.4           1.3          63.1     **
  Total                                      267      2.6         3.3           0.7          29.9     **

a ** = p<.01; * = p<.05; + = p<.10

12 For students missing final assessments, we used the most recent assessment as the “final” score.
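The growth percentages in the table can be approximated from the other columns. The sketch below assumes that growth (%) is raw-point growth divided by the baseline score; the report does not state the formula, and because the table shows rounded values, the recomputed figures only approximate the reported ones. It also reproduces the gap between growth on taught and un-taught traits.

```python
def growth_pct(baseline: float, raw_growth: float) -> float:
    """Assumed definition: raw-point growth as a percentage of the baseline score."""
    return raw_growth / baseline * 100


# (baseline, raw growth, reported growth %) for the taught traits, from the table above.
taught_traits = {
    "School 1": (2.7, 1.4, 52.7),
    "School 2": (2.1, 1.2, 55.7),
    "School 3": (3.2, 0.7, 21.7),
    "School 4": (2.0, 1.4, 71.9),
    "Total": (2.5, 1.2, 47.2),
}

for school, (baseline, raw_growth, reported) in taught_traits.items():
    approx = growth_pct(baseline, raw_growth)
    print(f"{school}: approx {approx:.1f}%, reported {reported}%")

# Gap between taught and un-taught traits, using the reported totals.
gap = round(47.2 - 29.9, 1)
print(f"Taught traits grew {gap} percentage points more than un-taught traits")
```

The small differences between the approximate and reported percentages are consistent with the table's baseline and growth columns being rounded to one decimal place.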

BLUE ENGINE | 13

APPENDIX E | A NOTE ON THE MEANING OF SUBSTANTIVE VS. STATISTICAL “SIGNIFICANCE”

When considering Blue Engine’s impact on student performance, we account for (and present results showing) both

the substantive and statistical significance of our impact. Combined, these practices present a holistic account of both

the size of the impact and our confidence in the results. Presenting substantive effects alongside tests of statistical

significance is common practice in the social sciences.

To demonstrate the substantive significance of our impact, we examine the size of the difference between predicted

and actual scores or predicted and actual pass rates. We also consider the “statistical significance” of the difference

between actual and predicted scores. In the analyses presented in this document, we perform two-tailed, paired t-tests

on the difference between actual and predicted scores. The t-test is a significance test on the t-value: a ratio where

the numerator is the difference between actual and predicted scores and the denominator is a measure of how much

variability there is in the data. If the difference in the numerator is large and the variability in the denominator is low,

the result is a high t-value. If the difference is low and/or the variability is high, the t-value will be lower.

A test of statistical significance is a test of the probability of observing a difference between predicted and actual

scores in the data in the event that there is actually no difference between the predicted and actual scores. To perform

a test of statistical significance on a t-value, we begin by assuming that the difference between actual and predicted

scores (the numerator) is zero. This is the so-called “null hypothesis.” We then employ a decision rule that denotes the

threshold above which we can determine that the null hypothesis should be rejected (and below which we determine

that the null hypothesis cannot be rejected). The greater the threshold, the more difficult it becomes to reject the null

hypothesis. A commonly used threshold in the social sciences is approximately 2.0, which corresponds to a significance level of 5%

(or p<.05), meaning that a difference at least as large as the one observed would occur less than 5% of the

time if the null hypothesis were indeed correct.

As an example, among Blue Engine students, the average algebra score was 71.6 and the predicted score was 66.9.

The numerator of the t-value is thus 4.7 (71.6 – 66.9). The denominator (which is the measure of variability) is .68. The

resulting t-value is therefore 6.9 (4.7/.68). Since 6.9 is greater than 2.0, we can be pretty confident that this observed

difference between actual and predicted scores cannot be attributed to chance.
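The t-value in this example can be reproduced directly, and the same ratio can be computed from student-level data. The sketch below takes the denominator to be the standard error of the paired differences, which is the standard form of a paired t-test; the student scores are hypothetical and serve only to show the mechanics.

```python
import math
import statistics


def paired_t_value(actual: list, predicted: list) -> float:
    """Mean paired difference divided by the standard error of those differences."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / standard_error


# The worked example from the text: numerator 4.7, denominator .68.
t_from_text = 4.7 / 0.68
print(f"t-value from the text: {t_from_text:.1f}")

# Hypothetical student-level scores (not real data).
actual_scores = [72, 68, 75, 80, 66, 71]
predicted_scores = [65, 66, 70, 72, 64, 69]
t_hypothetical = paired_t_value(actual_scores, predicted_scores)
print(f"t-value for the hypothetical students: {t_hypothetical:.2f}")
```

A t-value above roughly 2.0 in either case would lead to rejecting the null hypothesis at the 5% level, following the decision rule described above.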

In the analysis for this report, we tested the t-values at different thresholds. The table below shows the symbol we use

to denote the level at which a finding is significant

If this difference is attributed to chance (i.e. no impact)
we would only expect to observe this difference…              Symbol
10% of the time (p<.10)                                         +
5% of the time (p<.05)                                          *
1% of the time (p<.01)                                          **
