Value-Added Tests: Buyer, Be Aware

The value-added assessment model is one over-the-counter product that may be detrimental to your health.

Educational Leadership, November 2009

Audrey Amrein-Beardsley

Who doesn’t laugh when a drug commercial presents a clip of a young, otherwise happy and healthy person laughing and flying a kite at the park and then dramatically exposes some ailment, only to fix it by unveiling a prescription drug, along with its potential side effects? What is laughable is that the side effects often seem worse than the problem itself.

I may have found more humor in these commercials than others have because I grew up in a family opposed to even over-the-counter drugs. Drugs were simply not a part of my family’s holistic approach to healthy living, until last year, when I discovered I had a heart condition. I was prescribed a drug cocktail consisting of six medications, three of which carry serious side effects. No longer was I laughing at the poor souls portrayed in those drug commercials. Thanks to the Food and Drug Administration (FDA), I quickly became an educated consumer.

The FDA is the oldest and most respected protector of wellness in the United States. It exists to guarantee that no harm is done to consumers of foods and drugs. Specifically, it ensures that the benefits of the foods and drugs it approves outweigh the risks they pose and that their benefits and risks, once scientifically documented, are fully disclosed to the public to enable consumers to make wise health decisions.

Might the FDA approach also serve as a model to protect the intellectual health of the United States? Might this be a model that legislators and education leaders follow when they pass legislation or policies whose benefits and risks are unknown? Don’t students, teachers, and administrators in U.S. public schools deserve similar protection? In light of these questions, let’s look at one suggested “cure” for what’s ailing our schools.

Take the Model, and Call Me in the Morning

Currently, the No Child Left Behind Act (NCLB) mandates that all U.S. states measure student learning using standardized achievement tests. This is not likely to change. NCLB also requires states to report on school progress using adequate yearly progress (AYP) measures. But because of AYP’s shortcomings, some states now receive funds to integrate value-added assessment models into their accountability procedures, largely to help states comply with the accountability provisions written into NCLB.

Value-added models assess teachers, schools, and districts on the value they add to student learning, from the moment students enter the classroom to the time they leave. In theory, this makes more sense than just capturing where students are academically at the end of each school year.

But it is far from certain that value-added models work in the ways theorized. There is a risk associated with blindly adopting the value-added models: they may be detrimental to consumer health. Just as the FDA regulates foods and drugs, we need a Federal Education Agency to provide the science-based, accurate information that educators need to be informed consumers. Such an agency might warn consumers about the benefits and risks of the value-added assessment models currently “prescribed.” To protect the public good, such an organization might examine whether the most popular, widely adopted, sophisticated, and expensive “over-the-counter” value-added model, the Education Value-Added Assessment System (EVAAS) developed by William L. Sanders, really measures up. The model has three limitations.

Limitation 1: A Reliance on Standardized Tests

The EVAAS model relies on standardized tests to measure levels of change in student learning. It’s unclear whether standardized tests can accurately measure what students know and are able to do at one point in time, let alone over time to measure “knowledge added.” The effect of districts, schools, and teachers on student learning is also unclear; we need to consider whether it’s possible to attribute gains or losses in student achievement solely to the quality of instruction, independent of other school and life factors. This is the fundamental assumption on which the system relies, which, if we were to seriously consider it, might cause a model recall.

Test Data Irregularities

The EVAAS model requires complete and high-quality longitudinal test data that most states currently do not have. Students sometimes miss tests. Student test score data are often not linked to teacher names. Students are often misreported by class and grade level. And sometimes data show that students jump from the top to the bottom of the class, or vice versa, from one year to the next, which is nearly impossible in actuality.

Data errors like these, which are often caused by student mobility, missing test scores, data-processing errors, or students incorrectly bubbling in their score sheets, affect thousands of student records. Model developers claim these things don’t matter, that the system can operate regardless.

Student Risk Factors

The EVAAS model does not control for student risk factors, making it the only sophisticated assessment model that does not account for such things as family income, ethnicity, and other student background variables.

Developers of the model state that the effects of these factors on student growth are negligible. Yet educators know too well that student background variables unquestionably affect student achievement and the progress students make from year to year. How could the achievement gap continue to widen if these factors play no role?

Class Size

Statistical errors in test results frequently occur when fewer students are in a given class, a problem that prevents truthful claims about the quality of teachers with class sizes below a certain number. In the EVAAS model, general and special education teachers who teach smaller classes are more likely to be assumed average. An ineffective teacher who teaches a large class might be penalized for being below average, whereas an equally ineffective teacher who teaches a smaller class may go undetected. The larger the class, the more “accurate” the estimate. This model makes the process of evaluating teacher quality unfair, discriminating against teachers who have larger classes.
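To see why class size matters here, consider a minimal sketch of shrinkage, the general statistical technique in which an estimate built on few observations is pulled toward the overall average. This is not the EVAAS computation, which its developers have not fully opened to review; the function name, the two variance values, and the five-point shortfall are hypothetical and chosen only for illustration.

```python
def shrunken_effect(raw_gap, class_size, noise_var=100.0, teacher_var=25.0):
    """Pull a teacher's raw gap from the average gain toward zero.

    raw_gap: class mean gain minus the overall mean gain (hypothetical units).
    noise_var, teacher_var: assumed student-level and teacher-level variances.
    """
    # Reliability rises toward 1 as the class grows, so large classes
    # keep more of their raw gap and small classes keep less.
    reliability = class_size / (class_size + noise_var / teacher_var)
    return reliability * raw_gap


# Two equally ineffective teachers: both classes gain 5 points less than average.
small_class = shrunken_effect(-5.0, class_size=8)
large_class = shrunken_effect(-5.0, class_size=32)
print(f"Small class estimate: {small_class:.2f}")
print(f"Large class estimate: {large_class:.2f}")
```

Under these assumed values, the teacher with 8 students is estimated at about two-thirds of the raw shortfall and the teacher with 32 students at about eight-ninths of it, so the same performance looks closer to average in the smaller class.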

Grades and Subjects Tested

Only students in certain grade levels must take the standardized accountability tests. This situation subjects teachers to accountability measures in some grades, but not in others. In addition, many of these tests only assess students’ reading and mathematics skills, exempting teachers who teach other subjects from being held accountable in similar ways.

Teacher Effect

The EVAAS model is also incapable of controlling for out-of-school learning and the effects one teacher might have on another. Let’s say students complete a standardized test in the spring in one teacher’s classroom. They complete the school year still learning from that teacher, spend three months in the summer losing or gaining variable amounts of knowledge, enter the classroom of a new teacher in the fall, and then take the “post-test” the following spring under the tutelage of the new teacher. It is impossible to prove that the losses or gains posted from the previous year to the next are solely a result of the current teacher’s efforts. Although system developers argue that their system can factor these effects out, this remains unclear, and unlikely.

The issue becomes more convoluted when students enter middle and high school and switch teachers and classrooms daily, sometimes taking classes in the same subject areas during the same semester. For example, if a student is taking geometry and algebra the same semester, who is to say that the geometry teacher was more or less effective than the algebra teacher or that the value the geometry teacher added to the student’s learning about math had nothing to do with what the student learned in algebra? Who is to say that a student who switched language arts teachers midsemester learned more about reading from one teacher than the other? What about teachers who team teach or teach in other atypical classroom settings? The model neglects this complexity.

Student Assignment

And what does all this mean for evaluating teacher effectiveness when students are not randomly placed into classes? If one high-quality teacher gets an amazing set of students and another equally effective teacher gets an unexceptional set, the students of the first teacher will most likely learn more within one year, and their teacher will be unfairly rewarded as a higher-quality teacher. This situation is more likely in schools in which assertive parents push their children into “better” classrooms. Conversely, if a teacher is assigned a disproportionate number of difficult-to-teach students, possibly because the principal believes that he or she can teach at-risk students more effectively, and these students gain less than other students in comparable classrooms, is it fair to say the teacher is less capable, successful, or qualified than other teachers?

We can say that one teacher caused students to learn more than another one did only if we randomly assign students into classrooms. The same holds true for statements about schools and districts. Most value-added researchers agree with this, yet some continue to use data from their models to make consequential decisions about teachers, schools, and districts.
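The point about random assignment can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a value-added model: the function name, the size of the background effect, the noise levels, and the number of students are all hypothetical. Both teachers are given exactly the same effect, and the only thing that changes is how students are placed.

```python
import random
import statistics


def class_mean_gains(sorted_assignment, n_students=2000, seed=0):
    """Return the mean gain for two teachers who are equally effective."""
    rng = random.Random(seed)
    # Hypothetical background advantage per student (mean 0, SD 1).
    background = [rng.gauss(0, 1) for _ in range(n_students)]
    if sorted_assignment:
        # Non-random placement: teacher A gets the least advantaged half.
        background.sort()
    half = n_students // 2
    teacher_effect = 5.0  # identical for both teachers by construction

    def mean_gain(group):
        # Gain = teacher effect + background effect + test noise.
        return statistics.mean(
            teacher_effect + 3.0 * b + rng.gauss(0, 2) for b in group
        )

    return mean_gain(background[:half]), mean_gain(background[half:])


a_sorted, b_sorted = class_mean_gains(sorted_assignment=True)
a_random, b_random = class_mean_gains(sorted_assignment=False)
print(f"Sorted classes: A={a_sorted:.2f}, B={b_sorted:.2f}")
print(f"Random classes: A={a_random:.2f}, B={b_random:.2f}")
```

When students are sorted by background, the two class averages differ by several points even though the teachers are identical by construction; when placement is left to chance, the averages land close together.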

A Single Indicator

At best, the EVAAS model might be useful at face value to help identify teachers who need professional development or schools and districts in need of intervention if, and only if, value-added score reports are not used in isolation from other data confirming that the teachers, schools, or districts are, in fact, struggling to succeed (see also Bracey, 2007).

The use of one single indicator to make consequential decisions about students, teachers, schools, or districts violates the first of the 12 Standards for Educational and Psychological Testing set forth by the American Educational Research Association (AERA), the American Psychological Association, and the National Council on Measurement in Education (AERA, 2000). These standards represent the professional consensus on the appropriate uses of tests.

Limitation 2: Lack of Evidence of Validity

The model’s developers state that adopting the EVAAS model will make visible certain education findings that were indiscernible in the past (Sanders & Horn, 1994) in fair, objective, and unbiased ways (SAS, 2007). Without these findings, they claim, the real effects on student learning would continue to go unaddressed (Sanders, 1998). Purportedly, the model will help districts and schools make data-informed decisions that will ultimately increase student performance.

Also, proponents state that “combining value-added analysis and improved high school assessments will lead to improved high school graduation rates, increased rigor in academic content, higher college-going rates, less college remediation, and increased teacher accountability” (Battelle for Kids, n.d.). But nowhere do the developers provide evidence to substantiate these claims.

The model’s developers have used their value-added data to notify parents of the chances their children will or will not pass upcoming tests or graduate from high school. They have also used the system to predict students’ scores on college entrance exams, estimate the likelihood students will get into state colleges and universities, predict which students are more suited to technical majors, and determine the probability of students receiving As and Bs their freshman year in college.

Using inexact data to predict things about students’ lives is unethical and unprofessional, and it borders on education malpractice. Many parents and teachers already think they know which students are at risk; let us not rely on imperfect statistics to notify high-achieving students that they are free and clear or remind low-achieving students that the odds are against them. Making such predictions may directly or indirectly cause them to come true.

The model’s developers also claim that because their product singles out teachers whose students post either above- or below-average gains, it’s the best tool out there for rewarding or penalizing teachers. Yet the developers have conducted no studies to examine whether teachers determined as highly effective are also (1) teachers with more years of experience, (2) teachers whom supervisors or peers would also classify as highly effective, (3) teachers who received high scores on their teacher licensure tests, (4) teachers who have higher levels of education, or (5) teachers who have received teaching awards and honors, are National Board certified, and the like.

Moreover, personnel in the districts and schools that have implemented the model do not seem to be using the data in the expected and promoted ways. This is largely because of the confusing data reports and a lack of professional development opportunities to help teachers and administrators understand the model’s output.

Limitation 3: Lack of Transparency

There has been insufficient external examination of the EVAAS model to inform recommendations or regulatory decisions about its use, benefits, and risks. The question here is whether there have been enough empirical studies conducted to warrant the federal and state education policies mandating the use of this system.

The model’s developers have not completely opened up their system (in particular, the computational algorithms used to analyze test data) to external or peer review. Nor have they released any value-added data they have collected to enable other researchers to verify the claims they make. This makes scientific research by external statisticians nearly impossible, limiting researchers’ capacity to make sound recommendations about the model to inform education policies and provide consumers with the facts they need to make their own “regulatory” decisions.

In 1997, developers asserted that they had undertaken “extensive efforts” to increase understanding of the system, formerly known as the Tennessee Value-Added Assessment System (TVAAS), and they explained the system in great detail. They also stated that “detailed external reviews from both the statistical and educational evaluation communities have confirmed that the properties of the TVAAS results are as claimed” (Sanders, 1998, p. 26), but they didn’t provide citations or references to these external reviews. Four sets of external reviewers examined the assessment system in depth: Two reviewers praised the system, one reviewer raised significant points of contention, and the last reviewer was one of the model’s developers (Sanders & Wright, 2008).

Educating the Education Consumer

In all fairness, all value-added models are flawed, especially when it comes to their reliance on standardized tests and the assumptions about what these tests can reveal. The EVAAS model is the most sophisticated, or the least inferior, of these models.

Nevertheless, should the issues that contaminate the practicality of the EVAAS model warrant its removal from the market? Yes, at least until external reviewers can verify the model’s assumptions about what standardized tests can reveal, validate the inferences drawn about students and teachers, begin necessary internal and external research studies, answer commonsense questions, and inform consumers about the system’s benefits and risks.

We need to take our education health as seriously as we take our physical health. Education consumers should get to know the model before education policymakers force them to blindly accept it, simply because the theory behind it makes sense. And they should have the opportunity to learn about the benefits and risks of the EVAAS approach because, in the end, they, and not the software developers or the system builders, will experience the side effects.

References

American Educational Research Association (AERA). (2000). AERA position statement on high-stakes testing in PreK-12 education. Available: www.aera.net/?id=378

Battelle for Kids. (n.d.). High-school value-added project. Retrieved February 1, 2007, from www.battelleforkids.org

Bracey, G. W. (2007, May 1). Value subtracted: A “debate” with William Sanders. The Huffington Post. Available: www.huffingtonpost.com/gerald-bracey/value-subtracted-a-debate_b_47404.html

SAS. (2007). Dr. William L. Sanders. Available: www.sas.com/govedu/edu/bio_sanders.html

Sanders, W. L. (1998). Value-added assessment. The School Administrator, 55(11), 24–27.

Sanders, W. L., & Horn, S. P. (1994). The Tennessee Value-Added Assessment System (TVAAS): Mixed-model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8(3), 299–311.

Sanders, W. L., & Wright, S. P. (2008, April 14). A response to Amrein-Beardsley (2008): “Methodological concerns about the Education Value-Added Assessment System.” Available: www.sas.com/govedu/edu/services/Sanders_Wright_response_to_Amrein-Beardsley_4_14_2008.pdf

Audrey Amrein-Beardsley is Assistant Professor in the College of Teacher Education and Leadership at Arizona State University, Phoenix; [email protected].
