DOCUMENT RESUME

03404 - [k253678]

Problems and Needed Improvements in Evaluating Office of Education Programs. HRD-76-165; B-164031(1). September 8, 1977. 76 pp. plus 4 appendices (53 pp.).

Report to the Congress; by Elmer B. Staats, Comptroller General.

Contact: Human Resources Div.
Budget Function: Education, Manpower, and Social Services: Elementary, Secondary, and Vocational Education (501).
Organization Concerned: Office of Education; Department of Health, Education, and Welfare.
Congressional Relevance: House Committee on Education and Labor; Senate Committee on Human Resources; Congress.
Authority: Elementary and Secondary Education Act of 1965 (20 U.S.C. 241a). Education Amendments of 1974 (P.L. 93-380). General Education Provisions Act (20 U.S.C. 1226c). Education Amendments of 1972 (P.L. 92-318). 20 U.S.C. 1231a.

A review was conducted to determine the effectiveness of federally supported education evaluations, primarily those concerning elementary and secondary education programs, in order to obtain objective data for allocating resources and for deciding whether programs should be continued or modified. Questionnaires were sent to education agencies in all States and the District of Columbia and to a statistical sample of local school districts to obtain State and local agencies' views on Federal education program evaluations. Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effect of Federal education programs.

Findings/Conclusions: The Office of Education's (OE) evaluation studies can better serve Congress by having them timed to coincide with the legislative cycle and by more frequent briefings of congressional committee staffs. OE needs to make a better effort to set forth specific qualitative and quantitative program objectives in order to provide a clear basis for program evaluation. The usefulness of the State and local evaluation reports needs improvement in the areas of relevance of reports to policy issues, data completeness and comparability, and report timeliness. If the reporting systems based on aggregated local agency data are to be effective, standardization of data collection efforts is needed. Educators and test experts disagree on the use of standardized norm-referenced tests versus criterion-referenced tests. More research may be needed on criterion-referenced tests and on how to reduce racial, sexual, and cultural biases in standardized tests.

Recommendations: The Secretary of HEW should direct OE to: (1) emphasize congressional information needs when planning, implementing, and reporting on evaluation studies; (2) seek agreement with Congress on the specific program objectives to be used for evaluations as well as acceptable evaluation data and measures for each program to be evaluated; and (3) improve the


implementation of evaluation results by giving greater attention and priority to procedures such as the issuance of policy implication memoranda designed to assure implementation of those results. OE should review the types of State and/or local program evaluation information collected on programs authorized by titles I and VII of the Elementary and Secondary Education Act to determine if it is realistic to serve Federal, State, and local levels with aggregated data based on local agency evaluation reports. (So)


REPORT TO THE CONGRESS

BY THE COMPTROLLER GENERAL OF THE UNITED STATES

Problems And Needed Improvements In Evaluating Office Of Education Programs

Office of Education
Department of Health, Education, and Welfare

The Office of Education should more strongly emphasize serving congressional needs in planning and carrying out evaluation studies, should define its program objectives more clearly, and should improve the implementation of evaluation results.

The Office of Education should assess whether State and/or local evaluation reports, under titles I and VII of the Elementary and Secondary Education Act, can realistically be improved to supply Federal, State, and local officials with the reliable program information they need for decisionmaking.

Serious questions have been raised about standardized "norm-referenced" tests, which are frequently used to measure Federal education programs' effectiveness. This report discusses some of these questions and various suggestions for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs.

HRD-76-165 SEPTEMBER 8, 1977


COMPTROLLER GENERAL OF THE UNITED STATES
WASHINGTON, D.C. 20548

B-164031(1)

To the President of the Senate and the Speaker of the House of Representatives

This report points out that the Office of Education needs to more strongly emphasize the purpose of providing information to the Congress when planning and implementing evaluation studies, more clearly define its program objectives, and improve implementation of evaluation results. The report contains recommendations to improve the usefulness of educational evaluations. Also discussed in the report are some criticisms and defenses of standardized tests and suggestions for alleviating problems in evaluating large-scale compensatory education and desegregation programs.

Our review was made because of the increasing concern shown by the Congress and various Federal agencies with evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in allocating resources and in deciding whether programs should be continued, modified, or discontinued.

We made our review pursuant to the Budget and Accounting Act, 1921 (31 U.S.C. 53), and the Accounting and Auditing Act of 1950 (31 U.S.C. 67).

We are sending copies of this report to the Director, Office of Management and Budget, and the Secretary of Health, Education, and Welfare.

Comptroller General of the United States


COMPTROLLER GENERAL'S REPORT TO THE CONGRESS

PROBLEMS AND NEEDED IMPROVEMENTS IN EVALUATING OFFICE OF EDUCATION PROGRAMS
Office of Education
Department of Health, Education, and Welfare

DIGEST

In recent years the Congress and various Federal agencies have become increasingly interested in evaluating the effectiveness of major Federal education programs. Such evaluations should help them decide whether programs should be continued, modified, or discontinued. (See p. 1.)

To obtain State and local agencies' views on Federal education program evaluations and related matters, GAO sent questionnaires to education agencies in all States and the District of Columbia and to a statistical sample of local school districts throughout the Nation. (See p. 11.)

This report discusses the questionnaire results, the pros and cons of standardized tests, and suggestions from testing and evaluation experts and others for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs. (See pp. 53 and 63.)

OFFICE OF EDUCATION EVALUATION STUDIES

Office of Education evaluations are to help those making decisions about education, including the Congress. Recently, the main emphasis has been on evaluating the effectiveness of major Office of Education programs throughout the Nation.

The Office of Education's evaluation studies can better serve the Congress. Timing the studies to coincide with the legislative cycle and more frequently briefing congressional committee staff would help.



Evaluations can reach conclusive and useful findings about a program's effectiveness only if the program's objectives are defined and measurable. For example, a national evaluation of the migrant program did not adequately assess the program's effectiveness because no one had developed acceptable criteria and objectives for measuring the program's success.

Program objectives to be evaluated need to be better defined, and the ways evaluation results are used and implemented need more attention.

The Congress should recognize that the Department of Health, Education, and Welfare (HEW) is not complying, and does not intend to comply, with the legislative requirement to set forth goals and specific objectives for individual programs and projects included in its annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources.

HEW feels that its authority and ability to comply with this legislative requirement are limited.

PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS

The money spent at the State and local education agency levels to evaluate selected Federal elementary and secondary education programs is large--over $42 million in fiscal year 1974.

If the reporting systems based on aggregated local agency data are to be effective, various aspects of the evaluation reports need to be improved. These include the credibility of findings and the qualification and quantification of measurement data.

The usefulness of the State and local evaluation reports also needs to be improved with respect to

ii

Page 7: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

--relevance of the reports to policy issues,

--completeness and comparability of the data reported, and

--report timeliness.

Valid, complete, and comparable evaluation data is important if evaluation results are to contribute meaningfully to decisions at all levels. If local and State evaluation data continues to be aggregated for use at higher levels, data collection efforts and techniques need to be standardized to provide comparable results.

Because of constraints on the Federal role in education, the Government probably will not try to provide needed valid and comparable data by dictating uniform evaluation methods and procedures to State and local education agency grantees.

In addition, questions exist about whether the models for evaluating title I (aid for disadvantaged children) will be able to provide valid and acceptable data to meet program information needs. If these issues are not resolved, GAO questions whether improvements can be made that will enable the reporting systems (based on aggregated local data) to meet program information needs at Federal, local, and/or State levels.

USES OF STANDARDIZED TESTS AND PROGRAM EVALUATION

Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effectiveness of Federal education programs. These tests measure an individual's performance against a "norm" group.
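The mechanics of norm referencing can be shown with a small worked example. The sketch below (all scores are hypothetical, not data from this report) converts a pupil's raw score into a percentile rank against a norm group, which is the kind of relative standing these tests report.

```python
# A norm-referenced score locates an examinee relative to a "norm" group:
# the percentile rank is the share of the norm group scoring at or below
# the examinee's raw score.  All scores here are hypothetical.

def percentile_rank(raw_score, norm_scores):
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)

norm_group = [52, 55, 61, 64, 68, 70, 73, 77, 80, 88]  # hypothetical norm sample

print(percentile_rank(73, norm_group))  # 7 of 10 at or below -> 70.0
```

Note that the result says nothing about whether a program met its objectives in absolute terms; it only places the examinee relative to the norm group. That distinction is what separates norm-referenced from criterion-referenced tests, discussed later in this report.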

However, testing experts and educators disagree about the adequacy of these tests for their intended uses. Serious questions have been raised about the tests, and some organizations have called for a moratorium



on their use and a higher priority on the development and use of alternatives.

Although the tests' critics and defenders agree that certain problems exist, views differ greatly about the importance or severity of the problems and their remedies. Defenders recognize that improvements are needed in such areas as test and test question bias; appropriate test norms; test selection, interpretation, and administration, including test uses for program evaluation; and other issues.

Some questions about the tests have great importance in determining the appropriateness, validity, and proper conduct of educational program evaluations, including those that are federally funded. Those responsible for making educational decisions should be aware of these issues when using such information.

Additional research may be needed on:

--Criterion-referenced and other tests for uses which include program evaluation, as alternatives to standardized norm-referenced achievement tests.

--How to reduce racial, sexual, and cultural biases in standardized tests.

States and especially local education agencies need to be more aware of available information intended to help them select the most appropriate tests for evaluation.

EVIDENCE OF PROGRAM EFFECTIVENESS

Local and State evaluation reports on Federal elementary and secondary education program effectiveness are intended to provide information that local, State, and Federal officials can use to make policy and program decisions. However, State and local officials see important differences in the types of evidence of program effectiveness that they and officials at other levels prefer.



Better communication is needed among the three levels on the information they need to facilitate policy and program decisions. Questionnaire results raise this question: Should all three levels be served by a reporting system based on the same reports?

Although State officials view Office of Education program officials as being most impressed by standardized norm-referenced test results, and local officials view State and Office of Education officials in the same manner, State and local officials say that they themselves are not most impressed by such results.

Local officials prefer broader, more diverse types of information on program results than just these test scores; they are most impressed by improvements in curriculum and instructional methods and gains in the affective domain (likes, dislikes, interests, attitudes, motives, etc.). State officials are most impressed by results from criterion-referenced tests.

The widespread use of standardized norm-referenced tests to evaluate State and local programs indicates that State and local officials have more frequently based their evaluations on the kinds of results they believe would most impress higher level officials than on their own preferences.

Although HEW's comments were not responsive to certain aspects of GAO's recommendations, it agreed, at least in general, with all but one recommendation. (See pp. 21, 37, 52, and 76.)



Contents

Page

DIGEST

CHAPTER

1  INTRODUCTION  1
     Origin and development of Federal evaluation efforts  1
     Administration of Federal evaluation activities  2
     Requirements for State and local evaluations of OE programs  5
     Limitations of educational evaluations  6

2 SCOPE AND METHOD OF REVIEW 11

3  OPPORTUNITIES TO IMPROVE OE'S EVALUATION STUDIES  14
     Congressional needs should be taken more into account  14
     Program objectives need to be defined  15
     Use of evaluation results needs more attention  17
     Conclusions  20
     Recommendations to the Secretary of HEW  20
     Agency comments and our evaluation  21
     Recommendation to the Congress  23

4  PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS  24
     State and local evaluation expenditures  24
     State and local evaluation reports need improvement  26
     Other problems  30
     Conclusions  36
     Recommendation to the Secretary of HEW  37
     Agency comments and our evaluation  37

5  USING STANDARDIZED TESTS  42
     Uses and implications of standardized norm-referenced tests  42
     Widespread State and local use of standardized norm-referenced tests to evaluate Federal programs  43
     Efforts to evaluate standardized tests  45
     Criterion-referenced tests  48
     Conclusions  51
     Recommendations to the Secretary of HEW  51
     Agency comments and our evaluation  52


Page

CHAPTER

6  STANDARDIZED TESTS AND PROGRAM EVALUATION  53
     Criticism of standardized norm-referenced tests  53
     Defense of standardized tests  61
     Conference on using tests to evaluate programs  63
     Conclusions  67

7  PREFERENCES FOR EVIDENCE OF PROGRAM EFFECTIVENESS DIFFER  69
     State and local education officials believe standardized test results most impress higher level officials  70
     Conclusions  75
     Recommendation to the Secretary of HEW  76

APPENDIX

I     Results of GAO's State education agency questionnaire  77

II    Results of GAO's local education agency questionnaire  93

III   Letter dated June 15, 1977, from the Inspector General, HEW  116

IV    Additional suggestions by OE conferees for improving testing and evaluation  125

V     Principal officials of the Department of Health, Education, and Welfare responsible for activities discussed in this report  128

ABBREVIATIONS

GAO General Accounting Office

HEW Department of Health, Education, and Welfare

NIE National Institute of Education

OE Office of Education


CHAPTER 1

INTRODUCTION

ORIGIN AND DEVELOPMENT OF FEDERAL EVALUATION EFFORTS

In recent years the Congress and various Federal agencies have become increasingly concerned with evaluating the effectiveness of major Federal education programs. Because of tight budget constraints, limited financial and human resources, and the need for objective data on the effectiveness of education programs, the Congress, the Office of Management and Budget, officials of the Department of Health, Education, and Welfare (HEW), and others are requesting such data to help them allocate resources and decide whether programs should be continued, modified, or discontinued.

Legislative background

Legislative mandates for evaluation of education programs can be traced back at least to title I of the Elementary and Secondary Education Act of 1965 (20 U.S.C. 241a), which required States to assure the adoption of

"* * * effective procedures, including provision for appropriate objective measurements of educational achievement * * * for evaluating at least annually the effectiveness of the programs in meeting the special educational needs of educationally deprived children."

Congressional interest in evaluation was further reflected in the Education Amendments of 1974 (Public Law 93-380). The Office of Education states that 22 new studies and reports are required to be submitted to the Congress by the Commissioner of Education, the Assistant Secretary for Education, or the Secretary of HEW. Of the 22, 7 are evaluation studies to be conducted by the Office of Education's Office of Planning, Budgeting, and Evaluation; another is a thorough evaluation and study of Federal title I and State compensatory education programs by the National Institute of Education (NIE).

The General Education Provisions Act (20 U.S.C. 1226c) requires that HEW provide the House Committee on Education and Labor and the Senate Committee on Human Resources (previously the Senate Committee on Labor and Public Welfare) an annual evaluation report "which evaluates the effectiveness of applicable programs in achieving their legislated



purposes" and recommends improvements. Any evaluation report evaluating specific programs and projects is required to

--set forth goals and specific objectives in qualitative and quantitative terms for all programs and projects and relate those goals and objectives to program purposes;

--report on the progress made during the previous fiscal year in achieving such goals and objectives;

--describe the cost and benefits of each program evaluated; and

--contain plans for implementing corrective action and legislative recommendations, where warranted.

In addition to an increasing number of congressionally mandated national studies, legislation and HEW regulations often require local and/or State evaluations of program effectiveness at least once a year.

The Education Amendments of 1972 (Public Law 92-318) established NIE as part of HEW's Education Division. The Director of NIE reports to the Secretary of HEW through the Assistant Secretary for Education, as does the Office of Education (OE). NIE is charged with improving education by, among other things, building an effective educational research and development system. Since educational research includes not only basic and applied research and surveys, but also evaluation, the law provides a new mechanism for evaluating educational programs. However, principal responsibility for evaluating OE programs remains with OE.

ADMINISTRATION OF FEDERAL EVALUATION ACTIVITIES

According to OE, the main goal of Federal educational evaluation studies is to provide information on which policy decisions about Federal education programs and OE resource allocations may be based. To achieve this goal, OE

--conducts national evaluations of the effectiveness of Federal education programs;

--analyzes major educational problems or issues;

--reports annually to the Congress on the effectiveness of OE programs in meeting their legislative intent;



--attempts to identify the program approaches that work and do not work, and determine why; and

--attempts to identify and validate for dissemination locally initiated innovative practices and exemplary programs.

HEW's education evaluations are carried out by several entities: the Office of the Assistant Secretary for Planning and Evaluation; OE's Office of Planning, Budgeting, and Evaluation; a limited number of OE program bureaus; the National Advisory Councils; and NIE. Although the Office of the Assistant Secretary for Planning and Evaluation primarily reviews OE's evaluation activities, it also receives a small portion of education evaluation funds to conduct studies for the Secretary.

OE is among the few Federal agencies that attempt to integrate the program evaluation process with their budget and legislative cycle. OE's evaluation activities are largely centralized in the Office of Planning, Budgeting, and Evaluation. This was done to try to emphasize evaluation more strongly.

The table below lists the funds available to OE for planning and evaluation. According to OE, these sums, although substantial, represent less than three-tenths of 1 percent of OE's total annual program appropriations and must cover approximately 85 legislative programs. OE's Assistant Commissioner for Planning, Budgeting, and Evaluation estimated that from about 1971 on, approximately two-thirds of the OE planning and evaluation appropriation funds have been used for OE evaluation activities. (Chapter 4 provides funding information on State- and local-level evaluations of elementary and secondary education programs.)



OE Planning and Evaluation Funds

                     OE planning and      OE program funds
                     evaluation           used for evaluation
   Fiscal year       appropriations       (notes a and b)          Total

   -----------------------------(000 omitted)-----------------------------

       1968             $ 1,250                   -               $ 1,250
       1969               1,250                   -                 1,250
   c/  1970               9,512               $ 4,155              13,667
c/,d/  1971              12,475                 8,724              21,199
d/,e/  1972              11,225                 3,950              13,175
   d/  1973              10,205                 9,880              20,085
   d/  1974               5,200                 5,268              10,468
   d/  1975               6,858                11,043              17,901
       1976               6,383                10,512              16,895

a/Includes funds authorized from Follow Through, Emergency School Assistance Act, title I of the Elementary and Secondary Education Act, Basic Opportunity Grants, Project Information Packages, and Career Education programs.

b/Does not include program funds used by State and local education agencies for evaluations under the Elementary and Secondary Education Act, titles I, III, VII, and VIII.

c/Does not include $5 million appropriated for grants to States for planning and evaluation under the Elementary and Secondary Education Act, title V, part C--Comprehensive Educational Planning and Evaluation.

d/Includes support for the Educational Policy Research Centers (at Stanford Research Institute and Syracuse University Research Center) for the following fiscal years: $900,000 (1971); $900,000 (1972); $950,000 (1973); and $450,000 (1974). Monitorship of the centers was transferred to the Office of the Assistant Secretary for Education in fiscal year 1974.

e/Excludes $1 million earmarked for NIE planning.
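As an arithmetic cross-check (added here, not part of the original report), each figure in the Total column should equal the appropriation plus the program funds used for evaluation. The short script below, run over the transcribed figures (in thousands), confirms that every row reconciles except fiscal year 1972, which as printed is off by $2.0 million and may reflect an OCR or transcription error in the source.

```python
# Cross-check the OE Planning and Evaluation Funds table:
# Total should equal appropriations + program funds used for evaluation.
# Figures transcribed from the table above (thousands of dollars).
rows = {
    1968: (1_250, 0, 1_250),
    1969: (1_250, 0, 1_250),
    1970: (9_512, 4_155, 13_667),
    1971: (12_475, 8_724, 21_199),
    1972: (11_225, 3_950, 13_175),
    1973: (10_205, 9_880, 20_085),
    1974: (5_200, 5_268, 10_468),
    1975: (6_858, 11_043, 17_901),
    1976: (6_383, 10_512, 16_895),
}

for year, (approp, program, total) in sorted(rows.items()):
    diff = approp + program - total
    status = "reconciles" if diff == 0 else f"off by {diff:,} (thousands)"
    print(year, status)  # only 1972 fails as transcribed
```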

Systematic, comprehensive evaluation of Federal education programs at the Federal level dates back only to 1970. At that time the Congress increased OE evaluation funds in response to HEW's request. According to OE, such efforts were largely precluded before then by insufficient appropriated funds for evaluation and too few technically qualified



evaluation staff members. Since fiscal year 1970, OE has attempted to expand and upgrade its evaluation activities and capabilities. The equivalent of about 23 professional full-time staff members are now assigned to evaluation.

The Office of Planning, Budgeting, and Evaluation has designed and begun over 10 evaluation and planning studies; instituted an annual evaluation plan highlighting yearly priorities; and implemented a process for disseminating, chiefly at the Federal level, the major results of evaluation studies.

Almost all OE evaluation and planning studies are performed under contract. OE's evaluation office issues a request for proposals after determining the study's design and the techniques to be used--for example, sample size, analysis method, and data collection method. Contractors are selected competitively. After a contract is awarded, an OE project monitor from the evaluation office monitors the contractor's performance by exercising approval over the approach to be used, making site visits, and reviewing progress reports. The project monitor also reviews and approves the draft report's technical adequacy, completeness, and responsiveness before the report is finally accepted.

OE develops and implements policy recommendations on the basis of the evaluation findings.

REQUIREMENTS FOR STATE AND LOCAL EVALUATIONS OF OE PROGRAMS

Legislation and HEW regulations often require annual evaluations of Federal programs by State and/or local education agencies. This is the case for programs and projects funded under titles I, III, and VII of the Elementary and Secondary Education Act. A brief summary of the provisions in each title follows:

--Title I provides funds through State education agencies to local education agencies serving areas with concentrations of children from low income families. The funds are intended to meet the special educational needs of educationally deprived children.

--Title III has provided funds to local education agencies, principally through State education agencies, for (1) stimulating and assisting in the development and establishment of exemplary programs to serve as models for regular school programs and (2) assisting



the States in establishing and maintaining guidance, counseling, and testing programs. (The Education Amendments of 1974 consolidated the title II and most title III activities into a new title IV. Fiscal year 1976 was both the first year of funding under the new title IV and the last year of funding under title III. However, final title III projects will not run out until the end of fiscal year 1977, and requirements for State and local evaluations are in effect until that time. Title IV requires evaluation by the advisory council in each State.)

--Title VII provides OE discretionary funds for local education agencies and others to help carry out projects designed to meet the special educational needs of children who speak a language other than English and who come from low income families. Title VII is a demonstration program designed to build up the resources needed to start bilingual projects.

Local education agencies are required to evaluate their title I, III, and VII projects annually. State education agencies are also required to administer and annually evaluate their title I and III programs. However, title VII projects (bilingual education) operate under direct grants from OE; therefore, no State evaluations are required for such projects. OE is nevertheless required to consult with State education agencies before approving local education agencies' title VII grant applications and to give States the opportunity to make recommendations on the applications.

LIMITATIONS OF EDUCATIONAL EVALUATIONS

The central issue in most educational evaluation studies is whether programs such as title I of the Elementary and Secondary Education Act affect student progress. The President's Commission on School Finance was established to make recommendations to the President regarding the proper Federal role in financing elementary and secondary education. To be able to make its recommendations in the light of educational research results, the Commission asked a major research corporation to assess the available knowledge on what determines educational effectiveness. The resulting 1972 report 1/

1/Harvey A. Averch, et al., "How Effective Is Schooling? A Critical Review And Synthesis Of Research Findings," R-956-PCSF/RC, The Rand Corporation, Mar. 1972, pp. iii-xii, 125, and 158.


states that research has found nothing that consistently and unambiguously makes a difference in student "outcomes." That is, research has not found any educational practice that offers a high probability of widespread success. OE's fiscal year 1975 annual evaluation report makes a similar statement about the attempt to identify the attributes of successful projects. A November 1976 OE-funded report, based on a study of educational innovations, found that the innovations make little difference in student achievement.

The 1972 report's findings were based on an examination of how valid the approach and results of numerous individual studies were. The report states that while some studies show a given educational practice to be effective, other similar studies find the same educational practice to be ineffective. It is unclear, the report adds, why this discrepancy exists. According to the report, four substantive problems that limit evaluation studies appear in virtually every area of educational research:

--Research data is, at best, a crude measure of what is happening. For example, student achievement is typically measured by scores on standardized achievement tests despite the many serious problems involved in interpreting such scores. (See chs. 5 and 6.)

--Educational outcomes are almost exclusively measured according to cognitive achievement, often leading to sparse and inconclusive results that provide little guidance on what practices are effective.

--There is almost no examination of the cost implications of research results, which makes it very difficult to translate research/evaluation results into policy-relevant statements.

--Few studies adequately monitor the relationship between what actually goes on in the classroom and student achievement, so that data may be affected by circumstances unrecognized in analysis.

Because of the problems above, according to the report, researchers are confronted by the virtually impossible task of measuring those aspects of education they wish to study. That is, it is impossible for current research to reach definitive conclusions about educational outcomes because it cannot measure most of them well.


Other studies point out that numerous poorly designed and implemented evaluations result in very questionable or invalid data on which to base decisions about policy or programs.

A 1975 study, 1/ which reviewed the major title I evaluation efforts from 1965 through 1972, discusses some limitations in educational evaluations. The study traces the accepted belief in the necessity to evaluate education programs to title I of the 1965 act. This legislation established local reporting on projects in the hope that timely and objective information about the results of title I projects could reform the local administration of education and the methods of educating poor children. It was also hoped that systematic evaluation could make Federal management of education programs more efficient. According to the study, those pursuing educational reform saw evaluation as central to achieving it and assumed that reporting requirements would generate valuable information which would be used rationally in contributing to policy and program decisions.

The study concluded that after 7 years, more than $52 million of expenditure on evaluations, and the creation of a number of alternative evaluation models, evaluation had failed to meet the expectations of those urging reform or even to serve the self-interest of Federal program managers. Regarding this conclusion, the study stated:

"There are numerous reasons why efforts to evaluate Title I failed * * * The central cause is that school districts had no incentive to collect or report output data, and federal officials lacked the political muscle to enforce evaluation guidelines or to require cooperation with other federal evaluation efforts * * *."

The study added:

--Those interested in reform efforts failed to take into account the difficulty of evaluating the process of schooling in general and title I in particular.

1/Milbrey Wallin McLaughlin, "Evaluation and Reform," a Rand Educational Policy Study, Ballinger Publishing Company, Cambridge, Mass., 1975, pp. vii-ix and 117-120.


--Legislatively mandated evaluation, intended to make school administrators accountable, has led to local evaluation that is, in the view of many observers, little more than an annual ritualistic defense of program activities. In addition, the Federal evaluation efforts have not contributed to the formulation of short-run management strategies or long-range planning. Evaluations based on an "impact cost-benefit" model have been used selectively to lend an appearance of rationality to decisions that are essentially political.

--Contrary to the expectations of those interested in reform efforts, neither Federal decisionmakers nor local school personnel showed much ability or interest in using evaluations to formulate title I policy or practice.

--Local perceptions of Federal initiatives and commitments as inherently unstable, combined with a basic local defensiveness about achievement measures, will probably continue to frustrate Federal attempts to secure objective, reliable information on program results.

--The highly political way that title I evaluation has been conducted, including use of its results, has weakened the credibility of evaluation as a policy instrument, in the opinion of many program personnel.

--A realistic and useful evaluation policy should acknowledge the inherent constraints that the policy system and the behavior of bureaucracies place upon evaluation.

In addition, the study stated:

"The history of Title I evaluation also suggests a number of implications about the conduct and use of evaluation in a multi-level government structure * * * In a federal system of government, and especially in education, the balance of power resides at the bottom, with special interest groups * * * Thus a federal evaluation policy that conflicts in fundamental ways with local priorities is unlikely to succeed * * * Federal evaluators, then, are faced with a specifically political dilemma generated by their inability to insist upon accurate information on school effects and program


impact. And the existence of powerful social sanctions against a strong federal data requirement means that these barriers to the implementation of federal evaluation policy will remain."

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation said that he agrees with this historical analysis. He believes that the evaluation approach that OE's evaluation office follows takes these problems into account because it is based not on school district data but on contractor data collected nationally.


CHAPTER 2

SCOPE AND METHOD OF REVIEW

Our objectives were to review the usefulness and limitations of federally supported education evaluations--focusing mostly on elementary and secondary education programs--and to solicit suggestions on needed program evaluation improvements, including needed research and development.

We reviewed OE's evaluation activities--principally carried out through the Office of Planning, Budgeting, and Evaluation--and related NIE activities. We also reviewed legislation, policies, procedures, and various Federal, State, and local education agency program evaluation reports relating to the Elementary and Secondary Education Act, titles I, III, and VII.

We interviewed officials from the Office of the Secretary of HEW, the Office of the Assistant Secretary for Education, the National Center for Education Statistics, OE, NIE, and the Office of Management and Budget. In addition, we interviewed the staff members of various congressional committees and officials from 11 education research organizations, including 4 publishers of commercial tests and 6 research/evaluation organizations. We also interviewed officials from two national interest groups concerned with education, and attended conferences of educators and measurement and evaluation experts which were held to improve student assessment or educational program evaluation.

To obtain State and local education agencies' views on Federal education program evaluations and related matters, we sent questionnaires to education agencies in all States and the District of Columbia, and to a statistical sample of local school districts throughout the Nation. The two sets of questionnaires were sent in April 1975 and were returned by June 1975.

The District of Columbia and 49 States responded to our State-level questionnaire. (To simplify questionnaire results in this report, we consider the District of Columbia to be a State.) Appendix I compiles the responses on the State education agency questionnaire. Respondents to section A of the questionnaire were almost always officials responsible for statewide assessment, accountability, and/or testing activities. Respondents to sections B and C were nearly always officials responsible for titles I and III programs, respectively.


Our questionnaire sample for local school districts was largely the same as a national sample used by the Office of Education in 1973. Neither sample included school districts having fewer than 300 pupils; both were stratified according to enrollment as follows: 125,000 pupils or more; 35,000 to 124,999 pupils; 9,000 to 34,999 pupils; 3,000 to 8,999 pupils; and 300 to 2,999 pupils.

Nineteen school districts compose the first group--the largest school districts--and all were included in the sample. An independent random sample of 813 school districts was drawn from the remaining groups. We received responses from 710 (85 percent) of the 832 school districts included in the sample.

As a result of the high response rate, the attitudes and opinions expressed in response to our local school district questionnaire are representative of the entire universe of 11,666 such districts in the Nation having 300 or more pupils. However, we projected the responses to a total of 8,936 local education agencies because this method, based on the weighting and the response rates across the various strata in our sample, allows us to obtain the most accurate percentages on the answers given.
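The projection described above rests on simple stratum weighting: each responding district is counted as standing for the number of districts in the universe that its enrollment stratum represents. The sketch below illustrates that arithmetic only; the per-stratum district counts and the "yes" tallies are invented for the example (only the totals--a universe of 11,666 districts and 710 respondents--match the report's figures), so the projected number it prints is purely illustrative.

```python
# Illustrative sketch of stratified projection. Per-stratum figures are
# hypothetical; only the totals (11,666 districts, 710 respondents)
# match the report.
strata = [
    # (enrollment stratum, districts in universe, respondents, "yes" answers)
    ("125,000 or more",   19,   17,  12),
    ("35,000 to 124,999", 150,  70,  40),
    ("9,000 to 34,999",   1200, 172, 90),
    ("3,000 to 8,999",    3000, 212, 100),
    ("300 to 2,999",      7297, 239, 110),
]

def projected_yes(strata):
    """Project sample answers to a national count of districts by
    weighting each stratum's respondents by universe/respondents."""
    total = 0.0
    for _, universe, respondents, yes in strata:
        weight = universe / respondents  # districts each respondent stands for
        total += yes * weight
    return total

print(f"Projected districts answering yes: {projected_yes(strata):,.0f}")
```

Because the strata carry very different weights, projecting rather than simply tallying raw responses keeps the handful of very large districts from being swamped by the thousands of small ones, and vice versa.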

Local education agency questionnaire results appear in appendix II. The numbers shown there represent the number of local school districts in the Nation to which our local questionnaire sample responses have been projected. Most local education agency respondents to section A of the questionnaire were directors for testing. Most respondents for sections B, C, and D were directors for titles I, III, and VII projects, respectively. However, in some cases superintendents responded to the questionnaire.

Our questionnaires focused on program evaluations of titles I, III, and VII of the Elementary and Secondary Education Act for a number of reasons. The largest Federal emphasis in education has been placed on the attempt to deal with various inequalities in educational opportunity. Programs of this kind have attempted to equalize educational opportunity for groups and individuals who are at a disadvantage (titles I and VII) and to improve the quality and relevance of American education through research and demonstration and dissemination of results (title III). Appropriations for the programs under these three titles are substantial. For fiscal year 1975 they were $2.2 billion, which represented about one-half of all Federal elementary and secondary education program dollars. In addition, Federal legislation and HEW regulations


require national, State, and local evaluations for titles I and III, and national and local evaluations for title VII.

To supplement information obtained from the questionnaires, we interviewed education agency officials from five States, the District of Columbia, and 10 local school districts.


CHAPTER 3

OPPORTUNITIES TO IMPROVE

OE'S EVALUATION STUDIES

Evidence we gathered indicates that opportunities exist for OE's evaluation studies to better serve the Congress. Specifically, this includes timing the studies better and briefing congressional committee staff more frequently.

OE also needs to better define the program objectives to be evaluated and improve the use of evaluation results.

CONGRESSIONAL NEEDS SHOULD BE TAKEN MORE INTO ACCOUNT

According to OE, its evaluations are intended primarily to assist those involved in making educational decisions. This includes the Congress. In recent years the main emphasis has been on evaluating the national impact or effectiveness of major OE programs.

Decisions on education programs are made at various levels--in the Congress, OE, State education agencies, and local school districts. Decisionmakers at different levels sometimes need different information. For evaluation studies to be most useful to them, the views of those who are to use the results should be taken into account in evaluation planning and design. This increases the chance that their information needs will be adequately fulfilled and that resultant decisions will be well defined. It should also increase the chance that evaluation results will be effectively communicated to those intended to use them.

One such user to which OE should give greater attention in providing program evaluation information is the Congress. To obtain information on how useful OE evaluation studies are to the Congress and how much congressional views are taken into account in those studies, we contacted four key congressional committee staff members responsible for education matters, including the Majority and Minority Counsels for the Subcommittee on Elementary, Secondary, and Vocational Education, House Committee on Education and Labor; the Minority Counsel for the Subcommittee on Education, Senate Committee on Human Resources; and the Chief Counsel for the Senate Appropriations Committee. Two persons interviewed believed that OE does not obtain enough congressional input in designing its evaluation studies.


One of the four staff members stated that OE evaluation studies are generally useful and reasonably good. The other three stated that the studies often have been completely ineffective or have had little impact on legislation. Their most frequently cited reasons were that

--OE studies are not timed to coincide with the legislative cycle;

--OE efforts to interpret data and highlight important findings are insufficient; and

--OE briefings for congressional committee staff are not frequent enough.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with each of these statements and said that the evaluation office has not been sufficiently sensitive and responsive to congressional needs. He noted that the long lead time necessary to plan and implement studies contributes to this problem.

OE's Assistant Commissioner for the Office of Legislation said that he considers the poor timing of evaluation studies to be a major factor inhibiting their impact on legislation. He stated that OE's evaluation office has not made the effort necessary to assure that evaluation results are arrived at and communicated soon enough to be considered in developing legislative proposals.

Commenting on our report, HEW stated that it does not concur with one of the three reasons cited above for the limited impact of its studies. HEW believes that its procedure for interpreting and summarizing evaluation study results is not deficient. Brief summaries of each evaluation study are sent to all members of the cognizant House and Senate authorizing and appropriation committees and their staffs, as well as to appropriate HEW Education Division staff and others.

PROGRAM OBJECTIVES NEED TO BE DEFINED

Evaluations can reach conclusive and useful findings about a program's effectiveness only if the program's objectives are defined and measurable. However, our review of selected evaluations and reports, as well as discussions with OE and non-OE experts, showed that one major problem in assessing education programs is the lack of sufficiently defined objectives.


For example, a national evaluation of the migrant program under title I of the Elementary and Secondary Education Act had not adequately assessed the program's effectiveness, an OE migrant program official said, because no one had developed acceptable criteria and objectives for measuring the program's success.

In addition, officials from OE's evaluation office stated that the purpose and objectives of title I itself have not been specified clearly. They said that this has made it difficult to evaluate the effectiveness of the title I program.

Other educational researchers have pointed out that it is difficult to establish criteria for Federal programs designed to respond to multiple needs, such as titles I and III of the Elementary and Secondary Education Act. They have stated that when the purposes of a program such as title I or III are ambiguous, various criteria are applied in assessing the program, which leads to noncomparable evaluation results.

Likewise, officials representing a major test publisher said that the objectives of compensatory education programs should be clarified.

According to some educational researchers and contractors who compete for OE evaluation contracts, OE studies do not always define clearly the criteria and major questions that should be addressed. For example, a researcher who frequently prepares OE policy studies said that OE evaluation studies do not help in formulating policy because they are not set up to answer the major short-term or long-term questions.

Other educational experts stated that the lack of measurable and sufficiently defined objectives often led to evaluation studies that addressed unanswerable questions and produced inconclusive results. They added that the language used in legislation, regulations, policy manuals, plans, and budgets is generally ambiguous and fails to define program objectives precisely enough to make evaluation useful.

As mentioned in chapter 1, legislation requires that HEW provide the congressional committees having responsibility for education with an annual report evaluating program effectiveness in achieving legislative purposes. The report is required to set forth goals and specific objectives in


qualitative and quantitative terms for all programs and projects assisted which are evaluated and relate those goals and objectives to the program's purposes.

Our review of the annual evaluation report on OE programs for fiscal year 1975 showed that most of its statements of program goals and objectives merely restated the legislative purposes or general goals, and did not set forth specific objectives. Quantitative objectives, even in the broadest sense, were established for very few programs.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with our observations and said that OE has seldom established specific objectives. He stated that this should be corrected, but that OE sometimes faces opposition from the Congress and others when it specifies objectives. We agree that the Congress has major responsibility for specifying program objectives. In our view, however, the legislative requirement and OE's limited responsiveness to it, as well as the need for providing a clear basis for program evaluation, dictate that OE make a better effort to set forth specific qualitative and quantitative program objectives for congressional consideration.

USE OF EVALUATION RESULTS NEEDS MORE ATTENTION

In 1972, OE's Office of Planning, Budgeting, and Evaluation instituted a procedure which entails drafting and implementing a "policy implication memorandum" to increase the use, in policy and program decisions, of the evaluation findings with which OE concurs. However, the procedure has not been used to its potential.

The memorandum procedure was developed to translate the findings of evaluation studies into a list of "action items" for program management. It represents an attempt to make sure that study results receive proper attention from OE and department decisionmakers. For instance, the Commissioner of Education may use the memorandum as a base for policy decisions. He may selectively direct actions to be taken, the office responsible for implementing them, and their due dates.

According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, the procedure is one of the most important parts of the whole evaluation process, which encompasses evaluation planning through implementation of results. He believes it is superior to other implementation methods


because the implications of evaluation results which OE accepts for implementation, including related followup requirements, are explicitly set forth in areas such as basic policy, budgeting, staffing, and program regulations. Also, the Assistant Commissioner believes, decisions are more likely to be made on action items under such a procedure.

An OE evaluation official said that policy memorandums were to be written after the completion of each evaluation study in which important findings were produced. From March 1972 to March 1974, OE's evaluation unit completed 32 studies, costing a total of about $8.2 million. The Secretary's evaluation office, using education evaluation funds of about $3.5 million, completed 18 studies during a similar period.

Although OE considers the policy implication memorandum procedure a key to assuring the use of study results, it was followed on only two evaluation studies completed during this period. 1/ We reviewed its use in both instances--the studies cost $120,000 and $772,000--to ascertain how it affected policy and program changes.

The first policy memorandum was dated December 26, 1972; the other, August 19, 1974. Nine months elapsed between the first study's completion date and the preparation of the policy memorandum, and 10 months elapsed for the second study.

Several sources doubted the impact of the first study and memorandum. A program official affected by the study, relating to title I of the Elementary and Secondary Education Act, said that there was no evidence showing that memorandum recommendations were being taken seriously or were influential in causing program change. In addition, OE program officials said that they were pursuing some of the recommendations contained in the memorandum prior to availability of the results of the evaluation study. Also, OE's Assistant Commissioner for the Office of Legislation stated that neither the policy memorandum nor the study has had much influence on the process of developing new legislation for title I. He explained that one reason for their lack of impact was the Administration's emphasis on educational revenue-sharing at the time of the study's publication. This emphasis reduced interest in amending existing legislation that might have been substantially

1/In November 1976, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation stated that a third memorandum had been written.


eliminated and replaced if the educational revenue-sharing legislation had been enacted.

The evaluation office official who wrote the memorandum said in November 1976 that the evaluation office was not following up on all memorandum recommendations despite the fact that some were still open issues. He agreed that better followup of such open issues should be a part of the policy implication memorandum system.

An OE official involved in the evaluation study said that although very few of its recommendations were acted upon, it compiled evidence to support certain conclusions for the first time.

A program official stated that if program officials had been consulted about the subject matter, content, and design of the study, they would have been in a better position to use the study's results. Evaluation office officials disagreed with this view and stated that they extensively involve program officials in designing evaluation studies.

The second policy memorandum contained only one major recommendation which required further action. The recommendation was implemented, and substantially changed program emphasis. The OE project monitor for this study said that a policy implication memorandum was written for it because the procedure was given high priority at the time.

Regarding other evaluation studies for which no policy memorandums were written, OE project monitors gave these explanations:

--The priority placed on writing the memorandums was not high enough.

--Evaluation studies had overlapping cycles; therefore, before one was completed another could start, detracting from full appreciation of the earlier one.

--Delays in receiving study reports could have affected writing the memorandums.

An OE official commented on this situation, stating that because policy memorandums are not being written, meaningful study conclusions fail to reach policy planners and program administrators who have a voice in the legislative process. He felt that the procedure is needed to call attention to the significant recommendations in each study.


In our opinion, delays of nearly 1 year before the two policy memorandums were written and approved and the general lack of such memorandums clearly point to the need for more OE emphasis on assuring increased use of the evaluation findings with which OE concurs. This includes giving a higher priority to policy implication memorandums or some other procedure for achieving this purpose. OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed that OE has not successfully carried out the policy implication memorandum system. Although he believes the memorandums are of central importance, other priority matters have preempted staff time.

CONCLUSIONS

Opportunities exist for OE's evaluation studies to better serve the Congress. These include better timing of the studies and more frequently briefing congressional committee staff.

There is a need to better define the program objectives to be evaluated. The use and implementation of evaluation results also need more attention.

RECOMMENDATIONS TO THE SECRETARY OF HEW

We recommend that the Secretary of HEW direct OE to:

--More strongly emphasize the purpose of providing information to the Congress when planning, implementing, and reporting on evaluation studies. In particular, more attention should be given to timing the studies so that they more clearly coincide with the legislative cycle and to briefing congressional committee staff more frequently.

--Take steps to comply with the General Education Provisions Act (20 U.S.C. 1226c) requirement that, in the annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources, HEW set forth goals and specific objectives in qualitative and quantitative terms for all programs which are evaluated. OE should indicate in the evaluation report that it is setting forth specific objectives tentatively in response to the congressional requirement and as a basis for future discussion and agreement with the committees on program evaluation matters. These matters should include the acceptable evaluation data needed by congressional decisionmakers and the measures to be used. If HEW still does not intend to comply with


this requirement, it should propose legislative changes to the Congress to avoid continued agency noncompliance. In the meantime OE should initiate dialogues with the appropriate House and Senate committees to seek understanding and agreement on the specific program objectives to be used for evaluations as well as acceptable evaluation data and measures for each program to be evaluated.

--Improve the implementation of evaluation results by giving greater attention and priority to procedures such as the issuance of policy implication memorandums designed to assure implementation of those results.

AGENCY COMMENTS AND OUR EVALUATION

HEW commented on matters discussed in this report in a letter dated June 15, 1976. (See app. III.)

HEW agreed with our recommendation that OE should more strongly emphasize meeting congressional information needs by timing the studies to better coincide with the legislative cycle and briefing congressional staff more frequently. HEW stated that it has initiated a series of reviews of its studies, focusing on predicted production dates for findings and recommendations in relation to critical dates for legislative input. HEW also stated that the need for congressional committee staff briefings has not been given proper attention but that it has recently decided to institute such briefings on all major evaluation studies and will initiate this procedure in the coming weeks.

Regarding our recommendation that OE should improve the implementation of its evaluation results, HEW agreed and stated that the policy implications memorandum procedure, which is an invention of OE's evaluation office, has not been used nearly as extensively as it should have been. HEW said that efforts are currently underway to expand the use of these memorandums and that agency officials are now conducting periodic reviews of the schedule for producing the memorandums and emphasizing their high priority.

Our draft report proposed that OE better define the program objectives to be evaluated as required by the General Education Provisions Act (20 U.S.C. 1226c). This includes translating the legislative purposes of individual programs evaluated into specific qualitative and quantitative program objectives and clearly stating these objectives, and the progress made toward achieving them, in the annual evaluation report.

HEW disagreed with this proposal because it believes there are limits on OE's "authority and ability" to increase the clarity and specificity of its program objectives. HEW's comments, however, ignored the fact that the General Education Provisions Act requires the agency to set forth such objectives for individual programs included in its annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources.

HEW said that in most cases legislation fails to state a program's objectives with sufficient clarity for evaluation. It appears that the Congress has also recognized that program legislation does not provide sufficiently clear and specific objectives for evaluation; therefore, in the General Education Provisions Act it has required HEW to set these forth in a report to the appropriate congressional committees. HEW's comments did not respond to its noncompliance with this requirement, although OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with our finding on this matter. (See p. 17.)

HEW stated that it "proceeds at considerable peril in trying to further specify legislation" and that

"* * * in many cases it has been the Congress'specific intention to avoid specificationof program objectives and leave such judgme:ntsand decisions up to State and local officials."

However, in conducting national program evaluations the Office of Education is often implicitly establishing and using specific program objectives; for example, standardized tests, frequently used in these evaluations, are based on specific instructional objectives. We believe there is an important distinction between specific program objectives explicitly set forth for Federal evaluations and those which would be established to dictate to State and local education agencies specifically how to design and run their programs which use Federal funds. If OE cannot explicitly set forth specific program objectives it would use for Federal evaluation purposes, then we believe it is inconsistent to conduct national program evaluations which contain such objectives implicitly.

We agree with HEW that there is political opposition, but we believe such opposition is really directed toward Federal infringement on State and local education agency prerogatives. Such opposition effectively restrains the Federal agency from

trying to dictate State and local program goals, specific objectives, approaches, etcetera. In our view, HEW should comply with the requirement, but in doing so it should clearly indicate in the evaluation report that the specific objectives are tentative, are being set forth in response to the legislative requirement, and are intended only for congressional scrutiny and as a basis for mutual discussion and agreement on program evaluation matters, including the acceptable evaluation data needed by congressional decisionmakers.

In its general comments on our report HEW stated its belief that there is growing professional opinion that OE's studies have, over the past 10 years, been responsible for many major changes in existing legislation. Also, in HEW's view the assumption that there are certain decisionmakers, and that effective evaluation provides timely data to them, is increasingly being questioned. Instead, HEW believes that effective evaluations affect the broad political climate within which particular decisions are made.

Obviously, HEW believes that OE studies are affecting legislative decisions. However, as discussed in our report, three of the four key congressional committee staff members interviewed, who are responsible for education matters, said that the studies often have been completely ineffective or have had little impact on legislation. Therefore, we continue to believe that the primary purpose of these evaluations should be to provide useful information to decisionmakers.

RECOMMENDATION TO THE CONGRESS

The Congress should recognize that HEW is not in compliance and does not intend to comply with the General Education Provisions Act requirement (20 U.S.C. 1226c) as noted above. HEW feels that its authority and ability to comply with this legislative requirement are limited. The Chairmen of the House Committee on Education and Labor and the Senate Committee on Human Resources should discuss this matter further with agency officials to seek a common understanding with them on the process or approach to be used for (1) clarifying program objectives for evaluation and (2) reaching agreement on acceptable evaluation measures and data for each program to be evaluated.

CHAPTER 4

PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS

A large amount is spent to evaluate State education agency title I and III, and local education agency title I, III, and VII elementary and secondary education programs. Agency officials at these levels, responding to our questionnaires, indicated a need to improve program evaluation reports, including these important areas for determining program effectiveness: the credibility of findings and the qualification and quantification of measurement data.

Other areas needing attention and improvement to make the State and local evaluation reports more useful include

-- the relevance of the reports to policy issues,

-- the completeness and comparability of the data reported, and

-- report timeliness.

The significance of these problems and other factors raise a question about whether the present reporting systems based on aggregated local data can be improved so that they meet program information needs at Federal, local, and/or State levels.

STATE AND LOCAL EVALUATION EXPENDITURES

In addition to the funds authorized for program evaluation at the Federal level (see ch. 1), the Elementary and Secondary Education Act requires annual State and local education agency evaluations of title I and III programs, and local evaluations for title VII. The following tables show--on the basis of State and local agency responses to our questionnaire--estimates of the evaluation funds expended by State and local education agencies for the programs during fiscal year 1974.

         Estimated State-level Expenditures for Evaluating Selected
               Elementary and Secondary Education Programs
                        Fiscal Year 1974 (note a)

                                               Title I      Title III

Total State-level program expenditures
  reported for evaluation                   $ 2,066,020     $1,723,805
Average State program grant                  24,520,132      2,127,455
Average expenditures on evaluation
  per State                                      43,958         39,177
Average percent of grant spent for
  evaluation (note b)                              1.2%           5.4%

a/All amounts are based on unverified questionnaire responses from 47 States for title I and 44 States for title III.

b/The percentages shown are based on the averages of the percent of grant funds reportedly spent for evaluation by the States. However--overall, two-tenths of one percent of title I and 1.8 percent of title III funds were reportedly spent for evaluation. The differences between these percentages and those shown are because larger percentages of the smaller grants were generally used for evaluation.
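The arithmetic behind note b--an unweighted average of each grantee's percentage versus the percentage of total funds--can be sketched briefly. The figures below are hypothetical illustrations, not the questionnaire data:

```python
# Two ways to summarize "percent of grant spent for evaluation"
# across grantees (hypothetical figures for illustration only).

grants = [100_000, 10_000_000]   # one small and one large grant
eval_spend = [5_000, 20_000]     # evaluation spending for each grant

# Average of the per-grantee percentages: each grantee counts equally,
# so a small grant spending a large share pulls the average upward.
avg_of_percents = sum(s / g for s, g in zip(eval_spend, grants)) / len(grants) * 100

# Overall percentage: total evaluation spending over total grant funds,
# which is dominated by the large grant.
overall_percent = sum(eval_spend) / sum(grants) * 100

print(f"average of percentages: {avg_of_percents:.2f}%")   # 2.60%
print(f"overall percentage:     {overall_percent:.2f}%")   # 0.25%
```

As in the table, the average of the individual percentages exceeds the overall percentage whenever smaller grants devote larger shares to evaluation.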

         Estimated Local-level Expenditures for Evaluating Selected
               Elementary and Secondary Education Programs
                        Fiscal Year 1974 (note a)

                                     Title I      Title III    Title VII

Local project expenditures
  reported for evaluation        $31,790,960     $5,089,344   $1,574,320
Average project grant per
  local district grantee             161,417         50,240      164,705
Average evaluation
  expenditures                         3,860          2,101        4,537
Average percent of grant
  spent for evaluation
  (note b)                              6.4%           5.0%         3.1%

a/All amounts are based on unverified questionnaire responses from local school district respondents representing 8,236 title I, 2,422 title III, and 347 title VII projects.

b/The percentages shown are based on the averages of the percent of grant funds reportedly spent for evaluation by the local projects. However--overall, 2.4, 4.2, and 2.8 percent of title I, III, and VII funds, respectively, were reportedly spent for evaluation. The differences between these percentages and those shown above are because larger percentages of the smaller grants were generally used for evaluation.

OE officials stated that OE does not collect State and local education agency data on evaluation expenditures.

STATE AND LOCAL EVALUATION REPORTS NEED IMPROVEMENT

Our questionnaires asked State and local officials connected with administering title I, III, and VII programs under the Elementary and Secondary Education Act to rate the adequacy of these aspects of the State and local evaluation reports: credibility of findings, presentation of required management information needs, qualification of findings, qualification and quantification of measurement data, and focus and scope. In our opinion, the adequacy of these aspects is likely to greatly influence how much the reports satisfy the policy, management, and program information needs of State and local officials.

In most cases, more respondents rated the various aspects of the evaluation reports in the "adequate or better" categories than in the "less than adequate" categories. However, in each aspect, many respondents to the questionnaires indicated that State and local evaluation reports were inadequate; most of these ratings were in the "marginal" category. In addition, substantial numbers of State and local officials rated State and local evaluation reports to be less than adequate in these important areas for determining program effectiveness: the credibility of findings and the qualification and quantification of measurement data. In our opinion, such large numbers of less-than-adequate ratings indicate a serious need for improvement in the reports.

The following table summarizes the respondents' less-than-adequate ratings of State evaluation reports.

                     State Evaluation Reports:
       Summary of Respondents' Less-Than-Adequate Ratings (note a)

                                    State program      Local project
                                     officials of       officials of
                                        title:             title:
                                      I      III       I    III    VII

                                    -----------(percent)-----------

Credibility of findings              59      40       45     36     61
Presentation of required
  management information needs       59      40       48     38     59
Qualification of findings            47      47       50     39     69
Qualification and quantification
  of measurement data                57      55       49     42     51
Focus and scope                      42      43       38     33     60

a/Percentages for State program officials are in all cases based on questionnaire responses from 48 or 49 States for title I and between 45 and 47 States for title III. Percentages for local project officials are based on sample responses and in all cases represent more than 8,300 title I projects, more than 2,350 title III projects, and about 300 title VII projects. See app. I, questions 9-13 and 20-24, and app. II, questions 9-13, 21-25, and 33-37.

The following table summarizes the questionnaire respondents' less-than-adequate ratings of local evaluation reports:

                     Local Evaluation Reports:
       Summary of Respondents' Less-Than-Adequate Ratings (note a)

                                    State program      Local project
                                     officials of       officials of
                                        title:             title:
                                      I      III       I    III    VII

                                    -----------(percent)-----------

Credibility of findings              67      60       38     31     47
Presentation of required
  management information needs       71      56       39     31     42
Qualification of findings            69      60       41     34     39
Qualification and quantification
  of measurement data                57      58       44     40     41
Focus and scope                      50      40       30     25     43

a/Percentages for State program officials are in all cases based on questionnaire responses from 48 or 49 States for title I and 48 States for title III. Percentages for local project officials are based on questionnaire sample responses and in all cases represent more than 8,500 title I projects, more than 2,400 title III projects, and about 350 title VII projects. See app. I, questions 9-13 and 20-24, and app. II, questions 9-13, 21-25, and 33-37.

Credibility of findings

The questionnaire defined this aspect as the degree of confidence expressed in the findings through statements about statistical certainty, soundness of method, evidence of replication, consensual agreements, similar experiences, supporting expert judgment and opinions, and reasonableness of assumptions.

As the table summarizing State evaluation report ratings shows, between 36 and 61 percent of State and local respondents from the various programs rated the credibility of findings in their program's State evaluation reports to be less than adequate.

As the table above shows, the percentage of State and local respondents from the various programs that rated local evaluation reports to be less than adequate in this aspect ranged from 31 to 67 percent.

Presentation of required management information needs

The questionnaire defined this category as the extent to which the reports are informative to those who evaluate and update current policies and transfer policy decisions into plans, budgets, curriculum or program implementation, operational oversights, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments.

The percentage of State and local respondents finding this aspect of their program's State evaluation reports to be less than adequate ranged from 38 to 59 percent. Similarly, for local evaluation reports, the range was from 31 to 71 percent.

Qualification of findings

The questionnaire defined this aspect as the extent to which the reports properly qualify the findings and assumptions and identify those conditions and situations to which the findings are not applicable.

The percentage of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 39 to 69 percent; for local evaluation reports, the percentage ranged from 34 to 69 percent.

Qualification and quantification of measurement data

The questionnaire defined this category as the extent to which the evaluation assessments can be qualified and quantified into measurable attributes and parameters that address the problem in measurable, operational, or concrete terms.

The percentage of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 42 to 57 percent; for local evaluation reports, the range was from 40 to 58 percent.

Focus and scope

The questionnaire defined focus and scope as the adequacy with which the reports covered essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics and details and higher- and lower-priority information.

The percentages of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 33 to 60 percent; for local evaluation reports, the range was from 25 to 50 percent.

Other questionnaire results

Only 61 percent of State title I officials and 62 percent of State title III officials said that local evaluation reports "generally" or "very often" adequately show evidence of quantifiable or measurable student benefits; and only 64 percent of State title I officials and 50 percent of State title III officials said that State evaluation reports are generally or very often adequate in this respect.

Although over 75 percent of the State respondents said they use local and State evaluation results for policy, programmatic, or management decisions, the only data contained in State and local reports which was frequently found adequate by State officials was

-- the number of children in the program and

-- the per-pupil expenditures.

Most local school district respondents stated that their reports are generally or more often than generally adequate in providing essential information on the

-- number of children in the program,

-- per-pupil expenditures for each program,

--evidence of quantifiable and measurable achievement, and

-- evidence of quantifiable or measurable pupil benefits.

OTHER PROBLEMS

To be useful in making decisions at the Federal level, State and local evaluation reports should be timely, complete, comparable, and relevant to policy issues. Among the Elementary and Secondary Education Act State and local evaluation reports submitted to OE that we reviewed, most did not meet any of these criteria.

Three factors need to be considered:

-- The significance of these and other problems discussed in this chapter.

--Constraints on the Federal role in education as discussed in chapter 1. (See pp. 8 to 10.)

-- Questions about whether ongoing efforts to resolve these problems with evaluation models for title I will be effective.

If the questions about the models are not resolved, the feasibility of producing improvements that will enable the present reporting systems based on aggregated local data to provide the information needed is questionable.

Usefulness and relevance to policy issues

Evaluation reports have the best chance of affecting policy decisions if they are designed to directly address policy issues. The reports should, for example, indicate programs' or projects' successes and failures.

In relation to this, OE officials generally said that State and local evaluation reports were rarely used to support operational or policy changes. State compensatory education program officials made similar statements, saying that State and local evaluations are of limited usefulness to those making decisions. In addition, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation stated that there is no question that the State and local evaluation reports are not useful.

Similarly, two OE-contracted research studies question the usefulness of State and local evaluation reports:

--A study analyzed the policy-relevance rating of title I State evaluation reports for the 5 fiscal years 1969-73. Study results revealed serious problems concerning the validity of data reported by most States, precluding any meaningful interpretation of the data. The study noted that the majority of the reports examined were seriously deficient in reporting policy-relevant information.

--A 1974 study of a nationwide sample of title VII evaluations concluded that no strong relationship could be established between the content or quality of evaluation reports and funding levels awarded to projects. It also concluded that none of the evaluations presented data which would indicate project failure, noting that such information is essential if a report is to be useful to decisionmakers. The study also noted that over half the evaluations it reviewed were of limited or no use for making judgments about project effectiveness.

Our review of selected evaluation reports and discussions with OE program officials generally confirmed that the State and local education agency reports, because of the problems cited in this chapter, have little effect on Federal-level decisions.

Complete and comparable data in reports

Evaluation of Federal programs at the local level should produce reports containing valid, complete, and comparable results if data from each report is to be aggregated to provide Federal policymakers and program administrators with a good perspective on how well the program as a whole has worked and which approaches have produced the best results.

Response to congressional requirement

The Congress has recognized the need for OE to make State and local education agency evaluation reports more usable. The Education Amendments of 1974 (Public Law 93-380) directed the Commissioner of Education to carry out certain evaluation activities for programs authorized by title I of the Elementary and Secondary Education Act. Section 151 of the act directs the Commissioner to

-- provide for independent evaluations which describeand measure impact of programs and projects,

--develop and publish standards for evaluation of program or project effectiveness,

--provide models for evaluation to State education agencies,

-- provide such technical and other assistance as may be necessary,

--specify objective criteria which shall be utilized in the evaluation of all programs, and

-- outline techniques and methodology for producing data which is comparable on a statewide and nationwide basis.

OE has begun activities to address each of these requirements, and evaluation models are being developed and refined. OE plans to require State and local use of the models (or use of alternatives that the Commissioner certifies will generate compatible evaluation data). However, OE Planning and Evaluation Office officials have stated that they do not believe that evaluation needs at the local, State, and Federal levels can all be met by the same approach. In addition, these officials, including OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, told us that although the models will make aggregation of locally collected data theoretically possible, accomplishing successful implementation for 14,000 school districts is doubtful, or at least questionable. They noted that the problems that need to be overcome are methodological, fiscal, and political. For example, some State and local officials do not want comparisons of educational results.

An OE evaluation official said further that the models are based on assumptions about such things as the common metric (scale) used, the soundness of the tests employed, and whether those tests are similar enough to provide data that is truly comparable; these assumptions represent compromises, and whether they will satisfy everybody is unclear. He stated that because of its technical nature the information based on the models may be provided to those who are responsible for decisions without an explanation of the assumptions involved. We believe that such information, when provided to the Congress and other users, should set forth these assumptions as clearly as possible.

OE's evaluation office is offering these types of technical assistance on title I: written handbooks on evaluation topics, such as the models; technical assistance workshops to train State administrators and evaluators in using the models and to prepare them to train local school district personnel; and consulting services to States to help them use the models. Technical assistance centers have been established throughout the country under OE contracts to

provide these services. According to OE, such services might include writing computer programs, conducting workshops to train local personnel, and helping with data analysis. To alleviate many of the States' staffing and technical problems in using the models, OE encourages States to call upon the technical assistance personnel to solve problems at the State level and, depending on the desires of each State, at the local school district level, too.

Need to improve data

As stated earlier, OE program officials and State education agency officials feel that State evaluation reports are generally not useful for making management decisions; both expressed the need for uniform evaluation methods which would lead to comparable data reporting.

When local title I officials were asked whether or not local districts could compare the data from their local reports with data on the same program contained in State and Federal reports, 49 percent said they could do this only occasionally or seldom with State reports, and 64 percent replied similarly regarding Federal reports. Corresponding results were obtained for titles III and VII.

The results of a 1972 OE-contracted study done by an educational research firm to identify successful State programs and local projects in compensatory education illustrate the lack of reliable, comparable data. According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, the study was able to identify only "a discouragingly low number of successful projects," because many projects did not have an evaluation design good enough to produce reliable data on cognitive results and many projects were poorly designed, poorly managed, or badly implemented. Furthermore, the study concluded that the lack of representative data from each State that could be combined in a meaningful way made it extremely difficult to address the effectiveness of the Elementary and Secondary Education Act title I program.

In an attempt to provide a means for Federal, State, and local education agencies to develop comparable evaluative data, OE, through the National Center for Educational Statistics (which was part of OE at that time), funded the "Anchor Test Study." The study was intended to develop tables and procedures for equating test scores among the eight most widely used standardized reading tests for fourth, fifth, and sixth grade children. OE developed the basic plan and

the detailed specifications for the study, which was intended to become an integral part of Federal elementary and secondary education program evaluation. OE expected the study to be useful at all levels of educational administration; that is, evaluative results would become useful to State and Federal Governments because program evaluation test data could be combined in a meaningful way. In addition, local school systems could have flexibility in selecting achievement tests to be used in their evaluations.
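The kind of score equating the study provided can be sketched in miniature. One common approach--equipercentile equating, assumed here purely for illustration, since this report does not describe the study's actual method--maps a score on one test to the score on another test that holds the same percentile rank in a common norming sample. The test names and score data below are hypothetical:

```python
# Minimal sketch of equipercentile score equating: map a raw score on
# test A to the score on test B holding the nearest percentile rank in
# a common sample. Hypothetical data; not from the Anchor Test Study.

def percentile_rank(score, scores):
    """Fraction of the sample scoring at or below the given score."""
    return sum(1 for s in scores if s <= score) / len(scores)

def equate(score_a, sample_a, sample_b):
    """Return the score in sample_b whose percentile rank is closest
    to the percentile rank of score_a in sample_a."""
    target = percentile_rank(score_a, sample_a)
    return min(set(sample_b),
               key=lambda s: abs(percentile_rank(s, sample_b) - target))

# Hypothetical norming samples for two reading tests.
test_a = [10, 12, 15, 18, 20, 22, 25, 28, 30, 35]
test_b = [40, 45, 50, 55, 60, 62, 65, 70, 75, 80]

# A score of 20 on test A sits at the 50th percentile of its sample,
# so it maps to the test B score at the 50th percentile.
print(equate(20, test_a, test_b))  # prints 60
```

Published equating tables let a district report results on whichever of the covered tests it administers while still producing scores that can be aggregated with other districts' results.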

The results of the Anchor Test Study became available in September 1974. According to our questionnaire, however, as of April-June 1975 all of the 282 projected local education agency respondents indicated they had little or no information about the study. Although the great majority of State respondents knew about the study, only 18 percent had used it.

Test publishers and educational evaluators informed us that the study was technically excellent and potentially useful, but OE had not adequately planned for its use and dissemination. They also noted that the study's usefulness is diminishing as new versions of those tests included in the study are developed and published. State and local education officials interviewed said that the study has had little impact on their evaluation efforts primarily because OE did not pursue its implementation; little effort has been made to direct, encourage, or promote the use of the study so that evaluation results could be made more comparable.

Timeliness

OE often does not receive annual State evaluation reports in a timely manner. For instance, 1 month before the end of fiscal year 1975, less than half of the fiscal year 1974 State elementary and secondary education title I program evaluation reports had been received. Notwithstanding OE's attempts to obtain delinquent reports, several States are up to 2 years behind in submitting theirs.

Delinquent reporting helps to prevent meaningful aggregation of State evaluation reports into a national picture of program effectiveness that can affect Federal decisions on funding and program operation. According to an OE program official, some of the more significant reasons States have given for late filing include delays due to

-- inclusion of summer program results,

-- uncooperative local education agencies,

-- differences between OE, State, and/or local agency personnel over what information the evaluation reports should contain,

-- low State priority for the programs or their evaluation, and

-- schedule slippages in printing, computer, and staff processing of the reports.

CONCLUSIONS

The amount of funds spent to evaluate State education agency titles I and III, and local education agency titles I, III, and VII elementary and secondary education programs is substantial. If the reporting systems based on aggregated local agency data are to be effective, there is a need to improve the adequacy of various aspects of the evaluation reports, including two areas important in determining program effectiveness:

-- The credibility of findings.

-- The qualification and quantification of measurement data.

The usefulness of the State and local evaluation reports also needs improvements in

-- report relevance to policy issues,

-- data completeness and comparability, and

-- report timeliness.

Valid, complete, and comparable evaluation data is important if results are to contribute meaningfully to decisions at all levels. If local and State evaluation data continues to be aggregated for use at higher levels, standardization of data collection efforts and techniques is needed to provide comparable results.

Because of the constraints on the Federal role in education, as discussed in chapter 1 (see pp. 8 to 10), in our opinion it is unlikely that the Federal Government would seek to provide the valid and comparable data needed by dictating uniform evaluation methods and procedures to State and local education agency grantees. In addition, questions exist

about whether the title I evaluation models will be able to provide valid and acceptable data to meet program information needs. If these issues are not resolved, we question whether improvements can be made that will enable the reporting systems based on aggregated local data to meet program information needs at Federal, local, and/or State levels.

RECOMMENDATION TO THE SECRETARY OF HEW

We recommend that the Secretary direct OE to assess whether State and/or local evaluation reports for titles I and VII of the Elementary and Secondary Education Act can be improved so that they supply officials at Federal, local, and/or State levels with the reliable program information they need for decisionmaking. This includes assessing the adequacy of the title I evaluation models and related data.

-- If OE determines that it is unrealistic to expend resources on improving these programs' State and local evaluation reporting systems based on aggregated data, then it should take the needed steps to adopt more feasible and effective approaches. This should include eliminating unwarranted reporting requirements and, if necessary, proposing to the Congress any legislative changes needed to accomplish this.

-- If, however, these reporting systems are continued, OE should more strongly emphasize improving the adequacy of State and local evaluation reports for title I and local evaluation reports for title VII. This should be done by giving greater management attention and priority to making reports more complete and comparable, relevant to policy issues, timely, credible, and adequate in the qualification and quantification of their measurement data.

AGENCY COMMENTS AND OUR EVALUATION

HEW stated that it concurs with the general thrust of our draft report recommendations for chapters 4 and 7; however, it also stated that most of the actions we proposed are already underway, many by legislative mandate, and in some cases are near completion.

Our report extensively discusses these actions underway. (See pp. 32 to 34.) HEW's statement implies that the actions underway are likely to be successful. Statements by OE evaluation office officials directly contradicted this implication and questioned the practical feasibility of aggregating local level data. (See p. 33.) We concluded in chapter 4 that there are issues which raise questions about whether the evaluation models will be able to provide valid and acceptable data to meet program information needs. Therefore, we believe that OE needs to assess the effect of the problems connected with its "actions underway" in relation to the feasibility and cost-effectiveness of improving the reporting systems to provide the reliable information needed for decisionmaking at Federal, State, and local levels.

HEW also said that our understanding was incomplete regarding the much needed and legislatively mandated actions to improve State and local evaluations and reporting. In clarifying the meaning of this statement in relation to title I, HEW stated that we should have provided more information relating to the workshops, technical assistance centers, and information dissemination activities relating to the evaluation models mandated by the Education Amendments of 1974. We disagree. Our report discusses each of these issues. (See pp. 32 to 34.) Appendix III provides HEW's additional information on these matters.

Specifically regarding title I, HEW stated that once the models and revised reporting system are in place across the Nation, their use should produce data which can be aggregated across school districts and States. At that time, HEW believes that OE will be in a better position to assess whether or not the data are sufficiently free of systematic errors to support satisfactory aggregations to the State and national levels. If they are not, HEW stated it can then determine whether technical problems could be overcome or whether different kinds of studies are needed to satisfy Federal, State, and local reporting requirements.

We believe that to foster maximum efficiency and economy, OE should make the needed assessment as soon as it has a good enough understanding, if it does not already, of such factors as the likely validity, reliability, and comparability of the reporting system data as well as the other methodological, fiscal, and political problems involved.

HEW also stated that OE has interviewed personnel in Federal policymaking roles and received information from a title I advisory group (which included parent representatives) on the kinds of information that should be included in the annual State and local evaluation reports. In addition, evaluation models and their reporting forms were developed and reviewed by each State agency and three of its local agencies to identify possible problems. As a result of

all these efforts, a limited core of essential information was identified as "desirable at the Federal level," and this will become the Federal evaluation requirement when regulations for this portion of the legislation are published.

We commend efforts to identify the Federal information requirements. However, we remain concerned about the identification of State and local agency information requirements and whether these needs can all be met by the same reporting system. We are also concerned about the possible unnecessary duplication of using both this reporting system and OE national evaluations on these programs to meet these Federal information requirements.

With respect to title VII, HEW stated that local evaluation reports can be improved and cited the following steps intended to accomplish this:

-- HEW has recently published regulations strengthening requirements for bilingual project evaluation.

-- The National Institute of Education and OE have a joint project underway to upgrade the technical expertise of local evaluators.

Although HEW indicated that aggregating local title VII data is more difficult than for title I, aside from concurring with the general thrust of our recommendation, it did not respond to our proposal for OE to assess whether it is realistic to try to improve local title VII reports so that they supply Federal-level decisionmakers with the reliable information they need.

General comments

In addition, HEW provided several general comments. These observations and our responses follow.

HEW comment

HEW stated that the report needs to give more careful consideration to evaluation costs and that the quality of data the report "appears to expect" would require significant additional resources which would be high in relation to the possible payoffs through program improvements.

Our response

These comments ignore the cost-effectiveness considerations of our recommendations. In addition, HEW has misinterpreted what our report expects. Although HEW's comments did not recognize or respond to this proposal, our draft report proposed that OE assess whether it is realistic to expend additional resources on improving the State and local reporting systems for titles I and VII. If OE determines that it is not realistic, then we proposed that it initiate action to eliminate unwarranted reporting requirements, including proposing to the Congress any needed legislative changes. We would certainly expect cost-effectiveness considerations to be a part of OE's assessment.

Our draft report also proposed (see p. 76) that, in connection with this assessment, OE determine (1) whether it is realistic to try to serve Federal, State, and local levels with information based on local agency evaluations and (2) how the information needs at Federal, State, and local levels can best be met. HEW's comments also did not recognize or respond to this proposal.

These proposals certainly do not require "significant additional resources" for reporting systems. In fact, they question the value of present and proposed expenditures and suggest that OE face this issue. We recommend this assessment because it is not clear and has not been demonstrated that the reporting systems based on aggregated local level data are now providing, or can be made to provide, valid, useful, and cost-effective data to Federal, local, and/or State decisionmakers. In our view, significant additional resources should not be expended until there is some solid evidence that they would be cost-effective.

HEW comment

HEW stated that the report does not give adequate recognition to whether the tradeoffs in improved program quality are likely to justify additional spending.

Our response

Once again, HEW has misinterpreted our draft report proposals. As discussed above, we proposed that OE make the needed assessment to determine the realism of trying to improve these reporting systems. We believe that such an assessment, if properly conducted, would necessarily include consideration of the tradeoffs involved. In our view it is the agency's responsibility to make such assessments. This is especially true in this situation where, as discussed in this chapter, not only have OE-funded studies shown significant problems, but OE evaluation office officials have expressed serious questions about the feasibility of the approach currently being followed to solve these problems.

HEW comment

HEW took exception to our statement that the "amount of funds spent to evaluate State and/or local education agency titles I, III, and VII elementary and secondary education programs is substantial," saying that the percentages of funds involved at the State level are not substantial.

Our response

The amount that we intended to refer to is the total reportedly spent not only on State evaluations for titles I and III, but also on local evaluations for titles I, III, and VII. This totals in excess of $42 million--a substantial amount.

CHAPTER 5

USING STANDARDIZED TESTS

State and local education agencies generally use standardized "norm-referenced" tests to measure the effectiveness of Federal elementary and secondary education programs. OE's national evaluations also frequently use these tests.

Most State and local respondents to our questionnaires believe there is a substantial or very great need for increased efforts to develop alternatives to standardized norm-referenced tests, such as "criterion-referenced" tests. Many of these officials also believe that increased efforts are needed to reduce racial, sexual, and cultural biases in tests. The questionnaire results indicated a lack of awareness among State and especially local agency officials of information available to help them select appropriate standardized tests.

USES AND IMPLICATIONS OF STANDARDIZED NORM-REFERENCED TESTS

Standardized norm-referenced tests were devised to measure the status of an individual in relation to other individuals--the norm group. The score an individual receives has meaning in relation to the performance of the norm group, not the educational objectives involved; therefore, such tests are described as norm-referenced.

A standardized norm-referenced test differs from other tests given by schools in that it (1) is almost always constructed by specialists in educational testing, (2) has explicit instructions for standard or uniform administration, and (3) has norms for interpreting test results. These norms have been derived from giving the test to a sample of persons intended to represent the whole group for whom the test is designed.
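The norm-referencing described above can be sketched in a few lines of Python. This is an illustrative sketch only, not drawn from the report; the function name and the norm sample are hypothetical. An examinee's raw score is converted to a percentile rank, that is, the percent of the norm group scoring at or below it:

```python
from bisect import bisect_right

def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring at or below `score`."""
    ranked = sorted(norm_scores)
    return 100.0 * bisect_right(ranked, score) / len(ranked)

# Hypothetical raw scores from a norm (standardization) sample.
norm_sample = [12, 15, 18, 21, 23, 25, 27, 30, 33, 38]

print(percentile_rank(25, norm_sample))  # 60.0: at or below 6 of the 10 norm scores
```

The same raw score of 25 would earn a different percentile against a different norm group, which is the sense in which such scores have meaning only relative to the norm sample, not to any instructional objective.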

There are many kinds of standardized tests given in schools, business, and the military services: intelligence, academic aptitude, achievement, personality, attitude, interest inventory, and vocational aptitude tests. In discussing tests, this report deals almost exclusively with achievement tests--those which measure current knowledge or competencies. Most of the standardized tests that children take in school are of this kind. Today the typical schoolchild takes one to three such tests every year.

Five or six companies account for about three-fourths of the total test sales in the country. These companies have all been in the testing business for a long time; most helped pioneer the testing field in the 1920's. The companies all sell a wide variety as well as a large volume of tests (most list more than 100 tests in their catalogues) and provide extensive services to test customers. These factors contribute to the widespread use and acceptance of the tests.

Standardized norm-referenced achievement tests are used to evaluate both individuals and programs. For student evaluations, the tests are used to rank or compare students for such purposes as counseling them, assigning them to a class within a grade or a group within a class, assigning them to a special program (for example, for the retarded or gifted), indicating the kind of courses a student may take in junior high school or high school, or gaining admittance to college or graduate school. In program evaluations, standardized tests are used widely at the Federal, State, and local levels to determine the effectiveness of OE programs such as titles I, III, and VII of the Elementary and Secondary Education Act.

WIDESPREAD STATE AND LOCAL USE OF STANDARDIZED NORM-REFERENCED TESTS TO EVALUATE FEDERAL PROGRAMS

Most OE-funded evaluations of Federal education programs at the national, State, and local levels have at least one common purpose: they are to measure and report on the effectiveness of federally funded programs and projects. Many State and local evaluations are congressionally mandated, and national evaluations, according to OE evaluation officials, are conducted either because of a congressional request or to provide responsible agency officials and congressional members with nationally comparative data about a particular program or approach to education. These national evaluations frequently use standardized tests.

The General Education Provisions Act (20 U.S.C. 1231a) requires OE to collect information intended to objectively measure the effectiveness of education programs and permits local education agencies to use systematic measurement approaches, approved by the Commissioner of Education, that will assure adequate evaluation of each program.

Our questionnaire asked State education agencies which of several techniques they employed for their 1973-74 evaluations of projects funded through titles I and III of the Elementary and Secondary Education Act. The following table shows the results for respondents in the 48 States that answered this question:

             State Education Agency Techniques
              for Evaluating Titles I and III

                                    Title I          Title III
                                       Percent            Percent
Techniques                     Number  (note a)   Number  (note a)

Aggregation and analysis
  of data from local educa-
  tion agency reports            45       94        41       85
Educational audits and
  their results                   9       19        27       56
Statewide testing of
  students                       11       23         5       10
Other                             5       10        12       25

a/Does not total 100 percent because some States used more
  than one technique.

The distribution of responses shows that some States used a combination of techniques, and most States aggregated local education agency data. Only a small number of States employed statewide testing to evaluate their programs. However, the great majority of those that did test statewide in their program evaluations used standardized norm-referenced tests.

Because most States perform their evaluations by aggregating and analyzing data from local education agency reports, it is very important that local education agencies use test measures which yield meaningful and comparable data. Our questionnaire asked local education agencies to indicate which types of tests they used in their 1973-74 evaluations of Federal programs funded through the Elementary and Secondary Education Act, titles I, III, and VII. The following table shows their responses.

                   Types of Tests Used by
               Local Education Agencies (note a)

                                                     Title VII
                     Title I        Title III         (note b)
Type of Test     Number  Percent  Number  Percent  Number  Percent

Standardized
  norm-referenced
  tests           8,103    94.4    1,380    79.1     357     90.7
Criterion-
  referenced tests
  (note c)        2,110    24.6      424    24.3     148     37.5
Other tests       1,168    13.6      515    29.5      91     23.0

a/Local education agency figures are projected on the basis of
  a statistical sample of local agencies. (See app. II.) The
  percent columns do not total 100 percent because some local
  agencies used more than one type of test.

b/Title VII provides for direct OE grants to local agencies
  without State level administration.

c/Tests which are designed and scored in relation to specific
  learning objectives or behaviors and include an explicit
  statement of performance standards.

The local education agency responses show predominant use of standardized norm-referenced tests for program evaluation, but also relatively large use of criterion-referenced tests.

Frequent use of criterion-referenced tests was also reflected in State education agencies' responses to a question on the use of statewide assessment programs to achieve educational accountability--an effort separate and distinct from testing used to evaluate Federal programs. Of 47 States responding to this question, 36 said that criterion-referenced tests have been or will be used extensively in that context.

EFFORTS TO EVALUATE STANDARDIZED TESTS

Since its establishment, NIE's major research effort for improving educational measurement has been an ongoing evaluation of commonly used standardized tests. In 1972 NIE assumed from OE the sponsorship of the Center for the Study of Evaluation at the University of California at Los Angeles. The Center extensively assesses published standardized tests and, as a primary objective, issues reliable guides for use in selecting tests.

As of September 1975, the Center had published guides which evaluate preschool and kindergarten, elementary, and secondary school achievement tests. Each guide contains a compendium of tests, keyed to educational objectives and evaluated by measurement experts and educators for such characteristics as meaningfulness, examinee appropriateness, administrative useability, and quality of standardization.

A Center official said that he believes commercially available standardized norm-referenced achievement tests are generally inappropriate for measuring program effectiveness. Various Center reports also state this opinion for reasons discussed in chapter 6, such as the tests' low degree of correspondence with actual instructional objectives, their failure to indicate the extent to which the full range of instructional objectives has been mastered, problems in test administration, and the tests' lack of information on specific skill and knowledge development. The Center official noted that the tests are especially inappropriate for broad-based intervention programs, such as title I of the Elementary and Secondary Education Act.

Since the major purpose of the guides is to provide State and local educators and other test consumers with information that will assist them in selecting the most appropriate measurement devices for evaluations, our questionnaire included a question relating to the Center's work. We asked State and local education agencies to indicate their degree of familiarity with the Center's research evaluating the utility of many popular commercially available standardized norm-referenced tests. Their responses follow:

                                                   Local
                            State education      education
                               agencies           agencies

                                                  Number
Responses                   Number  Percent      (note a)  Percent

Have little or no
  information                  6      12          5,059      85
Aware of the Center's
  work in test evaluation     14      28            601      10
Read some of the
  Center's publications       17      34            192       3
Used the Center's material
  in the selection of com-
  mercially available
  standardized tests          13      26            135       2

    Total                     50     100          5,987     100

a/Local education agency figures are projected on the basis
  of a statistical sample of local agencies. (See app. II.)

There was general agreement among the educators and test publishers interviewed that the Center's work was a definite improvement over other existing evaluations of standardized tests. Some test publishers, however, criticized the Center's work because it was based on incomplete data, excluding, for example, the highly technical specifications from which test questions are developed. They also criticized it for establishing educational goals on which test ratings were based without consulting publishers, and for relying on the work of graduate students.

In response, Center officials indicated that they had sent letters to test publishers advising them of the study and its purpose, stating what information was desired, and asking for any other material the test publishers wished to send. They acknowledged that their curricular goal system is not perfect; however, they believe it is a tremendous step forward, providing a clear statement of expected student behaviors which test users can employ to match tests to curriculums. Officials also stated that the system is justified because test makers often measure skills that are different from what they claim to measure. Center officials also noted that all participating graduate students demonstrated adequate competence in research and measurement before being hired; they followed set procedures, were routinely monitored, and discussed questionable points with supervisors.

State education agency officials and measurement experts interviewed stated that the Center's work should be continually updated and should include an assessment of available criterion-referenced tests.

Commenting on our report, HEW stated that NIE is sponsoring an effort by the Center to prepare a new test evaluation book reviewing all commercially published criterion-referenced achievement tests for grades kindergarten through 12. The target date for publishing this book was June 1977; however, as of August 1, 1977, it had not been published.

CRITERION-REFERENCED TESTS

Our questionnaire asked State and local education agencies to indicate areas, if any, in which there is a need to increase the educational community's efforts devoted to measurement and assessment techniques. Over 75 percent of the State education agency respondents and about 51 percent of school district respondents indicated "development of alternatives to the classic standardized norm-referenced tests (e.g., criterion-referenced tests)" as an area needing substantial to very great increases. More than half of the State respondents and a third of the local respondents also believe there is a great need to reduce racial, sexual, and cultural biases in tests. The following table shows the percent and number of State and local agency respondents indicating a substantial or very great need to increase the educational community's efforts devoted to measurement and assessment techniques.

                               State agency       Local agency
                                 responses          responses

                              Percent            Percent   Number
                             (note a)  Number   (note a)  (note b)

Development of methods for
  test design and construc-
  tion                           52      25        34      2,507
Reduction of cultural,
  racial, and sexual
  biases in tests                51      25        35      2,581
Development of alterna-
  tives to the classic
  standardized norm-
  referenced tests (e.g.,
  criterion-referenced
  tests)                         77      37        51      4,061
Development of more and
  improved standardized
  norm-referenced tests          18       9        27      2,177
Development and utiliza-
  tion of methods to better
  evaluate standardized
  norm-referenced tests
  in use                         40      20        43      3,409

a/Does not add up to 100 percent because more than one item
  could be checked. The percentages reflect only the number
  of State and local agencies that responded to each suggested
  item. (See apps. I and II, question 3.)

b/The number of local agencies is projected on the basis of
  a statistical sample. (See app. II.)

NIE is currently funding research intended to meet these needs in the areas of test biases and criterion-referenced measurement.

Criterion-referenced tests are designed to remedy some weaknesses in standardized norm-referenced tests (see ch. 6) by (1) being more accurately interpretable, (2) detecting the effects of good instruction, and (3) allowing more accurate diagnoses of the individual learner's capabilities.

A well-devised criterion-referenced test relates scores to specific learning objectives or behaviors and includes an explicit statement of performance standards. The objectives must be described without ambiguity to permit an accurate description of what an examinee does and does not know or can and cannot do. Criterion-referenced tests pinpoint the student's deficiencies, while a norm-referenced test identifies only general student weaknesses.
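The contrast with norm-referenced scoring can be sketched as follows. This is an illustrative sketch only, not drawn from the report; the objectives, item responses, and 80-percent standard are all hypothetical. Each learning objective is scored against an explicit performance standard, so the result states what the examinee can and cannot do:

```python
# Assumed performance standard: 80 percent of an objective's items correct.
STANDARD = 0.80

def mastery_report(responses_by_objective):
    """Map each learning objective to 'mastered' or 'not mastered'."""
    report = {}
    for objective, responses in responses_by_objective.items():
        proportion_correct = sum(responses) / len(responses)
        report[objective] = ("mastered" if proportion_correct >= STANDARD
                             else "not mastered")
    return report

# Hypothetical item responses (1 = correct, 0 = incorrect) per objective.
answers = {
    "two-digit addition": [1, 1, 1, 1, 0],   # 80 percent -> mastered
    "reading a bar graph": [1, 0, 1, 0, 0],  # 40 percent -> not mastered
}
print(mastery_report(answers))
```

No norm group appears anywhere in the computation: each score is interpreted solely against the stated standard, which is why such a report can pinpoint specific deficiencies rather than merely rank the student.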

Because criterion-referenced tests are not required to yield large variances in examinees' scores, they can retain questions based on the primary curricular emphasis even if, after instruction, most learners answer them correctly. Consequently, criterion-referenced tests are considered more capable of discerning instructional effects than norm-referenced tests.

An increasing number of educators have begun to question the use of standardized norm-referenced tests and to propose criterion-referenced tests as an alternative. Some people assume that the latter are fully developed and ready to use. According to experts, however, the technical status of criterion-referenced measurement is far less advanced than many educators and others believe it to be.

An expert in criterion-referenced testing from the University of Michigan has pointed out that producing test questions that can be defended as valid and fair for both the majority of examinees and various minority groups is a major problem affecting the development of both standardized norm-referenced and criterion-referenced tests.

Another expert in criterion-referenced testing from the California Test Bureau of McGraw-Hill has stated that carefully constructed criterion-referenced tests can provide both diagnostic and evaluative information that is appropriate not only for disadvantaged students but for all students, assuming some consensus can be obtained on the instructional objectives to be tested. Therefore, such tests could discover real educational problems and indicate appropriate remedial help for students, rather than simply showing that the students are "below grade level" on a general test of reading or mathematics. He added, however, that constructing such criterion-referenced tests is not a simple matter; more knowledge is needed about the structure of subject matter than now exists. Nevertheless, successful statewide assessments and evaluations have been carried out using only criterion-referenced tests.

This testing expert believes that in the future it will be possible to evaluate basic skills on a large scale using appropriate criterion-referenced tests, once sufficient knowledge has been acquired to specify the important skills and subskills required to assure competence in these areas. For this to occur, an understanding of the particular problems facing disadvantaged and minority students, as well as the basic logical and cognitive structure of disciplines, will be needed. In his opinion, progress is being made on these problems, but as yet no widespread consensus on what is important has emerged. Until it does, no national criterion-referenced evaluations seem likely.

Commenting on our report, HEW stated that, based on the test evaluation being made by the Center for the Study of Evaluation at the University of California at Los Angeles, commercially published criterion-referenced tests, like many norm-referenced tests, are generally unsatisfactory for program evaluation.

CONCLUSIONS

Federal, State, and local education agencies frequently use standardized norm-referenced tests to measure the effectiveness of Federal education programs. The next chapter provides further discussion of some of the tests' problems, including test biases. Additional research may be needed on:

-- Criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests, for uses which include program evaluation.

-- How to reduce racial, sexual, and cultural biases in standardized tests.

There is also a need to increase State and especially local education agency awareness of available NIE-funded information that is intended to help select the most appropriate tests for use in evaluation.

RECOMMENDATIONS TO THE SECRETARY OF HEW

We recommend that the Secretary direct NIE to:

-- Consider the need for funding additional research in the future on (1) criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests for uses that include program evaluation and (2) the nature and extent of racial, sexual, and cultural biases in standardized tests and how such biases may be reduced.

-- Improve dissemination of available NIE-funded information, which is intended to help select the most appropriate standardized tests, thereby increasing State and local education agency officials' awareness and use of this information.

AGENCY COMMENTS AND OUR EVALUATION

HEW agreed with the above recommendations. Regarding our recommendation that NIE consider the need for funding additional testing research, HEW described efforts currently underway and stated that more emphasis will be given to these programs in fiscal years 1977-79 if NIE's appropriation levels permit. However, HEW also stated that it is not clear whether we believe NIE deserves additional appropriations for such an effort since NIE cannot divert substantial funds from its present budget. It is not our intention to call for either additional appropriations or redirection of NIE's present budget. Our recommendation calls on NIE to consider funding additional research, as needed to address the problems discussed, that could begin when research efforts currently underway are completed.

Regarding our recommendation that NIE improve dissemination of its materials designed to help select standardized tests, HEW stated that NIE intends to make school personnel familiar with these materials through several dissemination approaches. These include:

-- Listing the test evaluation consumer guides in a catalog of NIE products sent to school superintendents and district curriculum directors, which will now also be sent to school district evaluation directors.

-- Using the new "Lab and Center R&D Exchange," which HEW believes will possibly reach 50 percent of the country's school systems.

-- Using the dissemination network formed by NIE's seven research and development utilization contractors.


CHAPTER 6

STANDARDIZED TESTS AND PROGRAM EVALUATION

Serious questions have been raised about standardized norm-referenced achievement tests in spite of their widespread use. Based on the views of test and evaluation experts and others, this chapter provides information concerning: (1) criticism and defense of these tests and (2) suggestions for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs.

CRITICISM OF STANDARDIZED NORM-REFERENCED TESTS

The content and use of standardized norm-referenced achievement tests have been widely criticized by testing experts, educators, and others. The National Education Association, the National Association for the Advancement of Colored People, and others have called for moratoriums on using standardized tests. 1/

The National Association of Elementary School Principals convened the appointed representatives of 25 national educational associations, government agencies, and education groups in November 1975 to explore the implications of the widespread use of standardized achievement tests. They recommended that educators give higher priority to developing and using new assessment processes that are more fair and effective than those currently in use and that educators more adequately consider the diverse talents and cultural backgrounds of children.

Criticism of norm-referenced tests

Critics of standardized norm-referenced achievement tests believe test bias and score interpretation, as well as other problems, are among the tests' deficiencies.

1/Robert L. Williams, et al., "Critical Issues in Achievement Testing of Children from Diverse Ethnic Backgrounds," prepared for the Office of Education's Conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 8-12.


Test bias

A frequent criticism is that the tests discriminate unfairly against racial and cultural minorities because (1) their norms continue to be based on populations not representative of a pluralistic, multicultural society and (2) test questions ask for knowledge most familiar to the white middle class or reflect the cultural biases of test developers and question writers, who mostly represent the white middle class. Therefore, the tests are considered biased against poor, black, Hispanic, and other minority Americans. Although some test publishers have attempted to minimize cultural and racial biases, these problems have not been solved.

Score interpretation

Some critics cite certain weaknesses in the interpretation of test scores, as follows:

-- Raw scores (scores based on the total number of correct answers on a test) are typically interpreted in terms of national norms, which are estimates of nationwide performance. The national norms are derived from giving the test to what is intended to be a representative sample of students. But since the samples are different and taken at different times, the norms for different tests vary. As a result, the normed score for a student depends partly on which test the student takes. 1/

1/George Weber, "Uses and Abuses of Standardized Testing in the Schools," Council for Basic Education, May 1974, pp. 13-14.

-- The grade-equivalent score represents the estimated average score that pupils in that month of that grade would achieve on the test nationwide. For example, a 3.8 in reading is the average score for a child in the eighth month of the third grade. A frequent misconception is that the score means the child has mastered the standard curriculum up to that point in schooling. Even if the 3.8 grade equivalent were always an accurate estimate of the average achievement for a child in the eighth month of the third grade, that average child has not necessarily "mastered" the reading curriculum to that point, although parents, and even teachers, usually do not understand this. Another problem is that on some tests a few answers one way or the other can make as much as a whole year's difference in the grade-equivalent score, and the tests are simply not that accurate. Despite these and other shortcomings, grade-equivalent scores are usually used in interpreting the results of standardized achievement tests. 1/ 2/

-- Test scores are meaningful only in terms of national average achievement, and do not indicate whether this is good, bad, or indifferent in terms of "reasonable" standards defined independently of such average scores. For example, if a given third grade class does as well on a given reading test as the national third grade average, this does not reveal how well the children can read in absolute terms. According to this viewpoint, since reading achievement in the primary grades is generally below what could reasonably be accomplished, reading scores suggest a better achievement than is in fact the case. 3/ Moreover, critics say that norm-referenced tests result in half of the students being above the norm and half below, as a statistical fact of life. Then how, they ask, does one "raise scores to the general norm?" 4/

1/Ibid., pp. 14 and 16.

2/Ralph Tyler, "Discussion of Hoepfner's Paper on Achievement Test Selection for Program Evaluation," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 7.

3/Weber, p. 20.

4/Miriam Clasby, et al., "Laws, Tests, and Schooling: Changing Contexts for Educational Decision-Making," RR-11, Educational Policy Research Center, Syracuse University Research Corporation, Syracuse, N.Y., Oct. 1973, p. 174.


Other problems affecting test use

Regarding the use of standardized norm-referenced tests in program evaluation, critics state that the tests are inadequate for evaluating program effectiveness and cite the following deficiencies.

Test results not specific--Normative test scores might be very useful for program evaluation if one knew what they meant. But the test results do not indicate specifically what students have learned. 1/ The test scores reflect the number of correct answers students give, but do not indicate how well students achieve intended educational objectives or whether they answer particular questions correctly. 2/ 3/ Therefore, the test results are too general to provide specific guidance for improving the quality of schooling. 4/

1/Stephen Klein, "Evaluating Tests in Terms of the Information They Provide," Evaluation Comment, Vol. 2, June 1970, Center for the Study of Evaluation, University of California at Los Angeles, p. 2.

2/National Assessment of Educational Progress, "General Information Yearbook," Dec. 1974, Rept. No. 03/04-GIY, pp. 1, 3, and 4.

3/Carmen J. Finley, "Not Just Another Standardized Test," Compact, Vol. 6, No. 1, Feb. 1972, Education Commission of the States, pp. 10 and 11.

4/W. James Popham, "Appropriate Assessment Devices for Educational Evaluation," presented at the National Forum on Educational Accountability in Denver sponsored by the Office of Education and the Cooperative Accountability Project, May 8-9, 1975, pp. 3 and 4.

Tests do not coincide with instructional objectives--The tests' content often has a low degree of correspondence with actual instructional objectives at any given time or place. This is a serious deficiency because one cannot determine the effectiveness of an educational program unless the tests used actually measure the objectives that the teacher, teachers, or school district is attempting to accomplish over a defined period of time. 1/ As a major test publisher 2/ and others have pointed out, it is only to the extent that a program's instructional objectives coincide with those of the test that the instrument is valid for measuring how well the learning program has succeeded. If the test fails to measure certain objectives included in the learning program and/or measures other objectives that are not part of that program, to that extent the test is not a valid measure of success in the program.

In addition, test publishers describe their standardized norm-referenced tests in very general terms, calling them, for example, tests of reading comprehension. The generality of these descriptions increases the possibility of unrecognized differences between what the schools teach and what the tests specifically measure. According to this view, such differences result in misleading data and false conclusions about program effectiveness. 3/

1/Rodney Skager, "The System for Objectives-Based Evaluation-Reading," Evaluation Comment, Vol. 3, No. 1, Sept. 1971, Center for the Study of Evaluation, University of California at Los Angeles, p. 6.

2/J. Wayne Wrightstone, et al., "Accountability in Education and Associated Measurement Problems," Test Service Notebook 33, issued by the Test Department, Harcourt Brace Jovanovich, Inc., New York, p. 4.

3/Popham, p. 3.

Tests are designed to differentiate students, not diagnose specific problems--Since the tests are intended to compare examinees, they must yield a reasonably large degree of "response variance"--different scores for different examinees. Test questions that are answered correctly by half the examinees maximize a test's response variance. If a test question is answered correctly by a large or increasing proportion of examinees, it tends to be removed from the test or modified. Thus, as norm-referenced tests are periodically revised, questions on which examinees perform well are systematically eliminated. Yet, test critics maintain that such questions often deal with the very concepts teachers thought important enough to emphasize in their instruction. If a concept is taught well, questions measuring it will likely be removed in the next test revision. The result is that (1) the tests are particularly insensitive to detecting the effects of instruction 1/ 2/ and (2) sometimes the tests do not contain test questions dealing with the central concepts in the field. These are serious deficiencies for a test used for program evaluation. 1/ 3/
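The statistical point here can be illustrated with the textbook variance formula for a single right-or-wrong question (a standard result, not taken from the report itself):

```latex
% Variance of scores on one dichotomous (right/wrong) question,
% where p is the proportion of examinees answering correctly:
\sigma^2 = p(1 - p)
% This is largest at p = 1/2, giving 0.25; an easy, well-taught
% question with p = 0.9 contributes only 0.09.
```

A question that nearly everyone answers correctly therefore does little to spread examinees apart, which is why such questions tend to be dropped during test revision.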

Teaching the test--According to some critics, the use of standardized achievement tests for program evaluation and "accountability" has led and will lead to corruption and dishonesty among educational professionals and to the further erosion of public trust in the schools and the people who run them. Faced with public pressure that is often in itself "irrational and destructive," it is all too easy for principals and teachers to respond to subtle pressure to "prepare students for the assessment" by teaching students responses to specific test questions rather than by developing the underlying skills which these questions reflect. Standardized tests are readily available at all levels of any school system, and are brief enough to be highly susceptible to coaching. Test security and control may be feasible for programs like the college boards or the American College Testing Program, in which representatives of the testing agency handle the assessment and the examinees come to a central location. Similar controls are not feasible in large-scale evaluations of school programs. 4/

1/Ibid., pp. 4 and 5.

2/Richard M. Jaeger, "A Discussion of Classical Test Development Solutions," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 6.

3/W. James Popham, statement presented at U.S. House of Representatives hearings on the Elementary and Secondary Education Act, March 28, 1973, pp. 2323 and 2324.

4/See Skager, p. 7.

Inappropriate norms--In the typical large-scale program evaluation, the focus is on the performance of groups of students--categorized by classes, buildings, or school systems--not of individuals. The reference norms one needs for such purposes are distributions of averages for appropriate reference schools, not norms for individuals; and these types of norms are seldom available unless they are collected as part of the evaluation study itself. 1/

Measurement of growth

A major test publishing company has described 2/ various technical problems with the tests that are related to their use in measuring academic growth, as contrasted with their traditional use for measuring present status. This is significant because, according to the company, most educational program evaluations have involved using nationally standardized norm-referenced achievement tests--especially for measuring "growth." Besides the problems related to using tests to measure present status--such as selecting a test that measures what the user intends, assuring that teachers follow directions, and the like--the test publisher noted that when the tests are used to measure growth they are attended by special problems, such as the following:

-- Defining "normal growth." There are serious questions about the legitimacy of defining normal growth in terms of grade-equivalent scores. However, expected or normal gain is almost universally defined in terms of grade equivalents for standardized achievement tests used at the elementary level.

-- Interpolating or estimating norms so that they may be applied to tests taken at times during the year for which norms have not been empirically determined. These estimates are almost certainly in error by some small amount in most cases and by a substantial amount in some cases.

-- Converting scores from different levels and alternative forms of a standardized test series so that the scores are equivalent. If they are not equivalent, this can lead to invalid measurement of gains and possibly erroneous conclusions as to the merit of the program evaluated.

1/William E. Coffman, "Classical Test Development Solutions," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 24.

2/Wrightstone, et al., pp. 5-12.


The above criticisms of standardized norm-referenced tests indicate that the tests may be inappropriate for wide-scale use in evaluating educational programs.

Why are standardized tests used?

Considering all these criticisms, why are the tests used for Federal program evaluations? A former OE evaluation official explained that standardized achievement tests are used in educational program evaluation "for many good and not-so-good reasons," such as the following:

-- Since many standardized achievement tests or subtests were developed primarily for basic skill performance measurement, they become prime candidates for evaluating programs that seek to improve basic skills.

-- Such tests are readily available in large quantities, at short notice, and at relatively low cost. If off-the-shelf tests were not available, the cost of developing and standardizing such measures for a specific evaluation might be prohibitive and therefore might cause evaluation plans to be abandoned.

Other, more technical reasons given by this former OE official for the widespread use of standardized achievement tests in education program evaluation include their general technical excellence, their standardized administration procedures, the representativeness of their questions to the possible universe of questions on basic skill performance, their normative reference, ease in scoring, alternative and equated test forms, high reliabilities, and apparent validity. Another factor, he stated, is that most achievement tests are part of a battery of tests designed so that student growth can be measured as the student progresses through school by administering different test levels and forms. 1/

1/Michael J. Wargo, "An Evaluator's Perspective," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 19-21.


DEFENSE OF STANDARDIZED TESTS

Defenders of the tests say that there is no convenient alternative for those outside the schools to evaluate students' collective achievements. Therefore, in their view, despite the tests' shortcomings and abuses, they provide the best information available. According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, it is true that standardized tests do not allow for program and project differences, but the Congress wants to know if the overall program is effective, and he believes the tests provide this information acceptably.

Defenders of standardized tests tend to emphasize test misuse as the major problem, and they express the need for training teachers and administrators in proper test selection, administration, and interpretation. Many testing experts admit that the tests have deficiencies--such as test and test question bias--but state that wholesale rejection of the tests and their norm-referenced interpretations is unwarranted. Instead, they favor refining the tests and learning to avoid pitfalls, such as lack of congruence among (1) test content, (2) course content or curricular emphasis, and (3) the purpose and design of the evaluation. Some state that progress has been made in the last 10 to 15 years in such areas as

-- constructing efficient tests that reliably measure important educational skills,

-- developing nationally representative norms, and

-- providing test users with relevant information about the test areas. 1/ 2/

1/John C. Bianchini, "Achievement Tests and Differentiated Norms," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 36-37.

2/Ralph Hoepfner, "Achievement Test Selection For Program Evaluation," prepared for the same conference, p. 2.


One expert from Systems Development Corporation, for example, stated that the quality, forthrightness, and focus of standardized achievement tests have improved remarkably, particularly within the last decade, and that many well-aimed attacks on standardized achievement tests made about 10 years ago are no longer valid. 1/ This expert, who has considerable experience in evaluating test adequacy, stated that his review of available criterion-referenced tests--the most likely alternative--found them to be of uniformly bad quality.

An expert from RMC Research Corporation said that the standardized tests are not the problem. In his view, OE's large-scale national evaluations as well as most local evaluations have been poorly designed and poorly done, and the tests have often been misused by evaluators. He said that he recently spent about 2 weeks in every State working on title I, dividing his time equally between the State education agencies and some local education agencies in each State. Based on this experience, he believes that the great majority of local title I projects is not providing students with educational treatments that differ in any significant way from regular classes. In such circumstances, the general lack of evidence of marked improvement in basic skills should not be too surprising.

Suggested improvements

Included among suggestions for improvement offered by test defenders are the following:

-- Test publishers, in developing standardized tests of basic skills, should break away--at least in the elementary grades--from the current practice of designing tests for measuring achievement at multiple grade levels. Test publishers should develop series of tests, each designed for a specific grade, with sufficient numbers of questions at various difficulty levels to yield reliable measurement for essentially all students at that grade.

1/Ibid., p. 2.

-- Test publishers should provide more detailed information about the content of test questions, the instructional objectives on which questions are based, and the skill characteristics needed to answer them to provide test users with a general framework for assessing the logical congruence between the test content and the content of the curriculum. In one expert's view, such a detailed classification of individual questions is as important to the test user as is the extensive statistical data currently provided about the mental measurement characteristics of the questions.

-- Test publishers should expand the services they provide their clients to include developing special norms when they would produce more appropriate use of test results. Test publishers need to be more active in assuring that their tests and subsequent test results are used fairly and effectively.

-- The state of the art should be extended to provide test users with practical procedures to assist them in selecting tests and relating test results to instructional programs and program evaluation.

-- Program evaluators should recognize that the process for selecting appropriate standardized tests for evaluation must go beyond a naive inspection of the test and normative data. The process ought to include a careful inquiry into such elements as relevant student and school characteristics, test content and its relationship to curriculum content, and the adequacy of normative data in relation to the evaluation design. 1/ Evaluators should select tests which maximize coverage of the objectives desired. 2/

CONFERENCE ON USING TESTS TO EVALUATE PROGRAMS

1/Bianchini, pp. 36-38.

2/Hoepfner, p. 33.

In May 1976 OE sponsored a special four-day conference on "Achievement Testing of Disadvantaged and Minority Students for Educational Program Evaluation." OE invited about 50 experts in testing, program evaluation, and related fields, including university and other researchers and representatives of leading test publishers. Federal and local education agencies, and education and other interest groups, were also represented. The conference focus was on large-scale program evaluations of elementary and secondary school compensatory education and desegregation programs--programs on which OE concentrates most of its effort. The purpose of the conference was to identify, define, and analyze the many problems associated with using standardized achievement tests in the context of large-scale evaluations of these programs and to develop interim and long-term solutions to those problems.

Conference participants mentioned many of the same problems and issues discussed previously in this chapter. Five small working groups were formed at the close of the conference to write recommended solutions. All five groups recommended (1) either limiting or ceasing large-scale Federal education program evaluations like those contracted for by OE's Office of Planning, Budgeting, and Evaluation and/or (2) placing greater emphasis on local evaluations.

The value of conducting large-scale evaluations was questioned because of problems such as the current state of the art in evaluation; inherent bias problems in data collection instruments, methods, and analysis procedures; and community differences in the education programs being evaluated. Suggestions for increased emphasis on local-level evaluations included providing funding for adequate local evaluations, training local evaluators, and providing technical assistance in designing and implementing technically sound evaluations.

Related conclusions and recommendations included the following:

1. The Federal Government's basic policy for evaluating its education programs for culturally different students should require that each local education agency carry out evaluation studies designed to assess how well its local project objectives have been achieved. Also, approved local agency budgets should include sufficient funds to provide for adequate evaluation study design, data collection, analysis, and reporting. The studies should involve at all stages the participation of members of the minority culture or cultures involved. Beyond this, Federal responsibility should be limited to (a) conducting and publicizing the results of audits that determine whether funds were used as intended and whether evaluation data relevant to program objectives were collected, analyzed, and reported; (b) providing general guidelines and training for evaluation, and encouraging the development of guidelines and consulting resources by State agencies and Federal regional offices; and (c) developing summary reports based on the aggregation of information from local evaluations.

2. To resolve Federal program evaluation problems:

-- The funds and activities devoted to large-scale evaluation should be immediately rechanneled into development of program evaluation methods and tools that are likely to be more productive, while affording safeguards for recipient populations.

-- Studies should be initiated to explore whether it is feasible to draw overall program impact conclusions based on aggregations. Valid, locally relevant project evaluations should also be started.

-- Immediate congressional needs for program impact information should be satisfied through careful and extensive analysis, in phases, of data already collected and data currently being gathered under contract.

-- Model local project evaluations should be funded as demonstrations, with support from experts funded by OE's Office of Planning, Budgeting, and Evaluation. This support should be provided to local school systems on a cooperative basis. Local evaluation personnel should be trained. The most effective local evaluation strategies demonstrated should be adopted in an ever-widening pattern, to build a basis for effective national program evaluation by aggregating valid local project evaluations.

-- Studies should be funded on tools and procedures needed to make local evaluations that will reflect valid conclusions on the worth of specific projects and will adequately identify the processes and inputs of those projects. This includes holding a workshop to identify needed tools, methods, and priorities.

-- Studies should be funded on OE and State education agency regulations, guidelines, and administrative decisions that affect the quality of local education agency evaluation activities and reports. These studies should produce model administrative and regulatory strategies, including incentives, for upgrading the quality and validity of local education agency evaluations of federally supported projects.

3. One reason for greater emphasis on local evaluations is that instructional treatments are not uniform under nationwide programs. The Emergency School Aid Act program, for example, does not provide uniform treatments for students. Neither do title I or title VII programs. But there are various identifiable, describable instructional treatments funded by these programs, and a small number of them are effective. However, information about these effective treatments is lost in large-scale evaluations that cover a large number of ineffective treatments.

4. Guidelines for local evaluation studies should include the following:

-- Recommendations that each evaluation report include a description of what actually happened to pupils involved in the program. Without such information it is impossible to reach a meaningful interpretation of any measures of change.

-- Encouragement to local projects to collect evidence of progress toward improved skills in reading and mathematics and obtain data regarding other outcomes of the particular methods employed, particularly on developing self-concept and interpersonal relations.

-- Encouragement to local education agencies to include in their evaluation procedures systematic attention to the selection or development of techniques designed to minimize cultural bias in tests and other data-collecting procedures used in evaluation.

5. Money saved by curtailing large-scale evaluations should support a national panel responsible for exploring and developing more responsive and effective alternative program evaluation models.


6. Research into alternative evaluation models should also be conducted to determine (a) the evaluation information the Congress, OE, other Federal agencies, and local agencies receiving Federal funds need and (b) whether adequate evaluation designs can be effectively implemented to meet these needs, and their cost.

7. Because of conflicts in the interpretation of off-the-shelf commercial standardized tests, only custom-designed tests should be used for federally sponsored large-scale program evaluation, when such large-scale evaluations are essential. The associated cost and effort to define program objectives and define measures of their effect should be part of the Federal agency's responsibility, along with survey and analysis costs.

Other small working group conclusions and recommendations are shown in appendix IV.

In response to the conferees' recommendations, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation said he agreed that there is a need for increased evaluation activity and capability at the State and local levels. He disagreed, however, that OE's nationally planned evaluations should be deemphasized.

CONCLUSIONS

As noted in chapter 5, Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effect of Federal education programs. However, there is a great deal of disagreement among testing experts and educators on the adequacy of these tests for their intended uses. Serious questions have been raised about the tests, and some organizations have called for a moratorium on their use and a higher priority on development and use of alternatives.

Although the tests' critics and defenders agree that certain problems exist, views differ greatly about the importance or severity of the problems and their remedies. Those who defend the tests and their continued use recognize that improvements are needed on such issues as test and test question bias; appropriate test norms; and test selection, interpretation, and administration, including uses for program evaluation.


Some questions raised about the adequacy of the tests have great importance in determining the appropriateness, validity, and proper conduct of educational program evaluations, including those that are federally funded. Decisionmakers should be aware of these issues when using such information.


CHAPTER 7

PREFERENCES FOR EVIDENCE OF

PROGRAM EFFECTIVENESS DIFFER

Responses to our questionnaire showed that State and local education agency officials responsible for administering Federal title I, III, and VII elementary and secondary education programs perceive important differences in the types of evidence of program effectiveness that Federal, State, and local officials prefer.

-- Local education agency officials believe that State and OE officials are predominantly interested in standardized norm-referenced test scores for demonstrating program results. Local officials themselves prefer broader, more diverse types of information than just these test scores.

-- State education agency officials prefer criterion-referenced tests. They favor less emphasis on standardized norm-referenced test results as evidence of program effectiveness than they believe OE officials want, but prefer them more than local agency officials do.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation believes that hard, objective data on students' cognitive improvements is the centrally important information needed. He noted that this means gain scores on standardized norm-referenced achievement tests because these are most available.

As noted in chapter 5, State and local evaluations for titles I, III, and VII have been most often based on standardized norm-referenced tests. Therefore, evaluations usually have not reflected the kinds of results that State and local officials themselves prefer, but rather those that they believe would be likely to most impress higher level officials.


STATE AND LOCAL EDUCATION OFFICIALS BELIEVE STANDARDIZED TEST RESULTS MOST

IMPRESS HIGHER LEVEL OFFICIALS

Legislative and other requirements for annual evaluations at the State and local levels of titles I, III, and VII of the Elementary and Secondary Education Act have several purposes, according to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation. These include

-- reporting on local project effectiveness or providing other data on which local, State, and Federal officials can base programmatic, financial, and policy decisions; and

-- providing data to State and/or Federal officials to select successful and exemplary projects for dissemination.

Educators must increasingly evaluate their instructional endeavors and present evidence which will permit others to judge their effectiveness. Through our questionnaire we attempted to ascertain (1) what kinds of evidence of program effectiveness local officials administering federally supported elementary and secondary education programs think State and OE program officers expect, (2) what kinds of evidence of effectiveness State officials administering the programs think OE program officers expect, and (3) what types of evidence of effectiveness the State and local officials think are most useful to them.

Our questionnaire asked

-- State program officials to rate, on a seven-point scale, the types of information or findings which most impress them and, they believe, OE officials, as demonstrating program results and

-- local project officials to rate, on the same seven-point scale, the types of information or findings which most impress them and, they believe, OE or State officials, as demonstrating program results.

As noted in chapter 5 (see pp. 43 to 44), only a few States employed statewide testing to evaluate their title I and III programs, and the great majority of these States used standardized norm-referenced tests. Most State evaluation reports for titles I and III were aggregations of local education agency evaluation data. As shown on page 45,


based on our questionnaire sample responses, local title I, III, and VII projects indicated overwhelming use of standardized norm-referenced tests--far greater than their use of criterion-referenced tests and all other tests.

State officials' views

The demonstration of program results should be an important factor in deciding to continue OE program funding. As the table below shows, State title I and III officials agreed on the types of program results they think are likely to impress OE officials most in this regard.

State officials overwhelmingly perceive OE officials to be most impressed by results obtained from standardized norm-referenced tests. Eighty-two percent of the State title I and 65 percent of the State title III officials responding ranked this category first. (See app. I for details.) State title I and III officials ranked results on improvements in educational management and accountability, as well as findings obtained from criterion-referenced tests, as either the second or third most impressive data for OE officials.

Titles I and III State Officials' Perceptions of What
Impresses OE Officials and What Impresses
State Officials (note a)

                                        Impresses OE       Impresses self
                                       Title    Title      Title    Title
                                         I       III         I       III

1. Improvement in educational
   management or accountability         3.1      3.9        4.0      4.5

2. Improvement in school services
   or facilities                        6.0      6.1        6.1

3. Student improvements through
   gain scores or grades on
   teacher ratings                      4.1      4.6        4.3      5.1

4. Student improvement through
   gain scores on standardized
   norm-referenced tests                1.3      1.6        2.5      3.3

5. Student improvement through
   gain scores on criterion-
   referenced tests                     3.2      2.7        2.4      2.4

6. Student improvement through
   gains in the affective domain
   (e.g., likes, dislikes, interests,
   attitudes, motives, etc.)            5.3      4.4        3.7      3.1

7. Improvements in curriculum
   and instructional methods            4.5      4.5        3.8      3.4

a/The numbers shown are the respondents' average rankings of the alternatives given. Each respondent was asked to give the most impressive type of result a ranking of "1" and the next most impressive "2," etc. (One value in item 2 is illegible in the source.)


State officials themselves appear to be impressed by slightly different types of data and information on program results. The table above shows that officials from titles I and III would be most impressed by gain scores on criterion-referenced tests. Although State title I officials ranked gain scores from norm-referenced tests almost equal to those from criterion-referenced tests, the title III respondents consider gains in the affective domain (for example, likes, dislikes, interests, attitudes, motives) as the second most important result, followed by gain scores from norm-referenced tests. State title I officials ranked gains in the affective domain third. Title I and III officials agreed on the remaining categories.

In general, State officials believe objective test results of program performance are most likely to impress both OE and State program officers. However, State respondents see themselves as more likely than OE program officials to be impressed by criterion-referenced test results and other factors cited above. State officials see themselves as more open than OE officials to various types of information demonstrating program results. At the same time, they agree with what they see as OE officials' view that impact evaluations based on test scores are the most impressive findings.

Emphasis on test scores for evidence of program effectiveness continues at the State and national levels. Reasons may include the following:

-- Legislators and Federal and State officials are demanding evidence that the infusion of Federal and State dollars for special programs works.

-- Test scores have traditionally been the only measureof effectiveness.

-- Few alternatives to test scores exist, and those that are available are unproven and not as widely accepted.

Local officials' views

The local title I and III respondents' perspectives are very nearly alike on what evidence of program results impresses State and OE officials. Title VII respondents' perspective is somewhat different; however, this may be at least partly because local title VII projects are responsible directly to OE program officers and not to State program officers.


The table below shows that local title I, III, and VII project officials feel that OE or State program officers are most impressed by norm-referenced test results.

Local Title I, III, and VII Officials' Perceptions
of What Impresses State or OE Officials and
What Impresses Local Officials (note a)

                                   Impresses OE or State       Impresses self
                                   Title   Title   Title    Title   Title   Title
                                     I      III     VII       I      III     VII

1. Improvement in educational
   management or accountability     3.9     3.8     3.8      4.7     4.5     4.4

2. Improvement in school
   services or facilities           5.2     5.1     4.8      4.9     4.9     4.8

3. Student improvements through
   gain scores or grades on
   teacher ratings                  4.3     4.3     5.1      4.0     4.3     4.4

4. Student improvement through
   gain scores on standardized
   norm-referenced tests            2.2     2.6     2.1      3.7     3.7     3.4

5. Student improvement through
   gain scores on criterion-
   referenced tests                 3.1     3.5     3.9      3.4     3.7     4.0

6. Student improvement through
   gains in the affective domain
   (e.g., likes, dislikes,
   interests, attitudes,
   motives, etc.)                   4.6     4.2     4.4      3.3     3.3     3.4

7. Improvements in curriculum
   and instructional methods        4.2     3.3     3.7      3.2     3.0     3.2

a/The numbers shown are the respondents' average rankings of the given alternatives. Each respondent was asked to give the most impressive type of result a ranking of "1" and the next most impressive "2," etc.

Local title I and III project officials ranked criterion-referenced test results as the second most impressive data for OE or State officials, followed closely by improvements in educational management and accountability, and improvements


in curriculum and instructional methods. Local title VII project officials, however, considered improvements in curriculum and instructional methods the second most impressive program result to OE, followed by improvements in educational management and accountability. Program results from criterion-referenced tests were ranked fourth.

Program results that impress local project officials are different from what they believe impresses OE or State program officers. Officials from all three local project types ranked improvement in curriculum and instructional methods as the most impressive program result. Local title I, III, and VII project officials consider gains in the affective domain the second most impressive program result, but with title VII, gain scores on norm-referenced tests also received the same ranking.

Concerning these results, (1) local title I project officials appear to consider results from criterion-referenced tests more impressive than norm-referenced test results, (2) local title III officials consider them equally impressive, and (3) local title VII officials clearly prefer norm- to criterion-referenced test results. In all cases, however, test results are not the most impressive program result to local project officials.

Since local project officials perceive results from improvements in curriculum and instructional methods and improvements in the affective domain as more meaningful to them than to State or OE officials, the extent to which such results are excluded from evaluations will probably reduce the adequacy of the evaluations and perhaps make them less useful to local officials. Generally, the degree to which local perceptions of what will most impress State or OE officials cause evaluations to emphasize test results and educational management and accountability will also probably affect the adequacy, and perhaps the usefulness, of evaluations at the local level.

As with the State officials' perception of OE, local project officials generally believe OE or State program officers are most impressed by student outcome measures of program effectiveness, such as norm- and criterion-referenced tests. Local project officials themselves are considerably less impressed by these measures. They are more impressed by a variety of measures, but do not believe OE or State program officers fully share this interest.


Correspondingly, 51 percent of the local agency respondents and 77 percent of State agency respondents to our questionnaire stated that the educational community needs to greatly increase efforts to develop alternatives to the classic standardized norm-referenced tests.

Certain differences in questionnaire ratings among Federal, State, and local projects may be because of differing project objectives and priorities. However, local officials clearly do not regard test scores as the sole or most impressive criterion for determining program effectiveness. This may indicate a growing disenchantment with using standardized norm-referenced tests as program evaluation tools. Local officials are apparently interested in knowing how the projects as a whole are functioning and may see test measures as only one of several factors to be used in assessing individual and project performance.

CONCLUSIONS

Local and State evaluation reports on Federal elementary and secondary education program effectiveness are intended to provide information on which local, State, and Federal officials can base policy and program decisions. However, our questionnaire results show that State and local officials see important differences in the types of evidence of program effectiveness that they themselves and officials at other levels--Federal, State, and local--prefer. Therefore, better communication is needed among the three levels about the information they need to facilitate policy and program decisions at each level. The questionnaire results also raise this question: Should all three levels be served by a reporting system based on the same reports?

Although State officials view OE program officials as being most impressed by standardized norm-referenced test results, and local officials view State and OE officials in the same manner, State and local officials say that they are not most impressed by such results. Local officials prefer broader, more diverse information on program results than just these test scores; they are most impressed by improvements in curriculum and instructional methods and gains in the affective domain (likes, dislikes, interests, attitudes, motives, etc.). State officials are most impressed by results from criterion-referenced tests.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation believes that hard, objective data on students' cognitive improvements is the most important information


needed. He noted that this means gain scores on standardized norm-referenced achievement tests because these are most available.

The widespread use of standardized norm-referenced tests to evaluate State and local title I, III, and VII programs indicates that State and local officials have more frequently based their evaluations on the kinds of results they believe would be likely to most impress higher level officials than on their own preferences.

RECOMMENDATION TO THE SECRETARY OF HEW

In connection with the assessment recommended in chapter 4, we recommend that the Secretary direct OE to review the types of State and/or local program evaluation information collected (or planned to be collected) on programs authorized by titles I and VII of the Elementary and Secondary Education Act. The review should include an assessment of the information's usefulness at each level and should determine:

-- Whether it is realistic to attempt to serve Federal, State, and local levels with aggregated data based on local agency evaluation reports.

-- How the information needs at the local, State, and Federal levels can best be met.

-- Whether unnecessary duplication exists or will exist in meeting Federal information requirements through the State and local reporting systems as well as through OE national evaluations on these programs, and if so, how it should be eliminated.

During this review process, to define the evaluation information needed at State and local levels, OE should seek the views and cooperation of the State and local officials who are intended to use the results. 1/

1/HEW combined its comments on our recommendations for this chapter and chapter 4. For discussion of these comments, see "Agency comments and our evaluation" section on p. 37.


APPENDIX I

RESULTS OF GAO'S

STATE EDUCATION AGENCY QUESTIONNAIRE (note a)

                                              Number
                                            responding           Responses
                                             from 50         Number    Percent
                                             (note b)                 (note c)

Section A: General

1. How familiar are you with the research being conducted by the Center for the Study of Evaluation at the University of California at Los Angeles to evaluate the utility of many popular commercially available standardized norm-referenced tests? (note d) (Check the one response which best expresses your familiarity with the Center's research.) 50

Have little or no information 6 12.0

Aware of the Center's work in test evaluation 14 28.0

Read some of the Center's publications on the evaluation of norm-referenced tests 17 34.0

Used the Center's material to assist in the selection of commercially available standardized norm-referenced tests 13 26.0

100.0

2. How familiar are you with the Anchor Test Study conducted by the Educational Testing Service for the U.S. Office of Education (OE) to provide the ability to translate a child's score on any one of the eight most widely used standardized reading tests into a score on any of the other tests? 50

Have little or no information 3 6.0

Aware of the Anchor Test Study 13 26.0

Read the Anchor Test Study 25 50.0

Used the Anchor Test Study 9 18.0

100.0


[Questions 3 through 5 and their tabulated responses are illegible in the source.]






                                              Number
                                            responding           Responses
                                             from 50         Number    Percent
                                             (note b)                 (note c)

6. Which of the following techniques did your State employ for its 1973-74 evaluation of title I? (Check all that apply.) 48

Aggregation and analysis of data from local education agency reports 45 93.8

Educational audits and their results 9 18.8

Statewide testing of title I students 11 22.9

Other (please specify) 5 10.4

Note: If your State did not test title I students statewide, skip questions 7 and 8.

7. Which of the following types of tests did your State administer for its title I evaluation? 26

Standardized norm-referenced tests (note d) 21 80.8

Criterion-referenced tests (note f) 3 11.5

Other tests (please specify) 4 15.4

If you do not use standardized norm-referenced tests, skip question 8. If you do, continue.

8. How did you report the results for the standardized norm-referenced testing? 28

Raw scores 2 7.1

Grade equivalents 24 85.7

Percentiles 6 21.4

Quartiles 1 3.6

Stanines 4 14.3

Other (please specify) 2 7.1


NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title I, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these ratings: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

9. Rate the FOCUS AND SCOPE of the local and State evaluation reports (notes b and e)

(Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high- and lower-priority information). (Check one box in each row.)

                 Number
               responding   Very deficient     Deficient       Marginal        Adequate     More than adequate
                from 50     Number Percent  Number Percent  Number Percent  Number Percent  Number Percent

Local reports      48          0      0        5    10.4      19    39.6      23    47.9       1     2.1
State reports      48          0      0        4     8.3      16    33.3      25    52.1       3     6.3

10. Rate the local and State evaluation reports on THE PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS (notes b and e)

(Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update current policies by those who transfer policy decisions into plans, budgets, program implementation, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Number
               responding   Very deficient     Deficient       Marginal        Adequate     More than adequate
                from 50     Number Percent  Number Percent  Number Percent  Number Percent  Number Percent

Local reports      49          2     4.1       7    14.3      26    53.1      14    28.6       0     0
State reports      49          1     2.0       6    12.2      22    44.9      18    36.7       2     4.1


[Additional questionnaire items and their tabulated responses are illegible in the source.]

Page 95: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

APPENDIX I APPENDIX I

'I 41 .. 9 0 0 W.CI

cWI4t 4

030 03*0I

- IC

' . I -4

41 0~~~~~~~

'1 -tn O~

U, CI

"I do~~~~~~~~ a WI

Ai a it of co to~~~~~ el. t-I I %WIZ .01Wwin~ ~~~~~01

2 Vlm 431 0I v

3 1 '0 6 0 .'60.03 I ' ~l~ZI -

uIg*.uI r.,.e 43"C u V ej

WC w~~~~o

3 40 1 1w %W 4 >0U. -m @4

.0~~~~~~~~~~~~0

.0 I C moo; ko % .4 -

a 0C Wi r -v v -

ZJI . *I I W

4313 w Neo3.u9 q q 9 .

A IA II to 0 m

-a 4, '61 "'~4 a W '6 43 CC 4.0 '6 laiC Zi 30 ou 0'aC 0h '6 '

4140 00043CM~~W10 0 U'A 3.. 'A .

C Crl

4., a C 0 ' 6U; .C '640 all~~~~~~~~. 3" 1a'-

40 0 a W C :1- a a 010 'a 0

ID A . 3"M43 IDC43 0b.t' 43 4C 0 460 M41 I 410-o . 433I W 4#43co v3>' . '6w L. %W 0 04341 4 3C0 0. > C. C.CC 4L.C a)W 4 0

>,i-i I I ! 43 1013 1 Li A3 a

to .34 O34-0 'a VM C~(~41OW I i-

0.0 IV OW"'6~~~43 I 43 430- W *L.4 3.. '6

X&~~ ~ .0tc~Oo @404 - 1C.C'6100 ~6 0.I 3.C 'Or. '3"3. m.I43C 0.flI % W 3 0 u u WI-C a C

.4' -CW41L64 20 Ai '4-- 3 3. C a'6 2- C~ C . ) > C 4 0 4 E4 Z 0 C GPF 03 a wC- 3 top O-I 4 0 C04 4.

j C)[ 40 10 C -0 cr -5 C604. I" 3C 0 00 0 0 C 043u C u -'C a 431'0 W 4 443C00 -U~~~~~~~~~~~a2 43 J41~~~~~~~~~1 W 144 41. . 43 434.OZC g 33 00 0 0.0

Ac,5 C m -0 5.

Ai 4- .. ' 4 4 0 0 4.3 41 a1 C' ';p C Cto -24 o -33.(L0.0 0 43* -3 Wi. U.c Go 434 GP 43

4, z vIDAi a -

pL~a iI.3i30l O 1 43433% C 4b C 3 ar

1z&( 3 1 = W a O.Q C. a M

14341Z4 64O 0.0.00 * 4343 "lo 1 9 - .- 0 03

84

Page 96: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

APPENDIX I APPENDIX I

I 0% 0% ·

IV

"1 '- , ,

4 0 0 q " r

14i . . .

a2OC 1 h B

4 i1 in c i

11 0 '0 t

.J La I q-.. I

a ,

· - 0 a o t.. o

I I

ca D L @ -

~ 9 o m 0

C U @ 00 0 D 0

GDLa o 0

( 3 0~ 0

C1 e 0 n .4 .La C0 C U 'U

U .: C_ C _ _ ..

/ @11 , --4- i G C

, C O C uf @ @

O 04 '0-4 '0* .,448 @ L _C > >

Page 97: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

APPENDIX I APPENDIX I

P ~ ~ ( ' Mr 0 0 (N 0%

: (. o .

znW~~~~~~~~~~~~~~ ~ '_

0) VI C4 4 In @ N do _ Q

- - -4 4

ID. Q| q. 0M _

Z MU1 -In

I1.0

44 j . 0 ~ (N t-I (N q

c l ~i

* el Ql ~ ~ q O @ r•'l _

(-D

-4 · 4 (N·,

3~0 -_ 5e ° ~ (, b" ' GO 0N0 t

A:~ ~~~m 0~,1 i C ~ .i - N - m r .

° ° ~ u4 szc e c S c e cr c J 0| r NO A O °

Ut j. -Q -4 0)

mJ C 01e 312

Ca1 LI la Q

V4 a. = AJ.C w a, C e~~~n . 4

ox ' '. g ' i o ' o _ _ _ a ' a' _ _ 2,-,vS U OCa 9Ou 2 e 4 -I

0 O0 4i~~~~~~~

V m_ · I.M 0 C l " In

o 1 A.4 44 ALZ

o ' e "- , A, ez

uj -*= C " ~C

4, IQN a0 *

__

OMM.J.-. *VOGI.C.C4 6 C-' GlaO86

W l4l tl, at..1, 0% tl,--,

Cl GJl.

Ai"C 0 4 0 GD'CGL' 01- VU 0 GD UGo 00Ai 0 to 61 N 040 "

Io ];'I°.j ° "6 "" '"'0.~O G '" C ' '"D~ C GD M~CC,: 'L .CCAii...,, - AiC 000 00 0*a .- 'I

M1 CM46 U M U N 0 4D 1 - i0 C ie 4 4j11 At'U M Aq 61.... At 0 Ai4 Ai. 0.C4 ul16 0.. Ai.G Ai0

Ca U 11G GCG .CU MC..e N i COV 3M 4

C-) VN £~~~.> D. . .0 CG C CCGD 4. >C a D.1001L.0 Aw 04 O' GD £ G I GD i. 06 . - 0-C GD 61CGD-4.M~~~~~~04.~~qJ4 CO 0C.4 C g *.4 ..ei 4- wie C

o' o. e 'v-L. ri -. 4 = -4 40 ,' - -- ( o- .

0 v Ai ig--N C¶ a C IlaCn.

j ":"a 'a'='" '"o""':'8

86

Page 98: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

APPENDIX I

1 ' ~ N t o C o

a . 04 -

l

C' 40 40 A

30 4

,.CI . .. ·4 . o , .

tE1 -

.1 MI .N .' . a -4

ltI I 411 o o u -Z .

S f E - j rrl o 0 (N o 0 o

W V * '.

to °-0 6- Ea 41 C in Z0 , 0 I0 , 0 oi l0 W LI ,

410' 1 ' o

W 1

4 C r1 C I C Cto I Lio . uJ O_ o . 0 4- (N rr E. C w wIs 4 Li_ C-iO de oh O* C O t

v aa o _ O = 00 .4 1 vE C11.4 C.- 4v4 U vF 4C v1 t. 4 go CC 4tl . 4141411 -4

w4.10' 4. Ou v- _. - - O'

v0 c 4 O a C v) v_ I' GZ 0 C ar0 - v4u

U* 0 ...

i . t,,o l aCv 41Cv * e 1 * . ) 0

=1C

M4 410 r4,, ~ C 0 C10C4 CI I C4 C t Ai -L1 i 0 0 04 c ,. * 4 e o u._ .M41 o Ai M4 -44 -.4 141C .0 0 U 004 0 V 0C4.,,d .

- ~c OOcC 40 14 54 idO10 v U . C La .41 4 Ai

0 t4- a el @ ~2 > 0 I · e0 0 0 e. . !e

k _-i C at 0 C 4 tl W O I 41 -n. 4a O-0 _o O C w o Z

0 4W 0 U 4 l 4 4 I - ('4 .I1 C t C 40 C CA _4 0 V CI e .o

87

Page 99: HRD-76-165 Problems and Needed Improvements in …evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in

APPENDIX I APPENDIX I

                                            Number
                                            responding   Responses
                                            from 50      Number   Percent
                                            (note b)              (note c)

17. Which of the following techniques did your State employ for its 1973-74 evaluation of title III? (Check all that apply.) 48

___Aggregation and analysis of data from local education agency reports 41 85.4

Educational audits and their results 27 56.3

Statewide testing of title III students 5 10.4

Other (please specify) 12 25.0

Note: If your State did not test title III students statewide, skip questions 18 and 19.

18. Which of the following types of tests did your State administer for its title III evaluation? 14

Standardized norm-referenced tests (note d) 9 64.3

Criterion-referenced tests (note f) 8 57.1

Other tests (please specify) 7 50.0

If you do not use standardized norm-referenced tests, skip question 19. If you do, continue.

19. How did you report results for the standardized norm-referenced tests? 10

Raw scores 2 20.0

_Grade equivalents 7 70.0

Percentiles 4 40.0

Quartiles 2 20.0

Stanines 5 50.0

___Other (please specify) 2 20.0


NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title III, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

20. Rate the FOCUS AND SCOPE of the local and State evaluation reports (notes b and e)

(Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Number      Very                                           More than
                 responding  deficient   Deficient   Marginal    Adequate   adequate
                 from 50     Num- Per-   Num- Per-   Num- Per-   Num- Per-  Num- Per-
                             ber  cent   ber  cent   ber  cent   ber  cent  ber  cent

Local reports    48          1    2.1    2    4.2    16   33.3   27   56.3  2    4.2
State reports    46          1    2.2    5    10.9   14   30.4   25   54.3  1    2.2

21. Rate the local and State evaluation reports on THE PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS (notes b and e)

(Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update current policies by those who transfer policy decisions into plans, budgets, program implementation, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Number      Very                                           More than
                 responding  deficient   Deficient   Marginal    Adequate   adequate
                 from 50     Num- Per-   Num- Per-   Num- Per-   Num- Per-  Num- Per-
                             ber  cent   ber  cent   ber  cent   ber  cent  ber  cent

Local reports    48          2    4.2    4    8.3    21   43.8   19   39.6  2    4.2
State reports    45          3    6.7    1    2.2    14   31.1   26   57.8  1    2.2


State reports                             Number      Very                   As often   Occa-
                                          responding  often      Generally   as not     sionally   Seldom
                                          from 50     Num- Per-  Num- Per-   Num- Per-  Num- Per-  Num- Per-
                                                      ber  cent  ber  cent   ber  cent  ber  cent  ber  cent

(1) Information on the manner in which
    the needs of children are assessed    40          4    10.0  18   45.0   7    17.5  6    15.0  5    12.5
(2) Information on the number of
    children in the program               40          21   52.5  13   32.5   4    10.0  1    2.5   1    2.5
(3) Per pupil expenditures of each
    program                               40          12   30.0  13   32.5   5    12.5  8    20.0  2    5.0
(4) Evidence of quantifiable or
    measurable achievements               39          8    20.5  15   38.5   11   28.2  2    5.1   3    7.7
(5) Evidence of quantifiable or
    measurable pupil benefits             40          6    15.0  14   35.0   10   25.0  5    12.5  5    12.5

a/We requested that the questionnaire be completed by State officials familiar with the State's evaluation efforts conducted for Elementary and Secondary Education Act, titles I (regular, migrant, neglected, and delinquent, etc.) and III, and the State's own assessment efforts. The questionnaire was divided into three sections: section A, National Assessment of Educational Progress and statewide assessment; section B, Elementary and Secondary Education Act, title I evaluation efforts; and section C, title III evaluation efforts. We suggested that the questionnaire be taken apart and distributed to persons most familiar with each part, such as the State director of research, planning, and evaluation for section A, the State director of title I for section B, and the State director of title III for section C. Some of the questions from the questionnaire have been omitted from this report. Questions pertaining to the National Assessment of Educational Progress and statewide assessments are shown in GAO's report to the Congress (HRD-76-113), dated July 20, 1976.

b/On April 15 we sent the questionnaire to the education agencies in all States and the District of Columbia. By June 1975 the District of Columbia and all but one State responded. For purposes of compiling responses to the questionnaire, the District of Columbia is considered to be a State.

c/This column shows the percentage of respondents to the question that chose each specific answer.

d/Standardized norm-referenced tests are tests which purport to assess the individual student's ability or achievement in broad subject areas as compared to the rest of the test population (e.g., the Metropolitan Achievement Test (MAT) or the Wechsler Intelligence Scale for Children (WISC)).

e/The percent columns show the percentage of respondents to each line item that chose each category. Where the percentages on each line do not add to 100, it is due to rounding.
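Note e's rounding caveat can be seen directly. The sketch below (not from the report) computes per-row percentages the way the tables above present them, each cell rounded to one decimal place independently, so a row's cells need not total exactly 100:

```python
# A minimal sketch (not from the report) of why the percent columns in the
# tables above may not total exactly 100: each cell is rounded to one
# decimal place independently, as note e describes.
def row_percentages(counts):
    """Each count as a percent of the row total, rounded to 0.1 percent."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

# Question 20, "State reports" row (46 respondents): happens to total 100.0.
print(row_percentages([1, 5, 14, 25, 1]))
# Three equal categories: each cell rounds to 33.3, so the row totals 99.9.
print(row_percentages([1, 1, 1]))
```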

f/Criterion-referenced tests are tests specifically constructed to measure students' attainment of specific educational objectives or proficiency with specified curriculum material. These tests, which may be standardized, usually provide a specific and operational description of the level and type of task performance or behavioral measures used as a criterion to indicate attainment of the educational objectives. For example, the student must be able to compute the correct product of all single digit numerals greater than zero with no more than five errors.
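The sample criterion in note f is mechanical enough to score automatically. A hypothetical scorer (the function name and data layout are illustrative, not part of the report) might look like:

```python
# Hypothetical scorer (not part of the report) for the sample criterion in
# note f: products of all single-digit numerals greater than zero, with no
# more than five errors allowed.
def meets_criterion(answers, max_errors=5):
    """answers maps (a, b) factor pairs (each 1-9) to the student's product."""
    errors = sum(1 for (a, b), given in answers.items() if given != a * b)
    return errors <= max_errors

# A student who misses one of the 81 facts still attains the objective.
student = {(a, b): a * b for a in range(1, 10) for b in range(1, 10)}
student[(7, 8)] = 54
print(meets_criterion(student))  # True
```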


APPENDIX II APPENDIX II

RESULTS OF GAO'S

LOCAL EDUCATION AGENCY QUESTIONNAIRE (note a)

                                            Number of
                                            projected
                                            responses    Responses
                                            from 8,936   Number    Percent
                                            (note b)     (note b)  (note c)

Section A: To be completed by questionnaire respondents from all local education agencies sampled

1. How familiar are you with the research being conducted at the Center for the Study of Evaluation at the University of California at Los Angeles to evaluate the utility of many popular commercially available standardized norm-referenced tests? (note d) (Check the one response which best expresses your familiarity with the Center's research.) 5,987

Have little or no information 5,059 84.5
Aware of the Center's work in test evaluation 601 10.0
Read some of the Center's publications on the evaluation of norm-referenced tests 192 3.2
Used the Center's material to assist in the selection of commercially available standardized norm-referenced tests 135 2.3

100.0

2. How familiar are you with the Anchor Test Study conducted by the Educational Testing Service for the U.S. Office of Education (OE) to provide the ability to translate a child's score on any one of the eight most widely used standardized reading tests into a score on any of the other tests? 282

Have little or no information     282    100.0
Aware of the Anchor Test Study    -      -
Read the Anchor Test Study        -      -
Used the Anchor Test Study        -      -

                                         100.0
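The counts in this appendix are not raw tallies but projections from GAO's statistical sample of local districts to the universe of 8,936. A minimal sketch of that kind of expansion follows; the sample size and single expansion weight are hypothetical, since the report's actual sample design and weights are not given in this excerpt:

```python
# Illustrative only: expanding counts from a district sample to the
# 8,936-district universe this appendix reports. SAMPLE_SIZE is
# hypothetical; GAO's actual survey used a statistical sample whose
# design and weights are not given here.
UNIVERSE = 8936          # local school districts represented
SAMPLE_SIZE = 500        # hypothetical number of districts sampled

def project(sample_count):
    """Scale a sample count up to an estimated count for the universe."""
    return round(sample_count * UNIVERSE / SAMPLE_SIZE)
```

A stratified design would instead carry a weight per stratum, which is why projected row totals in these tables can differ slightly from the stated respondent counts.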


                                            Number of
                                            projected
                                            responses    Responses
                                            from 8,936   Number    Percent
                                            (note b)     (note b)  (note c)

6. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.) 8,205

Internal (e.g., local education agency staff) 5,468 66.6
External (e.g., consultants) 494 6.0

Both 2,243 27.3

100.0 f/

7. Which of the following types of tests did your local education agency administer for its title I evaluation? If your local education agency did not test title I students, skip questions 7 and 8. 8,583

Standardized norm-referenced tests (note d) 8,103 94.4
Criterion-referenced tests (note g) 2,110 24.6

Other tests (please specify) 1,168 13.6

If you do not employ standardized norm-referenced tests, skip question 8. If you do, continue.

8. How did you report the results for the standardized norm-referenced testing? 8,029

Raw scores 1,792 22.3

_ Grade equivalents 6,658 82.9

___Percentiles 2,978 37.1

___Quartiles 528 6.6

___Stanines 946 11.8
___Other (please specify) 163 2.0
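Several of the reporting formats above are fixed transformations of the same percentile rank. As one sketch, a percentile can be mapped to the 1-9 stanine scale, assuming the conventional 4-7-12-17-20-17-12-7-4 percent bands rather than any particular test publisher's norms:

```python
import bisect

# Sketch of the stanine scale some agencies above used to report
# norm-referenced results, assuming the conventional 4-7-12-17-20-17-12-7-4
# percent bands rather than any particular test publisher's norms.
_UPPER = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative percent through stanines 1-8

def stanine(percentile):
    """Map a percentile rank (0-100) to a stanine, 1 (low) through 9 (high)."""
    return bisect.bisect_left(_UPPER, percentile) + 1

print(stanine(50))  # 5: the middle band covers percentiles 41-60
```

Grade equivalents, by contrast, depend entirely on the norming sample of a given test, which is one reason the report questions comparability across tests.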


NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title I, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

9. Rate the FOCUS AND SCOPE of the local and State evaluation reports. (notes b and e)

(Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Projected
                 number      Very                                               More than
                 responding  deficient   Deficient   Marginal      Adequate     adequate
                 from 8,936  Num- Per-   Num- Per-   Num-   Per-   Num-   Per-  Num-   Per-
                             ber  cent   ber  cent   ber    cent   ber    cent  ber    cent

Local reports    8,510       45   0.5    426  5.0    2,097  24.6   4,521  53.1  1,422  16.7

State reports    8,350       196  2.3    733  8.8    2,248  26.9   4,098  49.1  1,076  12.9

10. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS. (notes b and e)

(Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Projected
                 number      Very                                               More than
                 responding  deficient   Deficient   Marginal      Adequate     adequate
                 from 8,936  Num- Per-   Num- Per-   Num-   Per-   Num-   Per-  Num-   Per-
                             ber  cent   ber  cent   ber    cent   ber    cent  ber    cent

Local reports    8,635       119  1.4    484  5.6    2,782  32.2   4,037  46.7  1,212  14.0

State reports    8,486       312  3.7    863  10.2   2,879  33.9   3,313  39.0  1,119  13.2



                                            Number of
                                            projected
                                            responses    Responses
                                            from 8,936   Number    Percent
                                            (note b)     (note b)  (note c)

18. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.) 2,623

Internal (e.g., local education agency staff) 1,445 55.1
External (e.g., consultants) 392 14.9

___Both 786 30.0

100.0

19. Which of the following types of tests did your local education agency administer for its title III evaluation? If your local education agency did not test title III students, skip questions 19 and 20. 1,745

Standardized norm-referenced tests (note d) 1,380 79.1
Criterion-referenced tests (note g) 424 24.3

___Other tests (please specify) 515 29.5

If you do not employ standardized norm-referenced tests, skip question 20. If you do, continue.

20. How did you report the results for the standardized norm-referenced testing? 1,319

Raw scores 400 30.3

_Grade equivalents 981 74.4

Percentiles 765 58.0

Quartiles 73 5.5

_Stanines 283 21.5

Other (please specify) 64 4.8


NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title III, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

21. Rate the FOCUS AND SCOPE of the local and State evaluation reports. (notes b and e)

(Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Projected
                 number      Very                                               More than
                 responding  deficient   Deficient   Marginal      Adequate     adequate
                 from 8,936  Num- Per-   Num- Per-   Num-   Per-   Num-   Per-  Num-   Per-
                             ber  cent   ber  cent   ber    cent   ber    cent  ber    cent

Local reports    2,463       35   1.4    44   1.8    531    21.6   1,396  56.7  458    18.6
State reports    2,411       16   0.6    162  6.7    607    25.2   1,221  50.6  405    16.8

22. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS. (notes b and e)

(Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                  Projected
                  number        Very                                                   More than
                  responding    deficient     Deficient     Marginal      Adequate     adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.    Pct.  No.   Pct.
Local reports        2,471       58   2.3     147   5.9     558  22.6    1,333  53.9   376  15.2
State reports        2,420       45   1.9     204   8.4     672  27.8    1,132  46.8   367  15.2

[Pages 117 through 121 of the original report, containing questions 23 through 29 and their tabulated local and State responses, are illegible in this copy.]


                                    Number of projected       Responses
                                    responses from 8,936   Number     Percent
                                    (note b)               (note b)   (note c)

30. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.)   426

Internal (e.g., local education agency staff) 205 48.1

External (e.g., consultants) 91 21.3

Both 131 30.7

100.0 f/

31. Which of the following types of tests did your local education agency administer for its title VII evaluation? If your local education agency did not test title VII students, skip questions 31 and 32.   394

Standardized norm-referenced tests (note d) 357 90.7

Criterion-referenced tests (note g) 148 37.5

Other tests (please specify) 91 23.0

32. How did you report the results for the standardized norm-referenced testing?   396

Raw scores 132 33.4

Grade equivalents 288 72.7

Percentiles 204 51.6

Quartiles 48 12.0

Stanines 35 8.8

Other (please specify) 28 7.1
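The reporting formats listed in question 32 are different transformations of the same raw scores against a norming distribution. The sketch below is illustrative only: the function names and norming data are invented, the stanine bands are the conventional 4-7-12-17-20-17-12-7-4 percent split, and grade equivalents are omitted because they require a publisher's norm tables.

```python
import bisect

def percentile_rank(score, norm_scores):
    """Percent of the norming sample scoring at or below `score`."""
    ordered = sorted(norm_scores)
    return 100.0 * bisect.bisect_right(ordered, score) / len(ordered)

def quartile(pct):
    """Quartile (1 = lowest fourth, 4 = highest) from a percentile rank."""
    return min(int(pct // 25) + 1, 4)

def stanine(pct):
    """Stanine (1-9) from a percentile rank."""
    upper_bounds = [4, 11, 23, 40, 60, 77, 89, 96]  # top percentile of stanines 1-8
    return bisect.bisect_right(upper_bounds, pct) + 1
```

A raw score is first placed against the norming sample; the percentile rank then fixes the quartile and stanine, which is why the coarser bands discard information that the raw score carried.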


NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title VII, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

33. Rate the FOCUS AND SCOPE of the local and State evaluation reports. (notes b and e)

(Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                  Projected
                  number        Very                                                  More than
                  responding    deficient     Deficient     Marginal      Adequate    adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.   Pct.  No.   Pct.
Local reports         363         0   0        11   3.0     146  40.1     175  48.1   32    8.7
State reports         314        24   7.7      23   7.2     143  45.6     120  38.3    4    1.3

34. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS. (notes b and e)

(Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                  Projected
                  number        Very                                                  More than
                  responding    deficient     Deficient     Marginal      Adequate    adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.   Pct.  No.   Pct.
Local reports         363         0   0        17   4.7     135  37.1     163  45.0   48   13.2
State reports         314        18   5.7      28   8.8     139  44.2     121  38.4    9    2.9


35. Rate the local and State evaluation reports on the adequacy with which they properly QUALIFY FINDINGS. (notes b and e)

(Qualification of Findings: the extent to which the report properly qualifies the findings and assumptions and identifies those conditions and situations where the findings are not applicable). (Check one box in each row.)

                  Projected
                  number        Very                                                  More than
                  responding    deficient     Deficient     Marginal      Adequate    adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.   Pct.  No.   Pct.
Local reports         363         0   0        17   4.6     127  34.9     183  50.5   37   10.1
State reports         314        23   7.3      29   9.3     165  52.6      95  30.1    2    0.6

36. Rate local and State evaluation reports on the CREDIBILITY OF FINDINGS. (notes b and e)

(Credibility of Findings: the degree of confidence expressed in the findings through statements of statistical certainty, soundness of method, evidence of replication, consensual agreements, similar experiences, supporting expert judgment and opinions, and reasonableness of assumptions). (Check one box in each row.)

                  Projected
                  number        Very                                                  More than
                  responding    deficient     Deficient     Marginal      Adequate    adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.   Pct.  No.   Pct.
Local reports         362         1   0.3      28   7.7     143  39.4     138  38.2   52   14.4
State reports         314        24   7.7      25   7.8     142  45.1     122  38.8    2    0.6


37. Rate the local and State evaluation reports on the adequacy of the QUALIFICATION AND QUANTIFICATION OF MEASUREMENT DATA. (notes b and e)

(Qualification and Quantification of Measurement Data: the extent to which the evaluation assessments can be qualified and quantified into measurable attributes and parameters that address the problem in measurable, operational, or concrete terms). (Check one box in each row.)

                  Projected
                  number        Very                                                  More than
                  responding    deficient     Deficient     Marginal      Adequate    adequate
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.    No.   Pct.  No.   Pct.
Local reports         349         2   0.6       3   0.9     138  39.6     165  47.3   41   11.8
State reports         296        25   8.3      21   7.1     106  35.9     127  43.1   17    5.6

38. How often do your Elementary and Secondary Education Act, title VII, local program evaluations adequately report information on the need assessment, number of children, per pupil expenditure, project achievement, and pupil benefit parameters? (Check one box in each row.) (notes b and e)

                  Projected
                  number                                    As often
                  responding    Very often    Generally     as not       Occasionally   Seldom
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.   No.   Pct.     No.  Pct.

(1) Information on the manner in which the needs of children are assessed   369   146 39.4   161 43.6    4  1.1   45 12.1   14 3.3

(2) Information on the number of children in the program   368   198 53.7   106 28.8    2  0.5   50 13.5   13 3.5

(3) Per pupil expenditures of each program   367    89 24.3   103 27.9   69 19.9   79 21.4   28 7.6

(4) Evidence of qualifiable and measurable achievements   369   135 36.6   152 41.2   28  7.6   43 10.8   14 3.8

(5) Evidence of qualifiable or measurable pupil benefits   368   148 40.1   123 33.4   28  7.6   54 14.5   16 4.3


39. From your experience, how often can you draw comparisons between the results of your local Federal program and Federal programs in other localities from State and Federal evaluation reports? (Check one box for each type of report.) (notes b and e)

                  Projected
                  number                                    As often                               No basis
                  responding    Very often    Generally     as not       Occasionally   Seldom     to judge
                  from 8,936    No.   Pct.    No.   Pct.    No.   Pct.   No.   Pct.     No.  Pct.  No.  Pct.
State reports         361        40   11.1     72  19.9     12   3.3     104  28.8      37  10.2   97  26.9
Federal reports       374        22    5.9     39  10.4     34   9.1      95  25.4      88  23.5   97  25.9

a/We requested that the questionnaire be completed by local education agency officials familiar with the local agency's evaluation efforts conducted for Elementary and Secondary Education Act, titles I (regular, migrant, neglected and delinquent, etc.), III, and VII, and the local agency's own assessment efforts. The questionnaire was divided into four sections: section A, National Assessment of Educational Progress and local education agency testing programs; section B, Elementary and Secondary Education Act, title I; section C, title III; and section D, title VII evaluation efforts. We suggested that the questionnaire be taken apart and distributed to persons most familiar with each part, such as the local education agency director of research, planning, and evaluation for section A, the respective local education agency directors of title I for section B, title III for section C, and title VII for section D. Some of the questions from the questionnaire have been omitted from this report. Questions pertaining to the National Assessment of Educational Progress and local education agencies' own testing programs are shown in GAO's report to the Congress (HRD-76-113), dated July 20, 1976.

b/In April 1975 we sent the questionnaire to a national statistical sample of 832 local school districts. By June 1975 we received responses from 710 school districts, or 85 percent. The numbers shown above represent the number of local school districts in the Nation--out of the 11,666 in the defined universe with 300 or more pupils--to which our local questionnaire sample responses have been projected.
We projected a total of 8,936 local education agencies responding instead of 11,666 for technical reasons--based on the weighting and the response rates across the various strata in our sample, this method allows us to obtain the most accurate percentage breakdowns on the answers given. Where the number of responses to each line item does not total to the "Number of projected responses" column, it is due to rounding.
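The projection described in note b, weighting each sampled district's answer up to the number of universe districts its stratum represents, can be sketched as follows. This is an illustrative reconstruction rather than GAO's actual computation, and the two strata shown are invented figures:

```python
def project_responses(strata):
    """Project sampled answers onto the full universe of districts.

    Each stratum dict holds:
      'universe'    -- districts in the stratum nationwide
      'respondents' -- sampled districts that answered the item
      'chose'       -- respondents choosing the answer of interest
    Every respondent carries a weight of universe / respondents, so its
    answer stands in for that many districts.
    """
    projected_base = sum(s["universe"] for s in strata)
    projected_chose = sum(
        s["chose"] * s["universe"] / s["respondents"] for s in strata
    )
    return projected_chose, 100.0 * projected_chose / projected_base

# Invented strata: one of large districts, one of small ones.
chose, pct = project_responses([
    {"universe": 6000, "respondents": 300, "chose": 150},
    {"universe": 2936, "respondents": 400, "chose": 100},
])
```

Because the weights differ across strata, the projected percentage is not the raw sample rate (here 250 of 700 respondents chose the answer, yet the projection is about 41.8 percent), which is also why projected counts need not divide evenly into the report's percentages.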

c/This column shows for each question the percentage of projected respondents choosing each specific answer.

d/Standardized norm-referenced tests are tests which purport to assess the individual student's ability or achievement in broad subject areas as compared to the rest of the test population (e.g., the Metropolitan Achievement Test (MAT) or the Wechsler Intelligence Scale for Children (WISC)).

e/The percent columns show the percentage of projected respondents to each line item choosing each category. Where the percentages on each line do not add to 100.0, it is due to rounding.

f/Total does not add to 100.0 due to rounding.

g/Criterion-referenced tests are tests specifically constructed to measure students' attainment of specific educational objectives or proficiency with specified curriculum material. These tests, which may be standardized, usually provide a specific and operational description of the level and type of task performance or behavioral measures used as a criterion to indicate attainment of the educational objectives. For example, the student must be able to compute the correct product of all single digit numerals greater than zero with no more than five errors.
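The closing example in note g, attainment judged by no more than five errors on the single-digit products, can be written as a small scoring sketch. The function names and the answers-as-a-dictionary representation are assumptions made for illustration:

```python
def multiplication_items():
    """All products of single-digit numerals greater than zero (1-9 by 1-9)."""
    return [(a, b) for a in range(1, 10) for b in range(1, 10)]

def attained_objective(answers, max_errors=5):
    """Criterion-referenced scoring: a wrong or missing product counts as
    an error, and the objective is attained when errors do not exceed
    `max_errors`.  `answers` maps (a, b) to the pupil's response."""
    errors = sum(
        1 for (a, b) in multiplication_items() if answers.get((a, b)) != a * b
    )
    return errors <= max_errors
```

Unlike norm-referenced scoring (note d), the result is pass or fail against a fixed criterion; no norming population enters the computation.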


APPENDIX III

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE

OFFICE OF THE SECRETARY

WASHINGTON, D.C. 20201

JUN 15 1977

Mr. Gregory J. Ahart

Director

Human Resources Division
United States General Accounting Office
Washington, D.C. 20548

Dear Mr. Ahart:

The Secretary asked that I respond to your request for our comments on your draft report entitled, "Problems and Needed Improvements In Evaluating Office of Education Programs." The enclosed comments represent the tentative position of the Department and are subject to reevaluation when the final version of this report is received.

We appreciate the opportunity to comment on this draft report before its publication.

Sincerely yours,

Thomas D. Morris

Inspector General

Enclosure


Comments of the Department of Health, Education, and Welfare on the Comptroller General's Report to the Congress entitled "Problems and Needed Improvements in Evaluating Office of Education Programs," February 22, 1977, B-164031(1)

General Department Comments

The subject GAO report critically assesses the Office of Education's program evaluation mechanisms, concludes that opportunities exist for improvement, and recommends that the Secretary of HEW direct the Commissioner of Education and the Director of the National Institute of Education to take a number of corrective actions. Several changes would strengthen the report. There needs to be more careful consideration of the costs of evaluation. While there is an assertion (p. 50) that "the amount of funds spent to evaluate State and/or local education agency Title I, III, and VII elementary and secondary education programs is substantial," on p. 33 the amounts are respectively 0.2% of Title I and 1.8% of Title III. Twenty cents per $100 is not "substantial."

While the quality of data the report appears to expect would require significant additional resources and this expenditure is not possible within existing budgets, it would also be high in relation to the possible payoffs through improvements in the programs. Are the increased costs of providing better information to State and local decision-makers likely to result in sufficient improvements to justify the expenditure? Do State and local decision-makers have the incentives and does the state-of-knowledge permit improvements or economies at the local level equal (at least) to the increased costs of evaluation improvements? If the costs of improved evaluation are not recoverable in improved productivity or efficiency, calling for more is a questionable recommendation. As another example, the call for the NIE to support more research on criterion-referenced testing is acceptable, but this is, as recognized in the report, a long-term effort, and it is not clear whether GAO believes NIE deserves additional appropriations for such an effort. NIE can hardly divert substantial resources from its present $70 million.

A final point deserves mention. In spite of the reports of three of the four key congressional committee staff members that OE's evaluations "often have been completely ineffective or have had little impact on legislation" (p. 21), there is a growing body of professional opinion that OE's studies have, over the past ten years, been responsible for many major changes in existing legislation. In retrospect, even though apparently many of the reports did not reach "Congressional decision-makers" in a timely manner, the reports were solid pieces of work, widely discussed and debated, and by the time the next piece of legislation came up, the climate of opinion about particular programs had changed.


As a result of such retrospective analysis, the approach to evaluation which assumes there are certain "decision-makers" and that effective evaluations provide timely data to them is increasingly being questioned. Rather it appears that effective evaluations affect the broad political climate within which particular decisions are made. The large scale OE evaluations have stood the test of time in this view of evaluation far better than any of the alternatives proposed in the report appear likely to. (To underscore this point witness the large number of citations of these large-scale OE studies in the report itself.)

In short, the report is well done within its limits. The recommendations do not give adequate recognition to the additional financial burden they entail, nor to whether the tradeoffs in improved program quality are likely to justify the additional expenditures. And there is no discussion of the limitations of the view that evaluation ought to serve decision-making in a direct and linear sense.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to more strongly emphasize the purpose of providing information to the Congress when planning, implementing and reporting on evaluation studies. In particular, more attention should be given to timing the studies so that they closely coincide with the legislative cycle, (See GAO note p. 124) and briefing congressional committee staff more frequently.

Department Comments

We concur with the general conclusion that Congressional needs can be better served than we are now doing. First, we concur that it is clearly desirable that evaluation studies should be timed to more closely coincide with the legislative cycle. It is not the case, however, as GAO has apparently concluded, that this obviously important problem has not received attention. Rather, the frequent failure to coincide studies with the legislative cycle has been due to such factors as: inadequate funds to initiate evaluations at the right time; delays in the procurement cycle due to uncertainties in the appropriation process; increased difficulty in getting studies and data collection instruments cleared; and failure to anticipate difficulties and delays in the data collection process in the field. Nevertheless, we believe these problems can be more effectively addressed and, in order to do so, we have initiated a series of reviews of our studies which focus on predicted production dates for findings and recommendations vis-a-vis critical dates for input to legislation renewal.

Second, we concur that congressional committee staff should be briefed more directly and fully on the findings of evaluation studies. This need has not received the attention it deserves, but we have recently made the decision to institute such briefings on all major evaluation studies, and expect to get this new procedure underway in the coming weeks.


(See GAO note p. 124)

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to better define the program objectives to be evaluated as required by the General Education Provisions Act. This includes translating the legislative purposes of each program into specific qualitative and measurable program objectives, and clearly stating these objectives, and the progress made toward achieving them, in the annual evaluation report.

Department Comments

We do not concur. In most cases, legislation fails to state a program's objectives with sufficient clarity that they are readily susceptible to evaluation. But very often this failure of the legislation to be clear and precise on program objectives is the price paid through political compromise for getting the legislation passed at all. The Office of Education proceeds at considerable peril in trying to further specify legislation. It has in fact been criticized on several occasions for going further than the Congress intended, and of "trying to legislate by means of regulation." Furthermore, in many cases it has been the Congress' specific intention to avoid specification of program objectives and to leave such judgments and decisions up to State and local officials.

For example, in turning down a proposed amendment to concentrate seventy-five percent of compensatory funds on basic skills under Title I of ESEA, the Senate Committee said:

"The Committee believed it inappropriate for the Federal governmentto sustain its judgments on appropriate Compensatory Education Pro-grams for that of State and local officials." (Senate Report93-76Z, 1974, p. 30)

On the same question the House observed:

"The Committee feels strongly that i Local School Agency is theappropriate level to determine the special needs of educationallydeprived children and should be primarily responsible for deter-mining approaches to meeting those needs." (H Report 93-805, 1974,p. 20-21)


For all these reasons, we believe there are very definite limits on theOffice of Education's authority and ability to increase the clarity andspecificity of program objectives.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to improve the implementation of evaluation results by giving greater attention and priority to procedures such as the issuance of Policy Implication Memorandums designed to insure implementation of those results.

Department Comments

We concur. The need to improve the utilization and implementation of evaluation reports is a major problem which all Federal agencies and the Congress jointly face, and we further concur that increased efforts should be devoted to its solution. The Policy Implications Memorandum (PIM) is an invention of OE's evaluation office, and while its potential for improving the utilization of evaluation findings is considerable, GAO is correct in observing that it has not been used nearly as extensively in OE as it should have been. Efforts are currently underway to expand the production and use of the PIM. We are now conducting periodic reviews of the production schedule for PIMs and emphasizing their high priority.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to assess whether State and/or local evaluation reports for Title I and VII of the Elementary and Secondary Education Act can be improved so that they supply officials at Federal, State, and/or local levels with the reliable program information they need for decision making.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to review the types of State and/or local program information collected on programs authorized by Titles I, III, and VII of the Elementary and Secondary Education Act.

Department Comments

We concur with the general thrust of GAO's recommendations in this area, but most of the actions that GAO recommends are already underway, many by legislative mandate.

With respect to Title I, new evaluation requirements (Section 151 created by P.L. 93-380) directed the Commissioner to develop evaluation models and standards and to provide technical assistance to the States and local districts in order to improve the quality of local Title I evaluations and to yield comparable data which could be aggregated to State and National levels. All of this is being carried out: OE has interviewed personnel in policy making roles at the Federal level (both Congressional


and HEW staff) to determine the kinds of information they felt should be included in the annual State and local evaluation reports; an advisory group of personnel from the different operating levels of the Title I program (including parent representatives) indicated the kinds of information they thought should be included in the reports; and evaluation models and their associated reporting forms were developed and reviewed by each State agency and three of its locals to determine the kinds of problems they might have in using them. As a result of these efforts, a limited core of essential information was identified as being desirable at the Federal level. This core of information is substantially less than what has frequently been contained in State and local reports in the past. It will become the Federal evaluation requirement when regulations for this portion of the legislation are published.

OE has sponsored a series of workshops for State and local evaluation staff in the use of the standard evaluation models and reporting system. We have established ten technical assistance centers--one for each of the HEW regions--to provide technical assistance and training for State and local staff on a continuing basis. We have prepared training manuals and guidebooks for widespread dissemination to current and future users of the models and reporting system. Once the models and system are in place across the nation, their use should result in data which can be aggregated across States and across school districts. And, once they are in place, OE will be better able to assess whether or not the resultant data are sufficiently free of systematic errors to support aggregations to the State and national levels in a satisfactory manner. If they are not, then a determination can be made as to whether technical problems existed that could be overcome or whether different kinds of studies needed to be done to satisfy Federal, State, and local reporting requirements.

With respect to the recommendation on ESEA Title VII local evaluation reports, we believe they can be improved and we have taken the following steps to do so. Recently published Title VII regulations strengthen the requirements for evaluation of LEA bilingual projects. In addition, NIE and OE have a joint project underway to upgrade the technical expertise of persons responsible for local evaluations. While we believe it is worthwhile to improve local evaluations, the extent to which such evaluations will be useful at the Federal level is not yet evident. Certainly the problems of aggregating across local bilingual evaluations are more severe than in Title I, if only because of the multiplicity of languages involved.

(See GAO note p. 124)

Thus, as regards the much needed and legislatively mandated actions to improve State and local evaluations and reporting under Titles I, III, and VII, we believe GAO's understanding is incomplete (See GAO note p. 124) that the problems they refer to are well understood by both the Office of Education and Congress and that appropriate actions to deal with them are already well underway, and in some cases near completion.


GAO Recommendation

That the Secretary direct the Director, National Institute of Education, to consider the need for funding additional research on (1) criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests for uses which include program evaluation and (2) the nature and extent of racial, sexual, and cultural biases in standardized tests and how such biases may be reduced.

Department Comment

We concur. Several groups within the Institute presently have ongoing research programs in criterion-referenced testing and test bias. More emphasis will be given to these programs in FY 1977, FY 1978, and FY 1979 if appropriations for the Institute are increased significantly above present levels.

Under NIE leadership, the Center for the Study of Reading at the University of Illinois and the Center for the Study of Evaluation at UCLA are doing research that will lead to alternatives to standardized achievement tests in reading comprehension and writing. In addition, statistical research is in progress on how to assess the probability of making errors in classifying students on such tests, how to set the length of such tests, and how to determine passing scores.

The SOBER-Espanol project funded by NIE is developing criterion-referenced tests to assess competency in reading Spanish for grades K-6. The SOBER system allows teachers to create "tailor-made tests" by matching prepared test items to reading objectives. Currently, K-3 tests are being published and distributed through Science Research Associates. Grades 4-6 will be published later this year.

NIE is also supporting research and development on criterion-referenced testing and other alternatives to norm-referenced achievement tests for the purpose of educational exit testing and occupation entry selection.

In December 1975 NIE held a conference for test developers, test critics, and others to consider methods of identifying and eliminating bias in reading achievement tests. Since that time, NIE has funded two grants on test bias: one is a project to detect and eliminate the motivational causes of test bias and the other is a project to debias the language found in a widely used, standardized achievement test. And finally, NIE has an in-house project which applies new methods of qualitative data analysis to the intractable problem of detecting bias in test items.

NIE is also supporting work on sex bias in the assessment of a person's occupational interests and biases in educational exit and occupational entry testing--the latter being of particular importance given the Griggs v. Duke Power decision of the U.S. Supreme Court. The products of these studies include the NIE Guidelines on Sex-Fair Vocational-Interest Measurement and the Abt kit on the interpretation and usage of sex-fair vocational-interest tests.

122



GAO Recommendation

That the Secretary direct the Director, National Institute of Education, to improve dissemination of available NIE-funded information, which is intended to help in selecting the most appropriate standardized tests, thereby increasing State and local education agency officials' awareness and use of this information.

Department Comment

We agree that there is a need to improve the dissemination of NIE-funded material intended to help educators select appropriate standardized tests. The Institute's policy is not to force its products on school personnel, but we do intend to make school personnel familiar with the products that are available. In the case of our consumer's guides to standardized tests, that is, the test evaluation books produced for NIE by the Center for the Study of Evaluation at UCLA and mentioned in this GAO report, our Basic Skills Group and our Dissemination and Resources Group are collaborating in the dissemination of these products. Several approaches are or will be used. First, the test evaluation books are listed in the Institute's Catalog of NIE Education Products. This catalog has been offered free to each superintendent of schools and each district director of curriculum in the country. Five thousand copies have been distributed to superintendents. A similar distribution will now be made to the district directors of evaluation, who presumably will be particularly interested in the test evaluation books. In addition, we plan to use the newly formed Lab and Center R&D Exchange to give school personnel more information about the acquisition and use of test evaluation books (e.g., through brochures or workshops). This dissemination network has the potential of reaching 50 percent of the country's school systems. Still another dissemination network that will be used in a similar way is the one formed by our seven R&D utilization contractors (five State education agencies and two nongovernmental organizations). If the GAO report is correct in identifying a need for the test evaluation books--and we think that it is--these approaches should give the books much wider dissemination than they have had up to this point.

NIE is also sponsoring the dissemination of test information in more specialized areas. The Education and Work Group has supported consumer guides to standardized tests in career education and occupational preparation. Dissemination of products will also be built into the further work of this group to improve testing in career education. The Educational Equity Group has funded American Institutes for Research to develop a catalog reviewing assessment instruments for children of limited English-speaking ability at the K-6 levels. Descriptive information (author, publisher, research data available, etc.) and analyses of the appropriateness of the tests for use with bilingual children will be provided. All information will be comprehensible to educational practitioners. This contract will also identify those areas and levels for which existing tests are inadequate or nonexistent.

123



(See GAO note below)

GAO note: Deleted comments pertain to material presented in the draft report which has been revised or not included in the final report.

124


APPENDIX IV APPENDIX IV

ADDITIONAL SUGGESTIONS BY OE CONFEREES

FOR IMPROVING TESTING AND EVALUATION

As discussed in chapter 6, the Office of Education sponsored a special 4-day conference on "Achievement Testing of Disadvantaged and Minority Students for Educational Program Evaluation" in May 1976. The Office invited about 50 experts in testing, program evaluation, and related fields, including university and other researchers, and representatives of leading test publishers. Federal and local education agencies, as well as education and other interest groups, were also represented. The conference focus was on large-scale program evaluations of elementary and secondary school compensatory and desegregation programs--programs on which OE concentrates much of its effort. The purpose of the conference was to identify, define, and analyze the many problems associated with using standardized achievement tests in these programs and to develop interim and long-term solutions.

In addition to the suggestions discussed in chapter 6, conclusions and recommendations from the five small working groups formed at the conference's close included the following:

-- In the context of educational program evaluation: development, standards, administration, and use of standardized tests must be accounted for and monitored. Alternative approaches which should be explored include: (1) Federal legislation to establish a monitoring body with enforcement powers to oversee the testing practices of test developers, State and school district educational systems, researchers, and others, (2) an independent monitoring agency sponsored by major test developers composed of minority and organizational representatives, and (3) an enforceable testing code of ethics, including mandatory withdrawal of services by test producers in established cases of misuse of tests and test information.

-- There is need for federally funded studies to increase understanding of the nature and extent of biases in tests and how such biases might be reduced. Studies of this problem might include detailed studies of individual pupils in interviews or computer simulations of bias models.

125



-- There is no clear consensus on test bias definitions nor clear technical procedures for identifying a biased question or test. There is, nevertheless, enough documentation of public concern--including calls for cessation of testing--and empirical data to justify change and development of guidelines. It is imperative that the testing community join with other interested groups to agree on the steps to be taken to develop and use tests that are judged to be fair.

-- Specific guidelines are also needed for test administration and assessment of bilingual groups.

-- There clearly needs to be an extension of the present professional testing standards of the American Psychological Association to cover the use of achievement tests in program evaluation.

-- OE should support developing a procedures manual for determining the appropriateness of using achievement tests in program evaluation and properly selecting, administering, scoring, and interpreting data from such tests. Such a manual should include the degree to which other information must be used together with achievement tests to adequately describe program outcomes. This should be the first (and more immediately accomplishable) stage of a longer term effort to produce procedures manuals that address means other than achievement tests for collecting program evaluation data. For the longer term effort, more experimentation (field-testing) is needed to investigate alternative data collection modes to firmly establish them.

-- Publishers of standardized tests should give explicit step-by-step instructions in their users' or technical manual about how to use their tests correctly for various purposes and how to avoid misuse. For example, these purposes may include needs assessment, diagnosis and prescription, or project evaluation.

-- The use of grade-equivalent scores on standardized tests should be eliminated.

-- More tests ought to be developed for diagnosing educational problems and prescribing remedies.

126



-- The work statements for evaluations in Federal agency requests for proposals are not always adequate. Examples can be cited in which technical approaches have been overspecified by persons who perhaps have not fully understood either the technical or the practical problems involved. More time needs to be allocated for writing requests for proposals, and more professional review of them must be accomplished. Detailed technical and procedural specifications should never be included in work statements unless such specifications are the consensus of a panel of national experts in the field.

127


APPENDIX V APPENDIX V

PRINCIPAL OFFICIALS OF THE

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE

RESPONSIBLE FOR ACTIVITIES DISCUSSED IN THIS REPORT

                                           Tenure of office
                                           From          To

SECRETARY OF HEALTH, EDUCATION, AND
  WELFARE:
    Joseph Califano                        Jan. 1977     Present
    David Mathews                          Aug. 1975     Jan. 1977
    Caspar W. Weinberger                   Feb. 1973     Aug. 1975
    Frank C. Carlucci (acting)             Jan. 1973     Feb. 1973
    Elliot L. Richardson                   June 1970     Jan. 1973

ASSISTANT SECRETARY FOR EDUCATION:
    Mary Berry                             Apr. 1977     Present
    Philip Austin (acting)                 Jan. 1977     Apr. 1977
    Virginia Y. Trotter                    June 1974     Jan. 1977
    Charles B. Saunders, Jr.
      (acting)                             Nov. 1973     June 1974
    Sidney P. Marland, Jr.                 Nov. 1972     Nov. 1973

COMMISSIONER OF EDUCATION:
    Ernest L. Boyer                        Apr. 1977     Present
    William F. Pierce (acting)             Jan. 1977     Apr. 1977
    Edward Aguirre                         Oct. 1976     Jan. 1977
    William F. Pierce (acting)             Aug. 1976     Oct. 1976
    Terrel H. Bell                         June 1974     Aug. 1976
    John R. Ottina                         Aug. 1973     June 1974
    John R. Ottina (acting)                Nov. 1972     Aug. 1973
    Sidney P. Marland, Jr.                 Dec. 1970     Nov. 1972
    Terrel H. Bell (acting)                June 1970     Dec. 1970
    James E. Allen, Jr.                    May 1969      June 1970

DIRECTOR, NATIONAL INSTITUTE OF
  EDUCATION:
    John Christensen (acting)              July 1977     Present
    Emerson J. Elliott (acting)            Jan. 1977     July 1977
    Harold L. Hodgkinson                   July 1975     Jan. 1977
    Emerson J. Elliott (acting)            Oct. 1974     July 1975
    Thomas Glennan                         Oct. 1972     Oct. 1974

(104003)

128