
Ministry of Education

DRAFT Technical Report

of the

Pre- and Post-Pilot Testing for the

Continuous Assessment Programme in Lusaka, Southern and Western Provinces

Coordinated by the Examinations Council of Zambia

Research and Test Development Department

Under the Direction of the Continuous Assessment Steering and Technical Committees

Ministry of Education

Lusaka, Zambia October 2007


Table of Contents

ACKNOWLEDGMENTS

CHAPTER ONE: BACKGROUND
1.1 Introduction to Continuous Assessment
1.2 Definition of Continuous Assessment
1.3 Challenges in the Implementation of Continuous Assessment
1.4 Guidelines for Implementation of Continuous Assessment
1.5 Plan for Implementation of Continuous Assessment

CHAPTER TWO: EVALUATION METHODOLOGY
2.1 Objectives
2.2 Design
2.3 Sample
2.4 Instruments
2.5 Administration
2.6 Data Capture and Scoring
2.7 Data Analysis

CHAPTER THREE: ASSESSMENT RESULTS
3.1 Psychometric Characteristics
3.2 Classical Test Theory
3.3 Item Response Theory
3.4 Scaled Scores
3.5 Vertical Scaled Scores
3.6 Comparison between Pilot and Comparison Groups
3.7 Comparison across Regions
3.8 Performance Categories

CHAPTER FOUR: SUMMARY AND CONCLUSIONS

APPENDIX 1: ITEM STATISTICS BY SUBJECT
APPENDIX 2: SCORES AND FREQUENCIES - GRADE 5 PRE-TESTS
APPENDIX 3: SCORES AND FREQUENCIES - GRADE 5 POST-TESTS
APPENDIX 4: HISTOGRAMS BY SUBJECT AND GROUP


ACKNOWLEDGMENTS

The Continuous Assessment Joint Steering and Technical Committees and the Examinations Council of Zambia wish to express profound gratitude for the professional and material support provided by the Provincial Education Offices, District Education Boards, Educational Zone staff in the different districts, school administrators, teachers and pupils. Without this support, the baseline and post-pilot assessment exercises would not have succeeded.

We also thank the management of the Directorate for Curriculum and Assessment in the Ministry of Education for providing professional support for the Continuous Assessment programme in general and the assessment exercises in particular. We specifically thank the Director for Standards and Curriculum, the Director for the Examinations Council of Zambia, and the Chief Curriculum Specialist for allowing their personnel to take part in the assessment exercise.

Finally, we express our appreciation to USAID and the EQUIP2 Project for providing the finances and technical support for the Continuous Assessment programme in Zambia.

All of the participants and stakeholders listed above have played a crucial role not only in developing and implementing the Continuous Assessment programme, but also in supporting the quantitative evaluation of the programme presented in this technical paper. It is because of their interest in improving student learning outcomes that the Continuous Assessment programme has had the necessary financial, administrative and technical support. Our hope is that the programme will prove valuable for all of the pupils and teachers in Zambian schools.


Chapter One: Background

1.1 Introduction to Continuous Assessment

Over the years in Zambia, the education system has not been able to provide enough spaces for all learners to proceed from Grade 7 to Grade 8, from Grade 9 to Grade 10, and from Grade 12 to higher learning institutions. The system has used examinations to select those who proceed to the next level and to certify candidates; however, this has been done without formally considering school-based assessment as a component of the final examinations, with the exception of some practical subjects.

The 1977 Educational Reforms explicitly provided for the use of Continuous Assessment (CA). Later, national policy documents, particularly Educating Our Future (1996) and Ministry of Education’s Strategic Plan 2003-2007, stated the need for integrating school-based continuous assessment into the education system, including the development of strategies to combine CA results with the final examination results for purposes of pupil certification and selection.

Furthermore, the national education policy, as stated in Educating Our Future, stipulated that the Ministry of Education will develop procedures that will enable teachers to standardise their assessment methods and tasks for use as an integral part of school-based CA. The education policy document also stated that the Directorate of Standards, in cooperation with the Examinations Council of Zambia (ECZ), will determine how school-based CA can be better conducted so that it can contribute to the final examination results for pupil certification and promotion to the subsequent levels. The policy also stated that the Directorate of Standards, with input from the ECZ, will determine when school-based CA can be introduced.

In order to set in motion the implementation of school-based CA, the ECZ convened a preparatory workshop from 16th to 22nd November 2003 in Kafue. Ninety (90) participants from various stakeholders' institutions took part. The objectives of the preparatory workshop were to:

• Recommend a plan for developing and implementing CA;
• Recommend a training plan for preparing teachers in implementing CA;
• Explore ways of ensuring transparency, reliability, validity and comparability in using CA results;
• Agree on common assessment tasks and learning outcomes to be identified in the syllabuses for CA;
• Discuss the development of a teacher's manual on CA; and
• Discuss the nature of summary forms for recording marks that should be provided to schools.


1.2 Definition of Continuous Assessment

Continuous assessment is defined as an on-going, diagnostic, classroom-based process that uses a variety of assessment tools to measure learner performance. CA is a formative evaluation tool conducted during the teaching and learning process with the aim of influencing and informing the overall instructional process. It is the assessment of the whole learner on an ongoing basis over a period of time, where cumulative judgments of the learner’s abilities in specific areas are made in order to facilitate further positive learning (Le Grange & Reddy, 1998).1

The data generated from CA should be useful in assisting teachers to plan for the learning of individual pupils. It should also assist teachers in identifying the unique understanding of each learner in a classroom by informing the pupil of the level of instructional attainment, helping to target opportunities that promote learning, and reducing anxiety and other problems associated with examinations. CA has been shown to have positive impacts on student learning outcomes in hundreds of educational settings (Black & Wiliam, 1998).2

CA is made up of a variety of assessment methods that can be formal or informal. It takes place during the learning process when it is most necessary, making use of criterion referencing rather than norm referencing and providing feedback on how learners are changing.

1.3 Challenges in the Implementation of Continuous Assessment

There are several areas in which the implementation of CA in the classroom will present challenges. Some of these are listed below.

• Large class sizes in most primary schools are a major problem. It is common to find classes of 60 and above in Zambian classrooms. Teachers are expected to mark and keep records of the progress of all of these learners.
• CA can take a lot of time for teachers. As a result, teachers get concerned that time spent on remediation and enrichment is excessive, and many teachers do not believe that they would finish the syllabus with CA.
• CA will not be successfully implemented if there are inadequate teaching resources/equipment in schools. Teachers need materials and equipment such as stationery, computers and photocopiers (and electricity).
• There may be cases of resistance from school administrators and teachers if they feel left out of the process of developing the CA programme.
• CA requires the cooperation of communities and parents. If they do not understand what is expected of them, they may resist and hence affect the success of the programme.

1 Le Grange, L.L. & Reddy, C. (1998). Continuous Assessment: An Introduction and Guidelines to Implementation. Cape Town, South Africa: Juta.
2 Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.


1.4 Guidelines for Implementation of Continuous Assessment

A teachers' guide on the implementation of continuous assessment at the basic school level was developed with the involvement of Curriculum Specialists, Standards Officers, Examinations Specialists, Provincial Education Officials, District Education Officials, Zonal In-Service Training Providers, school administrators and teachers.

The Teachers’ Guide on CA comprises the following:

• Sample record forms;
• Description of the CA schemes;
• Instructions for preparing and administering assessment materials;
• Marking and moderation of the CA marks;
• Recording and reporting assessment results; and
• Monitoring of the implementation of the CA.

The Teachers’ Guide also specifies the roles of stakeholders as follows:

Teachers

• Plan assessment tasks, projects and mark schedules;
• Teach, guide and supervise pupils in implementing given tasks;
• Conduct the assessment in line with given guidelines;
• Mark and record the results;
• Provide correction and remedial work to the pupils;
• Inform the head teacher and parents on the performance of the child;
• Advise and counsel the pupils on their performance in class tasks;
• Take part in internal moderation of pupils' results.

School Administrators

• Provide an enabling environment, such as the procurement of teaching and learning materials;
• Act as links between the school and other stakeholders like ECZ, traditional leaders, politicians and parents;
• Ensure validity, reliability and comparability through moderation of CA;
• Compile CA results and hand them to ECZ.

Parents

• Provide professional, moral, financial and material support to pupils;
• Continuously monitor their children's attendance and performance;
• Take part in making and enforcing school rules;
• Attend open days and witness the giving of prizes (rewards) to outstanding pupils in terms of performance.


Standards Officers

• Interpret Government of Zambia policy on education;
• Monitor education policy implementation at various levels of the education system;
• Advise and evaluate the extent to which the education objectives have been achieved;
• Ensure that acceptable assessment practices are conducted;
• Monitor the overall standards of education.

Guidance Teachers/School Counsellors

• Prepare and store record cards for CA;
• Counsel pupils, teachers and parents/guardians on CA and feedback;
• Take care of the pupils' psycho-social needs;
• Make referrals for pupils to access other specialized assistance/support.

Heads of Department/Senior Teachers/Section Heads

• Monitor and advise teachers in the planning, setting, conducting, marking and recording of CA results;
• Ensure validity, reliability and dependability of CA by conducting internal moderation of results;
• Hold departmental meetings to analyze the assessment;
• Provide or make available the teaching and learning materials;
• Compile a final record of CA results and hand them over to Guidance Teachers for onward submission to the ECZ.

District Resource Centre Coordinators

• Ensure adequate in-service training for teachers in planning, conducting, marking, moderating and recording results at school level in the district;
• Monitor the conduct of CA in the schools and district;
• Professionally guide teachers to ensure provision of quality education at school level.

Provincial Resource Centre Coordinators

• Ensure adequate in-service training for teachers for them to be effective in planning, conducting, marking, moderating and recording CA results;
• Monitor the conduct of CA in the province;
• Professionally guide teachers to ensure provision of quality education at provincial level.

Examinations Specialist

• Analyse and moderate CA results;
• Integrate CA results with terminal examination results;
• Determine grade boundaries;
• Certify the candidates;
• Disseminate the results of candidates.

Monitors

As monitors of the CA programme, various officials and stakeholders will look out for the following documents and information:

• Progress chart;
• Record of CA results and analysis;
• Marked evidence of pupils' CA work on remedial activities;
• Evaluating gender performance;
• Pupil's Record Cards;
• CA plans or schedules and schemes;
• Evidence of pupils' work;
• CA administration;
• Evidence of remedial work;
• Availability of planned remedial work in the classroom;
• Availability of the teacher's guide;
• Sample CA tasks;
• Evidence of a variety of CA tasks;
• Teacher's record of pupils' performance.

1.5 Plan for Implementation of Continuous Assessment

CA in Zambia is planned to roll out over a period of several years. This will allow for proper stakeholder support and evaluation. The following list provides a brief timeline of important CA activities through 2008:

• Creation of CA Steering and Technical Committees (2005);
• Development of assessment schemes, teacher's guides, model assessment tasks booklets and recordkeeping forms (2005);
• Design of quantitative evaluation methodology with focus on student learning outcomes (2005);
• Implementation of CA pilot in Phase 1 schools: Lusaka, Southern and Western regions (2006);
• Baseline report on student learning outcomes (2006);
• Implementation of CA pilot in Phase 2 schools: Central, Copperbelt and Eastern regions (2007);
• Expansion of modified CA pilot to community schools (2007);
• Post-test report on student learning outcomes (2007);
• Implementation of CA pilot in Phase 3 schools: Luapula, Northern and Northwestern regions (2008);
• Discussion of scaling up of CA pilot and systems-level planning for combining Grade 7 end-of-cycle summative test scores with CA scores for selection and certification purposes (2008).


Chapter Two: Evaluation Methodology

2.1 Objectives

The main objective of the quantitative evaluation is to determine whether the CA programme has had positive effects on student learning outcomes. The evaluation allows for a determination of whether pupils’ academic performance has changed as a result of the CA intervention, as well as the extent of the change in performance.

2.2 Design

The evaluation design is quasi-experimental, with pre-test and post-tests administered to intervention (pilot) and control (comparison) groups. It features a pre-test at the beginning of Grade 5 and post-tests at the end of Grades 5, 6, and 7. The pilot and comparison groups will be compared at each time point in 6 subject areas to see if there are differences in test scores from the baseline to the post-tests by group (see Figures 1 and 2 below).3

Figure 1: Pre-Test and Post-Test, Pilot and Control Group Design

[Figure: a 2 x 4 grid of boxes showing the Pilot Group and the Control Group each assessed at the Grade 5 pre-test and the Grade 5, Grade 6 and Grade 7 post-tests.]

Figure 2: Expected Results from the Evaluation

[Figure: line chart of expected scaled scores (200-650) for the Pilot and Control groups across the G5 pre-test and the G5, G6 and G7 post-tests.]

3 For more information, refer to the Summary of the Continuous Assessment Program August 2007 by the Examinations Council of Zambia and the EQUIP2-Zambia project.


With the matched pairs random assignment design, it was expected that the two groups, pilot and control, would have similar mean scores on the pre-test. However, with a successful intervention, it was expected that the pilot group would score higher than the control group on the subsequent post-tests.

2.3 Sample

The sample included all the 2006 (pre-test) and 2007 (post-test) Grade 5 basic school pupils in Lusaka, Southern and Western Provinces in the 24 pilot (intervention) and 24 comparison (control) schools. The schools were chosen using matched pairs by geographic location, school size, and grade levels as matching variables, followed by random assignment to pilot and comparison status. CA activities were implemented in pilot schools but not in the comparison schools.
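For illustration, the matched-pairs assignment described above can be sketched in a few lines of Python. The school names and pairs below are hypothetical; the report does not list the actual pairings.

```python
# Minimal sketch of matched-pairs random assignment: within each pair of
# schools (matched on location, size and grade levels), one school is
# randomly assigned to the pilot group and the other to the comparison group.
import random

matched_pairs = [("School A1", "School A2"),
                 ("School B1", "School B2"),
                 ("School C1", "School C2")]  # hypothetical pairs

random.seed(2006)  # fixed seed so the illustration is reproducible
pilot, comparison = [], []
for first, second in matched_pairs:
    if random.random() < 0.5:
        pilot.append(first)
        comparison.append(second)
    else:
        pilot.append(second)
        comparison.append(first)

print("Pilot:", pilot)
print("Comparison:", comparison)
```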

2.4 Instruments

Student achievement for the Grade 5 baseline and post-pilot administrations was measured using multiple choice tests with 30 items (30 points per test). The test development process included the following steps:

• Review of the curriculums for each subject area;
• Development of test specifications;
• Development of items;
• Piloting of items;
• Data reviews of item statistics;
• Forms pulling (selecting items for final test papers).

The test instruments were developed by teams of Curriculum Specialists, Standards Officers, Examination Specialists and Teachers. The baseline tests (pre-tests) were developed based on the Grade 4 syllabus and the post-pilot tests (post-tests) were developed based on the Grade 5 syllabus.

2.5 Administration

The ECZ organized the administration of both pre-test and post-test papers. Teams comprising an Examination Specialist, a Standards Officer and a Curriculum Specialist were sent to each region to supervise the administration. District Education officials, School Administrators and Teachers were involved in the actual administration of the tests. All of the Grade 5 pupils in the pilot and comparison schools sat for six tests, one in each of the six subject areas (English, Mathematics, Social and Development Studies, Integrated Science, Creative and Technology Studies and Community Studies). The baseline tests (Grade 4 syllabus) were administered to the students at the beginning of Grade 5, in February 2006. The post-pilot tests (Grade 5 syllabus) were administered in February 2007. Note that there will be two more administrations of post-tests for the cohort of students in the three provinces. These will take place in February 2008 (Grade 6 syllabus) and November 2008 (Grade 7 syllabus). This process will be repeated in Phase 2 and Phase 3 schools (see Table 1 below).

Table 1: Implementation Plan for CA Pilot

Phase | 2006 | 2007 | 2008 | 2009 | 2010
Phase 1 (Lusaka, Southern, Western) | Grade 5 | Grade 6 | Grade 7 | |
Phase 2 (Central, Copperbelt, Eastern) | | Grade 5 | Grade 6 | Grade 7 |
Phase 3 (Luapula, Northern, Northwestern) | | | Grade 5 | Grade 6 | Grade 7

2.6 Data Capture and Scoring

Data were captured using Optical Mark Readers (OMR) and scored using the Faim software at the ECZ. Through this process, item scores for all students were converted into electronic format and data files were produced for analysis.

2.7 Data Analysis

Data were analysed using the Statistical Package for the Social Sciences (SPSS). Scores and frequencies by subject were generated, and the analysed data were presented in tabular, chart and graphical forms. Additional analyses were conducted using WINSTEPS (item response theory Rasch modelling) software, and SPSS was used for scaling the pupils' scores.


Chapter Three: Assessment Results

3.1 Psychometric Characteristics

An initial step in determining the results from the assessments was to conduct analyses to determine the psychometric characteristics of the assessments. Both the Standards for Educational and Psychological Testing (1999)4 and the Code of Fair Testing Practices in Education (2004)5 include standards for identifying quality items. Items should assess only knowledge or skills that are identified as part of the domain being tested and should avoid assessing irrelevant factors (e.g., ambiguous wording, grammatical errors, or sensitive content or language). Both quantitative and qualitative analyses were conducted to ensure that items on both the Grade 5 baseline and post-pilot tests met satisfactory psychometric guidelines.

The statistical evaluations of the items are presented in two parts, using classical test theory (CTT) and item response theory (IRT), which is sometimes called modern test theory.6 The two measurement models generally provide similar results, but IRT is particularly useful for test scaling and equating. CTT analyses included (1) the difficulty index (p-value), (2) the discrimination index (item-test correlations), and (3) test reliability (Cronbach's Alpha as an estimate of internal consistency reliability). IRT analyses included (1) calibration of items and (2) examination of the item difficulty index (i.e., the b-parameter).

3.2 Classical Test Theory

Difficulty Indices (p)

All multiple-choice items were evaluated in terms of item difficulty according to standard classical test theory practices. Difficulty was defined as the average proportion of points achieved on an item by the students. It was calculated by obtaining the average score on an item and dividing by the maximum possible score for the item. Multiple-choice items were scored dichotomously (1 point vs. no points, or correct vs. incorrect), so the difficulty index was simply the proportion of students who correctly answered the item. All items on the Grade 5 pre-tests and post-tests had four response options. Table 2 shows the average p-values for each test. Note that this may also be calculated by taking the average raw score of all students divided by the maximum points (30) per test.

4 American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association. 5 Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association. 6 For more information, see Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Harcourt Brace.


Table 2: Overall Test Difficulty Estimates by Subject Area

Subject Area | Pre-test # Items | Pre-test Mean p-value | Post-test # Items | Post-test Mean p-value
English | 30 | 0.40 | 30 | 0.37
Social and Developmental Studies | 30 | 0.34 | 30 | 0.42
Mathematics | 30 | 0.41 | 30 | 0.40
Integrated Science | 30 | 0.33 | 30 | 0.36
Creative and Technology Studies | 30 | 0.35 | 30 | 0.36
Community Studies | 30 | 0.32 | 30 | 0.37
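To make the p-value calculation concrete, here is a minimal Python sketch using a hypothetical matrix of dichotomous item scores. The report's actual analyses were done in SPSS and WINSTEPS; this is illustrative only.

```python
# Minimal sketch: item difficulty (p-value) for dichotomously scored items.
# `responses` is a hypothetical (students x items) array of 0/1 scores.
import numpy as np

responses = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 1, 1],
                      [1, 1, 1]])

p_values = responses.mean(axis=0)  # proportion correct per item
mean_p = p_values.mean()           # average test difficulty, as in Table 2
print(p_values, mean_p)
```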

Items that are answered correctly by almost all students provide little information about differences in student ability, but they do indicate knowledge or skills that have been mastered by most students. Similarly, items that are correctly answered by very few students may indicate knowledge or skills that have not yet been mastered by most students, but such items provide little information about differences in student ability. In general, to provide the best measurement, difficulty indices should range from near-chance performance of about 0.20 (for four-option, multiple-choice items) to 0.90. The item difficulty indices for both Grade 5 pre-tests and post-tests were within generally acceptable and expected ranges (see Appendix 1 for a complete list of p-values for all items on each test).

Item Discrimination (Item-Test or Point-Biserial Correlations)

One desirable feature of an item is that higher performing students do better on the item than lower performing students. The correlation between student performance on a single item and total test score is a commonly used measure of this characteristic of an item. Within classical test theory, the item-test (or point-biserial) correlation is referred to as the item's discrimination because it indicates the extent to which successful performance on an item discriminates between high and low scores on the test. The theoretical range of these statistics is -1 to +1, with a typical range from 0.2 to 0.6. Discrimination indices can be thought of as measures of how closely an item assesses the same knowledge and skills assessed by other items contributing to the total score. Discrimination indices for Grade 5 are presented in Table 3.

Table 3: Overall Test Discrimination Estimates by Subject Area

Subject Area | Pre-test # Items | Pre-test Mean Pt-bis | Post-test # Items | Post-test Mean Pt-bis
English | 30 | 0.46 | 30 | 0.48
Social and Developmental Studies | 30 | 0.38 | 30 | 0.45
Mathematics | 30 | 0.37 | 30 | 0.41
Integrated Science | 30 | 0.35 | 30 | 0.43
Creative and Technology Studies | 30 | 0.38 | 30 | 0.44
Community Studies | 30 | 0.29 | 30 | 0.43


On average, the discrimination indices were within acceptable and expected ranges (i.e., 0.20 to 0.60). The positive discrimination indices indicate that students who performed well on individual items tended to perform well overall on the test. There were no items on the instruments that had near-zero discrimination indices (see Appendix 1 for a complete list of the point-biserial correlations for all items on each pre-test and post-test per subject area).
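For illustration, the item-test (point-biserial) correlation can be computed as a plain Pearson correlation between an item's 0/1 scores and the total test score. The matrix below is hypothetical, and the item is kept in the total (the report does not say whether a corrected item-total was used).

```python
# Minimal sketch: item-test (point-biserial) correlations for a hypothetical
# (students x items) matrix of 0/1 scores.
import numpy as np

responses = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 1, 1],
                      [1, 1, 1]])
totals = responses.sum(axis=1)

pt_bis = np.array([np.corrcoef(responses[:, i], totals)[0, 1]
                   for i in range(responses.shape[1])])
print(pt_bis)
```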

Test Reliabilities

Although an individual item's statistical properties are an important focus, a complete evaluation of an assessment must also address the way items function together and complement one another. There are a number of ways to estimate an assessment's reliability. One possible approach is to give the same test to the same students at two different points in time. If students receive the same scores on each test, then the extraneous factors affecting performance are small and the test is reliable. (This is referred to as test-retest reliability.) A potential problem with this approach is that students may remember items from the first administration or may have gained (or lost) knowledge or skills in the interim between the two administrations.

A solution to the 'remembering items' problem is to give a different, but parallel, test at the second administration. If the student scores on each test correlate highly, the test is considered reliable. (This is known as alternate forms reliability, because an alternate form of the test is used in each administration.) This approach, however, does not address the problem that students may have gained (or lost) knowledge or skills in the interim between the two administrations. In addition, the practical challenges of developing and administering parallel forms generally preclude the use of parallel forms reliability indices.

One way to address these problems is to split the test in half and then correlate students' scores on the two half-tests; this in effect treats each half-test as a complete test. By doing this, the problems associated with an intervening time interval, and of creating and administering two parallel forms of the test, are alleviated. This is known as a split-half estimate of reliability. If the two half-test scores correlate highly, items on the two half-tests must be measuring very similar knowledge or skills. This is evidence that the items complement one another and function well as a group. This also suggests that measurement error will be minimal.

The split-half method requires a judgment regarding which items contribute to which half-test score. This decision may have an impact on the resulting correlation; different splits will give different estimates of reliability. Cronbach (1951)7 provided a statistic, α (alpha), that avoids this concern about the split-half method. Cronbach's α gives an estimate of the average of all possible splits for a given test. Cronbach's α is often referred to as a measure of internal consistency because it provides a measure of how well all the items in the test measure one single underlying ability. Cronbach's α is computed using the following formula:

7Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16, 297–334.


$$\alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n}\sigma^{2}(Y_i)}{\sigma_x^{2}}\right]$$

where i indexes the items, n is the total number of items, σ²(Yᵢ) is the variance of item i, and σ²ₓ is the total test variance.

For standardized tests, reliability estimates should be approximately 0.80 or higher. According to Table 4, the reliabilities for the pre-tests ranged from 0.63 (Community Studies) to 0.87 (English). The reliability estimate for Community Studies was low due to the absence of a national curriculum for use in test construction. In contrast, the reliability estimates for the post-tests ranged from 0.83 (Mathematics) to 0.89 (English). It is likely that the post-tests had higher reliability estimates because the test developers had more experience than when they developed the baseline tests.

Table 4: Test Reliability Estimates by Subject Area

Subject Area | Pre-test # Items | Pre-test Coefficient Alpha | Post-test # Items | Post-test Coefficient Alpha
English | 30 | 0.87 | 30 | 0.89
Social and Developmental Studies | 30 | 0.80 | 30 | 0.87
Mathematics | 30 | 0.79 | 30 | 0.83
Integrated Science | 30 | 0.76 | 30 | 0.85
Creative and Technology Studies | 30 | 0.80 | 30 | 0.86
Community Studies | 30 | 0.63 | 30 | 0.85
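For illustration, Cronbach's alpha as defined by the formula above can be computed directly from an item score matrix. The data below are hypothetical, and sample variances (ddof=1) are used, which is one common convention; the report does not state which convention was applied.

```python
# Minimal sketch of Cronbach's alpha for a (students x items) matrix of
# dichotomous scores, following the formula above.
import numpy as np

def cronbach_alpha(responses):
    n_items = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)      # individual item variances
    total_var = responses.sum(axis=1).var(ddof=1)  # total test score variance
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

responses = np.array([[1, 0, 1, 1],
                      [1, 1, 0, 1],
                      [0, 1, 1, 0],
                      [1, 1, 1, 1],
                      [0, 0, 0, 1]])  # hypothetical scores
print(cronbach_alpha(responses))
```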

3.3 Item Response Theory

Item Response Theory (IRT) uses mathematical models to define a relationship between an unobserved measure of student ability, usually referred to as theta (θ), and the probability (p) of getting a dichotomous item correct. In IRT, it is assumed that all items are independent measures of the same construct or ability (i.e., the same θ). The process of determining the specific mathematical relationship between θ and p is referred to as item calibration. Once items are calibrated, they are defined by a set of parameters which specify a non-linear relationship between θ and p.8

8 For more information about item calibration, see the following references: Lord, F.M. and Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Boston, MA: Addison-Wesley; Hambleton, R.K. and Swaminathan, H. (1984). Item Response Theory: Principles and Applications. New York: Springer.


For the CA programme, a 1-parameter or Rasch model was implemented. The Rasch model defines the probability of a correct response to item i by a student with ability level θ as:

$$P_i(\theta) = \frac{\exp\left(D(\theta - b_i)\right)}{1 + \exp\left(D(\theta - b_i)\right)}$$

where i is the item, bᵢ is the item difficulty, and D is a normalizing constant equal to 1.701.

In IRT, item difficulty (bᵢ) and student ability (θ) are measured on a scale of −∞ to +∞. A scale of −3.0 to +3.0 is used operationally in educational assessment programmes, with −3.0 representing low student ability or an easy item and +3.0 representing high student ability or a difficult item. The bᵢ parameter for an item is the position on the ability scale where the probability of a correct response is 0.50.

The WINSTEPS program was the software used to do the IRT analyses. The item parameter files resulting from the analyses are provided in Appendices 2 and 3. This presentation is direct output from WINSTEPS.9 Raw scores were then scaled using the item response theory model, with a range of 100-500 (see Appendices 2 and 3 for the raw score to scale score conversion tables for each subject area).
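For illustration, the Rasch response function above is easy to evaluate directly. This sketch is not the WINSTEPS calibration itself, only the model's probability formula.

```python
# Minimal sketch of the 1-parameter (Rasch) response probability, with the
# normalizing constant D = 1.701 from the text.
import math

def rasch_probability(theta, b, D=1.701):
    """Probability of a correct response given ability theta and difficulty b."""
    return math.exp(D * (theta - b)) / (1 + math.exp(D * (theta - b)))

print(rasch_probability(0.0, 0.0))   # 0.50 when ability equals item difficulty
print(rasch_probability(1.0, -1.0))  # an easy item for an able student
```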

3.4 Scaled Scores

The Grade 5 pre-test and post-test scores in each subject area are reported on a scale that ranges from 100 to 500. Students' raw scores, or total number of points, on the pre-tests and post-tests are translated to scaled scores using a data analysis process called scaling. Scaling simply converts raw points from one scale to another. In the same way that distance can be expressed in miles or kilometres, or monetary value can be expressed in U.S. dollars or Zambian Kwacha, student scores on both pre- and post-tests can be expressed as raw scores (i.e., number of points) or scaled scores.

Cut points were established on the raw score scale for both the pre-tests and post-tests (see Section 3.8, "Performance Categories", for an explanation of how these cut points were determined). Once the raw score cut points were determined via standard setting, the next step was to compute theta cuts using the test characteristic curve (TCC) mapping procedure and then calculate the transformation coefficients used to place students' raw scores onto the theta scale and then onto the scaled score used for reporting. As previously stated, student scores on the assessments are reported in integer values from 100 to 500, with two scores representing cut scores on each assessment. The two cut points (Unsatisfactory/Satisfactory and Satisfactory/Advanced) were pre-set at 250 and 350, respectively.

9 See the WINSTEPS user’s manual for additional details regarding this output (at http://www.winsteps.com).


Figure 3: Scaled Score Conversion Procedure

1. Raw score cut scores (from standard setting);
2. Conversion of the raw score cuts into theta cuts θ₁ and θ₂ using TCC mapping;
3. Calculation of the scaled score constants (m and b) using the theta cuts (θ₁, θ₂) and the scaled score cuts (250 and 350);
4. Calculation of the scaled score as m(θ) + b.

The scaled scores are obtained by a simple linear transformation of the theta score, using the values of 250 and 350 on the scaled score metric and the associated theta cut points to define the transformation. The scaling coefficients were calculated using the following formulae:

$$b = 250 - m\theta_1, \qquad b = 350 - m\theta_2, \qquad m = \frac{350 - 250}{\theta_2 - \theta_1}$$

where m is the slope of the line relating the theta and scaled scores, b is the intercept, θ₁ is the cut score on the theta metric corresponding to the raw score cut for Unsatisfactory/Satisfactory, and θ₂ is the cut score on the theta metric corresponding to the raw score cut for Satisfactory/Advanced. Scaled scores were then calculated using the following linear transformation (see Figure 3):

$$\text{Scaled Score} = m\theta + b$$

where θ represents a student's theta (or ability) score. The values obtained using this formula were rounded to the nearest integer and then truncated such that no student received a score below 100 or above 500.

It is important to note that converting from raw scores to scaled scores does not change the students' performance-level classifications. For the Zambia CA programme, a score of 250 is the cut score between Unsatisfactory and Satisfactory, and a score of 350 is the cut score between Satisfactory and Advanced. This is true regardless of subject area, grade, or year. Scaled scores supplement the pre-test and post-test results by providing information about the position of a student's results within a performance level. For instance, if the range for a performance level is 200 to 250, a student with a scaled score of 245 is near the top of the performance level and close to the next higher performance level. School-level scaled scores are calculated by computing the average of the student-level scaled scores. Table 5 provides the raw score averages for each of the subject areas, while Table 6 provides the same information in scaled scores.
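For illustration, the linear scaling step above can be sketched as follows. The theta cuts are hypothetical values standing in for the cuts obtained from TCC mapping, which the report does not list.

```python
# Minimal sketch of the theta-to-scaled-score transformation, including the
# rounding and 100-500 truncation described above (hypothetical theta cuts).
theta1, theta2 = -0.8, 0.9           # illustrative Unsat/Sat and Sat/Adv cuts

m = (350 - 250) / (theta2 - theta1)  # slope
b = 250 - m * theta1                 # intercept (equivalently 350 - m * theta2)

def scaled_score(theta):
    ss = round(m * theta + b)
    return max(100, min(500, ss))    # truncate to the reporting range

print(scaled_score(-0.8), scaled_score(0.9), scaled_score(3.5))  # 250 350 500
```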

Table 5: Grade 5 Mean Raw Scores by Subject Area

Subject Area | # Items | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N | Post-test Mean | Post-test Std. Dev.
English | 30 | 3798 | 12.2 | 6.5 | 4025 | 11.7 | 7.1
Social and Developmental Studies | 30 | 3962 | 10.1 | 5.3 | 4104 | 13.2 | 6.6
Mathematics | 30 | 3883 | 12.3 | 5.3 | 4127 | 12.4 | 5.8
Integrated Science | 30 | 4039 | 9.9 | 4.9 | 4135 | 11.1 | 6.3
Creative and Technology Studies | 30 | 4032 | 10.5 | 5.3 | 4097 | 11.7 | 6.2
Community Studies | 30 | 4037 | 9.5 | 4.0 | 4141 | 11.2 | 6.4

According to Table 5, the overall mean raw scores (with both pilot and comparison groups taken together) across the subject areas on the pre-test ranged from 9.5 (Community Studies) to 12.3 (Mathematics) out of a possible 30 points. In contrast, the overall mean raw scores for the post-tests ranged from 11.1 (Integrated Science and Creative and Technology Studies) to 13.2 (Social and Developmental Studies). From Table 6, the scaled score averages for the Grade 5 pre-tests ranged from 214 (Community Studies) to 239 (English) on the 100-500 scale. In contrast, the scaled score averages for the post-tests ranged from 233 (English) to 262 (Mathematics).

Table 6: Grade 5 Mean Scaled Scores by Subject Area

Subject Area | # Items | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N | Post-test Mean | Post-test Std. Dev.
English | 30 | 3798 | 238.8 | 83.7 | 4025 | 233.4 | 88.1
Social and Developmental Studies | 30 | 3962 | 230.5 | 86.2 | 4104 | 241.2 | 83.9
Mathematics | 30 | 3883 | 222.4 | 89.2 | 4127 | 261.9 | 72.6
Integrated Science | 30 | 4039 | 226.5 | 80.2 | 4135 | 245.7 | 73.7
Creative and Technology Studies | 30 | 4032 | 224.1 | 85.3 | 4097 | 244.3 | 83.0
Community Studies | 30 | 4037 | 214.0 | 83.7 | 4141 | 236.9 | 72.3

As stated earlier, the scaled score is a simple linear transformation of the theta score, using the values of 250 and 350 on the scaled score metric. A student's relative position on the raw score metric does not change due to this scale transformation.

Note that the primary interest of this evaluation is not whether the raw scores and/or scaled scores increase or decrease from pre-test to post-test. These differences will occur mainly through variations in test difficulty. The main analysis will compare the relative changes in the two groups, i.e., pilot and comparison, across the two time points, i.e., pre-test to post-test. At a later point, post-tests will also be conducted when the cohort of students is in Grade 6 and Grade 7, followed by extended analyses for the two additional time points.

3.5 Vertical Scaled Scores

In vertical scaling, tests that vary in difficulty level, but that are intended to measure similar constructs, are placed on the same scale. Placing different tests on the same scale can be implemented in a number of ways, such as linking items across the tests or social moderation. For the CA programme, a social moderation (Linn, 1993) procedure was employed for vertical scaling.10

In social moderation, assessments are developed in reference to a common content framework. Performance of individual students, and schools, is measured against a single set of common standards. For Zambia, an analysis of the Grade 4 and 5 curriculums showed that the content was vertically aligned, i.e., students were expected to progress in their learning along the same constructs from one grade level to the next. This allowed the test developers to link the pre-tests and post-tests through common performance standards. The visual representation of the vertical scaling scheme for the CA programme is shown below.

Figure 4: Vertical Scaling Scheme

Grade 5 Pre-test: cut scores at 250 and 350
Grade 5 Post-test: cut scores at 350 and 450
Grade 6 Post-test: cut scores at 450 and 550
Grade 7 Post-test: cut scores at 550 and 650

In other words, students who were classified as Advanced on the Grade 5 pre-test (i.e., end of Grade 4 syllabus) would be considered Satisfactory on the Grade 5 post-test (i.e., end of Grade 5 syllabus), students who were classified as Advanced on the Grade 5 post-test would be considered Satisfactory on the Grade 6 post-test (i.e., end of Grade 6 syllabus), and so on through Grade 7.

10 Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83-102.


In the vertical scaled score matrix, students who earned a grade-level scaled score of 250 on the Grade 5 post-test would also earn a vertical scaled score of 350 (because 350 is the equivalent grade-level scaled score on the Grade 5 pre-test). Therefore, grade-level scaled scores and vertical scaled scores differ by a constant value of 100 points. The mean vertical scaled scores for each subject are shown in Table 7.

Table 7: Grade 5 Mean Vertical Scaled Scores by Subject Area

Subject Area | # Items | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N | Post-test Mean | Post-test Std. Dev.
English | 30 | 3798 | 238.8 | 83.7 | 4025 | 333.4 | 88.1
Social and Developmental Studies | 30 | 3962 | 230.5 | 86.2 | 4104 | 341.2 | 83.9
Mathematics | 30 | 3883 | 222.4 | 89.2 | 4127 | 361.9 | 72.6
Integrated Science | 30 | 4039 | 226.5 | 80.2 | 4135 | 345.6 | 73.7
Creative and Technology Studies | 30 | 4032 | 224.1 | 85.3 | 4097 | 344.4 | 83.0
Community Studies | 30 | 4037 | 214.0 | 83.7 | 4141 | 336.9 | 72.3
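This constant-offset relationship can be illustrated with a minimal Python sketch. Note that the offsets for the Grade 6 and Grade 7 post-tests (200 and 300) are an assumption extrapolated from the cut points in Figure 4, not values stated directly in the report.

```python
# Minimal sketch of the constant-offset relationship between grade-level and
# vertical scaled scores. Only the Grade 5 post-test offset (100) is stated in
# the text; the later offsets are extrapolated from Figure 4.
OFFSETS = {
    "G5 pre-test": 0,
    "G5 post-test": 100,   # stated in the text
    "G6 post-test": 200,   # assumption, from the 450/550 cuts in Figure 4
    "G7 post-test": 300,   # assumption, from the 550/650 cuts in Figure 4
}

def vertical_scaled_score(grade_level_ss, test):
    """Convert a grade-level scaled score to the vertical scale."""
    return grade_level_ss + OFFSETS[test]

print(vertical_scaled_score(250, "G5 post-test"))  # 350, as in the example above
```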

Figure 5 shows the mean vertical scaled scores on the pre- and post-tests across the subject areas. Vertical scaled scores for the pre-test are simply the grade-level scaled scores. As expected, the vertical scaled scores for the Grade 5 post-test are higher than the Grade 5 pre-test scaled scores.

Figure 5: Vertical Scaled Mean Scores by Subject Area

[Bar chart: mean vertical scaled scores (axis 0-400) for each subject area (Eng., SDS, Math., ISC, CTS, CS), pre-test vs. post-test.]

3.6 Comparison between Pilot and Comparison Groups

The comparisons between the pilot and comparison groups were made in raw scores and in vertical scaled scores. The pre- and post-test raw scores are not on the same scale, since the tests differ in difficulty; the raw score comparison is nevertheless presented for simplicity. The comparison is more relevant, valid, and beneficial when made on the vertical scaled scores, since the pre- and post-test vertical scaled scores are on the same scale.


Raw Scores

Table 8 shows that the raw score mean differences between the pilot and comparison schools on the Grade 5 pre-tests were small for each subject area. The mean differences, analyzed using t-tests, were statistically significant only in English and Mathematics, with the pupils in the comparison group performing better than those in the pilot group (p<.05). In the other four subjects, the t-tests showed no significant differences between the two groups at baseline. In raw scores, the differences in English and Mathematics were about half a point, while the differences for the other subjects were at most two-tenths of a point. These results reflected the expectation of very small differences on the pre-tests, since the schools were randomly assigned to one of the two groups based on a matched pairs design.

Table 8: Mean Raw Scores by Subject Area and Group

Subject Area | Group | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N† | Post-test Mean | Post-test Std. Dev.
English | Pilot | 1785 | 11.9 | 6.4 | 1773 | 13.3* | 1.6
English | Comparison | 2013 | 12.4* | 6.6 | 1967 | 12.2 | 1.6
English | Total | 3798 | 12.2 | 6.5 | 3740 | 12.8 | 1.6
Social and Developmental Studies | Pilot | 1907 | 10.0 | 5.2 | 1895 | 14.9* | 1.3
Social and Developmental Studies | Comparison | 2055 | 10.2 | 5.5 | 2008 | 13.7 | 1.3
Social and Developmental Studies | Total | 3962 | 10.1 | 5.3 | 3903 | 14.3 | 1.3
Mathematics | Pilot | 1861 | 12.0 | 5.3 | 1849 | 13.8* | 1.4
Mathematics | Comparison | 2022 | 12.6* | 5.3 | 1975 | 13.2 | 1.4
Mathematics | Total | 3883 | 12.3 | 5.3 | 3824 | 13.5 | 1.4
Integrated Science | Pilot | 1961 | 9.8 | 4.9 | 1949 | 13.2* | 1.9
Integrated Science | Comparison | 2078 | 9.9 | 4.9 | 2031 | 11.2 | 1.8
Integrated Science | Total | 4039 | 9.9 | 4.9 | 3980 | 12.2 | 1.9
Creative and Technology Studies | Pilot | 1967 | 10.5 | 5.2 | 1955 | 12.9* | 1.5
Creative and Technology Studies | Comparison | 2065 | 10.6 | 5.4 | 2018 | 11.7 | 1.5
Creative and Technology Studies | Total | 4032 | 10.5 | 5.3 | 3973 | 12.3 | 1.5
Community Studies | Pilot | 1979 | 9.5 | 4.0 | 1967 | 13.4* | 1.6
Community Studies | Comparison | 2058 | 9.5 | 3.9 | 2011 | 12.5 | 1.6
Community Studies | Total | 4037 | 9.5 | 4.0 | 3978 | 13.0 | 1.6

* Significant at p<0.05; † represents adjusted weighted sample size.

The differences between the two groups for all subject areas on the Grade 5 post-test (also in Table 8) were evaluated using an Analysis of Covariance (ANCOVA), with the pre-test scores as the covariates. In other words, the pre-test scores were made statistically equivalent so that the groups could be evaluated on an equal basis on the post-tests. Using the raw scores, the results were statistically significant in each of the subject areas, with the pilot group outperforming the comparison group (p<.05).

Note that all statistical comparisons were made at the school level, not at the student level. This was due to changes in the student population at each school from pre-test to post-test. The design was based on cohorts (student groups over time) and not on panels (the same students over time). A panel design would have been statistically possible, but it would also have led to skewed results due to student attrition.

Vertical Scaled Scores

As stated, vertical scaled scores on the pre- and post-tests were computed independently for both the pilot and comparison groups and were measured on the same scale (i.e., the vertical scale). This makes the comparison more relevant and valid for assessing the impact of CA in the pilot schools relative to the comparison schools.

Table 9: Mean Vertical Scaled Scores by Subject Area and Group

Subject Area | Group | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N† | Post-test Mean | Post-test Std. Dev.
English | Pilot | 1785 | 236.1 | 82.4 | 1773 | 352.3* | 20.3
English | Comparison | 2013 | 241.2* | 84.8 | 1967 | 339.9 | 20.3
English | Total | 3798 | 238.8 | 83.7 | 3740 | 346.1 | 20.3
Social and Developmental Studies | Pilot | 1907 | 229.1 | 84.3 | 1895 | 362.4* | 17.7
Social and Developmental Studies | Comparison | 2055 | 231.8 | 87.9 | 2008 | 346.2 | 17.7
Social and Developmental Studies | Total | 3962 | 230.5 | 86.2 | 3903 | 354.3 | 17.7
Mathematics | Pilot | 1861 | 217.8 | 89.3 | 1849 | 380.5* | 17.1
Mathematics | Comparison | 2022 | 226.7* | 88.9 | 1975 | 373.1 | 17.1
Mathematics | Total | 3883 | 222.4 | 89.2 | 3824 | 376.8 | 17.1
Integrated Science | Pilot | 1961 | 225.5 | 80.1 | 1949 | 369.5* | 20.4
Integrated Science | Comparison | 2078 | 227.4 | 80.4 | 2031 | 348.0 | 20.4
Integrated Science | Total | 4039 | 226.5 | 80.2 | 3980 | 358.8 | 20.4
Creative and Technology Studies | Pilot | 1967 | 223.0 | 84.0 | 1955 | 357.1* | 16.0
Creative and Technology Studies | Comparison | 2065 | 225.1 | 86.5 | 2018 | 343.5 | 16.0
Creative and Technology Studies | Total | 4032 | 224.1 | 85.3 | 3973 | 350.3 | 16.0
Community Studies | Pilot | 1979 | 213.7 | 84.3 | 1967 | 365.8* | 22.1
Community Studies | Comparison | 2058 | 214.2 | 83.1 | 2011 | 352.8 | 22.1
Community Studies | Total | 4037 | 214.0 | 83.7 | 3978 | 359.3 | 22.1

* Significant at p<0.05

Table 9 shows that the vertical scaled score mean differences between the pilot and comparison schools on the Grade 5 pre-tests were small for each subject area. The mean differences in all six subject areas, analyzed using t-tests, were not statistically significant (p>.05). In contrast, when the differences between the two groups for all subject areas on the Grade 5 post-test (also in Table 9) were evaluated using an ANCOVA (with the pre-test scores as the covariates), the results were statistically significant in all subject areas, with the pilot group outperforming the comparison group (p<.05). Figures 6 through 11 show the differences in vertical scaled scores from the Grade 5 pre-test to the Grade 5 post-test for each of the subject areas. The graphs clearly show the greater score increases by the pilot group; the advantage was least pronounced in Mathematics, although the pilot group started off lower and still finished higher.
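For illustration, the school-level ANCOVA described above can be sketched with the statsmodels formula API. The data frame and column names below are hypothetical, not the report's actual file layout, and the numbers are made up.

```python
# Minimal sketch of a school-level ANCOVA: post-test school means regressed on
# group membership, adjusting for pre-test school means (hypothetical data).
import pandas as pd
import statsmodels.formula.api as smf

schools = pd.DataFrame({
    "group":     ["pilot"] * 3 + ["comparison"] * 3,
    "pre_mean":  [236.0, 230.1, 225.4, 241.3, 232.0, 228.8],
    "post_mean": [352.1, 349.7, 345.2, 340.4, 338.9, 336.0],
})

model = smf.ols("post_mean ~ C(group) + pre_mean", data=schools).fit()
print(model.summary())  # the C(group) coefficient estimates the adjusted effect
```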


Figure 6: English Mean Vertical Scores by Group
Figure 7: Social & Dev. Studies Mean Vertical Scores by Group
Figure 8: Mathematics Mean Vertical Scores by Group
Figure 9: Integrated Science Mean Vertical Scores by Group
Figure 10: Creative & Tech. Studies Mean Vertical Scores by Group
Figure 11: Community Studies Mean Vertical Scores by Group

[Each figure: line chart of mean vertical scaled scores (axis 200-400) for the pilot and comparison groups at the Grade 5 pre-test and Grade 5 post-test.]


3.7 Comparison across Regions

While not the focus of the evaluation, the next two sections provide useful information on student performance. Tables 10 and 11 contain a brief analysis of the scores by region, providing information on a disaggregated basis. As with the overall analyses, the comparisons across the three regions were made in raw scores and in vertical scaled scores. Lusaka Region consistently had the highest mean scores (both raw scores and vertical scaled scores) in all subjects on the Grade 5 pre-tests, followed by Western and Southern. The same pattern of results was also observed for the Grade 5 post-tests.

Table 10: Subject Area Mean Raw Scores by Region

Subject Area | Region | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N | Post-test Mean | Post-test Std. Dev.
English | Southern | 1010 | 11.0 | 6.2 | 1157 | 10.4 | 6.6
English | Western | 994 | 11.7 | 5.9 | 1103 | 11.9 | 6.7
English | Lusaka | 1794 | 13.1 | 6.9 | 1765 | 12.4 | 7.5
English | Total | 3798 | 12.2 | 6.5 | 4025 | 11.7 | 7.1
Social and Developmental Studies | Southern | 1014 | 9.4 | 4.8 | 1214 | 11.7 | 6.0
Social and Developmental Studies | Western | 1112 | 9.9 | 4.9 | 1125 | 13.2 | 6.1
Social and Developmental Studies | Lusaka | 1836 | 10.7 | 5.8 | 1765 | 14.1 | 7.0
Social and Developmental Studies | Total | 3962 | 10.1 | 5.3 | 4104 | 13.2 | 6.6
Mathematics | Southern | 1002 | 11.5 | 5.4 | 1226 | 11.1 | 5.2
Mathematics | Western | 1086 | 12.2 | 5.2 | 1120 | 12.7 | 5.3
Mathematics | Lusaka | 1795 | 12.9 | 5.2 | 1781 | 13.0 | 6.3
Mathematics | Total | 3883 | 12.3 | 5.3 | 4127 | 12.4 | 5.8
Integrated Science | Southern | 1025 | 9.2 | 4.4 | 1212 | 9.6 | 5.4
Integrated Science | Western | 1151 | 9.4 | 4.6 | 1154 | 11.7 | 6.4
Integrated Science | Lusaka | 1863 | 10.6 | 5.3 | 1769 | 11.8 | 6.7
Integrated Science | Total | 4039 | 9.9 | 4.9 | 4135 | 11.1 | 6.3
Creative and Technology Studies | Southern | 1016 | 9.6 | 4.8 | 1205 | 9.9 | 5.6
Creative and Technology Studies | Western | 1140 | 10.2 | 5.0 | 1146 | 11.3 | 6.0
Creative and Technology Studies | Lusaka | 1876 | 11.2 | 5.7 | 1790 | 11.9 | 6.9
Creative and Technology Studies | Total | 4032 | 10.5 | 5.3 | 4141 | 11.2 | 6.4
Community Studies | Southern | 1015 | 9.0 | 3.5 | 1191 | 10.5 | 5.3
Community Studies | Western | 1146 | 9.4 | 4.3 | 1122 | 11.5 | 6.0
Community Studies | Lusaka | 1876 | 9.8 | 4.0 | 1784 | 12.7 | 6.8
Community Studies | Total | 4037 | 9.5 | 4.0 | 4097 | 11.7 | 6.2


Table 11: Subject Area Mean Vertical Scaled Scores by Region

Subject Area | Region | Pre-test N | Pre-test Mean | Pre-test Std. Dev. | Post-test N | Post-test Mean | Post-test Std. Dev.
English | Southern | 1010 | 224.1 | 80.3 | 1157 | 317.3 | 82.8
English | Western | 994 | 232.3 | 72.9 | 1103 | 335.0 | 81.0
English | Lusaka | 1794 | 250.7 | 89.3 | 1765 | 343.0 | 94.1
English | Total | 3798 | 238.8 | 83.7 | 4025 | 333.4 | 88.1
Social and Developmental Studies | Southern | 1014 | 218.5 | 77.4 | 1214 | 321.7 | 76.7
Social and Developmental Studies | Western | 1112 | 226.4 | 79.1 | 1125 | 341.1 | 78.1
Social and Developmental Studies | Lusaka | 1836 | 239.6 | 93.6 | 1765 | 354.7 | 89.5
Social and Developmental Studies | Total | 3962 | 230.5 | 86.2 | 4104 | 341.2 | 84.0
Mathematics | Southern | 1002 | 209.2 | 91.0 | 1226 | 346.6 | 66.1
Mathematics | Western | 1086 | 219.9 | 86.2 | 1120 | 366.6 | 65.5
Mathematics | Lusaka | 1795 | 231.3 | 89.0 | 1781 | 369.5 | 79.3
Mathematics | Total | 3883 | 222.4 | 89.2 | 4127 | 361.9 | 72.6
Integrated Science | Southern | 1025 | 215.7 | 72.1 | 1212 | 328.9 | 63.5
Integrated Science | Western | 1151 | 218.1 | 76.1 | 1154 | 353.0 | 74.2
Integrated Science | Lusaka | 1863 | 237.5 | 85.5 | 1769 | 352.4 | 78.0
Integrated Science | Total | 4039 | 226.5 | 80.2 | 4135 | 345.7 | 73.7
Creative and Technology Studies | Southern | 1016 | 209.8 | 77.9 | 1191 | 327.6 | 70.7
Creative and Technology Studies | Western | 1140 | 218.9 | 79.7 | 1122 | 340.7 | 79.5
Creative and Technology Studies | Lusaka | 1876 | 234.9 | 90.8 | 1784 | 357.7 | 90.3
Creative and Technology Studies | Total | 4032 | 224.1 | 85.3 | 4097 | 344.3 | 83.0
Community Studies | Southern | 1015 | 204.2 | 74.8 | 1205 | 323.4 | 64.3
Community Studies | Western | 1146 | 213.1 | 88.6 | 1146 | 338.7 | 66.8
Community Studies | Lusaka | 1876 | 219.8 | 84.6 | 1790 | 344.9 | 79.1
Community Studies | Total | 4037 | 214.0 | 83.7 | 4141 | 336.9 | 72.3

3.8 Performance Categories

Depending on test difficulty and score distributions, performance categories were established for each of the tests using a procedure called standard setting. An Angoff (1971)11 standard setting method was implemented to set the cut scores between Unsatisfactory and Satisfactory and between Satisfactory and Advanced for both the pre-tests and post-tests. The resulting cut scores are presented in Tables 12 and 13. In English, for example, students who got a score of 1-12 would be classified as Unsatisfactory, students who got a score of 13-21 would be classified as Satisfactory, and students who earned a score of 22-30 would be classified as Advanced on the pre-test. For Mathematics, the corresponding pre-test ranges are 1-13 Unsatisfactory, 14-19 Satisfactory, and 20-30 Advanced. The post-test ranges for each subject area differ from those on the pre-tests because the pre-tests and post-tests covered different content and had different levels of difficulty.

11 Angoff, W. H. (1971). Scales, Norms, and Equivalent Scores. In R.L. Thorndike (Ed.) Educational Measurement (2nd ed.). (pp. 508-560). Washington, DC: American Council on Education.


Table 12: Performance Categories for Pre-tests by Subject

Subject Area | 1 Unsatisfactory (Fail) | 2 Satisfactory (Pass) | 3 Advanced (Pass)
English | 1-12 | 13-21 | 22-30
Social and Developmental Studies | 1-10 | 11-17 | 18-30
Mathematics | 1-13 | 14-19 | 20-30
Integrated Science | 1-10 | 11-17 | 18-30
Creative and Technology Studies | 1-11 | 12-18 | 19-30
Community Studies | 1-10 | 11-15 | 16-30

Table 13: Performance Categories for Post-tests by Subject

Subject Area | 1 Unsatisfactory (Fail) | 2 Satisfactory (Pass) | 3 Advanced (Pass)
English | 1-12 | 13-21 | 22-30
Social and Developmental Studies | 1-13 | 14-21 | 22-30
Mathematics | 1-10 | 11-19 | 20-30
Integrated Science | 1-10 | 11-20 | 21-30
Creative and Technology Studies | 1-11 | 12-21 | 22-30
Community Studies | 1-11 | 12-19 | 20-30
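For illustration, classifying a raw score into these categories is a simple threshold lookup. The following minimal Python sketch uses the English pre-test cuts from Table 12 (Satisfactory from 13, Advanced from 22); the function name and defaults are illustrative only.

```python
# Minimal sketch: map a raw score to a performance category using the
# English pre-test cut scores from Table 12.
def classify(raw_score, sat_cut=13, adv_cut=22):
    if raw_score >= adv_cut:
        return "Advanced"
    if raw_score >= sat_cut:
        return "Satisfactory"
    return "Unsatisfactory"

print(classify(12), classify(13), classify(25))
# -> Unsatisfactory Satisfactory Advanced
```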

Tables 14 and 15 provide the percentages of students classified in the 3 performance categories by subject. On the pre-test, the percentages in each category by group were similar for most of the subjects. For instance, in Integrated Science, similar percentages of students were in the Satisfactory (Pass) category for the pilot (34%) and comparison (33%) groups. However, on the post-test, there were some differences for the groups, mostly in favour of the pilot group. In Integrated Science, 53% of students in the pilot group were Satisfactory vs. 43% of students in the comparison group. The percentages for each group favoured the pilot group on the post-test, with the exception of Mathematics where the rounded percentage passing was the same in the pilot (65%) and comparison (65%) groups.


Table 14: Percentages of Students in Performance Categories for Pre-tests

Grade 5 Pre-test

Subject Area                       Group        1 Unsatisfactory (Fail)   2 Satisfactory (Pass)   3 Advanced (Pass)
English                            Pilot        63.0                      27.2                    9.8
                                   Comparison   59.7                      28.2                    12.1
Social and Developmental Studies   Pilot        62.8                      26.9                    10.3
                                   Comparison   64.4                      24.0                    11.6
Mathematics                        Pilot        64.3                      26.2                    9.5
                                   Comparison   60.1                      29.4                    10.5
Integrated Science                 Pilot        65.9                      25.6                    8.5
                                   Comparison   67.3                      22.9                    9.8
Creative and Technology Studies    Pilot        67.5                      22.9                    9.6
                                   Comparison   68.4                      20.1                    11.5
Community Studies                  Pilot        66.8                      25.4                    7.8
                                   Comparison   66.8                      24.8                    8.4

Table 15: Percentages of Students in Performance Categories for Post-tests

Grade 5 Post-test

Subject Area                       Group        1 Unsatisfactory (Fail)   2 Satisfactory (Pass)   3 Advanced (Pass)
English                            Pilot        60.0                      26.5                    13.5
                                   Comparison   64.0                      24.0                    11.9
Social and Developmental Studies   Pilot        51.4                      33.4                    15.3
                                   Comparison   59.3                      30.6                    10.2
Mathematics                        Pilot        35.2                      53.9                    10.9
                                   Comparison   34.8                      56.3                    8.9
Integrated Science                 Pilot        46.7                      40.2                    13.1
                                   Comparison   57.3                      36.0                    6.7
Creative and Technology Studies    Pilot        54.5                      35.1                    10.4
                                   Comparison   62.3                      31.0                    6.7
Community Studies                  Pilot        50.4                      33.9                    15.6
                                   Comparison   54.4                      36.2                    9.5


Chapter Four: Summary and Conclusions

The main objective of the evaluation was to determine whether the CA programme is having positive effects on student learning outcomes in the first year of implementation. This was accomplished by measuring and comparing the levels of learning achievement of pupils in pilot (intervention) and comparison (control) schools. A baseline (pre-test) assessment occurred before implementation of the proposed interventions, at the beginning of Grade 5, in randomly selected pilot schools. This created a basis upon which the impact of CA was measured at the end of the Grade 5 pilot year. A sample of 48 schools was selected from Lusaka, Southern and Western Provinces using a matched pairs design and random assignment, resulting in 24 pilot schools and 24 comparison schools.

Student achievement for the Grade 5 baseline and post-test administrations was measured using multiple-choice tests in six subject areas with 30 items each (30 points per test). The Grade 5 baseline tests were based on the Grade 4 curriculum, while the Grade 5 post-tests were based on the Grade 5 curriculum.

Overall, the psychometric characteristics of the tests were very satisfactory on both the pre-tests and post-tests. Items were within acceptable difficulty (p-value) ranges and discrimination (point-biserial correlation) levels, and the tests were found to be reliable, using Cronbach's Alpha as an estimate of internal consistency reliability.
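
For reference, Cronbach's Alpha for a 30-item test can be computed directly from the pupil-by-item score matrix. A minimal Python sketch with simulated responses (the data here are invented for illustration; this is not the report's analysis code):

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's Alpha for an (n_pupils, n_items) matrix of 0/1 item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)          # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)      # variance of raw total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative: 500 simulated pupils answering a 30-item test.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))
difficulty = np.linspace(-1.5, 1.5, 30)
responses = (rng.normal(size=(500, 30)) + ability > difficulty).astype(int)
print(round(cronbach_alpha(responses), 2))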

Performance of the schools on the baseline and post-tests was compared using mean raw scores and mean vertical scaled scores. The vertical scaled score comparison was the more relevant and valid of the two, since the school mean scores on both the baseline and post-tests were evaluated on the same measurement scale (i.e., the vertical scale). In addition, statisticians generally prefer scaled scores for longitudinal comparisons because the scale is equal interval, which makes comparisons more accurate.

Overall, pupils' scores on the baseline pre-test were very similar in the pilot and comparison schools. The comparison schools scored slightly higher on the English and Mathematics tests, but the score differences between the two groups on the other four tests were minimal. On the post-test, which was administered after one year of the CA programme, the scores of the pilot schools on all six tests were significantly higher than those of the comparison schools. This provides strong initial evidence that the CA programme had a significantly positive effect on pupil learning outcomes.

When the performance of the schools on the baseline and post-tests was compared by region, Lusaka generally had the highest mean scores across subjects on the Grade 5 pre-tests and post-tests, followed by Western and Southern. The number of schools per region was too small to make statistically valid region-by-region comparisons of pre-test to post-test scores for the pilot and comparison groups.

Students were also classified into three performance level categories (Unsatisfactory, Satisfactory, and Advanced) in each subject area based on their performance on the baseline and post-tests. On the pre-tests, the percentages in each category by group were similar for most of the subjects. However, on the post-test, there were differences in favour of the pilot group in virtually all subjects. For instance, in Integrated Science, 53% of students in the pilot group were Satisfactory or above vs. 43% of students in the comparison group. This provides strong evidence that a greater percentage of students in the pilot group achieved a passing score on the post-test than in the comparison group.

The next round of post-tests in the Phase 1 schools will be administered when the same cohort of pupils completes Grade 6. This will be followed by a final test administration (a third post-test) when the cohort completes Grade 7. At that point, with four time points (a baseline and three post-tests), more substantial conclusions can be drawn about the effectiveness of the CA programme. Note also that the evaluation process is being repeated in the Phase 2 and Phase 3 schools, which will provide a complete national quantitative evaluation of the programme at the end of Year 5 of implementation (2010). Based on guidance from the CA Steering Committee, results from the evaluation will be used at a selected point in the implementation period as a criterion for scaling up the CA programme to other primary schools in Zambia.


Appendix 1: Item Statistics by Subject

Table A1: English Item Statistics

        Pre-test                  Post-test
Seq.    P-value   Pt-Biserial     P-value   Pt-Biserial
1       .65       .47             .65       .55
2       .63       .53             .51       .58
3       .63       .52             .48       .44
4       .48       .56             .41       .54
5       .52       .55             .40       .48
6       .40       .53             .29       .36
7       .56       .58             .50       .45
8       .54       .55             .46       .46
9       .46       .56             .52       .61
10      .46       .41             .35       .61
11      .61       .52             .26       .46
12      .40       .52             .21       .35
13      .38       .47             .33       .58
14      .39       .50             .36       .56
15      .27       .46             .35       .55
16      .29       .42             .33       .40
17      .28       .40             .22       .24
18      .47       .55             .36       .59
19      .33       .40             .42       .54
20      .36       .46             .40       .51
21      .24       .46             .34       .53
22      .34       .30             .38       .47
23      .33       .36             .21       .35
24      .37       .47             .38       .56
25      .39       .46             .41       .49
26      .35       .42             .35       .46
27      .31       .38             .34       .50
28      .25       .28             .30       .40
29      .27       .32             .38       .52
30      .20       .29             .27       .40

Table A2: Social and Developmental Studies Item Statistics

        Pre-test                  Post-test
Seq.    P-value   Pt-Biserial     P-value   Pt-Biserial
1       .49       .52             .66       .57
2       .47       .39             .53       .60
3       .39       .49             .66       .60
4       .37       .32             .58       .50
5       .35       .47             .51       .57
6       .36       .35             .48       .61
7       .43       .51             .52       .61
8       .41       .41             .42       .31
9       .36       .21             .44       .56
10      .37       .43             .49       .50
11      .38       .49             .34       .42
12      .37       .48             .39       .43
13      .35       .42             .51       .49
14      .33       .34             .43       .54
15      .30       .46             .36       .58
16      .33       .41             .36       .44
17      .28       .30             .39       .40
18      .31       .26             .42       .42
19      .30       .46             .37       .55
20      .40       .45             .34       .51
21      .25       .44             .32       .38
22      .26       .43             .35       .36
23      .25       .41             .32       .44
24      .26       .29             .38       .26
25      .36       .31             .38       .25
26      .26       .32             .34       .39
27      .26       .19             .36       .31
28      .27       .37             .32       .24
29      .29       .19             .27       .22
30      .30       .25             .30       .39

Table A3: Mathematics Item Statistics

        Pre-test                  Post-test
Seq.    P-value   Pt-Biserial     P-value   Pt-Biserial
1       .81       .43             .70       .56
2       .59       .51             .65       .55
3       .46       .34             .71       .57
4       .49       .48             .56       .55
5       .54       .55             .60       .54
6       .57       .51             .64       .52
7       .44       .42             .46       .48
8       .46       .25             .50       .50
9       .43       .29             .47       .32
10      .50       .51             .55       .34
11      .43       .51             .38       .44
12      .34       .26             .39       .44
13      .39       .42             .39       .45
14      .46       .42             .40       .45
15      .48       .45             .42       .28
16      .30       .25             .34       .32
17      .36       .30             .34       .46
18      .32       .23             .38       .48
19      .33       .36             .29       .34
20      .27       .28             .30       .35
21      .52       .40             .25       .37
22      .57       .48             .27       .40
23      .32       .33             .23       .34
24      .40       .46             .24       .33
25      .31       .43             .18       .23
26      .27       .32             .27       .33
27      .30       .26             .24       .28
28      .21       .17             .36       .48
29      .19       .15             .16       .18
30      .25       .32             .23       .30

Table A4: Integrated Science Item Statistics

        Pre-test                  Post-test
Seq.    P-value   Pt-Biserial     P-value   Pt-Biserial
1       .49       .42             .53       .56
2       .33       .17             .53       .56
3       .45       .41             .39       .57
4       .41       .44             .51       .49
5       .31       .20             .44       .52
6       .40       .39             .57       .48
7       .28       .43             .45       .49
8       .31       .26             .47       .53
9       .34       .45             .44       .48
10      .29       .26             .33       .51
11      .43       .29             .38       .34
12      .31       .40             .42       .49
13      .52       .28             .31       .44
14      .37       .45             .36       .51
15      .36       .42             .36       .40
16      .41       .43             .36       .49
17      .34       .29             .38       .55
18      .30       .50             .21       .21
19      .37       .50             .28       .42
20      .26       .25             .38       .48
21      .29       .37             .29       .47
22      .26       .38             .34       .49
23      .28       .34             .25       .29
24      .24       .39             .22       .16
25      .20       .35             .31       .38
26      .25       .25             .25       .29
27      .27       .33             .25       .36
28      .29       .21             .27       .40
29      .23       .45             .23       .27
30      .30       .27             .21       .33

Table A5: Creative & Technology Studies Item Statistics

Seq. P-value Pre-test

Pt-Biserial Pre-test

1 .25 .55 2 .41 .50 3 .33 .34 4 .56 .45 5 .38 .16 6 .40 .34 7 .35 .46 8 .36 .34 9 .39 .54 10 .47 .48 11 .43 .48 12 .41 .31 13 .30 .40 14 .28 .41 15 .26 .39 16 .37 .52 17 .29 .27 18 .36 .35 19 .41 .40 20 .30 .41 21 .29 .54 22 .25 .25 23 .50 .40 24 .31 .34 25 .28 .387 26 .22 .14 27 .47 .37 28 .34 .32 29 .39 .35 30 .17 .08

Seq. P-value Post-test

Pt-Biserial Post-test

1 .29 .34 2 .41 .50 3 .43 .55 4 .49 .64 5 .46 .54 6 .40 .55 7 .47 .45 8 .48 .52 9 .43 .37 10 .44 .53 11 .29 .46 12 .40 .52 13 .36 .55 14 .39 .56 15 .32 .46 16 .28 .37 17 .36 .37 18 .40 .52 19 .33 .51 20 .22 .25 21 .36 .35 22 .36 .28 23 .29 .25 24 .30 .36 25 .27 .42 26 .28 .44 27 .27 .32 28 .33 .24 29 .23 .52 30 .32 .44

Table A6: Community Studies Item Statistics

        Pre-test                  Post-test
Seq.    P-value   Pt-Biserial     P-value   Pt-Biserial
1       .62       .41             .53       .52
2       .52       .35             .44       .60
3       .46       .42             .53       .61
4       .43       .48             .52       .57
5       .41       .33             .44       .49
6       .36       .32             .44       .40
7       .31       .21             .47       .51
8       .36       .33             .42       .57
9       .27       .20             .38       .56
10      .37       .21             .44       .50
11      .30       .35             .30       .41
12      .40       .38             .42       .52
13      .30       .19             .39       .51
14      .30       .45             .36       .43
15      .20       .18             .44       .41
16      .30       .36             .33       .49
17      .30       .25             .43       .50
18      .28       .38             .36       .42
19      .26       .21             .37       .29
20      .25       .19             .32       .31
21      .31       .34             .34       .44
22      .26       .21             .32       .39
23      .25       .26             .32       .29
24      .25       .24             .26       .31
25      .30       .31             .29       .37
26      .22       .28             .30       .28
27      .26       .28             .28       .41
28      .23       .21             .27       .24
29      .19       .16             .24       .21
30      .21       .16             .24       .23

Appendix 2: Scores and Frequencies – Grade 5 Pre-Tests
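
The tables in this appendix map each raw score to an IRT ability estimate (theta) and a scaled score bounded at 100 and 500. A minimal Python sketch of how such a conversion table is applied when computing mean scaled scores (values excerpted from Table A7 below; the names are illustrative, not from the report's analysis code):

# Raw-score to scaled-score conversion, excerpted from Table A7 (English pre-test).
ENGLISH_PRE_SCALE = {5: 146, 10: 217, 12: 239, 15: 271, 20: 325, 25: 395}

def mean_scale_score(raw_scores, table):
    """Convert each pupil's raw score via the lookup table, then average."""
    scaled = [table[r] for r in raw_scores]
    return sum(scaled) / len(scaled)

# Three pupils scoring 5, 10 and 20 raw points average (146 + 217 + 325) / 3.
print(round(mean_scale_score([5, 10, 20], ENGLISH_PRE_SCALE), 1))  # 229.3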

Table A7: English Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.59   100             24       1.3   1.3         30       1.5   1.5
2           -2.84   100             28       1.6   2.9         31       1.5   3.0
3           -2.38   102             43       2.4   5.3         61       3.0   6.1
4           -2.04   126             54       3.0   8.3         45       2.2   8.3
5           -1.76   146             66       3.7   12.0        76       3.8   12.1
6           -1.52   163             112      6.3   18.3        112      5.6   17.6
7           -1.31   178             138      7.7   26.1        152      7.6   25.2
8           -1.11   192             145      8.1   34.2        137      6.8   32.0
9           -0.93   205             151      8.5   42.6        146      7.3   39.2
10          -0.76   217             140      7.8   50.5        142      7.1   46.3
11          -0.60   228             118      6.6   57.1        158      7.8   54.1
12          -0.44   239             105      5.9   63.0        111      5.5   59.7
13          -0.29   250             68       3.8   66.8        109      5.4   65.1
14          -0.14   261             83       4.6   71.4        85       4.2   69.3
15          0.01    271             67       3.8   75.2        68       3.4   72.7
16          0.16    282             55       3.1   78.3        68       3.4   76.1
17          0.30    292             50       2.8   81.1        41       2.0   78.1
18          0.46    303             41       2.3   83.4        45       2.2   80.3
19          0.61    314             43       2.4   85.8        52       2.6   82.9
20          0.77    325             44       2.5   88.2        50       2.5   85.4
21          0.94    337             35       2.0   90.2        50       2.5   87.9
22          1.12    350             24       1.3   91.5        27       1.3   89.2
23          1.31    363             25       1.4   92.9        36       1.8   91.0
24          1.52    378             19       1.1   94.0        37       1.8   92.8
25          1.75    395             19       1.1   95.1        46       2.3   95.1
26          2.03    415             26       1.5   96.5        28       1.4   96.5
27          2.37    439             14       .8    97.3        18       .9    97.4
28          2.82    471             19       1.1   98.4        28       1.4   98.8
29          3.56    500             23       1.3   99.7        20       1.0   99.8
30          4.80    500             6        .3    100.0       4        .2    100.0

Total                               1785     100.0             2013     100.0

Table A8: Social and Developmental Studies Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.42   100             28       1.5   1.5         28       1.4   1.4
2           -2.69   100             30       1.6   3.0         35       1.7   3.1
3           -2.24   100             49       2.6   5.6         46       2.2   5.3
4           -1.91   112             78       4.1   9.7         66       3.2   8.5
5           -1.65   139             129      6.8   16.5        138      6.7   15.2
6           -1.42   162             164      8.6   25.1        188      9.1   24.4
7           -1.22   183             179      9.4   34.5        209      10.2  34.5
8           -1.04   201             210      11.0  45.5        253      12.3  46.9
9           -0.87   218             175      9.2   54.6        191      9.3   56.2
10          -0.71   235             155      8.1   62.8        169      8.2   64.4
11          -0.56   250             143      7.5   70.3        118      5.7   70.1
12          -0.42   264             111      5.8   76.1        97       4.7   74.8
13          -0.27   280             79       4.1   80.2        78       3.8   78.6
14          -0.14   293             60       3.1   83.4        65       3.2   81.8
15          0.00    307             39       2.0   85.4        46       2.2   84.0
16          0.14    321             36       1.9   87.3        50       2.4   86.5
17          0.28    336             45       2.4   89.7        39       1.9   88.4
18          0.42    350             32       1.7   91.3        36       1.8   90.1
19          0.56    364             28       1.5   92.8        30       1.5   91.6
20          0.71    380             29       1.5   94.3        32       1.6   93.1
21          0.87    396             27       1.4   95.8        24       1.2   94.3
22          1.04    413             14       .7    96.5        28       1.4   95.7
23          1.22    432             22       1.2   97.6        17       .8    96.5
24          1.42    452             16       .8    98.5        19       .9    97.4
25          1.65    476             6        .3    98.8        17       .8    98.2
26          1.91    500             12       .6    99.4        14       .7    98.9
27          2.24    500             7        .4    99.8        13       .6    99.6
28          2.69    500             3        .2    99.9        7        .3    99.9
29          3.42    500             1        .1    100.0       1        .0    100.0
30          4.65    500             0        .0    100.0       1        .0    100.0

Total                               1907     100.0             2055     100.0

Table A9: Mathematics Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.62   100             23       1.2   1.2         20       1.0   1.0
2           -2.86   100             21       1.1   2.4         20       1.0   2.0
3           -2.39   100             40       2.1   4.5         31       1.5   3.5
4           -2.04   100             39       2.1   6.6         40       2.0   5.5
5           -1.76   100             64       3.4   10.0        41       2.0   7.5
6           -1.51   100             81       4.4   14.4        75       3.7   11.2
7           -1.30   120             100      5.4   19.8        111      5.5   16.7
8           -1.10   142             125      6.7   26.5        119      5.9   22.6
9           -0.92   162             139      7.5   34.0        132      6.5   29.1
10          -0.75   181             138      7.4   41.4        158      7.8   36.9
11          -0.59   199             150      8.1   49.4        154      7.6   44.6
12          -0.43   217             147      7.9   57.3        149      7.4   51.9
13          -0.28   233             129      6.9   64.3        165      8.2   60.1
14          -0.13   250             103      5.5   69.8        133      6.6   66.7
15          0.01    266             106      5.7   75.5        114      5.6   72.3
16          0.16    282             91       4.9   80.4        110      5.4   77.7
17          0.31    299             73       3.9   84.3        83       4.1   81.8
18          0.46    316             59       3.2   87.5        76       3.8   85.6
19          0.61    332             57       3.1   90.5        78       3.9   89.5
20          0.77    350             39       2.1   92.6        59       2.9   92.4
21          0.94    369             27       1.5   94.1        33       1.6   94.0
22          1.12    389             34       1.8   95.9        39       1.9   95.9
23          1.31    410             23       1.2   97.2        29       1.4   97.4
24          1.52    433             24       1.3   98.4        15       .7    98.1
25          1.75    459             15       .8    99.2        13       .6    98.8
26          2.03    490             6        .3    99.6        12       .6    99.4
27          2.37    500             5        .3    99.8        7        .3    99.7
28          2.82    500             2        .1    99.9        4        .2    99.9
29          3.56    500             1        .1    100.0       1        .0    100.0
30          4.80    500             0        .0    100.0       1        .0    100.0

Total                               1861     100.0             2022     100.0

Table A10: Integrated Science Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.44   100             16       .8    .8          18       .9    .9
2           -2.71   100             21       1.1   1.9         24       1.2   2.0
3           -2.26   100             53       2.7   4.6         44       2.1   4.1
4           -1.93   110             83       4.2   8.8         72       3.5   7.6
5           -1.66   138             113      5.8   14.6        115      5.5   13.1
6           -1.43   161             183      9.3   23.9        176      8.5   21.6
7           -1.23   182             195      9.9   33.9        239      11.5  33.1
8           -1.05   200             230      11.7  45.6        268      12.9  46.0
9           -0.88   217             225      11.5  57.1        236      11.4  57.4
10          -0.72   234             173      8.8   65.9        206      9.9   67.3
11          -0.56   250             135      6.9   72.8        141      6.8   74.1
12          -0.42   264             107      5.5   78.2        103      5.0   79.0
13          -0.28   279             96       4.9   83.1        84       4.0   83.1
14          -0.14   293             56       2.9   86.0        51       2.5   85.5
15          0.00    307             48       2.4   88.4        36       1.7   87.2
16          0.14    321             35       1.8   90.2        36       1.7   89.0
17          0.28    336             26       1.3   91.5        25       1.2   90.2
18          0.42    350             22       1.1   92.7        24       1.2   91.3
19          0.57    365             23       1.2   93.8        22       1.1   92.4
20          0.72    381             22       1.1   95.0        38       1.8   94.2
21          0.88    397             18       .9    95.9        24       1.2   95.4
22          1.05    414             20       1.0   96.9        30       1.4   96.8
23          1.23    433             17       .9    97.8        22       1.1   97.9
24          1.43    453             15       .8    98.5        15       .7    98.6
25          1.66    477             9        .5    99.0        11       .5    99.1
26          1.93    500             12       .6    99.6        14       .7    99.8
27          2.26    500             6        .3    99.9        3        .1    100.0
28          2.70    500             2        .1    100.0       1        .0    100.0
29          3.43    500             0        .0    100.0       0        .0    100.0
30          4.67    500             0        .0    100.0       0        .0    100.0

Total                               1961     100.0             2078     100.0

Table A11: Creative and Technology Studies Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.46   100             15       .8    .8          17       .8    .8
2           -2.73   100             21       1.1   1.8         28       1.4   2.2
3           -2.28   100             42       2.1   4.0         38       1.8   4.0
4           -1.95   100             66       3.4   7.3         59       2.9   6.9
5           -1.68   125             104      5.3   12.6        119      5.8   12.6
6           -1.45   148             162      8.2   20.8        172      8.3   21.0
7           -1.24   169             198      10.1  30.9        193      9.3   30.3
8           -1.06   187             206      10.5  41.4        234      11.3  41.6
9           -0.89   204             211      10.7  52.1        218      10.6  52.2
10          -0.73   220             186      9.5   61.6        167      8.1   60.3
11          -0.57   236             116      5.9   67.5        168      8.1   68.4
12          -0.43   250             123      6.3   73.7        126      6.1   74.5
13          -0.28   265             89       4.5   78.2        77       3.7   78.3
14          -0.14   279             64       3.3   81.5        47       2.3   80.5
15          0.00    293             59       3.0   84.5        41       2.0   82.5
16          0.14    307             55       2.8   87.3        55       2.7   85.2
17          0.28    321             33       1.7   89.0        38       1.8   87.0
18          0.43    336             28       1.4   90.4        31       1.5   88.5
19          0.57    350             29       1.5   91.9        37       1.8   90.3
20          0.73    366             25       1.3   93.1        28       1.4   91.7
21          0.89    382             27       1.4   94.5        31       1.5   93.2
22          1.06    399             19       1.0   95.5        32       1.5   94.7
23          1.24    417             23       1.2   96.6        33       1.6   96.3
24          1.45    438             20       1.0   97.7        30       1.5   97.8
25          1.68    461             23       1.2   98.8        28       1.4   99.1
26          1.95    488             12       .6    99.4        8        .4    99.5
27          2.28    500             8        .4    99.8        8        .4    99.9
28          2.73    500             2        .1    99.9        1        .0    100.0
29          3.46    500             0        .0    99.9        1        .0    100.0
30          4.70    500             1        .1    100.0       0        .0    100.0

Total                               1967     100.0             2065     100.0

Table A12: Community Studies Scores and Frequencies

                                    Pilot Group                 Comparison Group
Raw Score   Theta   Scale Score     Freq.    %     Cum. %      Freq.    %     Cum. %
1           -3.48   100             15       .8    .8          14       .7    .7
2           -2.74   100             31       1.6   2.3         19       .9    1.6
3           -2.29   100             46       2.3   4.6         42       2.0   3.6
4           -1.96   100             53       2.7   7.3         66       3.2   6.9
5           -1.68   100             110      5.6   12.9        120      5.8   12.7
6           -1.45   128             184      9.3   22.2        166      8.1   20.7
7           -1.24   157             211      10.7  32.8        237      11.5  32.3
8           -1.06   182             239      12.1  44.9        266      12.9  45.2
9           -0.88   207             216      10.9  55.8        240      11.7  56.9
10          -0.72   229             216      10.9  66.8        205      10.0  66.8
11          -0.57   250             166      8.4   75.1        156      7.6   74.4
12          -0.42   271             114      5.8   80.9        125      6.1   80.5
13          -0.27   292             98       5.0   85.9        98       4.8   85.2
14          -0.13   311             75       3.8   89.6        78       3.8   89.0
15          0.01    331             51       2.6   92.2        53       2.6   91.6
16          0.15    350             39       2.0   94.2        45       2.2   93.8
17          0.29    369             29       1.5   95.7        33       1.6   95.4
18          0.43    389             22       1.1   96.8        32       1.6   96.9
19          0.58    410             17       .9    97.6        28       1.4   98.3
20          0.73    431             13       .7    98.3        14       .7    99.0
21          0.89    453             6        .3    98.6        7        .3    99.3
22          1.06    476             14       .7    99.3        5        .2    99.6
23          1.25    500             5        .3    99.5        7        .3    99.9
24          1.45    500             1        .1    99.6        1        .0    100.0
25          1.68    500             4        .2    99.8        1        .0    100.0
26          1.95    500             4        .2    100.0       0        .0    100.0
27          2.28    500             0        .0    100.0       0        .0    100.0
28          2.72    500             0        .0    100.0       0        .0    100.0
29          3.46    500             0        .0    100.0       0        .0    100.0
30          4.69    500             0        .0    100.0       0        .0    100.0

Total                               1979     100.0             2058     100.0

Appendix 3: Scores and Frequencies – Grade 5 Post-Tests

Table A13: English Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -4.94   100             161      8.47   8.47        109      5.13   5.13
1           -3.52   100             6        0.32   8.79        12       0.56   5.69
2           -2.78   100             11       0.58   9.37        20       0.94   6.64
3           -2.32   101             33       1.74   11.11       22       1.04   7.67
4           -1.99   126             49       2.58   13.68       79       3.72   11.39
5           -1.71   146             64       3.37   17.05       118      5.55   16.94
6           -1.48   163             103      5.42   22.47       142      6.68   23.62
7           -1.27   178             123      6.47   28.95       174      8.19   31.81
8           -1.08   192             127      6.68   35.63       193      9.08   40.89
9           -0.90   205             128      6.74   42.37       148      6.96   47.86
10          -0.74   217             124      6.53   48.89       132      6.21   54.07
11          -0.58   228             111      5.84   54.74       116      5.46   59.53
12          -0.43   239             100      5.26   60.00       96       4.52   64.05
13          -0.28   250             81       4.26   64.26       78       3.67   67.72
14          -0.14   261             68       3.58   67.84       87       4.09   71.81
15          0.00    271             68       3.58   71.42       62       2.92   74.73
16          0.15    282             65       3.42   74.84       52       2.45   77.18
17          0.29    292             55       2.89   77.74       41       1.93   79.11
18          0.44    303             49       2.58   80.32       46       2.16   81.27
19          0.59    314             39       2.05   82.37       52       2.45   83.72
20          0.74    325             35       1.84   84.21       50       2.35   86.07
21          0.91    337             44       2.32   86.53       43       2.02   88.09
22          1.08    350             44       2.32   88.84       53       2.49   90.59
23          1.27    364             29       1.53   90.37       34       1.60   92.19
24          1.48    379             44       2.32   92.68       42       1.98   94.16
25          1.71    396             37       1.95   94.63       32       1.51   95.67
26          1.98    416             37       1.95   96.58       33       1.55   97.22
27          2.32    440             28       1.47   98.05       30       1.41   98.64
28          2.77    473             16       0.84   98.89       16       0.75   99.39
29          3.51    500             16       0.84   99.74       10       0.47   99.86
30          4.93    500             5        0.26   100.00      3        0.14   100.00

Total                               1900     100                2125     100

Table A14: Social and Developmental Studies Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -4.94   100             167      8.80   8.80        77       3.49   3.49
1           -3.51   100             1        0.05   8.85        6        0.27   3.76
2           -2.77   100             4        0.21   9.06        9        0.41   4.17
3           -2.32   100             8        0.42   9.48        13       0.59   4.76
4           -1.98   100             17       0.90   10.38       25       1.13   5.89
5           -1.70   120             25       1.32   11.70       43       1.95   7.84
6           -1.47   140             53       2.79   14.49       84       3.81   11.65
7           -1.26   157             74       3.90   18.39       113      5.12   16.77
8           -1.07   173             95       5.01   23.39       120      5.44   22.21
9           -0.90   187             109      5.74   29.14       163      7.39   29.60
10          -0.73   201             108      5.69   34.83       193      8.75   38.35
11          -0.57   214             118      6.22   41.04       164      7.43   45.78
12          -0.42   226             95       5.01   46.05       162      7.34   53.13
13          -0.28   238             101      5.32   51.37       136      6.17   59.29
14          -0.13   250             98       5.16   56.53       119      5.39   64.69
15          0.01    262             77       4.06   60.59       100      4.53   69.22
16          0.15    274             73       3.85   64.44       97       4.40   73.62
17          0.30    285             78       4.11   68.55       83       3.76   77.38
18          0.44    297             86       4.53   73.08       76       3.45   80.83
19          0.59    310             70       3.69   76.77       75       3.40   84.22
20          0.74    322             77       4.06   80.82       69       3.13   87.35
21          0.91    336             74       3.90   84.72       55       2.49   89.85
22          1.08    350             65       3.42   88.15       51       2.31   92.16
23          1.26    365             57       3.00   91.15       47       2.13   94.29
24          1.47    382             55       2.90   94.05       40       1.81   96.10
25          1.70    401             46       2.42   96.47       30       1.36   97.46
26          1.97    423             26       1.37   97.84       17       0.77   98.23
27          2.30    451             29       1.53   99.37       23       1.04   99.27
28          2.75    488             11       0.58   99.95       14       0.63   99.91
29          3.48    500             1        0.05   100.00      2        0.09   100.00
30          4.75    500

Total                               1898     100                2206     100

Table A15: Mathematics Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -5.14   100             192      9.88   9.88        91       4.17   4.17
1           -3.70   100             5        0.26   10.14       8        0.37   4.53
2           -2.95   100             6        0.31   10.45       9        0.41   4.95
3           -2.48   120             15       0.77   11.22       14       0.64   5.59
4           -2.12   145             13       0.67   11.89       22       1.01   6.59
5           -1.83   165             26       1.34   13.23       33       1.51   8.10
6           -1.58   183             34       1.75   14.98       55       2.52   10.62
7           -1.36   198             69       3.55   18.53       86       3.94   14.56
8           -1.16   212             83       4.27   22.80       132      6.04   20.60
9           -0.97   226             103      5.30   28.10       134      6.14   26.74
10          -0.79   238             138      7.10   35.20       175      8.01   34.75
11          -0.62   250             127      6.54   41.74       186      8.52   43.27
12          -0.45   261             141      7.26   49.00       175      8.01   51.28
13          -0.29   273             168      8.65   57.64       164      7.51   58.79
14          -0.14   284             125      6.43   64.08       167      7.65   66.44
15          0.02    294             131      6.74   70.82       152      6.96   73.40
16          0.17    305             111      5.71   76.53       117      5.36   78.75
17          0.33    316             101      5.20   81.73       114      5.22   83.97
18          0.48    327             87       4.48   86.21       85       3.89   87.87
19          0.65    338             57       2.93   89.14       70       3.21   91.07
20          0.81    350             39       2.01   91.15       54       2.47   93.54
21          0.98    362             44       2.26   93.41       32       1.47   95.01
22          1.17    375             32       1.65   95.06       25       1.14   96.15
23          1.36    389             37       1.90   96.96       23       1.05   97.21
24          1.58    404             23       1.18   98.15       13       0.60   97.80
25          1.82    421             10       0.51   98.66       21       0.96   98.76
26          2.10    440             14       0.72   99.38       16       0.73   99.50
27          2.44    464             9        0.46   99.85       3        0.14   99.63
28          2.90    496             2        0.10   99.95       4        0.18   99.82
29          3.65    500                                         3        0.14   99.95
30          5.07    500             1        0.05   100.00      1        0.05   100.00

Total                               1943     100                2184     100

Table A16: Integrated Science Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -4.93   100             203      10.39  10.39       70       3.21   3.21
1           -3.51   100             2        0.10   10.49       9        0.41   3.62
2           -2.77   104             7        0.36   10.85       15       0.69   4.31
3           -2.32   134             18       0.92   11.77       35       1.60   5.91
4           -1.98   156             39       2.00   13.77       65       2.98   8.90
5           -1.71   175             64       3.28   17.04       127      5.82   14.72
6           -1.48   190             91       4.66   21.70       168      7.70   22.42
7           -1.27   204             133      6.81   28.51       208      9.54   31.96
8           -1.08   217             123      6.29   34.80       215      9.86   41.82
9           -0.91   228             113      5.78   40.58       197      9.03   50.85
10          -0.74   239             120      6.14   46.72       140      6.42   57.27
11          -0.59   250             104      5.32   52.05       157      7.20   64.47
12          -0.43   260             90       4.61   56.65       126      5.78   70.24
13          -0.29   270             102      5.22   61.87       93       4.26   74.51
14          -0.14   280             73       3.74   65.61       88       4.03   78.54
15          0.00    289             67       3.43   69.04       64       2.93   81.48
16          0.15    299             78       3.99   73.03       70       3.21   84.69
17          0.29    309             79       4.04   77.07       53       2.43   87.12
18          0.44    318             65       3.33   80.40       43       1.97   89.09
19          0.59    329             62       3.17   83.57       43       1.97   91.06
20          0.75    339             65       3.33   86.90       48       2.20   93.26
21          0.91    350             41       2.10   89.00       38       1.74   95.00
22          1.08    362             49       2.51   91.50       27       1.24   96.24
23          1.27    374             46       2.35   93.86       32       1.47   97.71
24          1.48    388             38       1.94   95.80       18       0.83   98.53
25          1.71    404             35       1.79   97.59       21       0.96   99.50
26          1.98    422             26       1.33   98.93       5        0.23   99.72
27          2.32    444             11       0.56   99.49       4        0.18   99.91
28          2.77    474             5        0.26   99.74       2        0.09   100.00
29          3.51    500             4        0.20   99.95
30          4.93    500             1        0.05   100.00

Total                               1954     100                2181     100

Table A17: Creative and Technology Studies Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -4.86   100             225      11.49  11.49       86       3.94   3.94
1           -3.45   100             5        0.26   11.74       7        0.32   4.26
2           -2.71   100             8        0.41   12.15       14       0.64   4.90
3           -2.27   125             17       0.87   13.02       22       1.01   5.91
4           -1.93   147             25       1.28   14.29       66       3.02   8.94
5           -1.67   166             54       2.76   17.05       104      4.77   13.70
6           -1.44   181             95       4.85   21.90       159      7.29   20.99
7           -1.24   195             127      6.48   28.38       178      8.16   29.15
8           -1.05   207             118      6.02   34.41       190      8.71   37.86
9           -0.88   219             144      7.35   41.76       196      8.98   46.84
10          -0.72   230             122      6.23   47.98       180      8.25   55.09
11          -0.57   240             128      6.53   54.52       157      7.20   62.28
12          -0.42   250             98       5.00   59.52       122      5.59   67.87
13          -0.28   260             88       4.49   64.01       106      4.86   72.73
14          -0.14   269             87       4.44   68.45       93       4.26   76.99
15          0.00    279             76       3.88   72.33       67       3.07   80.06
16          0.14    288             64       3.27   75.60       62       2.84   82.91
17          0.28    298             52       2.65   78.25       66       3.02   85.93
18          0.42    307             76       3.88   82.13       57       2.61   88.54
19          0.57    317             48       2.45   84.58       34       1.56   90.10
20          0.72    328             49       2.50   87.09       41       1.88   91.98
21          0.88    338             50       2.55   89.64       28       1.28   93.26
22          1.05    350             49       2.50   92.14       30       1.37   94.64
23          1.23    362             51       2.60   94.74       35       1.60   96.24
24          1.44    376             31       1.58   96.32       32       1.47   97.71
25          1.67    392             31       1.58   97.91       20       0.92   98.63
26          1.93    410             23       1.17   99.08       20       0.92   99.54
27          2.27    432             13       0.66   99.74       6        0.27   99.82
28          2.71    463             3        0.15   99.90       2        0.09   99.91
29          3.45    500             1        0.05   99.95       2        0.09   100.00
30          4.86    500             1        0.05   100.00

Total                               1959     100                2182     100

Table A18: Community Studies Scores and Frequencies

                                    Pilot Group                  Comparison Group
Raw Score   Theta   Scale Score     Freq.    %      Cum. %      Freq.    %      Cum. %
0           -4.87   100             219      11.24  11.24       104      4.84   4.84
1           -3.46   100             4        0.21   11.44       7        0.33   5.17
2           -2.72   100             8        0.41   11.85       9        0.42   5.59
3           -2.27   100             20       1.03   12.88       17       0.79   6.38
4           -1.94   118             38       1.95   14.83       45       2.09   8.47
5           -1.67   141             67       3.44   18.27       65       3.03   11.50
6           -1.44   161             109      5.59   23.86       93       4.33   15.83
7           -1.24   179             128      6.57   30.43       129      6.01   21.83
8           -1.06   195             137      7.03   37.46       183      8.52   30.35
9           -0.88   210             110      5.64   43.10       176      8.19   38.55
10          -0.72   224             143      7.34   50.44       174      8.10   46.65
11          -0.57   237             118      6.05   56.49       166      7.73   54.38
12          -0.42   250             117      6.00   62.49       126      5.87   60.24
13          -0.28   262             82       4.21   66.70       132      6.15   66.39
14          -0.14   275             79       4.05   70.75       107      4.98   71.37
15          0.00    287             66       3.39   74.14       97       4.52   75.88
16          0.14    299             56       2.87   77.01       85       3.96   79.84
17          0.28    311             78       4.00   81.02       65       3.03   82.87
18          0.43    324             65       3.34   84.35       89       4.14   87.01
19          0.57    337             60       3.08   87.43       76       3.54   90.55
20          0.73    350             47       2.41   89.84       52       2.42   92.97
21          0.89    364             58       2.98   92.82       45       2.09   95.07
22          1.06    379             41       2.10   94.92       30       1.40   96.46
23          1.24    395             47       2.41   97.33       25       1.16   97.63
24          1.44    412             19       0.97   98.31       25       1.16   98.79
25          1.67    432             17       0.87   99.18       11       0.51   99.30
26          1.94    456             10       0.51   99.69       11       0.51   99.81
27          2.27    485             5        0.26   99.95       4        0.19   100.00
28          2.72    500             1        0.05   100.00
29          3.45    500
30          4.69    500

Total                               1949     100                2148     100

Appendix 4: Histograms by Subject and Group

[Figures: four raw-score histograms per subject — Pilot Group: Grade 5 Pre-test, Pilot Group: Grade 5 Post-test, Comparison Group: Grade 5 Pre-test, and Comparison Group: Grade 5 Post-test — for English, Social and Developmental Studies, Mathematics, Integrated Science, Creative and Technology Studies, and Community Studies. Each histogram plots Raw Score (0-30) on the horizontal axis against Frequency on the vertical axis.]