In this module, we will discuss how tests are only tools and why test scores are fallible. We will also distinguish between testing and assessment, and further explain why testing and assessment skills are vital to today’s classroom teacher. In exploring the recent history of educational measurement, we will be able to better identify the implications of current trends in educational measurement for today’s classroom teacher.
2
There are three schools of thought on the use of testing in education.
The first is that testing provides little benefit in measuring learning and may have a detrimental effect on how students feel about education.
The second is that tests are imperative in the education system.
The third is that tests are important tools for evaluating students, curricula, and instruction, but that the amount of educational power placed on tests and test scores should be questioned.
The authors of your textbook belong to the third group.
3
Tests are only tools, and their usefulness can vary. The usefulness of a test depends on five factors: 1) how the test is used, 2) the test design, 3) the users of the test, 4) the purpose or population for which the test was designed, and 5) the limited information a test provides for a decision.
As a tool, a test can be appropriately used, unintentionally misused, or intentionally abused.
Like other tools, a test can be well-designed or poorly designed. Even a well-designed test can be dangerous in the hands of ill-trained or inexperienced users. A test’s usefulness is limited when it is used for a purpose or population for which it was not designed. Even when a test meets these four criteria, its results provide only some of the information needed for the best educational decision about a student.
The five concerns mentioned above help us recognize that the usefulness of tests depends on a variety of factors. Let’s explore these factors further.
4
A crucial factor that affects a test’s usefulness is its technical adequacy. The technical adequacy of a test includes evidence of test validity and test score reliability.
Validity evidence helps us to determine whether the test is measuring what it is purported to measure.
Score reliability indicates the extent to which test scores are consistent and stable.
Validity and reliability are not fixed characteristics of a test because they can be changed by many factors, such as the competency of the test user, whether the test is being used as intended, and whether the test takers match the population for which the test was written.
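To make "consistent and stable" concrete, here is a minimal sketch of one common way to estimate score reliability: correlating the scores from two administrations of the same test (a test-retest estimate). This method is not described in the notes above; it is shown only as an illustration, and the function name and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five students on two administrations.
first_administration = [72, 85, 90, 64, 78]
second_administration = [70, 88, 91, 60, 80]

retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {retest_r:.2f}")
```

A value near 1 suggests that students kept roughly the same relative standing across the two administrations. Because reliability is not a fixed characteristic of a test, the same test could yield a lower value with a different group of students or under different testing conditions.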
5
The evidence of a test’s usefulness can vary depending on the competency of the people administering, scoring, and interpreting the test.
Competent test users can make better use of a test.
6
The usefulness of a test depends on whether or not it is being used for its intended purpose.
Tests have been designed for many specific educational purposes, such as measuring intellectual functioning, personality functioning, vocational aptitudes, and so on.
Using a test for a different purpose may limit its usefulness for that unintended purpose. For instance, suppose a test designed to measure the ability to recognize typos in a manuscript is used to predict the ability to write a book. In this case, the usefulness of the test is more limited.
7
In addition to specific purposes, educational tests can also be designed for more general purposes, including summative and formative assessment.
Tests for summative assessment are intended to measure students’ learning after the completion of a unit of instruction. Summative tests can be used to assign grades, evaluate curriculum effectiveness, and measure annual gains at the student, school, and district levels. Summative tests are designed to measure larger and broader changes in achievement rather than small, daily gains.
In contrast, tests for formative assessment are more useful for informing daily gains and the effectiveness of instruction. Formative tests tend to be short to minimize interference with instructional time and to facilitate repeated administration in the classroom. Curriculum-based measurement, one form of formative assessment, can be used as part of the instructional process to monitor students’ progress.
8
To enhance test usefulness, we have to consider whether the test matches diverse test takers.
Since our school systems contain students representing a large array of cultures, languages, and educational backgrounds, we cannot expect the technical adequacy of these tests to be the same when used with populations from diverse backgrounds, such as Middle Eastern learners, learners with limited English proficiency, and learners from lower socioeconomic backgrounds.
As educators, we need to strive to choose the tests that are the most useful assessments for our student population. However, educational tests may not be as useful for students whose backgrounds differ from those of the intended student population. If we cannot find a good match between the test and the students taking the test, we need to be especially aware and careful when interpreting the results of the test.
9
Test usefulness is also related to how test results are used and considered.
What should we do about test results?
Ideally, important decisions should never be made on the basis of the results of a single test administration. This recommendation applies even when technical adequacy, competency of the test user, and intended purpose have all been met for the assessment.
In reality, however, single test administrations are often used to make very important educational decisions, such as promotion and graduation. To bridge the gap between this theory and the reality, efforts (such as translation) have been made to adapt assessments to align more closely with the needs of diverse populations. However, the technical adequacy and fairness of these adaptations have been hard to determine and need to be studied in more depth.
10
Instead of using a single test, it is best to collect more assessment information about student achievement to make important educational decisions.
A single test is like a limited snapshot or photograph of student performance. The test results can be considered part of the whole assessment process. The whole assessment process is like a video. To make appropriate educational decisions, watching the whole video is always recommended.
11
At the beginning of the measurement course, it is important to clarify some technical test-related terminology.
The terms “tests” and “assessments” can be regarded as synonyms.
The public tends to prefer the term “assessment” to “testing” because “assessment” sounds less evaluative, threatening, or negative. Furthermore, a clear distinction can be made between tests (or assessments) and the assessment process.
A test (or assessment) can be thought of as a single measure at a single point in time, while the assessment process spans a period of time and uses multiple measures to gain a broader view of student achievement or characteristics.
A test is either formative or summative, while the assessment process can contain both formative and summative measures administered at different points in time.
12
Let’s look at some of the different types of assessments.
Understanding the differences between these types of assessments will help as you progress through the remainder of the textbook.
The differences relate to the types of answers or responses the test-takers will produce, the type of information assessed, and how the responses are scored. Here, the intention is only to highlight the major differences among four types of assessments: 1) objective tests vs. essay or performance and portfolio assessment; 2) teacher-made tests vs. standardized tests; 3) norm-referenced tests vs. criterion-referenced tests; and 4) curriculum-based measurements.
13
Tests with the most consistent and objective scoring are “objective items.”
Next in terms of scoring consistency and objectivity would be “completion items.” Essays, performances, and portfolios may be difficult to score consistently and objectively. However, they are more commonly used than objective items when assessing higher-order skills. All four types of these items can prove useful at different times in the assessment process.
14
Teacher-made tests have a lot of flexibility or variability in terms of construction, administration, and scoring, while standardized tests are designed to eliminate that flexibility. Standardized tests are strictly regulated in terms of administration and scoring, and they are written by testing professionals.
While both types of assessments commonly contain objective items, teacher-made tests are much more likely to contain essay items than standardized tests.
15
Norm-referenced tests compare individual student performance to a norm group, which can be world-wide, national, state-wide, or district-wide.
Criterion-referenced tests compare individual student performance to an absolute standard or criterion.
Norm-referenced tests have a broad focus and can be rather lengthy, while criterion-referenced tests typically have a narrower focus and are shorter in length.
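The two kinds of comparison can be sketched in a few lines of code. The example below is illustrative only: the function names, the norm-group scores, and the cut score of 80 are hypothetical, and the percentile rank used here (the percent of the norm group scoring strictly below the student) is just one of several definitions in use.

```python
def percentile_rank(score, norm_scores):
    """Norm-referenced view: percent of the norm group scoring below `score`."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: does the score reach the fixed standard?"""
    return score >= cut_score


# Hypothetical norm group and a hypothetical mastery cut score.
norm_group = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]
cut_score = 80
student_score = 78

rank = percentile_rank(student_score, norm_group)
mastered = meets_criterion(student_score, cut_score)
print(f"percentile rank: {rank:.0f}, meets criterion: {mastered}")
```

The same score of 78 reads differently under each interpretation: the student outscored more than half of this norm group, yet did not reach the fixed standard.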
16
Curriculum-based measurements are relatively new in regular education classrooms, but they have been used in special education classrooms for quite some time.
They are commonly used to assess daily gains in math, reading, writing, and spelling.
Curriculum-based measurements can be teacher-written or written by a commercial testing company. If curriculum-based measurements are used as a norm-referenced assessment, it is important to note that the ‘norm’ group used for comparison is not as carefully selected as that of typical norm-referenced tests and is more likely a convenience sample. They are relatively short but have evidence of validity and reliability.
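Because curriculum-based measurements are short and repeated, their scores lend themselves to simple trend summaries. The sketch below assumes one probe per week and summarizes progress as the least-squares slope of the scores, meaning the average gain per week. This summary is an assumption made for illustration rather than something the notes prescribe, and the function name and scores are hypothetical.

```python
def weekly_gain(scores):
    """Least-squares slope of scores over equally spaced weekly probes."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two probes to estimate a trend")
    weeks = range(n)  # week index 0, 1, 2, ...
    mean_week = (n - 1) / 2
    mean_score = sum(scores) / n
    numerator = sum((w - mean_week) * (s - mean_score)
                    for w, s in zip(weeks, scores))
    denominator = sum((w - mean_week) ** 2 for w in weeks)
    return numerator / denominator


# Hypothetical words-read-correctly-per-minute scores from six weekly probes.
reading_probes = [41, 44, 43, 47, 50, 52]

gain = weekly_gain(reading_probes)
print(f"average gain per week: {gain:.1f}")
```

A teacher could compare this slope with a target rate of growth to judge whether instruction is working, while keeping in mind that a handful of short probes is still limited information.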
17
Various regular and special education reform efforts have had a major impact on classroom testing and assessment.
18
No Child Left Behind and the Elementary and Secondary Education Act have raised expectations of the regular education system in the United States. These raised expectations have included the creation of expectations and performance standards, local decision making, new teacher training, performance pay, higher teacher salaries, increased accountability, and high-stakes testing.
The passage of the Education for All Handicapped Children Act in 1975 served as a way to ensure that all students, including handicapped students, received a free and appropriate public education. This act also led special education to become an entity separate from regular education, with its own rules, staff, procedures, and requirements.
However, this had a negative impact on special education, as most students were educated in segregated special education settings instead of being mainstreamed into regular education settings, with the exception of non-academic activities.
Due to these negative consequences, the Individuals with Disabilities Education Act and the Individuals with Disabilities Education Improvement Act were passed to ensure that special education students were educated in regular education settings as much as possible, by regular education staff, and held to the same standards as their regular education peers when possible.
The Individuals with Disabilities Education Improvement Act and No Child Left Behind are meant to be complementary pieces of legislation rather than being viewed separately as special education reform and regular education reform. They both emphasize the use of scientifically based instruction and reading achievement for all students, as well as ongoing progress monitoring and the use of formative assessment.
19
The passage of the Individuals with Disabilities Education Improvement Act changed how students with learning disabilities are identified, thus increasing the impact that regular education teachers have on the identification process.
Since this act focuses on formative assessments, new formative assessment methods have evolved, including the response-to-intervention model, universal screening, and progress monitoring using curriculum-based measurement (CBM).
20
Concerns about the ability of US students to compete in the global economy have increased our reliance on test results and accountability to ensure our education systems are effective.
According to the 2007 Trends in International Mathematics and Science Study data, the performance of US fourth and eighth grade students improved in math but remained the same in science.
US students still fall behind students in Asian countries in math and science and behind students in some European countries in science.
The results of the 2006 Program for International Student Assessment were not as promising. The data suggested that 15-year-old students in the USA performed below the average for industrialized nations in math and science, scoring lower on average than 15-year-old students in Asian and European countries.
21
The reliance on a paper-and-pencil test to determine whether a teacher possesses the complex skills required to be a good classroom teacher has raised some concerns. As with student assessment, reliance on a single test may pose a problem. The emphasis on a student assessment process that involves portfolios may lead to the use of portfolios as a method of assessing the skills of classroom teachers.
22
Some of the areas of interest are high-stakes testing, performance and portfolio assessments, testing students with disabilities, linguistically and culturally appropriate assessments, and computer-based testing.
Professional organizations have historically influenced policy and reform and may therefore help shape the future of educational assessment.
23
Classroom teachers are becoming more and more responsible for the creation of valid and reliable classroom tests as well as the administration of standardized tests.
Along with those responsibilities comes the need to understand how to interpret test data and report that data to administrators and parents. With the increased emphasis on accountability, NCLB, and the Individuals with Disabilities Education Improvement Act, teachers should have at least a basic knowledge of the various types of tests and how to interpret results.
24