Test Construction
Tony Coloma
Faculty Member
PAREF Northfield School for Boys
What is testing?
• “A test is a sample of behavior, products, answers, or performances from a particular domain” (Carrington, 1994)
• “A test will predict performance levels, and the learner will somehow reconstruct its parts in meaningful situations when necessary” (McCann, 2000)
• “Testing is generally concerned with turning performance into numbers.” (Baxter, 1997)
• Test [scores] can give parents useful information about their children. Retrieved May 26, 2016 from http://PAREonline.net/getvn.asp?v=1&n=1
• “A test is an instrument or systematic procedure for measuring a sample of behavior” (Gronlund and Linn, 1990)
Why Are Students Tested?
• Tests help teachers, principals, and administrators:
– evaluate and improve the school district
– evaluate and improve the individual school
– identify a child's academic strengths
– identify areas where a child may need to improve
• REMEMBER: Children are never measured on the basis of ONE test alone.
Retrieved May 26, 2016 from http://PAREonline.net/getvn.asp?v=1&n=1
Why pay attention to test construction?
• Teachers who were trained in test construction and analysis prepared tests that were more valid and reliable (Magno, 2003).
• Studies have suggested that faulty test items affect students’ comprehension and their ability to provide accurate answers to the items (Koksal, 2004; Leighton and Gokiert, 2005).
• For 13% of students who got low exam grades, the low grades were caused by faulty test questions (WORLDWATCH: The Philadelphia Trumpet, August 2005).
• Poorly designed test items can lead to inaccurate measurements of learning and provide false information regarding student performance as well as instructional effectiveness (Education Up Close, 2005).
• Test length also affects test quality. According to Wells and Wollack (2003), longer tests produce higher reliability and validity.
• Any item answered correctly or incorrectly because of extraneous factors in the item results in misleading feedback to both examinee and examiner (Frey, 2007).
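The link between test length and reliability can be made concrete with the Spearman-Brown prophecy formula, a standard psychometric result that the slides do not cite: if a test with reliability ρ is lengthened by a factor n using comparable items, its predicted reliability is nρ / (1 + (n − 1)ρ). The sketch below is a minimal illustration, assuming the added items are parallel to the existing ones; the 0.60 starting reliability is a made-up number.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by `length_factor`.

    Assumes the added (or removed) items are parallel to the originals,
    i.e. of the same difficulty and quality.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Doubling a test whose reliability is 0.60 (illustrative value only).
    print(round(spearman_brown(0.60, 2), 3))  # 0.75
```

The formula also runs in reverse: a length factor below 1 (shortening the test) predicts a drop in reliability, which is why cutting items from a test should be done with care.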
Characteristics of Good Tests
• Validity
– refers to the accuracy of an assessment
• Reliability
– the consistency with which a test measures what it is supposed to measure
• Usability
– the test can be administered with ease, clarity, and uniformity
Retrieved May 26, 2016 from http://fcit.usf.edu/assessment/basic/basicc.html
Characteristics of Good Tests
• Scorability
– the test is easy to score
• Interpretability
– test results can be properly interpreted and serve as a major basis for making sound educational decisions
• Economy
– the test can be reused without compromising its validity and reliability
Steps in Test Construction
Retrieved May 26, 2016 from http://www.unesco.org/iiep/PDF/TR_Mods/Qu_Mod6.pdf
Table of Specifications (TOS)
• A two-way chart that relates the learning outcomes to the course content.
• It enables the teacher to prepare a test containing a representative sample of student behavior in each of the areas tested.
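A table of specifications can be held as a grid of item counts indexed by content area and learning outcome; the row and column totals then show how heavily the test samples each area. The sketch below is one possible representation in Python. The content areas, outcome labels, and counts are hypothetical placeholders, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical TOS: (content area, learning outcome) -> number of test items.
tos = {
    ("Fractions", "Knowledge"): 4,
    ("Fractions", "Application"): 6,
    ("Decimals", "Knowledge"): 3,
    ("Decimals", "Application"): 7,
}


def totals(table):
    """Return row totals (per content area), column totals (per outcome), and the grand total."""
    by_content = defaultdict(int)
    by_outcome = defaultdict(int)
    for (content, outcome), n_items in table.items():
        by_content[content] += n_items
        by_outcome[outcome] += n_items
    return dict(by_content), dict(by_outcome), sum(table.values())


by_content, by_outcome, grand = totals(tos)
for content, n_items in by_content.items():
    print(f"{content}: {n_items} items ({n_items / grand:.0%} of the test)")
for outcome, n_items in by_outcome.items():
    print(f"{outcome}: {n_items} items ({n_items / grand:.0%} of the test)")
```

Comparing these percentages with the emphasis each topic received in class is one quick way to see whether the test is a representative sample.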
Sample TOS
Teachers who used a table of specifications to design test items generated tests with higher validity and reliability than those who did not (Linn and Gronlund, 1995).
Commonly Used Test Formats
• Multiple Choice
• True or False
• Matching Type
• Fill-in-the-Blank (Sentence Completion)
• Essay
Multiple Choice
Rules for Writing Multiple-Choice Items
• When checking the stems for correctness:
– Ensure that the stem asks a clear question.
– Ensure that the reading level is appropriate to the students.
– Ensure that the stem is grammatically correct.
– Avoid negatively stated stems.
• When using incomplete statements, place the blank space at the end.
• All options should be homogeneous and nearly equal in length.
• The stem (question) should contain only one main idea.
• Keep all options either singular or plural.
• Have four or five responses per stem (question).
(Gronlund, N., Assessment of Student Achievement, 7th ed., Pearson Education, Inc., Boston, 2003)
PARTS
• Stem
– is the section of a multiple-choice item that poses the problem that the students must answer.
– Stems can be in the form of a question or an incomplete sentence.
– Poorly written stems fail to state the problem clearly because they are vague, full of irrelevant data, or negatively worded.
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
PARTS
• Alternatives
– consist of the answer and distractors that are inferior or incorrect.
– Common mistakes in writing exam alternatives have to do with how the various alternatives relate to one another.
– They should be:
• mutually exclusive
• homogeneous
• plausible
• consistently phrased
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
Poorly Written Stems
Avoid vague stems by stating the problem in the stem:
Poor Example
California:
a. Contains the tallest mountain in the United States.
b. Has an eagle on its state flag.
c. Is the second largest state in terms of area.
*d. Was the location of the Gold Rush of 1849.
Good Example
What is the main reason so many people moved to California in 1849?
a. California land was fertile, plentiful, and inexpensive.
*b. Gold was discovered in central California.
c. The east was preparing for a civil war.
d. They wanted to establish religious settlements.
Avoid wordy stems by removing irrelevant data:
Poor Example
Suppose you are a mathematics professor who wants to determine whether or not your teaching of a unit on probability has had a significant effect on your students. You decide to analyze their scores from a test they took before the instruction and their scores from another exam taken after the instruction. Which of the following t-tests is appropriate to use in this situation?
*a. Dependent samples.
b. Heterogeneous samples.
c. Homogeneous samples.
d. Independent samples.
Good Example
When analyzing your students’ pretest and posttest scores to determine if your teaching has had a significant effect, an appropriate statistic to use is the t-test for:
*a. Dependent samples.
b. Heterogeneous samples.
c. Homogeneous samples.
d. Independent samples.
Poorly Written Alternatives
Avoid Overlapping Alternatives
Poor Example
What is the average effective radiation dose from chest CT?
a. 1-8 mSv
b. 8-16 mSv
c. 16-24 mSv
d. 24-32 mSv
Good Example
What is the average effective radiation dose from chest CT?
a. 1-7 mSv
b. 8-15 mSv
c. 16-24 mSv
d. 25-32 mSv
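Whether numeric-range alternatives overlap can be checked mechanically before a test goes out. The sketch below is a minimal illustration, assuming each alternative is a closed range with whole-number bounds, so two ranges overlap if they share any value, including a shared endpoint. The function name is hypothetical.

```python
def overlapping_pairs(ranges):
    """Return index pairs of closed ranges (low, high) that share at least one value."""
    pairs = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            lo_i, hi_i = ranges[i]
            lo_j, hi_j = ranges[j]
            # Two closed ranges overlap when each one starts before the other ends.
            if lo_i <= hi_j and lo_j <= hi_i:
                pairs.append((i, j))
    return pairs


# Ranges from the poor example: each adjacent pair shares an endpoint.
poor = [(1, 8), (8, 16), (16, 24), (24, 32)]
# A set of ranges whose bounds never touch.
disjoint = [(1, 7), (8, 15), (16, 24), (25, 32)]

print(overlapping_pairs(poor))      # [(0, 1), (1, 2), (2, 3)]
print(overlapping_pairs(disjoint))  # []
```

If the quantity can take fractional values (a dose of 7.5 mSv, say), gaps between whole-number bounds become a problem of their own, so the bounds would need to be chosen with that in mind.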
Avoid Dissimilar Alternatives
Poor Example
Idaho is widely known as:
*a. The largest producer of potatoes in the United States.
b. The location of the tallest mountain in the United States.
c. The state with a beaver on its flag.
d. The “Treasure State.”
Good Example
Idaho is widely known for its:
a. Apples.
b. Corn.
*c. Potatoes.
d. Wheat.
Avoid Implausible Alternatives
Poor Example
Which of the following artists is known for painting the ceiling of the Sistine Chapel?
a. Warhol.
b. Flintstone.
*c. Michelangelo.
d. Santa Claus.
Good Example
Which of the following artists is known for painting the ceiling of the Sistine Chapel?
a. Botticelli.
b. da Vinci.
*c. Michelangelo.
d. Raphael.
True or False
True or False
• Each statement is clearly true or clearly false.
• Trivial details should not make a statement false.
• Statements are written concisely, without more elaboration than necessary.
• Statements are NOT quoted exactly from the text.
• Favor quantitative terms over qualitative terms.
• Avoid specific determiners, which usually give a clue to the answer:
– Usually signal a false statement: all, always, never, every, none, only
– Usually signal a true statement: generally, sometimes, usually, maybe, often
• Avoid negative statements.
• Whenever a controversial statement is used, cite the authority.
• Avoid a predictable pattern of answers.
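A predictable answer pattern in a true-false key can be caught with a quick check before the test is printed. The sketch below is a minimal illustration for a key written as a string of "T" and "F"; it flags long runs of the same answer and strict alternation. The function names and the `max_run=3` threshold are hypothetical choices, not rules from the slides.

```python
def longest_run(key: str) -> int:
    """Length of the longest streak of identical answers in the key."""
    best = run = 0
    prev = None
    for answer in key:
        run = run + 1 if answer == prev else 1
        prev = answer
        best = max(best, run)
    return best


def is_alternating(key: str) -> bool:
    """True if a True/False key strictly alternates (T F T F ...)."""
    return len(key) > 2 and all(a != b for a, b in zip(key, key[1:]))


def pattern_warnings(key: str, max_run: int = 3) -> list:
    """Collect warnings about patterns a test-wise student might spot."""
    warnings = []
    run = longest_run(key)
    if run > max_run:
        warnings.append(f"run of {run} identical answers")
    if is_alternating(key):
        warnings.append("answers strictly alternate")
    return warnings


print(pattern_warnings("TFTFTFTF"))  # ['answers strictly alternate']
print(pattern_warnings("TTTTTFFT"))  # ['run of 5 identical answers']
print(pattern_warnings("TFFTFTTF"))  # []
```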
Express the item statement as simply and as clearly as possible.
Undesirable:
• When you see a highway with a marker that reads “Interstate 80,” you know that the construction and upkeep of that road is maintained by the state and federal government.
Desirable:
• The construction and maintenance of interstate highways are provided by both state and federal governments.
Express a single idea in each test item
Undesirable:
• Water will boil at a higher temperature if the atmospheric pressure on its surface is increased and more heat is applied to the container.
Desirable:
• Water will boil at a higher temperature if the atmospheric pressure on its surface is increased.
Avoid the use of extreme modifiers or qualifiers.
Undesirable:
• All sessions of Congress are called by the President. (F)
• The Supreme Court frequently rules on the constitutionality of law. (T)
• An objective test is generally easier to score than an essay test. (T)
Desirable:
• The sum of the angles of a triangle is always 180°. (T)
• The galvanometer is the instrument usually used for the metering of electrical energy used in a home. (F)
Extreme Modifiers: all, none, always, never, only, nobody, invariably, no one, best, absolutely, worst, absolutely not, everybody, certainly, everyone, certainly not
Qualifiers: usually, frequently, often, sometimes, some, many, much, probably, a majority, apt to, most, might, a few, unlikely
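Because the slides list the extreme modifiers and qualifiers explicitly, a draft true-false statement can be screened for them before review. The sketch below is a minimal illustration: the word lists are copied from the slides, while the function names are hypothetical. A match is only a prompt to reread the item, since the Desirable examples above show that such words can be used acceptably.

```python
import re

# Word lists copied from the slides.
EXTREME_MODIFIERS = [
    "all", "none", "always", "never", "only", "nobody", "invariably", "no one",
    "best", "absolutely", "worst", "absolutely not", "everybody", "certainly",
    "everyone", "certainly not",
]
QUALIFIERS = [
    "usually", "frequently", "often", "sometimes", "some", "many", "much",
    "probably", "a majority", "apt to", "most", "might", "a few", "unlikely",
]


def find_terms(statement, terms):
    """Return the listed terms that appear as whole words or phrases in the statement.

    Note: a phrase such as "absolutely not" also matches its single-word prefix.
    """
    text = statement.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]


def flag_statement(statement):
    """Report extreme modifiers and qualifiers found in a true-false statement."""
    return {
        "extreme": find_terms(statement, EXTREME_MODIFIERS),
        "qualifiers": find_terms(statement, QUALIFIERS),
    }


print(flag_statement("All sessions of Congress are called by the President."))
# {'extreme': ['all'], 'qualifiers': []}
print(flag_statement("The Supreme Court frequently rules on the constitutionality of law."))
# {'extreme': [], 'qualifiers': ['frequently']}
```

The `\b` word boundaries keep "all" from matching inside words such as "called".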
Avoid lifting statements from the text, lecture, or other materials, so that memory alone will not permit a correct answer.
Undesirable:
• For every action there is an opposite and equal reaction.
Desirable:
• If you were to stand in a canoe and throw a life jacket forward to another canoe, chances are your canoe would jerk backward.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Avoid the use of unfamiliar vocabulary.
Undesirable:
• According to some politicians, the raison d’être for capital punishment is retribution.
Desirable:
• According to some politicians, the justification for capital punishment is retribution.
Writing Hint… One method for developing true-false items is to write a set of true statements that cover the content, then convert approximately half of them to false statements. Remember: When changing items to false (as well as in writing the true statements initially), state the items positively, avoiding negatives or double negatives.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Matching Type
Directions: "Place the letter of the term in the right hand column on the line to the left of the definition column."
Circle the letter(s) that describe the best way to revise these directions:
A. Add: “Match the following”
B. Add: “Each term may not be used more than once”
C. Change the order of the directions provided
D. No changes needed
Problem: Faulty directions.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Problem: Unrelated topics.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Problem: Mixing matching with completion
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Use only items that share the same foundation of information
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Avoid grammatical or other clues to the correct response
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Completion or Fill-in-the-Blank Test Items
Omit only significant words from the statement.
Do not omit so many words from the statement that the intended meaning is lost.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Be sure there is only one correct response.
If possible, put the blank at the end of a statement rather than at the beginning.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Essay Test Item
Formulate the question so that the task is clearly defined for the student.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Choose a scoring model.
• The major task in scoring essay tests is to maintain consistency, to make sure that answers of equal quality are given the same number of points. There are two approaches to scoring essay items: (1) the analytic or point method and (2) the holistic or rating method.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Analytic:
Before scoring, prepare an ideal answer in which the major components are defined and assigned point values.
Read and compare the student’s answer with the model answer. If all the necessary elements are present, the student receives the maximum number of points.
Partial credit is given based on the elements included in the answer. In order to arrive at the overall exam score, the teacher adds the points earned on the separate questions.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
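The analytic (point) method described above amounts to simple bookkeeping: a model answer lists its major components with point values, a response earns the points for the components the teacher judges to be present, and the question scores are added for the exam total. The sketch below illustrates that arithmetic; the components, point values, and other question scores are hypothetical, and deciding which components are present remains the teacher's judgment.

```python
from dataclasses import dataclass


@dataclass
class Element:
    description: str
    points: int


# Hypothetical model answer for one essay question: major components and point values.
model_answer = [
    Element("states the thesis", 2),
    Element("gives two supporting examples", 4),
    Element("draws a conclusion", 2),
]


def score_answer(model, elements_present):
    """Partial credit: sum the points of the model-answer elements found in a response."""
    return sum(e.points for e in model if e.description in elements_present)


def exam_score(question_scores):
    """Overall exam score: add the points earned on the separate questions."""
    return sum(question_scores)


max_points = score_answer(model_answer, {e.description for e in model_answer})
q1 = score_answer(model_answer, {"states the thesis", "draws a conclusion"})
print(max_points)              # 8
print(q1)                      # 4
print(exam_score([q1, 7, 5]))  # 16
```

Writing the point values down before reading any papers is what supports the consistency goal stated above: two responses containing the same components receive the same score.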
Holistic:
This method involves considering the student’s answer as a whole and judging the total quality of the answer relative to other student responses, or the total quality of the answer based on certain criteria that you develop.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Prepare students to take essay exams
• Essay tests are valid measures of student achievement only if students know how to take them. Many college freshmen do not know how to take an essay exam, because they have not been required to learn this skill in high school.
• Take some class time to tell students how to prepare for and how to take an essay exam. Use old exam questions and let students see what an "A" answer looks like and how it differs from a "C" answer.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
REFERENCES
This presentation is patterned after the PowerPoint presentation of Arnel O. Rivera, Faculty Member, BNHS-Villa Maria; CAS, LPU-Cavite.
http://pareonline.net/genpare.asp?cx=partner-pub-8146434030680546%3A8994471369&cof=FORID%3A9&ie=UTF-8&wh=5&q=validity+and+reliability
http://www.cte.cornell.edu/documents/Test%20Construction%20Manual.pdf
http://arc.duke.edu/documents/The%20difference%20between%20assessment%20and%20evaluation.pdf
http://www.edudemic.com/summative-and-formative-assessments/
http://pareonline.net/getvn.asp?v=5&n=2
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf