
PROFESSIONAL EDUCATION VI

“ASSESSMENT OF LEARNING”(Basic Concepts)

Prof. Yonardo Agustin Gabuyo

Basic Concepts in Assessment of Learning

Assessment refers to the collection of data to describe or better understand an issue.

It measures "Where are we in relation to where we should be?" Many consider it the same as formative evaluation.

It is a process by which information is obtained relative to some known objective or goal.

It is the teacher's way of gathering information about what students have learned. Teachers use this information to make important decisions about students' grades, the content of future lessons, and the revision of the structure or content of a course.

Measurement refers to the process by which the attributes or dimensions of some physical object are determined. It is a process of measuring an individual's intelligence, personality, attitudes and values, achievement, and anything else that can be expressed quantitatively. It answers the question, "How much?"

Evaluation

determines "How well did we do what we set out to do?" Evaluation is tied to stated goals and objectives. Many equate this with summative evaluation.

It refers to the process of determining the extent to which instructional objectives are attained, and to the comparison of data against a standard for the purpose of judging worth or quality.


Test is an instrument designed to measure any quality, ability, skill, or knowledge. Testing is a method used to measure the level of performance or achievement of the learner.

TESTING refers to the administration, scoring and interpretation of an instrument (procedure) designed to elicit information about performance in a sample of a particular area of behavior.

ASSESSMENT vs. EVALUATION

Dimension                  Assessment                                    Evaluation
Timing, primary purpose    Formative: ongoing, to improve learning      Summative: final, to gauge quality
Orientation, focus         Process-oriented: how learning is going      Product-oriented: what has been learned
Use of findings            Diagnostic: identify areas for improvement   Judgmental: arrive at an overall grade/score

MODES OF ASSESSMENT

A. Traditional Assessment

the objective paper-and-pencil test, which usually assesses low-level thinking skills.

preparation of the instrument is time-consuming, and the format is prone to cheating.

scoring is objective and administration is easy because students can take the test at the same time.

B. Performance Assessment

the learner performs a behavior to be measured in a "real-world" context. 

The learner demonstrates the desired behavior in a real-life context and the locus of control is with the student.

A mode of assessment that requires actual demonstration of skills or creation of products of learning.

Scoring tends to be subjective without rubrics.

Preparation of the instrument is relatively easy, and it measures behavior that cannot easily be faked.


C. Portfolio Assessment. A process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing, and collaborative process.

Development is time-consuming, and rating tends to be subjective without rubrics.

Measures the student's growth and development.

TYPES OF ASSESSMENT PROCESSES

A. Placement Assessment

Determine the entry behavior of the students.

Determine the student’s performance at the beginning of instruction.

Determine the position of the students in the instructional sequence.

Determine the mode of evaluation beneficial for each student.

B. Diagnostic Assessment

is given at the start to determine the students' levels of competence.

to identify those who have already achieved mastery of the requisite learning.

to help classify students into tentative small groups for instruction.

C. Formative Assessment is given to: monitor the learning progress of the students.

provide feedback to both parents and students.

It answers the question "Where are we in relation to where we should be?"

this type of assessment can be done informally and need not use traditional instruments such as quizzes and tests.

D. Summative Assessment is given at the end of a unit to determine if the objectives were achieved. It tends to be formal and uses traditional instruments such as tests and quizzes.

It answers the question "How well did we do what we set out to do?"

determine the extent of the student’s achievement and competence.

provide a basis for assigning grades.

provide the data from which reports to parents and transcripts can be prepared.

Principles of Quality Assessment

1. Clarity of the Learning Target
2. Appropriateness of the Assessment Method
3. Validity
4. Reliability
5. Fairness
6. Practicality and Efficiency

Principles of Quality Assessment

1. Clarity of the Learning Target

Learning Target. Clearly stated; focuses on the student learning objective rather than on teacher activity; a meaningful and important target.

Skill Assessed. Clearly presented: can you "see" how students would demonstrate the skill in the task itself?

Performance Task - Clarity. Could students tell exactly what they are supposed to do and how the final product should be done?

Rubric - Clarity.  Would students understand how they are to be evaluated? Are the criteria observable and clearly described?

2. Appropriateness of the Assessment Method

Does it work with type of task and learning target?

Does it allow for several levels of performance?

Does it assess skills as stated?

The type of test used should match the learning objective of the subject matter.

Two general categories of test items:

1. Objective items require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement.

2. Subjective or essay items permit the student to organize and present an original answer.

Objective tests include true-false, fill-in-the-blank, matching type, and multiple-choice questions.

the word objective refers to the scoring and indicates there is only one correct answer.

Objective tests rely heavily on your skill to read quickly and to reason out the answer.

measure both your ability to remember facts and figures and your understanding of course materials.

prepare yourself for high level critical reasoning and making fine discriminations to determine the best answer.

a) Multiple-Choice Items. Used to measure knowledge outcomes and various types of learning outcomes.

they are most widely used for measuring knowledge, comprehension, and application outcomes.

scoring is easy, objective, and reliable.


Advantages in Using Multiple-Choice Items

Multiple-choice items can provide ...

versatility in measuring all levels of cognitive ability.

highly reliable test scores.

scoring efficiency and accuracy.

objective measurement of student achievement or ability.


a wide sampling of content or objectives.

a reduced guessing factor when compared to true-false items.

different response alternatives which can provide diagnostic feedback.

b. True-False Items. Typically used to measure the ability to identify whether statements of fact are correct.

the basic format is simply a declarative statement that the student must judge as true or false.

item is useful for outcomes where there are two possible alternatives.

True-false items do not discriminate between students of varying ability as well as other item types do.

can often include more irrelevant clues than do other item types.

can often lead an instructor to favor testing of trivial knowledge.

c. Matching Type Items. Consist of a column of key words presented on the left side of the page and a column of options placed on the right side of the page. Students are required to match the options with the given key word(s).

provide objective measurement of student achievement.

provide efficient and accurate test scores.

Matching Type Items: if options cannot be used more than once, the items are not independent; getting one answer wrong automatically makes a second answer wrong.

all items should be of the same class, and all options should be of the same class (e.g., a list of events to be matched with a list of dates).

d. Short-Answer Items. Require the examinee to supply the appropriate words, numbers, or symbols to answer a question or complete a statement. Items should require a single-word answer or a brief and definite statement. They can efficiently measure lower levels of the cognitive domain.

B) Essays or Subjective Tests

may include either short-answer questions or long general questions. These exams have no single specific answer expected from every student.

they are usually scored on an opinion basis, although there will be certain facts and understanding expected in the answer.

essay tests are generally easier and less time-consuming to construct than are most objective test items.

the main reason students fail essay tests is not that they cannot write, but that they fail to answer the question fully and specifically, or their answers are not well organized.

students with good writing skills have an advantage over students who have difficulty expressing themselves through writing. Essays are more subjective in nature due to their susceptibility to scoring influences.

C) PERFORMANCE TEST

also known as alternative or authentic assessment

is designed to assess the ability of a student to perform correctly in a simulated situation (i.e., a situation in which the student will be ultimately expected to apply his/her learning).

a performance test will simulate to some degree a real life situation to accomplish the assessment.

in theory, a performance test could be constructed for any skill and real life situation.

most performance tests have been developed for the assessment of vocational, managerial, administrative, leadership, communication, interpersonal and physical education skills in various simulated situations.

Advantages in Using Performance Test Items

Performance test items:

can appropriately measure learning objectives which focus on the ability of the students to apply skills or knowledge in real life situations.

usually provide a degree of test validity not possible with standard paper and pencil test items.

are useful for measuring learning objectives in the psychomotor domain.

SUGGESTIONS FOR WRITING PERFORMANCE TEST ITEMS

1. Prepare items that elicit the type of behavior you want to measure.

2. Clearly identify and explain the simulated situation to the student.

3. Make the simulated situation as "life-like" as possible.

4. Provide directions which clearly inform the students of the type of response called for.

5. When appropriate, clearly state time and activity limitations in the directions.

6. Adequately train the observer(s)/scorer(s) to ensure that they are fair in scoring the appropriate behaviors.

D) Oral questioning

the most commonly used of all forms of assessment in class. It assumes, of course, that the learner can hear and shares a common language with the assessor.

the ability to communicate orally is relevant to this type of assessment.

The other major role for the "oral" in summative assessment is in language learning, where the capacity to carry on a conversation at an appropriate level of fluency is relatively distinct from the ability to read and write the language.           

E) Observation refers to measurement procedures in which child behaviors in the school or classroom are systematically monitored, described, classified, and analyzed, with particular attention typically given to the antecedent and consequent events involved in the performance and maintenance of such behaviors.

F) Self-reports

Students are asked to reflect on, make a judgment about, and then report on their own or a peer's behavior and performance.

typical evaluation tools could include sentence completion, Likert scales, checklists, or holistic scales.

responses may be used to evaluate both performance and attitude.

3. Validity

is the degree to which the test measures what it is intended to measure.

it is the usefulness of the test for a given purpose.

a valid test is always reliable.

Approaches in Validating a Test

1. Face Validity
Procedure: Done by examining the physical appearance of the test.

2. Content-Related Validity
Procedure: Done through a careful and critical examination of the objectives of the test so that it reflects the curricular objectives. Compare the test tasks to the test specifications describing the task domain under consideration.
Meaning: How well the sample of test tasks represents the domain of tasks to be measured.

3. Criterion-Related Validity
Procedure: Established statistically, such that the set of scores revealed by the test is correlated with the scores obtained on another external predictor or measure. Compare the test scores with another measure of performance obtained at a later date (for prediction) or with another measure of performance obtained concurrently (for estimating present status).
Meaning: How well test performance predicts future performance or estimates current performance on some valued measure other than the test itself (called the criterion).

4. Construct-Related Validity
Procedure: Established statistically by comparing psychological traits or factors that theoretically influence scores on the test. Establish the meaning of the scores by controlling or examining the development of the test, evaluating the relationships of the scores with other relevant measures, and experimentally determining what factors influence test performance.
Meaning: How well test performance can be interpreted as a meaningful measure of some characteristic or quality.
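To make the statistical idea behind criterion-related validity concrete, here is a minimal sketch (with hypothetical scores, not data from the source) of computing a validity coefficient as the Pearson correlation between test scores and an external criterion measure:

```python
# Criterion-related validity sketch: correlate scores on the test with
# scores on an external criterion measure using the Pearson correlation
# coefficient. All scores below are hypothetical.

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

test_scores = [35, 42, 28, 45, 38, 30, 44, 33]   # scores on the new test
criterion   = [78, 85, 70, 90, 80, 72, 88, 75]   # external criterion measure
print(f"validity coefficient r = {pearson_r(test_scores, criterion):.2f}")
```

The closer the coefficient is to 1.0, the better the test predicts or estimates the criterion.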

Factors Affecting Content Validity of Test Items

A. The test itself
B. The administration and scoring of the test
C. Personal factors influencing how students respond to the test
D. Validity is always specific to a particular group

A. The Test Itself: ways that can reduce the validity of test results

1. Unclear directions
2. Poorly constructed test items
3. Ambiguity
4. Inappropriate level of difficulty
5. Improper arrangement of items
6. Inadequate time limits
7. Test that is too short
8. Identifiable pattern of answers
9. Test items inappropriate for the outcomes being measured
10. Reading vocabulary and sentence structure too difficult

B. The administration and scoring of a test.

assessment procedures must be administered uniformly to all students. Otherwise, scores will vary due to factors other than differences in student knowledge and skills.

the test should be administered with ease, clarity and uniformity so that scores obtained are comparable.

uniformity can be obtained by standardizing the time limit and the oral instructions.

Validity is reduced by:

insufficient time to complete the test

giving assistance to students during the testing

subjectivity in scoring essay tests

C. Personal factors influencing how students respond to the test.

students might not be mentally prepared for the test, or they may subconsciously be exercising what is called a response set (a habitual way of answering, such as consistently choosing "true" when unsure).

D. Validity is always specific to a particular group.

test results can be influenced by such factors as age, sex, ability level, educational background, and cultural background.

Validity is the most important quality of a test.

it does not refer to the test itself, but to the inferences made from test scores.

generally addresses the question: "Does the test measure what it is intended to measure?"

refers to the appropriateness, meaningfulness, and usefulness of the specific inferences that can be made from test scores.

is the extent to which test scores allow decision makers to infer how well students have attained program objectives.

4. Reliability

it refers to the consistency of scores obtained by the same person when retested using the same instrument or one parallel to it. Reliability refers to the results obtained with an evaluation instrument, not to the instrument itself.

an estimate of reliability always refers to a particular type of consistency.

reliability is necessary but not a sufficient condition for validity.

reliability is primarily statistical.

Methods of Computing the Reliability Coefficient

1. Test-Retest Method. Give the test twice to the same group, with a time interval between administrations. (Measure of stability)

2. Parallel Method (Equivalent Forms). Give parallel forms of the test, with close time intervals between forms. (Measure of equivalence)

3. Split-Half Method. Give the test once, then score equivalent halves of the test, e.g., odd- and even-numbered items. (Measure of internal consistency)

4. Kuder-Richardson Method. Give the test once, then compute reliability from the proportion of students passing and not passing each item. (Measure of internal consistency)
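As an illustration of the Kuder-Richardson approach, the sketch below (hypothetical 0/1 item data, not from the source) computes KR-20, one common Kuder-Richardson formula, where k is the number of items and p and q are the proportions passing and failing each item:

```python
# Kuder-Richardson sketch: KR-20 for dichotomously scored (1/0) items.
# `data` is a hypothetical matrix: one row per student, one column per item.

data = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def kr20(matrix):
    """KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)."""
    k = len(matrix[0])                         # number of items
    totals = [sum(row) for row in matrix]      # each student's total score
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in matrix) / len(matrix)   # proportion passing item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance(totals))

print(f"KR-20 reliability = {kr20(data):.2f}")   # about 0.53 for this tiny sample
```

A split-half estimate would instead correlate the odd- and even-half scores and apply the Spearman-Brown correction, r_full = 2r / (1 + r).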

Relationship of Reliability and Validity

test reliability is requisite to test validity.

if a test is not reliable, then validity is moot. In other words, if a test is not reliable there is no point in discussing validity, because reliability is required before validity can be considered in any meaningful way.

Reliability is the degree to which test scores are free of errors of measurement due to things like student fatigue, item sampling, and student guessing.

if a test is not reliable, it is also not valid.

5. Fairness. The assessment procedures do not discriminate against a particular group of students (for example, students from various racial, ethnic, or gender groups, or students with disabilities).

6. Practicality and Efficiency

Teacher's familiarity with the method

Time required

Complexity of administration

Ease in scoring: the test should be easy to score, such that directions for scoring are clear, the scoring key is simple, and provisions for answer sheets are made.

Cost (economy): the test should be given in the cheapest way, which means that answer sheets should be provided so that the test booklets can be reused from time to time.

Development of Classroom Assessment Tools

Steps in Planning for a Test

Identifying test objectives

Deciding on the type of objective test to be prepared

Preparing a Table of Specifications (TOS)

Constructing the draft test items

Try-out and validation

Identifying Test Objectives.

An objective test, if it is to be comprehensive, must cover the various levels of Bloom's taxonomy. Each objective consists of a statement of what is to be achieved and, preferably, the percentage of students expected to achieve it.

Cognitive Domain

1. Knowledge recognizes students' ability to use rote memorization and to recall certain facts. Test questions focus on identification and recall of information.

Sample verbs for stating specific learning outcomes:

Cite, define, identify, label, list, match, name, recognize, reproduce, select, state.

At the end of the topic, students should be able to identify the major food groups without error. (instructional objective)

Test Item: What are the four major food groups?

What are the three measures of central tendency?

2. Comprehension involves students' ability to read course content, interpret important information, and put others' ideas into their own words. Test questions should focus on the use of facts, rules, and principles.

Sample verbs for stating specific learning outcomes:

Classify, convert, describe, distinguish between, give examples, interpret, summarize.

At the end of the lesson, the students should be able to summarize the main events of the story in grammatically correct English. (instructional objective)

Summarize the main events of the story in grammatically correct English. (test item)

3. Application: students take new concepts and apply them to new situations. Test questions focus on applying facts and principles.

Sample verbs for stating specific learning outcomes:

Apply, arrange, compute, construct, demonstrate, discover, extend, operate, predict, relate, show, solve, use.

At the end of the lesson, the students should be able to write a short poem in iambic pentameter. (instructional objective)

Write a short poem in iambic pentameter.

4. Analysis: students have the ability to take new information, break it down into parts, and differentiate between the parts. Test questions focus on separation of a whole into component parts.

Sample verbs for stating specific learning outcomes:

Analyze, associate, determine, diagram, differentiate, discriminate, distinguish, estimate, point out, infer, outline, separate.

At the end of the lesson, the students should be able to describe the statistical tools needed in testing the difference between two means. (instructional objective)

What kind of statistical test would you run to see if there is a significant difference between pre-test and post-test?

5. Synthesis: students are able to take various pieces of information and form a whole, creating a pattern where one did not previously exist. Test questions focus on combining ideas to form a new whole.

Sample verbs for stating specific learning outcomes:

Combine, compile, compose, construct, create, design, develop, devise, formulate, integrate, modify, revise, rewrite, tell, write.

At the end of the lesson, the students should be able to compare and contrast the two types of error. (instructional objective)

What is the difference between type I and type II error?

6. Evaluation

involves students' ability to look at someone else's ideas or principles and judge the worth of the work and the value of the conclusions.

Sample verbs for stating specific learning outcomes:

Appraise, assess, compare, conclude, contrast, criticize, evaluate, judge, justify, support.

At the end of the lesson, the students should be able to draw a conclusion about the relationship between two means. (instructional objective)

Example: What should the researcher conclude about the relationship in the population?

Preparing a Table of Specifications

A table of specifications is a useful guide in determining the type of test items that you need to construct. If properly prepared, a table of specifications will help you limit the coverage of the test and identify the necessary skills or cognitive level required to answer each test item correctly.

Gronlund (1990) lists several examples of how a table of specifications should be prepared.

Format of a Table of Specifications

Specific Objectives. These refer to the intended learning outcomes, stated as specific instructional objectives covering a particular test topic.

Cognitive Level. This pertains to the intellectual skill or ability required to correctly answer a test item, using Bloom's taxonomy of educational objectives. We sometimes refer to this as the cognitive demand of a test item. Entries in this column could be knowledge, comprehension, application, analysis, synthesis, or evaluation.

Type of Test Item. This identifies the type or kind of test a test item belongs to. Entries in this column could be multiple choice, true or false, or even essay.

Item Number. This simply identifies the question number as it appears in the test.

Total Number of Points. This summarizes the score given to a particular test item.

(1) Sample Table of Specifications

Specific objective: Solve easy, moderately difficult, and difficult problems applying the principles of percentage composition.
Cognitive level: Analysis
Type of test: Multiple choice
Item number: 1 and 2
Total points: 4 points

(2) Sample Table of Specifications

Content                                     Class Sessions   Number of Items   Item Numbers
1. Subtraction Concepts                           4                 5              1-5
2. Subtraction as the Inverse of Addition         4                 5              6-10
3. Subtraction without Regrouping                 8                10             11-20
4. Subtraction with Regrouping                    5                 6             21-26
5. Subtraction Involving Zeros                    8                10             27-36
6. Mental Computation through Estimation          4                 5             37-41
7. Problem Solving                                7                 9             42-50
TOTAL                                            40                50              1-50
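In this sample, the number of items per topic is proportional to instructional time: items ≈ (class sessions ÷ total sessions) × total items, e.g., 8 sessions × (50/40) = 10 items. A minimal sketch of that allocation rule (assuming simple rounding, which is my reading of the sample rather than a stated procedure):

```python
# Proportional item allocation for a table of specifications:
# items per topic ≈ (topic's class sessions / total sessions) × total items.
# Topics and session counts are taken from the sample above.

topics = {
    "Subtraction Concepts": 4,
    "Subtraction as the Inverse of Addition": 4,
    "Subtraction without Regrouping": 8,
    "Subtraction with Regrouping": 5,
    "Subtraction Involving Zeros": 8,
    "Mental Computation through Estimation": 4,
    "Problem Solving": 7,
}
total_items = 50
total_sessions = sum(topics.values())   # 40 class sessions

for topic, sessions in topics.items():
    n_items = round(sessions / total_sessions * total_items)
    print(f"{topic}: {n_items} items")
# Rounding can make the column sum drift from the target, so check the
# total and adjust one or two topics by hand if necessary.
```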

(3) Sample Table of Specifications

Content                         Class Sessions   K   C   Ap   An   Sy   Ev
1. Conversion of Units                3          1   1   1    1    1    1
2. Speed and Velocity                 3          1   1   2    1    1
3. Acceleration                       2          1   1   1    1
4. Free Falling Bodies                1          1   1
5. Projectile Motion                  1          1   1
6. Force                              1          1   1
7. Vectors                            2          2   1   1
8. Work, Energy, & Power              3          1   1   2    1    1
9. Conservation of Energy             2          2   1   1
10. Conservation of Momentum          2          1   2   1
TOTAL                                20          4   6   8    8    7    7

Points to Remember in Preparing a Table of Specifications

1) Define and limit the subject matter coverage of the test depending on the length of the test.
2) Decide on the point distribution per subtopic.
3) Decide on the type of test you will construct per subtopic.
4) Make certain that the type of test is appropriate to the degree of difficulty of the topic.
5) State the specific instructional objectives in terms of the specific types of performance students are expected to demonstrate at the end of instruction.
6) Be careful in identifying the necessary intellectual skill needed to correctly answer the test item. Use Bloom's taxonomy as a reference.

Suggestions for Constructing Short-Answer Items

1) Word the item so that the required answer is both brief and specific.
2) Do not take statements directly from textbooks to use as a basis for short-answer items.
3) A direct question is generally more desirable than an incomplete statement.
4) If the answer is to be expressed in numerical units, indicate the type of answer wanted.
5) Blanks for answers should be equal in length and placed in a column to the right of the questions.
6) When completion items are used, do not include too many blanks.

Examples:

1) Poor: An animal that eats the flesh of other animals is (carnivorous).
Better: An animal that eats the flesh of other animals is classified as (carnivorous).

2) Poor: Chlorine is a (halogen).
Better: Chlorine belongs to a group of elements that combine with metals to form salts. It is therefore called a (halogen).

3) Poor: John Glenn made his first orbital flight around the earth in (1962).
Better: In what year did John Glenn make his first orbital flight around the earth? (1962)

Selecting the Test Format

Selective Test: a test where there are choices for the answer, such as multiple choice, true or false, and matching type.

Supply Test: a test where there are no choices for the answer, such as short answer, completion, and extended-response essay.

Construction and Tryouts

Item Writing

Content Validation

Item Tryout

Item Analysis

Item Analysis refers to the process of examining the students' responses to each item in the test.

An item has either desirable or undesirable characteristics. An item with desirable characteristics can be retained for subsequent use; one with undesirable characteristics is either revised or rejected.

Uses of Item Analysis

Item analysis data provide a basis for efficient class discussion of the test results.

Item analysis data provide a basis for remedial work.

Item analysis data provide a basis for general improvement of classroom instruction.

Item analysis data provide a basis for increased skills in test construction.

Item analysis procedures provide a basis for constructing a test bank.

Three criteria in determining the desirability or undesirability of an item:

a) difficulty of an item
b) discriminating power of an item
c) measures of attractiveness

Difficulty index refers to the proportion of students in the upper and lower groups who answered an item correctly.


Level of Difficulty of an Item

Index Range    Difficulty Level
0.00 - 0.20    Very Difficult
0.21 - 0.40    Difficult
0.41 - 0.60    Moderately Difficult
0.61 - 0.80    Easy
0.81 - 1.00    Very Easy
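A minimal sketch (hypothetical counts, not from the source) of computing the difficulty index and reading it against the table above:

```python
# Difficulty index sketch: proportion of students in the combined upper
# and lower groups who answered the item correctly. Counts are hypothetical.

def difficulty_index(correct_upper, correct_lower, n_upper, n_lower):
    return (correct_upper + correct_lower) / (n_upper + n_lower)

def difficulty_level(p):
    if p <= 0.20: return "Very Difficult"
    if p <= 0.40: return "Difficult"
    if p <= 0.60: return "Moderately Difficult"
    if p <= 0.80: return "Easy"
    return "Very Easy"

p = difficulty_index(correct_upper=18, correct_lower=8, n_upper=25, n_lower=25)
print(f"p = {p:.2f} -> {difficulty_level(p)}")   # p = 0.52 -> Moderately Difficult
```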

Discrimination Index

refers to the proportion of the students in the upper group who got an item right minus the proportion of the students in the lower group who got the item right.

Level of Discrimination

Index Range    Discrimination Level
Below 0.10     Questionable item
0.11 - 0.20    Not discriminating
0.21 - 0.30    Moderately discriminating
0.31 - 0.40    Discriminating
0.41 - 1.00    Very discriminating
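A companion sketch for the discrimination index, again with hypothetical counts:

```python
# Discrimination index sketch: proportion of the upper group answering the
# item correctly minus the proportion of the lower group answering correctly.
# Counts are hypothetical and reuse the example from the difficulty index.

def discrimination_index(correct_upper, correct_lower, n_upper, n_lower):
    return correct_upper / n_upper - correct_lower / n_lower

D = discrimination_index(correct_upper=18, correct_lower=8, n_upper=25, n_lower=25)
print(f"D = {D:.2f}")   # D = 0.40 -> a discriminating item per the table above
```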

Types of Discrimination Index

Positive discrimination index: more students from the upper group got the item right than from the lower group.

Negative discrimination index: more students from the lower group got the item right than from the upper group.

Zero discrimination index: equal numbers of students from the upper and lower groups got the item right.

MEASURES OF ATTRACTIVENESS

To measure the attractiveness of the incorrect options (distractors) in a multiple-choice test, count the number of students who selected each incorrect option in both the upper and lower groups. An effective incorrect option should attract fewer students from the upper group than from the lower group.
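A short sketch (hypothetical responses; option "B" is assumed to be the key) of tallying distractor choices in the upper and lower groups:

```python
# Distractor analysis sketch: tally each incorrect option in the upper and
# lower groups. Responses are hypothetical; option "B" is assumed to be
# the keyed (correct) answer.

from collections import Counter

upper = ["B", "B", "A", "B", "B", "C", "B", "B", "D", "B"]   # high scorers
lower = ["A", "C", "B", "D", "A", "C", "B", "A", "C", "D"]   # low scorers
key = "B"

upper_counts, lower_counts = Counter(upper), Counter(lower)
for option in sorted(set(upper + lower) - {key}):            # the distractors
    u, l = upper_counts[option], lower_counts[option]
    verdict = "effective" if l > u else "review this option"
    print(f"Option {option}: upper={u}, lower={l} -> {verdict}")
```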

Rubrics: a systematic guideline for evaluating students' performance through the use of detailed descriptions of performance standards. Rubrics are used to get consistent scores across all students.

A rubric provides students with feedback regarding their weaknesses and strengths, thus enabling them to develop their skills.

It allows students to be more aware of the expectations for performance and consequently improve their performance.

Holistic Rubric vs Analytic Rubric

Holistic Rubric is more global and does little to separate the tasks in any given product; rather, it views the final product as a set of interrelated tasks contributing to the whole.

Provides a single score based on an overall impression of a student's performance on a task.

It may be difficult to provide one overall score.

Advantages: quick scoring; provides an overview of student achievement.

Disadvantage: does not provide detailed information about student performance in specific areas of content and skills.

Use a holistic rubric when:

You want a quick snapshot of achievement.

A single dimension is adequate to define quality.

Example of Holistic Rubrics

Analytic Rubric breaks down the objective or final product into component parts; each part is scored independently.

Provides specific feedback along several dimensions.

Advantages: more detailed feedback; scoring is more consistent across students and graders.

Disadvantage: time-consuming to score.

Use an analytic rubric when:

you want to see relative strengths and weaknesses.

you want detailed feedback.

you want to assess complicated skills or performances.

you want students to self-assess their understanding or performance.

Example of Analytic Writing Rubric
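The rubric examples above are not reproduced here, so the sketch below uses hypothetical criteria and a 1-4 scale to show the key difference: an analytic rubric scores each criterion independently and combines them, while a holistic rubric assigns one overall score:

```python
# Analytic vs. holistic scoring sketch. The criteria and the 1-4 scale
# below are hypothetical, not taken from the source rubrics.

analytic_scores = {            # each criterion scored independently
    "Ideas and content": 4,
    "Organization": 3,
    "Word choice": 3,
    "Conventions (grammar, spelling)": 2,
}

total = sum(analytic_scores.values())
maximum = 4 * len(analytic_scores)
print(f"Analytic total: {total}/{maximum}")      # detailed, per-criterion feedback

holistic_score = 3                               # one overall impression, 1-4
print(f"Holistic score: {holistic_score}/4")     # quick, but no diagnostic detail
```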

Utilization of Assessment Data

Norm-Referenced Interpretation. The result is interpreted by comparing a student with other students, so some will pass and others will not. It is designed to measure the performance of a student compared to other students; the individual score is compared to others'. Results are usually expressed in terms of percentiles, grade equivalents, or stanines.

Norm-referenced grading is a system typically used to evaluate students based on the performance of those around them. IQ tests and SAT exams are two examples of this system, as is grading "on the curve."

Norm-referenced grading is more common in schools that emphasize class rank rather than understanding of skills or facts.
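As a small illustration of one norm-referenced statistic, the sketch below (hypothetical scores) computes a percentile rank, i.e., the percentage of scores in the norm group falling below a given score:

```python
# Percentile-rank sketch: the percentage of scores in the norm group that
# fall below a given raw score. Scores are hypothetical.

norm_group = [55, 60, 62, 65, 68, 70, 72, 75, 78, 80, 83, 85, 88, 90, 95]

def percentile_rank(score, group):
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

print(f"Percentile rank of 78 = {percentile_rank(78, norm_group):.0f}")
# 8 of 15 scores fall below 78, so the percentile rank is about 53.
```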

Criterion-Referenced Interpretation. The result is interpreted by comparing the student against a predefined standard, so all or none may pass.

It is designed to measure the performance of students compared to a predetermined criterion or standard, usually expressed in terms of a percentage.


Criterion-referenced evaluation should be used to evaluate student performance in classrooms.

it is referenced to criteria based on learning outcomes described in the provincial curriculum.

the criteria reflect a student's performance based on specific learning activities.

a student's performance is compared to established criteria rather than to the performance of other students.

evaluation referenced to prescribed curriculum requires that criteria are established based on the learning outcomes listed under the curriculum.