Test Evaluation and Selection: Determining Appropriate Assessments for Conducting Comprehensive Evaluations


Page 1

Test Evaluation and Selection

Determining Appropriate Assessments for Conducting Comprehensive Evaluations

Page 2

Agenda

• Welcome & Introductions
• General Principles of Measurement I
• Break (15 minutes)
• General Principles of Measurement II
• Wrap-Up: Questions & Answers

Page 3

Training Objectives

Participants will:
• Learn how to evaluate assessment tools based on their characteristics, including norms, reliability and validity scores, and administrative procedures; and
• Learn how to select assessment tools appropriate for a comprehensive evaluation.

Page 4

General Principles of Measurement I

Page 5

General Principles of Measurement

Measurement: process of assigning (numerical) values to behaviors, objects or events, or their properties, according to rules

Theoretical Concept → Assigning Value

Page 6

General Principles of Measurement

Measurement requires a theoretical concept be operationalized (or defined in concrete terms) and then systematically observed before a value can be assigned.

Theoretical Concept → Operational Definition → Systematic Observation → Assigning Value

Page 7

Important Definitions

• Construct: theoretical concept that is inferred from observations (NOTE: In educational evaluations, the constructs examined are typically psychological attributes such as intelligence or social competence)

• Operational definition: concrete descriptions of observable behaviors that are legitimate indicators of the construct

• Assessment: a standardized procedure for obtaining a behavior sample from a specific domain or across multiple domains that may represent either optimal or typical performance

Page 8

Important Definitions

• Test: a series of tasks presented using standardized procedures for obtaining a behavior sample from a specific domain or across multiple domains that may represent either optimal or typical performance

• Assessment: a standardized procedure for obtaining a behavior sample from a specific domain or across multiple domains that may represent either optimal or typical performance

• Evaluation: a process for gathering information in systematic ways for making important decisions and/or rendering a judgment

Page 9

Model of Measurement

Evaluations ⊃ Assessments ⊃ Tests

Page 10

Problems in Measurement

No single approach to the measurement of any construct is universally accepted.
• Measurement is always indirect.
• Measurement is based on behaviors that are perceived as relevant.
• Assessment developers have selected different behaviors, and sometimes very different types of behaviors, to define a construct operationally.

Page 11

Problems in Measurement

Psychological measurements are usually based on limited samples of behavior.
• It is not practical to include all possible behaviors in an assessment.
• A major problem in developing assessments is to determine the number and variety of items that are required to provide an adequate sample of the behavioral domain to be assessed.

Page 12

Problems in Measurement

The measurement obtained is always subject to error.
• Assessments are collected at one point in time.
• They are highly affected by the internal conditions of the student (e.g., fatigue, boredom, forgetfulness, guessing or carelessness).
• Assessments are collected by one assessor.
• Assessors cannot conduct a perfect administration (e.g., misadministration of items, failure to observe behaviors actually performed or mis-scoring of items).

Page 13

Problems in Measurement

The lack of well-defined units on the measurement scales poses problems.
• Subject-centered measurements are used to place a person along a continuum for a specific theoretical concept.
• This introduces debates over the appropriate level of scaling to be used, the labels to be assigned to the units and how to meaningfully interpret specific values obtained.

Page 14

Problems in Measurement

Psychological constructs cannot be defined only in terms of operational definitions but must also have demonstrated relationships to other constructs or observable phenomena.
• We define a construct in terms of observable behavior to determine how it is measured.
• We define a construct by its mathematical or logical relationships to other constructs in a theoretical system to determine how it is interpreted.

Page 15

Two Types of Comparisons for Decision-Making

To determine if concerns are significant or have been resolved, evaluators must make two different kinds of comparisons.
• When comparing a child’s performance to the performance of other children, evaluators use norm-referenced measures.
• When comparing a child’s performance to expected content standards, evaluators use criterion-referenced measures.

Page 16

Problem-Solving Model

DO YOU HAVE A CONCERN?
• Describe the problem

IS IT SIGNIFICANT?
• Compare to peers and expectations

WHAT SHOULD WE DO?
• Develop and implement a plan

IS IT WORKING?
• Monitor progress

IS THE CONCERN RESOLVED?
• Compare to peers and expectations

Page 17

Types of Assessments

• Non-standardized vs. Standardized
• Norm-referenced vs. Criterion-referenced vs. Self-referenced (ipsative scales)
• Individual vs. Group
• Objective vs. Subjective
• Power vs. Speed
• Sample vs. Sign
• Verbal vs. Nonverbal vs. Nonlanguage vs. Performance

Page 18

Norms

• Norms are generally organized either by age or by student grade.
• Norms should proportionally represent the population on key demographic indicators such as:
  • Race and ethnicity;
  • Socioeconomic status;
  • Community size; and
  • Region of the country.
• Norms are sometimes broken down by other important characteristics such as gender or language status.
• The demographics should not vary by 5% or more from the general population as reflected in census data.
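
Worked example (not part of the original slides): a minimal Python sketch of the 5% check above, comparing a norm sample’s demographic percentages to census percentages. All numbers are made up for illustration.

def flag_deviations(sample, census, tolerance=5.0):
    """Return groups whose representation differs from census by >= tolerance percentage points."""
    return [g for g in census if abs(sample.get(g, 0.0) - census[g]) >= tolerance]

census = {"White": 60.1, "Black": 13.4, "Hispanic": 18.5, "Other": 8.0}
norm_sample = {"White": 63.5, "Black": 13.6, "Hispanic": 13.0, "Other": 9.9}

print(flag_deviations(norm_sample, census))  # ['Hispanic'] -> under-represented by 5.5 points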

Page 19

Norm-Referenced Scores

Developmental scores
• Types
  • Developmental (i.e., age or grade) equivalents
  • Developmental quotients
• Problems with these developmental scores
  • Systematic misinterpretations
  • Need to estimate data for some ages or grades
  • Promotion of typological thinking
  • Implication of a false standard of performance
  • Tendency for scales to be ordinal, not equal interval

Page 20

Norm-Referenced Scores

Percentile scores
• Types
  • Percentile ranks: score that indicates the percentage of people whose scores are at or below a given raw score (e.g., a percentile rank of 53 indicates that the person scored as well as or better than 53% of the comparison group)
  • Performance bands
    • Deciles: each band contains 10% of the norm group
    • Quartiles: each band contains 25% of the norm group
• Problems
  • Not equal-interval scores
  • Cannot be added or subtracted
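
Worked example (not part of the original slides): a minimal Python sketch of a percentile rank as defined above, i.e., the percentage of the norm group scoring at or below a given raw score. The norm scores are made up.

def percentile_rank(raw, norms):
    at_or_below = sum(1 for s in norms if s <= raw)
    return 100.0 * at_or_below / len(norms)

norm_scores = [4, 7, 9, 10, 12, 12, 13, 15, 16, 18, 20, 22, 23, 25, 27]
print(round(percentile_rank(15, norm_scores), 1))  # 53.3 -> scored as well as or better than ~53% of peers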

Page 21

Norm-Referenced Scores

Standard scores
• Types
  • Z-scores: mean of 0 and standard deviation of 1
  • T-scores: mean of 50 and standard deviation of 10
  • IQ scores: mean of 100 and standard deviation of 15
  • Stanines: standard score bands that divide the distribution into nine parts
• Problems
  • Difficult to explain to people without some statistical knowledge
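
Worked example (not part of the original slides): a minimal Python sketch converting a raw score to the standard-score metrics listed above, assuming the norm group’s mean and standard deviation are known. The values are made up.

def standard_scores(raw, mean, sd):
    z = (raw - mean) / sd
    return {
        "z": round(z, 2),           # mean 0, SD 1
        "T": round(50 + 10 * z),    # mean 50, SD 10
        "IQ": round(100 + 15 * z),  # mean 100, SD 15
    }

print(standard_scores(raw=34, mean=28, sd=4))  # {'z': 1.5, 'T': 65, 'IQ': 122}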

Page 22

Distribution of Scores

Page 23

Scores

• Obtained (raw) score: numerical value received on an assessment
• True score: hypothetical value that best represents an individual’s true ability
• Error score: the difference between an obtained score and the true score

S_T = S_O ± S_E (the true score equals the obtained score plus or minus the error score)

Page 24

Sources of Error

• Temporary characteristics of the student (e.g., fatigue, lack of motivation, boredom, or carelessness)
• Conditions of the testing environment (e.g., a loud or distracting setting)
• Characteristics of the test (e.g., poorly worded directions, tricky questions, or ambiguous items)
• Conditions of scoring (e.g., carelessness, disregard or lack of unambiguous scoring standards, or computational errors)

Page 25

Standard Error of Measurement

• The standard error of measurement (SEM), an index of test error, reflects the average standard deviation of error around a person’s true score.
• Confidence intervals provide a range within which the true score is likely to be found.
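
Worked example (not part of the original slides): the slides do not give a formula, but the standard classical-test-theory estimate is SEM = SD × √(1 − r), where r is the test’s reliability coefficient. A minimal Python sketch with made-up values:

import math

def sem(sd, reliability):
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(obtained, sd, reliability, z=1.96):
    """Range likely to contain the true score (z = 1.96 gives ~95% confidence)."""
    e = z * sem(sd, reliability)
    return (obtained - e, obtained + e)

print(round(sem(15, 0.91), 1))             # 4.5 on an IQ-metric test (SD = 15)
print(confidence_interval(100, 15, 0.91))  # ~(91.2, 108.8)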

Page 26

Reliability

Reliability describes the extent to which measurements can be depended on to provide consistent, unambiguous information.

Page 27

Reliability

The error present when measuring may be systematic or random. Reliability refers to the relative absence of random error during measurement.
• There are three types of reliability, or generalizations from assessments:
  • Item reliability;
  • Stability; and
  • Interobserver agreement.

Page 28

Item Reliability

Assessments administer only a sample of items from all possible items in a domain. Item reliability concerns whether we can generalize the student’s performance on the sampled items to the entire domain.

• Alternate-form reliability uses two or more forms of a test to see if they are equivalent (i.e., they measure the same trait or skill to the same extent and have the same means and variances).
• Internal consistency splits one test into two parts to see if they are equivalent (i.e., split-half reliability).
• Coefficient alpha (α) or KR-20 is the average split-half reliability (a sketch of the computation follows below).
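
Worked example (not part of the original slides): a minimal Python sketch of coefficient alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). The item-by-student scores are made up.

def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same students."""
    k = len(items)                    # number of items
    n = len(items[0])                 # number of students
    def var(xs):                      # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = sum(var(item) for item in items)
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

items = [[2, 1, 2, 0, 1],
         [2, 1, 1, 0, 1],
         [1, 2, 2, 0, 1],
         [2, 1, 2, 1, 0]]
print(round(cronbach_alpha(items), 2))  # 0.82 with this toy data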

Page 29

Stability Reliability

Assessments assume that student performance reflects the student’s abilities and skills at times other than during the administration. Stability concerns the consistency between the measured performance and performance at some time in the future.

• Test-retest reliability is calculated by administering the same measure twice over a short period of time (e.g., 2 weeks) and correlating the two sets of scores.
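
Worked example (not part of the original slides): test-retest reliability is the Pearson correlation between the two administrations. A minimal Python sketch with made-up scores:

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

week_1 = [85, 92, 78, 101, 95, 88]
week_3 = [88, 90, 80, 104, 93, 86]
print(round(pearson(week_1, week_3), 2))  # ~0.95 -> stable scores over two weeks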

Page 30

Interobserver Agreement Reliability

Assessments assume that student performance is not affected by the assessor who administers the measure. Interobserver agreement is the degree to which one assessor’s rating/scoring of the student’s performance is consistent with the ratings/scores obtained by comparably trained observers/assessors.

• Percentage agreement is calculated by simple agreement, agreement for occurrence, or point-to-point agreement.
• Kappa (κ) is a chance-corrected coefficient of interobserver agreement (a sketch of the computation follows below).
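
Worked example (not part of the original slides): a minimal Python sketch of simple percentage agreement and Cohen’s kappa, which corrects raw agreement for chance. The two raters’ pass/fail ratings are made up.

from collections import Counter

def cohen_kappa(r1, r2):
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n   # simple agreement
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[c] * c2[c] for c in c1) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

rater_1 = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "P"]
rater_2 = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "P"]
print(round(cohen_kappa(rater_1, rater_2), 2))  # 0.78 (raw agreement is 0.90)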

Page 31

Validity

Validity describes the extent to which measurements are useful in making decisions and providing explanations for specific students.

Page 32

Validity

Types of evidence for determining validity:
• Test content
• Internal structure
• Relations to other variables
• Response processes
• Consequences of testing

Page 33

Validity: Test Content

Evidence for validity based on test content focuses on the extent to which the test items actually represent the domain to be measured. Evidence considered includes:
• The appropriateness of the items included;
• The content not included; and
• How the content is measured.

Page 34

Validity: Relations to Other Variables

Evidence for validity based on relations to other variables focuses on the relationship between the test’s results and the results obtained from other sources. Evidence considered includes:
• The extent to which a person’s performance on the measure can be used to estimate that person’s performance on a criterion measure (i.e., convergent validity, typically expressed as either concurrent or predictive criterion-related validity); and
• The extent to which a person’s performance on the measure does not relate to performance on unrelated criterion measures (i.e., divergent validity).

Page 35

Validity: Response Processes

Evidence for validity based on response processes focuses on the way in which students answer test questions and the way in which assessors score student responses. Evidence considers:
• The way students are expected to solve the problems or answer the questions; and
• The degree to which students actually use those processes, as indicated by having students describe their process or show their work.

Page 36

Factors Affecting Validity

• Reliability
• Systematic Bias
• Enabling Behaviors
• Differential Item Effectiveness
• Systematic Administrative Errors
• Norms

Page 37

Comparison of Reliability and Validity

Page 38

General Principles of Measurement II

Page 39

Selection of Assessment Tools

To select appropriate assessment tools or tests for decision-making, assessors must answer the following questions:

• What is the domain you want to assess?
• Are you qualified to administer the assessment tool or test?
• Can the assessment tool or test be used appropriately with the age or grade of students you need to assess?
• Can the assessment tool or test be administered to groups of students or to individuals?
• Is the assessment tool or test current?

Page 40

Domains for Assessment Tools

• Physical Development
  • Sensory Functioning
  • Motor Development
  • Medical Status
• Communication Development
  • Receptive and Expressive Language
  • Voice, Fluency, and Articulation
• Cognitive and Academic Development
  • Sensory Processing
  • Achievement
  • Intelligence
• Social-Emotional Development
• Adaptive Functioning

Page 41

Qualified Assessors

Each test will list qualifications for the assessor:
• Level A: Requires either:
  • Bachelor’s degree in a relevant discipline (e.g., education, psychology, counseling, social work) and coursework relevant to psychological testing; OR
  • Equivalent training in psychological assessments from a reputable organization; OR
  • Certification by or full active membership in a professional organization (ASHA, AOTA, APA, AERA, ACA, AMA, NASP, NAN, INS) that requires training and experience in a relevant area of assessment; OR
  • Practical experience in the use of psychological assessments.

Page 42

Qualified Assessors

Each test will list qualifications for the assessor:
• Level B: Requires either:
  • Graduate degree in a relevant discipline (e.g., education, psychology, counseling, social work) and graduate-level coursework in the ethical administration, scoring, and interpretation of clinical assessments; OR
  • Equivalent training focused on psychological testing or measurement from a reputable organization; OR
  • Certification by or full active membership in a professional organization (ASHA, AOTA, APA, AERA, ACA, AMA, NASP, NAN, INS) that requires training and experience in a relevant area of assessment.

Page 43

Qualified Assessors

Each test will list qualifications for the assessor:
• Level C: Requires a high level of expertise in test interpretation and either:
  • Doctorate degree in a relevant discipline (e.g., education, psychology, counseling, social work) and graduate-level coursework in the ethical administration, scoring, and interpretation of clinical assessments; OR
  • Direct supervision by a qualified professional in a related discipline; OR
  • Licensure or certification to practice independently in your state in the relevant field.

Page 44

Qualified Assessors

Each test will list qualifications for the assessor:
• Level Q: Requires formal training in the ethical use, administration, and interpretation of standardized assessment tools and psychometrics, along with one of the following:
  • Q1: Degree or license to practice in the healthcare or allied healthcare field.
  • Q2: Formal supervised speech/language, mental health, and/or educational training specific to working with parents and assessing children, or formal supervised training in infant and child development.

NOTE: If you meet the criteria for Level B or C, you do not need to pursue qualification under Level Q.

Page 45

Age/Grade Considerations

• Use age norms for most assessments (e.g., adaptive functioning, intelligence, or social/emotional development).
• Grade norms may be more appropriate for academic functioning in cases where the student is not in the appropriate grade for his/her age.

Page 46

Age/Grade Considerations

Be cautious about using norms that have been mathematically estimated!

• Extrapolated scores are estimated beyond the ends of the actual data collected (e.g., scores are estimated for 4- and 17-year-olds from a norm sample containing 5- to 16-year-olds).
• Interpolated scores are estimated between the actual data collected (e.g., scores are estimated for 6-year-olds from a norm sample containing only 5- and 7-year-olds).
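
Worked example (not part of the original slides): a minimal Python sketch of linear interpolation between two sampled ages, one simple way such estimates can be produced (publishers may use other smoothing methods). The norm-table values are made up.

def interpolate(age, table):
    """Linearly estimate a mean for an age between the two sampled ages in table."""
    lo, hi = min(table), max(table)
    frac = (age - lo) / (hi - lo)
    return table[lo] + frac * (table[hi] - table[lo])

norm_means = {5: 21.0, 7: 29.0}    # mean raw scores actually collected
print(interpolate(6, norm_means))  # 25.0 -> an estimate, not observed data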

Page 47

Current Assessment Tools/Tests

• Tests 15 or more years old are dated and should not be used unless absolutely necessary, such as when:
  • The assessment tool or test is the only measure available to assess the specific domain necessary; or
  • The newer version of the assessment tool or test lacks adequate norms, reliability, or validity.
• Contact the publisher to make sure that you are using the most recent version of the test and to find out whether a newer version will be released soon.

Page 48

Reliability Standards for Assessment Tools or Tests

• Minimum of 0.60 reliability for test scores used for administrative purposes and reported for groups of individuals.
• Minimum of 0.70 reliability for test scores collected regularly (i.e., weekly or more frequently) for progress monitoring.
• Minimum of 0.80 reliability for test scores used for making screening decisions.
• Minimum of 0.90 reliability for test scores used for making important decisions for individual students.
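
Worked example (not part of the original slides): a minimal Python lookup that applies the minimums above; the purpose labels are shorthand invented for this sketch.

MIN_RELIABILITY = {
    "group/administrative": 0.60,
    "progress monitoring": 0.70,
    "screening": 0.80,
    "individual decisions": 0.90,
}

def meets_standard(reliability, purpose):
    return reliability >= MIN_RELIABILITY[purpose]

print(meets_standard(0.85, "screening"))             # True
print(meets_standard(0.85, "individual decisions"))  # False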

Page 49

Additional Considerations for Selection of Assessment Tools

To select appropriate assessment tools or tests for decision-making, assessors must also consider:

• For what purpose is the assessment tool or test administered?
• What skills and/or areas of knowledge are assessed?
• What kinds of questions, and how many of each kind, are included?
• What procedures are used?
• How do students demonstrate their knowledge?
• How long or difficult is the assessment tool or test?

Page 50

Types of Evaluative Decisions

• Selection
  • Selection assessments identify who will be accepted or rejected for a program or project.
  • These assessments are conducted typically for admittance to an institution such as a university.
• Placement and Classification
  • Placement assessments assign individuals to different levels or types of categories.
  • Classification assessments place individuals in different programs. These assessments are conducted typically to place individuals in optimal programs to increase the probability of their success or well-being.

Page 51

Types of Evaluative Decisions

• Diagnosis and Remediation
  • Diagnostic assessments determine strengths and weaknesses to locate the source of a problem.
  • These assessments are conducted typically to determine how to improve performance or well-being through elimination of any problems.
• Feedback
  • Assessments are conducted typically to inform individuals regarding their progress or well-being.
  • These assessments are conducted typically to improve learning, increase efficiency, and reduce wasted time or energy.

Page 52

Types of Evaluative Decisions

• Motivation and Guidance of Learning
  • Assessments can motivate and guide student learning as long as they are fair and of reasonable difficulty.
  • Assessments with trivial content will not guide learning well, and assessments that are too easy or too difficult will undermine motivation for learning.
• Program and Curriculum Improvement
  • Summative evaluation enables a judgment of overall effectiveness after completion of a program or project.
  • Formative evaluation enables effective decision-making for improving a program or project throughout its duration.

Page 53

Additional Considerations for Selection of Assessment Tools

Assessment tools or tests may not be valid or interpretable when:

• Important characteristics of the examinee are not represented in the norm group;

• Administration or scoring procedures do not follow those used in standardizing the test;

• Characteristics of the test may affect its utility for the situation (e.g., ceiling and floor effects);

• The test contains tasks that are not culturally relevant to the test taker; or

• The validity evidence does not support decisions made on the basis of the test scores.

Page 54

Ethical Testing Practices

Confidentiality
• Educators must maintain confidentiality except when:
  • A clear and immediate danger to the student is evident (contact other professionals or authorities immediately!);
  • The student will benefit from talking to other professionals concerned with the case; or
  • Permission for confidential communications has been given by the student or parent.

Test Security
• Dissemination of tests is restricted to those with the technical competence to use them properly.
• Standardized tests must be secured at all times.

Page 55

Ethical Testing Practices

Test Interpretation
• Test scores and materials should only be available to individuals qualified to use them.
• Test results should be interpreted to parents and students in ways that prevent misinterpretation and misuse.

Test Publication
• Standardized tests should provide a manual and/or technical handbook describing how, by whom, and for whom the test may be used.

Page 56

Unethical Testing Practices

• Do NOT tutor students on the specific items of a standardized test (except for mastery testing).
  • Scores on standardized tests can only be interpreted when the tests are given in exactly the same conditions as for the norm or comparison group.
• Do NOT examine the content of standardized tests to determine instructional content.
  • Standardized tests are supposed to be samples of behavior, not everything a student should know at a particular point in time.
• Do NOT use items from standardized tests on other exams or in instructional materials.
  • Standardized tests are copyrighted, and previous exposure to specific items disrupts the standardized presentation of items.

Page 57

Unethical Testing Practices

• Do NOT try to improve student performance by giving parallel items or alternate forms of a standardized test.
  • Interpretations of scores depend on the standardized procedures, including the amount of exposure to item types. This practice undermines standardization by giving falsely inflated scores to the students who practiced.
• Do NOT exclude students from district assessments even if you expect that they will perform poorly.
  • Assessments are typically designed to include students who will do well and those who will not, including those in special education and those who are not. The only reason students should be excluded from testing on the regular curriculum is if they are receiving instruction in an alternate curriculum.

Page 58

Unethical Testing Practices

• Do NOT neglect some students to focus on improving the test scores of other students.
  • Accountability measures are intended to ensure the maximum achievement of all students, not just a select group.
• Do NOT alter the procedures for assessment as described by the administration manual.
  • Standardization requires the same procedures if decisions are to be made based on the results.
• Do NOT create anxiety or rivalry among students, classes or schools.
  • Assessments are not contests and should not be treated as such.

Page 59

Types of Tests: Early Childhood

• Global Developmental Measures
• Domain-specific Measures
  • Motor/Physical
  • Communication/Language
  • Cognitive
  • Achievement
  • Social/Emotional/Behavior
  • Adaptive

Page 60

Types of Tests: Middle Childhood

• Global Developmental Measures
• Domain-specific Measures
  • Motor/Physical
  • Communication/Language
  • Cognitive
  • Achievement
  • Social/Emotional/Behavior
  • Adaptive

Page 61

Types of Tests: Adolescence and Young Adulthood

• Domain-specific Measures
  • Motor/Physical
  • Communication/Language
  • Cognitive
  • Achievement
  • Social/Emotional/Behavior
  • Adaptive
• Transition Planning Measures
  • Aptitude
  • Interest Inventories

Page 62

Questions & Answers

Page 63

References

• Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando, FL: Harcourt Brace Jovanovich.
• Salvia, J., Ysseldyke, J. E., & Bolt, S. (2013). Assessment in special and inclusive education (12th ed.). Belmont, CA: Wadsworth Publishing Co.
• Sax, G. (1997). Principles of educational and psychological measurement and evaluation (4th ed.). Belmont, CA: Wadsworth Publishing Co.
• Urbina, S. (2004). Essentials of psychological testing. In A. S. Kaufman & N. L. Kaufman (Eds.), Essentials of Behavioral Science series. Hoboken, NJ: John Wiley & Sons.

Page 64

Contact Information

OSE Technical Assistance Staff:
Stacy Callender, Prog. Coordinator [email protected]

Gwen Buffington, Prog. Coordinator [email protected]

Valecia Davis, Prog. Coordinator [email protected]

Pleshette Smith, Prog. Coordinator [email protected]

Desma McElveen, Division Director [email protected]

Tanya Bradley, Bureau Director [email protected]

MS Dept. of Education
Office of Special Education
359 N. West Street, P. O. Box 771
Jackson, MS 39205
(601) 359-3498

Regional Training 2012-2013 Copyright © 2012 Mississippi Department of Education Office of Instructional Programs/Office of Special Education