Applying the Concepts of Validity, Reliability, and Other Standards for Educational and Psychological Testing to Assessment of Program Student Learning Outcomes

Jeremy Penn, Ph.D.
Director, University Assessment and Testing
Faculty Certificate Program / Graduate Student Endorsement in Program Outcomes Assessment
• Participate in 8 out of 10 workshops
  – 8 offered this year
  – Some limited substitutions allowed
• Create or modify an assessment plan for a program (can be hypothetical)
• Certificate and $500 (faculty) / $100 (graduate student) award upon completion
University Assessment and Testing
Inference Quality

[Diagram: "poor writing skills" observed in assessment results, with links to students, faculty, and curriculum as possible inferences]
Standards for Educational and Psychological Testing
• Reliability
• Validity
• Test development
• Scales, scores, comparability
• Fairness in testing and test use

APA, AERA, NCME (1999)
Why pay attention to the Standards?
• Help avoid drawing misleading or inappropriate inferences
• Provide guidance on test selection / development
• Protect students' rights
• Fairness in testing and test use
• Guidance on assessment development and score reporting
• Implications for public policy
Using the Standards
• Reasonable to expect high-stakes testing, admissions testing, licensure, critical decisions, etc. to carefully follow the Standards
• Some specific Standards are more salient in some contexts than others
• May be unreasonable to expect every quiz, test, interview, or portfolio to follow every element of the Standards
• May not expect substantial evidence of validity and reliability for a 10-point quiz
  – However, the principles should be considered when the quiz is developed and used
Using the Standards
• Standards are not a checklist to be marked off
  – "Evaluating the acceptability of a test or test application does not rest on the literal satisfaction of every standard in this document, and acceptability cannot be determined by using a checklist" (p. 4)
• Professional judgment is critical
  – Consideration of the intent of the standard
  – Consideration of alternatives
  – Feasibility of meeting the standard
• Standards under revision
  – Updated version possibly in 2012?
Reliability

"Consistency"

Could be consistently good…
Reliability

"Reliability refers to the consistency of such measurements when the testing procedure is repeated on a population of individuals or groups."
• Multiple raters
• Multiple forms
• Multiple administrations (test-retest)
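The "multiple administrations" case can be quantified with a test-retest correlation: administer the same instrument twice and correlate the two sets of scores. A minimal sketch, where the score lists are invented illustration data, not results from the workshop:

```python
# Test-retest reliability sketch: correlate scores from two administrations
# of the same exam for the same students (hypothetical data).
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first = [78, 85, 62, 90, 71, 66, 88]   # administration 1
second = [75, 88, 60, 93, 70, 69, 85]  # administration 2
print(round(pearson_r(first, second), 2))  # 0.97
```

A coefficient near 1.0 indicates that students' relative standing is consistent across administrations; a low value suggests the measurement, not just the students, is unstable.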
Reliability

Careful consideration should be given to the training of reviewers / scorers.

Inter-rater reliability should be examined and a quality-control process implemented.
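Inter-rater reliability can be examined numerically once two raters have scored the same work. A minimal sketch using exact percent agreement and Cohen's kappa (a chance-corrected agreement measure); the rubric ratings below are invented illustration data:

```python
# Inter-rater reliability sketch: percent agreement and Cohen's kappa
# for two raters applying the same rubric to the same student work.
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of cases where the two raters gave identical scores."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for the amount expected by chance alone."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)
    return (po - pe) / (1 - pe)

rater_a = [3, 4, 2, 4, 3, 1, 4, 2, 3, 3]  # hypothetical rubric scores
rater_b = [3, 4, 2, 3, 3, 1, 4, 2, 2, 3]
print(percent_agreement(rater_a, rater_b))        # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))   # 0.72
```

Tracking such a statistic before and after rater training is one simple quality-control process of the kind the slide recommends.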
Reliability

Efforts taken to control error in exam design:
• Items not prone to multiple interpretations
• Carefully proofread
• Similar instructions given to all students
• Equal difficulty of multiple forms of the same exam
Issues in Reliability

FAIL: Committee asks faculty members to evaluate students using the same rubric.

FIX: Faculty members are trained on the rubric so they will score students consistently.
Validity

"the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (p. 9)

"It is the interpretations of test scores required by proposed uses that are evaluated, not the test itself" (p. 9)
Validity

Sources of evidence:
– Test content
– Students' response process (related to content)
– Internal structure (factor analysis)
– Relationships to other variables
– The consequences of testing (intended and unintended consequences of score use)
Validity Implications
• When selecting / designing a test, must consider the possible uses of the test and how scores will be interpreted
• When sharing results, must consider how different audiences may be tempted to misinterpret findings
• Should gather evidence to support findings (multiple measures)
Validity Implications
• Clearly identify the construct (or concepts) the test is intended to measure
• The higher the stakes, the more important it is that test-based inferences are supported with strong evidence of technical quality
Item Analysis (briefly)
• Item difficulty
  – Percentage of students who get the item correct
  – Can indicate a poorly worded / developed item, a poorly taught concept, or actual low ability
• Item discrimination
  – Ability of an item to correctly separate "true" high achievers and "true" low achievers
  – Problem if low achievers get an item correct but high achievers get it incorrect
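Both statistics are simple to compute from a scored response matrix. A minimal sketch, with difficulty as the proportion correct and discrimination as the classic upper-minus-lower index; the 0/1 responses are invented illustration data:

```python
# Item-analysis sketch: difficulty and a simple discrimination index
# computed from a 0/1 response matrix (rows = students, columns = items).

def item_difficulty(responses, item):
    """Proportion of students answering the item correctly."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item):
    """Upper-minus-lower index: rank students by total score, then compare
    the top and bottom halves on this item. Positive values mean the item
    separates high and low achievers in the expected direction."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[-half:]
    p_top = sum(row[item] for row in top) / half
    p_bottom = sum(row[item] for row in bottom) / half
    return p_top - p_bottom

responses = [          # hypothetical data; 1 = correct
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(item_difficulty(responses, 0))      # 4/6 correct
print(item_discrimination(responses, 0))  # positive: item discriminates
```

A negative discrimination value is exactly the problem case the slide describes: low achievers outperforming high achievers on the item, which usually signals a flawed item rather than flawed students.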
Issues in Validity

FAIL: Assessment Committee selects a standardized test because it is inexpensive and used by many other institutions.

FIX: The test is selected because it is a good match for the intended construct, matches the curriculum at the institution, and has evidence it supports the inferences the Assessment Committee wants to draw.
Test Development
• Need a systematic process for developing a local rubric / test / portfolio
  – Define content or construct
  – Clear instructions for administrators and examinees
  – Careful item / rubric development
  – Training for scorers and quality checking of scorers
Test Development

FAIL: Assessment committee asks faculty members to submit items for a test.

FIX: Assessment committee clearly defines the domain for the test. A committee of faculty members evaluates a large number of items for relevance and suitability for the test.
Scales, Scores, Comparability
• Scales (method to develop a total score), cut scores, norms (comparison groups), and comparability are developed to assist in interpreting scores
  – Scale calculation and interpretation should be clearly described
  – Norm-referenced and criterion-referenced scoring have different interpretations
  – Norming groups (if used) should be relevant and updated
  – A reasonable process should be used to establish cut scores (e.g., Angoff, Bookmark, etc.)
  – Before scores are compared with alternate forms or other settings, comparability needs to be established
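As one concrete example of a "reasonable process", in the modified Angoff method each judge estimates, item by item, the probability that a minimally competent examinee answers correctly; a common way to combine the judgments is to average the judges' summed estimates. A minimal sketch, with invented judge ratings for a hypothetical 5-item exam:

```python
# Cut-score sketch (modified Angoff): average each judge's summed
# per-item probability estimates for a minimally competent examinee.

def angoff_cut_score(judge_ratings):
    """judge_ratings: one list per judge of per-item probabilities (0-1).
    Returns the recommended raw-score cut."""
    judge_totals = [sum(ratings) for ratings in judge_ratings]
    return sum(judge_totals) / len(judge_totals)

ratings = [                        # hypothetical estimates, 5 items
    [0.8, 0.6, 0.9, 0.5, 0.7],     # judge 1 (total 3.5)
    [0.7, 0.5, 0.9, 0.6, 0.6],     # judge 2 (total 3.3)
    [0.9, 0.6, 0.8, 0.4, 0.7],     # judge 3 (total 3.4)
]
print(round(angoff_cut_score(ratings), 2))  # 3.4 out of 5
```

The point of the slide stands: the defensibility comes from the documented process (selecting judges, defining minimal competence, reconciling estimates), not from the arithmetic itself.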
Scales, Scores, Comparability

FAIL: Assessment committee decides students must score above 60% on a test in order to graduate.

FIX: Assessment committee implements a systematic process to develop a cut score on the graduation exam. Evidence is gathered to show that students scoring below the cut score do not have the sufficient skills needed.
Fairness in Testing and Test Use
• Fairness as lack of bias
  – Occurs when deficiencies in a test or its use result in different meanings for scores earned by students from identifiable groups; avoid use of prompts or items that may be interpreted differently
• Fairness as equitable treatment in the testing process
  – All examinees should be afforded appropriate testing conditions, including equal access to materials provided by the test developer
  – Respect for confidentiality (protect small-n groups that could be identified in reporting)
Fairness in Testing and Test Use
• Fairness as equality in outcomes of testing
  – Examine the test for comparable pass rates across groups; it is not required that they be equal, but differences might reveal possible bias or lack of equitable treatment that should be investigated
• Fairness as opportunity to learn
  – Low test scores may result from the examinee not having the opportunity to learn the material tested (not generally relevant for employment, credentialing, or admissions testing)
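The pass-rate comparison in the first bullet is easy to automate. A minimal sketch, where the group names, scores, and cut score are invented illustration values; as the slide notes, a gap is a flag for investigation, not proof of bias:

```python
# Fairness-check sketch: compare pass rates across identifiable groups
# at a given cut score (hypothetical data).

def pass_rates(scores_by_group, cut):
    """Map each group to its proportion of scores at or above the cut."""
    return {group: sum(s >= cut for s in scores) / len(scores)
            for group, scores in scores_by_group.items()}

scores = {
    "group_a": [72, 85, 64, 91, 78, 55, 83],
    "group_b": [70, 58, 88, 62, 79, 60, 90],
}
rates = pass_rates(scores, cut=65)
print(rates)
gap = max(rates.values()) - min(rates.values())
print(round(gap, 2))  # 0.14 -- a gap this size would warrant review
```

A routine report like this, run each cycle, makes the "should be investigated" step in the slide an explicit part of the assessment process rather than an afterthought.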
Activity
• Application of the Standards to common higher education scenarios

University Assessment and Testing